Analytic R&D – Blazing New Trails

Posted on: October 13th, 2015 by DSG Staff


By Vinnie Dessecker, senior consultant, big data – Strategy and Governance Center of Excellence

There is an ever-increasing need for businesses to engage in analytic innovation – exploring new, disparate and larger volumes of data to gain insights into new products, services or other opportunities for an organization and its customers. Analytic innovation is really about seeing where the data takes you – determining in a scientific manner what actions can be predicted based on past performance. Much of the demand for analytic innovation is driven by the era of big data – the availability of new data sources yields new and previously unimagined insights.

So, what does it really mean to be innovative? What constitutes analytic R&D, and how does this discovery capability relate to the other components of the data strategy? Innovation is inseparable from risk-taking: an idea or hypothesis must, by definition, be replicable at an economical cost and must satisfy a specific need – stated or unstated. To create an environment where innovation can flourish, an organization must create a culture that encourages exploration and risk, and that accepts, and even welcomes, failure.

“We made too many wrong mistakes.”

Yogi Berra

As I hike the hills of Southern California, I’m accustomed to following the trail maps, but sometimes on a leisurely hike it’s worthwhile to venture off the beaten path. Frequently, that detour leads to a dead end – perhaps a good place to have a snack, but ultimately I’ve got to get back on the trail. Sometimes it leads to the discovery of a beautiful canyon or cave I didn’t know was there, or it could yield a shortcut that makes my journey to the top a little faster or more interesting. This exploration is not an instance of deciding to take a hike without a trail map or to venture off the trail or out of the park (remember, there are lots of snakes in those hills)! Rather, it’s a desire to explore the unknown to see what I can discover beyond the obvious or familiar. If I discover a really interesting trail, I might add it to my hiking options and revisit it on a regular basis. If I don’t discover anything new, or discover that a deviation leads to something undesirable, I simply won’t do that again – and my discovery is at least that valuable to me.

Effective Analytic R&D

This is a very exciting time for analytics – the landscape is changing and evolving every day. Dynamic changes and big challenges exist for all organizations. We should strive not just for innovation, but business agility – the ability to make data-driven decisions faster and with more confidence. Ultimately, the goal is to increase revenue and decrease costs, while meeting the customer’s ever-increasing demands for personalized products and services.

In business, an analytic R&D environment is a key component of innovation. It supports rapid experimentation and evaluation of data, with less formality in the data management rules than is applied to production analytics. While this is a discovery zone, and by its nature meant to be less restricted by rules, the key to success is to apply the right amount of governance and structure. No matter what spontaneous choices I make on the hiking trail, I don’t disregard the basic rules of safety, environmental responsibility or common sense.

Effective R&D data management and governance practices allow for exploration but strive to create order from the chaos that can ensue, driving a culture of innovation. These practices accommodate the iterative, exploratory nature of research and development, understanding that new discoveries are sometimes born from previously “failed” endeavors. In fact, the failures are required to develop new insights and hypotheses – how many attempts did Thomas Edison make before he developed the light bulb?

An effective data strategy has a path for both production and R&D analytics. And, when the R&D effort yields gold, there must be a path back to the production environments and a way to incorporate the innovation into the pipeline of projects that make up the production portfolio. For instance, identify the business processes that need to be modified and the individuals who should be trained to make appropriate use of the data-driven insight.

How do you strike a balance between too much control and too little? The goal should always be to preserve the value of the data and ensure that the customers (internal and external) have confidence in the data irrespective of the source or data type. Some of the questions that must be answered include:

  • Infrastructure and Platform – can the data be accessed irrespective of where it is stored? Can data from different platforms be analyzed without arduous and time-consuming data integration efforts?
  • Data Architecture – is the data (including unstructured and semi-structured data) understood within the context of the broader enterprise and the supported business initiatives? Has the data been modeled? Is it loosely coupled or tightly coupled data?
  • Data Quality – is the data fit for purpose? Can you measure and report on that data quality? For certain data types, is there a lower acceptable quality standard (e.g., social media records)?
  • Master Data – does the data need to be mastered; i.e., a single “golden record” created? Does the data need to be combined with mastered data; e.g., is it necessary to integrate social media data with a customer record to analyze customer satisfaction?
  • Metadata – is it possible to report on data lineage, describing where the data originated and how it was transformed in its journey to analytics? How much definition needs to be applied to the data to facilitate effective self-service?
  • Data Integration – does the data used for R&D need to be integrated? Is the requirement for batch or real-time, or something in between? Can it be integrated at the time of the analytics; e.g., dynamically modeled? Is self-provisioning an option?
  • Data Security and Privacy – how much data security is required? Do the same privacy rules apply as those in the production analytics environments? How damaging is a data breach to the organization?
  • Program and Project Management – are there ways to fund and measure the R&D projects that are consistent with the goals for the program and the business initiatives supported? Are there appropriate gating processes in place; i.e., when do you know that the hypothesis is not providing the business value anticipated? How can you build on the previous “failures” when appropriate – does that include sharing the hypotheses, the data, or the techniques applied?
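The “measure and report on data quality” question above can be made concrete with a small sketch. This is a minimal, hypothetical example – the field names, records, and completeness metric are illustrative, not any particular product’s profiling API:

```python
# Minimal sketch of profiling an R&D data set for "fitness for purpose".
# Field names and thresholds here are hypothetical examples.

def profile(records, required_fields):
    """Return a simple completeness score (0.0-1.0) per required field."""
    total = len(records)
    report = {}
    for field in required_fields:
        present = sum(1 for r in records if r.get(field) not in (None, ""))
        report[field] = round(present / total, 2) if total else 0.0
    return report

# Social media records, for instance, might be held to a lower
# completeness standard than customer master records.
records = [
    {"user": "a1", "text": "great service", "region": "US"},
    {"user": "a2", "text": "", "region": None},
    {"user": "a3", "text": "too slow", "region": "EU"},
]
quality = profile(records, ["user", "text", "region"])
print(quality)  # {'user': 1.0, 'text': 0.67, 'region': 0.67}
```

Even a lightweight report like this lets the R&D team state, rather than guess, whether a data set meets the (possibly relaxed) quality bar chosen for discovery work.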

There are new data sources, new technologies, and new skills being developed to exploit these opportunities. But, as with most changes that we have seen over the last 30 years, the answer to addressing the opportunity leads back to traditional concepts and topics. We don’t simply throw out everything we have learned over the years and start again with each new technological advance. That would be a little like discovering a new potential path to the top of the hill and deciding that going forward we didn’t need the same things we used previously to climb the hill – throw out the shoes, the trail map, the water! Throw out the preparation and planning and production quality processes – just start moving! That’s not innovation or business agility, and it’s certainly not progress.

Innovation can flourish when we understand our data strategy – our vision for the organization required to meet the business initiatives – and apply the appropriate management and governance controls, building on what we know works, and leveraging new techniques and technologies.

Vinnie Dessecker is a senior consultant for big data within Teradata’s Strategy and Governance Center of Excellence. Her work aligns business and technical goals for data and information initiatives, including master data management, metadata management, data quality, content/document management, and the analytic roadmap and business intelligence ecosystems.


In advance of the upcoming webinar Achieving Pervasive Analytics through Data & Analytic Centricity, Dan Woods, CTO and editor of CITO Research, sat down with Clarke Patterson, senior director, Product Marketing, Cloudera, and Chris Twogood, vice president of Product and Services Marketing, Teradata, to discuss some of the ideas and concepts that will be shared in more detail on May 14, 2015.


Having been briefed by Cloudera and Teradata on Pervasive Analytics and Data & Analytic Centricity, I have to say it’s refreshing to hear vendors talk about WHY and HOW big data is important in a constructive way, rather than offering platitudes and jumping straight into the technical details of the WHAT, which is so often the case.

Let me start by asking you both to describe, in your own words, Pervasive Analytics and Data & Analytic Centricity, and why this is an important concept for enterprises to understand.


During eras of global economic shifts, there is always a key resource discovered that becomes the spark of transformation for organizations that can effectively harness it. Today, that resource is unquestionably ‘data’. Forward-looking companies realize that to be successful, they must leverage analytics in order to provide value to their customers and shareholders. In some cases they must package data in a way that adds value and informs employees, or their customers, by deploying analytics into decision-making processes everywhere. This idea is referred to as pervasive analytics.

I would point to the success that Teradata’s customers have had over the past decades in making analytics pervasive throughout enterprises. The spectrum in which their customers have gained value is comprehensive: from business intelligence reporting and executive dashboards, to advanced analytics, to enabling front-line decision makers, to embedding analytics into key operational processes. And while those opportunities remain, the explosion of new data types and the breadth of new analytic capabilities are leading successful companies to recognize the need to evolve the way they think about data management and processes in order to harness the value of all their data.


I couldn’t agree more. Now that we’re several years into the era of big data, it’s interesting to see how different companies have approached this opportunity, which really boils down to two approaches. Some companies have asked what they can do with the newer technology that has emerged, while others define a strategic vision for the role of data and analytics in supporting their business objectives, and then map the technology to the strategy. The former, which we refer to as an application-centric approach, can yield some benefits, but typically runs out of steam as agility slows and new costs and complexities emerge. The latter is proving to create substantially more competitive advantage, as organizations put data and analytics – not a new piece of technology – at the center of their operations. Ultimately, companies that take a data- and analytic-centric approach conclude that multiple technologies are required; their acumen in applying the right tool to the right job naturally progresses, and the usual traps and pitfalls are avoided.


Would you elaborate on what is meant by “companies need to evolve the way they think about data management?”


Pre “big data,” there was a single approach to data integration: data is normalized – made to look the same – in some sort of persistence layer, such as a database, and only then can value be created. The idea is that by absorbing the costs of data integration up front, the cost of extracting insights decreases. We call this approach “tightly coupled.” It is still an extremely valuable methodology, but it is no longer sufficient as the sole approach to managing all data in the enterprise.

Post “big data,” using the same tightly coupled approach to integration undermines the value of newer data sets that have unknown or under-appreciated value. Here, new methodologies that “loosely couple” the data – or do not couple it at all – are essential to cost-effectively manage and integrate the data. These distinctions are incredibly helpful in understanding the value of big data, deciding where best to invest, and highlighting challenges that remain a fundamental hindrance to most enterprises.

But regardless of how the data is most appropriately managed, the most important thing is to ensure that organizations retain the ability to connect-the-dots for all their data, in order to draw correlations between multiple subject areas and sources and foster peak agility.


I’d also cite that leading companies are evolving the way they approach analytics. We can analyze any kind of data now - numerical, text, audio, video. We are now able to discover insights in this complex data. Further, new forms of procedural analytics have emerged in the era of big data, such as graph, time-series, machine learning, and text analytics.

This allows us to expand our understanding of the problems at hand. Key business imperatives like churn reduction, fraud detection, increasing sales and marketing effectiveness, and operational efficiencies are not new, and have been skillfully leveraged by data driven businesses with tightly coupled methods and SQL based analytics – that’s not going away. But when organizations harness newer forms of data that adds to the picture, and new complimentary analytic techniques, they realize better churn and fraud models, greater sales and marketing effectiveness, and more efficient business operations.

To learn more, please join the Achieving Pervasive Analytics through Data & Analytic Centricity webinar on Thursday, May 14 the from 10 - 11:00am PT

Real-Time SAP® Analytics: a look back and ahead

Posted on: August 18th, 2014 by Patrick Teunissen 5 Comments


On April 8, I hosted a webinar and my guest was Neil Raden, an independent data warehouse analyst. The topic of the webinar was: “Accessing of SAP ERP data for business analytics purposes” – which was built upon Neil’s findings in his recent white paper about the complexities of the integration of SAP data into the enterprise data warehouse. The attendance and participation in the webinar clearly showed that there is a lot of interest and expertise in this space. As I think back about the questions we received, both Neil and I were surprised by the number of questions that were related to “real-time analytics on SAP.”

Something has drastically changed in the SAP community!

Note: The topic of real time analytics is not new! I won’t forget Neil’s reaction when the questions came up. It was like he was in a time warp back to the early 2000’s when he first wrote about that topic. Interestingly, Neil’s work is still very relevant today.

This made me wonder why this is so prominent in the SAP space now? What has changed in the SAP community? What has changed in the needs of the business?

My hypothesis is that when Neil originally wrote his paper (in 2003) R/3 was SAP (or SAP was R/3 whatever order you prefer) and integration with other applications or databases was not something that SAP had on the radar yet. This began to change when SAP BW became more popular and gained even more traction with the release of SAP’s suite of tools and modules (CRM, SRM, BPC, MDM, etc.) -- although these solutions still clearly had the true SAP ‘Made in Germany’ DNA. Then came SAP’s planning tool APO, Netweaver XI (later PI) and, the 2007 acquisition of Business Objects (including BODS) which all accelerated SAP’s application integration techniques.

With Netweaver XI/PI and Business Objects Data Services, it became possible to integrate SAP R/3 in real time, making use of advanced messaging techniques like Idoc’s, RFC’s, and BAPI’s. These techniques all work very well for transaction system integration (EAI); however, these techniques do not have what it takes to provide real-time data feeds to the integrated data warehouse. At best a hybrid approach is possible. Back in 2000 my team worked on such a hybrid project at Hunter Douglas (Luxaflex). They combined classical ABAP-driven batch loads for managerial reports with real time capabilities (BAPI calls) for their more operational reporting needs. That was state-of-art in those days!

Finally, in 2010 SAP acquired Sybase and added a best of breed Data Replication software tool to the portfolio. With this integration technique, changed data is captured directly from the database taking the loads off of the R/3 application servers. This offers huge advantages, so it makes sense that this is now the recommended technique for loading data into the SAP HANA appliance.

“What has changed is that SAP has put the need for real-time data integration with R/3 on the (road) map!”

The main feature of our upcoming release of Teradata Analytics for SAP Solutions version 2.2 is a new data replication technique. Almost designed to prove my case, 10 years ago I was in the middle of working on a project for a large multinational company. One of my lead engineers, Arno Luijten, came to me with a proposal to try out a data replication tool to address the latencies introduced by the extraction of large volumes of changed data from SAP. We didn’t get very far at the time, because the technology and the business expectations were not ready for it. Fast forward to 2014 and we’re re-engaged with this same customer …. Luckily this time the business needs and the technology capabilities are ready to deliver!

In the coming months my team and I would like to take you on our SAP analytics journey.

In my next posts we will dive into the definition (and relativity) of real-time analytics and discuss the technical complexities of dealing with SAP including the pool and cluster tables. So, I hope I got you hooked for the rest of the series!

The integration issue that dare not speak its name ….

Posted on: March 25th, 2014 by Patrick Teunissen 2 Comments


Having worked with multinational companies running SAP ERP systems for many years, I know that they (nearly) always have more than one SAP system to record their transactional data. Yet it is never discussed -- and it seems to be the 'Macbeth' of the SAP world, a fact that should not be uttered out loud…

My first experience with SAP's software solutions dates back to1989 whilst at Shell Chemicals in the Netherlands, exactly 25 years ago. What strikes me most after all these years is that people talk about SAP as if it is one system covering everything that is important to business.

Undoubtedly SAP has had a huge impact on enterprise computing. I remember at Shell, prior to the implementation of SAP that we ran a vast quantity of transaction systems. The purchasing and stock management systems for example, were stand alone and not integrated with the general ledger system. The integration of these transaction systems had to be done via interfaces some of which were manual (information had to be typed over) At the month end, only after all interfaces had run, would the ledger show the proper stock value and accounts payable. So thanks to SAP the number of transaction systems has been dramatically reduced.

But of course the Shell Refining Company had its own SAP system just like the businesses in the UK, Germany etc etc. So in the late 80’s Shell ran multiple and numerous different SAP systems.

However this contradicts one of SAP’s key messages, their ability to integrate all sorts of transactional information to provide relevant data for analytical purposes in one hypothetical system (reference Dr. Plattner’s 2011 Beijing speech ).

I have always struggled with the definition of “relevant data” as I believe that what is relevant is dependent on 3 things: the user, the context and time. For an operator of a chemical plant for example, the current temperature of the unit and product conversion yields is likely to be “relevant” as this is the data needed to steer the current process. For the plant director the volumes produced and the overall processing efficiency of the last month maybe “relevant” as this is what his peers in the management team will challenge him on. SAP systems are as far as I know, not used to operate manufacturing plants, in which case the only conclusion can be that not all relevant data is in SAP. What you could say though, is that it is very likely that the “accounting” data is in SAP hence SAP could be the source for the plant’s management team reports.


However when businesses are running multiple SAP systems, as described earlier, the     conclusion cannot be that there is a (as in 1) SAP system in which all the relevant accounting data is processed. So a regional director responsible for numerous manufacturing sites may have to deal with data collected from multiple SAP systems when he/she needs to analyze the total costs of manufacturing of the last quarter.Probably because this does not really fit with SAP’s key message - one system for both transaction processing and analytics - they have no solution. I googled “analytics for multiple SAP systems” the results of which are shown above. As you can see other than the Teradata link there is no solution that will help our regional director. Even when the irrelevant words “analytics for” are removed only very technical and specific solutions are found.

Some people believe that this problem with analytics will be solved over time. Quite a few larger enterprises start with what I call re-implementations of the SAP software. Five years after my first exposure to SAP at Shell Chemicals in the Netherlands I became a member of the team responsible for the “re-implementation” of the software for Shell’s European Chemicals business. Of course there were cost benefits (less SAP systems = lower operational cost for the enterprise) and some supply chain related transactions could be processed more efficiently from the single system. But the region was still not really benefitting from it as the (national / legal) company in SAP is the most important object around which a lot has been organized (or configured) . Hence most multinational enterprises use another software product into which data is interfaced for the purpose of regional consolidation.

I was employed by Shell for almost 10 years. It is a special company and I am still in contact with a few people that I worked with. The other day I asked about the SAP landscape as it is today and was told that, 25 years after my first SAP experience they are still running multiple SAP systems and re-implementation projects. As I consider myself an expert in SAP I am sure I could have built a career on the re-implementation of the SAP systems.

The point that I want to make with this post is that many businesses need to take into account that they run multiple SAP systems, and more importantly that these systems are not automatically integrated. This fact has a huge impact on the analytics of the SAP data and the work required to provide an enterprise view of the business. So if you are involved in the delivery of analytical solutions to the organization then you should factor in “the Scottish play” issue into the heart of your design even if nobody else wants to talk about it.



2 This is why an appreciated colleague, a manufacturing consultant leader, always refers to SAP as the “Standard Accounting Package”.

3 In SAP the “Company” (T001-BUKRS) is probably the most important data object around which a lot has been organized (configured). Within SAP consolidation of these “companies’ is not an obvious thing to do. Extensions of the financial module (FI)designed to consolidate are difficult to operate and hardly ever used. Add to this the fact that almost every larger Enterprise has multiple SAP systems and the fact that consolidation takes place in “another” system is explained.

4 In 2007 SAP acquired OutlookSoft now known as SAP BPC (Business Planning & Consolidation) for this very purpose.