Making SAP data relevant in the world of big data

Posted on: May 4th, 2015 by Patrick Teunissen No Comments


Part one of a series about an old “SAP” dog who learns a new trick

Reflecting on the key messages from Teradata Universe 2015 in April, it was impossible to escape the theme of deriving differentiated business value by leveraging the latest data sources and analytic techniques. I heard from several customers how they improved their business by combining their traditional warehouse data (or ‘SAP data’ for us old dogs) with other data from across the enterprise and applying advanced analytic techniques. A special interest group dedicated a whole morning to exploring the value of integrating ‘SAP data’ with ‘other data’. As I sat through these sessions, I found it funny that companies that run SAP ERP always speak about their data in terms of SAP data and other data. It made me wonder: what is ‘other data’ and what makes it so special?

In most cases, ‘other data’ was referred to as ‘Big Data’. The term is quite ubiquitous and was used to describe just about every data source. But it’s important to note that throughout the sessions I attended, none of the companies referred to their SAP data as Big Data. Big Data was a term reserved for the (relatively) new sources of data like machine-generated data from the Internet of Things, call center details, POS-related data, and social media/web logs.

Although not “big”, customers told me they considered their SAP ERP applications to be complex fortresses of data. In comparison to traditional data warehouses or big data stores, SAP data is very difficult to extract and integrate with their ‘other data’. Even SAP acknowledges this shortcoming, as evidenced by their recent programs to ‘Simplify’ their complex applications. But I’d argue that while SAP ERPs may be complex to run, the data that is processed in these applications is quite simple. SAP experts would agree that if you know where to look, the data is both simple and reliable.

Unfortunately these experts live in a world of their own, focused entirely on data that flows through SAP. But as evidenced by the customers at Teradata Universe, the lion’s share of new IT projects and business initiatives are focused on leveraging ‘big data’. This means the folks who know SAP are rarely involved in the IT projects involving ‘big data’, and vice versa, which explains the chasm between SAP and ‘other data’. The ‘Big Data’ folks don’t understand the valuable context that SAP brings. And the ‘SAP data’ folks don’t understand the new insights that analytics on the ‘other data’ can deliver.

However, the tides are turning and the general consensus has evolved: there is value in bringing SAP data together with big data. SAP ERP is used primarily for managing the transactional processes in the financial, logistics, manufacturing, and administration functions. This means it houses high-quality master data, attribute data, and detailed facts about the business. Combining this structured and reliable data with multi-structured big data can add valuable confidence and context to the analytics that matter most to businesses today!

Here’s a recent example where a customer integrated the results of advanced text analytics with their SAP ERP data within their Teradata warehouse. The data science team was experimenting with a number of Aster machine learning and natural language processing techniques to find meaning and insight in field technician reports. Using one of Aster’s text analytic methods, Latent Dirichlet Allocation, they were able to identify common related word groups within their reports that indicated quality events, such as “broken again” or “running as expected”. However, they also discovered unexpected insight regarding equipment suppliers and 3rd-party service providers hidden in the field reports, such as “Supplier XYZ is causing problems” or “ABC is easy to work with”. They were then able to integrate all of these related word groups with context from the SAP ERP purchasing history data stored in the warehouse to provide additional insight and enrichment to their supplier scores.
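Conceptually, the enrichment step is a simple join between the text-analytics output and the purchasing history held in the warehouse. Here is a minimal sketch in Python; the supplier names, scores, and weighting are entirely hypothetical (the customer’s actual pipeline ran inside Aster and the Teradata warehouse, not in Python):

```python
from collections import defaultdict

# Hypothetical topic-model output: each field report yields a word group
# and the supplier it mentions, with a crude sentiment label.
report_topics = [
    {"supplier": "XYZ", "word_group": "causing problems", "sentiment": -1},
    {"supplier": "XYZ", "word_group": "broken again", "sentiment": -1},
    {"supplier": "ABC", "word_group": "easy to work with", "sentiment": +1},
]

# Hypothetical purchasing-history context from the warehouse.
purchasing = {
    "XYZ": {"orders": 120, "base_score": 80},
    "ABC": {"orders": 45, "base_score": 75},
}

# Aggregate the text-analytics signal per supplier ...
sentiment = defaultdict(int)
for topic in report_topics:
    sentiment[topic["supplier"]] += topic["sentiment"]

# ... and enrich each supplier score with it (weight of 5 is arbitrary).
enriched = {
    s: {**purchasing[s],
        "enriched_score": purchasing[s]["base_score"] + 5 * sentiment.get(s, 0)}
    for s in purchasing
}
print(enriched["XYZ"]["enriched_score"])  # 80 + 5 * (-2) = 70
```

The point of the sketch is the join itself: the word groups only become actionable once they are tied back to the structured purchasing context that SAP provides.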



PART FIVE: This is the last blog in my series about Near Real Time data acquisition from SAP. This final blog is co-written with Arno Luijten, who is one of Teradata’s lead engineers. He is instrumental in demystifying the secrets of the elusive SAP clustered and pooled tables.

There is a common misconception that the Pool and Cluster tables in SAP R/3 can only be deciphered by the SAP R/3 application server, giving them an almost mythical status. The phrase used all over the ‘help sites’ and forums is “A clustered and a pooled table cannot be read from outside SAP because certain data are clustered and pooled in one field”… which would make replicating these tables pretty pointless – right?

But what exactly are Pooled and Cluster tables in SAP R/3 anyway? We thought we would let SAP give us the answer and searched their public help pages (SAP Help Portal). That yielded limited results, so we looked further (Googled ‘cluster tables’) and found the following explanation (Technopedia link):

“Cluster tables are special types of tables present in the SAP data dictionary. They are logical tables maintained as records of the normal SAP tables, which are commonly known as transparent tables. A key advantage of using cluster tables is that data is stored in a compressed format, reducing memory space and the landscape network load for retrieving information from these tables.”

Reading further on the same page, there are six major bullet points describing the features, five of which basically tell you that what we did cannot be done. Luckily, we didn’t let this faze us!

We agree: the purpose of SAP cluster tables is to save space, given the huge volume of data contained in these tables and the potential negative impact this may have on the SAP R/3 application. We know this because the two most (in)famously large cluster tables are RFBLG and KOCLU, which contain the financial transactions and price conditions. SAP’s ABAP programmers often refer to these as BSEG (for financial transactions) and KONV (for price conditions).

From the database point of view, BSEG and KONV do not exist; their contents live inside the physical tables named RFBLG and KOCLU. Typically these (ABAP) tables contain a lot of data. There are more tables set up in this way, but from a data warehousing point of view these two are probably the most relevant. Simply skipping them would not be a viable option for most clients.

Knowing the importance of the Pool and Cluster tables, the value of data replication, and the value of operational analytics, we forged ahead with a solution. The encoded data from the ABAP table is stored as a BLOB (Binary Large Object) in the actual cluster table. To decode the data in the BLOB we wrote a C++ program, deployed as a Teradata User Defined Function (UDF), which we call the “Decoder”; it is installed directly within the Teradata database.
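Conceptually, the Decoder’s job looks something like the sketch below. This is an illustration only: SAP’s cluster encoding is proprietary, so `zlib` stands in here for the real algorithm, and the page handling is simplified; the actual Decoder is a C++ UDF inside the Teradata database.

```python
import zlib


def decode_cluster_rows(rows):
    """Reassemble and decode one logical cluster record.

    `rows` are the physical cluster-table rows for a single key, each a
    (pageno, vardata) pair. Long records are split across pages, so the
    BLOB fragments must be concatenated in page order before decoding.
    """
    blob = b"".join(chunk for _, chunk in sorted(rows))
    return zlib.decompress(blob)  # stand-in for SAP's proprietary decoding


# Illustrative: a 'BSEG-like' payload split across two cluster pages.
payload = b"BUKRS=1000;BELNR=4711;WRBTR=125.00"
compressed = zlib.compress(payload)
pages = [(2, compressed[10:]), (1, compressed[:10])]  # stored out of order
print(decode_cluster_rows(pages))  # the original payload
</antml>```

Because each logical record can be decoded independently of the others, this kind of function parallelizes naturally across the Teradata AMPs, which is where the MPP speed advantage over the ABAP layer comes from.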

There can be a huge volume of data in the cluster tables (hence the use of cluster logic), so decoding can be a lot of work and can impact the performance of the SAP system. Here we have an extra advantage over SAP R/3: the Decoder effectively allows us to bypass the ABAP layer and use the power of the Teradata Database. Our MPP capabilities allow decoding to be done massively faster than in the SAP application, so decoding the RFBLG/KOCLU tables in Teradata can save a lot of time.

Over the last few months I have written about data replication: I started with a brief SAP history, questioned real-time systems, and covered the benefits of data replication and how it is disruptive to analytics for SAP R/3.

In my last blog I looked at the technical complexities we have had to overcome to build a complete data replication solution into Teradata Analytics for SAP® Solutions. It has not been a trivial exercise - but the benefits are huge!

Our Data Replication capability enables operational reporting and managerial analytics from the same source; it increases flexibility, significantly reduces the burden on the SAP R/3 system(s), and of course, delivers SAP data in near-real time for analytics.

‘What’s next in analytics on SAP R/3’ – Introduction

Posted on: March 17th, 2015 by Patrick Teunissen No Comments


SAP’s ERP application R/3 is classified as a real-time transaction processing system. However, for most larger companies SAP R/3 is not suited for analytical purposes, so these companies started building data warehouses. This is also where my “SAP analytics journey” started, in the early 1990s as an employee of Shell. To load the data warehouses we wrote programs (in SAP slang: ABAPs) that select and process the data in batch. In fact, my team did something like that, until last year, at more than 50 companies. These batch programs are in most cases run overnight, so analytics for SAP could be a day behind the reality of the R/3 system.

For a long time my team wanted to solve that issue and build a solution that would keep the data warehouse in sync with the SAP R/3 transaction system. So we felt great when our first commercial release of the Teradata Analytics for SAP Solutions that replicates data saw the light of day in the fall of 2014. I have written a series of blogs documenting important lessons from our realization process.

In the first blog I explain why it took about 25 years to get from a batch-oriented solution to something that makes it possible to replicate data from SAP R/3 to the data warehouse. Next (in post 2) I talk about the fact that near-real-time analytics requires more than a technical capability because of the limitations of SAP R/3. In post 3, I write about the advantages of the replication technique used. Obviously it makes it possible to run operational and managerial reports from the same source, but as always during innovative processes we discovered other, unexpected advantages. Then (in post 4) the technical complexities are discussed. Finally, in post 5, I zoom in on the most complex technical issue of all, SAP’s clustered and pooled tables.

Anyway, I have enjoyed working on this project very much. I hope you enjoy my posts and I welcome your feedback!

Patrick Teunissen

Real-Time SAP® Analytics: a look back and ahead

Posted on: August 18th, 2014 by Patrick Teunissen 1 Comment


On April 8, I hosted a webinar with my guest Neil Raden, an independent data warehouse analyst. The topic of the webinar was “Accessing SAP ERP data for business analytics purposes”, which built upon Neil’s findings in his recent white paper about the complexities of integrating SAP data into the enterprise data warehouse. The attendance and participation in the webinar clearly showed that there is a lot of interest and expertise in this space. As I think back on the questions we received, both Neil and I were surprised by the number that related to “real-time analytics on SAP.”

Something has drastically changed in the SAP community!

Note: the topic of real-time analytics is not new! I won’t forget Neil’s reaction when the questions came up. It was as if he was in a time warp back to the early 2000s, when he first wrote about the topic. Interestingly, Neil’s work is still very relevant today.

This made me wonder: why is this so prominent in the SAP space now? What has changed in the SAP community? What has changed in the needs of the business?

My hypothesis is that when Neil originally wrote his paper (in 2003), R/3 was SAP (or SAP was R/3, whichever order you prefer) and integration with other applications or databases was not something SAP had on the radar yet. This began to change when SAP BW became more popular, and gained even more traction with the release of SAP’s suite of tools and modules (CRM, SRM, BPC, MDM, etc.) -- although these solutions still clearly had the true SAP ‘Made in Germany’ DNA. Then came SAP’s planning tool APO, Netweaver XI (later PI) and the 2007 acquisition of Business Objects (including BODS), all of which accelerated SAP’s application integration techniques.

With Netweaver XI/PI and Business Objects Data Services, it became possible to integrate SAP R/3 in real time, making use of advanced messaging techniques like IDocs, RFCs, and BAPIs. These techniques all work very well for transaction system integration (EAI); however, they do not have what it takes to provide real-time data feeds to the integrated data warehouse. At best a hybrid approach is possible. Back in 2000 my team worked on such a hybrid project at Hunter Douglas (Luxaflex). They combined classical ABAP-driven batch loads for managerial reports with real-time capabilities (BAPI calls) for their more operational reporting needs. That was state of the art in those days!

Finally, in 2010 SAP acquired Sybase and added best-of-breed data replication software to the portfolio. With this integration technique, changed data is captured directly from the database, taking the load off the R/3 application servers. This offers huge advantages, so it makes sense that this is now the recommended technique for loading data into the SAP HANA appliance.

“What has changed is that SAP has put the need for real-time data integration with R/3 on the (road) map!”

The main feature of our upcoming release of Teradata Analytics for SAP Solutions version 2.2 is a new data replication technique. Almost as if designed to prove my case: 10 years ago I was in the middle of a project for a large multinational company when one of my lead engineers, Arno Luijten, came to me with a proposal to try out a data replication tool to address the latencies introduced by extracting large volumes of changed data from SAP. We didn’t get very far at the time, because the technology and the business expectations were not ready for it. Fast forward to 2014 and we’re re-engaged with this same customer…. Luckily this time the business needs and the technology capabilities are ready to deliver!

In the coming months my team and I would like to take you on our SAP analytics journey.

In my next posts we will dive into the definition (and relativity) of real-time analytics and discuss the technical complexities of dealing with SAP, including the pool and cluster tables. So, I hope I’ve got you hooked for the rest of the series!

Garbage In-Memory, Expensive Garbage

Posted on: July 7th, 2014 by Patrick Teunissen 2 Comments


A first anniversary is always special, and in May I marked my first with Teradata. In my previous lives I celebrated almost ten years with Shell and seventeen years creating my own businesses focused on data warehousing and business intelligence solutions for SAP. With my last business, “NewFrontiers”, I leveraged all twenty-seven years of ERP experience to develop a shrink-wrapped solution to enable SAP analytics.

In all that time, through my first anniversary with Teradata, the logical design of SAP has remained the same. To be clear, when I say SAP, I mean R/3, or ‘R/2 with a mouse’ if you’re old enough to remember. Today R/3 is also known as the SAP Business Suite, ERP, or whatever. Anyway, when I talk about SAP I mean the application that made the company rightfully world famous and that is used for transaction processing by almost all large multinational businesses.

My core responsibility at Teradata is the engineering of the analytical solution for SAP. My first order of business was focusing my team on delivering an end-to-end business analytic product suite, optimized for Teradata, to analyze ERP data. Since completing our first release, my attention has turned to adding new features to help companies take their SAP analytics to the next level. To this end, my team is just putting the finishing touches on a near-real-time capability based on data replication technology. This will definitely be the topic of upcoming blogs.

Over the past year, the integration and optimization process has greatly expanded my understanding of Teradata’s differentiated capabilities. The one capability that draws the attention of types like me, the ‘SAP guys and girls’, is Teradata Intelligent Memory. In-memory computing has become a popular topic in the SAP community, and the computer’s main memory is an important part of Teradata Intelligent Memory. However, Intelligent Memory is more than “In-Memory”: it addresses the fact that not all memory is created equal and delivers a solution that uses the “right memory for the right purpose”. The most frequently used data -- the hottest -- is stored in memory; warm data is processed from solid state drives (SSD), and colder, less frequently accessed data from hard disk drives (HDD). This allows your business to make decisions on all of your SAP and non-SAP data while coupling in-memory performance with spinning-disk economics.
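The “right memory for the right purpose” idea can be illustrated with a toy placement loop. To be clear, this is not Teradata’s actual Intelligent Memory algorithm (the real product tracks data temperature continuously and automatically); it is just a sketch of ranking data by access frequency and filling the fastest tier first, with made-up partition names and capacities:

```python
# Hypothetical access counts per table partition (hotter = more accesses).
access_counts = {
    "sales_2015": 9500, "sales_2014": 1200, "sales_2013": 90,
    "sales_2012": 12, "sales_2011": 3,
}

# Tier capacities in partitions, fastest first (assumed, for illustration).
tiers = [("memory", 1), ("ssd", 2), ("hdd", len(access_counts))]

# Rank partitions hottest-first, then fill each tier in order.
placement = {}
ranked = sorted(access_counts, key=access_counts.get, reverse=True)
for tier_name, capacity in tiers:
    for partition in ranked[:capacity]:
        placement[partition] = tier_name
    ranked = ranked[capacity:]

print(placement["sales_2015"])  # memory: the hottest partition
print(placement["sales_2011"])  # hdd: rarely touched
```

The economics follow directly from this shape: only the small hot fraction pays the in-memory price, while the long cold tail sits on cheap disk.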

This concept of using the “right memory for the right purpose” is very compelling for our Teradata Analytics for SAP Solutions. Often when I explain what Teradata Analytics for SAP Solutions does, I draw a line between DATA and CONTEXT. Computers need DATA like cars need fuel, and the CONTEXT is where you drive the car. Most people do not go to the same place every time, but they do go to some places more frequently than others (e.g. work, freeways, coffee shops) and under more time pressure (e.g. traffic).

In this analogy, most organizations almost always start building an “SAP data warehouse” by loading all DATA kept in the production database of the ERP system. We call that process the initial load. In the Teradata world we often have to do this multiple times, because building an integrated data warehouse usually involves sourcing from multiple SAP ERPs. Typically, these ERPs vary in age, history, version, governance, MDM, etc. Archival is a non-trivial process in the SAP world, and the majority of the SAP systems I have seen are carrying many years of old data. Loading all this SAP data in memory is an expensive and reckless thing to do.

Teradata Intelligent Memory provides CONTEXT by storing the hot SAP data in memory, guaranteeing lightning-fast response times. It then automatically moves the less frequently accessed data to lower-cost, lower-performance storage across the SSD and HDD media spectrum. The resulting combination of Teradata Analytics for SAP with Teradata Intelligent Memory delivers in-memory performance with very high memory hit rates at a fraction of the cost of pure ‘In-Memory’ solutions. And in this business, costs are a huge priority.

The title of this blog is a variation on the good old “Garbage In, Garbage Out” (GIGO) phrase. In-memory is a great feature, but not all data needs to go there! Use it in an intelligent way and don’t use it as a garbage dump, because for that it is too expensive.

Patrick Teunissen is the Engineering Director at Teradata responsible for the Research & Development of the Teradata Analytics for SAP® Solutions at Teradata Labs in the Netherlands. He is the founder of NewFrontiers which was acquired by Teradata in May 2013.

1 Needless to say I am referring to SAP’s HANA database developments.

2 Data that is older than 2 years can be classified as old. Transactions, like sales and costs, are often compared with a budget/plan and the previous year, sometimes with the year before that, but hardly ever with data older than that.

The Flipside of Flexibility: Maintaining Master Data

Posted on: April 28th, 2014 by Guest Blogger No Comments


A lot of companies running SAP® solutions have difficulties properly maintaining their master data. A large part of the reason is that master data in SAP is dependent on the organizational structure of the company. If a company has a lot of different plants, purchasing departments, sales departments and other organizational elements, this is reflected in the master data. Makes sense! However, even if the organization itself is structured on a higher, less-detailed level, this does not always mean that the master data will get simpler.

An example is when a company organizes its sales on a level that exceeds country borders, with one sales director and sales team covering an entire region encompassing a cluster of countries. This situation is quite common in Europe, where doing business “Europe-wide” has become much easier in recent decades due to the formation of the European Union and the Common Market. You might expect that the sales-related master data would only need to be maintained at this regional level. To understand why this is not always true, we need to dive a bit deeper into how companies are structured and how this structure is reflected in SAP.

The way the sales function is organized depends on the market, product and other factors. In practice this can lead to a variety of clusters. A grouping of countries could be based on a common spoken language: for example, the DACH countries – Germany, Austria and Switzerland, where the common language is German. Or sales regions modeled around economic, geographical or cultural groupings like the Benelux, which consists of Belgium, the Netherlands and Luxemburg.

The fact that companies organize their sales and logistics at this level does not necessarily mean that their legal structure can simply be altered to match. It is still very common for companies operating in Europe to have legal entities in each individual country: for example, an ‘Ltd’ in the UK, a ‘bv’ in the Netherlands or a ‘bvba’ in Belgium. And these local legal and fiscal entities invoice their customers, have contracts, and most importantly have tax obligations, and therefore need to keep their financial books in order.

Companies implementing SAP need to take a lot of decisions when they customize their ERP system, one of them being the ‘real’ organizational structure in SAP. On the Sales and Distribution side, the Sales Organization is the highest level in SAP. The Sales Organization is the selling unit in a legal sense, or could represent a regional subdivision. There is however one important thing to bear in mind: a Sales Organization is assigned to only one company or legal entity.

But what does this mean for an international company like Teradata, which has legal entities in numerous European countries but organizes its sales function by combining countries? If Teradata were to use the SAP ERP Sales & Distribution module, a different Sales Organization would need to be created for each country. This has an impact on the master data used in the sales process: for example, materials need to be created at the Sales Organization level. This means that one product in the Teradata price list needs to be created in SAP once at a general level (e.g. description, stock keeping unit, and material type), and then X more times with sales-specific data, such as the sales unit of measure. In theory this flexibility is great: if in one country bottles are sold in packs of six, that could be the default for that country’s Sales Organization, whereas another country can default to one bottle. In practice, however, if all the countries sell per bottle, it means an unwanted duplication of data.

Maintaining the master data can therefore become a very cumbersome job as an organization grows its product list and territories: each product will require its own master data in multiple instances. Maintenance is further hampered by the fact that it is impossible to see all the SAP master data in one place so that comparisons can be made and errors easily spotted. From a maintenance perspective, it may be preferable, for example, to have one master data entry for each product, with exceptions per country or legal entity.
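As a sketch of what such a comparison could look like, the snippet below gathers the per-Sales-Organization values for each material and flags materials whose sales units disagree. All table contents and field names here are illustrative, not actual SAP extracts:

```python
from collections import defaultdict

# Hypothetical material master extract: one row per (material, sales org),
# since SAP keeps sales-specific data per Sales Organization.
material_master = [
    {"material": "BEER-001", "sales_org": "NL01", "sales_unit": "BOTTLE"},
    {"material": "BEER-001", "sales_org": "BE01", "sales_unit": "BOTTLE"},
    {"material": "BEER-001", "sales_org": "DE01", "sales_unit": "SIXPACK"},
    {"material": "CIDER-002", "sales_org": "NL01", "sales_unit": "BOTTLE"},
]

# Group the per-sales-org values so every material can be seen in one place.
units = defaultdict(set)
for row in material_master:
    units[row["material"]].add(row["sales_unit"])

# Flag materials whose sales units disagree across Sales Organizations --
# possibly intentional, but worth a review.
inconsistent = sorted(m for m, u in units.items() if len(u) > 1)
print(inconsistent)  # ['BEER-001']
```

A real master data quality report applies many such rules across all the extracted master data, but the core idea is the same: bring the scattered copies side by side so the exceptions stand out.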

We have worked with an international brewer that has a presence through a global network of distributors and breweries, with operations in 71 countries around the world. They run 18 SAP ERP systems plus other enterprise applications. Part of the project was to build a Master Data Quality Reporting solution to simplify master data maintenance and reduce the errors that inevitably occur when data is input manually.

The solution extracted all material masters and programmed validation routines into reports, making it possible to easily identify master data issues. The result was that our customer was better able to maintain their master data despite having various Sales Organizations and numerous master data records for each product in their portfolio, helping them to be more efficient.

There are many ways to customize SAP, which is one of the great benefits of the solution: the flexibility to arrange the software according to the unique configuration of your company. With this flexibility, however, often comes complexity, which can make master data tricky to handle. Careful choices need to be made to ensure the customizations best match the organization, both in terms of flexibility and the ability to maintain master data.

A final thought: your master data is only as good as the rules that you apply to it, which ultimately come down to good decision making. It’s key to carefully work out the ‘rule’ that ensures an optimal stock level for a particular product. For example, one will need to take into account seasonal variations, lead times, storage requirements and capital investment. And that’s where good analytics can really come in handy… but that is for another blog!

Guest blogger Gert Jan Elemans is a Senior Software Engineer within Teradata Labs in the Netherlands. He has over ten years’ experience working with SAP R/3 and data warehouse environments and has won two awards for SAP implementations.

The integration issue that dare not speak its name ….

Posted on: March 25th, 2014 by Patrick Teunissen 2 Comments


Having worked with multinational companies running SAP ERP systems for many years, I know that they (nearly) always have more than one SAP system to record their transactional data. Yet it is never discussed; it seems to be the ‘Macbeth’ of the SAP world, a fact that should not be uttered out loud…

My first experience with SAP’s software solutions dates back to 1989, whilst at Shell Chemicals in the Netherlands, exactly 25 years ago. What strikes me most after all these years is that people talk about SAP as if it is one system covering everything that is important to business.

Undoubtedly SAP has had a huge impact on enterprise computing. I remember that at Shell, prior to the implementation of SAP, we ran a vast number of transaction systems. The purchasing and stock management systems, for example, were stand-alone and not integrated with the general ledger system. The integration of these transaction systems had to be done via interfaces, some of which were manual (information had to be re-typed). At month end, only after all interfaces had run, would the ledger show the proper stock value and accounts payable. So thanks to SAP the number of transaction systems has been dramatically reduced.

But of course the Shell Refining Company had its own SAP system, just like the businesses in the UK, Germany, and so on. So even in the late ’80s Shell ran numerous different SAP systems.

However, this contradicts one of SAP’s key messages: the ability to integrate all sorts of transactional information to provide relevant data for analytical purposes in one hypothetical system (reference Dr. Plattner’s 2011 Beijing speech).

I have always struggled with the definition of “relevant data”, as I believe that what is relevant depends on three things: the user, the context and time. For an operator of a chemical plant, for example, the current temperature of the unit and the product conversion yields are likely to be “relevant”, as this is the data needed to steer the current process. For the plant director, the volumes produced and the overall processing efficiency of the last month may be “relevant”, as this is what his peers in the management team will challenge him on. SAP systems are, as far as I know, not used to operate manufacturing plants, in which case the only conclusion can be that not all relevant data is in SAP. What you could say, though, is that it is very likely that the “accounting” data is in SAP, hence SAP could be the source for the plant’s management team reports.


However, when businesses are running multiple SAP systems, as described earlier, the conclusion cannot be that there is a (as in one) SAP system in which all the relevant accounting data is processed. So a regional director responsible for numerous manufacturing sites may have to deal with data collected from multiple SAP systems when he or she needs to analyze the total costs of manufacturing for the last quarter. Probably because this does not really fit with SAP’s key message -- one system for both transaction processing and analytics -- they have no solution. I googled “analytics for multiple SAP systems”, the results of which are shown above. As you can see, other than the Teradata link there is no solution that will help our regional director. Even when the irrelevant words “analytics for” are removed, only very technical and specific solutions are found.

Some people believe that this problem with analytics will be solved over time. Quite a few larger enterprises start with what I call re-implementations of the SAP software. Five years after my first exposure to SAP at Shell Chemicals in the Netherlands, I became a member of the team responsible for the “re-implementation” of the software for Shell’s European Chemicals business. Of course there were cost benefits (fewer SAP systems = lower operational cost for the enterprise) and some supply chain-related transactions could be processed more efficiently from the single system. But the region was still not really benefitting, because the (national/legal) company in SAP is the most important object around which a lot has been organized (or configured). Hence most multinational enterprises use another software product into which data is interfaced for the purpose of regional consolidation.

I was employed by Shell for almost 10 years. It is a special company and I am still in contact with a few people I worked with. The other day I asked about the SAP landscape as it is today, and was told that, 25 years after my first SAP experience, they are still running multiple SAP systems and re-implementation projects. As I consider myself an expert in SAP, I am sure I could have built a career on the re-implementation of SAP systems.

The point that I want to make with this post is that many businesses need to take into account that they run multiple SAP systems, and more importantly that these systems are not automatically integrated. This fact has a huge impact on the analytics of SAP data and the work required to provide an enterprise view of the business. So if you are involved in the delivery of analytical solutions to the organization, you should factor the “Scottish play” issue into the heart of your design, even if nobody else wants to talk about it.



2 This is why an appreciated colleague, a manufacturing consultant leader, always refers to SAP as the “Standard Accounting Package”.

3 In SAP the “Company” (T001-BUKRS) is probably the most important data object, around which a lot has been organized (configured). Within SAP, consolidation of these “companies” is not an obvious thing to do. Extensions of the financial module (FI) designed to consolidate are difficult to operate and hardly ever used. Add to this the fact that almost every larger enterprise has multiple SAP systems, and the fact that consolidation takes place in “another” system is explained.

4 In 2007 SAP acquired OutlookSoft now known as SAP BPC (Business Planning & Consolidation) for this very purpose.