Technical complexities of data replication from SAP R/3 for Real-time Analytics

A series on ‘what’s next in analytics on SAP R/3’ – part 4

Engineers are people that want to fix a problem! As you may recall in my first Blog I wrote about Arno Luijten, one of the senior engineers in my team, who almost 10 years ago proposed data replication as the fix for many problems! I ended my previous Blog stating that, if data replication creates significant business value then why isn’t everybody doing it?

Well the first reason is market perception which is not based on facts free of emotion. When Arno first proposed his solution as a true engineer wanting to fix a problem; it was disregarded because clients and the SAP market didn’t allow anyone or anything to touch the ERP! But now, a decade later, SAP is “promoting” replication as the way to move data to HANA and this has changed the perception of replication in the SAP R/3 market! …. Cool 😉 ….

Beyond winning the hearts and minds there are also complex technical issues that need to be solved before the “technique” does what it should do, i.e. move data from SAP R/3 to the data warehouse in near real time. I will take a look at four of the technical hurdles we have overcome to create our out-of-the-box solution.

Standing up the technical infrastructure can be a labor intensive and tedious task
Underneath the covers, the SAP R/3 database contains at least 25,000 tables but most of these tables do not contain data relevant for analytical purposes. In fact only about 1 to 2% do, the rest are tables containing redundant data, aggregates or specific configuration needed to run the transactions. However 1 to 2% still means that up to 500 tables may be needed for replication. Having experimented with multiple standard data replication tools you can expect to set-up 4 to 5 tables per man day. This means you can expect it take up to a 100 days just to stand up the data acquisition layer with these standard tools.

The reason this takes so long is that the meta data of the tables is different between different SAP R/3 systems. Therefore the data replicator tool must be configured to deal with the exact SAP system the data is extracted from and a meticulous mirrored structure must be created on the target replication database. Additionally, it is common practice when data replication is used as a technique to feed analytical appliances to add (virtual) columns to these replicated tables. These columns are then used for reconciliation purpose or in the updating of the core of the Integrated Data Warehouse.

Our Teradata Analytics for SAP Solution replicator comes with these feature built in and allows us to configure ALL 500 tables (including the virtual columns) in one action, reducing the work required to less than a day.

Dealing with deleted records in SAP R/3
It is not like SAP had a master plan when the company built system R/*, nor was it consistent in the delete processes. For example: When a user wants to correct a financial transaction, like a ledger posting, the transaction must be reversed and put back in again. The result of such a correction is 3 records (+ – +). This is in stark contrast to the process of deleting a sales order which can be “hard” deleted and later a completely new sales order can be created even reusing the same keys.

Data replication software can deal with these deletes. In fact, the software can even remove the record from the replicated database as well so that it stays in synch with the SAP R/3 source database. However, this creates a problem for the integration into the Core database because you can’t select a record from the replication database to make an update to the record in Core (i.e. to delete it) when it’s no longer in the replication database. To overcome that problem our data replication software allows records to be marked (flagged) as deleted or “soft” deleted. In practical terms, a column is added to the replicated record in which an operational code like a “D” is stored. That would have solved the delete issue expect that SAP allows for the re-use of keys. In this special case the original “soft” deleted record and the “new” record would have the same key leading to a key violation error when the second record is inserted into the database.

The Teradata Analytics for SAP Solutions Data Replicator has been designed to deal with these SAP ‘delete issues.’

Integrating the raw replicated SAP data
If only 1 to 2 % of the tables contain data that is relevant from a reporting and analytical view point the next question that should spring to mind is: which 1 to 2%? We’ve spent the last 15 years figuring that out and that’s what enables us to deliver data acquisition in days instead of quarters. The truth is, even if we would publish the list of tables you still need the other half of the equation; putting the data in context (integration) so you can make use of the data for analytics. That requires 2 major pieces, a physical data model (we call it the Core database) and the transformation processes to load that data from the Replication database into the Core database. These non-trivial pieces are also delivered as part of our solution.

In summary, Teradata Analytics for SAP Solutions demystifies SAP ERP data by knowing which tables to select from SAP R/3 while simultaneously dealing with SAP’s intricate delete processes. And it delivers context with engineered data lineage, an integrated physical data model designed for analytical purposes, plus all of the scripts to transform the data.

This is the 4th and final technical complexity: decoding the mythical SAP pool and cluster tables. Please join me on my 5th and final installment where we complete the SAP journey by taming these ‘unicorn’ tables.

Category: Other Patrick Teunissen

About Patrick Teunissen

Patrick Teunissen is the Engineering Director at Teradata responsible for the Research & Development of the Teradata Analytics for SAP® Solutions at Teradata Labs in the Netherlands. He is the founder of NewFrontiers which was acquired by Teradata in May 2013

Leave a Reply

Your email address will not be published. Required fields are marked *