PART FIVE: This is the last blog in my series about Near Real Time data acquisition from SAP. This final blog is co-written with Arno Luijten, who is one of Teradata’s lead engineers. He is instrumental in demystifying the secrets of the elusive SAP clustered and pooled tables.
There is a common misconception that the Pool and Cluster tables in SAP R/3 can only be deciphered by the SAP R/3 application server, giving them an almost mythical status. The phrase that is used all over the ‘help sites’ and forums is “A clustered and a pooled table cannot be read from outside SAP because certain data are clustered and pooled in one field”… which makes replicating these tables pretty pointless – right?
But what exactly are Pooled and Cluster tables in SAP R/3 anyway? We thought we would let SAP give us the answer and searched their public help pages (SAP Help Portal). But that yielded limited results, so we looked further -- (Googled cluster tables) and found the following explanation (Technopedia-link):
“Cluster tables are special types of tables present in the SAP data dictionary. They are logical tables maintained as records of the normal SAP tables, which are commonly known as transparent tables. A key advantage of using cluster tables is that data is stored in a compressed format, reducing memory space and the landscape network load for retrieving information from these tables.”
Reading further on the same page, there are six major bullet points describing the features - of which fiveof them basically tell you that what we did cannot be done. Luckily, we didn’t let this phase us!
We agree: the purpose of SAP cluster tables is to save space because of the huge volume of the data contained in these tables and the potential negative impact that this may have on the SAP R/3 application. We know this because the two most (in-) famously large Cluster tables are RFBLG and KOCLU which contain the financial transactions and price conditions. SAP’s ABAP programmers often refer to BSEG (for financial transactions) and KONV (for the price-conditions).
From the database point of view, these tables do not exist but are contained in the physical tables named RFBLG and KOCLU. Typically these (ABAP) tables contain a lot of data. There are more tables set up in this way, but from a data warehousing point of view these two tables are probably the most relevant. Simply skipping these tables would not be a viable option for most clients.
Knowing the importance of the Pool and Cluster table, the value of data replication, and the value of operational analytics, we forged ahead with a solution. The encoded data from the ABAP table is stored as a BLOB “Binary Large Object” in the actual cluster table. To decode the data in the BLOB we wrote a C++ program as a Teradata User Defined Function (UDF) which we call the “Decoder” and it is installed directly within the Teradata database.
There can be a huge volume of data present in the cluster tables (hence the usage of cluster logic) and as a result decoding can be a lot of work and can have an impact on the performance of the SAP system. Here we have an extra advantage over SAP R/3 because the Decoder effectively allows us to bypass the ABAP layer and use the power of the Teradata Database. Our MPP capabilities allow decoding to be done massively faster than the SAP application, so decoding the RFBLG/KOCLU tables in Teradata can save a lot of time.
Over the last few months I have written about data replication starting with a brief SAP history, I questioned real-time systems, and I have written about the benefits of data replication and how it is disruptive to analytics for SAP R/3.
In my last blog I looked at the technical complexities we have had to overcome to build a complete data replication solution into Teradata Analytics for SAP® Solutions. It has not been a trivial exercise - but the benefits are huge!
Our Data Replication capability enables operational reporting and managerial analytics from the same source; it increases flexibility, significantly reduces the burden on the SAP R/3 system(s), and of course, delivers SAP data in near-real time for analytics.