BI

Garbage In-Memory, Expensive Garbage

Posted on: July 7th, 2014 by Patrick Teunissen 2 Comments

 

A first anniversary is always special and in May I marked my first with Teradata. In my previous lives I celebrated almost ten years with Shell and seventeen years creating my own businesses focused on data warehousing and business intelligence solutions for SAP. With my last business “NewFrontiers” I leveraged all twenty seven years of ERP experiences to develop a shrink wrapped solution to enable SAP analytics. 

Through my first anniversary with Teradata, all this time, the logical design of SAP has been the same. To be clear, when I say SAP, I mean R/3 or ‘R/2 with a mouse’ if you’re old enough to remember. Today R/3 is also known as the SAP Business suite, ERP or whatever. Anyway, when I talk about SAP I mean the application that made the company rightfully world famous and that is used for transaction processing by almost all large multinational businesses.

My core responsibility at Teradata is the engineering of the analytical solution for SAP. My first order of business was focusing my team on delivering an end-to-end business analytic product suite to analyze ERP data that is optimized for Teradata. Since completing our first release, my attention turned to adding new features to help companies take their SAP analytics to the next level. To this end, my team is just putting the finishing touches on a near real-time capability based on data replication technology. This will definitely be the topic of upcoming blogs.

Over the past year, the integration and optimization process has greatly expanded my understanding of the differentiated Teradata capabilities. The one capability that draws in the attention of types like me ‘SAP guys and girls’ is Teradata Intelligent Memory. In-memory computing has become a popular topic in the SAP community and the computer’s main memory is an important part of Teradata’s Intelligent Memory. However Intelligent Memory is more than “In-Memory” -- because with Intelligent Memory, the database addresses the fact that not all memory is created equal and delivers a solution that uses the “right memory for the right purpose”. In this solution, the most frequently used data – the hottest -- is stored In-Memory; the warm data is processed from a solid state drive (SSD), and colder, less frequently accessed data from a hard disc drive (HDD). This solution allows your business to make decisions on all of your SAP and non-SAP data while coupling in-memory performance with spinning disc economics.

This concept of using the “right memory for the right purpose” is very compelling for our Teradata Analytics for SAP solutions. Often when I explain what Teradata Analytics for SAP Solutions does, I draw a line between DATA and CONTEXT. Computers need DATA like cars need fuel and the CONTEXT is where you drive the car. Most people do not go the same place every time but they do go to some places more frequently than others (e.g. work, freeways, coffee shops) and under more time pressure (e.g. traffic).

In this analogy, most organizations almost always start building an “SAP data warehouse” by loading all DATA kept in the production database of the ERP system. We call that process the initial load. In the Teradata world we often have to do this multiple times because when building an integrated data warehouse it usually involves sourcing from multiple SAP ERPs. Typically, these ERPs vary in age, history, version, governance, MDM, etc. Archival is a non-trivial process in the SAP world and the majority of the SAP systems I have seen are carrying many years of old data . Loading all this SAP data In-Memory is an expensive and reckless thing to do.

Teradata Intelligent Memory provides CONTEXT by storing the hot SAP data In-Memory, guaranteeing lightning fast response times. It then automatically moves the less frequently accessed data to lower cost and performance discs across the SSD and HDD media spectrum. The resulting combination of Teradata Analytics for SAP coupled with Teradata’s Intelligent Memory delivers in-memory performance with very high memory hit rates at a fraction of the cost of ‘In-Memory’ solutions. And in this business, costs are a huge priority.

The title of this Blog is a variation on the good old “Garbage In Garbage Out / GIGO” phrase; In-Memory is a great feature, but not all data needs to go there! Make use of it in an intelligent way and don’t use it as a garbage dump because for that it is too expensive.

Patrick Teunissen is the Engineering Director at Teradata responsible for the Research & Development of the Teradata Analytics for SAP® Solutions at Teradata Labs in the Netherlands. He is the founder of NewFrontiers which was acquired by Teradata in May 2013.

Endnotes:
1 Needless to say I am referring to SAP’s HANA database developments.

2 Data that is older than 2 years can be classified as old. Transactions, like sales and costs are often compared with the a budget/plan and the previous year. Sometimes with the year before that but hardly ever with data older than that.

 

About one year ago, Teradata Aster launched a powerful new way of integrating a database with Hadoop. With Aster SQL-H™, users of the Teradata Aster Discovery Platform got the ability to issue SQL and SQL-MapReduce® queries directly on Hadoop data as if that data had been in Aster all along. This level of simplicity and performance was unprecedented, and it enabled BI & SQL analysts that knew nothing about Hadoop to access Hadoop data and discover new information through Teradata Aster.

This innovation was not a one-off. Teradata has put forward the most complete vision for a data and analytics architecture in the 21st century. We call that the Unified Data Architecture™. The UDA combines Teradata, Teradata Aster & Hadoop into a best-of-breed, tightly integrated ecosystem of workload-specific platforms that provide customers the most powerful and cost-effective environment for their analytical needs. With Aster SQL-H™, Teradata provided a level of software integration between Aster & Hadoop that was, and still is, unchallenged in the industry.

 

Teradata Unified Data Architecture™ image

Teradata Unified Data Architecture™

Today, Teradata makes another leap in making its Unified Data Architecture™ vision a reality. We are announcing SQL-H™ for Teradata, bringing the best SQL engine for data warehousing and analytics to Hadoop. From now on, Enterprises that use Hadoop to store large amounts of data will be able to utilize Teradata's analytics and data warehousing capabilities to directly query Hadoop data securely through ANSI standard SQL and BI tools by leveraging the open source Hortonworks HCatalog project. This is fundamentally the best and tightest integration between a data warehouse engine and Hadoop that exists in the market today. Let me explain why.

It is interesting to consider Teradata's approach versus alternatives. If one wants to execute SQL on Hadoop, with the intent of building Data Warehouses out of Hadoop data, there are not many realistic options. Most databases have a very poor integration with Hadoop, and require Hadoop experts to manage the overall system - not a viable option for most Enterprises due to cost. SQL-H™ removes this requirement for Teradata/Hadoop deployments. Another "option" are the SQL-on-Hadoop tools that have started to emerge; but unfortunately, there are about a decade away from becoming sufficiently mature to handle true Data Warehousing workloads. Finally, the approach of taking a database and shoving it inside Hadoop has significant issues since it suffers from the worst of both worlds – Hadoop activity has to be limited so that it doesn't disrupt the database, data is duplicated between HDFS and the database store, and performance of the database is less compared to a stand–alone version.

In contrast, a Teradata/Hadoop deployment with SQL-H™ offers the best of both worlds: unprecedented performance and reliability in the Teradata layer; seamless BI & SQL access to Hadoop data via SQL-H™; and it frees up Hadoop to perform data processing tasks at full efficiency.

Teradata is committed to being the strategic advisor of the Enterprise when it comes to Data Warehousing and Big Data. Through its Unified Data Architecture™ and today's announcement on Teradata SQL-H™, it provides even more performance, flexibility and cost-effective options to Enterprises eager to use data as a competitive advantage.