This is the second instalment of a post examining the rather gushing and uncritical – not to mention recycled – coverage of the SAP HANA "in-memory" database technology in Scott M Fulton's recent article entitled "SAP's HANA: Accelerating Your Apps by 6 Orders of Magnitude". In the first instalment I reviewed some of the challenges involved in developing in-memory database technology. In this second instalment I will discuss the economics of storing Enterprise Data Warehouse (EDW) scale data sets in memory.
Is memory getting cheaper faster than Data Warehouses are getting bigger?
Analyst firm Gartner reported at its January BI Summit in London this year that the unit cost of memory is falling by 30% every 18 months. However, the unit cost of memory is still an order of magnitude greater than the unit cost of magnetic storage – and the gap between these two costs shows no sign of closing. Not only that, but data volumes continue to increase exponentially. Reliable, industry-wide metrics for average growth rates of data warehouses are hard to come by, but the consensus amongst the practitioners that I speak to is that data warehouses grow at approximately 40% per annum, so that a data warehouse that was 10 TB in size at the end of 2011 is likely to be 20 TB in size at the end of 2013. In the face of these sorts of growth rates, Teradata CTO Stephen Brobst has described it as "economically irrational" for organizations to attempt to store all of their analytic data in-memory – especially since, as we have discussed, a copy of those same data will also have to be stored on high-performance, persistent storage. And note that since "high-performance storage" increasingly means Solid State Storage (SSD) – cheaper than memory, but considerably more expensive than magnetic storage – this strategy means storing all of the data redundantly on the two most expensive storage tiers available to the system designer.
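To put rough numbers on this, here is a short back-of-the-envelope calculation in Python. The 40% annual growth rate and the 30%-per-18-months price decline are the figures quoted above; the starting warehouse size and unit cost are purely illustrative.

```python
# Back-of-the-envelope comparison: does a 30% price drop every 18 months
# outrun 40% annual data growth? Starting size and unit cost are illustrative.

data_tb = 10.0            # warehouse size at end of 2011, in TB
memory_cost_per_tb = 1.0  # unit cost of memory, normalised to 1.0

data_growth_per_year = 1.40                     # +40% per annum
memory_cost_decay_per_year = 0.70 ** (12 / 18)  # 30% cheaper every 18 months

for year in range(6):
    bill = data_tb * memory_cost_per_tb
    print(f"End of {2011 + year}: {data_tb:6.1f} TB, "
          f"relative cost of holding it all in memory = {bill:5.2f}")
    data_tb *= data_growth_per_year
    memory_cost_per_tb *= memory_cost_decay_per_year
```

Under those assumptions the net effect is that the bill for holding the entire warehouse in memory still grows by roughly 10% a year – the falling price of memory simply does not keep pace with the growing volume of data.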
A sure sign that SAP understands the economic implications of its in-memory strategy is the recent attempt made by Global Solutions President, Sanjay Poonen, to divert attention from the relatively high costs of HANA systems by instead labelling Teradata systems as expensive. In his article, Fulton quotes Poonen as suggesting that Teradata systems typically cost "$5 to $10 Million". In fact, $10 Million in today's money buys you a lot of Teradata Data Warehouse Appliance – enough at list price to comfortably store 300 TB of customer data – and the vast majority of our Hardware-and-Software sales come with a considerably lower price tag than this.
The 80/20 rule applies to storage access, too
So the first economic challenge for an in-memory Data Warehouse platform is that Data Warehouses appear to be getting bigger much faster than the unit cost of memory is getting smaller. And there is a second economic challenge, because it turns out that the "Pareto" or "80/20" rule applies to storage access, too.
What that means in practice is that 80% of the time, users are accessing just 20% of the data in the Data Warehouse. This 20% of the data is "hot" – meaning accessed frequently. Storing this sub-set of the data on a high-performance and relatively expensive storage medium makes perfect sense; it's storing the balance of the data in the same high-performance and relatively expensive storage medium that is "economically irrational".
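A rough, purely illustrative calculation makes the point. If memory is taken to be roughly ten times the unit cost of magnetic storage (the order-of-magnitude gap noted earlier), then keeping only the hot 20% of a warehouse on the expensive tier serves 80% of accesses from fast storage at a fraction of the all-in-memory cost:

```python
# Illustrative 80/20 tiering arithmetic. Relative unit costs are assumptions
# (memory taken as ~10x magnetic storage, per the order-of-magnitude gap above).

warehouse_tb = 100.0
cost_memory_per_tb = 10.0    # relative units
cost_magnetic_per_tb = 1.0   # relative units

# Option A: everything on the most expensive tier
all_in_memory = warehouse_tb * cost_memory_per_tb

# Option B: only the "hot" 20% on the expensive tier, the rest on magnetic storage
hot_fraction = 0.20
tiered = (warehouse_tb * hot_fraction * cost_memory_per_tb
          + warehouse_tb * (1 - hot_fraction) * cost_magnetic_per_tb)

print(f"All in memory : {all_in_memory:7.0f} cost units")
print(f"Tiered (80/20): {tiered:7.0f} cost units")
# With these assumptions, tiering covers 80% of accesses from fast storage
# at less than a third of the cost of the all-in-memory approach.
```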
The catch in all of this is that the temperature of data is changing continuously. Some of the changes in the temperature of the data are entirely predictable; in general, newer data are hotter than older data. Some of the variation is more complex, but also somewhat predictable; financial data, for example, get progressively warmer at month- and quarter-end. And some of the variation is entirely unpredictable – if, for example, your CEO suddenly decides to buy your closest rival and as a consequence you need to undertake a detailed analysis of the overlap between your current operations and theirs, this sub-set of your data will warm very rapidly indeed.
This is why our own Teradata Virtual Storage (TVS) sub-system enables Teradata systems to automatically measure how frequently data are accessed, at a very fine level of granularity, and to automatically migrate just the hottest data to high-performance Solid State Storage (SSD). Precisely because all data are not created equal, applying relatively expensive, high-performance storage technologies selectively, where they can deliver the most bang-for-the-buck, is the economically rational approach to exploiting them.
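To illustrate the principle – and only the principle, since the actual TVS implementation is Teradata's own and far more sophisticated – here is a toy sketch of an access-frequency-based tiering policy. The class and method names are entirely hypothetical.

```python
# Toy sketch of an access-frequency tiering policy, in the spirit of the
# description above. Not the real TVS implementation; names are hypothetical.

from collections import Counter

class ToyTieringPolicy:
    def __init__(self, ssd_capacity_blocks):
        self.ssd_capacity_blocks = ssd_capacity_blocks
        self.access_counts = Counter()   # fine-grained "temperature" per block

    def record_access(self, block_id):
        # Every read warms the block a little.
        self.access_counts[block_id] += 1

    def blocks_for_ssd(self):
        # Periodically promote only the hottest blocks to the fast (SSD) tier;
        # everything else stays on cheaper magnetic storage.
        hottest = self.access_counts.most_common(self.ssd_capacity_blocks)
        return {block_id for block_id, _count in hottest}

# Usage example
policy = ToyTieringPolicy(ssd_capacity_blocks=2)
for block in ["sales_2013", "sales_2013", "sales_2012", "sales_2013", "gl_2011"]:
    policy.record_access(block)
print(policy.blocks_for_ssd())   # e.g. {'sales_2013', 'sales_2012'}
```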
SAP appears to be betting on the unit cost of memory falling faster than the rate at which analytic data sets are getting larger. And as we have seen, at least as far as Enterprise Data Warehousing is concerned, that looks like a losing bet. Since SAP is a well-managed company, its management probably knows this to be the case. All of which means that appearances are probably deceptive – and that SAP's management is actually targeting an altogether different market opportunity. More on this next time.
Director of Platform & Solutions Marketing