I have recently been accused by a commentator at SAP of engaging in “silly marketing”. I will plead guilty to the latter charge – the clue is in my title, after all – but not to the former.
The substance of the SAP argument can be summarised thus: it is economically rational to “store” some data in memory today (because in the case of the frequently accessed data it is the cost of access that we should care about, not the cost of storage); memory is getting cheaper; therefore we should store all data in-memory in the future.
Now some of this logic is impeccable. It is economically rational to store frequently accessed data in high-performance storage and/or memory today. Of course, since memory is not persistent, the appropriate test for SAP HANA should be whether the data is accessed frequently enough to justify “storing” it once in memory and once again in Solid State Storage, but let’s overlook this detail for now.
Some of this logic makes rather less sense to me, however. In fact, it looks a lot like a variation on the Politician's syllogism. To illustrate why, I introduce a (deliberately) “silly” proposal of my own: some offenders are likely to re-offend; they can be prevented from re-offending by incarcerating them indefinitely; therefore we should incarcerate all offenders indefinitely.
Quite apart from the implications for Civil Liberties of such an un-civil proposition, the issue with this logic is that it conflates individual behaviour with aggregate behaviour. SAP would have us treat all analytic data in the same way - as if they were uniform and homogeneous.
Of course, analytic data are not uniform or homogeneous. Our own research leads us to conclude that in a typical Data Warehouse, 80% of accesses are to just 20% of the data (the “hot” data) - and as many as 50% of accesses are to as little as 2% of the data (the “white hot” data).
(The unfortunate complication in all of this is that it is not the same 2% of the data that are “white hot” all of the time; rather the relative temperature of the entire data-set in the Data Warehouse changes continuously in response to constantly changing and evolving business demand.)
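To make the idea of shifting data temperature concrete, here is a minimal sketch in Python. It is a toy model and not a description of how TVS or any other product works: the class name `TemperatureTracker`, the decay factor and the block labels are all assumptions made purely for illustration. Each access adds heat to a block, heat fades at the end of every period, and the hottest blocks become the candidates for the fast tier.

```python
from collections import defaultdict


class TemperatureTracker:
    """Toy model of data temperature (hypothetical, for illustration only)."""

    def __init__(self, decay=0.5):
        # decay is an assumed factor: the share of heat a block keeps
        # from one period to the next.
        self.decay = decay
        self.heat = defaultdict(float)

    def record_access(self, block_id):
        # Every access makes the block a little hotter.
        self.heat[block_id] += 1.0

    def end_period(self):
        # Old accesses count for less as time passes.
        for block_id in self.heat:
            self.heat[block_id] *= self.decay

    def hottest(self, n):
        # Blocks ranked by heat, hottest first; ties broken by block id.
        return sorted(self.heat, key=lambda b: (-self.heat[b], b))[:n]


tracker = TemperatureTracker()
for _ in range(10):
    tracker.record_access("sales_2012_10")
first_choice = tracker.hottest(1)

tracker.end_period()
for _ in range(8):
    tracker.record_access("sales_2012_11")
second_choice = tracker.hottest(1)

print(first_choice, second_choice)
```

The block that earned its place in the fast tier in one period loses it in the next, even though it was accessed more often in total. That is why the placement decision has to be revisited continuously rather than made once.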
I would argue that almost the whole industry now agrees that both the white hot and the hot data should be placed on high-performance “storage”; at Teradata we have already introduced Teradata Virtual Storage (TVS) to this end; Oracle includes “flash cache” in its Exadata database appliance and has also introduced its Exalytics appliance; and our friends at SAP never miss an opportunity to point out that they have put a new kid on the Information Management block that goes by the name of HANA. Quoting the Five Minute Rule – which is anyway predicated on a traditional storage + cache computing architecture - is moot, since the debate is not about whether these data should be “stored” on high-performance media, rather it is about (1) how they should get there and (2) if they should all stay there, all of the time.
For the most part, SAP takes the view – publicly, at least - that they all should stay there, all of the time. The rest of us – IBM, Oracle, Teradata, etc., etc. – see levels of growth in our customers’ analytical data-sets that lead us to conclude that this approach is economically irrational (at least whilst the unit costs of memory and SSD are orders of magnitude greater than the unit costs of magnetic storage media), even if we disagree (vigorously!) about how the “hot” data should get to where they need to be.
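The economic argument can be sketched with some simple arithmetic. The Python below is illustrative only: the function names, the 100 TB warehouse and the 100:1 price ratio between fast media and magnetic disk are hypothetical figures chosen to stand in for "orders of magnitude", not real prices. The 20% hot fraction is the one quoted earlier.

```python
def all_fast_cost(total_tb, fast_cost_per_tb):
    """Cost of keeping every terabyte on fast media (memory and/or SSD)."""
    return total_tb * fast_cost_per_tb


def tiered_cost(total_tb, hot_fraction, fast_cost_per_tb, slow_cost_per_tb):
    """Cost of keeping only the hot fraction on fast media, the rest on disk."""
    hot_tb = total_tb * hot_fraction
    cold_tb = total_tb - hot_tb
    return hot_tb * fast_cost_per_tb + cold_tb * slow_cost_per_tb


# Hypothetical inputs: costs are in arbitrary units per TB, not real prices.
TOTAL_TB = 100
FAST_COST = 100.0   # assumed unit cost of fast media
SLOW_COST = 1.0     # assumed unit cost of magnetic disk
HOT_FRACTION = 0.2  # the 20% of data that attracts 80% of accesses

everything_fast = all_fast_cost(TOTAL_TB, FAST_COST)
hybrid = tiered_cost(TOTAL_TB, HOT_FRACTION, FAST_COST, SLOW_COST)

print(f"all fast: {everything_fast:.0f} units")
print(f"hybrid:   {hybrid:.0f} units")
print(f"hybrid as a share of all fast: {hybrid / everything_fast:.0%}")
```

Under these made-up numbers the hybrid layout costs roughly a fifth of the all-fast layout while still serving 80% of accesses from fast media. The exact ratio depends entirely on the assumed prices, but the shape of the result holds for as long as the price gap remains large and the access pattern remains skewed.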
So is this a zero-sum game? Do IBM, Oracle and Teradata have to be wrong for SAP to be right?
Well, maybe. And then again, maybe not.
When I am not writing “silly” blogs, one of my responsibilities is to understand the competitive landscape in which Teradata operates in EMEA. And I have to tell you that – today at least – that landscape does not really include HANA.
Which is odd, because to listen to SAP’s Executives is to be told that HANA is one of the most successful products in the company’s history. In which case, you might imagine that Teradata - the acknowledged industry leader, according to Gartner and others - would be competing with SAP for every single Data Warehouse opportunity in the region.
Since we are not - and since I assume that SAP, a well-run and much-respected company, is being truthful - the only reasonable interpretation of these two apparently contradictory facts is that we are in fact operating in two distinct, albeit adjacent, markets.
For all of SAP’s declared ambitions to make HANA into a world-class Information Management platform - capable of supporting both online transaction processing and analytics, simultaneously - I think that the reality is that today SAP is focused on using HANA to support operational reporting on small, discrete, independent sub-sets of data sourced directly from operational systems. And since our own “Active Enterprise Intelligence” (read “Operational Intelligence” or whichever moniker you prefer for Business Activity Monitoring and Event-Based Analytics) business is based on loading data to the Integrated Data Warehouse in near real-time so that we can understand which events are truly significant – outliers, rather than just routine business noise - this discrete market is one that historically we have not played in. So right now we don’t bump into each other very often (unlike those other guys I mentioned, who show-up all of the time).
Now SAP has been clear that they want to compete with us – and Oracle and IBM - in the Data Warehouse space. And I don’t doubt that right now they are busy trying to figure out how they should go about adding the capabilities that will be required to do so – support for complex schemas; a robust, cost-based optimizer; mixed-workload management; autonomic systems management, etc., etc., etc. – into a HANA product that manifestly does not include them today. And jolly good luck to them, because the competition makes us all stronger - and the other guys could really use the help (sorry my friends and rivals at IBM and Oracle, but I just couldn’t resist).
But precisely because customers recognise that data volumes are exploding – and that hybrid storage models like TVS are the solution to the resulting Total Cost of Ownership (TCO) challenge - I will bet my opposite number at SAP £250 – to be donated to a charity of his or her choice in the event that I lose the wager – that within two years of today’s date, SAP will have announced support for, or plans to support, a hybrid storage model of their own.
(Should I win, I ask that £250 is delivered to the London offices of the National Autistic Society. And since Hasso Plattner has already hinted strongly in his presentation at Sapphire 2012 that this is the direction that SAP are already working towards, I will win.)
Lastly, because I am just a “silly” marketing guy and old habits die hard, I will take this opportunity to remind the rest of you that Teradata has already been shipping hybrid systems, with fully automated data migration, for more than 18 months now - and to point out that we have big plans to extend this feature that we will be ready to announce in the near future.
Martin Willcox
Director of Platform & Solutions Marketing (EMEA)
Teradata Corporation
Paul Johnson
November 12, 2012
There are a few points worthy of note in both part 1 and part 2 of the SAP article to which Martin responds.
“HANA really solves the problem by keeping all of the data in-memory all of the time”. Are we really advocating systems with potentially multi-petabytes of RAM in which to store all of our enterprise analytic data? Try and get a CFO to sign that off!
“HANA will out-perform Teradata by a 1000X on any single query…”. Will that remain true if the data required by the query is also in RAM on Teradata, as may be the case? Since when did we measure system performance on a single query anyway?
“But it is a product that was architected for weak single-core nodes with little memory.” Not only is that true but it was necessarily the case back in the 1980s, when Teradata started life with x86 chips each accessing a single dedicated disk drive. SMP ‘nodes’ as we now know them were not in play until the NCR buyout in the mid-1990s.
“HANA is new and it is architected for the processor technology out today”. There is nothing new under the sun etc. In-memory databases have been around for decades: http://en.wikipedia.org/wiki/In-memory_database.
“Teradata will require a technology refresh”. Teradata has moved from the proprietary TOS operating system through NCR’s MP-RAS version of Unix and Windows (both 32 bit) and onto SUSE Linux (64 bit). The proprietary server and storage architecture has been ditched for open versions OEMd from the likes of Dell and LSI. Single core CPUs have been replaced with multi-core multi-threaded CPUs. ‘Not much RAM per node’ has been replaced with ‘lots of RAM per node’, should you desire. Hot standby nodes have been introduced. Capacity on demand (COD) has been introduced. High compute power/low IO bandwidth appliances have been introduced. Not quite as static and due a refresh as it might seem.
“it does not matter how often the table is accessed, it should be in-memory.” Is this really a silver bullet? If so, what problem is it solving? The argument being put forward seems to rest on economics only, as if the non-economic benefits of storing data in RAM are a given.
As Martin so correctly points out, a class-leading system consists of many components/capabilities such as “support for complex schemas; a robust, cost-based optimizer; mixed-workload management; autonomic systems management”.
I would also add high resilience/uptime, linear scalability and complex query support to that list – especially support for the kind of gnarly SQL that comes out of BI tools such as SAP’s own Business Objects!
The simplistic notion of storing all data in RAM, and seeing the world narrowly as a DIMM slot challenge, will never compensate for lack of capability in all of these other areas.