I have recently been accused by a commentator at SAP of engaging in “silly marketing”. I will plead guilty to the marketing – the clue is in my job title, after all – but not to the silliness.
The substance of the SAP argument can be summarised thus: it is economically rational to “store” some data in memory today (because in the case of the frequently accessed data it is the cost of access that we should care about, not the cost of storage); memory is getting cheaper; therefore we should store all data in-memory in the future.
Now some of this logic is impeccable. It is economically rational to store frequently accessed data in high-performance storage and/or memory today. Of course, since memory is not persistent, the appropriate test for SAP HANA should be whether the data is accessed frequently enough to justify “storing” it once in memory and once again in Solid State Storage, but let’s overlook this detail for now.
Some of this logic makes rather less sense to me, however. In fact, it looks a lot like a variation on the Politician's syllogism. To illustrate why, I introduce a (deliberately) “silly” proposal of my own: some offenders are likely to re-offend; they can be prevented from re-offending by incarcerating them indefinitely; therefore we should incarcerate all offenders indefinitely.
Quite apart from the implications for Civil Liberties of such an un-civil proposition, the issue with this logic is that it conflates individual behaviour with aggregate behaviour. SAP would have us treat all analytic data in the same way – as if they were uniform and homogeneous.
Of course, analytic data are not uniform or homogeneous. Our own research leads us to conclude that in a typical Data Warehouse, 80% of accesses are to just 20% of the data (the “hot” data) – and as many as 50% of accesses are to as little as 2% of the data (the “white hot” data).
(The unfortunate complication in all of this is that it is not the same 2% of the data that are “white hot” all of the time; rather the relative temperature of the entire data-set in the Data Warehouse changes continuously in response to constantly changing and evolving business demand.)
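The kind of skew described above can be sketched with a toy simulation. Everything in the block below – the number of blocks, the access count and the Zipf-style weighting – is an illustrative assumption, not Teradata's actual research data; the point is simply that under a skewed access distribution, a small fraction of the data attracts a large fraction of the accesses.

```python
import random
from collections import Counter

# Illustrative sketch: simulate a skewed, Zipf-like access pattern over a
# set of hypothetical data blocks and measure what share of accesses the
# hottest blocks receive. All figures here are made up for illustration.
random.seed(42)

NUM_BLOCKS = 10_000      # hypothetical data blocks in the warehouse
NUM_ACCESSES = 200_000   # hypothetical query accesses

# Zipf-ish weights: block i is accessed with probability proportional to 1/(i+1)
weights = [1.0 / (i + 1) for i in range(NUM_BLOCKS)]
accesses = Counter(random.choices(range(NUM_BLOCKS), weights=weights, k=NUM_ACCESSES))

# Rank blocks by how often they were accessed, hottest first
counts = sorted(accesses.values(), reverse=True)

def share_of_accesses(top_fraction: float) -> float:
    """Fraction of all accesses landing on the hottest `top_fraction` of blocks."""
    top_n = int(NUM_BLOCKS * top_fraction)
    return sum(counts[:top_n]) / NUM_ACCESSES

print(f"hottest 20% of blocks: {share_of_accesses(0.20):.0%} of accesses")
print(f"hottest  2% of blocks: {share_of_accesses(0.02):.0%} of accesses")
```

Under these assumptions the hottest 2% of blocks absorb roughly half of all accesses – the same shape, if not the exact numbers, as the “white hot” pattern described above.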
I would argue that almost the whole industry now agrees that both the white hot and the hot data should be placed on high-performance “storage”. At Teradata we have already introduced Teradata Virtual Storage (TVS) to this end; Oracle includes “flash cache” in its Exadata database appliance and has also introduced its Exalytics appliance; and our friends at SAP never miss an opportunity to point out that they have put a new kid on the Information Management block that goes by the name of HANA. Quoting the Five Minute Rule – which is anyway predicated on a traditional storage + cache computing architecture – is moot, since the debate is not about whether these data should be “stored” on high-performance media; rather, it is about (1) how they should get there and (2) whether they should all stay there, all of the time.
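For readers unfamiliar with it, the Five Minute Rule (Gray and Putzolu, 1987) computes the break-even access interval at which keeping a page in memory becomes cheaper than re-reading it from disk on demand. A minimal sketch of that calculation follows; every price and performance figure in it is a made-up assumption for illustration, not vendor pricing.

```python
# Illustrative sketch of the Five Minute Rule break-even calculation.
# All prices and performance figures below are invented for illustration.

def break_even_interval_seconds(
    pages_per_mb_ram: float,           # pages that fit in 1 MB of RAM
    accesses_per_sec_per_disk: float,  # random I/Os one disk can serve per second
    price_per_disk: float,             # cost of one disk drive
    price_per_mb_ram: float,           # cost of 1 MB of RAM
) -> float:
    """Access interval below which caching a page in RAM is cheaper than
    re-reading it from disk each time it is needed."""
    return (pages_per_mb_ram / accesses_per_sec_per_disk) * (
        price_per_disk / price_per_mb_ram
    )

# Hypothetical numbers: pages re-read more often than this interval are
# cheaper to keep in memory; pages re-read less often are cheaper on disk.
interval = break_even_interval_seconds(
    pages_per_mb_ram=128,           # assuming 8 KB pages
    accesses_per_sec_per_disk=100,
    price_per_disk=100.0,
    price_per_mb_ram=0.01,
)
print(f"break-even interval: {interval / 60:.0f} minutes")
```

The key observation – and the reason the rule cuts against an all-in-memory design – is that the break-even interval moves as relative prices move: the calculation tells you which data earn a place in memory, not that all data do.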
For the most part, SAP takes the view – publicly, at least – that they all should stay there, all of the time. The rest of us – IBM, Oracle, Teradata, etc., etc. – see levels of growth in our customers’ analytical data-sets that lead us to conclude that this approach is economically irrational (at least whilst the unit costs of memory and SSD are orders of magnitude greater than the unit costs of magnetic storage media), even if we disagree (vigorously!) about how the “hot” data should get to where they need to be.
So is this a zero-sum game? Do IBM, Oracle and Teradata have to be wrong for SAP to be right?
Well, maybe. And then again, maybe not.
When I am not writing “silly” blogs, one of my responsibilities is to understand the competitive landscape in which Teradata operates in EMEA. And I have to tell you that – today at least – that landscape does not really include HANA.
Which is odd, because to listen to SAP’s Executives is to be told that HANA is one of the most successful products in the company’s history. In which case, you might imagine that Teradata - the acknowledged industry leader, according to Gartner and others - would be competing with SAP for every single Data Warehouse opportunity in the region.
Since we are not – and since I assume that SAP, a well-run and much-respected company, is being truthful – the only reasonable interpretation of these two apparently contradictory facts is that we are in fact operating in two distinct, albeit adjacent, markets.
For all of SAP’s declared ambitions to make HANA into a world-class Information Management platform – capable of supporting both online transaction processing and analytics, simultaneously – I think that the reality is that today SAP is focused on using HANA to support operational reporting on small, discrete, independent sub-sets of data sourced directly from operational systems. Our own “Active Enterprise Intelligence” business (read “Operational Intelligence”, or whichever moniker you prefer for Business Activity Monitoring and Event-Based Analytics) is based on loading data to the Integrated Data Warehouse in near real-time, so that we can understand which events are truly significant – outliers, rather than just routine business noise. That discrete market is one that historically we have not played in, so right now we don’t bump into each other very often (unlike those other guys I mentioned, who show up all of the time).
Now SAP has been clear that they want to compete with us – and Oracle and IBM - in the Data Warehouse space. And I don’t doubt that right now they are busy trying to figure out how they should go about adding the capabilities that will be required to do so – support for complex schemas; a robust, cost-based optimizer; mixed-workload management; autonomic systems management, etc., etc., etc. – into a HANA product that manifestly does not include them today. And jolly good luck to them, because the competition makes us all stronger - and the other guys could really use the help (sorry my friends and rivals at IBM and Oracle, but I just couldn’t resist).
But precisely because customers recognise that data volumes are exploding – and that hybrid storage models like TVS are the solution to the resulting Total Cost of Ownership (TCO) challenge – I will bet my opposite number at SAP £250 – to be donated to a charity of his or her choice in the event that I lose the wager – that within two years of today’s date, SAP will have announced support for, or plans to support, a hybrid storage model of their own.
(Should I win, I ask that the £250 be delivered to the London offices of the National Autistic Society. And since Hasso Plattner has already hinted strongly in his presentation at Sapphire 2012 that this is the direction SAP is already working towards, I will win.)
Lastly, because I am just a “silly” marketing guy and old habits die hard, I will take this opportunity to remind the rest of you that Teradata has already been shipping hybrid systems, with fully automated data migration, for more than 18 months now - and to point out that we have big plans to extend this feature that we will be ready to announce in the near future.
Director of Platform & Solutions Marketing (EMEA)