Big Data: not unprecedented but not bunk either – Part III

Tuesday September 9th, 2014

In my last post in this series, I explained the five big challenges that organisations must address in order to work successfully with Big Data. Between them, these five challenges are driving the most significant evolution in Enterprise Analytical Architecture since Devlin, Inmon, Kimball et al. gave the world the Enterprise Data Warehouse. Contrary to some of the more breathless industry hype, thirty years of Information Management best practice has not been rendered obsolete overnight. Increasingly, though, we should regard the Data Warehouse as necessary, but no longer sufficient by itself.

Where data are re-used, we need to minimise the Total Cost of Ownership by amortising the (considerable) costs of acquiring and integrating them over multiple business processes, bringing multiple Analytical Applications to one copy of the data rather than the other way around. Where data support mission-critical business processes, they need to be accurate, reliable and certified (and one copy is better than two, because a man with one watch knows the time – but a man with two watches is never quite sure). And where we want to optimise end-to-end business processes (rather than merely spin the plates faster in a particular department), we need to integrate data to support cross-functional Analytics.

These considerations – in large part the original motivation for the Data Warehouse concept – are dominant when we seek to operationalise Analytics (the final challenge of the five that I identified in my last post) by sharing actionable insights across the organisation and across functional, organisational and geographical boundaries. Because deploying an Integrated Data Warehouse is still the most rational way to address them, rumours of its demise have been greatly exaggerated. And because parallel RDBMS platforms are still the only technologies with the proven elastic and multi-dimensional scalability required to support a complex mix of workloads, they are still the only game in town when it comes to bringing multiple Analytical Applications to one copy of the organisation’s (structured) data assets.

Big Data challenges one-through-four, however, increasingly require that we augment the Data Warehouse with new architectural constructs that in many cases are best deployed on new technologies. A “data platform” or “data lake”, for example, built on a technology with a lower unit cost of storage than a data warehousing platform (which is designed and optimised for high-performance sharing of data), can enable organisations to address the economic challenge of capturing large and noisy data sets of unproven value. Distributed filesystem technologies may be a more natural fit than a Relational Database Management System (RDBMS) for capturing complex, multi-structured data and “late binding” multiple different schemas to it. And technologies designed from the ground up to support time series, path and graph Analytics can offer important ease-of-use and performance advantages for the complex analysis of interaction data modelled as a network or a graph.
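To make “late binding” (often called schema-on-read) a little more concrete, here is a minimal Python sketch. It assumes that raw, multi-structured events are captured unchanged as JSON lines, and it binds two different schemas to the same records only at the moment they are read. The sample events, the schema names and the read_with_schema function are hypothetical illustrations, not part of any Teradata, Hadoop or other product API.

```python
import json
from typing import Any, Callable, Dict, List

# Hypothetical raw events, captured as-is with no schema imposed up front.
RAW_EVENTS = [
    '{"user": "u1", "ts": "2014-09-09T10:00:00", "page": "/home"}',
    '{"user": "u2", "ts": "2014-09-09T10:01:00", "page": "/cart", "items": 3}',
    '{"user": "u1", "ts": "2014-09-09T10:02:00", "event": "login"}',
]

# A "schema" here is simply a mapping of field name -> type conversion.
Schema = Dict[str, Callable[[Any], Any]]

# Two different (hypothetical) schemas for two different analytical uses.
CLICKSTREAM_SCHEMA: Schema = {"user": str, "page": str}
BASKET_SCHEMA: Schema = {"user": str, "items": int}


def read_with_schema(raw_lines: List[str], schema: Schema) -> List[Dict[str, Any]]:
    """Bind a schema to raw records at read time (schema-on-read).

    Records lacking any field the schema requires are skipped; the
    matching ones are projected onto the schema's fields and converted.
    """
    rows = []
    for line in raw_lines:
        record = json.loads(line)
        if all(field in record for field in schema):
            rows.append({field: cast(record[field]) for field, cast in schema.items()})
    return rows


if __name__ == "__main__":
    print(read_with_schema(RAW_EVENTS, CLICKSTREAM_SCHEMA))
    print(read_with_schema(RAW_EVENTS, BASKET_SCHEMA))
```

Neither schema had to be fixed when the events were captured; each analytical use applies its own view of the data when it reads it, and records that don’t fit that view are simply skipped.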

Leading analyst firm Gartner has coined the term “Logical Data Warehouse” to describe the evolution from what we might term “monolithic” to more distributed Data Warehouse architectures. Whatever label we apply to this evolution – and at Teradata, we prefer “Unified Data Architecture” – it is clear that the future of enterprise Analytical Architecture is plural. We will increasingly need to deploy and integrate multiple analytic platforms, each optimised to address a different combination of Big Data challenges one-through-five, which I outlined in my last post and which are laid out in the figure below.

[Figure: The five Big Data challenges]

Some analysts and commentators predict that all of this means trouble for Teradata. Their logic goes something like this: Teradata led the industry when the dominant architectural pattern was the Integrated Data Warehouse; increasingly, that won’t be the dominant (or at any rate, the only) architectural pattern; and so Teradata will no longer be a 500-pound gorilla in the Analytics jungle.

You wouldn’t expect me to agree with that particular assessment. And I don’t, for two reasons.

The first flaw in this argument is that it presupposes that the Integrated Data Warehouse architectural pattern is going away. As we have already discussed, the new technologies and architectures are extending it, not replacing it.

The second flaw in this argument is that it ignores the fact that Teradata is leading the industry’s adaptation to the realities of the three “new waves” of Big Data, with new platform and integration technologies that are enabling leading organisations to actually deploy Logical Data Warehouse architectures – to “walk the walk” whilst our competitors merely “talk the talk”.

Before we get too caught up in technology, however, we should remember that Enterprise Architecture – good Enterprise Architecture, anyway – is conceptual rather than physical. That being the case, just what does a “Logical Data Warehouse” architecture look like, what are its key components, and how does it address the five challenges that we have already described? I will try to tackle these questions in my next post.
