Teradata’s SNAP Framework Combines All Data, All Analytics

Posted on: November 6th, 2013 by Teradata Aster

The latest major release of the Teradata Aster Discovery Platform is available now. I think it is one of the most exciting releases we've unveiled, not only because of what it delivers but also because of the foundations it lays for future innovation.

Besides all the new functionality that Teradata Aster 6.0 provides (which I discussed in a recent blog post), what I want to focus on in this post is the new "spine" of the Teradata Aster platform: a Unifying Discovery Framework we call SNAP™. But before I talk about what SNAP does, I want to talk a little about the problem it solves.

As readers of this blog, you probably have some experience with and interest in Big Data, and by now you have realized that no single analytical technique or interface satisfies every Big Data problem: SQL, MapReduce, Statistical Modeling, and Text & Graph Analytics each play a crucial role in answering Big Data questions. Likewise, no single data type can solve every analytical challenge; structured, multi-structured, log, graph and text data are all needed. A single analytical opportunity (e.g., identifying customer churn from customer behavior) can leverage many, if not all, of them.

Thus, the key to success with Big Data is the ability to combine multiple data types with multiple analytical techniques and apply them all, at the same time, to the same problem.

The reality, however, is that no product out there can do this properly. Most "big data" platforms focus on just one interface, such as SQL. Hadoop, which Teradata has warmly embraced for the right use cases, is a collection of open source products that are not integrated with each other. So, more often than not, people end up picking one analytical engine and solving the problem with that alone. But this is like building a football team composed only of quarterbacks. Sure, they'll play ball, but they stand no chance against a team that uses the right specialist for each position.

Enter the Teradata Aster 6.0 SNAP™ Framework. SNAP™ allows our Discovery Platform to plug and play different analytical engines, as well as different storage engines, while maintaining a common SQL-based interface. It allows analysts with SQL and BI/Visualization skills to solve analytical challenges by easily combining the right data with the right analytical tools, whether that's SQL, MapReduce, Statistics, Text, Path & Time Series or Graph. In fact, Teradata Aster 6.0 adds Graph (SQL-GR in particular) as an all-new native analytical engine that integrates with SQL and the rest of the architecture through the SNAP™ framework.

On the storage side, SNAP™ allows our customers to combine, for the first time in a commercial platform, relational storage (row and column data stores) with fully unstructured, file-system-based storage (the Aster File System, or AFS). Data can move seamlessly from relational storage to AFS and back, and the same query can access data from either or both storage engines. SNAP™ makes every combination of analytics and data possible!

So SNAP™ enables some unique and important capabilities. But how does it work? It combines four main components.

  1. A query optimizer that understands SQL, MapReduce and Graph, and that can be extended to additional analytical engines in the future in a plug-and-play manner. The optimizer handles both relational and file-based data, and it also has internal APIs that allow for future storage engine extensions;
  2. An execution engine that, given a mixed analytical plan from the optimizer, knows how to combine the data with the analytics in a manner agnostic to both the analytic and the data type;
  3. A unified SQL framework that ties together all the different analytical engines, whether MapReduce, Graph or future additions. This is critical because it enables SQL analysts, as well as Visualization tools that use SQL, to take advantage of powerful analytics (like Graph) without the need to hire an army of Java developers;
  4. A layer that integrates and manages resources, including storage, CPU, I/O, memory, etc.
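To make the division of labor among these four components more concrete, here is a purely hypothetical toy model in Python. It is not how SNAP is implemented, and every name in it (`Snap`, `RelationalStore`, `FileStore`, `resource_guard`, `word_count`, `out_degree`) is invented for illustration rather than taken from any Teradata Aster API. Two storage engines share one `scan()` contract, analytic engines are plain rows-in/rows-out functions registered by name, a trivial "optimizer" resolves names and places the filter before the analytic step, a single executor runs every plan, a single `query()` call stands in for the unified SQL interface, and a per-query row budget stands in for resource management.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Iterator, List, Optional

Row = Dict[str, object]
Engine = Callable[[Iterable[Row]], Iterator[Row]]
Predicate = Callable[[Row], bool]

class RelationalStore:
    """Toy row store (hypothetical): each table is a list of dicts."""
    def __init__(self) -> None:
        self.tables: Dict[str, List[Row]] = {}

    def scan(self, table: str) -> Iterator[Row]:
        yield from self.tables[table]

class FileStore:
    """Toy file store (hypothetical): raw tab-separated text, parsed into rows only when read."""
    def __init__(self) -> None:
        self.files: Dict[str, str] = {}

    def scan(self, path: str) -> Iterator[Row]:
        for line in self.files[path].splitlines():
            user, _, text = line.partition("\t")
            yield {"user": user, "text": text}

def resource_guard(rows: Iterable[Row], max_rows: int) -> Iterator[Row]:
    """Component 4, much simplified: cap how many rows one query may scan."""
    for i, row in enumerate(rows):
        if i >= max_rows:
            raise RuntimeError("row budget exceeded")
        yield row

def word_count(rows: Iterable[Row]) -> Iterator[Row]:
    """MapReduce-style engine: map rows to words, then reduce by summing counts."""
    counts: Dict[str, int] = {}
    for row in rows:
        for word in str(row["text"]).lower().split():
            counts[word] = counts.get(word, 0) + 1
    for word, n in sorted(counts.items()):
        yield {"word": word, "count": n}

def out_degree(rows: Iterable[Row]) -> Iterator[Row]:
    """Graph-style engine: rows are edges (src, dst); emit each source vertex's out-degree."""
    degree: Dict[str, int] = {}
    for row in rows:
        src = str(row["src"])
        degree[src] = degree.get(src, 0) + 1
    for vertex, d in sorted(degree.items()):
        yield {"vertex": vertex, "degree": d}

@dataclass
class Plan:
    scan: Callable[[], Iterator[Row]]
    predicate: Optional[Predicate]
    engine: Optional[Engine]

class Snap:
    """Hypothetical toy model of the four components; not Aster's actual design."""
    def __init__(self, max_rows: int = 1000) -> None:
        self.relational = RelationalStore()
        self.files = FileStore()
        self.engines: Dict[str, Engine] = {}
        self.max_rows = max_rows

    def register(self, name: str, engine: Engine) -> None:
        self.engines[name] = engine  # plug-and-play: the core code does not change

    def plan(self, source: str, where: Optional[Predicate], using: Optional[str]) -> Plan:
        # Component 1: resolve the source to a storage engine and the engine name to a function.
        store = self.files if source in self.files.files else self.relational
        engine = self.engines[using] if using else None
        return Plan(lambda: store.scan(source), where, engine)

    def execute(self, plan: Plan) -> List[Row]:
        # Component 2: one execution path, whatever the storage or analytic engine.
        rows: Iterable[Row] = resource_guard(plan.scan(), self.max_rows)
        if plan.predicate is not None:
            rows = (r for r in rows if plan.predicate(r))
        if plan.engine is not None:
            rows = plan.engine(rows)
        return list(rows)

    def query(self, source: str, where: Optional[Predicate] = None,
              using: Optional[str] = None) -> List[Row]:
        # Component 3: a single interface for every engine/storage combination.
        return self.execute(self.plan(source, where, using))

if __name__ == "__main__":
    snap = Snap(max_rows=100)
    snap.register("word_count", word_count)
    snap.register("out_degree", out_degree)
    snap.relational.tables["calls"] = [
        {"src": "a", "dst": "b"}, {"src": "a", "dst": "c"}, {"src": "b", "dst": "c"}]
    snap.files.files["tweets.log"] = "a\tcancel my plan\nb\tlove the plan\nc\tcancel now"
    print(snap.query("calls", using="out_degree"))
    print(snap.query("tweets.log", where=lambda r: "cancel" in str(r["text"]),
                     using="word_count"))
```

In this sketch the same `query()` call runs a graph-style analysis over a relational table (vertex `a` has out-degree 2, `b` has 1) and a MapReduce-style word count over file-store data filtered to churn-like messages. A real cost-based optimizer would weigh alternative plans across engines instead of applying one fixed order; the toy only illustrates how separating storage, analytics, planning, execution and resources lets new engines plug in without touching the rest.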

I want to take a minute to compare SNAP with Hadoop YARN. Both have good reasons to exist, but they have different goals. Essentially, YARN (together with HDFS) focuses on point #4 above: it integrates everything at the resource-management level. However, it offers no integration at the interface level (as SNAP does with SQL), no common execution engine (to combine all data and analytics in the same execution plan), and no query optimization across different engines (SQL, Graph, MapReduce), which means the performance of complex analytics will suffer.

From a Discovery Platform perspective, the lack of a common interface is probably the most important difference between Teradata Aster and Hadoop. For example, a Graph query will require Java code, while a Hive or Impala query will require SQL. Different interfaces mean (a) lower overall performance and hardware utilization; (b) more effort to set everything up and combine it; and (c) different skills (and probably different people) for each interface. That said, Hadoop can be used with just layer #4 (YARN and HDFS) to build a "data lake", where data is collected and transformed; this is why Hadoop is prominently featured and promoted in the Teradata Unified Data Architecture for that specific use case.

Together, Teradata Aster 6.0 and SNAP take us one step closer to our vision: enabling every type of analysis on every type of data, all at the same time and all accessible to the average SQL analyst or Visualization tool user. We are not fully done yet, but we've come a really long way!
