Teradata Uses Open Source to Expand Access to Big Data for the Enterprise

By | September 30, 2015

By Mark Shainman, Global Program Director, Competitive Programs

Teradata’s announcement of the accelerated release of enterprise-grade ODBC/JDBC drivers for Presto opens up an ocean of big data on Hadoop to the existing SQL-based infrastructure. For companies seeking to add big data to their analytical mix, easy access through Presto can solve a variety of problems that have slowed big data adoption. It also opens up new ways of querying data that were not possible with some other SQL on Hadoop tools. Here’s why.

One of the big questions facing those who toil to create business value out of data is how the worlds of SQL and big data come together. After the first wave of excitement about the power of Hadoop, the community quickly realized that because of SQL’s deep and wide adoption, Hadoop must speak SQL. And so the race began. Hive was first out of the gate, followed by Impala and many others. The goal of all of these initiatives was to make the repository of big data that was growing inside Hadoop accessible through SQL or SQL-like languages.

In the fall of 2012, Facebook determined that none of these solutions would meet its needs. Facebook created Presto as a high-performance way to run SQL queries against data in Hadoop. By 2013, Presto was in production and released as open source in November of that year.

In 2013, Facebook found that Presto was faster than Hive/MapReduce for certain workloads, although there are many efforts underway in the Hive community to increase its speed. Facebook achieved these gains by bypassing the conventional MapReduce programming paradigm and creating a way to interact with data in HDFS, the Hadoop file system, directly. This and other optimizations at the Java Virtual Machine level allow Presto not only to execute queries faster, but also to use other stores for data. This extensibility allows Presto to query data stored in Cassandra, MySQL, or other repositories. In other words, Presto can become a query aggregation point, that is, a query processor that can bring data from many repositories together in one query.

In June 2015, Teradata announced a full embrace of Presto. Teradata would add developers to the project, add missing features both as open source and as proprietary extensions, and provide enterprise-grade support. This move was the next step in Teradata’s effort to bring open source into its ecosystem. The Teradata Unified Data Architecture provides a model for how traditional data warehouses and big data repositories can work together. Teradata has supported integration of open source first through partnerships with open source Hadoop vendors such as Hortonworks, Cloudera, and MapR, and now through participation in an ongoing open source project.

Teradata’s embrace of Presto provided its customers with a powerful combination. Through Teradata QueryGrid, analysts can use the Teradata Data Warehouse as a query aggregation point and gather data from Hadoop systems, other SQL systems, and Presto. The queries in Presto can aggregate data from Hadoop, but also from Cassandra and other systems. This is a powerful capability that enables Teradata’s Unified Data Architecture to enable data access across a broad spectrum of big data platforms.

To provide Presto support for mainstream BI tools required two things: ANSI SQL support and ODBC/JDBC drivers. Much of the world of BI access works through BI toolsets that understand ANSI SQL. A tool like QlikView, MicroStrategy, or Tableau allows a user to easily query large datasets as well as visualize the data without having to hand-write SQL statements, opening up the world of data access and data analysis to a larger number of users. Having robust BI tool support is critical for broader adoption of Presto within the enterprise.

For this reason, ANSI SQL support is crucial to making the integration and use of BI tools easy. Many of the other SQL on Hadoop projects are limited in SQL support or utilize proprietary SQL “like” languages. Presto is not one of them. To meet the needs of Facebook, SQL support had to be strong and conform to ANSI standards, and Teradata’s joining the project will make the scope and support of SQL by Presto stronger still.

The main way that BI tools connect and interact with databases and query engines is through ODBC/JDBC drivers. For the tools to communicate well and perform well, these drivers have to be solid and enterprise class. That’s what yesterday’s announcement is all about.

Teradata has listened to the needs of the Presto community and accelerated its plans for adding enterprise-grade ODBC/JDBC support to Presto. In December, Teradata will make available a free, enterprise class, fully supported ODBC driver, with a JDBC driver to follow in Q1 2016. Both will be available for download on Teradata.com.

With ODBC/JDBC drivers in place and the ANSI SQL support that Presto offers, anyone using modern BI tools can access data in Hadoop through Presto. Of course, certification of the tools will be necessary for full functionality to be available, but with the drivers in place, access is possible. Existing users of Presto, such as Netflix, are extremely happy with the announcement. As Kurt Brown, Director, Data Platform at Netflix put it, “Presto is a key technology in the Netflix big data platform. One big challenge has been the absence of enterprise-grade ODBC and JDBC drivers. We think it’s great that Teradata has decided to accelerate their plans and deliver this feature this year.”

Leave a Reply

Your email address will not be published. Required fields are marked *


*