Concurrent with acquiring Hadoop companies Hadapt and Revelytix last year, Teradata opened the Teradata Center for Hadoop in Boston. Teradata recently announced that a major new initiative of this Hadoop development center will include open-source contributions to a distributed SQL query engine called Presto. Presto was originally developed at Facebook, and is designed to run high performance, interactive queries against Big Data wherever it may live --- Hadoop, Cassandra, or traditional relational database systems.
Among those people who will be part of this initiative and contributing code to Presto include a subset of the Hadapt team that joined Teradata last year. In the following, we will dive deeper into the thinking behind this new initiative from the perspective of the Hadapt team. It is important to note upfront that Teradata’s interest in Presto, and the people contributing to the Presto codebase, extends beyond the Hadapt team that joined Teradata last year. Nonetheless, it is worthwhile to understand the technical reasoning behind the embrace of Presto from Teradata, even if it presents a localized view of the overall initiative.
Around seven years ago, Ashish Thusoo and his team at Facebook built the first SQL layer over Hadoop as part of a project called Hive. At its essence, Hive was a query translation layer over Hadoop: it received queries in a SQL-like language called Hive-QL, and transformed them into a set of MapReduce jobs over data stored in HDFS on a Hadoop cluster. Hive was truly the first project of its kind. However, since its focus was on query translation into the existing MapReduce query execution engine of Hadoop, it achieved tremendous scalability, but poor efficiency and performance, and ultimately led to a series of subsequent SQL-on-Hadoop solutions that claimed 100X speed-ups over Hive.
Hadapt was the first such SQL-on-Hadoop solution that claimed a 100X speed-up over Hive on certain types of queries. Hadapt was spun out of the HadoopDB research project from my team at Yale and was founded by a group of Yale graduates. The basic idea was to develop a hybrid system that is able to achieve the fault-tolerant scalability of the Hive MapReduce query execution engine while leveraging techniques from the parallel database system community to achieve high performance query processing.
The intention of HadoopDB/Hadapt was never to build its own query execution layer. The first version of Hadapt used a combination of PostgreSQL and MapReduce for distributed query execution. In particular, the query operators that could be run locally, without reliance on data located on other nodes in the cluster, were run using PostgreSQL’s query operator set (although Hadapt was written such that PostgreSQL could be replaced by any performant single-node database system). Meanwhile, query operators that required data exchange between multiple nodes in the cluster were run using Hadoop’s MapReduce engine.
Although Hadapt was 100X faster than Hive for long, complicated queries that involved hundreds of nodes, its reliance on Hadoop MapReduce for parts of query execution precluded sub-second response time for small, simple queries. Therefore, in 2012, Hadapt started to build a secondary query execution engine called “IQ” which was intended to be used for smaller queries. The idea was that all queries would be fed through a query-analyzer layer before execution. If the query was predicted to be long and complex, it would be fed through Hadapt’s original fault-tolerant MapReduce-based engine. However, if the query would complete in a few seconds or less, it would be fed to the IQ execution engine.
In 2013 Hadapt integrated IQ with Apache Tez in order avoid redundant programming efforts, since the primary goals of IQ and Tez were aligned. In particular, Tez was designed as an alternative to MapReduce that can achieve interactive performance for general data processing applications. Indeed, Hadapt was able to achieve interactive performance on a much wider-range of queries when leveraging Tez, than what it was able to achieve previously.
Figure 1: Intertwined Histories of SQL-on-Hadoop Technology
Unfortunately Tez was not quite a perfect fit as a query execution engine for Hadapt’s needs. The largest issue was that before shipping data over the network during distributed operators, Tez first writes this data to local disk. The overhead of writing this data to disk (especially when the size of the intermediate result set was large) precluded interactivity for a non-trivial subset of Hadapt’s query workload. A second problem is that the Hive query operators that are implemented over Tez use (by default) traditional Volcano-style row-by-row iteration. In other words, a single function-invocation for a query operator would process just a single database record. This resulted in a larger number of function calls required to process a large dataset, and poor instruction cache locality as the instructions associated with a particular operator were repeatedly reloaded into the instruction cache for each function invocation. Although Hive and Tez have started to alleviate this issue with the recent introduction of vectorized operators, Hadapt still found that query plans involving joins or SQL functions would fall back to row-by-row iteration.
The Hadapt team therefore decided to refocus its query execution strategy (for the interactive query part of Hadapt’s engine) to Presto, which presented several advantages over Tez. First, Presto pipelines data between distributed query operators directly, without writing to local disk, significantly improving performance for network-intensive queries. Second, Presto query operators are vectorized by default, thereby improving CPU efficiency and instruction cache locality. Third, Presto dynamically compiles selective query operators to byte code, which lets the JVM optimize and generate native machine code. Fourth, it uses direct memory management, thereby avoiding Java object allocations, its heap memory overhead and garbage collection pauses. Overall, Presto is a very advanced piece of software, and very much in line with Hadapt’s goal of leveraging as many techniques from modern parallel database system architecture as possible.
The Teradata Center for Hadoop has thus fully embraced Presto as the core part of its technology strategy for the execution of interactive queries over Hadoop. Consequently, it made logical sense for Teradata to take its involvement in the Presto to the next level. Furthermore, Hadoop is fundamentally an open source project, and in order to become a significant player in the Hadoop ecosystem, Teradata needs to contribute meaningful and important code to the open source community. Teradata’s recent acquisition of Think Big serves as further motivation for such contributions.
Therefore Teradata has announced that it is committed to making open source contributions to Presto, and has allocated substantial resources to doing so. Presto is already used by Silicon Valley stalwarts Facebook, AirBnB, NetFlix, DropBox, and Groupon. However, Presto’s enterprise adoption outside of silicon valley remains small. Part of the reason for this is that ease-of-use and enterprise features that are typically associated with modern commercial database systems are not fully available with Presto. Missing are an out-of the-box simple-to-use installer, database monitoring and administration tools, and third-party integrations. Therefore, Teradata’s initial contributions will focus in these areas, with the goal of bridging the gap to getting Presto widely deployed in traditional enterprise applications. This will hopefully lead to more contributors and momentum for Presto.
For now, Teradata’s new commitments to open source contributions in the Hadoop ecosystem are focused on Presto. Teradata’s commitment to Presto and its commitment to making meaningful contributions to an open source project is an exciting development. It will likely have a significant impact on enterprise-adoption of Presto. Hopefully, Presto will become a widely used open source parallel query execution engine --- not just within the Hadoop community, but due to the generality of its design and its storage layer agnosticism, for relational data stored anywhere.
Daniel Abadi is an Associate Professor at Yale University, founder of Hadapt, and a Teradata employee following the recent acquisition. He does research primarily in database system architecture and implementation. He received a Ph.D. from MIT and a M.Phil from Cambridge. He is best known for his research in column-store database systems (the C-Store project, which was commercialized by Vertica), high performance transactional systems (the H-Store project, commercialized by VoltDB), and Hadapt (acquired by Teradata). http://twitter.com/#!/daniel_abadi.