Bringing Open-Source Presto to the Enterprise

June 8, 2015

Teradata is making Presto enterprise-ready so global powerhouses can be as innovative as elite big data practitioners such as Facebook, Netflix, Airbnb, Dropbox and Groupon.

By Justin Borgman 

Today, I’m proud to announce that Teradata has joined the open source community for Presto, the open-source distributed SQL query engine used by big data innovators to run interactive analytic queries against data lakes ranging in size from gigabytes to petabytes.

With this announcement, we are committed to making Presto enterprise-class with a multi-year roadmap of 100% open-source contributions, as well as the industry’s first commercial support offering for Presto deployments.

Facebook originally created Presto in 2012 and today uses it as its primary tool for interactive queries against internal data stores, including its 300PB Hadoop cluster. It’s reported that more than 1,000 Facebook employees use Presto daily to run more than 30,000 queries that together scan more than a petabyte per day. Needless to say, that is some impressive scale testing! But Facebook isn’t the only one using Presto in production. Dozens of other leading internet innovators like Netflix, Airbnb, Dropbox and Groupon have joined the Presto community, using Presto as an integral part of their analytics ecosystems.

What makes Presto unique? For starters, Presto was developed to satisfy response times ranging from sub-second to minutes, so it’s fast. But beyond performance and its history of being run at scale by internet giants, part of what makes it so appealing is that Presto is not merely SQL for Hadoop. Presto is actually SQL for all the data platforms in your data lake. The list of platforms is ever-growing but already includes MySQL, PostgreSQL, Kafka, and Cassandra. This means that Presto allows you to query data where it lives. A single query can combine data from multiple sources, allowing for analytics across your entire organization.
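To make the idea of a single query spanning multiple sources concrete, here is a minimal sketch in Python that assembles such a query as a string. It relies on Presto’s convention of addressing tables as catalog.schema.table, where each catalog maps to a configured data source. The catalog names (hive, mysql), the schemas, the tables, and the columns are all hypothetical and chosen only for illustration; a real deployment defines its own.

```python
# A sketch only: every catalog, schema, table, and column name below is a
# hypothetical placeholder, not something taken from a real deployment.


def qualified(catalog: str, schema: str, table: str) -> str:
    """Return a fully qualified table name in catalog.schema.table form."""
    return f"{catalog}.{schema}.{table}"


def build_federated_query() -> str:
    """Build one SQL statement that joins a Hadoop table with a MySQL table."""
    page_views = qualified("hive", "web", "page_views")  # data stored in Hadoop
    users = qualified("mysql", "crm", "users")           # data stored in MySQL
    return (
        "SELECT u.country, count(*) AS view_count\n"
        f"FROM {page_views} AS v\n"
        f"JOIN {users} AS u ON v.user_id = u.id\n"
        "GROUP BY u.country\n"
        "ORDER BY view_count DESC"
    )


if __name__ == "__main__":
    # Print the statement so it can be pasted into a SQL client.
    print(build_federated_query())
```

The point of the sketch is that the join looks like any other SQL join; only the catalog prefix on each table name tells the engine which system holds the data.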

Furthermore, because Presto runs on any distribution of Hadoop, users don’t have to worry about being locked in to one Hadoop stack or another.  If you build your SQL application on top of Presto, and later decide to switch to a different underlying distribution of Hadoop, no problem – your work is now portable!

However, while we are enormous fans of Presto, we need to be honest with ourselves: there is still a lot more work to be done. That is precisely why we are announcing a multi-phased roadmap to enhance Presto with 100% open-source contributions designed to improve SQL support, BI tool compatibility, manageability, and a whole host of enterprise features. We think we can make a big impact, because the gaps in Presto happen to be Teradata’s strengths. After all, we’ve been building SQL engines for a very long time, and helping our customers become data-driven to gain a competitive advantage.

[Illustration: Presto roadmap]

Above is an illustration that shows our plan for advancing Presto over the next 18 months. Phase 1, available today, is all about making Presto more accessible for enterprise users to experiment with. We’ve built a very simple-to-use installer that will get Presto up and running on your Hadoop cluster in a matter of minutes. Don’t have a Hadoop cluster lying around? No problem, we’ve also made Presto available via a self-contained VM, so you can run a sandbox on your personal machine and try it out. In addition, we have built some basic command-line monitoring and management capabilities so you can operate your cluster more easily. And we’ve written a lot more documentation, including a quick-start guide.

Phase 2, coming up in 6 short months, includes integration with existing cluster management tools, starting with Ambari, as well as YARN integration for better resource management on your YARN-enabled Hadoop cluster.  And finally, with Phase 3, we are making major efforts to make Presto ready for BI tools, thereby enabling greater enterprise-wide adoption.

Who is doing all this work?  The SQL-on-Hadoop pioneers from Hadapt, acquired by Teradata last summer.  Since that time, Hadapt has become the Teradata Center for Hadoop, responsible for the company’s portfolio of Hadoop-related products, including this major commitment to Presto.  In fact, 16 former Hadapters are now contributors to the Presto project, committing both their expertise and Hadapt intellectual property for the benefit of the open source community.

In addition to accelerating the Presto roadmap alongside the other members of the Presto community, Teradata’s new support offering gives enterprise users the confidence and peace of mind to deploy Presto in production. There is great security in knowing that Teradata, the industry leader in data warehousing and analytics, is standing by, ready to help. Our certified releases are carefully selected stable releases of Presto that are then tested and verified by over 300 additional Teradata-developed, end-to-end system tests. After patches and fixes, each certified release is packaged as a pre-built RPM, allowing for easy installation. All are available for download.

I encourage you to try out Presto today, and put the technology of internet giants to work for you.

Teradata – from data warehouses to data lakes, we’ve got you covered.

Justin Borgman is vice president and general manager of the Teradata Center for Hadoop, where he is responsible for the company’s portfolio of Hadoop products. Prior to joining Teradata, Justin was co-founder and CEO of Hadapt, the pioneering “SQL-on-Hadoop” company that democratized Big Data Analytics for the enterprise. Prior to Hadapt, Justin led product development for COVECTRA, an anti-counterfeit technology firm. He began his career as a software developer at MIT Lincoln Laboratory and Raytheon. Justin earned a BS in Computer Science from the University of Massachusetts at Amherst, where he was a Commonwealth Scholar, and an MBA from the Yale School of Management.
