Care and Feeding of the Hadoop Elephant

By Scott Gnau | July 25, 2013

As an active supporter of animal rescue organizations, I know full well how the nominal cost of getting a puppy – or most any other pet, for that matter – can be quickly overshadowed by the considerable time, money and effort that go into care and feeding once you bring your new family member home. Indeed, a good portion of shelter animals I’ve met got there after owners came to realize how the cost of acquisition and the cost of ownership are two very different things.

Scott’s dog Frankie.

That’s why the term “free like a puppy” has special resonance for me when I hear it used to describe how some open source software can seem alluringly affordable, until the other shoe drops as IT professionals realize the effort, complexity and dollars involved in implementing and maintaining the resource. While the sky is the limit when it comes to my dog, IT professionals and executives don’t have the same latitude when dealing with their software implementations. The phrase came to mind most recently when I received a message earlier this month asking my response to some claims that open source Hadoop is poised to replace the data warehouse.

My answer involves both the opportunity and the limitations of open source stacks like Hadoop, and the importance of continually optimizing deployments for the best total cost. In the analytics space, our own customers and all successful enterprise-level data managers know it is not enough to simply store the data; true insights come from an elegant, ongoing process of gauging data value, temperature and relevance, and positioning workloads accordingly and economically. It is equally important to offer the right solutions at the right total cost.

As I’ve said before in assessing Hadoop, this software stack is a wonderful innovation and addition to the landscape that can cheaply store massive amounts of structured or unstructured data. Hadoop and the extended stack are a great set of tools and technologies, but they do not amount to a database or an application, and their capabilities don’t cover the entire spectrum of requirements for data management and real-time analytic delivery. At the enterprise level this involves security, governance, data cleansing and other technical capabilities that Hadoop just doesn’t have. Indeed, the practical costs of deploying only Hadoop in this role turn out to be higher than with other architectural approaches. It all comes down to the right tool for the right job at the right cost. And flexibility and integration are foundational requirements.

Industry trends show data storage capacity doubling every two years while infrastructure costs go down. Thankfully this evolution is a tide that is lifting all boats, and the data warehouse community is by no means standing still. The trend shows up in Teradata’s own improved costs and performance, which include benefits from this rising tide as well as additional software enhancements. Teradata is now delivering the same performance at half the price compared with just three years ago. Despite this rapid and ongoing improvement, the demand for enterprise-level analytics continues to grow at a faster pace, partially fueled by increased affordability.

A lot of the credit goes to our Workload-Specific Platform Family, with scalable options ranging from entry-level to active enterprise-class solutions to meet a broad range of business and technical needs. And the recent release of Teradata Intelligent Memory reflects our ongoing commitment to optimizing storage of temperature-specific data. Our continued success comes not from rejecting Hadoop, but rather from incorporating it, along with Teradata Database and the Teradata Aster Discovery Platform, into our “best of breed” Unified Data Architecture platform. With the help of our partner Hortonworks, we are working to solidify Hadoop as a valuable component in a seamless architecture to manage data of varied schema and at various stages in the data pipeline.

I’m sharing a lot of Teradata links with you for a reason. It’s to show that our three decade history of success and expertise in the data warehousing industry is built on a nimble and proactive embrace of new technologies. We never stand still, and we will continue to take full advantage of the latest trends and innovations…including Hadoop.

Scott Gnau
