Without question, the best price per terabyte anywhere in the technology industry is the home PC. You can get a Dell® PC for about $400 and it comes with a terabyte disk drive. WOW! I found one PC for $319 per TB! Teradata, Oracle, IBM, and all the other vendors are headed for the scrap heap of history with those kinds of prices. I’m sending out my resume in the morning... How silly is that? Yet when comparing massively parallel database computers – the culmination of 50 years of data processing innovation – many organizations overemphasize $/TB and disregard total value. They hammer the vendors to lower the price, lower the price, until – you guessed it – the vendors hit the right price by also lowering the value. This reached a crescendo over the last few years following the worldwide recession. Saving money became much more important than producing business value. I get it – a corporation runs on cost containment and revenue generation. As it turns out, a data warehouse is a vital tool enabling both business objectives – especially in hard economic times.
I understand why CFOs and procurement people obsess over dollars per terabyte. They can’t understand all the technical geek-speak, but they do know that hollering about cost per terabyte makes vendors and CIOs scramble. OK, that seems worthwhile, but there is a flaw in this thinking when $/TB is the first and foremost buying criterion.
By analogy, would you buy a car based on price alone? No. Even if you are strapped for money, you search for features and value in the collection of cars that are affordable. Price is one decision point, not THE decision maker. I always buy a little beyond my means to get the highest quality. Purchase price is a point-in-time angst, but I have to live with that car for years. This approach has never failed me, and I am always satisfied years later.
$/TB as Proxy for All the Value
System price is crucial at the beginning of a purchasing process to select candidates, and again at the end when real money is being exchanged. In between, there is often an assumption that candidate systems can all do the same job. Well, no two vendor systems are identical, especially massively parallel data warehouses. Indeed, they vary dramatically. But let’s assume for a moment that two vendor products are 80% equivalent in the workloads they can do and the labor it takes to manage them.
What is always lost in these comparisons is the actual performance of the queries as measured at the business user’s desk. Massively parallel databases are highly differentiated. Some are quite slow when compared to others. Some are lightning fast on table scans but choke when complex joins are needed. Some can only handle a dozen users at a time. Many flounder running mixed workloads. Some are good enough at simple queries on simple database designs, but collapse when complex queries are required. If you are just starting out, simple queries may be OK. But to become an analytic competitor, really complex queries are inevitably de rigueur. Plus, any successful analytic database project will see major expansions of user demands and query complexity over the first 3-5 years, then incremental growth after that. Or is it the other way around – top quality analytic databases encourage users to ask more complex questions? Hmmm.
Performance Performance Performance
The primary purpose of databases has always been performance, performance, performance. Number two is high availability since performance is uninteresting when the system is offline. Over-emphasizing cost per terabyte drives out the value of performance. But if the buyer wants vendors to optimize for cost per terabyte, query performance and software features will be reduced to meet that goal.
This means having employees do extra work since the system is no longer doing it. It means user productivity and accuracy are reduced as dozens of data warehouse users take extra minutes to do what could have been done in seconds. It means not rerunning an analysis four times to improve accuracy because it takes too long. It means users interact less with the data and get fewer brilliant insights because it takes too long to try out ideas. And it means not getting that rush report to the executives by 11AM when they ask for it at 10:40. All of this angst is hard to measure but the business user surely feels it.
The better metric has always been price/performance. Let me suggest an even more rounded (wink) view of buying criteria and priority:
No, today is not the day to delve deeply into the percentages on this chart. But suffice it to say they are derived from analyst house research and other sources I’ve witnessed over the years. And yes, they vary a few percentage points for every organization. Instead of price, TCO is dramatically more important to the CIO and CFO who have to “live with this car for years.” Performance is vital to the business user – cut this back and you might ask “why pretend to have an analytic database since users will avoid running queries?” Features and functions are something the programmers and DBAs love, and they should not be overlooked.
Teradata – the Low Price Leader?
Changes in supplier costs and price pressures from the recent recession are producing bargains for data warehouse buyers. Take a look at Teradata list prices from 2Q2014.
Each Teradata platform described above includes Teradata-quality hardware, the Teradata Database, utilities, and storage using uncompressed data. These are list prices, so let the negotiations begin! At $3.8K per terabyte, anyone can afford Teradata quality now.
Obviously, you noticed the $34K/terabyte systems. Need I say that these are the most robust, highest performing systems in the data warehouse market? Both Gartner’s Magic Quadrant and Forrester’s Data Warehouse Wave assessments rate Teradata the top data warehouse vendor as of 1Q14. These systems support large user populations, millions of queries per day, integrated data, sub-second response time on many queries, row-level security, and dozens of applications per system. The Active Enterprise Data Warehouse is the top of the line with solid state disks, the fastest configuration, capacity on demand, and many other upscale capabilities. The Integrated Big Data Platform is plenty fast but not in the same class as the Active Enterprise Data Warehouse. There are a dozen great use cases for this cost-conscious machine, but 500 users with enormously complex queries won’t work on smaller configurations. Still, it quickly pays for itself.
Chant: Dollars per Terabyte, Dollars per Terabyte ...
The primary value proposition on the lips of the NoSQL and Hadoop vendors is always “cost per terabyte.” This is common with new products in new markets – we’ve heard it before from multiple startup MPP vendors. It’s impossible to charge top dollar for release 1.0 or 2.0 since they are still fairly incomplete. So when you have little in the way of differentiated value, dollars per terabyte is the chant. But is five-year-old open source software really equivalent to 30 years of R&D investment in relational database performance? Not.
I looked at InformationWeek’s article “10 Hadoop Hardware Leaders” (4/24/2014), which includes the Dell R720XD servers as a leader in Hadoop hardware. Pricing out an R720XD on the Dell website, I found that a server with 128GB of memory and twelve 1.2TB disks comes in at $15,276. That’s $1,060 per terabyte. Cool. However, Hadoop needs two replicas of all data to provide basic high availability. That means you need to buy three nodes for the same usable capacity, which makes the cost per terabyte $3,182. Then you add some free software and lots of do-it-yourself labor. Seems to me that puts it in the same price band as the Integrated Big Data Platform. But the software on that machine is the same Teradata Database running on the Active Enterprise Data Warehouse. Sounds like a bargain to me!
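The arithmetic above can be sketched as a quick back-of-envelope calculation. The $15,276 price and the 12 × 1.2TB disk configuration are the article’s figures; the three-copy replication factor is HDFS’s default (the original block plus two replicas):

```python
# Cost-per-terabyte math for the Dell R720XD Hadoop example.
server_price = 15_276          # USD for one R720XD with 128GB RAM (article figure)
disks = 12
tb_per_disk = 1.2
raw_tb = disks * tb_per_disk   # 14.4 TB of raw storage per server

cost_per_raw_tb = server_price / raw_tb
print(f"${cost_per_raw_tb:,.0f} per raw TB")        # ~ $1,061

# HDFS stores three copies of each block by default (one original + two
# replicas), so usable capacity is one third of the cluster's raw capacity.
replication = 3
cost_per_usable_tb = server_price * replication / raw_tb
print(f"${cost_per_usable_tb:,.0f} per usable TB")  # ~ $3,182
```

The point of the sketch: replication does not change the sticker price of a node, but it triples the hardware you must buy per terabyte of actual data.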
Over-reliance on $/TB does bad things to your business users’ productivity. Startups always make this a gut-wrenching issue for customers to solve, but as their products mature, that noise fades into the background. I recommend a well-rounded assessment of any vendor product that serves many business users and needs.
OK, so now I’m hooking up 50 terabytes of storage to my whiz-bang 3.6GHz Intel® home office Dell PC. I’m anxious to know how long it will take to scan and sort 20 terabytes. I’ll let you know tomorrow, or the next day, or whenever it finishes.
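A rough estimate shows why the punchline works. The 150 MB/s sequential-read rate below is my assumption for a single consumer SATA drive, and the three-pass external sort is an optimistic assumption as well; neither number comes from the article:

```python
# How long does one home PC take just to read 20 TB off disk?
data_tb = 20
bytes_to_scan = data_tb * 10**12
read_rate = 150 * 10**6        # bytes/second; assumed consumer-drive throughput

scan_seconds = bytes_to_scan / read_rate
print(f"Scan only: {scan_seconds / 3600:.0f} hours")   # ~ 37 hours

# A sort of data far larger than memory needs an external merge sort,
# which reads and rewrites the data several times; assume three full passes.
passes = 3
sort_seconds = passes * scan_seconds
print(f"Sort, 3 passes: {sort_seconds / 3600:.0f} hours")  # ~ 111 hours
```

In other words, "tomorrow, or the next day" is generous: a single spindle spends a day and a half just reading the data once, before any sorting starts.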
Dan Graham is responsible for strategy, go-to-market success, and competitive differentiation for the Active Data Warehouse platform and Extreme Performance Appliances at Teradata.
How $illy is Cost per Terabyte? - May 16, 2014