If your organisation is planning to spend tens of millions of dollars on a big data project in the next 12 months, read on; you may save yourself a few of those dollars.
Open-source analytics is booming at the moment, with so many options coming onto the market that I could spend most of my day looking at new offerings, trying to decipher where each fits into an architecture and why it is better than another approach. By the time I finish writing this article, I'm sure another new product will have been released that I'll have to read about and synthesise.
But the trend I see in the market is that the days of the multi-million-dollar analytics project consuming vast portions of an organisation's IT budget are effectively gone (just think of how much IBM Watson cost to build and train). Instead, far more projects are utilising open-source platforms. The obvious case in point is Hadoop.
However, putting Hadoop aside for a moment, there is a vast array of open-source or semi-open-source technologies, from R, Lumify and Ikanow right through to cloud-based analytics. Some may not provide the all-encompassing array of features and functions of a platform like Aster, but they will provide 70-80% of what you actually need.
The next big question is support. Yes, you take a risk in running open source, because you then need to consider the total cost of ownership. That is why tech companies such as Hortonworks, Cloudera and Revolution R (now part of Microsoft) provide enterprise-strength editions of open-source technologies: the openness and community-based approach of open-source software, combined with peace of mind around support and maintenance. Add to this Teradata's offering in the Hadoop space, providing Hortonworks on top of enterprise-class hardware, and you have a very cost-effective big data platform.
Thus we begin to see an analytics framework that mixes technologies, from open source to enterprise class, all working together harmoniously: a fabric of platforms, each running the analytics its hardware and software were designed for. Hadoop is great for large-scale analytics, especially across unstructured data sets, but not so good for predictive analytics, which is where discovery platforms and data warehouses play.
An analytics project can be a mix of open source and enterprise class. It need not be expensive, but it should still deliver on your investment.
Finally, you have your people to consider. I mentioned earlier that open-source technologies are booming, and one example is R. Take a look at the Google Scholar results over the years:
What the above graph shows is that R and Stata are growing, whilst traditional tools such as SAS and SPSS have hit their peak and are declining. More often than not, university-educated graduates entering the data analytics field today have a strong preference for R. That bodes well for the R language over the longer term.
In summary, stop and take a look at what your organisation is spending first, and consider whether a different mix of technologies could achieve the same result. Don't put all your eggs into one technology basket, because if and when your $100M analytics project fails, you'll be the one with egg on your face.
Ben Davis is a Senior Architect for Teradata Australia, based in Canberra. With 18 years of experience in consulting, sales and technical data management roles, he has worked with some of the largest Australian organisations in developing comprehensive data management strategies. He holds a degree in Law and a postgraduate Masters in Business and Technology, and is currently finishing his PhD in Information Technology with a thesis on executing large-scale algorithms within cloud environments.