When most people hear the term “Big Data” they envisage a data centre full of servers, all happily parallel-processing the world’s most important problems (like the data centre in the picture below: analysing particle collisions at the Large Hadron Collider at CERN).
Above: Data Centre at CERN
Well, the whole Big Data thing started with Google and was quickly adopted by similar companies with high-volume requirements (like Facebook and Yahoo), so it's no wonder the image in our minds is of ginormous data volumes being crunched by ginormous pools of computers.
But these days the technology behind Big Data is quickly becoming mainstream. Yes, not all the bugs have been ironed out and it is still quite “clunky” compared with mature technologies, but the adoption of Big Data technology is increasing even for crunching smaller problems. There are several reasons for this:
- Much of the software is developed by companies who use it internally before releasing it as open source, so by then it is highly functional and well tested. See, for example, Presto: originally developed by Facebook for its own use, it is now being developed in the open by Facebook and Teradata
- The cost of the software is close enough to zero (at least in the pilot stage …)
- Running on a large number of parallel computers, these solutions are highly scalable. It is very easy to start small and grow quickly
- Finally, let’s admit it, the hype around Big Data attracts technologists keen to try out these ‘cool’ new software gadgets
I recently worked with a company that needs to make real-time data available both internally and externally. The volumes are not high: thousands of events per day. They could buy an off-the-shelf streaming solution for a lot of money, or develop an end-to-end solution based on Spark, Kafka and Hive. What’s more, they can speed up development and reduce maintenance costs by using Listener, which wraps Kafka, Cassandra, Elasticsearch and Mesos, deploying real-time streams with very little programming and in a very short time frame.
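To put those volumes in perspective, here is a quick back-of-the-envelope calculation (the 10,000 events/day figure is my own assumption standing in for "thousands of events"):

```python
# Assumed daily volume -- the article only says "thousands of events".
events_per_day = 10_000
seconds_per_day = 24 * 60 * 60  # 86,400

# Average arrival rate if events were spread evenly across the day.
rate = events_per_day / seconds_per_day
print(f"{rate:.3f} events per second")  # well under one event per second
```

Even allowing for bursts many times the average, this is a trivial load for a single machine; the appeal of the Big Data stack here is its integration and room to scale, not raw throughput.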
An arguably extreme example of using Big Data on Small Data is doing Social Network Analysis on a group of dolphins in Doubtful Sound, a fjord in New Zealand. The analysis shows that the network is scale-free and reveals other fascinating characteristics of this very small (64 individuals) group.
Above: Dolphins at Doubtful Sound, NZ
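The scale-free finding comes from looking at the network's degree distribution: count each animal's number of social ties, then check whether the counts follow a power law (many weakly connected individuals, a few highly connected "hubs"). A minimal sketch, using a made-up edge list rather than the real dolphin data:

```python
from collections import Counter

# Hypothetical edge list standing in for a small social network
# (each pair is an observed association between two individuals).
edges = [
    ("A", "B"), ("A", "C"), ("A", "D"), ("A", "E"),
    ("B", "C"), ("B", "F"),
    ("C", "G"),
    ("D", "H"),
]

def degree_distribution(edge_list):
    """Map each degree k to the number of nodes that have exactly k ties."""
    degree = Counter()
    for u, v in edge_list:
        degree[u] += 1
        degree[v] += 1
    return Counter(degree.values())

dist = degree_distribution(edges)
# In a scale-free network, plotting this distribution on log-log axes
# yields a roughly straight line: P(k) proportional to k ** -gamma.
for k in sorted(dist):
    print(f"degree {k}: {dist[k]} node(s)")
```

On a 64-node network this computation is instantaneous, which is exactly the point: the interest lies in the analysis technique, not the data volume.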
Big Data is, therefore, no longer the domain of the Big Players: the technology is quickly getting acceptance and being adopted by medium and small players.
So, what is the smallest project using Big Data technologies you know of?
For a different perspective on this, see my recent article titled “Is Big Data Getting Bigger”.
Ben Bor is a Senior Solutions Architect at Teradata ANZ, specialising in maximising the value of enterprise data. He gained international experience on projects in Europe, America, Asia and Australia, and has over 30 years’ experience in the IT industry. Prior to joining Teradata, Ben worked for international consultancies for about 15 years, and for international banks before that. Connect with Ben Bor via LinkedIn.