Recently, I developed a handful of demos using open source technologies for detecting and alerting on fraudulent events, incidents of poor customer experience, and the arrival of target subjects in geo-fenced locations for marketing purposes. These use cases required detecting individual events from streaming data sources and processing a complex set of rules to identify events of interest, creating alerts that enable data-driven insights and actions.
I selected technologies from the Apache Big Data ecosystem, namely Kafka, Storm, HDFS and HBase, as they were the best fit for these use cases and had been deployed in large-scale operation by reputable multinational organisations. In addition, I found a vast array of pre-integrated libraries, source code examples and “lessons learned” freely available on the Internet, which was instrumental in improving my productivity.
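To make the rule-processing idea concrete, here is a minimal sketch of the kind of logic the demos applied to each event. This is a hypothetical illustration, not the actual demo code: the event fields, rule thresholds and zone names are assumptions, and the in-memory list stands in for the Kafka stream and Storm topology that did the real transport and processing.

```python
from dataclasses import dataclass

@dataclass
class Event:
    """One event pulled off the stream (fields are illustrative)."""
    user: str
    kind: str          # e.g. "purchase", "geo_enter", "support_call"
    amount: float = 0.0
    location: str = ""

def detect_alerts(events, fraud_threshold=1000.0, fenced_zones=frozenset({"mall_a"})):
    """Apply simple detection rules to a stream of events and return alerts."""
    alerts = []
    for e in events:
        # Rule 1: flag unusually large purchases as potential fraud
        if e.kind == "purchase" and e.amount > fraud_threshold:
            alerts.append(("FRAUD", e.user))
        # Rule 2: flag arrival of a target subject in a geo-fenced marketing zone
        if e.kind == "geo_enter" and e.location in fenced_zones:
            alerts.append(("GEOFENCE", e.user))
    return alerts

stream = [
    Event("alice", "purchase", amount=2500.0),
    Event("bob", "geo_enter", location="mall_a"),
    Event("carol", "purchase", amount=10.0),
]
print(detect_alerts(stream))  # [('FRAUD', 'alice'), ('GEOFENCE', 'bob')]
```

In the actual demos, a Storm bolt would evaluate rules like these against tuples arriving from a Kafka spout, with alerts written out to HBase; the stateless rule functions themselves look much the same either way.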
When the rubber hit the road…
As I showed the demos to my colleagues and customers, my decision to use these Big Data tools was put to the test. Common questions raised during the demos were: “Why not use Apache Spark Streaming instead of Apache Storm?”; “Which open source technologies would you recommend betting on in our three-year Big Data roadmap?”; “Why did you not use Apache Flink, which integrates complex event processing, stream processing and machine learning?”
When all else fails – ask the experts
As Open Source technologies dominate the C-level agenda, questions such as these are not limited to streaming and CEP technologies; they are becoming a common prerequisite for developing Big Data architecture in the enterprise. Unfortunately, for every such question there are myriad opinions and no definitive answer, and with new Open Source projects unfolding every week the task only gets harder. So I sought advice from the experts on Big Data and Open Source, Think Big Analytics.
Below are my key takeaways from their advice.
There is no free lunch – the ‘free puppy’ still needs to be fed though!
Organisations fall into the trap of thinking that selecting and implementing open source is trivial or ‘free’. But open source is free like a ‘free puppy’ – it comes with all kinds of hidden costs that keep popping up. Every technology carries the cost of acquiring the skillsets to use it, develop for it, and maintain and operate it. Open source is no different.
Tea leaves and the taste bud
Experimentation to compare open source technologies can open up opportunities, but no organisation has nearly enough time or resources to try everything. While you may be able to install a large number of tools with a few mouse clicks, determining their strengths and limitations can take weeks, even months. Have you acquired a taste for that tea yet?
After all you only need a wrench or two to fix the sink
There are more than a dozen query engines for SQL on Hadoop. You do not need that many, and selecting the right one is crucial. The ideal approach is to adopt a relatively small number of technologies, whether for a SQL engine or other big data needs, and optimise how the organisation uses them to gain a return on investment. The shiny new wrench may look appealing, but the person who knew how to use it just left!
Stay the night, or build your own blueprint for long-term living
Renting a room may be a good idea for a few nights of temporary stay here and there. But the idea of living in your dream home is quite different: it generally starts with a blueprint and a well-defined plan and path to get there. The same is true for building a big data architecture that differentiates your organisation from its competitors. After all, you want your dream home to be different from Mr & Mrs Jones’s next door.
The problem with grabbing an open source tool in response to each immediate need is that it is easy to end up with a multitude of tools that do not work well together. Instead of a “tool mentality,” organisations should take the approach of building a blueprint for big data. Your business objectives and requirements are critical to selecting technologies that meet your needs.
Anyway, these are just a few lessons I picked up as I ventured into the world of Open Source. If you are interested in a detailed understanding of the approach to selecting the right tools and technologies, please download this paper.