It may be a stretch to call data science commonplace, but the question “what’s next?” is often asked about analytics. And the conversation frequently turns straight to artificial intelligence and deep learning. Instead, a tough-love review of the current reality may be in order.
The simple truth is that, as currently configured, data-centric companies will struggle to cross the divide between what passes today for effective data science and a mode of operating in which analytics is woven into the fabric of business operations and improves continuously. Today, data science is all too often a process in which new insights and models are developed as one-time efforts or deployed to production on an ad hoc basis, then require regular babysitting for monitoring and updating.
This is not to imply that companies are not on the right path with their data science initiatives, but merely an acknowledgment that the steps they have taken thus far have brought them to the edge of a chasm they will have to cross. To the credit of more progressive organizations, creating an industrial-caliber data lake to store large volumes of data in varying forms is an essential, foundational step. Building on that, developing systems of data democratization that provide ready access to data for those seeking insight is critical. There’s no doubt that companies that have achieved those two steps already reap benefits.
Nevertheless, that’s as far as most have come and, more significantly for the future, that’s all they have prepared themselves to accomplish. Today many companies have the data and the data scientists to do analysis and build models that can be carefully engineered to plug into some usable business application. But every deployment of a model is a custom, one-off job, and ensuring model quality is a fragile, manual effort. Change the model and the whole thing needs to be rebuilt. Useful analyses are often performed once but can’t be reproduced, or, even worse, get recreated periodically but inconsistently. And if a new version of a model doesn’t work well, restoring the previous version can be a painful struggle, let alone systematically testing models to improve them continuously.
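To make that contrast concrete: even a minimal versioned model store removes much of this fragility. If every published model version is retained and “current” is just a pointer, rollback becomes a one-line operation rather than a rebuild. The sketch below is purely illustrative; the `ModelRegistry` class, its file layout, and its pickle-based storage are assumptions for the example, not a description of any particular product.

```python
import pickle
from pathlib import Path


class ModelRegistry:
    """Illustrative registry: every deployed model version is kept on disk,
    so promoting or rolling back a model is a pointer change, not a rebuild.
    (Hypothetical sketch; real systems would add metadata, auth, etc.)"""

    def __init__(self, root="models"):
        self.root = Path(root)
        self.root.mkdir(exist_ok=True)

    def publish(self, name, version, model):
        # Persist this version permanently; never overwritten by later versions.
        path = self.root / f"{name}-v{version}.pkl"
        path.write_bytes(pickle.dumps(model))
        return path

    def load(self, name, version):
        return pickle.loads((self.root / f"{name}-v{version}.pkl").read_bytes())

    def promote(self, name, version):
        # "Current" is just a pointer file; rollback = promote an older version.
        (self.root / f"{name}-CURRENT").write_text(str(version))

    def current(self, name):
        version = (self.root / f"{name}-CURRENT").read_text()
        return self.load(name, version)
```

With this in place, shipping version 2 is `registry.promote("churn", 2)`, and if it misbehaves, restoring version 1 is simply `registry.promote("churn", 1)`.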
It’s not enough to know how to wrestle with raw data. Companies need an infrastructure capable of continuously testing and improving models, starting with governed, well-understood analytic data sets as input. This is an environment in which normalized data lets one do any kind of data science at any time.
It’s been done before. Something similar took place in applications development and IT with the notion of DevOps, where the disparate realms of software engineers and IT operations staff now collaborate on a single process of software creation and deployment.
That same type of dexterity will be essential not only for future data-driven opportunities like AI to become business reality; it is critical to realizing ROI in today’s data environment right now. A company’s data science team may excel at finding the right signal in the data and applying those findings to a process, but it is not equipped to maintain that data product once it is released into the wild. IT engineers, meanwhile, expect something more refined and ready to deploy. Between the two is a gap.
What’s missing is the internalization of a new business discipline, Analytics Ops, that turns analytics from a lab-coated, sequestered science experiment into a consistent methodology: one that integrates data science teams and engineering teams within a framework for building analytics models into something that is readily and continually digestible at an operations level.
Analytics Ops is the difference between focusing on resource-intensive one-off victories and having a constant, adaptable source of nourishment. To get there, companies will require cross-functional teams with the right software and discipline to enable data scientists, engineers, product managers, and domain experts to all work together to create a continuous cycle that drives value to the business.
This next step starts by balancing spending and organization development so that there is some level of investment in Analytics Ops to bridge the divide from data science to IT engineering. Without this forward-thinking approach, companies are going to end up with really interesting analytics projects that work for a time, but eventually wither, become less relevant, and cannot evolve. Most frustrating of all, companies will not get the ultimate return in terms of implementation and deployment that they expect from their analytics investment.
The next step forward in analytics is not going to be driven by data scientists alone. It requires an investment in skills, practices, and supporting technology to move analytics out of the lab and into the business. Analytics Ops will involve a conscious decision to continuously integrate, test, deploy, monitor, and adapt analytics within an uninterrupted, ongoing cycle of improvement. Analytics, no matter how sophisticated, needs to be seen not as a project with an end, but as an integral part of the framework of the entire operation.
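The “continuously integrate, test, deploy” discipline described above can be reduced to one recurring check: a candidate model ships only if it does not regress against the model already in production. The `evaluate` and `ci_cycle` functions below are a hypothetical sketch of that quality gate; the accuracy metric and the tolerance threshold are assumptions chosen for illustration, not a prescribed implementation.

```python
def evaluate(model, holdout):
    """Hypothetical metric: accuracy of a model (a callable) on a
    labelled holdout set of (input, expected_label) pairs."""
    correct = sum(1 for x, y in holdout if model(x) == y)
    return correct / len(holdout)


def ci_cycle(candidate, production, holdout, tolerance=0.01):
    """One pass of the integrate-test-deploy loop: score both models on
    the same holdout data and keep the candidate only if it does not
    regress (beyond a small tolerance) against production."""
    cand_score = evaluate(candidate, holdout)
    prod_score = evaluate(production, holdout)
    return candidate if cand_score >= prod_score - tolerance else production
```

Run on every retraining, on every data refresh, and on a schedule, this single gate turns model quality from a one-time manual review into the kind of ongoing cycle the discipline demands.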
Ron Bodkin, President, Think Big Analytics
Ron founded Think Big to help companies realize measurable value from Big Data. Previously, Ron was VP Engineering at Quantcast, where he led the data science and engineering teams that pioneered the use of Hadoop and NoSQL for batch and real-time decision making. Prior to that, Ron was Founder of New Aspects, which provided enterprise consulting for aspect-oriented programming. Ron was also Co-Founder and CTO of B2B applications provider C-Bridge, which he led to a staff of 900 service consultants and a successful IPO. Ron graduated with honors from McGill University with a B.S. in Math and Computer Science. Ron also earned his Master’s Degree in Computer Science from MIT, leaving his PhD program after presenting the idea for C-Bridge and placing in the finals of the MIT $50K Entrepreneurship Competition.