Despite a continuing shortage of data science skills, data-driven teams do exist in businesses across many industries. Expectations are high, and the promises of predictive analytics, prescriptive analytics and artificial intelligence (AI) have captured the imagination of many. Now that the role, the skill-sets and the responsibilities of data science are becoming better defined, how do we take the next step and show a true return on investment (ROI) for data science projects? How can we make positive, measurable improvements to revenue, profit and customer satisfaction using data science?
The answer could be in the automation of the production process. To this end, analytics ops is an emerging field within data science. Taking the best practices from software engineering and adapting them to the process of pushing predictive models into production in an automated way, analytics ops can free up valuable data science resources for the work they are best at: discovering new patterns in data, improving existing data models and building newer, more accurate ones.
Building and maintaining hundreds, or even thousands, of predictive analytical models requires a repeatable operationalisation process on an industrial scale. It also requires a reliable architecture and robust pipelines to deploy those models in production systems.
The critical ingredients you need within your organisation are:
- Experimentation – encourage fast prototyping
- Flexibility – carefully choose which toolkits you use
- Iteration – ideally you work within an Agile framework
- Production focus – everyone in the team works towards production
- Process engineering – integrate into or create new production procedures
Analytics ops – where to begin?
Creating the correct process and selecting the best software tools to enable automation are the keys to analytics ops. Many of the following are already standard practice in a software engineering environment, but these standards and protocols also need to be applied to the development and production of the data models that businesses increasingly rely on.
- A collaborative development environment tightly integrated with the data platform and operational environment allows code re-use and efficiency
- Strict version control, used to flag models currently in production and allow fast roll-back to a previous version if problems occur
- Testing automation, including unit tests for program code and accuracy tests for predictive models. Every model should be run through a set of standard tests in an automated way
- Standardised processes for promoting models from development (test) to production, and comparing new models with those already in production
- Trained models should be published to a central repository where they can be easily retrieved and compared
- A central logging and monitoring system to ensure KPIs for performance and accuracy are always met, flagging under-performing models for investigation
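To make a few of these practices concrete, here is a minimal sketch in Python of how versioning, an automated accuracy test, promotion, roll-back and monitoring might fit together. Everything in it is an assumption made for illustration: the `ModelRegistry` class, its method names, the `min_accuracy` KPI threshold and the toy threshold "models" are hypothetical and do not come from any particular product. A real deployment would use a persistent repository and a proper monitoring system rather than an in-memory object.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Model = Callable[[float], int]            # hypothetical: maps one feature to a class label
Labelled = Sequence[Tuple[float, int]]    # (feature, expected label) pairs


def accuracy(model: Model, data: Labelled) -> float:
    """Standard accuracy test: fraction of examples the model labels correctly."""
    if not data:
        raise ValueError("accuracy needs at least one labelled example")
    return sum(1 for x, y in data if model(x) == y) / len(data)


@dataclass
class ModelRegistry:
    """Hypothetical in-memory stand-in for a central model repository."""
    min_accuracy: float = 0.75                          # assumed KPI threshold
    versions: Dict[int, Model] = field(default_factory=dict)
    history: List[int] = field(default_factory=list)    # promoted versions, oldest first

    def publish(self, model: Model) -> int:
        """Store a trained model and return its new version number."""
        version = len(self.versions) + 1
        self.versions[version] = model
        return version

    @property
    def production(self) -> Optional[int]:
        """Version currently flagged as in production, if any."""
        return self.history[-1] if self.history else None

    def promote(self, version: int, holdout: Labelled) -> bool:
        """Promote only if the candidate meets the KPI and beats production."""
        score = accuracy(self.versions[version], holdout)
        if score < self.min_accuracy:
            return False
        current = self.production
        if current is not None and score <= accuracy(self.versions[current], holdout):
            return False
        self.history.append(version)
        return True

    def rollback(self) -> Optional[int]:
        """Return to the previously promoted version, if there is one."""
        if len(self.history) > 1:
            self.history.pop()
        return self.production

    def underperforming(self, recent: Labelled) -> bool:
        """Monitoring check: is production below the KPI on recent labelled data?"""
        current = self.production
        return current is not None and accuracy(self.versions[current], recent) < self.min_accuracy


# Toy usage with made-up data and threshold "models"
holdout = [(0.1, 0), (0.4, 0), (0.55, 1), (0.6, 1), (0.9, 1)]
registry = ModelRegistry()
v1 = registry.publish(lambda x: int(x > 0.58))   # misclassifies the 0.55 example
v2 = registry.publish(lambda x: int(x > 0.5))    # classifies every hold-out example correctly
v3 = registry.publish(lambda x: 1)               # always predicts class 1

promoted = [registry.promote(v, holdout) for v in (v1, v3, v2)]
drifted = [(0.3, 1), (0.4, 1), (0.45, 1), (0.2, 0)]
needs_investigation = registry.underperforming(drifted)
```

In this sketch the always-positive model is rejected by the accuracy gate, the better-scoring model replaces the first one in production, and `rollback()` would restore the earlier version. The design choice worth noting is that promotion is decided by the same automated test for every model, so no model reaches production on the strength of a manual check.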
Is analytics ops the future?
Businesses can achieve striking productivity gains by taking an analytics ops approach to data science work. You can say goodbye to the inefficient days of lone developers creating scripts on their laptops and maintaining production code manually, and replace them with a method based on continuous integration and continuous delivery.
Integration and automation can bring incredible benefits to your business. Think Big Analytics has helped elevate many customers on their analytics journey, and can help with the process engineering and tool selection to make analytics ops a reality. Get in touch to see what we can do for you.
Christopher Hillman is a Principal Data Scientist in the International Advanced Analytics team at Teradata, based in London. He has over 20 years' experience working with analytics across many industries, including Retail, Finance, Telecoms and Manufacturing. Chris is involved in the pre-sale and start-up activities of Analytics projects, helping customers to understand and gain value from Advanced Analytics and Machine Learning. He has spoken on Data Science and analytics at Teradata events such as Universe and Partners, and at industry events such as Strata, Hadoop World, Flink Forward and IEEE Big Data conferences. Chris is also currently studying part-time for a PhD in Data Science at the University of Dundee, applying Big Data analytics to the data produced from experimentation into the Human Proteome.