The Role of Domain Experts in Data Science

Monday March 23rd, 2015

During my 30-year analytics career, prospective employers and clients have often asked me: ‘How can you help us with data-driven insights when you have not worked in this industry before?’

Clearly, the description of the data scientist as a mythical unicorn with computer science skills, statistical knowledge and domain expertise (Figure 1) has had an impact. The proliferation of analytics disciplines such as social network analysis, digital analytics, bio-informatics and supply chain analytics lends weight to the argument that domain expertise definitely matters.

Figure 1

Source: Drew Conway’s Data Science Venn Diagram

There are also anecdotes on the web of data science projects that went pear-shaped because the analysts were not subject matter experts. A deeper look at these anecdotes reveals that the issues were due not to a lack of domain expertise but to poor data science, such as over-fitting, bad sampling methods and unnecessary data cleansing. Still, the myth that domain expertise trumps all else persists!

Data mining competitions such as Kaggle and the KDD Cup have demonstrated the opposite, showing that data science can be successfully outsourced to people without domain expertise. Many companies have run competitions on topics as diverse as optimizing flight routes, predicting ocean health and detecting diabetic retinopathy. Data scientists with little or no expertise in the domain have responded brilliantly with useful solutions. Adam Kowalczyk and I won the KDD Cup on yeast gene regulation prediction with no background in biology. Some data scientists, such as David Vogel and Claudia Perlich, have even won across multiple domains, indicating that data science skills are transferable.

The counter-argument to Kaggle’s success is that in these competitions the domain experts have already generated the hypothesis by posing the right business question and preparing the data (Figure 2), so the competitors need only model and test. But in the brave new world of massive data, with the mathematical tools and computing power to crunch the numbers, the old-world paradigm of hypothesizing before modeling is likely to be challenged. With its approach to language learning, Google has shown a whole new way of understanding the world without any a priori models or theories.

Figure 2

Source: Dr. Bhavani Raskutti, Data Mining Lead, Pacific Brands, “Data Mining in Industry: Putting Theory into Practice”, guest lecture, Royal Melbourne Institute of Technology, 2011.

So, if domain expertise is not necessary for the steps of posing the business question and analytical problem definition, what about data acquisition and data preparation?

In my experience, domain knowledge about data capture and transformation processes at the sensors can be acquired through exploration of the raw data. Often, good data scientists become subject experts just by playing with the data and asking domain experts about the anomalies they find. For instance, using just such a process, my analytics team in a manufacturing company identified a long-standing but previously undiscovered anomaly in the summarised sales and inventory feed from a large retailer. This anomaly materially affected the retail inventory reporting and had to be fixed programmatically. Subsequently, my data science team members were the acknowledged retail supply chain experts!
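As a purely illustrative sketch of this kind of raw-data exploration (not the method actually used on the retail feed described above), the snippet below flags values that sit far from the median of a series, so that they can be taken to a domain expert for explanation. The function name `flag_anomalies`, the sample figures and the 3.5 cut-off are all assumptions made for the example; the technique is the commonly used modified z-score based on the median absolute deviation.

```python
from statistics import median


def flag_anomalies(values, threshold=3.5):
    """Return the indices of values whose modified z-score exceeds threshold.

    The modified z-score uses the median and the median absolute deviation
    (MAD), so a single extreme value does not distort the baseline.
    """
    med = median(values)
    abs_dev = [abs(v - med) for v in values]
    mad = median(abs_dev)
    if mad == 0:
        # No spread around the median: nothing can be ranked as unusual.
        return []
    return [i for i, d in enumerate(abs_dev) if 0.6745 * d / mad > threshold]


# Hypothetical weekly unit counts from a summarised inventory feed.
weekly_units = [120, 118, 125, 122, 119, 480, 121, 117]

suspect_weeks = flag_anomalies(weekly_units)
print(suspect_weeks)  # [5]
```

The output is only a list of questions to ask: whether week 5 is a data capture fault, a promotion or a genuine stock build-up is something the domain experts would need to confirm.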

Domain expertise is most relevant, perhaps, in the interpretation of insights, particularly insights about the workings of complex physical processes gained through unsupervised learning. An example of just such a situation was the use of the Aster discovery platform to perform root cause analysis of failures in a multi-aircraft fleet from aircraft sensor and maintenance data. While the analysis started with no a priori model, the a posteriori interpretation of the results from the path analysis, and the subsequent follow-up to improve aircraft safety, certainly required domain expertise.

Returning to the original question, ‘How can you help us with data-driven insights when you have not worked in this industry before?’, my response is as follows.

  1. Machine learning (the intersection of computer science and statistics in Figure 1) brings a fresh perspective that leads to new insights, and having no prior domain knowledge can be an advantage, especially in overcoming long-standing domain bias.
  2. Provided the machine learners have the curiosity and willingness to learn about the company and the domain, along with the humility to ask the domain experts about the subject, they will not only come to understand the domain but, through their questioning, will also cross-pollinate the subject matter experts, so that the team as a whole is stronger.

So, when hiring a data scientist, focus on the machine learning aspect, particularly the desire to play with the data using a number of different techniques and languages. Consider also the analytical skills needed to question and solve problems iteratively. Partner the data scientists with domain experts so that cross-pollination can occur. This, to me, is a better pathway for bringing data science to a business than searching for the elusive unicorn depicted in Figure 1.

Bhavani Raskutti is the Domain Lead for Advanced Analytics at Teradata ANZ. She is responsible for identifying and developing analytics opportunities using Teradata Aster and Teradata’s analytics partner solutions. She is internationally recognised as a data mining thought leader and is regularly invited to present at international conferences on Mining Big Data. She is passionate about transforming businesses to make better decisions using their data capital.

Bhavani joined the ANZ Teradata Advanced Analytics team in 2014. She has over 20 years’ experience in advanced analytics research and development, as well as application deployment in diverse industries such as telecommunications, banking, retail and bio-informatics. Her work on innovative data analysis techniques within the Telstra Research Laboratories resulted in four text mining patents and 40+ peer-reviewed international publications. Her accolades include winning the 2002 Knowledge Discovery & Data Mining (KDD) Cup, the premier international data mining competition pre-Kaggle.

One thought on “The Role of Domain Experts in Data Science”

  1. Mark O'Reilly

    I agree entirely: there is a fundamental issue with the unconscious bias that comes with being too close to the business and thinking that “we know all there is to know about this, so why are you here?”

    That’s the most obvious sign of short-sighted management I have ever seen, and it is one of the tell-tale signs of a business that desperately needs a strong leader to rescue it from itself and its lack of understanding.

    Bravo Bhavani – keep it up.
