If you’ve heard of data science (if you haven’t, where have you been and how did you find this blog?), you’ve probably heard of “fail fast”. The fail fast mentality is based on the notion that if an activity isn’t going to work, you should find out as quickly as possible, and stop doing it.
As the size, complexity and number of new data sources continues to increase, there is a corresponding increase in the value of discovery analytics. Discovery analytics is the method by which we uncover patterns in data and develop new use cases that lead to business value.
It is easy to see how discovery activities lead to a fail fast method. However, how can we learn from these failures, and how can we proceed without experiencing the same failures time and again?
Good failure, bad failure
There are two different types of failure possible in a data science project: good failures and bad failures. Good failures are a necessary part of the discovery process, and an important step in finding value in data. On the other hand, bad failures occur when they could have been avoided, and are basically of waste of everybody’s time. Examples of the cause of bad failures include:
- Poor specification – this is not specific to data science and applies to any project that isn’t specified properly in terms of expected results and appropriate timelines.
- Inappropriate projects for a data science methodology – it has become increasingly common to call all analytics data science. If a project can be solved using a standard data warehouse and business intelligence method, then you should probably just do that.
- Poor expectation management – many data science projects suffer from this. It is important to ensure stakeholders are aware what can and cannot be expected from the results.
- Data supply – a vital first step in any analytics project is to ensure that the necessary data feeds are available and accessible.
Let’s talk about publication bias. This phenomenon occurs in the publication of scientific papers, where it is usual to only publish studies that produce positive results. What is far less common is to publish a paper that highlights the amount of work you did in order to fail to produce anything of any worth! The problem is that this leads to teams making the same mistakes, or proceeding down the same creative cul-de-sacs as so many before them. Because of publication bias, we do not learn from each other’s mistakes.
Exactly that situation can occur in a data science team. Unless a true collaborative environment exists for discovery and predictive model development, the same failures will be made over and over again by different members of the team.
Move out of the cul-de-sac
In order to benefit from the fail fast approach, data science teams need to adopt a best practice method of sharing results, methodologies and discovery work – especially when their work is considered a failure. This can be done in many ways, but some of the more effective include regular discussion – similar to agile methodology’s stand-up meetings – and using appropriate software to aid the process.
Software tools exist to facilitate collaboration, issue tracking, continuous documentation, source control and versioning of programme code, as well as task tracking. These tools create a lineage of activities that is permanent and searchable.
If you want to hear more on this subject, why not come to see my presentation ‘My data scientists are failures’ at the Teradata PARTNERS conference in Anaheim this October.
Christopher Hillman is a Principal Data Scientist in the International Advanced Analytics team at Teradata based in London. He has over 20 years experience working with analytics across many industries including Retail, Finance, Telecoms and Manufacturing. Chris is involved in the pre-sale and start-up activities of Analytics projects helping customers to gain value from and understand Advanced Analytics and Machine Learning. He has spoken on Data Science and analytics at Teradata events such as Universe and Partners and also industry events such as Strata, Hadoop World, Flink Forward and IEEE Big data conferences. Currently Chris is also studying part-time for a PhD in Data Science at the University of Dundee applying Big Data analytics to the data produced from experimentation into the Human Proteome.