The Art of Data Science

Wednesday August 6th, 2014

I recently had the pleasure of reading “Surely You’re Joking, Mr. Feynman!” aloud to my two children. It’s a very funny book about Richard Feynman, the legendary Nobel Prize-winning physicist who, as well as being a brilliant scientist, was also an amateur safe cracker, bongo drummer and artist.

Feynman had a famous problem solving technique:

  1. Write down the problem.
  2. Think very hard.
  3. Write down the solution.

If you’re good at thinking, the Feynman technique works well for theoretical problems, and it’s also a very good technique to use when working on data science problems in business. However, if the answer you’re looking for is hidden within terabytes of data, it’s a good idea to complement the methodology with some relevant data mining tools and techniques.

Feynman was famous for his creativity and original thinking. There is plenty of scope for this in data science as well. Ways in which creativity can be expressed in data science include:

  • applying techniques in unconventional settings.
  • combining techniques in unconventional ways.
  • developing new and unusual hypotheses.

For example, market basket analysis (a close relative of collaborative filtering) is traditionally used in the retail vertical to analyse shopping behaviour. It gives information such as “people who bought x … also bought y” and allows retailers to make recommendations such as “products you might like to add to your basket”.

But we can apply this technique in a plethora of other settings. If we are working with a government department which is looking to improve self-service rates on its website, we can treat webpages visited together in a single user session as items in a basket and then make recommendations of the “forms you may be looking for” type. If we are analysing text data, we can treat key phrases which occur in the same document as items in a basket and get insights along the lines of “people who mention x … also mention y”.
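To make the web session idea concrete, here is a minimal sketch in Python. It is an illustration only, not how any particular product implements market basket analysis: the page names, the `also_visited` helper and the `min_support` threshold are all assumptions made for the example. Each session is treated as a basket, and other pages are ranked by the share of sessions containing a given page that also contain them.

```python
from collections import Counter
from itertools import combinations


def pair_counts(baskets):
    """Count how many baskets contain each item and each unordered pair."""
    item_count = Counter()
    pair_count = Counter()
    for basket in baskets:
        items = sorted(set(basket))  # ignore repeat visits within a session
        item_count.update(items)
        pair_count.update(combinations(items, 2))
    return item_count, pair_count


def also_visited(baskets, item, min_support=2):
    """Rank other items by confidence, i.e. the estimated P(other | item)."""
    item_count, pair_count = pair_counts(baskets)
    results = []
    for (a, b), n in pair_count.items():
        if n < min_support or item not in (a, b):
            continue
        other = b if a == item else a
        results.append((other, n / item_count[item]))
    # Highest confidence first; ties broken alphabetically.
    return sorted(results, key=lambda r: (-r[1], r[0]))


# Hypothetical sessions: each list holds the pages seen in one visit.
sessions = [
    ["home", "passport-renewal", "form-a"],
    ["home", "passport-renewal", "form-a", "fees"],
    ["home", "fees"],
    ["passport-renewal", "form-a"],
]
recommendations = also_visited(sessions, "passport-renewal")
```

The same function works unchanged on text data if each basket is the set of key phrases found in one document.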

This last example feeds into the second area in which there is scope for creativity in data science: combining techniques in interesting and novel ways. With so many techniques at our disposal (text analytics, social network analytics, path analyses, clustering and predictive modelling), the ways in which these can be combined to solve difficult problems are almost endless.

A typical workflow might involve: parsing weblogs > sessionising > path analysis > calculation of behavioural variables based on path analysis > including new behavioural variables in new or existing propensity models.
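The middle steps of that workflow can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: the weblogs are taken to be already parsed into (user, timestamp, page) tuples, a session is assumed to end after 30 minutes of inactivity, and the function names and the three behavioural variables are hypothetical choices for the example.

```python
from datetime import datetime, timedelta
from itertools import groupby
from operator import itemgetter

SESSION_GAP = timedelta(minutes=30)  # assumed inactivity timeout


def sessionise(events, gap=SESSION_GAP):
    """Split (user, timestamp, page) events into per-user session paths."""
    sessions = []
    ordered = sorted(events, key=itemgetter(0, 1))  # by user, then time
    for user, user_events in groupby(ordered, key=itemgetter(0)):
        path, last_seen = [], None
        for _, ts, page in user_events:
            # A long pause closes the current session and starts a new one.
            if last_seen is not None and ts - last_seen > gap:
                sessions.append((user, path))
                path = []
            path.append(page)
            last_seen = ts
        sessions.append((user, path))
    return sessions


def behavioural_variables(sessions, goal_page):
    """Derive simple per-user features from the session paths."""
    features = {}
    for user, path in sessions:
        f = features.setdefault(
            user, {"sessions": 0, "pages": 0, "reached_goal": 0}
        )
        f["sessions"] += 1
        f["pages"] += len(path)
        f["reached_goal"] += int(goal_page in path)
    return features


# Hypothetical parsed weblog records.
events = [
    ("u1", datetime(2014, 8, 6, 9, 0), "home"),
    ("u1", datetime(2014, 8, 6, 9, 5), "search"),
    ("u1", datetime(2014, 8, 6, 11, 0), "home"),
    ("u1", datetime(2014, 8, 6, 11, 2), "form-a"),
    ("u2", datetime(2014, 8, 6, 10, 0), "home"),
]
sessions = sessionise(events)
features = behavioural_variables(sessions, goal_page="form-a")
```

The resulting per-user features are the kind of behavioural variables that could then be added as inputs to a new or existing propensity model.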

Having creative ideas is one thing, but having a platform capable of allowing the expression of that creativity is also important. This is where a platform like Teradata Aster comes to the fore. Aster 6 runs standard SQL, MapReduce and bulk synchronous parallel processing (for social network analyses) in-database to enable advanced analytics at scale. The provisioning of these analytical functions via an easy-to-use SQL interface makes it the perfect canvas on which to express your data science creativity.

Ross Farrelly is the Chief Data Scientist for Teradata ANZ. Ross is responsible for data mining, analytics and advanced modelling projects using the Teradata Aster platform. Previously Ross ran Datamilk, an independent bespoke consultancy specialising in data mining and advanced predictive analytics. Ross is a Six Sigma Black Belt and has had many years of experience in a variety of statistical roles, including Business Development Management at Minitab and SAS Analyst at New Frontier Publishing. Connect with Ross Farrelly on LinkedIn.
