In the first installment of this blog series, we described how the quest for artificial intelligence (AI) gave us the discipline of machine learning – the study of how to enable an intelligent agent to learn from data to improve its performance. But what has any of that got to do with commercial analytics?
Learning and predicting from data pre-dates the study of AI – and dusting off centuries-old, tried-and-tested mathematical techniques like linear regression and Bayesian statistics turned out to be (much) easier than some of the “hard” problems in AI.
Not only that, but as more and more business processes and systems were computerized during the 70s and 80s – and as commercial databases began to proliferate as a result – applying these methods to the data in those databases also turned out to have very valuable commercial applications, such as forecasting demand for perishable products in grocery retail or identifying potentially fraudulent transactions in retail finance.
“Knowledge discovery in databases” started as an off-shoot of machine learning, with the first Knowledge Discovery and Data Mining workshop taking place at an AI conference in 1989 and helping to coin the term “data mining” in the process – a term that we will come back to a little later in this blog. And so, AI gave rise to the study of machine learning – which led in turn to data mining.
Supervised and unsupervised methods
Machine learning is often concerned with making so-called “supervised predictions”, that is, with learning from a training set of historical data in which objects or outcomes are known and labelled, so that the intelligent agent can differentiate between, say, a cat and a mat, or so that it can learn to identify the signals in petabytes of sensor data that characterize the imminent failure of a train, a jet engine or a paper mill. The objective, in both cases, is to produce a model that can predict a target variable – whether an object is a cat or a mat, or whether a train will fail or not within the next 36 hours – from input data – images harvested from the Internet, or the readings from the temperature, pressure and vibration sensors on the train.
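As a rough illustration of a supervised prediction, the sketch below learns from a small labelled training set and then predicts the target variable for a new reading. The sensor values, the labels and the choice of a nearest-centroid classifier are all assumptions made for the example; real predictive-maintenance models would use far more data and usually a more capable method.

```python
from statistics import mean

# Hypothetical labelled training set: (temperature, vibration) -> label.
# All values are invented for illustration only.
training_set = [
    ((70.0, 2.1), "ok"),
    ((72.0, 2.4), "ok"),
    ((68.0, 1.9), "ok"),
    ((95.0, 6.8), "fail"),
    ((98.0, 7.2), "fail"),
    ((93.0, 6.5), "fail"),
]


def fit_centroids(examples):
    """Learn one mean point (centroid) per label from labelled examples."""
    by_label = {}
    for features, label in examples:
        by_label.setdefault(label, []).append(features)
    return {
        label: tuple(mean(column) for column in zip(*rows))
        for label, rows in by_label.items()
    }


def predict(centroids, features):
    """Predict the label whose centroid is closest to the new reading."""
    def squared_distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(features, centroid))

    return min(centroids, key=lambda label: squared_distance(centroids[label]))


centroids = fit_centroids(training_set)
prediction = predict(centroids, (94.0, 6.9))  # a new, unlabelled reading
print(prediction)
```

The labels are what make this supervised: the model can only learn to separate "ok" from "fail" because somebody has already recorded which historical readings preceded a failure.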
By contrast, data mining is often also concerned with the discovery of previously unknown patterns or structures in data. Retailers, for example, have long been interested in finding groups of customers who behave in similar ways and in “clustering” shopping missions, to understand consumer behaviour and how stores are shopped. These are examples of the applications of “unsupervised methods”; we are still feeding the clustering algorithms historical data, but the data aren’t labelled – because we don’t know exactly which outcomes we are looking for. When one of us undertook our first customer behavioural segmentation project using an unsupervised approach, for example, we were not expecting to find a large group of consumers shopping our stores between 5pm and 9pm and whose baskets almost exclusively contained breath mints, flowers and chocolates – nor another, buying almost exclusively frozen products, apparently for immediate consumption. But there they were!
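To make the idea of clustering unlabelled data concrete, here is a minimal sketch using k-means, one common clustering algorithm chosen purely for illustration (the segmentation project described above may well have used something else). The store visits, described by hour of day and number of items in the basket, are invented, and the centroids are seeded with the first k points so that the run is repeatable.

```python
def kmeans(points, k, iterations=10):
    """Plain k-means; seeds the centroids with the first k points."""
    centroids = [points[i] for i in range(k)]
    labels = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = tuple(
                    sum(column) / len(members) for column in zip(*members)
                )
    return labels, centroids


# Hypothetical, unlabelled store visits: (hour of day, items in basket).
visits = [
    (18.0, 3.0),
    (9.0, 40.0),
    (19.0, 2.0),
    (10.0, 35.0),
    (20.0, 4.0),
    (11.0, 45.0),
]

labels, centroids = kmeans(visits, k=2)
print(labels)
print(centroids)
```

Nothing in the input says "evening top-up shop" or "weekly morning shop"; the algorithm only returns groups of similar visits, and it is left to the analyst to decide what each group means and whether it is useful.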
Four things to remember
We don’t want you to get too hung-up on history or terminology, but we do want you to understand four things.
Firstly, you simply can’t “machine learn everything” – not least because supervised methods presuppose that you have a relevant, labelled training data-set to learn from, and because the results of an unsupervised analysis may be hard to interpret, or even irrelevant. But also because in many cases there are better routes to the goal anyway. By and large, the big web properties don’t try to “machine learn” how big to make the “buy it now” button, or which colour it should be – they mostly run multiple, concurrent A/B tests instead. It’s quicker and it’s easier. And the output is not a prediction that may – or may not – prove to be accurate, but a measurement of whether treatment A is more effective than treatment B for a particular customer segment right now, one that can easily be compared with other similar measurements.
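The kind of measurement an A/B test produces can be sketched with a two-proportion z-test, a standard way of checking whether two conversion rates differ by more than chance would explain. The visitor and conversion counts below are invented, and a real experimentation platform would add safeguards, such as corrections for running many tests at once, that are left out here.

```python
from math import erf, sqrt


def two_proportion_z_test(conversions_a, visitors_a, conversions_b, visitors_b):
    """Return (z, two-sided p-value) for the difference in rates, B minus A."""
    rate_a = conversions_a / visitors_a
    rate_b = conversions_b / visitors_b
    # Pooled rate under the assumption that A and B perform identically.
    pooled = (conversions_a + conversions_b) / (visitors_a + visitors_b)
    standard_error = sqrt(pooled * (1 - pooled) * (1 / visitors_a + 1 / visitors_b))
    z = (rate_b - rate_a) / standard_error
    # Standard normal CDF evaluated at |z|, via the error function.
    normal_cdf = 0.5 * (1 + erf(abs(z) / sqrt(2)))
    p_value = 2 * (1 - normal_cdf)
    return z, p_value


# Hypothetical results: button A versus button B, 5,000 visitors each.
z, p_value = two_proportion_z_test(200, 5000, 260, 5000)
print(f"z = {z:.2f}, p = {p_value:.4f}")
```

A small p-value suggests that the difference between the two treatments is unlikely to be noise. No model has been trained and nothing has been predicted; the two buttons have simply been measured against each other.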
Secondly, that “data mining” was once the cool new term – popularised, in part, by vendors and marketing departments who thought that “knowledge discovery in databases” wasn’t catchy enough, and who wanted to distinguish the application of these methods to commercial data captured in databases from dry, dusty and apparently far-off academic concerns about machine learning and AI. Fast-forward three decades, and vendor marketing departments are in many cases attempting to differentiate their offers from existing data mining technologies by applying the label “machine learning” to them, apparently without realising that the term pre-dates “data mining”. The marketing hype has come full circle.
Thirdly, that data mining started as an off-shoot of machine learning – itself a product of the pursuit of AI – and that the fields remain closely linked and continue to share multiple techniques, algorithms, and researchers. So closely linked, in fact, that the two expressions are often used interchangeably – and in many situations are practically synonymous. When a mobile telecommunications company builds a model to predict which customers are likely to churn based on historical data that describes customers who have already recently cancelled their service, we can – and we probably should – call that “machine learning” (because we are using a computer to build a model from labelled historical data), even if we use a mathematical method, like linear regression, that pre-dates Turing, digital computers and the Dartmouth Conference. In practice, you will find plenty of practitioners describing the same activity as “data mining”, “data science” or just plain old “analytics”. You say tom-ay-to, and I say tom-ah-to.
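The churn example can be sketched with nothing more than ordinary least squares on a single input. The customer data, the choice of "support calls last quarter" as the input and the 0.5 cut-off are all assumptions made for illustration. Fitting a straight line to a 0/1 outcome is also a simplification: in practice many teams would reach for logistic regression, which keeps the score between 0 and 1.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one input: returns (intercept, slope)."""
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    covariance = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    variance = sum((x - x_bar) ** 2 for x in xs)
    slope = covariance / variance
    return y_bar - slope * x_bar, slope


# Hypothetical labelled history: support calls last quarter, and whether
# the customer went on to cancel (1) or stayed (0).
support_calls = [0, 1, 1, 2, 4, 5, 6, 7]
churned = [0, 0, 0, 0, 1, 1, 1, 1]

intercept, slope = fit_line(support_calls, churned)


def churn_score(calls):
    """Linear score; values above 0.5 are treated here as 'likely to churn'."""
    return intercept + slope * calls


for calls in (1, 6):
    flag = "likely to churn" if churn_score(calls) > 0.5 else "likely to stay"
    print(calls, round(churn_score(calls), 2), flag)
```

Whether you call this machine learning, data mining or analytics, the steps are the same: labelled historical data goes in, and a model that scores new customers comes out.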
Lastly, whilst you should absolutely embrace some of the newer machine learning techniques and technologies – as we’ll see later in this series of blogs, the deep learning family of methods in particular has already become the de facto solution for a whole range of high-value business problems – you would be unwise to throw out the more established methods and techniques in the process. Because as we’ll also see later, in many cases we may prefer a simple solution that is sufficiently accurate to a more complex one.
Martin Willcox – Senior Director, Go to Market Organisation (Teradata)
Martin is a Senior Director in Teradata’s Go-To Market organisation, charged with articulating to prospective customers, analysts and media organisations Teradata’s strategy and the nature, value and differentiation of Teradata technology and solution offerings.
Martin has 21 years of experience in the IT industry and is listed in dataIQ’s “Big Data 100” as one of the most influential people in UK data-driven business. He has worked for 5 organisations and was formerly the Data Warehouse Manager at Co-operative Retail in the UK and later the Senior Data Architect at Co‑operative Group.
Since joining Teradata, Martin has worked in Solution Architecture, Enterprise Architecture, Demand Generation, Technology Marketing and Management roles. Prior to taking up his current appointment, Martin led Teradata’s International Big Data CoE – a team of Data Scientists, Technology and Architecture Consultants tasked with assisting Teradata customers throughout Europe, the Middle East, Africa and Asia to realise value from their Big Data assets.
Martin is a former Teradata customer who understands the Analytics landscape and marketplace from the twin perspectives of an end-user organisation and a technology vendor. His Strata (UK) 2016 keynote can be found at: https://www.oreilly.com/ideas/the-internet-of-things-its-the-sensor-data-stupid and a selection of his Teradata Voice Forbes blogs can be found online, including this piece on the importance – and the limitations – of visualisation.
Martin holds a BSc (Hons) in Physics and Astronomy from the University of Sheffield and a Postgraduate Certificate in Computing for Commerce and Industry from the Open University. He is married with three children and is a lapsed supporter of Sheffield Wednesday Football Club. In his spare time, Martin enjoys playing with technology, flying gliders, photography and listening to guitar music.
Dr. Frank Säuberlich – Director Data Science & Data Innovation, Teradata GmbH
Dr. Frank Säuberlich leads the Data Science & Data Innovation unit of Teradata Germany. His responsibilities include making the latest market and technology developments available to Teradata customers. Currently, his main focus is on topics such as predictive analytics, machine learning and artificial intelligence.
Following his studies of business mathematics, Frank Säuberlich worked as a research assistant at the Institute for Decision Theory and Corporate Research at the University of Karlsruhe (TH), where he was already dealing with data mining questions.
His career has included positions as a senior technical consultant at SAS Germany and as a regional manager for customer analytics at Urban Science International.
Frank Säuberlich has been with Teradata since 2012. He began as an expert in advanced analytics and data science in the International Data Science team. Later on, he became Director Data Science (International).