6 Ways to Find out if there is any Science in your Analytics

Sunday March 29th, 2015

“All Science Analyses Data but not all Data Analysis is Science”

We are currently blessed with more data than ever before. Yet most conversations about it focus on volume, speed, or technology. Big data is an evolution of data, but there is no such thing as big science. Only Science.

Does Science matter? Yes, absolutely, definitely. New technology allows more and more analysts to process ever larger volumes of data, faster, and without barriers. This is a good thing as more models can be tested, more questions answered, and more phenomena explained than ever before.

It is also a dangerous thing for your business. All data processing leads to results, but only a scientific approach will reliably lead to accurate results. The difference can be great.

1. “Science is a Method, not a collection of facts or technologies”

According to the dictionary, Science is: the intellectual and practical activity encompassing the systematic study of the structure and behaviour of the physical and natural world through observation and experiment.

The definition above says nothing of specific technologies, higher education degrees, or mathematical constructs that one often ascribes to the practice of science and the description of a scientist. In fact, science is a method, a way of life. An artisan (e.g., baker, cabinet maker) can be as much a scientist as a pathologist or particle physicist.

A Scientist is not defined by the technology she/he uses or by the amount of facts or mathematics she/he knows, but by the dedicated practice of the Scientific Method.

Practically unchanged and in use for over 2500 years, the Scientific Method encompasses Observation (formulating a question, background research, and hypothesis formulation) and Experimentation (test, validation), carried out in a Systematic manner (iterative processes) and resulting in a Theory and Explanation of the observed phenomenon.

2. “Data exploration feeds the input to the method, not its output”

Data mining, or exploration, is a starting point; it helps discover questions worth answering and patterns worth testing. What it does not guarantee is a definitive, or even valid, answer. The pitfall of bypassing the scientific method is that, while such an approach can often sound convincing and be well presented, quantity of data is no substitute for systematic experimentation. The time gained by circumventing the Scientific Method will be negated by the cost of realising, (much) later down the line, that the output is wrong.
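To make the pitfall concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not a method taken from this article: the data is pure random noise, and the function names, sample sizes, and seed are all hypothetical choices. Exploration picks the "best" of many candidate features on one half of the data; the other half then plays the role of the experiment that tests the pattern exploration suggested.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def explore_then_test(n_features=200, n_rows=40, seed=0):
    """Explore noise for a pattern, then test that pattern on fresh noise.

    All values are illustrative assumptions: the target and every feature
    are independent random draws, so any correlation is spurious.
    """
    rng = random.Random(seed)
    total = 2 * n_rows
    target = [rng.gauss(0, 1) for _ in range(total)]
    features = [[rng.gauss(0, 1) for _ in range(total)]
                for _ in range(n_features)]

    explore = slice(0, n_rows)       # data used to find a pattern
    confirm = slice(n_rows, total)   # data held back to test it

    # Exploration: keep the feature with the strongest correlation.
    best = max(
        range(n_features),
        key=lambda j: abs(pearson(features[j][explore], target[explore])),
    )
    r_explore = pearson(features[best][explore], target[explore])
    # Experiment: does the same feature still correlate on unseen data?
    r_confirm = pearson(features[best][confirm], target[confirm])
    return best, r_explore, r_confirm


if __name__ == "__main__":
    best, r_explore, r_confirm = explore_then_test()
    print(f"feature {best}: exploration r = {r_explore:+.2f}, "
          f"confirmation r = {r_confirm:+.2f}")
```

Because every feature here is noise by construction, the strong correlation found during exploration is an artefact of searching through many candidates. On average it shrinks towards zero on the held-back half, which is why exploration can propose a hypothesis but cannot validate it.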

3. “Domain Expertise helps frame the question, not the answer”

Science is technology- and domain-agnostic. Business domain expertise is necessary to devise a business question worth answering, filter out red herrings, and provide knowledge of the state of the art. The core of the scientific method, on the other hand, does not preoccupy itself with domain expertise. Its conclusions may be impractical or too expensive to implement, but the only recourse is to reframe the hypothesis.

Replacing the scientific method with current domain expertise ensures that outcomes are neither novel nor robust.

4. “Science is predictive”

The output of the scientific method is, and has to be, a prediction. The prediction is what is being tested, under the constraints set by the experiment and the hypothesis. Every time a correlation or an insight is presented, it explicitly or implicitly represents a prediction. Always.

5. “All results, positive and negative, need to be reviewed and explained”

Failure is not a part of the scientific method. A rejected hypothesis is a perfectly valid outcome of the method, and the rejection of the hypothesis adds to the general body of knowledge that is being investigated. Consequently, all outcomes should be reported and explained. Non-reporting of rejected hypotheses can have dire effects, as is regularly observed, post hoc, in drug trials. Censorship (self or otherwise) of results leads to wrongful hypotheses being accepted and will force other people to retest disproved hypotheses again and again, wasting time and effort for all.

6. “Scientific results are temporary”

Circumstances change, behaviours change, technologies change. Why should insights be any different? The iterative nature of the scientific method reflects a standing principle: hypotheses and theories are valid only until they are disproved.

New data, new methods, and new technologies bring greater experimental precision or novel information that can disprove long-standing theories. This is not only unavoidable; it is desirable because it allows our understanding to grow more precise and accurate.

Clément Fredembach is a data scientist with Teradata Australia and New Zealand Advance Analytics group. With a background in Colour Science, Computational Photography and Computer Vision, Clement has designed and built perceptual statistical experiments and models for the past 10 years.


Clement Fredembach

Data Scientist at Teradata
Clement is a data scientist with Teradata Australia and New Zealand Advance Analytics group. With a background in Color Science, Computational Photography and Computer Vision, Clement has designed and built perceptual statistical experiments and models for the past 10 years. Clement strives to combine his psychometric, perceptual and statistical knowledge to deliver insights, and the story behind them, that are understandable and actionable for non-technical audiences. Prior to joining Teradata, Clement collaborated with several Fortune 500 and academic institutions as a researcher, publishing and patenting large portions of his research along the way. Clement holds an MSc in Communication Systems from EPFL (Switzerland) on Image Classification and a PhD from UEA (UK) on Computational Imaging. His interests range from behavioral psychology to graph theory and photography.

One thought on “6 Ways to Find out if there is any Science in your Analytics”

  1. Srinivas

    Hi Clement

    It is a great article and well articulated in the world of data science.

    I have one reservation about “Scientific results are temporary”. Maybe we could say “Some scientific results are temporary”, since the established data in physics and chemistry is proven and will hold good for the foreseeable future.

    Similarly, all data science results hold good for analytics on historical data. The very name “predictive” suggests that analytics has an accuracy issue, since statistics inherently makes allowance for error. Hence data science is prone to a margin of error, as it is based on errors.

    What is your view?
