“All Science Analyses Data but not all Data Analysis is Science”
We are currently blessed with more data than ever before. Yet most conversations about it focus on volume, speed, or technology. Big data is an evolution of data, but there is no such thing as big science. Only Science.
Does Science matter? Yes, absolutely, definitely. New technology allows more and more analysts to process ever larger volumes of data, faster, and without barriers. This is a good thing as more models can be tested, more questions answered, and more phenomena explained than ever before.
It is also a dangerous thing for your business. All data processing leads to results, but only a scientific approach will reliably lead to accurate results. The difference can be great.
1. “Science is a Method, not a collection of facts or technologies”
According to the dictionary, Science is “the intellectual and practical activity encompassing the systematic study of the structure and behaviour of the physical and natural world through observation and experiment”.
The definition above says nothing of specific technologies, higher education degrees, or mathematical constructs that one often ascribes to the practice of science and the description of a scientist. In fact, science is a method, a way of life. An artisan (e.g., baker, cabinet maker) can be as much a scientist as a pathologist or particle physicist.
A Scientist is not defined by the technology she/he uses or by the number of facts or the amount of mathematics she/he knows, but by the dedicated practice of the Scientific Method.
Practically unchanged and in use for over 2500 years, the Scientific Method comprises Observation (formulating a question, background research, and hypothesis formulation) and Experimentation (testing and validation), carried out in a Systematic manner (iterative processes) and resulting in a Theory and Explanation of the observed phenomenon.
2. “Data exploration feeds the input to the method, not its output”
Data mining, or exploration, is a starting point; it helps discover questions worth answering and patterns worth testing. What it does not guarantee is a definitive, or even valid, answer. The pitfall of bypassing the scientific method is that the resulting conclusions can sound convincing and be well presented, yet quantity of data is no substitute for systematic experimentation. Any time gained by circumventing the Scientific Method will be negated by the cost of realising, (much) later down the line, that the output is wrong.
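To make the point concrete, here is a minimal sketch in Python of what can go wrong when exploration is mistaken for an answer. Everything in it is an illustrative assumption rather than a real dataset: the function names (`pearson`, `noise`, `explore_then_test`), the 500 candidate features, and the 30 observations are all hypothetical, and every variable is pure random noise by construction. Exploration picks the feature most correlated with the target; the "experiment" then measures the same relationship on freshly drawn data.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def noise(rng, n):
    """Draw n independent standard-normal values (no signal at all)."""
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


def explore_then_test(n_features=500, n_samples=30, seed=0):
    """Return (correlation found by exploration, correlation on fresh data)."""
    rng = random.Random(seed)

    # Exploration: scan many unrelated features and keep the "best" one.
    target = noise(rng, n_samples)
    features = [noise(rng, n_samples) for _ in range(n_features)]
    correlations = [pearson(feature, target) for feature in features]
    explored_r = max(correlations, key=abs)

    # Experiment: the selected feature is noise, so new observations of it
    # and of the target are simply new independent draws.
    fresh_r = pearson(noise(rng, n_samples), noise(rng, n_samples))
    return explored_r, fresh_r


if __name__ == "__main__":
    explored_r, fresh_r = explore_then_test()
    print(f"correlation found by exploration: {explored_r:+.2f}")
    print(f"same relationship on fresh data:  {fresh_r:+.2f}")
```

Under these assumptions, the correlation surfaced by exploration tends to look substantial simply because it is the largest of many chance values, while the fresh-data check typically lands much closer to zero. The pattern was a question worth testing, not an answer.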
3. “Domain Expertise helps frame the question, not the answer”
Science is technology and domain agnostic. Business domain expertise is necessary to devise a business question worth answering, filter out red herrings, and provide knowledge of the state of the art. The core of the scientific method, on the other hand, does not concern itself with domain expertise. Its conclusions may be impractical or too expensive to implement, but the only recourse is to reframe the hypothesis.
Replacing the scientific method with current domain expertise ensures that outcomes are neither novel nor robust.
4. “Science is predictive”
The output of the scientific method is, and has to be, a prediction. The prediction is what is being tested, under the constraints set by the experiment and the hypothesis. Every time a correlation or an insight is presented, it explicitly or implicitly represents a prediction. Always.
5. “All results, positive and negative, need to be reviewed and explained”
There is no such thing as failure in the scientific method. A rejected hypothesis is a perfectly valid outcome of the method, and the rejection of the hypothesis adds to the general body of knowledge that is being investigated. Consequently, all outcomes should be reported and explained. Non-reporting of rejected hypotheses can have dire effects, as is regularly observed, post hoc, in drug trials. Censorship (self or otherwise) of results leads to wrongful hypotheses being accepted and will force other people to retest disproved hypotheses again and again, wasting time and effort for all.
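The effect of selective reporting can be illustrated with a small simulation. This is only a sketch under stated assumptions: the trials are hypothetical, the true effect is zero by construction, and the names (`simulate_trials`, `mean`), the 2000 trials of 25 subjects, and the one-sided threshold of 1.645 are choices made for the example, not figures from any real study.

```python
import random


def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


def simulate_trials(n_trials=2000, n_subjects=25, z_threshold=1.645, seed=0):
    """Simulate trials of a treatment whose true effect is exactly zero.

    Returns (effects of all trials, effects of the 'positive' trials only).
    """
    rng = random.Random(seed)
    all_effects = []
    reported_effects = []
    for _ in range(n_trials):
        # Each subject's response is noise around zero: the treatment does nothing.
        responses = [rng.gauss(0.0, 1.0) for _ in range(n_subjects)]
        effect = mean(responses)
        # z-score of the observed effect, assuming a known unit variance.
        z = effect * n_subjects ** 0.5
        all_effects.append(effect)
        if z > z_threshold:
            # Only "successful" trials survive self-censorship.
            reported_effects.append(effect)
    return all_effects, reported_effects


if __name__ == "__main__":
    all_effects, reported_effects = simulate_trials()
    print(f"trials run:      {len(all_effects)}")
    print(f"trials reported: {len(reported_effects)}")
    print(f"mean effect, all trials:      {mean(all_effects):+.3f}")
    print(f"mean effect, reported trials: {mean(reported_effects):+.3f}")
```

Taken together, the full set of trials averages out close to the true effect of zero. A reader who sees only the reported subset would conclude that the treatment works, which is exactly the wrongful acceptance described above.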
6. “Scientific results are temporary”
Circumstances change, behaviours change, technologies change. Why should insights be any different? The iterative nature of the scientific method reflects a standing assumption: hypotheses and theories are valid only until they are disproved.
New data, new methods, and new technologies bring greater experimental precision or novel information that can disprove long-standing theories. This is not only unavoidable; it is desirable because it allows our understanding to grow more precise and accurate.
Clément Fredembach is a data scientist with Teradata Australia and New Zealand Advance Analytics group. With a background in Colour Science, Computational Photography and Computer Vision, Clément has designed and built perceptual statistical experiments and models for the past 10 years.