Here’s some data. Now amaze me, data scientist!

Wednesday March 15th, 2017

— Or why discovering insight is often inevitable.

Sure, give me your data, and there is a good chance I can “wow you”. Why am I so confident? Because I am an astronomer by trade, and I believe that the path to discovery in astronomy, and discovery in data science, share some fundamental underlying principles.

Major discoveries in astronomy (and many other branches of science) often occur when a previously unexplored area of observational parameter space is opened up by new instruments, or new ways of analysing data.

What do I mean by “observational parameter space”?

Let me give you an example: when Galileo first turned a telescope to the sky, he was seeing the universe in a way it had never been seen before, and in doing so he made arguably some of the most amazing and important discoveries in the history of humankind.


This is what it means to open new areas of observational parameter space – to move beyond the current limitations of data quality, precision, or type of information that is available. That is, to gain visibility to things that were previously invisible. The power of bringing new data, improved data or new analysis to bear for the purpose of discovery is manifest in the history of astronomy.

There are countless stories, too numerous to list here, of major unexpected discoveries that came about simply through recording new types of data (for example, the discovery of gamma-ray bursts), analysing data in new ways (for example, the discovery of pulsars), or combining different types of data for the first time (for example, the discovery of quasars).

In the world of data science, the situation is no different. When an organisation or government department records new types of data, or enables the combination of different types of data, or significantly improves the accuracy and reliability of existing data, there is a very high probability that new insights will be uncovered. This is simply the result of gaining visibility to things that were previously invisible. Astronomers are so confident in this path to discovery that it often drives the design and construction of new telescopes.

However, data is a necessary but not sufficient condition for insight discovery: there are other key ingredients.

Discovery happens when data meets the prepared mind: there is no magic algorithm that will sift through the data and provide all the useful insights on a plate. Ultimately, there is no substitute for a deep knowledge of the business problems, and deep knowledge of the data.

An excellent example of precisely this point in the annals of astronomy is the Nobel Prize-winning discovery of the cosmic microwave background – the fossil light left over from the Big Bang. The two researchers who first detected this fossil light thought it was just an annoying source of noise that was hindering their research, and they tried their hardest to get rid of it. It was a nearby research group that was able to interpret the “annoying noise in the data” as the smoking gun of the Big Bang, which ultimately led to the Nobel Prize.

The moral of the story: developing intuition, understanding the business, and understanding the data are of the utmost importance. Without them, you may miss that “Nobel Prize-winning” insight, no matter how ground-breaking your data!