By: Lorie Nelson, Senior Product Manager for Teradata’s Travel and Hospitality Data Model
As a child I was sure there must be a book of life that explains what we need to know about each other, the planet and our purpose in the world. I asked every adult I knew or came in contact with if they knew of such a book but no one had an answer.
At first I was disappointed, until the idea came to me: “I will write my own book.” So began my first data collection project, and with it my fascination with data.
I had no idea what I was going to do with the data. But I was sure that if I just kept at it, I would recognize some clue or pattern… the key to understanding life, unlocking the secret object of my desire. Eventually I stopped collecting data on notecards to pursue other interests, but continued to collect data, as we all do, through personal experience.
The Data Living in Our Brains
Our brains are essentially “big data” platforms with their own unique wiring (algorithms, if you will) for data collection, storage and retrieval. An article in Scientific American tells us, “…the brain’s memory storage capacity is something closer to around 2.5 petabytes…” That data is a combination of structured and unstructured data. Two-thirds of our brain is set to process visual data, while the remainder is used to process our thoughts, perceptions, textural input and output, and so on. The amazing and curious thing is how all of this data comes together to make up our individual stories. Our personal stories also contain information gathered from our network of family, friends and affiliated communities. We are much smarter collectively than we are as individuals.
When we have a question or problem, our process is to examine our personal data collections, judge the validity of that data and, hopefully, find an answer. But if we search only our own mind’s collection, the solution is often obscured. Sometimes the solution requires a “refresh” from our smart collective network of friends and colleagues, and we may also reach beyond our personal network to Google a question or post our problem to a group on a site like LinkedIn.
The Value of Going Beyond Our Own Data
This is exactly what innovative companies such as Amazon, Google, Microsoft, eBay and The New York Times, to name a few, have been doing. They are tapping into their own data stores and, with the help of data scientists and data artists, are beginning to understand data in new ways by commingling it with the vast stores of data from sources outside their own collections.
Businesses can discover valuable insight into their product development, delivery and marketing, for example, by combining their internal data with public data to determine sentiment from sources such as Twitter, Facebook, Yelp and Google Alerts.
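To make the idea concrete, here is a minimal sketch in Python of combining internal figures with public comments. Everything in it is an assumption for illustration: the product names, the sales numbers, the sample posts and the tiny word lists are all hypothetical, and a real project would pull posts from the services themselves and use a proper sentiment model rather than counting words.

```python
from collections import defaultdict

# Hypothetical word lists, for illustration only.
POSITIVE = {"love", "great", "excellent", "friendly"}
NEGATIVE = {"late", "rude", "dirty", "terrible"}


def score_post(text):
    """Return +1, -1 or 0 depending on whether positive or negative words dominate."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    balance = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return (balance > 0) - (balance < 0)


def sentiment_by_product(posts):
    """Average the post scores for each product; posts are (product, text) pairs."""
    scores = defaultdict(list)
    for product, text in posts:
        scores[product].append(score_post(text))
    return {product: sum(s) / len(s) for product, s in scores.items()}


def combine(internal_sales, posts):
    """Attach average public sentiment to each product's internal sales figure."""
    sentiment = sentiment_by_product(posts)
    return {
        product: {"units_sold": units, "avg_sentiment": sentiment.get(product)}
        for product, units in internal_sales.items()
    }


# Hypothetical internal data and public posts.
internal_sales = {"hotel_a": 120, "hotel_b": 80, "hotel_c": 45}
public_posts = [
    ("hotel_a", "Great stay, friendly staff!"),
    ("hotel_a", "Room was dirty"),
    ("hotel_b", "Terrible and rude"),
]

for product, row in combine(internal_sales, public_posts).items():
    print(product, row)
```

A product with no public comments keeps its sales figure and gets `None` for sentiment, which is itself a useful signal that nobody is talking about it.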
Earlier this year, I landed on the Harvard Web site and discovered a treasure of knowledge and the object of desire for my inner 7-year-old. It was the Harvard Dataverse!
The Harvard Dataverse Project is an open source web application developed by the Data Science team at Harvard’s Institute for Quantitative Social Science (IQSS) and is dedicated to sharing, archiving, citing, exploring and analyzing research data across all research fields. Coding the Dataverse Network software began in 2006. “The Dataverse repository hosts multiple dataverses.” Datasets in each Dataverse contain descriptive metadata and data files (including documentation and code). They open doors to researchers, writers, publishers and affiliated institutions. Other universities and research institutions around the world have joined forces with Harvard and are creating their own dataverses.
A few years ago, while I was attending a data visualization workshop, my instructor, Jer Thorp, introduced me to another great source of data, The New York Times. Their database offers up over 50 years of articles, searchable by those who apply for access and use the NYT data structure format for submitting queries. Using a programming language called Processing, I transformed my NYT result dataset into a beautiful radial diagram that showed me the concentration of my search phrase over time as it related to certain keywords. What interested me most about this visualization, unlike the typical bar charts and pie charts we have all seen, was that I could see the outliers as well. Sometimes it is the outlier that tells a more interesting story than the larger concentrations of occurrences.
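The counting behind a diagram like that can be sketched in a few lines. This is not the Processing code from the workshop, and it does not call the NYT service; it is a minimal Python sketch that assumes you already have a list of publication years for the articles that matched a search phrase. The sample years and the threshold of two standard deviations are hypothetical choices for illustration.

```python
from collections import Counter
from statistics import mean, pstdev


def counts_by_year(years):
    """Count how many matching articles were published in each year."""
    return dict(sorted(Counter(years).items()))


def find_outliers(counts, threshold=2.0):
    """Return the years whose counts sit more than `threshold` standard
    deviations away from the mean count."""
    values = list(counts.values())
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return {}  # every year has the same count, so nothing stands out
    return {
        year: count
        for year, count in counts.items()
        if abs(count - mu) / sigma > threshold
    }


# Hypothetical publication years of articles matching a search phrase.
sample_years = (
    [1990] * 3 + [1991] * 2 + [1992] * 3 + [1993] * 2 + [1994] * 3 + [1995] * 20
)

counts = counts_by_year(sample_years)
print(counts)
print(find_outliers(counts))
```

In this made-up sample, 1995 is the year that stands apart from the rest, and it is exactly the kind of point worth a closer look.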
Advances in the areas of big data and analytics coupled with existing technologies and access to internal and external data will contribute to exponential growth, true innovation and creativity in solving business and scientific questions. As Hans Rosling said, “let the dataset change your mindset.”
In my next blog, “The Data You Weren’t Looking For,” I will expand upon this topic to cover innovations in data discoveries and visualizations by current data scientists and data artists.