It's funny to me how perspectives have changed over the years. Granted, I haven't been around since the ENIAC, but I have been working with computers since the early 1980s. Back then, "big data" would have been measured in megabytes. Few people could tell you what a gigabyte was, and fewer still knew of terabytes.
However, one company had the vision and named itself after that measure. The same start-up today might be called Zettadata or Yottadata, as those measures are as foreign to people now as a terabyte was to a data analyst in the 80s.
Big data today isn't necessarily about volume; it can also be about velocity. Going back to my early days again, connecting to a BBS through an acoustic coupler on a 300 baud modem was cutting edge; you could read faster than the characters appeared on screen. Today, we have broadband connections in our homes streaming 1080p HD movies.
Considering volume and velocity, Teradata systems are designed for those dimensions of big data, assuming the data is in a format appropriate for relational databases. However, things are evolving, additional dimensions exist, and this is where it gets interesting.
I'd also like to point out a slight misperception in big data being called unstructured. I challenge that: if it were truly unstructured, it would be random. Simply put, all big data has a structure; it is called unstructured or multi-structured because it doesn't readily conform to traditional row-and-column formats.
The velocity of change also needs to be considered when dealing with big data. A common relational format may not always be appropriate. Weblogs, call centre notes, XML, and even sound and video files aren't necessarily easy to load into a relational database, and because these formats change quickly, it becomes difficult to establish standard patterns for extracting relational data from them.
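A minimal sketch makes the point. A weblog line certainly has structure, but turning it into relational columns means committing to a pattern, and that pattern breaks the moment the log format changes. The example below assumes Apache's Common Log Format; the column names are my own.

```python
import re

# Apache Common Log Format: host, identity, user, timestamp, request, status, bytes.
# This pattern is a commitment: add one field to the log format and it stops matching.
CLF = re.compile(
    r'(?P<host>\S+) \S+ (?P<user>\S+) '
    r'\[(?P<timestamp>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) \S+" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-)'
)

def parse_weblog_line(line):
    """Turn one 'unstructured' log line into relational-ready columns."""
    match = CLF.match(line)
    if match is None:
        return None  # format drift: the line still has structure, just not this one
    row = match.groupdict()
    row["status"] = int(row["status"])
    row["bytes"] = 0 if row["bytes"] == "-" else int(row["bytes"])
    return row

line = '192.0.2.10 - alice [10/Oct/2011:13:55:36 +0000] "GET /offers HTTP/1.0" 200 2326'
print(parse_weblog_line(line))
```

Multiply that brittleness across every log, note format, and XML schema in the business, all changing on their own schedules, and the extraction problem becomes clear.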
Sticking with velocity, we have evolved past "human generated" data to the age of "machine generated" data. Thirty years ago, it was common to have data entry clerks whose sole responsibility was to take written documents and type them into a computer system; the rate of input was limited by the number of people and the speed at which they could type. Today, machines with sensors and RFID location tracking can generate millions of data points per minute, and a modern aircraft can generate over a terabyte (roughly 1.1 trillion bytes) of data on a single flight.
As we move into the next age of big data, we see new techniques evolving to handle these new dimensions. Technologies such as Aster Data and Hadoop have emerged to work with these data types and find patterns or relationships within them. And this is where I begin to get excited.
"The true business value of big data is all about relating it to the current opportunities."
Consider this example: searching free-form call centre notes for customer sentiment and early signs of attrition yields valuable insight for a bank. Combining that with current marketing and retention offers, and with scoring from the existing relational data in the enterprise data warehouse (EDW), can drive a more successful relationship and lower churn.
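To make the shape of that concrete, here is a toy sketch of the idea. The table and field names are hypothetical, and the keyword list stands in for real text analytics (the kind of job you would hand to Aster Data, Hadoop, or an NLP library); the interesting part is the join back to what the EDW already knows.

```python
# Toy illustration of the bank example: score free-form call-centre notes
# for attrition signals, then combine with what the EDW already knows.
ATTRITION_SIGNALS = {"cancel", "close account", "competitor", "fees", "frustrated"}

def attrition_score(note: str) -> float:
    """Fraction of attrition signals present in one note (0.0 to 1.0)."""
    text = note.lower()
    hits = sum(1 for signal in ATTRITION_SIGNALS if signal in text)
    return hits / len(ATTRITION_SIGNALS)

# Pretend EDW extract: customer id -> (value segment, active retention offer?)
edw = {
    1001: ("high_value", False),
    1002: ("standard", True),
}

notes = [
    (1001, "Customer frustrated about fees, mentioned a competitor's rate."),
    (1002, "Routine address change, no issues raised."),
]

for customer_id, note in notes:
    segment, has_offer = edw[customer_id]
    score = attrition_score(note)
    if score > 0.3 and not has_offer:
        print(f"{customer_id} ({segment}): churn risk {score:.2f} -> flag for retention offer")
```

Neither half is novel on its own; the value comes from the high-value customer with no active offer surfacing at the top of the list.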
Or perhaps: RFID positioning and tracking of shopping trolleys can reveal shopping patterns. Matching those patterns against the customer segment and the planogram (the store/product layout) can show how effective targeted marketing campaigns are, and can indicate where a change in product location might have a negative impact on the most valued shoppers.
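Again, a small sketch of what that analysis might look like, with invented zone boundaries and ping data; a real planogram would come from the store layout system.

```python
# Sketch of the trolley example: map raw RFID pings to planogram zones and
# total the dwell time per zone for one shopping trip.
PLANOGRAM = {
    "produce": ((0, 0), (10, 10)),   # zone name -> ((x_min, y_min), (x_max, y_max))
    "bakery": ((10, 0), (20, 10)),
    "promo_end_cap": ((20, 0), (25, 10)),
}

def zone_for(x: float, y: float) -> str:
    for zone, ((x0, y0), (x1, y1)) in PLANOGRAM.items():
        if x0 <= x < x1 and y0 <= y < y1:
            return zone
    return "aisle"

# (seconds since store entry, x, y) pings from one trolley's RFID tag
pings = [(0, 2, 5), (30, 8, 5), (60, 12, 4), (75, 22, 3), (180, 23, 6)]

dwell: dict[str, int] = {}
for (t0, x, y), (t1, _, _) in zip(pings, pings[1:]):
    dwell[zone_for(x, y)] = dwell.get(zone_for(x, y), 0) + (t1 - t0)

print(dwell)  # {'produce': 60, 'bakery': 15, 'promo_end_cap': 105}
```

Sum those dwell times by customer segment and the before/after comparison for a product move falls out of the same data.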
Big data is here. I would argue it has always been here, but it is becoming a prominent challenge for companies. The companies that differentiate themselves by building and executing business improvement opportunities across traditional and emerging big data sources will distance themselves from their competition. Without a defined set of use cases and measures, they risk becoming a "me too" and lagging behind the competition.