Although definitions of Big Data typically embrace features other than size (and, for no good reason, these features always begin with a “V”: Velocity, Variety, Veracity), the name “Big Data” instantly brings to mind a lot of data.
So, how big can Big Data get?
In 2010, the list of the largest databases in the world quoted the World Data Centre for Climate database as the largest, at 220 Terabytes (possibly because of the additional 6 Petabytes¹ they hold on tape, albeit not as directly accessible data). By the end of 2014, according to the Centre’s web site, the database had grown to nearly 4 Petabytes (roughly 2 Petabytes of which are internal data).
Facebook claims upwards of 300 Petabytes of data in its (so-called) data warehouse; however, as we all know, very little analysis is done on these data, mainly because much of it is pictures of cats :-).
These sizes are about to be dwarfed by new science projects running now or coming to life soon.
The Large Synoptic Survey Telescope (depicted below) is likely to break many data-volume records.
Above: The Large Synoptic Survey Telescope (source: AstronomyNow.com)
The 8.4-meter telescope (quite small compared with the planned European Extremely Large Telescope, with its 40-meter diameter) will boast a 3.2-gigapixel camera, the largest digital camera on Earth, taking a photo of the sky every 15 seconds.
This generates 30 Terabytes of astronomical data per night.
Over its planned 10 years of operation, the telescope will generate over 60 Petabytes of raw data, plus a (probably several times larger) amount of analysis data. For comparison, humanity has accumulated circa 300,000 Petabytes of data since time immemorial. This one telescope, raw and analysis data combined, will add roughly 0.1%!
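As a back-of-envelope check, the nightly and lifetime figures quoted above line up roughly as follows. Note this is my own sketch, not an official calculation: the figure of ~200 usable observing nights per year is an assumption, not something stated in the project's plans.

```python
# Back-of-envelope check of the telescope data volumes quoted above.
# Assumption (mine, not from the article): ~200 usable observing nights per year.

TB_PER_NIGHT = 30       # stated raw-data volume per night
NIGHTS_PER_YEAR = 200   # assumed usable observing nights (weather, maintenance)
YEARS = 10              # planned survey length

raw_tb = TB_PER_NIGHT * NIGHTS_PER_YEAR * YEARS
raw_pb = raw_tb / 1_000             # 1 Petabyte = 1,000 Terabytes

print(f"Raw data over the survey: {raw_pb:.0f} PB")   # ~60 PB, matching the article

# Raw data alone as a share of humanity's estimated ~300,000 PB:
share = raw_pb / 300_000
print(f"Raw-data share: {share:.2%}")   # 0.02% raw; with analysis data several
                                        # times larger, the total approaches 0.1%
```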
And it gets even bigger.
The Large Hadron Collider at CERN generates about 30 Petabytes per year, the product of some 600 million collisions per second registering in its detectors. Interestingly, scientists had to sift through these data to find the handful of collisions that revealed the Higgs boson; the discovery helped earn Peter Higgs and François Englert the 2013 Nobel Prize in Physics.
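To put 30 Petabytes per year in perspective, a quick calculation (my own arithmetic, not a CERN figure) gives the average sustained data rate that volume implies:

```python
# Average data rate implied by ~30 PB of stored data per year.

PB = 10**15                          # 1 Petabyte = 10^15 bytes
SECONDS_PER_YEAR = 365 * 24 * 3600   # ~31.5 million seconds

bytes_per_year = 30 * PB
rate_gb_per_s = bytes_per_year / SECONDS_PER_YEAR / 1e9

print(f"Average rate: {rate_gb_per_s:.2f} GB/s")  # roughly 1 GB/s, around the clock
```

That is an average: the raw detector output is far higher, and hardware and software triggers discard the overwhelming majority of collisions before anything is written to storage.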
Above: The Large Hadron Collider (source: HowItWorksDaily.com)
The data is too large for a single data centre, so CERN created the Worldwide LHC Computing Grid, which divides the load between computing centres all over the world.
The Internet of Things promises ubiquitous sensors providing data continuously. Some of the data repositories involved are likely to break even these new records.
So, what is the biggest data set you know of? And what is the biggest single data set you are expecting to be involved in?
For a different perspective on this, stay tuned for my next article, “Is Big Data Getting Smaller?”.
¹ A Petabyte is 1,000 Terabytes, or 1,000,000 Gigabytes, or 10¹⁵ bytes.
Ben Bor is a Senior Solutions Architect at Teradata ANZ, specialising in maximising the value of enterprise data. He gained international experience on projects in Europe, America, Asia and Australia. Ben has over 30 years’ experience in the IT industry. Prior to joining Teradata, Ben worked for international consultancies for about 15 years and for international banks before that. Connect with Ben Bor via LinkedIn.