Daily Archives: July 10, 2017

Maybe you can’t machine learn everything – but does that mean you shouldn’t try?

July 10, 2017


I just got back from the EAGE, the 79th annual European conference and exhibition for Geoscience and Engineering. But what I actually want to tell you about is what I did on the weekend before the conference started. A few of us from Teradata joined a slightly more recent tradition – the Subsurface Hackathon organised by Agile Scientific.

Agile Scientific – AKA Matt and Evan – have been punching way above their weight for years, promoting open source and DIY alternatives to the closed, expensive and often old-fashioned commercial software solutions available for geoscience workflows. Lately they have been teaching geoscientists to code in Python and challenging them to try machine learning. So, unsurprisingly, the theme of this year’s hackathon was machine learning (ML).

Now, some of us at Teradata – including myself – have been a bit cynical about the current hype surrounding machine learning in general, and Deep Learning in particular – especially when applied to scientific or industrial domains.

I often say that just because we can do machine learning – easily, with the many libraries and toolkits available these days – it doesn’t mean we should. In the world of geoscience data, our data is actually pretty sparse. You could argue that to perform adequate machine learning for Geoscience we would need at least as much data as Google has images of cats, for example. What we do have is real world physics controlling the relationships between our data. So is machine learning really the most appropriate approach?

I’m not suggesting there isn’t a place for machine learning. Before the hackathon, I believed that, after using additional techniques to quality-control the data and engineer features from it, those features could be successfully fed into machine learning algorithms to generate useful insights. However, the old ‘garbage in, garbage out’ law still applies, even in the near-magical world of ML. You can’t machine learn everything.

Continually try to prove yourself wrong

But you know what, deciding not to try something because you think it will fail is not smart thinking. That isn’t agile thinking. Smart businesses – and smart entrepreneurs for that matter – test their theories as soon as they catch themselves making any assumption about viability. They set up a short project to try to prove the opposite of what they instinctively believe. In other words, as a smart business or smart entrepreneur you continually try to prove yourself wrong.

Think about it: it’s much cheaper and less risky to discover up front that your assumption is wrong. The alternative would be to continue along in blissful ignorance, only realising your error after having your business idea side-swiped by someone who didn’t hold your built-in biases – likely after serious investment in time and money.

And so we took part in the ML hackathon. When I saw teams at the hackathon taking something of a naive approach, throwing raw data into generic algorithms provided by Google and others, I was initially put off. But we watched with open minds what other teams were trying, and we learned some things.

Standard image-feature extraction techniques that find faces in photos can actually find (obvious) traps and (obvious) faults in (synthetic) seismic images. They can do that with only a couple of days’ training, and only a few hundred data sets.

Open machine learning libraries allowed a team to train a neural net to create (simple) geological models from (synthetic) seismic, and create (synthetic) seismic from (simple) geological models. They can do that with only a couple of days’ training, and a few data sets.

So did throwing raw data at openly available ML algorithms completely fail? No, it didn’t. But can it replace human interpreters today? Well, no. Not after only two days of training, and certainly not by only looking at a couple of hundred data sets.

But in time, with enough data, maybe machine learning could replace human interpreters. Outside the hackathon environment time is much less of a constraint, so we could easily solve the problem of limited training time. But limited training data? That could be the rub.

In time… maybe machine learning could replace human interpreters

If we want to be able to detect changes in reservoirs via seismic data as easily as Google can find pictures of cats, it follows that we’re going to need to train our models with as many seismic images as Google has pictures of cats. I just found over 2 billion cat images on Google in 0.81 seconds. That equates to an awful lot of seismic images.

So how are we going to arrange that? In the case of super majors doing this work themselves, with access to vast amounts of data, maybe this could be viable. But for most of us – including universities and research institutes – it will be very difficult to take part in the model training experiment without some major changes to how we share data.

More fundamentally – should we be using supervised ML image processing techniques on seismic data at all? The hackathon teams chose this as an easy-entry, ‘low-hanging fruit’ approach to using existing ML libraries on subsurface data. They replicated the old workflow of interpreting seismic sections visually and detecting structural features.

How can machine learning deliver an adequate answer if even the experts can’t agree?

Remember the old joke about the five geoscientists looking at the same seismic section? How many interpretations will there be? At least six, right? That implies supervised ML might not be the best approach: how can machine learning deliver an adequate answer if even the experts can’t agree on the interpretation? Perhaps the way forward is to stop turning seismic survey data into images for humans to visually interpret. Would a better route be to apply machine learning to (as raw as possible) measurement data?

As is often the way these days, there doesn’t seem to be one single answer – just a load more questions… I’m off to look at some more cat pics while I think about it.

Jane McConnell – Practice Partner Oil and Gas, Industrial IoT Group, Teradata

Originally from an IT background, Jane specialised in Oil & Gas with its specific information management needs back in 2000, and has been developing product, implementing solutions, consulting, blogging and presenting in this area since.

Jane has done time with the dominant market players – Landmark and Schlumberger – in R&D, product management, consulting, and sales – before joining Teradata in 2012. In one role or another, she has influenced information management projects for most major oil companies across Europe. She chaired the Education Committee for the European oil industry data management group ECIM, has written for Forbes, and regularly presents internationally at Oil Industry events.

As Practice Partner for Oil and Gas within Teradata’s Industrial IoT group, Jane is focused on working with Oil and Gas clients across the international region to show how analytics can provide strategic advantage and business benefits in the multi-millions. Jane is also a member of Teradata’s IoT core team, setting the strategy and positioning for Teradata’s IoT offerings, and works closely with Teradata Labs to influence development of products and services for the Industrial space.

Jane holds a B. Eng. in Information Systems Engineering from Heriot-Watt University, UK. She is Scottish, and has a stereotypical love of single malt whisky.

Big Data and the Fight Against Climate Change

July 10, 2017


These are not great times for the battle against climate change. With America officially withdrawing from the Paris Agreement, the global alliance lost one of its major champions. While this year’s World Environment Day (celebrated each year on 5 June) was rather muted, the determination to continue with the momentum remains steadfast. What’s important is that governments, academia, and industry work together to make a difference. It is naïve to believe that only industry and enterprises contribute to climate change. There is a great need for scientists to harness those elements of technology that can not only hasten our understanding of the drivers of climate change but also help illustrate relevant solutions, so that the requisite action can be taken.

The Data Revolution

It’s well known that while big data has had a transformative effect on academic research as well as business applications, its application in climate change has till now remained nascent. The good news is that a global data revolution is unfolding and accelerating decision-making around climate change. Using big data and analytics solutions, scientists are now analysing a wider framework of data including, for the first time, privacy-protected digital data — such as mobile data or credit/debit card transactions, for example — to get important insights into human consumption patterns that in turn can be correlated to climate risk. The resultant data streams are, therefore, providing an unprecedented opportunity to scientists, academia, and private enterprise to catalyse climate innovation and influence decision-making for the public good.

Public-Private Partnerships and Data Philanthropy

A promising initiative related to this is the United Nations Global Pulse programme called ‘Data for Climate Action’. The UN Global Pulse is an agency that partners with data-rich organizations along with other UN agencies, governments and “problem owners” to grapple with challenges that could benefit from new insights, and to help discover, build, and test high-potential applications of this big data. The focus is on sectors such as food security, agriculture, employment, infectious disease, urbanization, and disaster response, among others. Climate change is a new addition to this list. The ‘Data for Climate Action’ initiative, launched in March this year, is an ‘open innovation challenge’ to scientists and researchers from across the world to harness data science and big data from the private sector to fight climate change. The challenge aims to leverage private big data to identify revolutionary new approaches to ‘climate mitigation and adaptation’.

This initiative stands apart for two reasons. First, the global challenge has attracted companies from across industries and countries to participate through acts of data philanthropy. Second, the data being shared is less about the climate itself than about human behaviour and its effect on climate change. The challenge offers researchers an opportunity to gain unprecedented access to national, regional, and global datasets (anonymized and aggregated to protect privacy) and robust tools to support their research. The results are expected to be made public later this year.

Addressing Global Warming Cheaply and Effectively

The issue from a policy and implementation perspective is to reduce the effects of global warming in a manner that is both cost- and means-effective. Scientists and administrators do not have the time or the resources to try out different approaches and then decide on the best one. Big data, advanced analytic techniques, and algorithms are playing an important role in enabling this. For example, data specific to a country or region can show where flooding is most likely to occur, or which areas are most prone to drought or other natural calamities. The timeliness of this information is a critical factor. It can enable the local government to allocate resources accurately, thereby minimising wastage and over-spills. Agriculture is another area that has a significant cause-and-effect relationship with global warming. It stands to reason that everything we do directly correlates to furthering environmental impact with catastrophic social and economic effects. An analysis of global social media data on the issue of global warming and climate change is also proving very effective at highlighting the various elements that country populations are sensitive to, whether that is energy, climate, the state of the oceans, agriculture, or forests and natural resources. This provides an important public perspective to policymakers and academics.

While traditional sources of climate data help describe how and to what extent the climate is changing, they do not always illustrate the solutions that are likely to be most effective in reducing emissions and helping build community resilience. New sources of big data from different industries and geographies can be applied to construct a more complete picture, thereby significantly enhancing the understanding of the deeply interrelated relationships between human action and climate change. Combine this with the increasing amounts of computing resources available at a much lower cost than ever before, plus the ability to process data on distributed platforms in the cloud, and the potential to exploit data and analytics in an important area like climate change is more real than ever before. From this, new adaptation and mitigation interventions can be developed that will lead, it is hoped, to the beginning of a new and enabling ecosystem.

Rajesh Shewani, Head, Technology and Solution Architecture, Teradata India

Rajesh Shewani heads Technology and Solutions at Teradata India. He comes with close to two decades of experience in the areas of Data Management, Advanced Analytics, Solution Architecture, Enterprise data warehousing and Business Intelligence to name a few.

At Teradata he is responsible for leading a team of experienced Business Analytics consultants, data scientists and solution architects focusing on Teradata Data Analytics solutions. Rajesh has worked in various areas of information management across different industries and functional domains, advising customers on devising corporate performance management and business analytics strategies that will help them achieve differential advantage.

Prior to Teradata, Rajesh was with IBM for over 14 years and performed various roles in technology leadership, architecture and consulting. In his most recent role at IBM he was Country Manager for technical sales for IBM Business Analytics portfolio that included Cognos, SPSS, OpenPages and Algorithmics, leading a team of analytics functional and technical sales architects. Rajesh is certified in technologies such as Cognos BI & FPM, SOA, WebSphere Application Server and DB2.

He holds a Master’s in Business Marketing & Information Management from Narsee Monjee Institute of Management Studies.