The 6 Skills Required to be a Good Data Scientist

Friday March 28th, 2014

Presenting at Teradata Summit 2014 across Australia and New Zealand this week, Teradata CTO Stephen Brobst discussed the skills required to be a good data scientist:

Curiosity: Dives into the data head first.

Intuition: Has good business sense and will explore in directions that yield results but is not afraid to fail.

Data Gathering:  Knows how to find data and knows how to design experiments to obtain data when it is not available.

Statistics: Understands causality versus correlation, expected value theory and statistical significance. This is different from the base math: it covers choosing a viable sample size and understanding what makes an experiment valid and sound.

Analytic Modelling: Uses historical data to predict the future without over-fitting the data.

Communication: Ability to explain the results of data exploration without using math terms.
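To make the "viable sample size" point under Statistics concrete, here is a minimal sketch, assuming a simple two-group experiment that compares conversion rates. The function name, the example rates and the default significance and power levels are illustrative assumptions and were not part of the talk. It uses the standard normal-approximation formula for comparing two proportions.

```python
import math
from statistics import NormalDist


def sample_size_per_group(p1, p2, alpha=0.05, power=0.8):
    """Approximate records needed in EACH group to detect a change
    in a rate from p1 to p2 with a two-sided test.

    Hypothetical helper for illustration (normal approximation).
    """
    if not (0 < p1 < 1 and 0 < p2 < 1):
        raise ValueError("p1 and p2 must be strictly between 0 and 1")
    if p1 == p2:
        raise ValueError("p1 and p2 must differ to have an effect to detect")

    # z-score for the chosen two-sided significance level
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    # z-score for the desired power (chance of detecting a real effect)
    z_power = NormalDist().inv_cdf(power)

    # Sum of the binomial variances of the two groups
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    n = (z_alpha + z_power) ** 2 * variance / (p1 - p2) ** 2
    return math.ceil(n)


# Assumed example: baseline rate of 10%, hoping to detect a lift to 12%
n_small_lift = sample_size_per_group(0.10, 0.12)
# A larger lift (10% to 15%) needs far fewer records per group
n_large_lift = sample_size_per_group(0.10, 0.15)
print(n_small_lift, n_large_lift)
```

The smaller the effect you want to detect, the more data the experiment needs, which is why the sample size has to be worked out before the experiment is run rather than after.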

“So, where do you find people like this?  Applied physicists or applied science people are typically good candidates for these roles.  Social science people involved in field surveys also understand data and statistics, so hire them if the stats and maths geeks are not available,” said Brobst.

What can you do to make the data scientist successful?

– Give them self-service access to data. Don’t put castle guards (DBAs) between the data and the data scientist. Don’t put an ROI test in front of an experiment, as the experiment itself will determine whether there is a point in having an ROI discussion. Provide data in un-modelled form for raw analysis. If you put these blocks in place, a black market in data will develop.

– Ensure data visualisation is available, as data scientists look for patterns in data.

– Give data scientists a dedicated space (a “data lab”) where they can load new data that is not yet integrated into the data warehouse. This avoids people downloading data from the data warehouse to their local PC or server for analysis. The download approach has security and performance implications that are far worse than allowing controlled processing and data loading on the data warehouse.

David Stewardson is a Senior Consultant in the Teradata Solutions Group. He has a very strong technical background and business acumen, with over 23 years’ experience in the Data Warehouse business, specialising in Business Intelligence. During his extensive career, he has worked in 6 countries across 8 different industries (including Mining, Finance and Insurance, Utilities and Telecoms) and has been responsible for managing teams of five to 150 people in Business Analysis, Project Manager, Program Manager and Program Director roles. Connect with David Stewardson on LinkedIn.
