The Top 10 Requirements to be a Data Scientist

Wednesday July 17th, 2013

I recently had a conversation about the difficulty of hiring analytic professionals (or data scientists, if you are based in California!). The difficulty is not surprising given the skills and behaviours being sought. I compiled the list below by looking at relevant job descriptions on LinkedIn:

  1. Analytical skill-set
    1. Mathematics / statistics (including experimental design)
    2. Domain knowledge (i.e. industry-specific processes where analytics are applied)
    3. Technology / data
  2. Communication skills (story-telling)
  3. Curiosity (willingness to challenge the status quo)
  4. Collaboration
  5. Commercial acumen
  6. Customer-centric
  7. Problem-solving skills
  8. Proactive
  9. Strategic
  10. Willingness to spend lots of time justifying your existence in the organisation

Ok, so I added #10 based on my experience to bring it up to a round number!

So when you throw “Big Data” into the mix, is it any wonder there is a skills shortage? (In May 2011, McKinsey predicted that by 2018 the United States alone would face a shortage of 140k to 190k people with deep analytical skills, as well as 1.5m managers and analysts with the know-how to use the analysis of big data to make effective decisions.)

The good news is that, when building any team, every team member is unique and will have relative strengths in different areas – no one person has to be an expert in all aspects.  The team can be constructed to complement and leverage these different skill-sets. 

However, the worrying trend that I’ve seen recently is the ‘Dumbing Down’ of Analytics.  Organisations faced with this challenge believe it is the analytical skill-set requirement which can be relaxed – I totally disagree.

 “Say you is data analyst because use of Google Analytics like say you is doctor because watch House.” @BigDataBorat

If I were going to have a medical procedure, I would happily trade off the doctor’s communication skills to ensure they had the required medical skills!

“Big Data”

So how does the advent of “Big Data” impact this challenge?

First up, I am not going to attempt to (re)define “Big Data”.  Doug Laney (@Doug_Laney) from Gartner penned the famous Three ‘Vs’ back in 2001 and I’ve seen plenty of others distort and misuse this definition to fit their particular narrative (perhaps the vendor sales equivalent should be the Three ‘B’s – Blah Blah Blah!! ©Mark Hunter 2013!)

I believe “Big Data” will cause the greatest disruption in the ‘Analytic Skill-set’ requirement – in particular the mathematics / statistics and technology components.

Mathematics / Statistics

Traditionally, analytics was based on ‘relational’ data, where tools like SAS and SQL have been prevalent; however, the move is now towards ‘non-relational’ analytics.  Examples include Graph Analysis (used for Network Analysis), Path Analysis (used to understand paths across disparate time-based events, like the path to purchase across multi-channel interactions) and Text Analytics (which isn’t new, but is becoming more mature).  The good news here is that analytic professionals should be able to brush up on these techniques pretty quickly, as many of these concepts are covered at university.
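To make Path Analysis a little more concrete, here is a minimal sketch in Python. It is only an illustration: the event log, the channel names and the `paths_to_purchase` helper are all made up for this example, and a real implementation would run on a discovery platform rather than on an in-memory list.

```python
from collections import Counter, defaultdict

# Hypothetical interaction log: (customer_id, ISO timestamp, channel or event).
events = [
    ("c1", "2013-07-01T09:00", "web"),
    ("c1", "2013-07-02T14:30", "call_centre"),
    ("c1", "2013-07-03T10:15", "purchase"),
    ("c2", "2013-07-01T11:00", "email"),
    ("c2", "2013-07-01T11:05", "web"),
    ("c2", "2013-07-04T16:45", "purchase"),
    ("c3", "2013-07-02T08:20", "web"),
    ("c3", "2013-07-02T19:00", "call_centre"),
    ("c3", "2013-07-05T12:00", "purchase"),
    ("c4", "2013-07-03T13:00", "web"),  # browsed but never purchased
]


def paths_to_purchase(events, goal="purchase"):
    """Count the ordered channel sequences that end in the goal event."""
    by_customer = defaultdict(list)
    # ISO timestamps sort correctly as strings, so order by customer then time.
    for customer, _ts, channel in sorted(events, key=lambda e: (e[0], e[1])):
        by_customer[customer].append(channel)

    paths = Counter()
    for channels in by_customer.values():
        if goal in channels:
            end = channels.index(goal)
            paths[tuple(channels[: end + 1])] += 1
    return paths


path_counts = paths_to_purchase(events)
for path, n in path_counts.most_common():
    print(" > ".join(path), n)
```

The point is that the unit of analysis is an ordered sequence per customer rather than a row in a table, which is why this kind of question is awkward to express in plain relational SQL.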

Technology

The “Big Data” technology landscape is getting pretty complex, but I believe Teradata is leading the way in its vision for the logical data warehouse.  We recognise that Hadoop is an important component of the logical data warehouse as it delivers a commercially viable solution to the challenge of ‘keeping all data forever’.

Teradata’s Unified Data Architecture includes the Teradata Data Warehouse, Hadoop and the Teradata Aster discovery platform.  The good news for the analytic community is the level of integration between the components and the creation of the patented SQL/MR within the Teradata Aster platform.

The Teradata Data Warehouse and Teradata Aster both support Apache HCatalog, making it easier to share and reuse data stored in Hadoop.  But the real ‘secret sauce’ is the creation of SQL/MR within the Teradata Aster platform.  This brings MapReduce programming to the SQL-savvy analyst, which is a game-changer as I don’t really want to add hard-core Java programming to the list of requirements!!
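For anyone who hasn’t met the MapReduce pattern, here is a rough sketch of its three steps (map, shuffle, reduce) in plain Python. This is not SQL/MR syntax and not how Hadoop is actually programmed; the records and function names are invented purely to show the shape of the idea. The result is the same as a simple `SELECT channel, SUM(amount) ... GROUP BY channel`, which is exactly why a SQL-savvy analyst would rather stay in SQL.

```python
from collections import defaultdict
from functools import reduce

# Hypothetical transactions: (channel, amount).
records = [
    ("web", 120.0),
    ("branch", 80.0),
    ("web", 45.5),
    ("mobile", 30.0),
    ("branch", 20.0),
]


def map_phase(record):
    """Emit a (key, value) pair for each input record."""
    channel, amount = record
    return (channel, amount)


def shuffle(pairs):
    """Group the emitted values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(values):
    """Collapse all values for one key into a single result."""
    return reduce(lambda a, b: a + b, values)


grouped = shuffle(map(map_phase, records))
totals = {key: reduce_phase(values) for key, values in grouped.items()}
print(totals)
```

On a real cluster the map and reduce steps run in parallel across many machines, and that distribution is the part that traditionally demands the Java programming.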

So rather than ‘Dumbing Down’ the analytic skill-set, let’s look at adding incremental ‘non-relational’ analytical techniques and simplifying the extended technology landscape. 

“Everything should be made as simple as possible… but not simpler” Albert Einstein.

Mark Hunter is a Financial Services Industry Consultant with Teradata Australia & New Zealand. Mark has 15 years of banking experience gained in the UK and Asia. He has extensive experience in developing analytical capability to drive data-led decisions. He has worked across the entire customer lifecycle with specialist knowledge in Marketing and Risk. You can also follow Mark on Twitter @Mark_Hunter_Mel
