
Is failure good for your data scientists?

September 25, 2017


If you’ve heard of data science (if you haven’t, where have you been and how did you find this blog?), you’ve probably heard of “fail fast”. The fail fast mentality is based on the notion that if an activity isn’t going to work, you should find out as quickly as possible, and stop doing it.

As the size, complexity and number of new data sources continue to increase, there is a corresponding increase in the value of discovery analytics. Discovery analytics is the method by which we uncover patterns in data and develop new use cases that lead to business value.

It is easy to see how discovery activities lend themselves to a fail fast approach. But how can we learn from these failures, and how can we avoid repeating the same ones time and again?

Good failure, bad failure

There are two different types of failure possible in a data science project: good failures and bad failures. Good failures are a necessary part of the discovery process, and an important step in finding value in data. Bad failures, on the other hand, are those that could have been avoided, and are basically a waste of everybody’s time. Common causes of bad failures include:

  • Poor specification – this is not specific to data science and applies to any project that isn’t specified properly in terms of expected results and appropriate timelines.
  • Inappropriate projects for a data science methodology – it has become increasingly common to call all analytics data science. If a project can be solved using a standard data warehouse and business intelligence method, then you should probably just do that.
  • Poor expectation management – many data science projects suffer from this. It is important to ensure stakeholders are aware of what can and cannot be expected from the results.
  • Data supply – a vital first step in any analytics project is to ensure that the necessary data feeds are available and accessible.

Let’s talk about publication bias. This phenomenon occurs in the publication of scientific papers, where it is usual to only publish studies that produce positive results. What is far less common is to publish a paper that highlights the amount of work you did in order to fail to produce anything of any worth! The problem is that this leads to teams making the same mistakes, or proceeding down the same creative cul-de-sacs as so many before them. Because of publication bias, we do not learn from each other’s mistakes.

Exactly the same situation can occur in a data science team. Unless a true collaborative environment exists for discovery and predictive model development, different members of the team will make the same failures over and over again.

Move out of the cul-de-sac

In order to benefit from the fail fast approach, data science teams need to adopt a best practice method of sharing results, methodologies and discovery work – especially when their work is considered a failure. This can be done in many ways, but some of the more effective include regular discussion – similar to agile methodology’s stand-up meetings – and using appropriate software to aid the process.

Software tools exist to facilitate collaboration, issue tracking, continuous documentation, source control and versioning of programme code, as well as task tracking. These tools create a lineage of activities that is permanent and searchable.

If you want to hear more on this subject, why not come to see my presentation ‘My data scientists are failures’ at the Teradata PARTNERS conference in Anaheim this October.

Find out more about the PARTNERS conference.

Christopher Hillman is a Principal Data Scientist in the International Advanced Analytics team at Teradata, based in London. He has over 20 years’ experience working with analytics across many industries including Retail, Finance, Telecoms and Manufacturing. Chris is involved in the pre-sale and start-up activities of analytics projects, helping customers to gain value from and understand Advanced Analytics and Machine Learning. He has spoken on data science and analytics at Teradata events such as Universe and Partners, as well as industry events such as Strata, Hadoop World, Flink Forward and the IEEE Big Data conferences. Chris is also studying part-time for a PhD in Data Science at the University of Dundee, applying Big Data analytics to data produced by experimentation on the Human Proteome.

Teradata Database 16.10 Now on Azure and AWS Marketplaces

September 25, 2017

Good news! We’ve just published important updates for both Azure and AWS Marketplaces.

This is the first public cloud update in which the Teradata team has aligned solution launch conventions across both Azure and AWS for simplicity.

There are many feature updates. Highlights pertaining to both Azure and AWS Marketplaces:

  • Added support for Teradata Database 16.10; Teradata Database 15.10 continues to be supported. Note that only the Sparse Maps portion of the MAPS (Multiple Hash Maps) feature is currently available in the public cloud.
  • Added support for Teradata QueryGrid 2.n, replacing Teradata QueryGrid 1.n. QueryGrid Manager is now available with zero software cost. QueryGrid connectors must be ordered separately.
  • Added manual resizing capability (i.e., scale up/down) for a Teradata Database node.
  • Added support for Teradata Server Management in the Developer Tier.
  • Added the ability to enable Teradata Intelligent Memory (TIM) when deploying a Teradata ecosystem for the Advanced and Enterprise tiers.


Azure Marketplace-specific updates:

  • Increased the Azure node limit to 64 nodes (with 33-64 nodes under controlled deployment).
  • Changed the 5TB storage configurations from 5 x 1023 GiB to 10 x 512 GiB for DS14_v2 and DS15_v2 VM sizes with premium storage.



See Teradata product listings on Azure Marketplace

See the latest information about Teradata software on Azure Marketplace

See the latest Teradata Database on Azure Getting Started Guide


AWS Marketplace-specific updates:

  • Added the ability to enable Teradata Intelligent Memory when launching a Teradata ecosystem or launching components separately for the Advanced and Enterprise tiers.
  • Changed the port for PUT from 8080 to 8443.
  • Removed the ability to create a new VPC when launching a Teradata Ecosystem.
  • Added ability to enter an existing placement group or configure separate placement groups for Teradata Database, Teradata Data Stream Controller, and Teradata Data Mover when launching a Teradata ecosystem.
  • Added support for using Teradata Access Module for AWS to export data from and import data to S3 storage.
  • Added Server Management to the Test/Dev Ecosystem CloudFormation Template.


See Teradata product listings on AWS Marketplace

See the latest information about Teradata software on AWS Marketplace

See the latest Teradata Database on AWS Getting Started Guide


The updated Data Stream Utility (DSU) capabilities on AWS enable some nifty backup and disaster recovery (DR) options, including support for multiple S3 buckets across multiple regions. DSU can now restore a save set from an S3 region or bucket different from the one used for the original backup.

Here’s the scenario: users may now configure multiple S3 buckets across more than one AWS region via the command line or BAR portlet. This enables a DR setup with geographic separation between systems, such as:

  • Base system backs up to AWS S3 in region X
  • AWS S3 can automatically replicate stored data between regions X and Y
  • Secondary system in second AWS region Y can have data loaded
  • In the event of a regional disaster, the secondary system can be brought online and the customer can resume operations

In other words, this feature enables a user to:
1) perform a backup to S3 in one region,
2) allow Amazon to replicate it to another S3 region automatically, and then
3) restore the save set to a different Teradata system.
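As a rough illustration of step 2, the cross-region copy relies on standard S3 replication, which is driven by a replication configuration attached to the source bucket. The sketch below builds such a configuration in Python; the bucket names, IAM role ARN and key prefix are hypothetical placeholders, and the actual DSU setup is done via the command line or BAR portlet as described above, not via this code.

```python
# Minimal sketch of an S3 cross-region replication configuration.
# All names below (buckets, role ARN, prefix) are hypothetical examples.

def replication_config(dest_bucket: str, role_arn: str) -> dict:
    """Build a replication configuration that mirrors backup save sets
    from the primary bucket (region X) to the DR bucket (region Y)."""
    return {
        "Role": role_arn,  # IAM role S3 assumes to copy objects cross-region
        "Rules": [
            {
                "ID": "dr-replicate-savesets",
                "Status": "Enabled",
                "Priority": 1,
                "Filter": {"Prefix": "savesets/"},  # replicate only the save sets
                "Destination": {"Bucket": f"arn:aws:s3:::{dest_bucket}"},
                "DeleteMarkerReplication": {"Status": "Disabled"},
            }
        ],
    }

cfg = replication_config(
    "dr-backups-eu-west-1",
    "arn:aws:iam::123456789012:role/s3-replication",
)
# With boto3, this dict would be applied to the source bucket roughly as:
#   boto3.client("s3").put_bucket_replication(
#       Bucket="primary-backups-us-east-1", ReplicationConfiguration=cfg)
print(cfg["Rules"][0]["Destination"]["Bucket"])  # arn:aws:s3:::dr-backups-eu-west-1
```

Once replication is in place, the save set appears in the destination bucket automatically, and DSU’s new cross-region restore (step 3) can pick it up from there.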

Pretty cool!


Read more about Teradata software on Azure and AWS Marketplaces: