Author Archive

 

I have been struggling to reconcile two different thoughts over the last few months, and watching a video recently forced me to think about this again. There seems to be a Catch-22: to find the value in new data you need tools and mechanisms, but those tools and mechanisms have to be justified by the very value you are trying to find. The first thought is that I see a lot of organisations struggling to get their analytics initiatives under way and to make them sustainable; there are many articles on the web about this. The second is the question of why organisations should have a data discovery capability at all: is this a marketing term, or is there real value in it?

Something occurred to me recently. We were discussing the value chain of analytics (or something analogous to a value chain for analytics in large organisations), exploring how different pieces of data have different value and how this could be used in a BI Centre of Excellence to engage with users. The question was: how do you decide when to put new data into the warehouse, and what data remains outside the warehouse?

What occurred to me was that the value of much of the data we were considering had already been established or simply assumed (someone asked “loudly” enough for it and they got it). It was being stored and managed in a data warehouse and accessed by users through various toolsets, and although it was critical to the management of the business, it was fundamentally operational in nature. The value of a particular piece of data had been established previously, and significant investment had subsequently gone into getting that piece of data into the warehouse on an ongoing basis. That data was now being used to manage and change the business, so it was creating impact.

A short digression: the value of data is determined by what it can be used to accomplish. Data has no intrinsic value; in fact, even insight or actionable insight has no value unless it is put into action or changes something. Unless you change the business, a process or an offering in some way, the data is merely interesting, not important. Yet analytics teams are often cut off from the business and from the ability to impact it in a meaningful way.

The question became: how do you get new pieces of data added to the operational store (the warehouse), which is how data gets used and therefore how it becomes valuable, when you don't yet know it is valuable? Integrating a piece of data into the warehouse is a significant investment, so you have to know it is worth something before you integrate it, that is, before you invest in operationalising it. This is circular: you have to know data is valuable before you make it available, but it only becomes valuable once it is available, in which case you don't actually know it is valuable.

Knowing that a piece of data is worth something is also important in justifying analytics teams and getting analytics initiatives up and running. Until you know for a fact that something has value, there is no way to build a business case that funds an analytics team. I have seen a number of organisations try to address this problem by employing a small team, sometimes a single individual, to be the analytics team. Problem solved, it seems, as it turns the problem into an opex one, something the business can fund month to month. But this approach struggles to gain momentum, struggles to justify its value and makes it very difficult to build a long-term business case.

These teams are either accommodated in IT, where they can often get the data but not the business question, or housed in the business, where they struggle to get the data. In addition, much of the value in analytics comes from looking across the business. Silos tend to be quite good at optimising their narrow world; the value comes from optimising across silos.

So how do you confirm there is value in something, like data, when you are removed from the business process itself?

This is where a discovery platform may be valuable. If you can provide analytical ‘sandboxes’ that are easily accessible, easy to use and able to access all types of data, you change the problem from one of funding to one of testing the findings. Currently, discovering the value in a piece of data is hard. No single technology addresses all requirements, so users have to employ multiple tools. HDFS and Hadoop are attracting a lot of interest but are not the easiest to use, especially for business users. SQL is positioned as more of a business language but does not access all data structures. So what do you do if you want to find valuable data but are skills-constrained?

Someone mentioned to me that one of the ways this can be done is by using an agile analytics methodology or approach. In my experience of agile, admittedly mostly in software development, “agile” has often become an excuse for no documentation, no objective or no accountability, so I have been a little sceptical about anything labelled ‘agile’. Agile has come a long way since I first bumped into it, though, so I decided to test my bias and looked through some articles on an internal website about an analytical agility capability. Don’t get me wrong: I buy the drivers for agility (very short-term deliverables, direct business involvement, output mattering more than governance or methodology), so I would like it to work. This was also about analytic agility rather than an agile methodology as such.

After reading through most of them, one thing did strike me, and it revolved around what we call a ‘Discovery Platform’.

The only way to identify new items of valuable data is to experiment and test. Something I have been hearing is “fail fast”, which sounds bad but really means: test lots of things, do it properly, and determine quickly which ones are not going to work. Then take the successful experiments and operationalise them fast. This is what a discovery platform can enable. I would like to get other people's views, but to me this is a way to test things rapidly and determine which data should, in the long term, be integrated into the data warehouse.

We are seeing the emergence of the Discovery platform: a set of technologies that makes it easier to integrate multiple sources and types of data while providing a uniform mechanism to access them, namely SQL. These platforms also provide a means to test an insight rapidly and thereby prove its value before investing in operationalising it. You get to test and evaluate the value in a meaningful way before having to invest in making the data available.
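To make the idea a little more concrete, here is a minimal sketch of a discovery ‘sandbox’, assuming an in-memory SQLite database stands in for the platform. The tables, columns and sample data are all hypothetical. Structured, warehouse-style data and semi-structured JSON events are loaded side by side, and a single SQL query tests whether the new data says anything useful before anyone invests in integrating it into the warehouse.

```python
import json
import sqlite3

# Hypothetical sample data: a structured customer table (as it might sit in
# the warehouse) and semi-structured JSON web events (as they might arrive raw).
customers = [(1, "Alice", "Gold"), (2, "Bob", "Silver")]
raw_events = [
    '{"customer_id": 1, "page": "/fares/rtw", "ms": 5400}',
    '{"customer_id": 2, "page": "/checkout", "ms": 1200}',
    '{"customer_id": 1, "page": "/checkout", "ms": 800}',
]

def build_sandbox():
    """Load both sources into one throwaway in-memory SQLite 'sandbox'."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, tier TEXT)")
    conn.executemany("INSERT INTO customers VALUES (?, ?, ?)", customers)
    conn.execute("CREATE TABLE events (customer_id INTEGER, page TEXT, ms INTEGER)")
    # Flatten each JSON document into a row so it can be queried with SQL.
    rows = []
    for line in raw_events:
        doc = json.loads(line)
        rows.append((doc["customer_id"], doc["page"], doc["ms"]))
    conn.executemany("INSERT INTO events VALUES (?, ?, ?)", rows)
    return conn

conn = build_sandbox()
# One SQL query now spans both sources.
result = conn.execute(
    """SELECT c.tier, COUNT(*) AS visits, SUM(e.ms) AS total_ms
       FROM events e JOIN customers c ON c.id = e.customer_id
       GROUP BY c.tier ORDER BY c.tier"""
).fetchall()
print(result)  # [('Gold', 2, 6200), ('Silver', 1, 1200)]
```

If a query like this turns up something worth acting on, that is the signal to operationalise the data; if not, the sandbox is simply thrown away at little cost.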

If anyone has a view on either an agile approach to identifying valuable data or how a Discovery platform can help in operationalising analytics, it would be great to hear it.

Craig Rodger is a senior Pre-sales Consultant with Teradata ANZ focusing on advanced analytics. He has spent 20 years in the IT industry working on how to get value out of systems rather than getting things into them. Having been a member of a number of executive management teams in software, technology and consulting companies, and having helped build a number of technology business ventures, he joined an advanced analytics vendor.

5 Steps to Making BI Smarter in Big Data Analytics

Posted on: May 20th, 2013 by Sundara Raman

 

In recent months, I met with Business Intelligence (BI) teams in different countries to discuss Big Data Analytics. What emerged from the meetings was a clear lack of awareness of what Big Data Analytics can do for the BI team and how it fits within enterprise data warehousing (EDW). As ambassadors to their business community, BI teams have the opportunity to be at the forefront of new technology trends and to articulate the value of Big Data Analytics to business stakeholders.

The Big Data trend has been here for a while and there is no shortage of publicly available resources on the subject. However, many of these sources do not seem to allow the audience “to see the wood for the trees”! Storage vendors such as Dell and EMC are not helping the situation either, confusing BI teams by emphasising low-cost storage over the business value of Big Data Analytics. I believe that paying attention to the business value of Big Data Analytics will not only make the BI team look smarter in front of business stakeholders but also make it easier to get funding for Big Data Analytics projects, which many BI teams see as an opportunity to advance their careers.

In the paragraphs below I describe, in five steps, some essentials of Big Data Analytics in technical terms and how they fit into the enterprise data warehousing ecosystem as a unified data architecture (UDA) that supports the next era of analytics and business insights. Many of the examples relate to the airline industry, but the principles apply equally to any industry.

Step 1: Getting to know the essentials of Big Data

The first step to Big Data Analytics is to understand new technology capabilities such as MapReduce, Hadoop and SQL-MapReduce (SQL-MR), and how they fit within the enterprise ecosystem. It is also important to understand how the design, development and implementation processes for Big Data Analytics differ from those of a traditional EDW.

For instance, if you are in the airline industry, you would have designed the enterprise data warehouse for transactional reporting and analysis, with a stable, structured schema and a normalised data model.

You probably stored unstructured data such as ticket images, recorded audio conversations with customer service agents, and ticketing / fare rules in the database as BLOBs (Binary Large Objects). Furthermore, you may have found it difficult to express complex business rules in declarative SQL, such as the financial settlement of interline agreements arising from code-share arrangements, open-jaw fare rules (say, between Zone 1 and Zone 3) and business rules for fuel optimisation; so you may have resorted to procedural code in user-defined functions (UDFs).

But UDFs have numerous limitations, which MapReduce, and more specifically SQL-MapReduce (SQL-MR), makes it easy to overcome while allowing high-performance parallel processing.

- What if you could use a MapReduce API (Application Programming Interface) through which you can implement a UDF in the language of your choice?
- What if this approach allowed maximum flexibility through polymorphism, determining input and output schemas dynamically at query plan time based on available information?
- What if it increased reusability by accepting inputs with many different schemas or different user-specified parameters?
- Further, what if SQL-MR functions could be leveraged by any BI tool that you are familiar with?

As you can guess, SQL-MapReduce (SQL-MR) overcomes the limitations of UDF by leveraging the power of SQL to enable Big Data Analytics by performing relational operations efficiently while leaving non-relational tasks to procedural MapReduce functions.

You will see some examples of this later but, first and foremost, what is MapReduce? MapReduce is a parallel programming framework invented by Google and popularised by Yahoo!; it enables parallelism for non-relational data. By making parallel programming easier, MapReduce creates a new category of tools that allows BI teams to tackle Big Data problems that were previously challenging to implement. It should be noted that, unlike Teradata’s relational database technology, whose core competency over the last 30 years has been parallelism, MapReduce is not a database technology. Instead, Hadoop's MapReduce relies on a file system called the Hadoop Distributed File System (HDFS). Both Hadoop MapReduce and HDFS are open source.

Step 2: “Hello World” welcomes you to the world of MapReduce with “Word Count”

Let’s take a look at how Hadoop MapReduce works! When you wrote your first program, you may have tested it by making sure “Hello World” was printed / displayed correctly. With MapReduce, you will most likely be testing Word Count in your first program.

A MapReduce (MR) program essentially performs a group-by-aggregation in parallel over a cluster of machines. A programmer provides a map function that dictates how the grouping is performed, and a reduce function that performs the aggregation.
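The word-count ‘Hello World’ can be sketched in plain Python to show the three phases. This is a single-process simulation, assuming no Hadoop cluster at all: the function names are illustrative and are not part of the Hadoop API, and the `shuffle` step stands in for the grouping the framework performs between map and reduce.

```python
from collections import defaultdict
from itertools import chain

def map_phase(line):
    """Map: emit a (word, 1) key-value pair for every word in a line."""
    words = (w.lower().strip(".,!?") for w in line.split())
    return [(w, 1) for w in words if w]

def shuffle(pairs):
    """Shuffle: group all values by key (done by the framework on a cluster)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: aggregate the grouped values for one key."""
    return key, sum(values)

def word_count(lines):
    # On a cluster each line (or block) would be mapped on a different node;
    # here the phases simply run one after another in one process.
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())

counts = word_count(["Hello World", "hello big data world!"])
print(counts)  # {'hello': 2, 'world': 2, 'big': 1, 'data': 1}
```

The map function decides the grouping key (the word), and the reduce function decides the aggregation (a sum); everything else is plumbing the framework provides.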

Let’s say that you want to create a Book Index for Big Data Analytics for Dummies. When writing your MR program, you will provide a map function that dictates how the words in the book's paragraphs are grouped, and a reduce function that aggregates the words to produce the book index. The MapReduce framework takes responsibility for distributing the map program to the cluster nodes where parts of the book are stored; there they are processed, and the output is written to intermediate files. The output of the map phase is a collection of key-value pairs written to intermediate flat files. The output of the reduce phase is a collection of smaller files containing summarised data: the key-value pairs of words are reduced to aggregates that form the book index.
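For the Book Index, only the map and reduce functions change. A minimal sketch, assuming each paragraph arrives as a (paragraph number, text) record; the sample paragraphs are made up, and shuffle and reduce are folded into one function for brevity.

```python
from collections import defaultdict

def map_paragraph(par_no, text):
    """Map: emit (word, paragraph number) for each distinct word in a paragraph."""
    words = {w.lower().strip(".,;:!?()\"'") for w in text.split()}
    return [(w, par_no) for w in words if w]

def reduce_index(pairs):
    """Shuffle + reduce: collect the sorted paragraph numbers for each word."""
    index = defaultdict(set)
    for word, par_no in pairs:
        index[word].add(par_no)
    return {word: sorted(pars) for word, pars in sorted(index.items())}

paragraphs = {
    1: "MapReduce groups data.",
    2: "Hadoop stores data in HDFS.",
    3: "MapReduce runs on Hadoop.",
}
pairs = [p for n, t in paragraphs.items() for p in map_paragraph(n, t)]
book_index = reduce_index(pairs)
print(book_index["data"], book_index["hadoop"])  # [1, 2] [2, 3]
```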

Because the MR program runs in parallel, you will notice a tremendous increase in reading speed (e.g. grouping the paragraphs of Big Data Analytics for Dummies) and processing speed (e.g. summarising and aggregating key-value pairs) that would impress even Johnny 5!

Creating an index of words and counts from Big Data Analytics for Dummies may not be terribly interesting or useful for you, but the ability to generate such key-value pairs from any multi-structured data source can be put to analytical use: it can create a set of useful dimensions and measures, familiar to BI teams, that can be integrated with data in the EDW. Instead of creating the Book Index, for example, you might create an index of all flight numbers, origins and destinations from an airline timetable booklet, which you may find more useful in the airline business.
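The same pattern applies to the timetable idea: only the map function needs to know the input format. A sketch, assuming a hypothetical plain-text timetable in which each line starts with a flight number (simplified here to a two-letter airline code plus digits) followed by origin and destination airport codes. The sample lines are invented, and real timetables would need their own parser.

```python
import re

# Assumed layout: flight number, origin, destination, then anything else.
FLIGHT = re.compile(r"\b([A-Z]{2}\d{1,4})\s+([A-Z]{3})\s+([A-Z]{3})\b")

def map_timetable(line):
    """Map: emit ((origin, destination), flight number) pairs found in a line."""
    return [((origin, dest), flight) for flight, origin, dest in FLIGHT.findall(line)]

def reduce_routes(pairs):
    """Reduce: list the flight numbers serving each route."""
    routes = {}
    for route, flight in pairs:
        routes.setdefault(route, []).append(flight)
    return routes

timetable = [
    "QF1 SYD LHR 15:35 daily",
    "QF11 SYD LAX 09:45 Mon Wed Fri",
    "QF9 MEL LHR 13:55 daily",
    "Page 12 of 40",  # page furniture produces no key-value pairs
]
routes = reduce_routes(p for line in timetable for p in map_timetable(line))
print(routes)
```

Each (route, flight) pair is exactly the kind of dimension a BI team could join to bookings or revenue already held in the EDW.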

Step 3: Putting MapReduce to solve business problems

Long gone are the days of GSAs (General Sales Agents) enjoying hefty sales commissions from the airlines! The market is highly competitive and organisations are looking for the best possible decisions from analytics. With the ubiquitous availability and convenience of broadband connections, customers’ attitudes and behaviours are changing rapidly. Customers now look for the best travel and holiday packages online. They also listen to the opinions of their friends and to public remarks on social network forums. Interestingly, this is also driving the rapid rate at which huge volumes of data are generated, opening up the need for Big Data technologies.

What if we could use the multi-structured data from click streams and from Facebook and Twitter to improve business performance? What if we could extract the IP address from the click stream data and correlate it with the customer's profile from the EDW, along with the best fare for the Round The World Travel deal that the customer is looking for? What if we could extract the sentiment of the customer’s travel experience from Twitter and Facebook data and use the positive / negative experience to provide the Next Best Offer during the customer’s next inbound call to an agent or next online visit?

Step 4: Integrating unstructured and structured data for Big Data Analytics

Here we consider how integrating multi-structured data in MapReduce with structured data in the EDW can be used to improve business outcomes. Instead of the Word Count MapReduce program you wrote previously, you will write a new MapReduce program to extract key-value pairs for the IP address, flight deals and any other relevant information from the Apache web log files where the customer’s online interactions are recorded. In a later paragraph I will describe how the MapReduce program you wrote is invoked in SQL by means of SQL-MR or, better still, how you can leverage several pre-built SQL-MR functions without having to write your own MapReduce program. For now, let’s assume the data extracted by MapReduce is loaded as a table in the EDW. The extracted IP address can then be joined with master reference data in the EDW to identify the user ID, which is in turn used to look up the frequency of online visits, the customer's lifetime value and so on.
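A minimal sketch of the extraction and join, assuming log lines in the Apache Combined Log Format, a hypothetical `/deals/` URL prefix for flight-deal pages, and a plain dictionary standing in for the master reference table in the EDW. In the architecture described here the map step would run in MapReduce and the join in the EDW; both run in one Python process purely for illustration. Mapping an IP address straight to a user is itself a simplification, since addresses can be shared or change over time.

```python
import re
from collections import Counter

# Apache log prefix: IP, identity, user, [time], "METHOD path protocol", status
LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]+\] "(\S+) (\S+)[^"]*" (\d{3})')

def map_weblog(line):
    """Map: emit (ip, path) for successful requests to (hypothetical) deal pages."""
    m = LOG_LINE.match(line)
    if not m:
        return []  # skip malformed lines rather than failing the whole job
    ip, _method, path, status = m.groups()
    if status == "200" and path.startswith("/deals/"):
        return [(ip, path)]
    return []

def visits_per_user(lines, ip_to_user):
    """Join extracted IPs to the master reference and count deal-page visits."""
    visits = Counter()
    for line in lines:
        for ip, _path in map_weblog(line):
            user = ip_to_user.get(ip)
            if user is not None:
                visits[user] += 1
    return dict(visits)

logs = [
    '203.0.113.5 - - [20/May/2013:10:01:02 +1000] "GET /deals/rtw HTTP/1.1" 200 5120 "-" "Mozilla/5.0"',
    '203.0.113.5 - - [20/May/2013:10:03:40 +1000] "GET /deals/syd-lhr HTTP/1.1" 200 4096 "-" "Mozilla/5.0"',
    '198.51.100.7 - - [20/May/2013:10:04:11 +1000] "GET /deals/rtw HTTP/1.1" 200 5120 "-" "Mozilla/5.0"',
    '198.51.100.7 - - [20/May/2013:10:05:00 +1000] "GET /checkout HTTP/1.1" 302 0 "-" "Mozilla/5.0"',
]
master_reference = {"203.0.113.5": "U1001", "198.51.100.7": "U2002"}  # hypothetical
print(visits_per_user(logs, master_reference))  # {'U1001': 2, 'U2002': 1}
```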

 

 

Step 5: Flying high with SQL-MR (SQL-MapReduce)!

While MapReduce is good for solving Big Data problems, it can create a number of bottlenecks, including the requirement to write new software to answer each new business question. Trying to exploit data in HDFS through Apache Hive is another story; let’s not even go there! SQL-MapReduce (SQL-MR), on the other hand, helps to relieve the MapReduce bottleneck by allowing maximum flexibility through polymorphism (input and output schemas are determined dynamically at query plan time based on available information). It allows reusability by accepting inputs with many different schemas or with different user-specified parameters. More importantly, you can exploit all types of Big Data using the BI tools that you and your business analysts are familiar with.

Here you will see examples of how you may use the SQL-MR function text_parser (with just a few lines of code) to solve the word count problem, create the Book Index for Big Data Analytics for Dummies, or extract IP addresses from online clickstream data. You will notice the reusability of the SQL-MR function, which accepts inputs with many different schemas and different user-specified parameters to create its output schema at query time.
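To illustrate what ‘polymorphism’ and ‘reusability’ mean here, the following is a plain-Python sketch of a text-parsing function whose output schema is worked out from its input schema and parameters before any row is processed. It is hypothetical: the function name, its parameters and its behaviour are assumptions made for illustration, and they are not the signature or semantics of Aster's text_parser.

```python
import re
from collections import Counter

def parse_text(input_schema, rows, text_column, pattern=r"[A-Za-z']+", accumulate=()):
    """Hypothetical polymorphic text parser (NOT the Aster text_parser API).

    The output schema is derived from the input schema and the parameters
    before any row is read, in the spirit of plan-time schema determination.
    """
    missing = [c for c in (text_column, *accumulate) if c not in input_schema]
    if missing:
        raise ValueError(f"unknown column(s): {missing}")
    output_schema = [*accumulate, "token", "frequency"]

    output_rows = []
    for row in rows:
        record = dict(zip(input_schema, row))
        tokens = Counter(t.lower() for t in re.findall(pattern, record[text_column]))
        for token, freq in sorted(tokens.items()):
            output_rows.append((*(record[c] for c in accumulate), token, freq))
    return output_schema, output_rows

# The same function, two different input schemas and parameter sets.
book_schema, book_rows = parse_text(
    ["par_no", "body"], [(1, "Big data, big value")], "body", accumulate=["par_no"])
log_schema, log_rows = parse_text(
    ["session", "request"], [("s1", "GET 203.0.113.5 /deals/rtw")], "request",
    pattern=r"\d{1,3}(?:\.\d{1,3}){3}", accumulate=["session"])
print(book_schema, book_rows)
print(log_schema, log_rows)
```

The design point is that callers never declare an output schema: passing a different `accumulate` list or `pattern` changes the output columns or tokens without any new code.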

You will find that SQL-MapReduce (SQL-MR) provides an excellent framework for jump-starting Big Data Analytics projects, with substantial benefits: 3 times faster development, 5 times faster discovery and 35 times faster analytics. My colleague, Ross Farrelly, demonstrates with an example how to reduce the pain of MapReduce, which will be of interest to you as well. His example shows how SQL-MR functions can be customised or developed easily with an Integrated Development Environment (IDE).

Exploring and discovering value from Big Data is how you will divide and conquer the volume, velocity, variety and complexity characteristics of Big Data. You will also gain great benefits from seamless integration of the different Big Data technologies as a Unified Data Architecture (UDA) to provide advanced analytics.

Here is another business use case that the SQL-MR functions nPath and GraphGen solve more elegantly and efficiently than either SQL or MapReduce. Try writing this in SQL or MapReduce and notice the difference! The business problem we are trying to solve is identifying the most frequent customer activities or sequences of events that lead to disloyalty.
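The underlying idea (find the event sequences that end in an unsuccessful sale, then weight each step-to-step transition by how often it occurs) can be sketched in plain Python. This is not nPath or GraphGen syntax; the event names and sessions are hypothetical, and the edge counts play the role of line thickness in a path visualisation.

```python
from collections import Counter

def paths_to_outcome(sessions, outcome):
    """Keep the event sequences that end in the given outcome (an nPath-like filter)."""
    return [events for events in sessions.values() if events and events[-1] == outcome]

def edge_strengths(paths):
    """Count each consecutive event pair (GraphGen-like edge weights)."""
    edges = Counter()
    for events in paths:
        edges.update(zip(events, events[1:]))
    return edges

# Hypothetical, already time-ordered events per customer session.
sessions = {
    "c1": ["home", "search", "select_fare", "online_payment", "abandon"],
    "c2": ["home", "search", "online_payment", "abandon"],
    "c3": ["call_centre", "select_fare", "purchase"],
    "c4": ["home", "select_fare", "online_payment", "purchase"],
    "c5": ["home", "online_payment", "abandon"],
}
edges = edge_strengths(paths_to_outcome(sessions, "abandon"))
print(edges.most_common(1))  # [(('online_payment', 'abandon'), 3)]
```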

You can see from the chart below that, of all the different channels customers use to buy airline tickets, the online channel is the one that leads to unsuccessful ticket sales. By visualising the sequence of all customer events, you will notice that the Online Payment page is where abandonment occurs (shown by the thick purple curved line, which indicates the strength of the path segment), which provides insight into the issues with the online channel. By taking corrective action ahead of the online payment step, you can build customer loyalty and grow sales.

Here is the SQL-MR code for the above visualisation of ticket purchase path analysis:

If you are all set and ready to go on your first-class journey with Big Data Analytics, then check in here. While ‘in flight’, treat yourself to a ‘cocktail’ of analytical functions from a wide-ranging selection of 70+ pre-built SQL-MR functions.

Travel smart, impress your accompanying business stakeholder, double your rewards from analytical outcomes and enjoy your journey with Big Data Analytics! By the way, don’t forget to drop me a note if you found this useful! Bon voyage!

Sundara Raman is a Senior Communications Industry Consultant at Teradata ANZ. He has 30 years of experience in the telecommunications industry that spans fixed line, mobile, broadband and Pay TV sectors. At Teradata, Sundara specialises in Business Value Consulting and business intelligence solutions for communication service providers.

 

Open Access to Big Data a Major Driver of Value for eBay

Posted on: May 14th, 2013 by Ross Farrelly

 

At Teradata’s Big Data Analytics Summit, held recently in Sydney, Alex Liang of eBay (Director of the offshore Analytics Platform and Delivery) presented on eBay's big data ecosystem. His description has to be taken in the context of eBay: a company whose business is its website, a marketplace on which it aims to match customers’ desires with sellers’ products. This takes place on a massive scale, with a requirement for 99.9+% availability.

Nevertheless, despite this challenging environment, eBay is committed to the democratisation of data, that is, making data available to a large number of employees to query, predict and experiment with. To this end it has a decentralised data management system that allows employees, with a great deal of autonomy, to create a virtual datamart, ingest data, identify a trend or gain an insight, form a hypothesis, design an experiment to test that hypothesis, implement that experiment on the eBay website (via A/B testing), measure the results and undo the changes if necessary. However, with freedom comes responsibility, so employees using data in this way are also responsible for the results they generate from their analyses.
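The ‘measure the results and undo the changes if necessary’ step is often a significance test on the A/B split. The sketch below uses a standard two-proportion z-test; the conversion counts and the 5% threshold are hypothetical, and nothing here is taken from eBay's actual tooling.

```python
import math

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (difference in conversion rate, z, two-sided p-value) for B vs A."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return p_b - p_a, z, p_value

def keep_or_undo(conv_a, n_a, conv_b, n_b, alpha=0.05):
    """Keep the change only if B converts better and the result is significant."""
    diff, _z, p = two_proportion_z_test(conv_a, n_a, conv_b, n_b)
    return "keep" if diff > 0 and p < alpha else "undo"

print(keep_or_undo(100, 1000, 130, 1000))  # keep (p is roughly 0.035)
print(keep_or_undo(100, 1000, 105, 1000))  # undo (difference is not significant)
```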

Can this approach be applied to other companies? As a general principle it can: the philosophy of allowing a larger number of users access to the valuable data held by a company can, if implemented well, lead to outstanding results. Allowing users to run free on the data (within certain well-defined limits, of course) and only reining them in when they approach those limits can allow companies to exploit the value of big data far more effectively. There is a profound philosophical difference between giving users wide access to data and placing restrictions only where needed, as opposed to starting with very limited access and adding to it if and only if there is a compelling business need (somewhat analogous to the difference between continental and English common law systems).

Has this philosophy of making data as widely available as possible taken root in Australia? Based on the number of questions asked during the conference about how to restrict access and monitor behaviour, I would say we still have a long way to go. Of course, free access to data has to be balanced with appropriate restrictions, and Alex outlined eBay’s approach to implementing them, including permissions, automated monitoring, automated retirement of cold data-marts, productionisation of hot data-marts and the need to pass an exam to get access to Teradata. But it is telling that most questions were about how to manage restrictions rather than about the benefits of open access to data.
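Automated retirement and productionisation can be as simple as a usage-based rule run on a schedule. A minimal sketch; the thresholds, mart names and usage figures are all hypothetical, and the talk did not describe eBay's actual rules.

```python
from datetime import date

def classify_datamart(last_queried, queries_last_30d, today, cold_days=90, hot_queries=500):
    """Hypothetical policy: retire cold marts, flag hot ones for productionisation."""
    idle_days = (today - last_queried).days
    if idle_days > cold_days:
        return "retire"
    if queries_last_30d >= hot_queries:
        return "productionise"
    return "keep"

today = date(2013, 5, 14)
marts = {  # name: (date last queried, queries in the last 30 days)
    "mart_promo_test": (date(2013, 1, 2), 0),
    "mart_search_kpis": (date(2013, 5, 13), 2400),
    "mart_adhoc_fx": (date(2013, 4, 30), 35),
}
decisions = {name: classify_datamart(last, q, today) for name, (last, q) in marts.items()}
print(decisions)
```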

Another interesting use of data at eBay is the meta-analysis of the queries being submitted. Alex described a program that uses Python to analyse the queries submitted to Teradata, Singularity and Hadoop. The aim is to identify sub-optimal queries, but also to identify commonly requested information and develop more efficient ways to deliver it to users. This is an example of a growing trend of data generating data: the data in the warehouse, for example, indirectly causes users to write queries, which themselves become data that can be analysed.
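A sketch of what such query meta-analysis might look like in Python. The normalisation rules and the `SELECT *` heuristic for ‘sub-optimal’ are assumptions for illustration, not a description of eBay's program, and the sample log is made up.

```python
import re
from collections import Counter

def fingerprint(sql):
    """Normalise a query so runs differing only in literals group together."""
    s = sql.strip().lower()
    s = re.sub(r"'(?:[^']|'')*'", "?", s)     # string literals
    s = re.sub(r"\b\d+(?:\.\d+)?\b", "?", s)  # numeric literals
    return re.sub(r"\s+", " ", s)             # collapse whitespace

def analyse(queries, top_n=2):
    """Return the most common query shapes and the queries flagged as suspicious."""
    shapes = Counter(fingerprint(q) for q in queries)
    flagged = [q for q in queries if re.search(r"select\s+\*", q, re.IGNORECASE)]
    return shapes.most_common(top_n), flagged

log = [
    "SELECT item_id, price FROM listings WHERE site_id = 15",
    "select item_id, price from listings where site_id = 3",
    "SELECT * FROM clicks WHERE dt = '2013-05-01'",
    "SELECT item_id,  price FROM listings WHERE site_id = 77",
]
common, flagged = analyse(log)
print(common[0])  # ('select item_id, price from listings where site_id = ?', 3)
print(flagged)
```

A frequently repeated shape is a candidate for a pre-built summary table or report; flagged queries are candidates for tuning advice.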

Ross Farrelly is the Chief Data Scientist for Teradata ANZ who is responsible for data mining, analytics and advanced modeling projects using the Teradata Aster platform. He is a six sigma black belt and has had many years of experience in a variety of statistical roles.

 

Live Blog: eBay – Know What Your Customers Want

Posted on: May 9th, 2013 by Greg Taranto

 

Alex Liang from eBay (Session 2 – Know what your customers want)

Alex took us through eBay’s core business model: creating markets and bringing buyers and sellers together. He talked us through the legacy approach of matching buyer searches with merchant listings, then showed us how eBay applied new behavioural analytics to improve on the results of the legacy approach.

eBay is all about connecting consumers and merchants. eBay is just a website backed by a bunch of servers running code. Merchants come to the website and list items, and buyers come, find listings and buy: a great business model!!

But how does eBay continue to draw in more merchants and buyers and provide them with a positive experience?

By helping buyers find the products they are looking for, and merchants sell as much as they can, quickly.

eBay does not really know about the products that merchants are listing; all it has is a description. eBay had to create its own category and brand hierarchies from listing descriptions, which is pretty cool given that many items are unbranded (especially in China).

eBay usually knows very little about its buyers. What are they really looking for? What brought them to eBay? What price sensitivities does each have? Alex used the analogy of a game of Pictionary in which buyers and sellers draw free-form pictures and eBay has to “guess” what they are drawing. That is a really good way of thinking about most business problems that need analysis or analytics to fill in the unknowns: you look at the drawings (or footprints), start guessing and eventually solve the problem. The degree to which eBay solves this problem, accurately determining buyer needs and merchant product attributes, defines its business success, so it is critical to eBay.

The legacy method eBay used to match buyers to merchants is called Keyword Co-occurrence.  Fundamentally this involved:

  1. Taking the buyer search and removing all the noise to come up with relevant keywords.
  2. De-noising the listings in the same way and pulling out key terms and brands.
  3. Matching the keywords in the searches with the listing brands and category names, and performing a brand/category->token mapping.
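A toy version of the three steps, assuming a hypothetical stop-word list and brand dictionary. The talk did not describe eBay's implementation, so this only illustrates the general shape of keyword co-occurrence matching.

```python
import re
from collections import Counter, defaultdict

STOPWORDS = {"a", "an", "and", "the", "for", "new", "cheap", "free", "shipping", "with"}
KNOWN_BRANDS = {"canon", "nikon", "apple"}  # hypothetical brand dictionary

def tokens(text):
    """Steps 1 and 2, de-noising: lowercase, keep alphanumeric words, drop stop words."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS]

def brand_token_map(listings):
    """Step 3a: count how often each non-brand token co-occurs with each brand."""
    cooc = defaultdict(Counter)
    for title in listings:
        toks = tokens(title)
        for brand in (t for t in toks if t in KNOWN_BRANDS):
            for t in toks:
                if t not in KNOWN_BRANDS:
                    cooc[t][brand] += 1
    return cooc

def match_search(query, cooc):
    """Step 3b: score brands for a search by summing its keywords' co-occurrences."""
    scores = Counter()
    for t in tokens(query):
        scores.update(cooc.get(t, {}))
    return scores.most_common()

listings = [
    "Canon EOS 600D DSLR camera with lens",
    "Nikon D3100 DSLR camera",
    "Canon PowerShot compact camera",
    "Apple iPod nano 8GB",
]
cooc = brand_token_map(listings)
print(match_search("cheap dslr camera for beginners", cooc))  # [('canon', 3), ('nikon', 2)]
```

The weakness the talk points to is visible here too: the scores depend entirely on how well merchants describe their listings.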

This provides a very logical, pragmatic way of matching buyers and merchants, and it was very successful. But there will always be misses where merchant descriptions are low-quality and buyer searches are immature.

The new approach (method 2) that Alex outlined uses historical buyer behaviour and merchant sales to direct the search-to-listing matching. It relies on event counting: counting the events that relate search words to particular merchants or listings, and through them to brands. This seemed like a sort of affinity-scoring technique for tying search keywords to listing brands and so on.
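The talk described this method only at a high level. As one possible reading, here is a minimal event-counting sketch in which each event is a hypothetical (search term, brand of the listing the buyer went on to buy) pair, and a term's affinity for a brand is that brand's share of the term's events.

```python
from collections import Counter, defaultdict

def affinity_scores(events):
    """Affinity of a search term for a brand = that brand's share of the
    purchases that followed searches for the term."""
    counts = defaultdict(Counter)
    for term, brand in events:
        counts[term][brand] += 1
    return {
        term: {brand: n / sum(c.values()) for brand, n in c.most_common()}
        for term, c in counts.items()
    }

# Hypothetical (search term, brand bought after that search) events.
events = [
    ("dslr", "canon"), ("dslr", "nikon"), ("dslr", "canon"), ("dslr", "canon"),
    ("ipod", "apple"),
]
scores = affinity_scores(events)
print(scores["dslr"])  # {'canon': 0.75, 'nikon': 0.25}
```

Unlike the keyword approach, these scores come from what buyers actually did, so they still work when a listing's description is poor.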

Just some of the benefits of this new approach:

  • Increasing inventory for hot products: identifying which products are in high demand so that merchants can stock more of them.
  • Reducing inventory for cold products.
  • Sourcing new products for items that are being searched for but are not held in inventory by merchants.
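A sketch of how those three benefits might be derived from search and inventory counts. The searches-per-listing ratio, its thresholds and the sample figures are assumptions for illustration; the talk did not describe how eBay draws these lines.

```python
def classify_demand(searches, listings, hot_ratio=5.0, cold_ratio=0.5):
    """Label each product by searches per active listing (thresholds are hypothetical)."""
    labels = {}
    for product in set(searches) | set(listings):
        s, l = searches.get(product, 0), listings.get(product, 0)
        if l == 0:
            labels[product] = "source" if s > 0 else "ignore"  # demand but no supply
        elif s / l >= hot_ratio:
            labels[product] = "hot"
        elif s / l <= cold_ratio:
            labels[product] = "cold"
        else:
            labels[product] = "balanced"
    return labels

searches = {"drone": 900, "dvd player": 20, "tablet": 300, "smartwatch": 150}
listings = {"drone": 100, "dvd player": 400, "tablet": 120}
print(sorted(classify_demand(searches, listings).items()))
```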

Alex then reminded us: "A lot of things that were impossible 5 years ago are easy today". This new method is a case in point: five years ago, the tools and technologies to perform this sort of deep behavioural analytics on eBay’s massive volumes of data were not available.

Alex also said: "Forget about the complex.  These days usually the simple analytic functions can bring amazing value".

Thanks again Alex for this deep walkthrough of how eBay achieves extreme optimisation of its core business function.

Greg Taranto is a Pre-Sales Consultant at Teradata ANZ. Greg specialises in designing and tailoring Data Warehouse solutions for organisations across many industries. Greg's extensive background in Data Warehouse Architecture, Design and Implementation, along with his business solutions experience, allows him to bring many worlds together to achieve optimal results for Teradata's customers and prospective customers.

 

Mark Fazackerley, the MicroStrategy VP of Australia and New Zealand, is presenting at the Teradata Big Data Summit 2013 at Crown in Melbourne. Mark is presenting on the MicroStrategy Platform to Support Big Data.

Mark is covering Big Data and the use of MicroStrategy to meet its demands, with a specific example: a MicroStrategy project using data from a large population of Facebook users, which can be analysed and enriched to derive further insights and trends.

Mark sees an emerging fifth wave of IT. IT has developed from mainframe computing to minicomputing and then personal computing; in recent times the internet has been the fourth wave, and more recently mobile has become the fifth.

This is happening in tandem with wider trends and disruptive technologies, primarily:

-       The explosion of Big Data

-       Mobile Wave

-       Migration of information to the Cloud

-       Emergence of application networks

Focusing on Big Data and mobile: we use our smartphones 75% of the time for browsing and interacting with apps rather than for voice calls, so we are interacting more with each other and with businesses. Mark asks how, therefore, we grapple with the deluge of more sophisticated systems and with this accelerated growth while still getting value from the information.

Mark asks whether using ‘big data’ and data to drive your business is a differentiator, and quotes 2012 research published in HBR stating that companies using a data-driven approach are 5% more productive and 6% more profitable; so using data to drive your organisation absolutely does make a difference.

MicroStrategy sees the data analysis continuum moving from standard reporting to ad-hoc analysis and dashboards, through to data mining and predictive analysis. This, combined with rising self-service expectations from business users, means we have to prepare for this ‘new norm’.

The three steps necessary for big data to succeed in this new norm are:

  1. Multiple data sources – to provide agile, responsive access to data
  2. Prediction and optimization models – focus on the biggest drivers in performance
  3. Organisational transformation – create simple understandable tools and processes

MicroStrategy sees an upsurge in visualisation techniques for data discovery: businesses need intuitive visualisation to make data understandable and accessible to business users, and MicroStrategy helps users quickly discover insights in their data using compelling visualisations. The techniques that make this widely available include bubble maps, map networks, heat maps, network visualisation, graph matrices and density maps.

Mark provides an example of a big data project MicroStrategy has been involved with, which looks at data within Facebook, where the data assets are of key value. Large amounts of Facebook data were used, and they can be grouped into four categories: demographic data, the universal registry (contacts), the social graph and the interest graph (likes).

Some of the analysis MicroStrategy performed showed that interest analysis of Facebook data can be truly powerful for businesses: affinity calculations reveal the strength of relationships across social networks. Another area was comparative analysis, which lets you gauge similarities and differences between multiple customer segments. The MicroStrategy dashboards illustrated aspects such as fans of your brand versus those of competitors, and the ability to assess which retailers are geographically aligned with their target customers.
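The session did not say how MicroStrategy computes affinity. One common measure is lift, sketched below over a hypothetical set of users' likes; comparing your brand's fans with a competitor's against the same interest gives a simple form of comparative analysis.

```python
def lift(likes, a, b):
    """Lift of pages a and b: how much more often they are liked together than
    if likes were independent (above 1 suggests affinity, below 1 the opposite)."""
    n = len(likes)
    p_a = sum(a in s for s in likes.values()) / n
    p_b = sum(b in s for s in likes.values()) / n
    p_ab = sum(a in s and b in s for s in likes.values()) / n
    return p_ab / (p_a * p_b) if p_a and p_b else 0.0

likes = {  # hypothetical user -> set of liked pages
    "u1": {"YourBrand", "Surfing"},
    "u2": {"YourBrand", "Surfing", "Cooking"},
    "u3": {"Competitor", "Cooking"},
    "u4": {"Competitor", "Surfing"},
}
for brand in ("YourBrand", "Competitor"):
    print(brand, round(lift(likes, brand, "Surfing"), 2))  # 1.33, then 0.67
```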

Mark concluded that MicroStrategy can provide the necessary software enablers to better visualise and discover the value of the data within your organisation.

Steven Lawton is a Senior Solution Architect for Teradata based in Melbourne, where he is responsible for technology and architecture within Teradata across Australia and New Zealand.

eBay's Alex Liang (Director of the offshore Analytics Platform and Delivery team based in Shanghai) was the keynote speaker at today's Teradata Big Data Analytics 2013 conference. Alex has a unique view of one of the world's largest analytics and data platforms, and his views and experiences are likely to be leading indicators for the rest of us as our own analytics platforms grow over time.

Alex talked about how eBay has taken the lessons learnt in managing their massive website (with more than 50,000 product categories and more than $3,500 worth of products sold every second) and turned that learning on their internal analytics capability.

eBay has grown into the world's largest online marketplace, with more than 50,000 product categories sold through its platform. Throughout its history, eBay has continually refined its website design to be as user-friendly as possible. Most recently, eBay released a capability that analyses each user's previous behaviour on the website to predict which categories and products are most likely to interest them, and pushes those products to that user's home page when they next return.

Turning that kind of thinking inwards, eBay discovered a similar level of complexity in its reporting and analytics world. eBay has three BI platforms, each positioned to support a particular type of analytics, and more than five primary BI presentation tools. In total, the integrated analytics environment has more than 100,000 data elements, more than 90PB of stored data and tables with more than 3.5 trillion rows. This environment was not easy to navigate for eBay's 12,000 internal BI users, who range from data scientists to casual users who just want regular reports.

Whilst some organisations might have approached the problem by clamping down (dictating BI standards, building controls and restricting usage), eBay's culture is to provide data to the people who can best use it, but in a supported way.

The solution eBay implemented to resolve the analytics complexity has three aspects:

1)   eBay confirmed a multi-platform strategy. They use their EDW platform for corporate BI standard reporting, their 40PB+ Discovery platform (ironically named "Singularity") for website behaviour analytics, and their 40PB+ Hadoop cluster for technical analytics such as counterfeit detection, image classification and related non-SQL analytics.

2)   eBay built a Data Hub to provide a central information portal for access to all analytics and information, regardless of which BI platform supports it. This portal has been configured to drive collaboration between analysts through explicit sharing of the analytics built by anyone in the company. It provides definitional information about each report and can be searched or browsed by category. The web design borrows heavily from the eBay website, and the lessons learnt there have made it easier for eBay's analysts to find the reports they are looking for.

3)   eBay developed an integrated dashboard hub. Every dashboard that has been built is available to any authorised user, to drive re-use of innovations. Dashboard metrics share common definitions, common tools are used across all dashboards for ease of use, and wiki sites provide metric definitions, including data lineage and the SQL queries used to process the data. Data visualisation is a key focus, as is the ability to drill into details when appropriate.
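The Data Hub and dashboard hub described above combine a searchable, browsable catalog with definitional information: metric definitions, lineage and the SQL behind each metric. A minimal sketch of such a catalog follows; the class and field names are hypothetical and do not describe eBay's actual implementation.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    """One report or dashboard in a hypothetical analytics catalog."""
    name: str
    category: str
    platform: str  # which BI platform serves it, e.g. "EDW", "Discovery", "Hadoop"
    description: str
    metric_sql: dict[str, str] = field(default_factory=dict)  # metric name -> defining SQL
    lineage: list[str] = field(default_factory=list)  # upstream source tables


class Catalog:
    def __init__(self) -> None:
        self.entries: list[CatalogEntry] = []

    def add(self, entry: CatalogEntry) -> None:
        self.entries.append(entry)

    def browse(self, category: str) -> list[CatalogEntry]:
        """Browse: every entry in one category."""
        return [e for e in self.entries if e.category == category]

    def search(self, keyword: str) -> list[CatalogEntry]:
        """Search: case-insensitive match on name, description or metric names."""
        kw = keyword.lower()
        return [
            e for e in self.entries
            if kw in e.name.lower()
            or kw in e.description.lower()
            or any(kw in metric.lower() for metric in e.metric_sql)
        ]
```

Because each entry records its platform, an analyst can find a report by category or keyword without first knowing which of the BI platforms serves it.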

eBay, like all of us, is on a journey and has not reached the end. Alex Liang's view of the future of analytics includes:

- Adding in-memory processing to further extend the existing multi-platform architecture. At eBay, in-memory processing will provide restricted data sets at memory speed for very rapid responses to predictable queries.

- Developing machine learning techniques to drive further value from the massive store of detailed data will be a key focus.

- Self-service BI remains at the heart of the eBay culture. Providing the data and analytical capability to those who drive the business is a core objective and will continue.

- Collaboration between analysts will drive extra value by leveraging innovation that might otherwise be isolated in a single department.

- The future will be live, not looking back at the past.  Real time data loading and real time analytics, coupled with forecasting and predicting future events, will lead to even higher value being delivered by the analytics platforms.

Alex Liang's presentation makes it clear that eBay has learnt the techniques required to tame complexity and has developed an analytics infrastructure that is accessible to all users and provides the right reports, metrics and tools to deliver self-service BI to the entire organisation.

David Stewardson is a Senior Consultant in the Teradata Solutions Group. He has a very strong technical background and business acumen, with over 23 years' experience in the Data Warehouse business, specialising in Business Intelligence. During his extensive career, he has worked in six countries across eight different industries (including Mining, Finance and Insurance, Utilities and Telecoms) and has managed teams ranging from five to 150 people in previous Business Analyst, Project Manager, Program Manager and Program Director roles.

In the medical field, ophthalmologists and optometrists perform visual acuity tests to check vision as well as the structures and functions of the human eye. The metric 20/20 represents normal vision and indicates the sharpness or clarity of vision at a distance. 20/40, on the other hand, indicates reduced distance vision, commonly caused by short-sightedness (myopia).

It is now time for a vision test to check whether your technology strategy for data warehousing and analytics is aligned with your enterprise business strategy. Why now?

Industry convergence is rapidly changing market dynamics. Making sense of these changes requires a clear vision and the ability to make the best possible decision at the crossroads. Dynamic changes are also happening in data warehousing, where Big Data technologies are playing a crucial role in the convergence of traditional data warehousing into a Unified Data Architecture (UDA) that supports discovery and advanced analytics. As history shows, failing to recognise the impact of such change often leads to myopia, resulting in wrong decisions and ultimately the demise of the enterprise. Let's see how.

eCommerce (Electronic Commerce) and mCommerce (Mobile Commerce) are where rapid convergence is taking place now, driven by the convenience of broadband-connected smart devices, which is changing customers' attitudes and behaviours. Smart devices provide a window to the world of mobile commerce! Interestingly, eCommerce and mCommerce are also driving the rapid rate at which huge volumes of data are being generated, creating the need for Big Data Analytics. Traditional retailers, banks and telcos are all trying to get to grips with this change as they contend for a share of the customer's wallet while protecting their market share from adjacent-market and non-traditional players such as eBay, Amazon and Google.

Let's look at how this convergence is affecting the traditional ecosystem, in which the ATM has been the bankers' window to their customers, the POS terminal the retailers' window, and the TV the window for broadcasters and Pay TV providers.

Visualise a scenario in which the strategic planners of mature industries ponder the future while their customers walk into the retail store, take their smartphones out of their pockets, scan the bar code of an item meticulously stacked on the shelf, compare its price on eBay, watch a YouTube demo of the product, decide to buy it 5% cheaper on eBay and pay for it through PayPal – all within the comfort of the brick-and-mortar shop – and walk out of the store happy with their frictionless Over The Top (OTT) shopping experience! Not to be left out, the telco's strategists watch the top line and bottom line of the balance sheet dwindle while the OTT players gain their customers' mindshare, leaving the telco as a mere "dumb pipe" carrier!

Sounds grim! Yes, it is real: you only need to look at eBay's RedLaser, Amazon's Price Check, Square's non-merchant payment system, and Venmo, which enables splitting a restaurant bill at a social dining occasion and peer-to-peer money transfer leveraging the Facebook friends circle, to get a sense of how these are shaping the future and shaking up the industry ecosystem. All this happens on a single window to the world of commerce that is in the hand of the consumer: the smartphone!

It was probably unthinkable that a 100+ year-old company such as Kodak, the Apple or Google of its time, could be on the verge of collapsing under its own weight. In 1975, Kodak invented the digital camera; that same year, Harvard Business Review republished Theodore Levitt's famous 1960 article Marketing Myopia, but Kodak executives did not appear to have read it! It is a pity that Kodak did not see the success of its own innovation and left it to its competitors to claim the glory (possibly because Kodak thought it was still in the chemical business and failed to see the shape of things to come!). Kodak was not alone; Polaroid, Borders, Blockbuster et al. have been recent victims of their own myopia!

Theodore Levitt discusses the fateful times faced by several industries in the decades before his article (e.g. railroads, electric utilities, grocery store chains and Hollywood).

Several industries are at the crossroads now, more than ever before. What business is the telco in? Communications, media, retail or banking? For that matter, what business are the brick-and-mortar retailers in? More importantly, what signals are they getting from their consumers about changing behaviours? Are these signals falling on deaf ears or going over the top? There is always someone else listening attentively to the signals and taking timely action. This time around it is the OTT players who are paying attention and taking charge!

Businesses have a choice. Looking into the crystal ball is an option, but it is sure to miss the fingerprints your customers leave behind with every interaction with your company. Which way you go depends on your ability to connect the dots! Looking at what happened is useful, but analysing why it happened, and more importantly why it did not happen compared to industry norms, will provide greater foresight! Checking the rear-view mirror often helps you notice blind spots, gauge the frustration your customers may be experiencing during the long bumpy ride with your business, and perhaps stop them jumping off the bus at the crossroads!

For those in the industry who deny the existence of a problem, and for the nay-sayers who ignore significant trends in their business and technology landscape, I would like to recall a quote by Theodore Levitt: "If thinking is an intellectual response to a problem, the absence of a problem leads to absence of thinking". No doubt the agility and discovery that come with Big Data Analytics help uncover problems before they surface. Big Data Analytics is not only intellectually stimulating but also promises to reveal obstacles on the road ahead for a safer driving experience – or even for survival!

Want to align your business and technology strategy? Are you at cross-roads on Big Data? Interested in 20/20 vision on Big Data? Betting on your bid for Big Data? Why not attend the Big Data Summit to see and hear for yourself how the adjacent market leader eBay is enabling business innovation with Big Data and extreme Analytics?

Melbourne - May 7th, 2013

Sydney - May 9th, 2013

Sundara Raman is a Senior Communications Industry Consultant at Teradata ANZ. He has 30 years of experience in the telecommunications industry that spans fixed line, mobile, broadband and Pay TV sectors. At Teradata, Sundara specialises in Business Value Consulting and business intelligence solutions for communication service providers.

During the recent Teradata EMEA Universe data warehousing conference, it was announced that after an extensive search, SIEMENS has selected Teradata as their strategic partner to deliver big data reporting platforms for SIEMENS products in the Smart Grid arena.

This was widely announced at the time (see the Siemens news release, video interview and Teradata press release). What wasn't discussed much then is what solutions are available now. I've been lucky enough to visit the Siemens Smart Meter office in Vienna and have a first-hand demonstration of the first two use cases delivered by the Siemens/Teradata partnership.

The first use case is based on integrating data from SIEMENS' Outage Management System (OMS, part of the SIEMENS SCADA Suite) into the Teradata data warehouse (refer to the text below if you would like to know more about what a SCADA system is). Teradata and SIEMENS have loaded this data into Teradata tables designed using the Teradata Utilities Logical Data Model as a reference. Additionally, network asset information from the OMS has been integrated into the same data model. Combining SIEMENS' knowledge of the value of SCADA with Teradata's operational and analytic reporting knowledge, the partners have developed reports including detailed SCADA-based outage reporting and network status analytics.

The second use case is based on the data available in the SIEMENS Smart Meter Data Management System, EnergyIP. EnergyIP receives meter data (both events and meter reads) from the eMeter Meter Head-End system and other sources, and supports data validation, data integration and data storage functions. It also supports passing data to an external Analytics Foundation environment. If a utility decides to use Teradata as a scalable, high-performance data integration engine under the Analytics Foundation, reports such as revenue leakage, SLA compliance for meter-reading responses, network event analysis, outage analysis and load monitoring become available essentially out of the box, with a much shorter time to value than a standard reporting project.

Combining the two use cases in a single Teradata environment would allow unparalleled analysis of events, load monitoring and network status reporting by integrating back-end SCADA information with front-end smart meter information: long-term trend identification, real-time event alerting and regular network analysis, all on one integrated platform.

I am eagerly waiting to see what use case number 3 will deliver for even more business value!
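"SLA compliance for meter reading responses" is one of the reports listed above. As an illustration of what such a report computes, the sketch below returns the share of on-demand read requests answered within an SLA threshold. The record layout and the 4-hour threshold in the usage line are assumptions for the example, not EnergyIP's data model.

```python
from datetime import datetime, timedelta
from typing import Optional


def sla_compliance(requests: list[tuple[datetime, Optional[datetime]]],
                   threshold: timedelta) -> float:
    """Percentage of read requests answered within `threshold`.

    Each item is (requested_at, responded_at). A responded_at of None means
    the meter never answered, which is counted as an SLA breach.
    """
    if not requests:
        return 100.0  # nothing was requested, so nothing was breached
    met = sum(
        1 for requested, responded in requests
        if responded is not None and responded - requested <= threshold
    )
    return 100.0 * met / len(requests)


# Usage with a hypothetical 4-hour SLA:
# sla_compliance(read_requests, timedelta(hours=4))
```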
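To make the value of integrating back-end SCADA data with front-end meter data concrete, here is a minimal sketch that matches smart meter power-off events to SCADA outage windows on the same feeder, separating events explained by a known network outage from those worth investigating. The record shapes and the feeder key are hypothetical, not the Teradata Utilities Logical Data Model.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Outage:
    """Hypothetical outage record from the SCADA/OMS side."""
    feeder: str
    start: datetime
    end: datetime


@dataclass
class MeterEvent:
    """Hypothetical power-off event from the smart meter side."""
    meter_id: str
    feeder: str
    at: datetime


def unexplained_events(events: list[MeterEvent],
                       outages: list[Outage]) -> list[MeterEvent]:
    """Return meter events not covered by any outage window on the same feeder."""
    by_feeder: dict[str, list[Outage]] = {}
    for outage in outages:
        by_feeder.setdefault(outage.feeder, []).append(outage)
    return [
        e for e in events
        if not any(o.start <= e.at <= o.end for o in by_feeder.get(e.feeder, []))
    ]
```

On a warehouse this would typically be a join on feeder plus a time-range condition; the Python version only shows the logic.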

What is a SCADA System? 

SCADA, for those who have not come across it before, stands for Supervisory Control and Data Acquisition. A SCADA system is the heart of the Smart Grid DMS (Distribution Management System) architecture. SCADA uses field-based devices to measure a wide range of operational metrics (voltage, amps, temperature, operational status etc.) and passes them back to a central hub for operational oversight and management of a smart grid network.

What is a Meter Head-End System?

A head-end system is hardware and software that receives the stream of meter data (meter events and meter readings) brought back to the utility through the smart meters. Head-end systems may perform a limited amount of data validation before either making the data available for other systems to request or pushing the data out to other systems.
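As an illustration of the "limited amount of data validation" a head-end might perform before pushing data on, the sketch below assumes cumulative register reads (kWh), which should never be negative or go backwards for a given meter. The validation rules and the callback-style push are assumptions for the example, not any vendor's head-end behaviour.

```python
from typing import Callable

# Hypothetical read layout: (meter_id, timestamp in epoch seconds, cumulative kWh)
Read = tuple[str, int, float]


def validate_and_push(reads: list[Read],
                      subscribers: list[Callable[[Read], None]]) -> list[Read]:
    """Check reads, push the valid ones to each subscriber, return rejected reads."""
    last_value: dict[str, float] = {}
    rejected: list[Read] = []
    for read in sorted(reads, key=lambda r: (r[0], r[1])):
        meter_id, _, value = read
        # A cumulative register must be non-negative and must not go backwards.
        if value < 0 or value < last_value.get(meter_id, 0.0):
            rejected.append(read)
            continue
        last_value[meter_id] = value
        for push in subscribers:
            push(read)
    return rejected
```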

RSVP for the Teradata ANZ 2013 Big Data Summit featuring a keynote address from eBay. For further event details and to RSVP click on the below links.

Melbourne - May 7th, 2013

Sydney - May 9th, 2013

David Stewardson is a Senior Consultant in the Teradata Solutions Group. He has a very strong technical background and business acumen, with over 23 years' experience in the Data Warehouse business, specialising in Business Intelligence. During his extensive career, he has worked in six countries across eight different industries (including Mining, Finance and Insurance, Utilities and Telecoms) and has managed teams ranging from five to 150 people in previous Business Analyst, Project Manager, Program Manager and Program Director roles.

10 Steps to Making Big Data Work For You

Posted on: April 18th, 2013 by Sundara Raman

Recent research by Analysys Mason found that a good proportion of telecommunication companies (telcos) do not have a strategy for Big Data. This is probably because, as with any new technology, Big Data initiatives (or the lack of them!) suffer their own share of fear, uncertainty and doubt (FUD). Here are some lessons learnt from a recent pilot project I led, which may shed some light on how to remove the FUD and start realising the benefits of Big Data. The project involved the complexity and data velocity of several terabytes of network data per day in a telco's next-generation mobile network. The lessons learnt apply equally to similar Big Data projects in any industry.

Step 1: Experiment with Big Data to Gain Useful Insights

As with a heap of Lego® bricks, the data in Big Data may appear to be a jumbled, meaningless pile. However, much like Lego® bricks, Big Data offers unlimited, open-ended possibilities for assembling and connecting the pieces in many ways to provide meaningful insights of value to the business. Start your Big Data initiative with the confidence that you will gain useful insights!

Step 2: Find a Business Sponsor with a Vision for Big Data

Big Data projects are not about technology for the sake of technology. Their success depends on finding business value. Look inside your organisation for a business sponsor with a vision of turning what appears to be worthless data into a valuable asset. The project I was involved in had a few business champions in Customer Experience Management, Network Planning and Operations who were its driving force. The sponsors' goal was to deliver a superior customer experience by understanding patterns of service usage and behaviour on the mobile network.

Step 3: Identify Business Value from Use Cases

If you are on the technology side of Big Data, you will need to communicate its capabilities in a way the business sponsors will understand. The best way I found was to identify business use cases likely to deliver business value. We developed an initial set of 15 use cases that directly addressed problems faced by the telco that could be solved with Big Data technologies. Each use case described the business problem it would solve, the data sources needed, the potential business value, and so on. The sponsor then selected 3 primary use cases out of the 15 for this project.
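The use case template above (business problem, data sources, potential value) maps naturally onto a small record type. The post does not say how the sponsor chose 3 of the 15, so the value-to-effort ranking below is purely a hypothetical illustration, as are the `estimated_value` and `effort` fields.

```python
from dataclasses import dataclass


@dataclass
class UseCase:
    name: str
    business_problem: str
    data_sources: list[str]
    estimated_value: float  # hypothetical: sponsor's estimate of annual benefit
    effort: float  # hypothetical: relative delivery effort, must be > 0


def shortlist(cases: list[UseCase], k: int = 3) -> list[UseCase]:
    """Return the k use cases with the highest value-to-effort ratio."""
    return sorted(cases, key=lambda c: c.estimated_value / c.effort, reverse=True)[:k]
```

Capturing use cases in a consistent structure like this makes the trade-offs visible to sponsors, whatever selection criteria they actually apply.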

Step 4: Don’t be Afraid of Making Mistakes

You may be experimenting with data you have not worked with before! Experimentation, or discovery, allows you to 'fail fast' early in the project, so costly mistakes are avoided later, when they would incur a great deal of expense. The great thing about discovery is that anything constructed can be taken apart, and the data and pre-built functions can be reused to develop other use cases. As we experimented with voice call detail records, we found that the same data could be used for fraud prevention analytics, which the organisation found immensely valuable.

Step 5: Use SQL-MapReduce Framework for Time-to-Market  

You are probably not doing justice to a Big Data project without involving Apache Hadoop and MapReduce! While MapReduce is good for solving Big Data problems, we found SQL-MapReduce (SQL-MR) provided an excellent framework for jump-starting the project, with substantial benefits: 3 times faster development, 5 times faster discovery and 35 times faster analytics. Why struggle with ground-up MapReduce development when SQL-MR provides a framework for accessing pre-built MapReduce functions with SQL, using any standard BI tool the business analysts are familiar with? If you have to develop a new function, SQL-MR again provides an excellent framework for customising or developing functions easily with an Integrated Development Environment (IDE).
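For readers new to MapReduce, the sketch below shows the programming model in plain Python: a map phase emits key/value pairs, a shuffle groups them by key, and a reduce phase aggregates each group. It counts calls and minutes per subscriber from call detail records. It is not the Hadoop or SQL-MR API, and the CDR field names are hypothetical.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_cdr(cdr: dict) -> Iterator[tuple[str, tuple[int, float]]]:
    """Map phase: emit (subscriber, (1 call, minutes)) for one call detail record."""
    yield cdr["caller"], (1, cdr["duration_s"] / 60)


def reduce_calls(values: Iterable[tuple[int, float]]) -> tuple[int, float]:
    """Reduce phase: total calls and minutes for one subscriber."""
    calls, minutes = 0, 0.0
    for n, m in values:
        calls += n
        minutes += m
    return calls, minutes


def map_reduce(cdrs: Iterable[dict]) -> dict[str, tuple[int, float]]:
    # Shuffle phase: group every mapped value by its key.
    groups: dict[str, list[tuple[int, float]]] = defaultdict(list)
    for cdr in cdrs:
        for key, value in map_cdr(cdr):
            groups[key].append(value)
    # On a cluster, each key's group would be reduced in parallel on some worker.
    return {key: reduce_calls(values) for key, values in groups.items()}
```

Frameworks such as SQL-MR aim to hide this plumbing behind SQL, which is the time-to-market argument made above.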

Step 6: Don't Throw Away Age-Old Data Governance Principles

Big Data is generally associated with unstructured data, no schema, etc. Unstructured data or no schema does not translate to 'cannot manage' or 'do not need to manage'! On the contrary, we found this all the more reason to apply the age-old data governance principles we have learnt with structured data. In fact, as we entered the pilot project, we knew full well that what we discovered would be put into production. Therefore, right from the start, we established architecture principles for seamlessly integrating Big Data with the Enterprise Data Warehouse, prevailing Data Integration tools and the surrounding control framework.

The result: reduced total cost of ownership (TCO), investment protection and the best return on investment (ROI). We had several pre-built SQL-MR functions for data transformation at our disposal that could be used for extract, load and transform (ELT). These included a Log Parser for Apache web log formats. We made use of some of these functions and reserved the Log Parser for the next phase of the project, which will deal with web data.
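The post mentions a pre-built Log Parser for Apache web logs without showing it. As an illustration of what such parsing involves, here is a plain-Python sketch for the Apache Common Log Format (`%h %l %u %t "%r" %>s %b`) using a regular expression. It is not the Aster function, and it does not handle the referrer and user-agent fields of the Combined Log Format.

```python
import re
from datetime import datetime
from typing import Optional

# Apache Common Log Format: %h %l %u %t "%r" %>s %b
CLF = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_clf(line: str) -> Optional[dict]:
    """Parse one Common Log Format line; return None if it does not match."""
    m = CLF.match(line)
    if m is None:
        return None
    rec = m.groupdict()
    rec["time"] = datetime.strptime(rec["time"], "%d/%b/%Y:%H:%M:%S %z")
    rec["status"] = int(rec["status"])
    rec["size"] = 0 if rec["size"] == "-" else int(rec["size"])  # "-" means no body
    # Split the request line, e.g. "GET /index.html HTTP/1.0", into method and path.
    method, _, rest = rec["request"].partition(" ")
    rec["method"] = method
    rec["path"] = rest.rsplit(" ", 1)[0] if rest else ""
    return rec
```

Returning None for non-matching lines, rather than raising, lets a load job count and quarantine malformed records, in keeping with the governance principles above.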

Step 7: Watch Out for Data Quality Issues

Big Data often has problems with missing data and other data quality issues. As mentioned earlier, as part of data governance, data analysis is a prerequisite for data manipulation and discovery. Understanding the semantics of the data is key to deriving value from it. In this project, we found a few data quality issues, such as secure packet data missing some key values. We overcame this problem with a pre-built SQL-MR function called nPath, which allowed us to traverse back in time in a single pass and back-fill the missing data using corresponding values that arrived later in processing. We found this an elegant solution; with traditional SQL it would have required multiple passes. By using one or more of the pre-built SQL-MR functions, we achieved greater efficiencies in this project: quick turnaround, less processing load and rapid discovery.
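nPath itself is an Aster pattern-matching function and its syntax is not shown here. The underlying idea, filling a missing value from the next later record in the same session in one pass, can be sketched in plain Python: sort each session's records by time, walk them in reverse, and carry the most recent non-missing value backwards. The record layout (`session`, `ts`, `key`) is hypothetical.

```python
from typing import Optional


def backfill(records: list[dict]) -> list[dict]:
    """Fill missing 'key' values from the next later record in the same session.

    After sorting by (session, ts), a single backward pass carries the most
    recent non-missing value seen for the current session.
    """
    ordered = sorted(records, key=lambda r: (r["session"], r["ts"]))
    filled: list[dict] = []
    carry: Optional[str] = None
    current_session = None
    for rec in reversed(ordered):
        if rec["session"] != current_session:
            # New session: never carry a value across session boundaries.
            current_session, carry = rec["session"], None
        if rec["key"] is None:
            rec = {**rec, "key": carry}  # stays None if no later value exists
        else:
            carry = rec["key"]
        filled.append(rec)
    filled.reverse()
    return filled
```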

Step 8: Discover Greater Value from Advanced Analytics

Big Data technologies allow you to achieve higher business value from advanced analytics that had been difficult to achieve in the past due to technology and processing limitations. SQL-MapReduce removes such log jams by allowing faster processing of iterative analytics. For instance, in this project we were able to use the pre-built SQL-MR function called degrees to perform social network analysis. As telecom service providers embark on their journey towards capitalising on Web 2.0 / Telco 2.0, such Big Data capabilities make it easier for them to achieve their goals and differentiate themselves from the competition.
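The post names a pre-built function called degrees without describing its output. In graph terms, a subscriber's degree is the number of distinct contacts they connect to, a simple first measure in social network analysis. The plain-Python sketch below computes it from (caller, callee) pairs, treating calls as undirected; it is an illustration, not the Aster function.

```python
from collections import defaultdict


def contact_degrees(calls: list[tuple[str, str]]) -> dict[str, int]:
    """Number of distinct contacts per subscriber, treating calls as undirected edges."""
    neighbours: dict[str, set[str]] = defaultdict(set)
    for a, b in calls:
        if a != b:  # ignore self-calls
            neighbours[a].add(b)
            neighbours[b].add(a)
    return {node: len(contacts) for node, contacts in neighbours.items()}
```

High-degree subscribers are natural candidates for "influencer" analysis, while repeated calls between the same pair do not inflate the count.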

Step 9: Enjoy Big Data Discovery!

Big Data projects can be enjoyable, as I found them to be. There is a lot to learn, and working on a project like this is the most effective way to translate learning into outcomes! It is fun and a win-win!

Step 10: Share the Success!

Share the success of the use cases you experiment with in the Big Data project. This will give the rest of the enterprise confidence that Big Data is all about delivering business value! Funding for future Big Data projects will then be a 'no brainer'!

As I did on this project, if you are serious about travelling in the fast lane to get value from Big Data, there are plenty of resources available at http://developer.teradata.com/aster that you will find immensely valuable. Good luck with your Big Data project! More importantly, enjoy it: learning and working with Big Data is fun!

Data and the 2nd Half of the Chessboard

Posted on: April 10th, 2013 by John Berg

Recently I enjoyed listening to a spirited discussion about the computing age entering the "2nd half of the Chessboard". I found the discussion quite good, and I believe there are lessons to be learned about data and the ability of organisations to adapt and react to the volumes ahead.

To shed some light on this, let me first explain the concept of the 2nd half of the Chessboard as it applies to technology. It is a bit of an urban legend, but it illustrates the nature of exponential growth. Long ago, a ruler was presented with the game of chess for his approval. He was so enamoured of the game that he asked the inventor what he wanted in payment. The inventor replied simply that he would take his payment in rice: one grain for the first square, two for the second, four for the third, and so on, doubling for each square. The ruler considered the request and granted the inventor his payment. Later, the treasurer informed him that he was unable to complete the calculations and that the payment would require more rice than there was in the entire kingdom.

The pile would be larger than Mount Everest, around a thousand times the world's entire rice production in 2010.

In computing, we have been observing Moore's Law. Originally, it held that the number of transistors in a computer processor would double every 18-24 months. More recently we have come to expect this to translate into computing power, and this has held true for over 60 years. If we map this onto the Chessboard, we are somewhere around square 32, which means today's CPU is about 4.3 billion times more powerful than the first computer. Here's where the big numbers come into play, or as stated above – the 2nd half of the chessboard.

In 2 years, a CPU will have double that power (8.6 billion times more powerful than the first). In other words, that single doubling will add more computing power than all of the progress of the past 60+ years combined. These numbers are staggeringly large!
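The arithmetic behind the legend and the Moore's Law comparison can be written out as follows. This treats the post's "square 32" as 32 doublings since the first computer, which is an approximation of the author's framing (under the rice rule, square 32 itself holds 2^31 grains).

```latex
% Total grains on a 64-square board, with 2^{k-1} grains on square k
\sum_{k=1}^{64} 2^{k-1} = 2^{64} - 1 \approx 1.8 \times 10^{19}

% 32 doublings of computing power, and one more doubling two years later
2^{32} = 4\,294\,967\,296 \approx 4.3 \times 10^{9}, \qquad 2^{33} \approx 8.6 \times 10^{9}

% Why one doubling outweighs all earlier progress combined
2^{33} - 2^{32} = 2^{32} > \sum_{k=0}^{31} 2^{k} = 2^{32} - 1
```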

Data and data-related technology is not on the same slope as Moore's Law; however, it is starting to show signs of the same type of exponential growth. Computing devices are found almost everywhere, and they are generating data with greater frequency. Sensor data from monitoring systems in high-tech machinery generates billions of observations. Web page logs capture every item presented on a screen, along with the events as a user clicks and navigates the site. There is no sign of these trends slowing; rather, new technologies are being introduced that make extracting meaning from this data more complex.

I joined Teradata in 2000, and before that I was performing analytics on data in a Teradata system. At that time, behaviour and trend analysis on a group of 20,000 customers from a total population of 55 million was cutting edge. Today, some Teradata customers are analysing millions of customers with trillions of interactions, delivering event-based decisions in sub-second response times. And the landscape continues to change as non-relational technologies are used to find insights and patterns in semi-structured datasets.

All of this change and growth is causing many companies to rethink their information management strategy. It used to be enough to have a few data marts and a data warehouse to satisfy business users. Now, with mixed requirements, mixed data types, mixed workloads and varying priorities, a federated approach becomes complex and expensive to manage. Teradata's response has been to continue to focus on the Analytical Ecosystem: a logically grouped, flexible and extensible solution that provides analytic decisions meeting complex requirements with the right value proposition. It has been extended further with the integration of Aster and Hadoop to handle the emerging "semi-structured" data.

What will the Information Management strategy of 2017 look like? I would suggest that it needs to be positioned to handle 4x the current volumes and complexity without any degradation in performance. A 4x factor doesn't seem that large, but given that we are entering the 2nd half of the chessboard, these are staggeringly large requirements!
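One way to see where a 4x figure might come from, assuming (as an illustration only) that volumes follow the 18-24 month doubling period quoted earlier over the four years from 2013 to 2017:

```latex
% Growth factor after t months with doubling period T months: 2^{t/T}
t = 48:\qquad 2^{48/24} = 2^{2} = 4, \qquad 2^{48/18} = 2^{8/3} \approx 6.3
```

On that assumption, 4x is the conservative end of the range; an 18-month doubling would call for roughly 6x.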

John Berg is the Principal Consulting Architect for Teradata, South Pacific Area. He is responsible for helping organisations drive maximum value from their Teradata Integrated Data Warehouses through Professional Services delivery. He has a broad range of skills covering the acquisition, storage, and delivery of data and its optimisation and translation into business value. He has delivered solutions across many industry verticals including Banking, Communications, Retail and Government.