Big Data Analytics

Investing in New Technology? Don’t Forget These 5 Value-Adding Steps

Posted on: November 12th, 2015 by Data Analytics Staff


By Data and Analytics Staff

IT departments are always on the lookout for the latest and greatest technological advance that will change the way their company works. But striking a balance between IT’s desire to implement a new technology and the impact of that technology on the company’s bottom line can often be tricky. For organizations in the process of determining whether big data is right for them, the important task is identifying value.

This is why the Teradata Business Value Services team offered some helpful insights to attendees of the annual Teradata Partners Conference last month. Determining the business value of a technological undertaking can be a heavy lift. During the process, it’s important to consider current environments, processes, and pain points to nail down exactly what the new technology should be and, more importantly, what it should achieve. Organizations should also project what could be achieved in the future and stay focused on identifying those potential benefits.


There are certain steps that can be taken to ensure that ROI is successfully realized and captured, and that the whole organization is driven to create business value:

  1. Prepare and Plan. While it may seem like an obvious first step, too often companies attempt to dive head-first into a new business technology without proper preparation. In this stage, organizations should take a step back to get a better understanding of what’s going on within the business, collecting important data to further prove the need for a new technology project or improvement. This could be the company’s financial data to date or business data that will give IT leaders a better sense of what shape the company is in and where further technology investment can take it.
  2.  Evaluate and Recommend. Once this pertinent data is compiled, take the time to digest all of the information, make sure it’s understood, and double-check that all the necessary information has been collected in order to put together a solid business case, including metrics, for the project to come. Backing new IT projects with fact-based recommendations surrounding business needs and the current environment can help to validate the business value to the C-suite.
  3.  Educate the Business. It’s important for executives and decision makers to be educated on a new initiative so that IT can gain a better understanding of whether or not an idea under consideration makes financial sense for the company as a whole. The marriage of IT to business and finance departments is important, and educating the masses is crucial. IT should be listening to the needs of the business from the very start to deliver value to the organization, not just new technology.
  4.  Develop a Roadmap. Various business improvement opportunities can be achieved by understanding, articulating and quantifying the impacts of a new project. Developing a project roadmap makes it easy to prioritize opportunities and optimize results.
  5.  Refine and Confirm. Once a new program, project or technology has been implemented, people often take a back seat and let it run its course. But when an organization is making such a critical investment it’s crucial to track these processes and make sure they’re reaping the planned value. Teams should start with a solid foundation of metrics upon implementation to make regular analysis a smooth process, and use those metrics to adjust if goals aren’t being met.

By taking these steps, IT teams can better understand the long-term value of a new technology and work to convey that message to the C-suite and the organization overall. By taking an inclusive approach to a technology investment, IT teams are able to demonstrate business value upfront and make effective technology use a reality for their organizations.

Aster on Hadoop is Hadoop for Everyone!

Posted on: October 19th, 2015 by John Thuma


One of the biggest announcements at Teradata Partners 2015 is that Aster will run on Hadoop. Many of our customers have already invested in a Hadoop data lake and want to do more than just store data. Storing data is helpful but not all that interesting. What if you could easily do advanced analytics without having to move data out of the lake? What if you had the power of Aster’s multi-genre analytics running on Hadoop? This is exactly what Aster Analytics on Hadoop is all about.

This announcement is a very exciting prospect for some but may strike fear into others. In this blog, I will explore some of the interesting possibilities of bringing these technologies together, and I hope to allay some of those fears as well.

Aster Brings Multi-Genre Analytics to Hadoop

Almost every day I hear about a new Hadoop project or offering. That means a new approach, a new tool to learn, and usually a lot of programming. With Aster, you have a variety of advanced analytics at your fingertips, ready to take advantage of your data lake. With Aster and its plug-and-play SNAP framework, analysts and data scientists can use a variety of analytics delivered through a common optimizer, executor, and unified interface. Aster offers many different genres of analytics: ANSI SQL, Machine Learning, Text, Graph, Statistics, Time Series, and Path and Pattern Analysis. Aster on Hadoop is a big win for data scientists, as well as for people who already know and love Aster.

Looks and Feels Just Like Aster

For those who know Aster, Aster on Hadoop might sound daunting, but don’t fret. Everything works the same. You have the same statement interface ‘SELECT * FROM nPath…’ You still have ACT, ncluster_loader, ncluster_export, and Aster Management Console. You can still run ANSI SQL queries and connect to disparate data sources through QueryGrid and SQL-H. AppCenter allows anyone to perform advanced analytics using a simple web interface. Aster Development Environment enables programmers to build their own custom SQL-MR and SQL-GR functions. In other words, everything works the same. The only difference is that it is all running inside Hadoop, enabling a whole new group of people to participate in the Hadoop experience. If you have made a large investment in Hadoop and want to exploit the data located there, then Aster on Hadoop is for you.

Aster on Hadoop: Adaptable Not Invasive

One of the biggest complaints I hear from clients is, “We built a data lake and we want to do analytics, but it’s too hard.” Aster is adaptable to your Hadoop environment and the data you’ve landed there. Aster on Hadoop also means no new appliance; no need to find room in your data center to park a new rack of Aster. There’s no data movement across platforms or across the network; you process data right where it is. Aster on Hadoop runs natively inside Hadoop so you have access to HDFS file formats and a variety of connectors to other JDBC/ODBC compliant data sources. Staff who know ANSI SQL are perfectly positioned to use Aster on Hadoop, and with a little training, they’ll be performing advanced analytics in no time.


Organizations have made huge strides and investments in their Hadoop ecosystem and many are using it as a repository for big data, but that’s not enough. Organizations rightly want to exploit the data contained in Hadoop to gain new insights. Today Aster is being used to solve real world business problems through its multi-genre analytic capabilities. Aster on Hadoop will lower the barriers to entry. It’s a big step in realizing real business value from Hadoop and finally achieving a positive ROI. If you’re an existing Aster client, there’s no need to worry: it all works the same. Teradata Aster on Hadoop democratizes analytics and brings solution freedom to Hadoop! It’s Hadoop for the rest of us.


How Analytics Turns IoT Data into Dollars

Posted on: October 19th, 2015 by Chris Twogood


The buzz around the term “Internet of Things” (IoT) amplifies with each passing day. It’s taking some time, however, for everyone to fully comprehend just how valuable this phenomenon has become for our world and our economy. Part of this has to do with the learning curve in understanding the sophisticated technologies and analytics involved. But part of it is the sheer, staggering scope of value that’s possible worldwide. A comprehensive study in June 2015 by the McKinsey Global Institute, in fact, concluded that IoT is one of those rare technology trends where the “hype may actually understate the full potential.”

The Internet of Things is our constantly growing universe of sensors and devices that create a flood of granular data about our world. The “things” include everything from environmental sensors monitoring weather, traffic or energy usage; to “smart” household appliances and telemetry from production-line machines and car engines. These sensors are constantly getting smarter, cheaper and smaller (many sensors today are smaller than a dime, and we’ll eventually see smart dust: thousands of small processors that look like dust and are sprinkled on surfaces, swallowed or poured).

Smart Analytics Drive IoT Value

As the volume and variety of sensors and other telemetry sources grows, the connections between them and the analytic needs also grow to create an IoT value curve that’s rising exponentially as time goes on. IDC predicts the installed base of IoT connected things will reach more than 29.5 billion in 2020, with economic value-add across sectors by then topping $1.9 trillion. For all the focus on sensors and connections, however, the key driver of value is the analytics we can apply to reap insights and competitive advantage.

As we build better algorithms for the burgeoning IoT digital infrastructure, we are learning to use connection-based “smart analytics” to get very proactive in predicting future performance and conditions, and even prescribing future actions. What if we could predict a machine failure before it ever happens? With advanced smart analytics today, we can. It’s called predictive maintenance, and it utilizes a probability-based “Weibull distribution” and other advanced processes to gauge “time to failure” rates so we can predict a machine or device breakdown before it happens.
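To make the idea concrete, here is a minimal sketch of how a Weibull model fitted to historical failure ages can flag at-risk units; the failure ages, the 30-day service window, and the 20 percent risk threshold are illustrative assumptions, not values from any Teradata product.

```python
import numpy as np
from scipy.stats import weibull_min

# Illustrative failure ages (operating hours) observed across a component fleet.
failure_ages = np.array([410, 520, 630, 700, 760, 815, 880, 940, 1010, 1100])

# Fit a two-parameter Weibull distribution (location fixed at zero).
shape, loc, scale = weibull_min.fit(failure_ages, floc=0)

def prob_fail_within(current_age_hours, window_hours):
    """Conditional probability that a unit of this age fails within the window:
    P(T <= t + w | T > t) = (S(t) - S(t + w)) / S(t)."""
    survival_now = weibull_min.sf(current_age_hours, shape, loc, scale)
    survival_later = weibull_min.sf(current_age_hours + window_hours, shape, loc, scale)
    return (survival_now - survival_later) / survival_now

# Flag units whose risk over the next 30 days (720 hours) exceeds 20%.
fleet_ages = {"unit_a": 350, "unit_b": 780, "unit_c": 950}
for unit, age in fleet_ages.items():
    risk = prob_fail_within(age, 720)
    action = "schedule maintenance" if risk > 0.20 else "ok"
    print(f"{unit}: age={age}h risk={risk:.1%} -> {action}")
```

In practice the fitted distribution would be combined with live sensor telemetry and refreshed as new failures are observed, but a conditional time-to-failure calculation like the one above is the core of a predictive maintenance score.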

One major provider of medical diagnostic and treatment machines has leveraged predictive maintenance to create “wearout models” for component parts in its products. This enabled early detection and identification of problems, as well as proactive root cause analysis to prevent down time and unplanned outages. A large European train manufacturer, meanwhile, is leveraging similar techniques to prevent train engine failure. It’s a key capability that has enabled the firm to expand into the leasing market – a line of business that’s profitable only if your trains remain operational.

Building IoT Architectures

There is really no limit to how far we can take this alchemy of sensors, connections and algorithms to create more and more complex systems and solutions to the problems facing businesses. But success remains impossible without the right analytics architectures in place. Most companies today still struggle to capitalize on and make use of all this IoT data.

Indeed, McKinsey’s June 2015 IoT report found that less than one percent of IoT data is currently used; and those uses tend to be straightforward things like alarm activation or real-time controls rather than advanced analytics that can help optimize business processes or make predictions.

Even the most tech-savvy businesses are now realizing that extracting value from the data is a difficult and skills-intensive process. Top priorities include intelligent “listening” to massive streams of IoT data to uncover distinctive patterns that may be signposts to valuable insights. We must then ingest and propagate that data into an analytical ecosystem where advanced machine learning algorithms, operating at scale, can reap sophisticated, actionable insights.

Agility is key: Architectures need to follow multiple streams of sensor and IoT data in real-time and deploy an agile central ingestion platform to economically and reliably listen to all relevant data. Architectures also should be configured to deploy advanced analytics – including machine learning, path, pattern, time series, statistics, graph, and text analytics – against massive volumes of data. The entire environment should be thoroughly self-service to enable rapid innovation of any new data set and avoid bogging down IT personnel with costly, requirements-driven custom projects.

These are the kind of capabilities companies must pursue to economically spot and act upon new business opportunities made possible by the Internet of Things. It takes a good deal of investment and strategic planning, but the payoff in terms of analytic insights, competitive advantage and future revenue is well worth it.

A Vision for an Analytic Infrastructure

Posted on: October 12th, 2015 by Guest Blogger


by Dan Woods

An analytic infrastructure can be much like Mark Twain’s definition of a classic: “A book which people praise and don’t read.” An analytics platform is often referred to but rarely architected. Business analysts and data scientists often talk about the power of analytics without talking about the end game. But to make any progress in the big data world – and remain competitive – companies must change the way they think about analytics and implement an analytic infrastructure.

The current approach to big data analytics is simply unsustainable. For each business question that arises, IT builds a custom application. This application centric approach results in many silos modeled after the operational source. Users can get answers against that silo’s set of data, but they can’t get answers from data across multiple platforms. As a result, data must be constantly moved in and out of the applications, and each application must be maintained.

It behooves you to think about what you want to achieve with analytics. Most companies today want to become data-driven organizations. In order to do so, however, analytics must be scalable and sustainable so that every department has access to the information it needs to make decisions based on data. An application centric approach is neither scalable nor sustainable. So how can analytics be made more productive for everyone involved and enabled to scale across the entire organization? Instead of hardwiring analytics into an application, you need to find a way to:

  • Apply the right analytic to the right data integration type. Instead of building an application that is essentially a black box, we need a platform that can reach out for all the needed data and then apply the analytic. This approach minimizes data movement and data duplication.
  • Leverage multiple analytic techniques to get insights. You need to build applications in which data is loosely coupled, thereby creating just enough structure to answer frequently asked questions while expanding access to analytics across the organization.
  • Provide self-service analytics for all skill levels. R programming shouldn’t be a requirement for performing analytics. You need an analytics platform that supports a spectrum of users, from data scientists to business analysts.

The key to enabling these objectives is to make data reusable so that it can be available to as many analytics processes as possible. That means proactively thinking about whether a piece of data will be needed to answer more than one question in the future, and understanding where you are in your big data journey. You can’t assume that all data will go into a tightly controlled model, like a data warehouse. If you model data using different types of integration based on what you understand about that data, you create a foundation so that the next time you need to answer a question with that data, you can do so more easily.

In the past, companies had a tendency to over-model and over-integrate their data. Not only was this a waste of money, but it led to an architecture that was difficult to change. Today, companies have the opposite problem: under-modeling and under-integrating. This increases both costs and complexity. A better approach is to invest in tightly coupled integration for high-value data that will be used at scale. Keep other data, of varying levels of maturity, either loosely coupled or non-coupled.

Taking this approach to building an analytic infrastructure will help you:

  • Meet new needs faster. As the infrastructure grows, the “nervous system” will become more powerful and more easily adapted to meet new needs.
  • Decrease the cost and complexity of the infrastructure. Avoiding application centric silos will reduce the cost and complexity of analytics.
  • Increase productivity. By investing time upfront to make analytics easier for various skill sets, more people can benefit. In addition, new data can be integrated at a lower cost since time and money are not being wasted over-modeling data that will not be used.

This is a new way of thinking about data, and it may be foreign to many companies. But it is a solid vision for something rarely spoken about but which is necessary for becoming a data-driven organization – an analytic infrastructure. If you’re serious about analytics, then it’s worth working with an experienced advisor who can help you make such an infrastructure a reality.

Dan Woods is CTO and founder of CITO Research. He has written more than 20 books about the strategic intersection of business and technology. Dan writes about data science, cloud computing, mobility, and IT management in articles, books, and blogs, as well as in his popular column.


Why Should Big Data Matter to You?

Posted on: September 15th, 2015 by Marc Clark


With all the attention given to big data, it is no surprise that more companies feel pressure to explore the possibilities for themselves. The challenge for many has been the high barriers to entry. Put simply, big data has cost big bucks. Maybe even more perplexing has been uncertainty about just what big data might deliver for a given company. How do you know if big data matters to your business?

The advent of cloud-based data warehouse and analytics systems can eliminate much of that uncertainty. For the first time, it is possible to explore the value proposition of big data without the danger of drowning the business in the costs and expertise needed to get big data infrastructure up and running.


Subscription-based models replace the need to purchase expensive hardware and software with the possibility of a one-stop-shopping experience where everything—from data integration and modeling tools to security, maintenance and support—is available as a service. Best of all, the cloud makes it feasible to evaluate big data regardless of whether your infrastructure is large and well-established with a robust data warehouse, or virtually nonexistent and dependent on numerous Excel worksheets for analysis.

Relying on a cloud analytics solution to get you started lets your company test use cases, find what works best, and grow at its own pace.

Why Big Data May Matter

Without the risk and commitment of building out your own big data infrastructure, your organization is free to explore the more fundamental question of how your data can influence your business. To figure out if big data analytics matters to you, ask yourself and your company a few questions:

  • Are you able to take advantage of the data available to you in tangible ways that impact your business?
  • Can you get answers quickly to questions about your business?
  • Is your current data environment well integrated, or a convoluted and expensive headache?

For many organizations, the answer to one or more of these questions is almost certainly a sore point. This is where cloud analytics offers alternatives, giving you the opportunity to galvanize operations around data instead of treating data and your day-to-day business as two separate things. The ultimate promise of big data is not one massive insight that changes everything. The goal is to create a ceaseless conveyor belt of insights that impact decisions, strategies, and practices up, down, and across the operational matrix.

The Agile Philosophy for Cloud Analytics

We use the word agile a lot, and cloud analytics embraces that philosophy in important new ways. In the past, companies have invested a lot of time, effort, and money in building infrastructure to integrate their data and create models, only to find themselves trapped in an environment that doesn’t suit their requirements.

Cloud analytics provides a significant new path. It's a manageable approach that enables companies to get to important questions without bogging down in technology, and to really figure out what value is lurking in their data and what its impact might be.

To learn more, download our free Enterprise Analytics in the Cloud eBook.

Big Data Success Starts With Empowerment: Learn Why and How

Posted on: September 1st, 2015 by Chris Twogood


As my colleague Bill Franks recently pointed out on his blog, there is often the perception that being data-driven is all about technology. While technology is indeed important, being data-driven actually spans a lot of different areas, including people, big data processes, access, a data-driven culture and more. In order to be successful with big data and analytics, companies need to fundamentally embed it into their DNA.

To be blunt, that level of commitment simply must stem from the top rungs of any organization. This was evident when Teradata recently surveyed 316 senior data and IT executives. The commitment to big data was far more apparent at companies where CEOs personally focus on big data initiatives, as over half of those respondents indicated it as the single most important way to gain a competitive advantage.

Indeed, industries with the most competitive environments are the ones leading the analytics push. These companies simply must find improvements, even if the needle is only being moved in the single digits with regard to things like operational costs and revenue.

Those improvements don’t happen without proper leadership, especially since a data-driven focus impacts just about all facets of the business -- from experimentation to decision-making to rewarding employees. Employees must have access to big data, feel empowered with regards to applying it and be confident in their data-driven decisions.

In organizations where being data-driven isn’t embedded in the DNA, someone may make a decision and attempt to leverage a little data. But if they don’t feel empowered by the data’s prospects and aren’t confident in the data, they will spend a lot of cycles seeking validation. A lot of time will be spent simply trying to ensure they have the right data and accurate data, that they are actually making the right decision based on it, and that they will be backed up once that decision is made.

There is a lot of nuance with regards to being data-driven, of course. While all data has value, there are lots of levels to that value – the challenge generally lies in recognizing that value and extracting it. Our survey confirmed, for instance, just how hot location data is right now, as organizations work to understand the navigation of their customers in order to deliver relevant communication.

Other applications of data, according to the survey, include the creation of new business models, the discovery of new product offers, and the monetization of data to external companies. But that’s just the tip of the iceberg. Healthcare, for example, is an up-and-coming industry with regards to data usage. An example is better understanding the path to surgery -- breaking down the four or five steps most important to achieving a better patient outcome.

But whether you’re working in a hospital or a hot startup, and working to carve out more market share or improve outcomes for patients, the fundamentals we’ve been discussing here remain the same. Users must be empowered and confident in order to truly be data-driven -- and they’re not going to feel that way unless those at the top are leading the way.


Pluralism and Secularity In a Big Data Ecosystem

Posted on: August 25th, 2015 by Guest Blogger


Solutions in today's analytic ecosystem are too often technically driven, without a focus on business value. The buzzwords tend to outrun the realities of implementation and cost of ownership. I challenge you to view your analytic architecture through the lenses of pluralism and secularity. Without such a view of this world, your resume will fill out nicely, but your business value will suffer.

In my previous role, prior to joining Teradata, I was given the task of trying to move "all" of our organization’s BI data to Hadoop. I will share my approach - how best-in-class solutions come naturally when pluralism and secularity are used to support a business-first environment.

Big data has exposed some great insights into what we can, should, and need to do with our data. However, this space is filled with radical opinions and the pressure to "draw a line in the sand" between time-proven methodologies and what we know as "big data." Some may view these spaces moving in opposite directions; however, these spaces will collide. The question is not "if" but "when." What are we doing now to prepare for this inevitability? Hadapt seems to be moving in the right direction in terms of leadership between the two spaces.

Relational Databases
I found many of the data sets in relational databases to be lacking in structure, highly transient, and loosely coupled. Data scientists needed to have quick access to data sets to perform their hypothesis testing.

Continuously requesting IT to rerun their ETL processes was highly inefficient. A data scientist once asked me "Why can't we just dump the data in a Linux mount for exploration?" Schema-on-write was too restrictive as the data scientists could not predefine the attributes for the data set for ingestion. As the data sets became more complex and unstructured, the ETL processes became exponentially more complicated and performance was hindered.
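As a rough, hypothetical sketch of that schema-on-read style of exploration (the file path and fields below are invented for illustration), the attributes of a data set can be discovered at read time instead of being predefined for ingestion:

```python
import json
from collections import Counter

# Hypothetical raw export dumped to a shared mount for exploration;
# no schema was declared before ingestion.
RAW_FILE = "/mnt/exploration/events.jsonl"

field_counts = Counter()
records = []
with open(RAW_FILE) as f:
    for line in f:
        record = json.loads(line)           # attributes discovered per record
        field_counts.update(record.keys())  # profile which fields actually occur
        records.append(record)

# A quick profile tells the data scientist what is there before any modeling.
print(f"{len(records)} records")
for field, count in field_counts.most_common():
    print(f"{field}: present in {count} records")
```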

I also found during this exercise that my traditional BI analysts were perplexed when formulating questions about the data. One reason was that the business did not know what questions to ask. This is a common challenge in the big data ecosystem. We are used to knowing our data and being able to come up with incredible questions about it. The BI analyst's world has been disrupted, as they now need to ask "What insights/answers do I have about my data?" (as Ilya Katsov put it in one of his blog posts).

The product owner of Hadoop was convinced that the entire dataset should be hosted on Amazon Web Services (S3), which would allow our analytics (via Elastic MapReduce) to perform at incredible speeds. However, due to various ISO guidelines, the data sets had to be encrypted at rest and in transit, which degraded performance by approximately 30 percent.

Without an access path model, logical model, or unified model, business users and data scientists were left with little appetite for unified analytics. Data scientists were left to their own guidelines for integrating, federating, governing, and liberating post-discovery analytical sets.

Communication with the rest of the organization became an unattainable goal. The models which came out of discovery were not federated across the organization as there was a disconnect between the data scientists, data architects, Hadoop engineers, and data stewards -- who spoke different languages. Data scientists were creating amazing predictive models and at the same time data stewards were looking for tools to help them provide insight in prediction for the SAME DATA.

Using NoSQL for a specific question on a dataset required a new collection set. To maintain and govern the numerous collections became a burden. There had to be a better way to answer many questions without having a linear relationship to the number of collections instantiated. The answer may be within access path modeling.

Another challenge I faced was when users wanted a graphical representation of the data and the embedded relationships or lack thereof. Are they asking for a data model? The users would immediately say no, since they read in a blog somewhere that data modeling is not required using NoSQL technology.

At the end of this entire implementation I found myself needing to integrate these various platforms for the sake of providing a business-first solution. Maybe the line in the sand isn't a business-first approach? Those that drive Pluralism (a condition or system in which two or more states, groups, principles, sources of authority, etc., coexist) and Secularity (not being devoted to a specific technology or data 'religion') within their analytic ecosystem can truly deliver a business-first solution approach while avoiding the proverbial "silver bullet" architecture solutions.

In my coming post, I will share some of the practices for access path modeling within Big Data and how it supports pluralism and secularity within a business-first analytic ecosystem.

Sunile Manjee

Sunile Manjee is a Product Manager in Teradata’s Architecture and Modeling Solutions team. Big Data solutions are his specialty, along with the architecture to support a unified data vision. He has over 12 years of IT experience as a Big Data architect, DW architect, application architect, IT team lead, and 3gl/4gl programmer.

Optimization in Data Modeling 1 – Primary Index Selection

Posted on: July 14th, 2015 by Guest Blogger


In my last blog I spoke about the decisions that must be made when transforming an Industry Data Model (iDM) from Logical Data Model (LDM) to an implementable Physical Data Model (PDM). However, being able to generate DDL (Data Definition Language) that will run on a Teradata platform is not enough – you also want it to perform well. While it is possible to generate DDL almost immediately from a Teradata iDM, each customer’s needs mandate that existing structures be reviewed against data and access demographics, so that optimal performance can be achieved.

Having detailed data and access path demographics during PDM design is critical to achieving great performance immediately, otherwise it’s simply guesswork. Alas, these are almost never available at the beginning of an installation, but that doesn’t mean you can’t make “excellent guesses.”

The single most influential factor in achieving PDM performance is proper Primary Index (PI) selection for warehouse tables. Data modelers are focused on entity/table Primary Keys (PKs), since these define uniqueness at the row level. Because of this, a lot of physical modelers tend to implement the PK as a Unique Primary Index (UPI) on each table by default. But one of the keys to Teradata’s great performance is that it uses the PI to physically distribute data within a table across the entire platform to optimize parallelism. Each processor gets a piece of the table based on the PI, so rows from different tables with the same PI value are co-resident and do not need to be moved when the two tables are joined.

In a Third Normal Form (3NF) model no two entities (outside of super/subtypes and rare exceptions) will have the same PK, so if chosen as a PI, it stands to reason that no two tables share a PI and every table join will require data from at least one table to be moved before a join can be completed – not a solid performance decision to say the least.

The iDMs have preselected PIs largely based on identifiers common across subject areas (e.g., Party Id), so that all information regarding that ID will be co-resident and joins will be AMP-local. These non-unique PIs (NUPIs) are a great starting point for your PDM, but again they need to be evaluated against customer data and access plans to ensure that both good performance and reasonably even data distribution are achieved.

Even data distribution across the Teradata platform is important since skewed data can contribute both to poor performance and to space allocation (run out of space on one AMP, run out of space on all). However, it can be overemphasized to the detriment of performance.

Say, for example, a table has a PI of PRODUCT_ID, and a disproportionate number of rows for several Products causes skewed distribution. Altering the PI to the table PK instead will provide perfectly even distribution, but remember: when joining to that table, if all elements of the PK are not available, then the rows of the table will need to be redistributed, most likely by PRODUCT_ID.

This puts them back on the same AMPs where they were in the skewed scenario. This time, instead of a “rest state” skew, the rows skew during redistribution, and this happens every time the table is joined to – not a solid performance decision. Optimum performance can therefore be achieved with sub-optimum distribution.
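A back-of-the-envelope way to see this trade-off is to simulate how rows spread across AMPs under each candidate PI and compare the skew. The sketch below uses a generic hash as a stand-in for Teradata’s actual row-hash algorithm, and the column names and row counts are made up for illustration:

```python
import hashlib
from collections import Counter

AMPS = 8  # illustrative number of AMPs

def amp_for(pi_value):
    """Assign a row to an AMP by hashing its PI value (a generic stand-in
    for Teradata's row hash, for illustration only)."""
    digest = hashlib.md5(str(pi_value).encode()).hexdigest()
    return int(digest, 16) % AMPS

def skew(rows, pi_column):
    """Ratio of the largest AMP's row count to the average AMP's row count."""
    per_amp = Counter(amp_for(row[pi_column]) for row in rows)
    avg = len(rows) / AMPS
    return max(per_amp.values()) / avg

# Made-up sales rows: a handful of hot products dominate the table.
rows = [{"sale_id": i, "product_id": i % 3 if i % 10 else i} for i in range(10_000)]

print("skew with PI = product_id:", round(skew(rows, "product_id"), 2))  # badly skewed at rest
print("skew with PI = sale_id:   ", round(skew(rows, "sale_id"), 2))     # near 1.0 at rest
```

The sale_id PI distributes evenly at rest, but as noted above, every join on PRODUCT_ID would redistribute those same rows right back into the skewed layout, so the “perfect” distribution buys very little.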

iDM tables relating two common identifiers will usually have one of the IDs preselected as a NUPI. In some installations, the access demographics will show that the other ID may be the better choice. If so, change it! Or they may leave you with no clear choice, in which case picking one is almost assuredly better than changing the PI to a composite index consisting of both IDs, as this will only result in a table that is no longer co-resident with any table indexed by either of the IDs alone.

There are many other factors that contribute to achieving optimal performance of your physical model, but they all pale in comparison to a well-chosen PI. In my next blog we’ll look at some more of these and discuss when and how best to implement them.


Jake Kurdsjuk is Product Manager for the Teradata Communications Industry Data Model, purchased by more than one hundred Communications Service Providers worldwide. Jake has been with Teradata since 2001 and has 25 years of experience working with Teradata within the Communications Industry, as a programmer, DBA, Data Architect and Modeler.


In advance of the upcoming webinar Achieving Pervasive Analytics through Data & Analytic Centricity, Dan Woods, CTO and editor of CITO Research, sat down with Clarke Patterson, senior director, Product Marketing, Cloudera, and Chris Twogood, vice president of Product and Services Marketing, Teradata, to discuss some of the ideas and concepts that will be shared in more detail on May 14, 2015.


Having been briefed by Cloudera and Teradata on Pervasive Analytics and Data & Analytic Centricity, I have to say it’s refreshing to hear vendors talk about WHY and HOW big data is important in a constructive way, rather than offering platitudes and jumping straight into the technical details of the WHAT, which is so often the case.

Let me start by asking you both, in your own words, to describe Pervasive Analytics and Data & Analytic Centricity, and why this is an important concept for enterprises to understand.


During eras of global economic shifts, there is always a key resource discovered that becomes the spark of transformation for organizations that can effectively harness it. Today, that resource is unquestionably ‘data’. Forward-looking companies realize that to be successful, they must leverage analytics in order to provide value to their customers and shareholders. In some cases they must package data in a way that adds value and informs employees, or their customers, by deploying analytics into decision-making processes everywhere. This idea is referred to as pervasive analytics.

I would point to the success that Teradata’s customers have had over the past decades in making analytics pervasive throughout their enterprises. The spectrum in which these customers have gained value is comprehensive, from business intelligence reporting and executive dashboards, to advanced analytics, to enabling front-line decision makers, and embedding analytics into key operational processes. And while those opportunities remain, the explosion of new data types and the breadth of new analytic capabilities is leading successful companies to recognize the need to evolve the way they think about data management and processes in order to harness the value of all their data.


I couldn’t agree more. It’s interesting now that we’re several years into the era of big data to see how different companies have approached this opportunity, which really boils down to two approaches. Some companies have taken the approach of what can we do with this newer technology that has emerged, while others take the approach of defining a strategic vision for the role of the data and analytics to support their business objectives and then map the technology to the strategy. The former, which we refer to as an application centric approach, can result in some benefits, but typically runs out of steam as agility slows and new costs and complexities emerge; while the latter is proving to create substantially more competitive advantage as organizations put data and analytics – not a new piece of technology – at the center of their operations. Ultimately, these companies that take a data and analytic centric approach are coming to a conclusion that there are multiple technologies required, and their acumen on applying the-right-tool-to-the-right-job naturally progresses, and the usual traps and pitfalls are avoided.


Would you elaborate on what is meant by “companies need to evolve the way they think about data management?”


Pre “big data,” there was a single approach to data integration whereby data is made to look the same or normalized in some sort of persistence such as a database, and only then can value be created. The idea is that by absorbing the costs of data integration up front, the costs of extracting insights decreases. We call this approach “tightly coupled.” This is still an extremely valuable methodology, but is no longer sufficient as a sole approach to manage all data in the enterprise.

Post “big data,” using the same tightly coupled approach to integration undermines the value of newer data sets that have unknown or under-appreciated value. Here, new methodologies to “loosely couple” or not couple at all are essential to cost-effectively manage and integrate the data. These distinctions are incredibly helpful in understanding the value of Big Data, where best to think about investments, and highlighting the challenges that remain a fundamental hindrance to most enterprises.

But regardless of how the data is most appropriately managed, the most important thing is to ensure that organizations retain the ability to connect-the-dots for all their data, in order to draw correlations between multiple subject areas and sources and foster peak agility.


I’d also cite that leading companies are evolving the way they approach analytics. We can analyze any kind of data now - numerical, text, audio, video. We are now able to discover insights in this complex data. Further, new forms of procedural analytics have emerged in the era of big data, such as graph, time-series, machine learning, and text analytics.

This allows us to expand our understanding of the problems at hand. Key business imperatives like churn reduction, fraud detection, increasing sales and marketing effectiveness, and operational efficiencies are not new, and have been skillfully addressed by data-driven businesses with tightly coupled methods and SQL-based analytics – that’s not going away. But when organizations harness newer forms of data that add to the picture, along with new complementary analytic techniques, they realize better churn and fraud models, greater sales and marketing effectiveness, and more efficient business operations.

To learn more, please join the Achieving Pervasive Analytics through Data & Analytic Centricity webinar on Thursday, May 14, from 10:00 to 11:00 a.m. PT.


[Figure: High Level Data Analytics Graph (Healthcare Example); graph animation available in the original post]

Michael Porter, in an excellent article in the November 2014 issue of the Harvard Business Review[1], points out that smart connected products are broadening competitive boundaries to encompass related products that meet a broader underlying need. Porter elaborates that the boundary shift is not only from the functionality of discrete products to cross-functionality of product systems, but in many cases expanding to a system of systems such as a smart home or smart city.

So what does all this mean from a data perspective? In that same article, Porter mentions that companies seeking leadership need to invest in capturing, coordinating, and analyzing more extensive data across multiple products and systems (including external information). The key take-away is that the movement of gaining competitive advantage by searching for cross-functional or cross-system insights from data is only going to accelerate and not slow down. Exploiting cross-functional or cross-system centrality of data better than anyone else will continue to remain critical to achieving a sustainable competitive advantage.

Understandably, as technology changes, the mechanisms and architecture used to exploit this cross-system centrality of data will evolve. Current technology trends point to a need for a data & analytic-centric approach that leverages the right tool for the right job and orchestrates these technologies to mask complexity for the end users; while also managing complexity for IT in a hybrid environment. (See this article published in Teradata Magazine.)

As businesses embrace the data & analytic-centric approach, the following types of questions will need to be addressed: How can business and IT decide on when to combine which data and to what degree? What should be the degree of data integration (tight, loose, non-coupled)? Where should the data reside and what is the best data modeling approach (full, partial, need based)? What type of analytics should be applied on what data?

Of course, to properly address these questions, an architecture assessment is called for. But for the sake of going beyond the obvious, one exploratory data point in addressing such questions could be to measure and analyze the cross-functional/cross-system centrality of data.

By treating data and analytics as a network of interconnected nodes in Gephi[2], the connectedness between data and analytics can be measured and visualized for such exploration. We can examine a statistical metric called Degree Centrality[3] which is calculated based on how well an analytic node is connected.
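As a small illustration of that measurement (the analytic and data-subject nodes below are invented for illustration, not taken from the healthcare graph in the figure), degree centrality can be computed with a general-purpose graph library such as NetworkX:

```python
import networkx as nx

# Toy graph: analytic use cases connected to the data subjects they consume.
edges = [
    ("Readmission Risk", "Claims"),
    ("Readmission Risk", "EMR"),
    ("Readmission Risk", "Labs"),
    ("Fraud Detection", "Claims"),
    ("Fraud Detection", "Provider"),
    ("Care Path Analysis", "EMR"),
    ("Care Path Analysis", "Scheduling"),
]

g = nx.Graph(edges)

# Degree centrality: number of links on a node, normalized by (n - 1).
centrality = nx.degree_centrality(g)
for node, score in sorted(centrality.items(), key=lambda kv: -kv[1]):
    print(f"{node}: {score:.2f}")
```

Nodes with high degree centrality in this toy example, such as the Claims and EMR subject areas, are the ones whose cross-functional reuse argues for tighter integration in the analytical ecosystem.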

The high level sample data analytics graph demonstrates the cross-functional Degree Centrality of analytics from an Industry specific perspective (Healthcare). It also amplifies, from an industry perspective, the need for organizations to build an analytical ecosystem that can easily harness this cross-functional Degree Centrality of data analytics. (Learn more about Teradata’s Unified Data Architecture.)

In the second part of this blog post series we will walk through a zoomed-in view of the graph, analyze the Degree Centrality measurements for sample analytics, and draw some high-level data architecture implications.


[2] Gephi is a tool to explore and understand graphs. It is a complementary tool to traditional statistics.

[3] Degree centrality is defined as the number of links incident upon a node (i.e., the number of ties that a node has).


Ojustwin Naik (MBA, JD) is a Director with 15 years of experience in the planning, development, and delivery of analytics. He has experience across multiple industries and is passionate about nurturing a culture of innovation based on clarity, context, and collaboration.