
A recent blog post, “Why use an Industry Data Model?”, discussed how industry data models bring exceptional business value to customers by providing a logical “blueprint” for data integration and enterprise analytics within well-defined business spaces. If this data model is expanded to include a template industry physical data model (iPDM), can the model realistically be expected to support a physical database implementation?

If we follow the premise that these added PDM components can provide a starting point for building an integrated solution, then this “template iPDM” can greatly improve physical implementation productivity by leveraging field-proven physical design patterns and industry best practices.

In terms of a solution framework as it applies to an integrated data warehouse environment, both the Integrated Data and Semantic Layers are essential to a well-architected data environment. The Integrated Data Layer contains data and metadata that are “neutral” both from the data source perspective and from the perspective of data and metadata usage. It is in the Semantic Layer that structures should be implemented for specific business uses.

The most foundational aspect of integrated data warehouse design is the availability of a well-architected data model. As has long been the case, a logical data model (LDM) contains data elements organized to support a specific business or industry. The physical data model (PDM) components are the framework for the implementation of these structures, providing the details necessary to generate the DDL for the warehouse. The physical model resides alongside the logical model, expanded to include the components necessary to generate physical database structures like tables, views and indexes, designed to ensure optimum performance. Simplicity as well as efficiency are realized with a combined logical/physical model, so whenever possible, PDM elements should retain the dynamic nature of their LDM counterparts to ensure ongoing flexibility.

Industry data models should be considered as templates that require further refinement as part of a customer implementation. This includes the validation of modeling requirements with a customer by mapping to business scenarios and data sources, identifying any gaps in functionality, followed by extending and trimming the model to fill the gaps. For all logical modeling changes there will be parallel physical modeling changes, and vice versa, so a flexible modeling approach affords the best solution, with a keen eye kept to ensure that changes to each side maintain support for all business requirements.

Properly customized, the physical model can be quickly and correctly instantiated using facilities provided within the data modeling tool. Too much physical modeling time has been spent in the past on tasks such as the manual naming of indexes and tables. Templates for generating DDL for both tables and views are provided, utilizing standard abbreviation files to ensure that logical names are consistently abbreviated within table, column and index names. The ability to derive these structures directly from the model file preserves data model relevancy and integrity throughout the implementation, and allows developers to devote time to tasks like evaluating source data and usage analytics, letting them more effectively implement a complete solution that includes value compression and appropriate partitioned primary index choices.
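The abbreviation-driven naming step can be sketched in a few lines. The mappings and the `physical_name` helper below are hypothetical, not Teradata's actual abbreviation files, but they show how a tool can derive physical names from logical ones deterministically:

```python
# Hypothetical sketch: the abbreviation mappings are invented for
# illustration, not Teradata's actual standard abbreviation files.
ABBREVIATIONS = {
    "account": "acct",
    "customer": "cust",
    "identifier": "id",
    "number": "num",
    "transaction": "txn",
}

def physical_name(logical_name, max_len=30):
    """Abbreviate each word of a logical name and join with underscores."""
    words = logical_name.lower().split()
    return "_".join(ABBREVIATIONS.get(w, w) for w in words)[:max_len]

print(physical_name("Customer Account Number"))  # cust_acct_num
```

Because the same mapping file is applied everywhere, a logical name always abbreviates to the same physical name in table, column and index DDL.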

It is important to produce data model structures that allow you to implement the physical solution expressed in your logical model without the need to de-normalize. If you are building an integrated data warehouse, you need to use a database platform that supports high performance and a data modeling framework that facilitates the building of normalized databases for analytics right from the start. You need to start with an architecture that integrates seamlessly with “big data”, BI/OLAP relational analytics and advanced analytics engines. The Teradata database engine, in concert with Teradata Industry Data Models, provides you with the perfect place to start.

 Jake Kurdsjuk is Product Manager for the Teradata Communications Industry Data Model, purchased by more than one hundred Communications Service Providers worldwide. Jake has been with Teradata since 2001 and has 25 years of experience working with Teradata within the Communications Industry, as a programmer, DBA, Data Architect and Modeler.

 

Farrah Bostic, presenting a message encouraging both skepticism and genuine intimacy, was one of the most provocative speakers at Strata 2014 in Santa Clara earlier this year. As founder of The Difference Engine, a Brooklyn, NY-based agency that helps companies with research and digital and product strategy, Bostic warns her clients away from research that seems scientific but doesn’t create a clear model of what customers want.

Too often, Bostic says, numbers are used to paint a picture of a consumer, someone playing a limited role in an interaction with a company. The goal of the research is to figure out how to “extract value” from the person playing that role. Bostic suggests that “People are data too.” Instead of performing research to confirm your strategy, Bostic recommends using research to discover and attack your biases. It is a better idea to create a genuine understanding of a customer that is more complete and then figure out how your product or service can provide value to that person that will make their lives better and help them achieve their goals.

After hearing Bostic speak, I had a conversation with Dave Schrader, director of marketing and strategy at Teradata, about how to bring a better model of the customer to life. As Scott Gnau, president of Teradata Labs, and I pointed out in “How to Stop Small Thinking from Preventing Big Data Victories,” one of the key ways big data creates value is by improving the resolution of the models used to run a business. Here are some of the ways that models of the customer can be improved.

The first thing that Schrader recommends is to focus on the levers of the business. “What actions can you take? What value will those actions provide? How can those actions affect the future?” said Schrader. This perspective helps focus attention on the part of the model that is most important.

Data then should be used to enhance the model in as many ways as possible. “In a call center, for example, we can tell if someone is pressing the zero button over and over again,” said Schrader. “This is clearly an indication of frustration. If that person is a high value customer, and we know from the data that they just had a bad experience – like a dropped call with a phone company, or 10 minutes on the banking fees page before calling, it makes sense to raise an event and give them special attention. Even if they aren’t a big spender, something should be done to calm them down and make sure they don’t churn.” Schrader suggests that evidence of customer mood and intent can be harvested in numerous ways, through voice and text analytics and all sorts of other means.
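Schrader's call-center example boils down to a simple event rule. The sketch below is illustrative only, with invented thresholds, field names, and actions rather than Teradata code:

```python
# Illustrative only: thresholds, field names, and actions are invented,
# not Teradata's actual event logic.
def raise_event(session):
    """Return a next-best action for a call-center session, or None."""
    frustrated = session.get("zero_presses", 0) >= 3
    bad_experience = (session.get("dropped_call", False)
                      or session.get("minutes_on_fees_page", 0) >= 10)
    if not frustrated:
        return None
    if session.get("customer_value") == "high" and bad_experience:
        return "route_to_senior_agent"   # high-value customer, bad day
    return "offer_callback"              # still intervene to prevent churn

print(raise_event({"zero_presses": 4, "customer_value": "high",
                   "dropped_call": True}))  # route_to_senior_agent
```

The point is that each new signal (voice analytics, text analytics, clickstream) simply enriches the session dictionary, making the model of the customer higher-resolution without changing the decision logic.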

“Of course, you should be focused on what you know and how to do the most with that,” said Schrader. “But you should also be spending money or even 10% of your analyst’s time to expand your knowledge in ways that help you know what you don’t know.” Like Bostic, Schrader recommends that experiments be done to attack assumptions, to find the unknown unknowns.

To really make progress, Schrader recommends finding ways to break out of routine thinking. “Why should our analysts be chosen based on statistical skills alone?” asks Schrader. “Shouldn’t we find people who are creative and empathetic, who will help us think new thoughts and challenge existing biases? Of course we should.” Borrowing from the culture of development, Schrader suggests organizing data hack-a-thons to create a safe environment for wild curiosity. “Are you sincere in wanting to learn from data? If so, you will then tolerate failure that leads to learning,” said Schrader.

Schrader also recommends being careful about where in an organization to place experts such as data scientists. “You must add expertise in areas that will maximize communication and lead to storytelling,” said Schrader. In addition, he recommends having an open data policy wherever possible to encourage experimentation.

In my view, Bostic and Schrader are both crusaders who seek to institutionalize the spirit of the skeptical gadfly. It is a hard trick to pull off, but one that pays tremendous dividends.

By: Dan Woods, Forbes Blogger and Co-Founder of Evolved Media

Big Apple Hosts the Final Big Analytics Roadshow of the Year

Posted on: November 26th, 2013 by Teradata Aster

 

Speaking of ending things on a high note, New York City on December 6th will play host to the final event in the Big Analytics 2013 Roadshow series. Big Analytics 2013 New York is taking place at the Sheraton New York Hotel and Towers in the heart of Midtown on bustling 7th Avenue.

Reflecting on the illustrious journey of the Big Analytics 2013 Roadshow – kicking off in San Francisco, traveling through major international destinations including Atlanta, Dallas, Beijing, Tokyo and London, and finally culminating in the Big Apple – this year’s series truly encapsulated today’s appetite for collecting, processing, understanding and analyzing data.


Big Analytics Roadshow 2013 stops in Atlanta

Drawing business & technical audiences across the globe, the roadshow afforded the attendees an opportunity to learn more about the convergence of technologies and methods like data science, digital marketing, data warehousing, Hadoop, and discovery platforms. Going beyond the “big data” hype, the event offered learning opportunities on how technologies and ideas combine to drive real business innovation. Our unyielding focus on results from data is truly what made the events so successful.

Continuing the rich lineage of delivering quality Big Data information, the New York event promises to pack a tremendous amount of Big Data learning and education. The keynotes for the event include such industry luminaries as Dan Vesset, Program VP of Business Analytics at IDC; Tasso Argyros, Senior VP of Big Data at Teradata; and Peter Lee, Senior VP of Tibco Software.


Teradata team at the Dallas Big Analytics Roadshow


The keynotes will be followed by three tracks: Big Data Architecture, Data Science & Discovery, and Data-Driven Marketing. Each track will feature industry luminaries like Richard Winter of WinterCorp, John O’Brien of Radiant Advisors and John Lovett of Web Analytics Demystified. They will be joined by vendor presentations from Shaun Connolly of Hortonworks, Todd Talkington of Tableau and Brian Dirking of Alteryx.

As with every Big Analytics event, New York presents an exciting opportunity to hear firsthand from leading organizations like Comcast, Gilt Groupe and Meredith Corporation on how they are using Big Data Analytics and Discovery to deliver tremendous business value.

In summary, the event promises to be nothing less than the Oscars of Big Data and will bring together the who’s who of the Big Data industry. So, mark your calendars, pack your bags and get ready to attend the biggest Big Data event of the year.

Teradata’s UDA is to Data as Prius is to Engines

Posted on: November 12th, 2013 by Teradata Aster

 

I’ve been working in the analytics and database market for 12 years. One of the most interesting pieces of that journey has been seeing how the market is ever-shifting. Both the technology and business trends during these short 12 years have massively changed not only the tech landscape today, but also the future evolution of analytic technology. From a “buzz” perspective, I’ve seen “corporate initiatives” and “big ideas” come and go. Everything from “e-business intelligence,” which was a popular term when I first started working at Business Objects in 2001, to corporate performance management (CPM) and “the balanced scorecard.” From business process management (BPM) to “big data,” and now to the architectures and tools that everyone is talking about.

The one golden thread that ties each of these terms, ideas and innovations together is that each is aiming to solve the questions related to what we are today calling “big data.” At the core of it all, we are searching for the right way to harness and understand the explosion of data and analytics that today’s organizations are faced with. People call this the “logical data warehouse”, “big data architecture”, “next-generation data architecture”, “modern data architecture”, “unified data architecture”, or (I just saw last week) “unified data platform”.  What is all the fuss about, and what is really new?  My goal in this post and the next few will be to explain how the customers I work with are attacking the “big data” problem. We call it the Teradata Unified Data Architecture, but whatever you call it, the goals and concepts remain the same.

Mark Beyer from Gartner is credited with coining the term “logical data warehouse” and there is an interesting story and explanation. A nice summary of the term is,

“The logical data warehouse is the next significant evolution of information integration because it includes ALL of its progenitors and demands that each piece of previously proven engineering in the architecture should be used in its best and most appropriate place. …

And

… The logical data warehouse will finally provide the information services platform for the applications of the highly competitive companies and organizations in the early 21st Century.”

The idea of this next-generation architecture is simple: When organizations put ALL of their data to work, they can make smarter decisions.

It sounds easy, but as data volumes and data types explode, so does the need for more tools in your toolbox to help make sense of it all. Within your toolbox, data is NOT all nails and you definitely need to be armed with more than a hammer.

In my view, enterprise data architectures are evolving to let organizations capture more data. The data was previously untapped because the hardware costs required to store and process such enormous amounts of data were simply too high. However, the declining costs of hardware (thanks to Moore’s law) have opened the door for more data (types, volumes, etc.) and processing technologies to be successful. But no single technology can be engineered and optimized for every dimension of analytic processing, such as scale, performance, and concurrent workloads.

Thus, organizations are creating best-of-breed architectures by taking advantage of new technologies and workload-specific platforms such as MapReduce, Hadoop, MPP data warehouses, discovery platforms and event processing, and putting them together into a seamless, transparent and powerful analytic environment. This modern enterprise architecture enables users to get deep business insights and allows ALL data to be available to an organization, creating competitive advantage while lowering the total system cost.

But why not just throw all your data into files and put a search engine like Google on top? Why not just build a data warehouse and extend it with support for “unstructured” data? Because, in the world of big data, the one-size-fits-all approach simply doesn’t work.

Different technologies are more efficient at solving different analytical or processing problems. To steal an analogy from Dave Schrader—a colleague of mine—it’s not unlike a hybrid car. The Toyota Prius can average 47 mpg with hybrid (gas and electric) vs. 24 mpg with a “typical” gas-only car – almost double! But you do not pay twice as much for the car.

How’d they do it? Toyota engineered a system that uses gas when I need to accelerate fast (and also to recharge the battery at the same time), electric mostly when driving around town, and braking to recharge the battery.

Three components integrated seamlessly – the driver doesn’t need to know how it works.  It is the same idea with the Teradata UDA, which is a hybrid architecture for extracting the most insights per unit of time – at least doubling your insight capabilities at reasonable cost. And, business users don’t need to know all of the gory details. Teradata builds analytic engines—much like the hybrid drive train Toyota builds— that are optimized and used in combinations with different ecosystem tools depending on customer preferences and requirements, within their overall data architecture.

In the case of the hybrid car, battery power and braking systems, which recharge the battery, are the “new innovations” combined with gas-powered engines. Similarly, there are several innovations in data management and analytics that are shaping the unified data architecture, such as discovery platforms and Hadoop. Each customer’s architecture is different depending on requirements and preferences, but the Teradata Unified Data Architecture recommends three core components that are key components in a comprehensive architecture – a data platform (often called “Data Lake”), a discovery platform and an integrated data warehouse. There are other components such as event processing, search, and streaming which can be used in data architectures, but I’ll focus on the three core areas in this blog post.
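The “driver doesn't need to know” idea is, at bottom, a routing decision: each workload goes to the engine suited for it. A toy sketch, with invented engine names and routing rules that are not an actual Teradata component:

```python
# Toy sketch: engine names and routing rules are invented for illustration,
# not an actual Teradata UDA component.
ROUTES = {
    "raw_file_refining": "data_lake",                 # cheap bulk processing
    "iterative_discovery": "discovery_platform",      # multi-genre analytics
    "operational_reporting": "integrated_warehouse",  # consistent, fast
}

def route(workload):
    # Default to the warehouse, the system of record for integrated data.
    return ROUTES.get(workload, "integrated_warehouse")

print(route("raw_file_refining"))  # data_lake
```

Like the Prius drivetrain, the value is in the combination: the business user submits work and the architecture decides which engine runs it.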

Data Lakes

In many ways, this is not unlike the operational data store we’ve seen between transactional systems and the data warehouse, but the data lake is bigger and less structured. Any file can be “dumped” in the lake with no attention to data integration or transformation. New technologies like Hadoop provide a file-based approach to capturing large amounts of data without requiring ETL in advance. This enables large-scale data processing for data refining, structuring, and exploring data prior to downstream analysis in workload-specific systems, which are used to discover new insights and then move those insights into business operations for use by hundreds of end-users and applications.
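The schema-on-read pattern described above can be shown in miniature: raw records land in the “lake” untouched, and structure is applied only when a downstream job refines them. The pipe-delimited log format here is invented for illustration:

```python
# Illustrative schema-on-read sketch; the log format is invented.
raw_lake = [
    "2013-11-02|cust-17|page_view|/fees",
    "2013-11-02|cust-17|call|dropped",
    "bad record without delimiters",  # a lake happily holds junk too
]

def refine(records):
    """Apply structure on read, skipping records that don't fit the schema."""
    for line in records:
        parts = line.split("|")
        if len(parts) == 4:
            yield {"date": parts[0], "customer": parts[1],
                   "event": parts[2], "detail": parts[3]}

refined = list(refine(raw_lake))
print(len(refined))  # 2: the malformed record is filtered at read time
```

Contrast this with a warehouse load, where the malformed record would be rejected or repaired before it was ever stored.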

Discovery Platforms

Discovery platforms are new workload-specific systems optimized to perform multiple analytic techniques in a single workflow, combining SQL with statistics, MapReduce, graph, or text analysis to look at data from multiple perspectives. The goal is ultimately to provide more granular and accurate insights to users about their business. Discovery platforms enable a faster investigative analytical process to find new patterns in data and to identify types of fraud or consumer behavior that traditional data mining approaches may have missed.

Integrated Data Warehouses

With all the excitement about what’s new, companies quickly forget the value of consistent, integrated data for reuse across the enterprise. The integrated data warehouse has become a mission-critical operational system which is the point of value realization or “operationalization” for information. The data within a massively parallel data warehouse has been cleansed, and provides a consistent source of data for enterprise analytics. By integrating relevant data from across the entire organization, a couple of key goals are achieved. First, organizations can answer the kind of sophisticated, impactful questions that require cross-functional analyses. Second, they can answer questions more completely by making relevant data available across all levels of the organization. Data lakes (Hadoop) and discovery platforms complement the data warehouse by enriching it with new data and new insights that can now be delivered to thousands of users and applications with consistent performance (i.e., they get the information they need quickly).
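The cleansing and standardization that make warehouse data consistent can also be sketched in miniature. The code mappings, date format, and dedupe key below are assumptions for illustration, not an actual integration pipeline:

```python
from datetime import datetime

# Assumed mappings and formats, for illustration only.
GENDER_CODES = {"m": "M", "male": "M", "f": "F", "female": "F"}

def standardize(row):
    """Standardize code values and validate the date format."""
    row = dict(row)
    row["gender"] = GENDER_CODES.get(row["gender"].strip().lower(), "U")
    datetime.strptime(row["signup_date"], "%Y-%m-%d")  # raises if invalid
    return row

rows = [
    {"cust_id": 1, "gender": "Male", "signup_date": "2013-05-01"},
    {"cust_id": 1, "gender": "M", "signup_date": "2013-05-01"},  # duplicate
]

seen, clean = set(), []
for r in rows:
    s = standardize(r)
    if s["cust_id"] not in seen:  # throw away duplicates
        seen.add(s["cust_id"])
        clean.append(s)

print(len(clean))  # 1 integrated record
```

It is exactly this kind of work, applied across every subject area, that a raw data lake skips and a warehouse load enforces.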

A critical part of incorporating these novel approaches to data management and analytics is putting new insights and technologies into production in reliable, secure and manageable ways for organizations.  Fundamentals of master data management, metadata, security, data lineage, integrated data and reuse all still apply!

The excitement of experimenting with new technologies is fading. More and more, our customers are asking us about ways to put the power of new systems (and the insights they provide) into large-scale operation and production. This requires unified system management and monitoring, intelligent query routing, metadata about incoming data and the transformations applied throughout the data processing and analytical process, and role-based security that respects and applies data privacy, encryption and other policies required. This is where I will spend a good bit of time on my next blog post.

Anna Littick and the Unified Data Architecture — Part 1

Posted on: October 8th, 2013 by Dan Graham

 

Ring ring ringtone.
Dan: “Hello. This is Dan at Teradata. How can I help you today?”

Anna: “Hi Dan. I’m Anna Littick at Sunshine-Stores in Dallas. I believe we swapped some emails and you said I should call you.”

Dan: “Oh yeah. You’re the data scientist Ph.D. who said you were a little confused by all the Hadoop-la. Yeah, I remember. Anyway, how can I help?”

Anna: “Well, a new CFO is running our IT division. He keeps saying Hadoop is free and wants to move everything to Hadoop. To me it seems risky.”

Dan: “Yes, we’ve seen this happen elsewhere. The CIO cracks under budget pressure when some evangelist claims he can do everything for free with Hadoop. Hadoop fever always runs its course until reality sets in again after several months.”
Anna: “Well, I guess we have the fever. If you remember my email, the CFO is causing internal debates that never seem to end. Let me list our points of debate again quickly:
1. Hadoop replaces the data warehouse
2. Hadoop is a landing zone and archive
3. Hadoop is a database
4. Hadoop does deep analytics.”

Dan: “OK, let’s take Hadoop replaces the data warehouse. You know the old adage ‘If it sounds too good to be true, then it probably is.’ Well, the biggest data warehouse strengths are managing multiple subject areas that are fully integrated. Subject areas mean sales, inventory, customers, financials, and so on. Every one of these subjects has dozens – even hundreds – of sophisticated data tables. Subject areas are defined in a data model so they can be integrated and consistent. For example, data formats and values are standardized – like account types, country names, postal codes, or gender. We can’t tolerate invalid data in those fields. It also means throwing away duplicates, checking date formats, and ensuring valid relationships between historical events. Hadoop might hold onto all the data, but it’s not organized, cleansed, and tightly integrated into subject areas. Hadoop doesn’t do any of that – it’s all do-it-yourself programming. Check out Gartner’s definition to see that Hadoop is not a data warehouse. (1) Wikipedia has the same definitions under top-down design as well.”

Anna: “Interesting. Like everyone else, I just took that for granted. Of course I’m a programmer and I never make mistakes. [snicker] But if I tell the CFO that, he’ll ignore me. Give me some upside, some things Hadoop does well so he will take me seriously.”
Dan: “Well, let’s start with the most obvious. When I first talked to Amr Awadallah, CTO at Cloudera, he told me ‘Hadoop’s biggest differentiators come from capturing massive amounts of raw data and querying that data for 5-10 years – and all that at a low cost.’ So Hadoop is both a landing zone and an archive for files. Hadoop can manage a few million files on huge, low cost, hard disk drives. With a little effort, Hadoop and Hive can query data that’s kept for 7, 8, even 10 years. Tape backups can do that but tape is sloooowww. Imagine getting a regulatory request from the Texas governor saying ‘Show us all your hiring by ethnicity, income, promotions, and raises going back to 2005.’ Most DBAs won’t keep data in the data warehouse that’s more than 5-7 years old because of costs. Hadoop provides a low cost archive and basic query capabilities.”
Anna: “Cool. It sounds like Hadoop would be a good place for ETL processing, right?”
Dan: “That’s a tricky question. A lot of companies are stampeding towards Hadoop as an ETL tool. Yet Gartner clearly states that Hadoop lacks the functions of an ETL engine (2). At Teradata we have seen some great Hadoop ETL successes and some failures as well. I believe vendors like Informatica and IBM DataStage will do more data integration projects with Hadoop. They have the MDM, data lineage, metadata, and oodles of transformers. Hadoop mostly has do-it-yourself programming. I’m guessing that within a few years the ETL vendors will have integrated so well with Hadoop that you will usually use them together.”
Anna: “OK, so we need to keep our ETL and data warehouse, then add Hadoop where it has strengths.”
Dan: “Agreed. That’s what we have seen the visionary customers and fast followers doing. Those customers have been asking us to make Teradata products and Hadoop work well together. This is driving us to invest a ton of money into what we call the Teradata Unified Data Architecture (UDA). UDA is hardware and software platforms optimized for specific workloads plus data flow between them for an ideal, best-of-breed analytic workplace.”
Anna: “Looks like it’s time for me to have a heart-to-heart chat with our new CFO. His name is Xavier Money. Isn’t that hilarious?”
Dan: “Oh yeah. Two ironies in one day.”

Anna: “What?”

Dan: “Oh nothing, just thinking of something else. How about I send you an email about our PARTNERS conference where you can hear these topics directly from Teradata customers like yourself? Real customer stories of hurdles and results are invaluable. Pay extra attention to the WinterCorp session on Big Data Cost of Ownership – your CFO will want to hear that one.”

Anna: “Thanks, I’ve got to run. Maybe we can finish up our chat in a couple days. I’ll call you.” 

1 Gartner, Of Data Warehouses, Operational Data Stores, Data Marts and Data Outhouses, Dec 2005
2 Gartner, Hadoop Is Not a Data Integration Solution, January 2013

 

About one year ago, Teradata Aster launched a powerful new way of integrating a database with Hadoop. With Aster SQL-H™, users of the Teradata Aster Discovery Platform got the ability to issue SQL and SQL-MapReduce® queries directly on Hadoop data as if that data had been in Aster all along. This level of simplicity and performance was unprecedented, and it enabled BI & SQL analysts that knew nothing about Hadoop to access Hadoop data and discover new information through Teradata Aster.

This innovation was not a one-off. Teradata has put forward the most complete vision for a data and analytics architecture in the 21st century. We call that the Unified Data Architecture™. The UDA combines Teradata, Teradata Aster & Hadoop into a best-of-breed, tightly integrated ecosystem of workload-specific platforms that provide customers the most powerful and cost-effective environment for their analytical needs. With Aster SQL-H™, Teradata provided a level of software integration between Aster & Hadoop that was, and still is, unchallenged in the industry.

 


Teradata Unified Data Architecture™

Today, Teradata makes another leap in making its Unified Data Architecture™ vision a reality. We are announcing SQL-H™ for Teradata, bringing the best SQL engine for data warehousing and analytics to Hadoop. From now on, Enterprises that use Hadoop to store large amounts of data will be able to utilize Teradata's analytics and data warehousing capabilities to directly query Hadoop data securely through ANSI standard SQL and BI tools by leveraging the open source Hortonworks HCatalog project. This is fundamentally the best and tightest integration between a data warehouse engine and Hadoop that exists in the market today. Let me explain why.

It is interesting to consider Teradata's approach versus alternatives. If one wants to execute SQL on Hadoop, with the intent of building Data Warehouses out of Hadoop data, there are not many realistic options. Most databases have very poor integration with Hadoop, and require Hadoop experts to manage the overall system - not a viable option for most Enterprises due to cost. SQL-H™ removes this requirement for Teradata/Hadoop deployments. Another "option" is the SQL-on-Hadoop tools that have started to emerge; but unfortunately, they are about a decade away from becoming sufficiently mature to handle true Data Warehousing workloads. Finally, the approach of taking a database and shoving it inside Hadoop has significant issues, since it suffers from the worst of both worlds – Hadoop activity has to be limited so that it doesn't disrupt the database, data is duplicated between HDFS and the database store, and performance of the database is lower than that of a stand-alone version.

In contrast, a Teradata/Hadoop deployment with SQL-H™ offers the best of both worlds: unprecedented performance and reliability in the Teradata layer; seamless BI & SQL access to Hadoop data via SQL-H™; and it frees up Hadoop to perform data processing tasks at full efficiency.

Teradata is committed to being the strategic advisor of the Enterprise when it comes to Data Warehousing and Big Data. Through its Unified Data Architecture™ and today's announcement on Teradata SQL-H™, it provides even more performance, flexibility and cost-effective options to Enterprises eager to use data as a competitive advantage.

Introducing In-Database Visual MapReduce Functions

Posted on: February 20th, 2013 by Teradata Aster

 

Ever since Aster Data became part of Teradata a couple years ago, we have been fortunate to have the resources and focus to accelerate our rate of product innovation. In the past 8 months alone, we have led the market in deploying big analytics on Hadoop and introducing an ultra-fast appliance for discovering big data insights. Our focus is to provide the market with the best big data discovery platform; that is, the most efficient, cost-effective, and enterprise-friendly way to extract valuable business insights from massive piles of structured and unstructured data.

Today I am excited to announce another significant innovation that extends our lead in this direction. For the first time, we are introducing in-database, SQL-MapReduce-based visualization functions, as part of the Teradata Aster Discovery Platform 5.10 software release. These are functions that take the output of an analytical process (either SQL or MapReduce) and create an interactive data visualization that can be accessed directly from our platform through any web browser. There are several functions that we are introducing with today's announcement, including functions that let you visualize flows of people or events, graphs, and arbitrary patterns. These functions complement your existing BI solution by extending the types of information you can visualize without adding the complexity of another BI deployment.

It did take significant engineering effort, and innovation from our field in working with customers, to make a discovery platform produce in-database, in-process visualizations. So, why bother? Because these functions have three powerful characteristics: they are beautiful, powerful, and instant. Let me elaborate in reverse order.

Instant: the goal of a discovery platform like Aster’s is to accelerate the hypothesis → analysis → validation iteration process. One of the major big data challenges is that the data is so complex that you don't even know what questions to ask. So you start with tens or hundreds of possible questions that you need to quickly implement and validate until you find the couple of questions that extract the gold nuggets of information from the data. Besides analyzing the data, having access to instant visualizations can help data scientists and business analysts understand whether they are on the right path to finding the insights they're looking for. Being able to rapidly analyze and – now – visualize the insights in-process can accelerate the discovery cycle and save an analyst time and cost by more than 80%, as has recently been validated.

Powerful: Aster comes with a broad library of pre-built SQL-MapReduce functions. Some of the most powerful, like nPath, crunch terabytes of customer or event data and produce patterns of activity that yield significant insights in a single pass of the data, regardless of the complexity of the pattern or history being analyzed. In the past, visualizing these insights required a lot of work – even after the insight was generated – because no specialized visualization tools could consume the insight as-is to produce the visualizations. Abstracting the insights in order to visualize them is sub-optimal because it kills the 'a-ha!' moment. With today’s announcement, we give analysts the ability to natively visualize concepts such as a graph of interactions or patterns of customer behavior with no compromises and no additional effort!
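To make the single-pass pattern-matching idea concrete, here is a minimal conceptual sketch in Python. This is not Aster's actual SQL-MapReduce API – the event data, symbol names, and pattern are all hypothetical – but it illustrates the kind of analysis nPath performs: partition events by user, map each event to a symbol, and match a path pattern such as "one or more product views followed by a checkout" in one pass over the data.

```python
import re
from collections import defaultdict

# Hypothetical event log: (user_id, event) pairs, already in time order.
events = [
    ("u1", "home"), ("u1", "view"), ("u1", "view"), ("u1", "checkout"),
    ("u2", "home"), ("u2", "view"),
    ("u3", "view"), ("u3", "checkout"),
]

# Map each event type to a single symbol, then express the path pattern
# as a regex -- conceptually similar to nPath's SYMBOLS and PATTERN clauses.
symbols = {"home": "H", "view": "V", "checkout": "C"}
pattern = re.compile("V+C")  # one or more views followed by a checkout

# Group events by user (one "partition" per user) in a single pass.
paths = defaultdict(str)
for user, event in events:
    paths[user] += symbols[event]

# Emit the users whose clickstream path matches the pattern.
matches = sorted(u for u, p in paths.items() if pattern.search(p))
print(matches)  # ['u1', 'u3']
```

The point of the sketch is the shape of the computation: the whole pattern, however complex, is evaluated in a single scan over partitioned, ordered event data rather than through repeated self-joins.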

Beautiful: We all know that numbers and data are only as good as the story that goes with them. By having access to instant, powerful and also aesthetically beautiful in-database visualizations, you can do justice to your insights and communicate them effectively to the rest of the organization, whether that means business clients, executives, or peer analysts.

In addition, with this announcement we are introducing four buckets of pre-built SQL-MapReduce functions, i.e., Java functions that can be accessed through a familiar SQL or BI interface. These buckets are Data Acquisition (connecting to external sources and acquiring data); Data Preparation (manipulating structured and unstructured data to quickly prepare it for analysis); Data Analytics (everything from path and pattern analysis to statistics and marketing analytics); and Data Visualization (introduced today). This is the most powerful collection of big data tools available in the industry today, and we're proud to provide them to our customers.

Teradata Aster Discovery Portfolio - figure 2

Our belief is that our industry is still scratching the surface in terms of providing powerful analytical tools to enterprises that help them find more valuable insights, more quickly and more easily. With today's launch, the Teradata Aster Discovery Platform reconfirms its lead as the most powerful and enterprise-friendly tool for big data analytics.

Big Insights from Big Analytics Roadshow

Posted on: January 25th, 2013 by Teradata Aster No Comments

 

Last month in New York we completed the 4th and final event in the Big Analytics 2012 roadshow. This series of events shared ideas on practical ways to address the big data challenge in organizations and to shift the conversation from “technology” to “business value”. In New York alone, 500 people attended from across both business and IT, and we closed out the event with two speaker panels. The data science panel was, in my opinion, one of the most engaging and interesting panels I’ve ever seen at an event like this. The topic was whether organizations really need a data scientist (and what’s different about the skill set compared to other analytic professionals). Mike Gualtieri from Forrester Research did a great job leading and prodding the discussion.

Overall, these events were a great way to learn and network. They featured great speakers from cutting-edge companies such as LinkedIn, Barnes & Noble, Razorfish, Gilt Groupe, and eBay, along with universities and industry thought-leaders including DJ Patil, Mike Gualtieri from Forrester Research, Wayne Eckerson, and Mohan Sawhney from the Kellogg School of Management.

As an aside, I’ve long observed a historic disconnect between marketing groups and the IT organizations and data warehouses that support them. I noticed this first when I worked at Business Objects, where very few reporting applications ever included Web clickstream data. The marketing department always used a separate tool or application like WebSideStory (now part of Adobe) to handle this. A bridge is now being built to connect these worlds – both in terms of technology that can handle web clickstream and other customer interaction data, and in terms of new analytic techniques that make it easier for marketing/business analysts to understand their customers more intimately and serve them a more relevant experience.

We ran a survey at the events, and I wanted to share some top takeaways. The events were split into business and technical tracks with themes of “data science” and “digital marketing”, so the survey data lets us compare the responses of attendees who were drawn to the technology content with those drawn to the business content. The survey includes responses from 507 people in San Francisco, 322 in Boston, 441 in Chicago, and 894 in New York City, for a total of 2164 respondents.

You can get the full set of graphs here, but here are a couple of my own observations / conclusions in looking at the data:

1)      “Who is talking about big data analytics in your organization?” - IT and Marketing were by far the largest responses, with IT talking about it in nearly 60% of organizations and marketing in 43%. New York had slightly higher percentages of CIOs and CEOs talking about big data, at 23% and 21%, respectively.

 Survey Data: Figure 1

2)      “Where is big data analytics in your company?” - Across all cities, “customer interactions in Web/social/mobile” was the biggest area of big data analytics, at 62%. Despite all the hype around machine/sensor data, it was, surprisingly, being discussed in only 20% of organizations. Since web servers and mobile devices are machines, it would have been interesting to see how the “machine-generated data” responses would have looked had we removed the more specific “customer interactions” option.

 Survey Data: Figure 2

3)      This chart gives a more detailed breakdown of the areas where big data analytics is found, by city. NYC had a few more “other” responses, which included:

  1. Claims
  2. Client Data Cloud
  3. Development, and Data Center Systems
  4. Customer Solutions
  5. Data Protection
  6. Education
  7. Financial Transaction
  8. Healthcare data
  9. Investment Research
  10. Market Data
  11. Predictive Analytics (sales and servicing)
  12. Research
  13. Risk management /analytics
  14. Security

 Survey Data: Figure 3

4)      “What are the Greatest Big Analytics Application Opportunities for Businesses Today?” – on average, general “data discovery or data science” was highest at 72%, with “digital marketing optimization” second at just under 60% of respondents. In New York, “fraud detection and prevention”, at 39%, was slightly higher than in other cities, perhaps reflecting the number of financial institutions in attendance.

 Survey Data: Figure 4

In summary, there are many applications for big data analytics, but it is important to have a discovery platform that supports iterative exploration of ALL types of data and serves both business/marketing analysts and savvy data scientists. The divide between business groups like marketing and IT is closing. Marketers are more technically savvy than ever and are among the most demanding customers for analytic solutions that can harness the deluge of customer interaction data. They need to partner closely with IT to architect the right solutions for “big analytics” and to provide toolsets that give them self-service access to this information without always requiring developer or IT support.

We are planning to sponsor the Big Analytics roadshow again in 2013 and take it international, as well. If you attended the event and have feedback or requests for topics, please let us know. I hear that there will be a “call for papers” going out soon. You can view the speaker bios & presentations from the Big Analytics 2012 events for ideas.

Santa Claus and Data Scientists

Posted on: December 3rd, 2012 by Teradata Aster No Comments

 

Who do you believe in more – Santa Claus or Data Scientists? That’s the debate we’re having in New York City on Dec 12th at Big Analytics 2012. Because the event is sold out, this panel discussion will be simulcast live so everyone can dig a little deeper behind the hype.

Some believe that data scientists are a new breed of analytic professional that merges mathematics, statistics, programming, visualization, and systems operations (and perhaps a little quantum mechanics and string theory for good measure) all in one. Others say that Data Scientists are simply data analysts who live in California.

Whatever you believe, the skills gap for “data scientists” and analytic professionals is real and is not expected to close until 2018. Businesses see the light in terms of data-driven competitive advantage, but are they willing to fork out $300,000/yr for a person who can do data science magic? That’s what CIO Journal is reporting, with the guidance that “CIOs need to make sure that they are hiring for these positions to solve legitimate business problems, and not just because everyone else is doing it too”.

Universities like Northwestern University have built programs and degrees in analytics to help close the gap. Technology vendors are bridging the gap to make new analytic techniques on big data tenable to a broader set of analysts in mainstream organizations. But is data science really new? What are businesses doing to unlock and monetize new insights? What skills do you need to be a “data scientist”? How can you close the gap? What should you be paying attention to?

Mike Gualtieri from Forrester Research will be moderating a panel to answer these questions and more with:

  • Geoff Guerdat, Director of Data Architecture, Gilt Groupe
  • Bill Franks, Chief Analytics Officer, Teradata
  • Bernard Blais, SAS
  • Jim Walker, Director of Product Marketing, Hortonworks

 

Join the discussion at 3:30 EST on Dec 12th where you can ask questions and follow the discussion thread on Twitter with #BARS12, or follow along on TweetChat at: http://tweetchat.com/room/BARS12

... it certainly beats sitting up all night with milk and cookies looking out for Santa!