SQL

Hadoop Summit June 2015: 4 Takeaways

Posted on: June 18th, 2015 by Cesar Rojas

 

For those in data—the developers, architects, administrators and analysts who capture, distill and integrate complex information for their organizations—the Hadoop Summit is one of the most important events of the year. We get to talk, share and learn from each other about how we can make Hadoop key to the enterprise data architecture.

The 2015 conference, held this month in San Jose, Calif., lived up to its billing. As a sponsor, Teradata had a big presence, including a booth that provided real-time demonstrations of our data solutions, as well as a contribution to the dialogue, with experts leading informative talks.

  • Peyman Mohajerian and Bill Kornfeld from Think Big spoke on the new business value of a data lake strategy.
  • Teradata’s Justin Borgman and Chris Rocca explored the future of Hadoop and SQL.

Over the course of the conference some big themes emerged. Here’s our insider look at the top takeaways from the 2015 Hadoop Summit:

1. Have no fear.

Yes, big data is here to stay.  And the opportunities to be gained are too great to let fear of failure guide your organization’s actions. David T. Lin, leader and evangelist of cloud platform engineering for Symantec, summed it up well: “Kill the fear. Haters to the left. Get it started and go.”

2. Take it step by step.

There’s an abundance of paths you can take to use and derive insights from your data.  Start small and scale. Hemal Gandhi, director of data engineering at One Kings Lane, said a good way to do that is to think like a startup, which often runs on innovation and agility. “There are lots of challenges in building highly scalable big data platforms … we took an approach that allows us to build a scalable data platform rapidly.”

 

3. Use predictive analytics.

Predictive analytics are worth taking the risk because they help uncover an organization’s next-best action to progress toward a goal. Alexander Gray, CTO of Skytree, discussed the benefits of “bigger” data and how those benefits can be quantified—in dollar terms. Because data size is a basic lever for predictive power, Gray said, “increasing business value is achieved by increasing predictive power.”

4. Personalize customer experiences.

Siloed applications combined in the Lambda architecture allow you to give your customers an experience that is tailored to their needs. Russell Foltz-Smith, vice president of data platform at TrueCar, said his system allows his company to accurately identify, assess value, predict and prescribe “who, what and where,” giving customers the transparency they’re increasingly demanding. “We need to make everything easily accessible,” Foltz-Smith said. “We are moving to a contextually aware, intelligent search engine. You have to open it up and let people forage through your data to find what they need.”

Were you able to attend the Hadoop Summit or follow it online? What lessons did you take away from the event? Share your top Hadoop Summit insights in the comments below.

 

In advance of the upcoming webinar Achieving Pervasive Analytics through Data & Analytic Centricity, Dan Woods, CTO and editor of CITO Research, sat down with Clarke Patterson, senior director, Product Marketing, Cloudera, and Chris Twogood, vice president of Product and Services Marketing, Teradata, to discuss some of the ideas and concepts that will be shared in more detail on May 14, 2015.

Dan:

Having been briefed by Cloudera and Teradata on Pervasive Analytics and Data & Analytic Centricity, I have to say it’s refreshing to hear vendors talk about WHY and HOW big data is important in a constructive way, rather than platitudes and jumping into the technical details of the WHAT which is so often the case.

Let me start by asking you both to describe, in your own words, Pervasive Analytics and Data & Analytic Centricity, and why this is an important concept for enterprises to understand.

Clarke:

During eras of global economic shifts, there is always a key resource discovered that becomes the spark of transformation for organizations that can effectively harness it. Today, that resource is unquestionably ‘data’. Forward-looking companies realize that to be successful, they must leverage analytics in order to provide value to their customers and shareholders. In some cases they must package data in a way that adds value and informs employees, or their customers, by deploying analytics into decision-making processes everywhere. This idea is referred to as pervasive analytics.

I would point to the success that Teradata’s customers have had over the past decades in terms of making analytics pervasive throughout enterprises. The spectrum in which their customers have gained value is comprehensive, from business intelligence reporting and executive dashboards, to advanced analytics, to enabling front line decision makers, and embedding analytics into key operational processes. And while those opportunities remain, the explosion of new data types and breadth of new analytic capabilities is leading successful companies to recognize the need to evolve the way they think about data management and processes in order to harness the value of all their data.

Chris:

I couldn’t agree more. It’s interesting now that we’re several years into the era of big data to see how different companies have approached this opportunity, which really boils down to two approaches. Some companies have taken the approach of what can we do with this newer technology that has emerged, while others take the approach of defining a strategic vision for the role of the data and analytics to support their business objectives and then map the technology to the strategy. The former, which we refer to as an application centric approach, can result in some benefits, but typically runs out of steam as agility slows and new costs and complexities emerge; while the latter is proving to create substantially more competitive advantage as organizations put data and analytics – not a new piece of technology – at the center of their operations. Ultimately, these companies that take a data and analytic centric approach are coming to a conclusion that there are multiple technologies required, and their acumen on applying the-right-tool-to-the-right-job naturally progresses, and the usual traps and pitfalls are avoided.

Dan:

Would you elaborate on what is meant by “companies need to evolve the way they think about data management?”

Chris:

Pre “big data,” there was a single approach to data integration whereby data is made to look the same, or normalized, in some sort of persistent store such as a database, and only then can value be created. The idea is that by absorbing the costs of data integration up front, the cost of extracting insights decreases. We call this approach “tightly coupled.” This is still an extremely valuable methodology, but is no longer sufficient as a sole approach to manage all data in the enterprise.

Post “big data,” using the same tightly coupled approach to integration undermines the value of newer data sets that have unknown or under-appreciated value. Here, new methodologies to “loosely couple” or not couple at all are essential to cost-effectively manage and integrate the data. These distinctions are incredibly helpful in understanding the value of Big Data, deciding where best to invest, and highlighting challenges that remain a fundamental hindrance to most enterprises.

But regardless of how the data is most appropriately managed, the most important thing is to ensure that organizations retain the ability to connect-the-dots for all their data, in order to draw correlations between multiple subject areas and sources and foster peak agility.
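The distinction between tightly and loosely coupled data can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions: the sample records, field names, and both helper functions are hypothetical and do not represent any Teradata or Cloudera product. The tightly coupled path normalizes every record to one fixed schema at load time, while the loosely coupled path keeps the raw records and applies a schema only when they are read.

```python
import json

# Hypothetical raw events as they might land in a data store (illustrative only).
RAW_EVENTS = [
    '{"customer_id": 1, "channel": "web", "amount": "19.99"}',
    '{"customer_id": 2, "channel": "store", "amount": "5.00", "coupon": "SPRING"}',
    '{"customer_id": 1, "channel": "call_center", "notes": "asked about billing"}',
]


def load_tightly_coupled(raw_lines):
    """Normalize every record to one fixed schema before storing it."""
    table = []
    for line in raw_lines:
        record = json.loads(line)
        table.append({
            "customer_id": int(record["customer_id"]),
            "channel": record["channel"],
            # Fields outside the schema (coupon, notes) are dropped at load time.
            "amount": float(record.get("amount", 0.0)),
        })
    return table


def read_loosely_coupled(raw_lines, fields):
    """Keep raw records as-is and project only the requested fields at read time."""
    for line in raw_lines:
        record = json.loads(line)
        yield {name: record.get(name) for name in fields}


# Tightly coupled: integration cost is paid up front, so aggregation is simple.
warehouse = load_tightly_coupled(RAW_EVENTS)
total_by_customer = {}
for row in warehouse:
    key = row["customer_id"]
    total_by_customer[key] = total_by_customer.get(key, 0.0) + row["amount"]

# Loosely coupled: a field nobody planned for is still available later.
notes = [
    row["notes"]
    for row in read_loosely_coupled(RAW_EVENTS, ["customer_id", "notes"])
    if row["notes"]
]
```

In this sketch the normalized table makes the per-customer totals trivial to compute, but the free-text notes survive only in the raw records, which is the trade-off the answer above describes.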

Clarke:

I’d also cite that leading companies are evolving the way they approach analytics. We can analyze any kind of data now - numerical, text, audio, video. We are now able to discover insights in this complex data. Further, new forms of procedural analytics have emerged in the era of big data, such as graph, time-series, machine learning, and text analytics.

This allows us to expand our understanding of the problems at hand. Key business imperatives like churn reduction, fraud detection, increasing sales and marketing effectiveness, and operational efficiencies are not new, and have been skillfully leveraged by data driven businesses with tightly coupled methods and SQL based analytics – that’s not going away. But when organizations harness newer forms of data that add to the picture, and new complementary analytic techniques, they realize better churn and fraud models, greater sales and marketing effectiveness, and more efficient business operations.

To learn more, please join the Achieving Pervasive Analytics through Data & Analytic Centricity webinar on Thursday, May 14, from 10:00 to 11:00 a.m. PT.

 

About one year ago, Teradata Aster launched a powerful new way of integrating a database with Hadoop. With Aster SQL-H™, users of the Teradata Aster Discovery Platform got the ability to issue SQL and SQL-MapReduce® queries directly on Hadoop data as if that data had been in Aster all along. This level of simplicity and performance was unprecedented, and it enabled BI & SQL analysts that knew nothing about Hadoop to access Hadoop data and discover new information through Teradata Aster.

This innovation was not a one-off. Teradata has put forward the most complete vision for a data and analytics architecture in the 21st century. We call that the Unified Data Architecture™. The UDA combines Teradata, Teradata Aster & Hadoop into a best-of-breed, tightly integrated ecosystem of workload-specific platforms that provide customers the most powerful and cost-effective environment for their analytical needs. With Aster SQL-H™, Teradata provided a level of software integration between Aster & Hadoop that was, and still is, unchallenged in the industry.

 

Teradata Unified Data Architecture™

Today, Teradata makes another leap in making its Unified Data Architecture™ vision a reality. We are announcing SQL-H™ for Teradata, bringing the best SQL engine for data warehousing and analytics to Hadoop. From now on, Enterprises that use Hadoop to store large amounts of data will be able to utilize Teradata's analytics and data warehousing capabilities to directly query Hadoop data securely through ANSI standard SQL and BI tools by leveraging the open source Hortonworks HCatalog project. This is fundamentally the best and tightest integration between a data warehouse engine and Hadoop that exists in the market today. Let me explain why.

It is interesting to consider Teradata's approach versus alternatives. If one wants to execute SQL on Hadoop, with the intent of building Data Warehouses out of Hadoop data, there are not many realistic options. Most databases have very poor integration with Hadoop and require Hadoop experts to manage the overall system, which is not a viable option for most Enterprises due to cost. SQL-H™ removes this requirement for Teradata/Hadoop deployments. Another "option" is the set of SQL-on-Hadoop tools that have started to emerge; but unfortunately, they are about a decade away from becoming sufficiently mature to handle true Data Warehousing workloads. Finally, the approach of taking a database and shoving it inside Hadoop has significant issues since it suffers from the worst of both worlds – Hadoop activity has to be limited so that it doesn't disrupt the database, data is duplicated between HDFS and the database store, and performance of the database is lower than that of a stand-alone version.

In contrast, a Teradata/Hadoop deployment with SQL-H™ offers the best of both worlds: unprecedented performance and reliability in the Teradata layer; seamless BI & SQL access to Hadoop data via SQL-H™; and it frees up Hadoop to perform data processing tasks at full efficiency.

Teradata is committed to being the strategic advisor of the Enterprise when it comes to Data Warehousing and Big Data. Through its Unified Data Architecture™ and today's announcement on Teradata SQL-H™, it provides even more performance, flexibility and cost-effective options to Enterprises eager to use data as a competitive advantage.

Introducing In-Database Visual MapReduce Functions

Posted on: February 20th, 2013 by Teradata Aster

 

Ever since Aster Data became part of Teradata a couple of years ago, we have been fortunate to have the resources and focus to accelerate our rate of product innovation. In the past 8 months alone, we have led the market in deploying big analytics on Hadoop and introducing an ultra-fast appliance for discovering big data insights. Our focus is to provide the market with the best big data discovery platform; that is, the most efficient, cost-effective, and enterprise-friendly way to extract valuable business insights from massive piles of structured and unstructured data.

Today I am excited to announce another significant innovation that extends our lead in this direction. For the first time, we are introducing in-database, SQL-MapReduce-based visualization functions, as part of the Teradata Aster Discovery Platform 5.10 software release. These are functions that take the output of an analytical process (either SQL or MapReduce) and create an interactive data visualization that can be accessed directly from our platform through any web browser. There are several functions that we are introducing with today's announcement, including functions that let you visualize flows of people or events, graphs, and arbitrary patterns. These functions complement your existing BI solution by extending the types of information you can visualize without adding the complexity of another BI deployment.

It took significant engineering effort, and innovation from our field teams working with customers, to make a discovery platform produce in-database, in-process visualizations. So, why bother? Because these functions have three important characteristics: they are beautiful, powerful, and instant. Let me elaborate in reverse order.

Instant: the goal of a discovery platform like Aster’s is to accelerate the hypothesis --> analysis --> validation iteration process. One of the major big data challenges is that the data is so complex that you don't even know what questions to ask. So you start with 10s or 100s of possible questions that you need to quickly implement and validate until you find the couple of questions that extract the gold nuggets of information from the data. Besides analyzing the data, having access to instant visualizations can help data scientists and business analysts understand if they are on the right path to finding the insights they're looking for. Being able to rapidly analyze and – now – visualize the insights in-process can accelerate the discovery cycle and reduce an analyst's time and cost by more than 80%, as has been recently validated.

Powerful: Aster comes with a broad library of pre-built SQL-MapReduce functions. Some of the most powerful, like nPath, crunch terabytes of customer or event data and produce patterns of activity that yield significant insights in a single pass of the data, regardless of the complexity of the pattern or history being analyzed. In the past, visualizing these insights required a lot of work – even after the insight was generated. This is because there were no specialized visualization tools that could consume the insight as-is to produce the visualizations. Abstracting the insights in order to visualize them is sub-optimal since it is killing the 'a-ha!' moment. With today’s announcement, we provide analysts with the ability to natively visualize concepts such as a graph of interactions or patterns of customer behavior with no compromises and no additional effort!

Beautiful: We all know that numbers and data are only as good as the story that goes with them. By having access to instant, powerful and also aesthetically beautiful in-database visualizations, you can do justice to your insights and communicate them effectively to the rest of the organization, whether that means business clients, executives, or peer analysts.
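To make the kind of path and pattern analysis mentioned under "Powerful" more concrete, here is a small Python sketch. It is not the nPath function and does not use SQL-MapReduce; it is a plain in-memory illustration, and the event log, event names, and the `paths_ending_in` helper are all hypothetical. It orders each customer's events by time and counts the most common sequences that end in a target event.

```python
from collections import Counter, defaultdict

# Hypothetical event log: (customer_id, timestamp, event). Illustrative data only.
EVENTS = [
    (1, 1, "web_login"), (1, 2, "call_center"), (1, 3, "cancel"),
    (2, 1, "store_visit"), (2, 2, "purchase"),
    (3, 1, "web_login"), (3, 2, "call_center"), (3, 3, "cancel"),
    (4, 1, "call_center"), (4, 2, "store_visit"), (4, 3, "cancel"),
]


def paths_ending_in(events, target, max_steps=3):
    """Count the event sequences (up to max_steps long) that end in `target`."""
    # Group events per customer in timestamp order.
    by_customer = defaultdict(list)
    for customer_id, _timestamp, event in sorted(events, key=lambda e: (e[0], e[1])):
        by_customer[customer_id].append(event)

    counts = Counter()
    for sequence in by_customer.values():
        if target in sequence:
            end = sequence.index(target)          # first occurrence of the target
            start = max(0, end - max_steps + 1)   # keep at most max_steps events
            counts[" > ".join(sequence[start:end + 1])] += 1
    return counts


top_paths = paths_ending_in(EVENTS, "cancel").most_common()
for path, count in top_paths:
    print(f"{count}  {path}")
```

A result like this, a ranked list of paths with counts, is the sort of output that a flow or path visualization would then render, rather than leaving the analyst to reshape it for a separate tool.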

In addition, with this announcement we are introducing four buckets of pre-built SQL-MapReduce functions, i.e. Java functions that can be accessed through a familiar SQL or BI interface. These buckets are Data Acquisition (connecting to external sources and acquiring data); Data Preparation (manipulating structured and unstructured data to quickly prepare for analysis); Data Analytics (everything from path and pattern analysis to statistics and marketing analytics); and Data Visualization (introduced today). This is the most powerful collection of big data tools available in the industry today, and we're proud to provide them to our customers.

Teradata Aster Discovery Portfolio - figure 2

Our belief is that our industry is still scratching the surface in terms of providing powerful analytical tools to enterprises that help them find more valuable insights, more quickly and more easily. With today's launch, the Teradata Aster Discovery Platform reconfirms its lead as the most powerful and enterprise-friendly tool for big data analytics.

2 months & 10 questions on new Aster Big Analytics Appliance

Posted on: December 18th, 2012 by Teradata Aster

 

It’s been about two months since Teradata launched the Aster Big Analytics Appliance and since then we have had the opportunity to showcase the appliance to various customers, prospects, partners, analysts, journalists etc. We are pleased to report that since the launch the appliance has already received the “Ventana Big Data Technology of the Year” award and has been well received by industry experts and customers alike.

Over the past two months, starting with the launch tweetchat, we have received numerous inquiries about the appliance, and we think now is a good time to answer the 10 most frequently asked questions about the new Teradata Aster offering. Without further ado, here are the top 10 questions and their answers:

WHAT IS THE TERADATA ASTER BIG ANALYTICS APPLIANCE?

The Aster Big Analytics Appliance is a powerful, ready-to-run platform that is pre-configured and optimized specifically for big data storage and analysis. A purpose-built, integrated hardware and software solution for analytics at big data scale, the appliance runs Teradata Aster patented SQL-MapReduce® and SQL-H technology on a time-tested, fully supported Teradata hardware platform. Depending on workload needs, it can be exclusively configured with Aster nodes, Hortonworks Data Platform (HDP) Hadoop nodes, or a mixture of Aster and Hadoop nodes. Additionally, integrated backup nodes are available for data protection and high availability.

WHO WILL BENEFIT MOST BY DEPLOYING THE APPLIANCE?

The appliance is designed for organizations looking for a turnkey integrated hardware and software solution to store, manage and analyze structured and unstructured data (i.e., multi-structured data formats). The appliance meets the needs of both departmental and enterprise-wide buyers and can scale linearly to support massive data volumes.

WHY DO I NEED THIS APPLIANCE?

This appliance can help you gain valuable insights from all of your multi-structured data. Using these insights, you can optimize business processes to reduce cost and better serve your customers. More importantly, these insights can help you innovate by identifying new markets, new products, new business models etc. For example, by using the appliance a telecommunications company can analyze multi-structured customer interaction data across multiple channels such as web, call center and retail stores to identify the path customers take to churn. This insight can be used proactively to increase customer retention and improve customer satisfaction.

WHAT’S UNIQUE ABOUT THE APPLIANCE?

The appliance is an industry first in tightly integrating SQL-MapReduce®, SQL-H and Apache Hadoop. The appliance delivers a tightly integrated hardware and software solution to store, manage and analyze big data. The appliance delivers integrated interfaces for analytics and administration, so all types of multi-structured data can be quickly and easily analyzed through SQL based interfaces. This means that you can continue to use your favorite BI tools and all existing skill sets while deploying new data management and analytics technologies like Hadoop and MapReduce. Furthermore, the appliance delivers enterprise class reliability to allow technologies like Hadoop to now be used for mission critical applications with stringent SLA requirements.

WHY DID TERADATA BRING ASTER & HADOOP TOGETHER?

With the Aster Big Analytics Appliance, we are not just putting Aster and Hadoop in the same box. The Aster Big Analytics Appliance is the industry’s first unified big analytics appliance, providing a powerful, ready to run big analytics and discovery platform that is pre-configured and optimized specifically for big data analysis. It provides intrinsic integration between the Aster Database and Apache Hadoop, and we believe that customers will benefit the most by having these two systems in the same appliance.

Teradata’s vision stems from the Unified Data Architecture. The Aster Big Analytics Appliance offers customers the flexibility to configure the appliance to meet their needs. Hadoop is best for capturing, storing and refining multi-structured data in batch, whereas Aster is a big analytics and discovery platform that helps derive new insights from all types of data. Depending on the customer’s needs, the appliance can be configured with all Aster nodes, all Hadoop nodes or a mix of the two.

WHAT SKILLS DO I NEED TO DEPLOY THE APPLIANCE?

The Aster Big Analytics Appliance is an integrated hardware and software solution for big data analytics, storage, and management. It is designed as a plug-and-play solution that does not require special skill sets.

DOES THE APPLIANCE MAKE DATA SCIENTISTS OR DATA ANALYSTS IRRELEVANT?

Absolutely not. By integrating the hardware and software in an easy to use solution and providing easy to use interfaces for administration and analytics, the appliance allows data scientists to spend more time analyzing data.

In fact, with this simplified solution, your data scientists and analysts are freed from the constraints of data storage and management and can now spend their time on value added insights generation that ultimately leads to a greater fulfillment of your organization’s end goals.

HOW IS THE APPLIANCE PRICED?

Teradata doesn’t disclose product pricing as part of its standard business operating procedures. However, independent research conducted by industry analyst Dr. Richard Hackathorn, president and founder, Bolder Technology Inc., confirms that on a TCO and Time-to-Value basis the appliance presents a more attractive option vs. commonly available do-it-yourself solutions. http://teradata.com/News-Releases/2012/Teradata-Big-Analytics-Appliance-Enables-New-Business-Insights-on--All-Enterprise-Data/

WHAT OTHER ASTER DEPLOYMENT OPTIONS ARE AVAILABLE?

Besides deploying via the appliance, customers can also acquire and deploy Aster as a software-only solution on commodity hardware or in a public cloud.

WHERE CAN I GET MORE INFORMATION?

You can learn more about the Big Analytics Appliance via http://asterdata.com/big-analytics-appliance/ – home to release information, news about the appliance, product info (data sheet, solution brief, demo) and Aster Express tutorials.

 

Join the conversation on Twitter for additional Q&A with our experts:

Manan Goel @manangoel | Teradata Aster @asterdata

 

For additional information please contact Teradata at http://www.teradata.com/contact-us/