Unified Data Architecture

LA kicks off the 2014 Teradata User Group Season

Posted on: April 22nd, 2014 by Guest Blogger

 

By Rob Armstrong,  Director, Teradata Labs Customer Briefing Team

After presenting for years at the Teradata User Group meetings, it was refreshing to see some changes in this roadshow.  While I had my usual spot on the agenda to present Teradata’s latest database release (15.0), we had some hot new topics, including Cloud and Hadoop; more business-level folks were there; more companies were researching Teradata’s technology (vs. just current users); and there was a hands-on workshop the following day for the more technically inclined, walking through real-world Unified Data Architecture™ (UDA) use cases of a Teradata customer.  While LA tends to be a smaller venue than most, the room was packed and we had 40% more attendees than last year.

I would be remiss if I did not give a big thanks to the partner sponsors of the user group meeting.  In LA we had Hortonworks and Dot Hill as our gold and silver sponsors.  I took a few minutes to chat with them and found out about some interesting upcoming items.  Most notably, Lisa Sensmeier from Hortonworks talked to me about Hadoop Summit, which is coming up in San Jose, June 3-5.  Jim Jonez, from Dot Hill, gave me the latest on their newest “Ultra Large” disk technology, where they’ll have 48 one-terabyte drives in a single 2U enclosure.  It is not in the Teradata lineup yet, but we are certainly intrigued for the proper use case.

Now, I’d like to take a few minutes to toot my own horn about the Teradata Database 15.0 presentation that had some very exciting elements to help change the way users get to and analyze all of their data.  You may have seen the recent news releases, but if not, here is a quick recap:

  • Teradata 15.0 continues our Unified Data Architecture™ with the new Teradata QueryGrid.  This is the new environment for defining and accessing data from Teradata to other data servers such as Apache Hadoop (Hortonworks), the Teradata Aster Discovery Platform, Oracle, and others, and it lays the foundation for extension to even more foreign data servers.  15.0 simplifies definition and usage and adds bi-directional data movement and predicate pushdown.  In a related session, Cesar Rojas provided some good recent examples of customers taking advantage of the entire UDA ecosystem, where data from all of the Teradata offerings was used together to generate new actions.
  • The other big news in 15.0 is the inclusion of the JSON data type.  This allows customers to store JSON documents directly in a column and then apply “schema on read” for much greater flexibility with far fewer IT resources.  As the JSON document changes, no table or database changes are necessary to absorb the new content (see the sketch after this list).
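To make the “schema on read” idea concrete, here is a minimal Python sketch (not Teradata SQL; the field names and documents are invented) showing how structure is applied only at read time, so a new field appearing in newer documents requires no table change:

```python
import json

# Two JSON documents captured at different times; the second adds a field.
doc_v1 = '{"cust_id": 101, "channel": "web"}'
doc_v2 = '{"cust_id": 102, "channel": "mobile", "app_version": "3.2"}'

def read_channel(raw_doc):
    """Apply structure only when reading: a missing field simply comes back as None."""
    record = json.loads(raw_doc)
    return record.get("channel"), record.get("app_version")

for doc in (doc_v1, doc_v2):
    print(read_channel(doc))
# ('web', None)
# ('mobile', '3.2')
```

The same principle is what the JSON column type gives SQL users: older rows keep working unchanged while newer, richer documents become queryable immediately.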

Keep your eyes and ears open for the next Teradata User Group event coming your way, or better yet, just go to the webpage: http://www.teradata.com/user-groups/ to see where the bus stops next and to register.  The TUGs are free of charge.  Perhaps we’ll cross paths as I make the circuit? Until then, ‘Keep Calm and Analyze On’ (as the cool kids say).

 Since joining Teradata in 1987, Rob Armstrong has worked in all areas of the data warehousing arena.  He has gone from writing and supporting the database code to implementing and managing systems at some of Teradata’s largest and most innovative customers.  Currently Rob provides sales and marketing support by traveling the globe and evangelizing the Teradata solutions.

 

The best Strata session that I attended was the overview Kurt Brown gave of the Netflix data platform, which contained hype-deflating lessons and many chestnuts of tech advice straight from one of the most intense computing environments on the planet.

Brown, who as a director leads the design and implementation of the data platform, had a cheerful demeanor but demonstrated ruthless judgment and keen insight in his assessment of how various technologies serve the goals of Netflix. It was interesting to me how dedicated he was to both MPP SQL technology and to Apache™ Hadoop.

I attended the session with Daniel Graham, Technical Marketing Specialist of Teradata, who spoke with me afterward about the implications of the Netflix architecture and Brown’s point of view.

SQL vs. Hadoop
Brown rejected the notion that it was possible to build a complete data platform exclusively using either SQL technology or Hadoop alone. In his presentation, Brown explained how Netflix made great use of Hadoop, used Hive for various purposes, and had an eye on Presto, but also couldn’t live without Teradata and Microstrategy.

Brown recalled a conversation in which another leader of a data platform explained that he was discarding all his data warehouse technology and going to put everything on Hive. Brown’s response, “Why would you ever want to do that?”

While Brown said he enjoyed the pressure that open source puts on commercial vendors to improve, he was dedicated to using whatever technology could provide answers to questions in the most cost-effective manner. Brown said he was especially pleased that Teradata was going to be able to support a cloud-based implementation that could run at scale. Brown said that Netflix had upwards of 5 petabytes of data in the cloud, all stored on Amazon S3.

After the session, I pointed out to Graham that the pattern in evidence at Netflix and at most of the companies acknowledged as leaders in big data mimics the recommendation of the white paper “Optimize the Value of All Your Enterprise Data,” which provides an overview of the Teradata Unified Data Architecture™.

The Unified Data Architecture recommends that the data with the most “business value density” be stored in an enterprise data warehouse powered by MPP SQL; this is the data used most often by the most users. Hadoop is used as a data refinery to process flat files or NoSQL data in batch mode.
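As a rough illustration only (the thresholds, the value-per-gigabyte figure, and the breadth-of-users measure are my own, not from the white paper), here is a small Python sketch of how a team might use “business value density” to decide where a dataset belongs:

```python
# Hypothetical heuristic: route a dataset by its rough value per GB and how
# broadly it is used, echoing the UDA idea that high-density data lives in
# the EDW while raw, low-density data goes to the Hadoop refinery.

def route_dataset(value_per_gb: float, daily_users: int) -> str:
    if value_per_gb > 100 and daily_users > 50:
        return "enterprise data warehouse (MPP SQL)"
    if daily_users > 5:
        return "discovery platform"
    return "Hadoop data refinery (batch)"

print(route_dataset(value_per_gb=500, daily_users=800))  # enterprise data warehouse (MPP SQL)
print(route_dataset(value_per_gb=2, daily_users=1))      # Hadoop data refinery (batch)
```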

Netflix is a big data company that arrived at this pattern by adding SQL to a Hadoop infrastructure; many well-known users of huge MPP SQL installations have arrived at the same pattern by adding Hadoop.

“Data doesn’t stay unstructured for long. Once you have distilled it, it usually has a structure that is well-represented by flat files,” said Teradata's Graham. “This is the way that the canonical model of most enterprise activity is stored. Then the question is: How do you ask questions of that data? There are numerous ways to make this easy for users, but almost all of those ways pump out SQL that is then used to grab the data that is needed.”

Replacing MPP SQL with Hive or Presto is a non-starter: to really support hundreds or thousands of users pounding away at a lot of data, you need a way to provide speedy, optimized queries and to manage the consumption of the shared resources.

“For over 35 years, Teradata has been working on making SQL work at scale for hundreds or thousands of people at a time,” said Graham. “It makes perfect sense to add SQL capability to Hadoop, but it will be a long time, perhaps a decade or more, before you will get the kind of query optimization and performance that Teradata provides. The big data companies use Teradata and other MPP SQL systems because they are the best tool for the job for making huge datasets of high business value density available to an entire company.”

Efforts such as Tez and Impala will clearly move Hive’s capability forward. The question is how far forward and how fast. We will know that victory has been achieved when Netflix, which uses Teradata in a huge cloud implementation, is able to support its analytical workloads with other technology.

Graham predicts that in 5 years, Hadoop will be a good data mart but will still have trouble with complex parallel queries.

“It is common for a product like Microstrategy to pump out SQL statements that may be 10, 20, or even 50 pages long,” said Graham. “When you have 5 tables, the complexity of the queries could be 5 factorial. With 50 tables, that grows to 50 factorial. Handling such queries is a 10- or 20-year journey. Handling them at scale is a feat that many companies can never pull off.”
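The factorials Graham cites presumably refer to join-order permutations: with n tables there are n! possible join orders for an optimizer to consider. A quick check of the arithmetic:

```python
import math

# The join-order search space grows factorially with the number of tables.
print(math.factorial(5))             # 120 possible join orders for 5 tables
print(f"{math.factorial(50):.2e}")   # ~3.04e+64 possible orders for 50 tables
```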

Graham acknowledges that most businesses need an MPP SQL data warehouse extended to support data discovery (e.g., the Teradata Aster Discovery Platform), along with extensions for using Hadoop and graph analytics through enhanced SQL.

Teradata is working to demonstrate that the power of this collection of technology can address some of the unrealistic enthusiasm surrounding Hadoop.

By: Dan Woods, Forbes Blogger and Co-Founder of Evolved Media

 

In years past, Strata has celebrated the power of raw technology, so it was interesting to note how much the keynotes on Wednesday focused on applications, models, and how to learn and change rather than on speeds and feeds.

After attending the keynotes and some fascinating sessions, it seems clear that the blinders are off. Big data and data science have been proven in practice by many innovators and early adopters. The value of new forms of data and methods of analysis is so well established that there’s no need for exaggerated claims. Hadoop can do so many cool things that it doesn’t have to pretend to do everything, now or in the future. Indeed, the pattern in place at Facebook, Netflix, the Obama Campaign, and many other organizations with muscular data science and engineering departments is that MPP SQL and Hadoop sit side by side, each doing what they do best.

In his excellent session, Kurt Brown, Director, Data Platform at Netflix, recalled someone explaining that his company was discarding its data warehouse and putting everything on Hive. Brown responded, “Why would you want to do that?” What was obvious to Brown, and what he explained at length, is that the most important thing any company can do is assemble technologies and methods that serve its business needs. Brown demonstrated the logic of creating a broad portfolio that serves many different purposes.

Real Value for Real People
The keynotes almost all celebrated applications and models. Vendors didn’t talk about raw power, but about specific use cases and ease of use. Farrah Bostic, a marketing and product design consultant, recommended ways to challenge assumptions and create real customer intimacy. This was a key theme: use the data to understand a person in their terms, not yours. Bostic says you will be more successful if you focus on creating value for the real people who are your customers instead of extracting value from some stilted and limited model of a consumer. A skateboarding expert and a sports journalist each explained models and practices for improving performance. This is a long way from the days when a keynote would show a computer chewing through a trillion records.

Geoffrey Moore, the technology and business philosopher, was in true provocative form. He asserted that big data and data science are well on their way to crossing the chasm because so many upstarts pose existential threats to established businesses. This pressure will force big data to cross the chasm and achieve mass adoption. His money quote: "Without big data analytics, companies are blind and deaf, wandering out onto the Web like deer on the freeway.”

An excellent quote to be sure, but it goes too far. Moore would have been more accurate and less sensational if he said, “Without analytics,” not “Without big data analytics.” The reason that MPP SQL and Hadoop have made such a perfect pair is because more than one type of data and method of analysis is needed. Every business needs all the relevant data it can get to understand the people it does business with.

The Differentiator: A Culture of Analytics
The challenge I see companies facing lies in creating a culture of analytics. Tom Davenport has been a leader in promoting analytics as a means to competitive advantage. In his keynote at Strata Rx in September 2013, Davenport stressed the importance of integration.

In his session at Strata this year, Bill Franks, Chief Analytics Officer at Teradata, put it quite simply, "Big data must be an extension of an existing analytics strategy. It is an illusion that big data can make you an analytics company."

When people return from Strata and roll up their sleeves to get to work, I suspect that many will realize that it’s vital to make use of all the data in every way possible. But one person can only do so much. For data to have the biggest impact, people must want to use it. Implementing any type of analytics provides supply. Leadership and culture create demand. Companies like CapitalOne and Netflix don’t do anything without looking at the data.

I wish there were a shortcut to creating a culture of analytics, but there isn’t, and that’s why it’s such a differentiator. Davenport’s writings are probably the best guide, but every company must figure this out based on its unique situation.

Supporting a Culture of Analytics
If you are a CEO, your job is to create a culture of analytics so that you don’t end up like Geoffrey Moore’s deer on the freeway. But if you have Kurt Brown’s job, you must create a way to use all the data you have, to use the sweet spot of each technology to best effect, and to provide data and analytics to everyone who wants them.

At a company like Netflix or Facebook, creating such a data supply chain is a matter of solving many unique problems connected with scale and advanced analytics. But for most companies, common patterns can combine all the modern capabilities into a coherent whole.

I’ve been spending a lot of time with the thought leaders at Teradata lately and closely studying their Unified Data Architecture. Anyone who is seeking to create a comprehensive data and analytics supply chain of the sort in use at leading companies like Netflix should be able to find inspiration in the UDA, as described in a white paper called “Optimizing the Business Value of All Your Enterprise Data.”

The paper does excellent work in creating a framework for data processing and analytics that unifies all the capabilities by describing four use cases: the file system, batch processing, data discovery, and the enterprise data warehouse. Each of these use cases focuses on extracting value from different types of data and serving different types of users. The paper proposes a framework for understanding how each use case creates data with different business value density. The highest volume interaction takes place with data of the highest business value density. For most companies, this is the enterprise data warehouse, which contains a detailed model of all business operations that is used by hundreds or thousands of people. The data discovery platform is used to explore new questions and extend that model. Batch processing and processing of data in a file system extract valuable signals that can be used for discovery and in the model of the business.

While this structure doesn’t map exactly to that of Netflix or Facebook, for most businesses, it supports the most important food groups of data and analytics and shows how they work together.

The refreshing part of Strata this year is that thorny problems of culture and context are starting to take center stage. While Strata will always be chock full of speeds and feeds, it is even more interesting now that new questions are driving the agenda.

By: Dan Woods, Forbes Blogger and Co-Founder of Evolved Media

Open Your Mind to all the Data

Posted on: February 14th, 2014 by Guest Blogger

 

In technology discussions, we often forget who the big winners are. When you look at almost any technology market, if a vendor makes a certain amount, the systems integrator may make 4 or 5 times that amount selling services to configure and adapt the solution. Of course, the biggest winner is the company using the technology, with a return of 10X, 20X, or even 100X. This year, I’m attending Strata and trying to understand how to hunt down and capture that 100X return.

I’m not alone in this quest. In the past Strata was mostly a technology conference, focused on vendor or systems integrator return. But this year, it is clear that the organizers realized they must find a way to focus on the 100X return, as evidenced by the Data-driven Business track. If Strata becomes an event where people can identify and celebrate the pathway to the 100X return, attendance will skyrocket.

Most of the time the 100X return comes from using data that only applies to a specific business. If you realize those kinds of returns, are you really going to come to Strata to talk about it? I don’t think so. So we won’t find sessions that provide a complete answer for your business; you will only get parts. The challenge when attending a conference like Strata is how to find all the parts, put them together, and come home with ideas for getting that 100X return.

My answer is to focus on questions, then on data, and then on technology that can find answers to the questions. There are many ways to run this playbook, but here’s one way to make it work.

Questions Achieve Focus
The problem that anyone attending Strata or any large conference faces is the huge array of vendors and sessions. One way to guide exploration is through curiosity. This maintains enthusiasm during the conference but may not give you a 100X idea. Another way is to begin with a predetermined problem in mind. This is a good approach, but it may cut off avenues that lead to a 100X result.

Remember: 100X results are usually found through experimentation. Everything with an obvious huge return is usually already being done. You have to find a way to be open to new ideas that are focused on business value. Here’s one way I’ve thought of, called the Question Game.

The Question Game is a method for aligning technology with the needs of a business. It is also a great way to organize a trip to a conference like Strata, where the ratio of information to marketing spin is high.

I came up with the Question Game while reading Walter Isaacson’s biography of Steve Jobs. Two things struck me. First was the way that Jobs and Tim Cook cared a lot about focus. Jobs held an annual offsite attended by about 100 of Apple’s top people. At the end of the session, Jobs would write up the 10 most important initiatives and then cross out 7 so the team could focus on executing just 3. Both Jobs and Cook were proud of saying no to ideas that detracted from their focus.

Here’s how Tim Cook describes it: “We are the most focused company that I know of, or have read of, or have any knowledge of. We say no to good ideas every day. We say no to great ideas in order to keep the amount of things we focus on very small in number, so that we can put enormous energy behind the ones we do choose, so that we can deliver the best products in the world.”

Because I focus most of my time on figuring out how technology leaders can make magic in business, I was eager to find a way to empower people to say no. At the same time, I wanted room for invention and innovation. Is it possible to explore all the technology and data out there in an efficient way focused on business needs? Yes, using the Question Game. Here’s how it works:

  • The CIO, CTO, or whoever is leading innovation surveys each business leader, asking for questions they would like to have answered and the business value of answering them
  • Questions are then ranked by value, whether monetary or otherwise (a minimal ranking sketch follows this list)
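Here is a minimal sketch of that ranking step in Python; the questions and dollar figures are made up, and in practice the “value” can be any agreed-upon score rather than money:

```python
# Question Game, step two: sort the surveyed questions by the estimated value
# of answering them, so the highest-value questions drive which Strata
# sessions and vendors you pursue. All figures are illustrative.
questions = [
    {"question": "Which customers are likely to churn next quarter?", "est_value_usd": 4_000_000},
    {"question": "Which promotions cannibalize full-price sales?",    "est_value_usd": 1_500_000},
    {"question": "Can we cut logistics cost with better routing?",    "est_value_usd": 6_000_000},
]

ranked = sorted(questions, key=lambda q: q["est_value_usd"], reverse=True)
for rank, q in enumerate(ranked, start=1):
    print(rank, q["question"], f"${q['est_value_usd']:,}")
```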

The Question Game provides a clear way for the business to express its desires. With these questions in mind, Strata becomes a hunting ground far more likely to yield a path to 100X ideas. To prepare for this hunt, list all the questions and look at the agenda to find relevant sessions.

It’s the Data, All the Data
With your questions in hand, keep reminding yourself that all the technology in the world won’t answer the questions on your list. Usually, your most valuable questions will be answered by acquiring or analyzing data. The most important thing you can do is look for sources of data to answer the high value questions.

The first place to look for data is inside your company. In most companies, it is not easy to find all the available data and even harder to find additional data that could be available with a modest effort.

Too often the search for data stops inside the company. As I pointed out in “Do You Suffer from the Data Not Invented Here Syndrome?” many companies have blinders to external data sources, which can be an excellent way to shed light on high value questions. Someone should be looking for external data as well.

Remember, it is vital to keep an open mind. The important thing is to find data that can answer high value questions, not to find any particular type of data. Strata is an excellent place to do this. Some sessions shed light on particular types of data, but I’ve found that working the room, asking people what kinds of data they use, and showing them your questions can lead to great ideas.

Finding Relevant Technology
Once you have an idea about the important questions and the relevant data, you will be empowered to focus. When you tour the vendors, you can say no with confidence when a technology doesn’t have a hope of processing the relevant data to answer a high value question. Of course, it is impossible to tell from a session description or a vendor’s website if they will be relevant to a specific question. But by having the questions in mind and showing them to people you talk to, you will greatly speed the path to finding what you need. When others see your questions, they will have suggestions you didn’t expect.

Will this method guarantee you a 100X solution every time you attend Strata? I wouldn’t make that claim. But by following this plan, you have a much better chance for a victory.

While I'm at Strata, I'll be playing the Question Game so I can quickly and effectively learn a huge amount about the fastest path to a data-driven business.

By: Dan Woods, Forbes Blogger and Co-Founder of Evolved Media

To Learn More:

Optimize the Business Value of All Your Enterprise Data (white paper)
Bring Dark Data into the Light (infographic)
Benefit from Any and All Data (article)
To Succeed with Big Data, Begin with the Decision in Mind (article)

Anna Littick and the Unified Data Architecture — Part 2

Posted on: October 16th, 2013 by Dan Graham

 

Ring ring ringtone.
Dan: “Hello. This is Dan at Teradata. How can I help you today?”

Anna: “Hi Dan. It’s Anna Littick from Sunshine-Stores calling again. Can we finish our conversation?”

Dan: “Oh yeah, hi Anna. Sure. Where did we leave off?”

Anna: “Well, you remember our new CFO – Xavier Money – wants us to move everything to Hadoop because he thinks it’s all free. You and I were ticking through his perceptions.”

Dan: “Yes. I think we got through the first two but not numbers 3 and 4. Here’s what I remember:
1. Hadoop replaces the data warehouse
2. Hadoop is a landing zone and archive
3. Hadoop is a database
4. Hadoop does deep analytics.”

Anna: “Yep. So how do I respond to Xavier about those two?”

Dan: “Well, I guess we should start with ‘what is a database?’ I’ll try to keep this simple. A database has these characteristics:
• High performance data access
• Robust high availability
• A data model that isolates the schema from the application
• ACID properties

There’s a lot more to a database but these are the minimums. High speed is the name of the game for databases. Data has to be restructured and indexed, and queries need a cost-based optimizer, to be fast. Hive and Impala do a little restructuring of data but are a long way from sophisticated indexes, partitioning, and optimizers – those things take many years each. For example, Teradata Database has multiple kinds of indexes like join indexes, aggregate indexes, hash indexes, and sparse indexes.”

Anna: “Ouch. What about the other stuff? Does Hive or Impala have that?”

Dan: “Well, high performance isn’t interesting if the data is not available. Between planned and unplanned downtime, a database has to hit 99.99% uptime or better to be mission critical. That’s roughly 53 minutes of downtime a year. Hundreds of hardware, software, and installation features have to mature to get there. I’m guessing a well-built Hadoop cluster is around 99% uptime. But just running out of memory in an application can cause the cluster to crash. There’s a lot of work to be done in Hadoop.”

“Second, isolating the application programs from the schema runs counter to Hadoop’s strategic direction of schema-on-read. They don’t want fixed data types and data rules enforcement. On the upside this means Hadoop has a lot of flexibility – especially with complex data that changes a lot. On the downside, we have to trust every programmer to validate and transform every data field correctly at runtime. It’s dangerous and exciting at the same time. Schema-on-read works great with some kinds of data, but the majority of data works better with a fixed schema.”

Anna: “I’ll have to think about that one. I like the ‘no rules’ flexibility but I don’t like having to scrub the incoming data every time. I already spend too much time preparing data for predictive analytics.”

Dan: “Last is the ACID properties. It’s a complex topic you should look up on Wikipedia, but it boils down to trusting the data as it’s updated. If a change is made to an account balance, ACID ensures all the changes are applied or none, that no one else can change it at the same time you do, and that the changes are 100% recoverable across any kind of failure. Imagine you and your spouse at an ATM withdrawing $500 when there’s only $600 in the account. The database can’t give both of you $500 – that’s ACID at work. Neither Hadoop, Hive, Impala, nor any other project has plans to build the huge ACID infrastructure and become a true database, and Hadoop isn’t so good at updating data in place.”

“According to Curt Monash, ‘Developing a good DBMS requires 5-7 years and tens of millions of dollars. That’s if things go extremely well.’ [1]”
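To see the isolation part of ACID in Dan’s ATM example, here is a toy Python sketch. A thread lock stands in for what a database transaction manager enforces; this is an illustration of the concept, not how any particular database implements it, and the balances are made up.

```python
import threading

balance = 600                       # joint account balance
lock = threading.Lock()

def withdraw(amount, results):
    """The check and the debit happen atomically, as they would inside a transaction."""
    global balance
    with lock:                      # isolation: no interleaving between check and debit
        if balance >= amount:
            balance -= amount
            results.append("approved")
        else:
            results.append("declined")

results = []
threads = [threading.Thread(target=withdraw, args=(500, results)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(results, "remaining balance:", balance)   # one approved, one declined; balance 100
```

Without the lock, both withdrawals could pass the balance check before either debit lands, which is exactly the double-spend the ACID guarantees prevent.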

Anna: “OK, Hadoop and Hive and Impala aren’t a database. So what? Who cares what you call it?”

Dan: “Well, a lot of end users, BI tools, ETL tools, and skills are expecting Hadoop to behave like a database. That’s not fair. It was not built for that purpose. Not being a database means Hadoop lacks a lot of functionality, but it also forces Hadoop to innovate and differentiate its strengths. Let’s not forget Hadoop’s progress in basic search indexing, archival of cold data, simple reporting at scale, and image processing. We’re at the beginning of a lot of innovation and it’s exciting.”

Anna: “OK. I’ll trust you on that. What about deep analytics? That’s what I care about most.”

Dan: “So Anna, off the record, you being a data scientist and all that. Do people tease you about your name? I mean Anna Littick the data scientist? I Googled you and you’re not the only one. ”

Anna: “Yes. Some guys around here think it’s funny. Apparently childishness isn’t limited to children. So during meetings I throw words at them like Markov Chains, Neural Networks, and edges in graph partitions. They pretend to understand – they nod a lot. Those guys never teased me again. [laugh]”

Dan: “Hey, those advanced analytics you mentioned are powerful stuff. You should hear David Simmen talk at our PARTNERS conference on Sunday. He’s teaching about our new graph engine that handles millions of vertices and billions of edges. It sounds like you would enjoy it.”

Anna: “Well, it looks like I have approval to go, especially since PARTNERS is here in Dallas. Enough about me. What about deep analytics in Hadoop?”

Dan: “Right. OK, well first I have to tell you we do a lot of predictive and prescriptive analytics in-database with Teradata. I suspect you’ve been using SAS algorithms in-database already. The parallelism makes a huge difference in accuracy. What you probably haven’t seen is our Aster Database, where you can run map-reduce algorithms under the control of SQL for fast, iterative discovery. It can run dozens of complex analytic algorithms, including map-reduce algorithms, in parallel, and we just added the graph engine I mentioned in our 6.0 release. One thing it does that Hadoop doesn’t: you can use your BI tools, SAS procs, and map-reduce all in one SQL statement. It’s ultra cool.”

Anna: “OK. I think I’ll go to David’s session. But what about Hadoop? Can it do deep analytics?”

Dan: “Yes. Both Aster and Hadoop can run complex predictive and prescriptive analytics in parallel. They can both do statistics, random forests, Markov Chains, and all the basics like naïve Bayes and regressions. If an algorithm is hard to do in SQL, these platforms can handle it.”

Anna [impatient]: “OK. I’ll take the bait. What’s the difference between Aster and Hadoop?”

Dan: “Well, Aster has a database underneath its SQL-MapReduce so you can use the BI tools interactively. There is also a lot of emphasis on behavioral analysis so the product has things like Teradata Aster nPath time-series analysis to visualize patterns of behavior and detect many kinds of consumer churn events or fraud. Aster has more than 80 algorithms packaged with it as well as SAS support. Sorry, I had to slip that Aster commercial in. It’s in my contract – sort of. Maybe. If I had a contract.”

Anna: “And what about Hadoop?”

Dan: “Hadoop is more of a do-it-yourself platform. There are tools like Apache Mahout for data mining, but it doesn’t have as many algorithms as Aster, so you often find yourself getting algorithms from university research or GitHub and implementing them yourself. Some Teradata customers have implemented Markov Chains on Hadoop because it’s much easier to work with than SQL for that kind of algorithm. So data scientists have more tools than ever with Teradata in-database algorithms, Aster SQL-MapReduce, SAS, Hadoop/Mahout, and others. That’s what our Unified Data Architecture does for you – it matches workloads to the best platform for that task.”
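As a small aside on why this kind of algorithm is more natural as code than as SQL, here is a hedged Python sketch that estimates first-order Markov transition probabilities from made-up click-stream sessions. The pair-emitting and counting steps are roughly the shape of the map and reduce phases a Hadoop job would run; the event names are invented.

```python
from collections import Counter
from itertools import tee

# One event sequence per customer (made-up data).
sessions = [
    ["home", "search", "product", "checkout"],
    ["home", "product", "support", "churn"],
    ["home", "search", "product", "churn"],
]

def pairs(seq):
    """Yield consecutive (state, next_state) pairs from a sequence."""
    a, b = tee(seq)
    next(b, None)
    return zip(a, b)

# "Map" phase: emit (state, next_state) pairs; "reduce" phase: count them.
transition_counts = Counter(p for s in sessions for p in pairs(s))

# Normalize into transition probabilities per source state.
state_totals = Counter()
for (src, _), n in transition_counts.items():
    state_totals[src] += n
probs = {(src, dst): n / state_totals[src] for (src, dst), n in transition_counts.items()}

print(probs[("product", "churn")])   # 1/3 of moves out of "product" went straight to churn
```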

Anna: “OK. I think I’ve got enough information to help our new CFO. He may not like me bursting his ‘free-free-free’ monastic chant. But just because we can eliminate some initial software costs doesn't mean we will save any money. I’ve got to get him thinking of the big picture for big data. You called it UDA, right?”

Dan: “Right. Anna, I’m glad I could help, if only just a little. And I’ll send you a list of sessions at Teradata PARTNERS where you can hear from experts about their Hadoop implementations – and Aster. See you at PARTNERS.”

Title | Company | Day | Time | Comment
Aster Analytics: Delivering results with R Desktop | Teradata | Sun | 9:30 | RevolutionR
Do’s and Don’ts of using Hadoop in practice | Otto | Sun | 1:00 | Hadoop
Graph Analysis with Teradata Aster Discovery Platform | Teradata | Sun | 2:30 | Graph
Hadoop and the Data Warehouse: When to use Which | Teradata | Sun | 4:00 | Hadoop
The Voices of Experience: A Big Data Panel of Experts | Otto, Wells Fargo | Wed | 9:30 | Hadoop
An Integrated Approach to Big Data Analytics using Teradata and Hadoop | PayPal | Wed | 11:00 | Hadoop
TCOD: A Framework for the Total Cost of Big Data | WinterCorp | Wed | 11:00 | Costs

[1] Curt Monash, DBMS development and other subjects, March 18, 2013

 

About one year ago, Teradata Aster launched a powerful new way of integrating a database with Hadoop. With Aster SQL-H™, users of the Teradata Aster Discovery Platform got the ability to issue SQL and SQL-MapReduce® queries directly on Hadoop data as if that data had been in Aster all along. This level of simplicity and performance was unprecedented, and it enabled BI & SQL analysts who knew nothing about Hadoop to access Hadoop data and discover new information through Teradata Aster.

This innovation was not a one-off. Teradata has put forward the most complete vision for a data and analytics architecture in the 21st century. We call that the Unified Data Architecture™. The UDA combines Teradata, Teradata Aster & Hadoop into a best-of-breed, tightly integrated ecosystem of workload-specific platforms that provide customers the most powerful and cost-effective environment for their analytical needs. With Aster SQL-H™, Teradata provided a level of software integration between Aster & Hadoop that was, and still is, unchallenged in the industry.

 

[Figure: Teradata Unified Data Architecture™]

Today, Teradata makes another leap in making its Unified Data Architecture™ vision a reality. We are announcing SQL-H™ for Teradata, bringing the best SQL engine for data warehousing and analytics to Hadoop. From now on, Enterprises that use Hadoop to store large amounts of data will be able to utilize Teradata's analytics and data warehousing capabilities to directly query Hadoop data securely through ANSI standard SQL and BI tools by leveraging the open source Hortonworks HCatalog project. This is fundamentally the best and tightest integration between a data warehouse engine and Hadoop that exists in the market today. Let me explain why.
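The practical point for an analyst is that a query against Hadoop-resident data looks like any other ANSI SQL query. Here is a minimal Python sketch of that experience; the DSN, credentials, and table name are hypothetical, and any ODBC-capable BI tool would do essentially the same thing under the covers.

```python
import pyodbc  # any ODBC-speaking SQL or BI client follows the same pattern

# Hypothetical DSN, credentials, and table name; the analyst writes ordinary
# ANSI SQL and never touches Hadoop APIs directly.
conn = pyodbc.connect("DSN=tdprod;UID=analyst;PWD=secret")
cur = conn.cursor()

cur.execute("""
    SELECT channel, COUNT(*) AS event_count
    FROM weblogs_on_hadoop          -- a table whose underlying data lives in HDFS
    GROUP BY channel
    ORDER BY event_count DESC
""")
for row in cur.fetchall():
    print(row.channel, row.event_count)

conn.close()
```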

It is interesting to consider Teradata's approach versus alternatives. If one wants to execute SQL on Hadoop, with the intent of building Data Warehouses out of Hadoop data, there are not many realistic options. Most databases have very poor integration with Hadoop and require Hadoop experts to manage the overall system – not a viable option for most Enterprises due to cost. SQL-H™ removes this requirement for Teradata/Hadoop deployments. Another "option" is the SQL-on-Hadoop tools that have started to emerge; unfortunately, they are about a decade away from becoming sufficiently mature to handle true Data Warehousing workloads. Finally, the approach of taking a database and shoving it inside Hadoop has significant issues, since it suffers from the worst of both worlds – Hadoop activity has to be limited so that it doesn't disrupt the database, data is duplicated between HDFS and the database store, and the performance of the database is lower than that of a stand-alone version.

In contrast, a Teradata/Hadoop deployment with SQL-H™ offers the best of both worlds: unprecedented performance and reliability in the Teradata layer, seamless BI & SQL access to Hadoop data via SQL-H™, and Hadoop freed up to perform data processing tasks at full efficiency.

Teradata is committed to being the strategic advisor of the Enterprise when it comes to Data Warehousing and Big Data. Through its Unified Data Architecture™ and today's announcement on Teradata SQL-H™, it provides even more performance, flexibility and cost-effective options to Enterprises eager to use data as a competitive advantage.

Teradata and Informatica Data Integration Optimization

Posted on: March 6th, 2013 by Sam Tawfik

 

Teradata and Informatica created a joint offering to optimize data integration processes by leveraging the right platform for the right transformation job. The offering is based on the powerful Teradata analytical infrastructure called the Teradata Unified Data Architecture (UDA) – which includes Teradata, Teradata Aster, and Hadoop – combined with Informatica PowerCenter Big Data Edition.

Today, customers are creating new processes to manage and integrate Big Data for deeper and richer analytics. Customers are also evaluating options to optimize their data integration processes for faster deployment and scalable performance at reduced costs. And because we are talking about Big Data, Hadoop is increasingly being considered for data integration processing.

But as is always the case when deploying or evaluating new technologies, it is important to understand the current business and technical challenges in order to ensure the appropriate solution is deployed. Business challenges include the increasing cost of data processing as data volumes grow and new types of data emerge, and the risk of deploying the wrong platform. Technical challenges related to big data analytics include finding the right skills, the complexity of deploying new technologies into production data centers, the need for more real-time data, and the lack of enterprise capabilities such as security.

The new Teradata and Informatica offering helps customers optimize their data integration processes and deploy a single Unified Data Architecture. It allows companies to design data integration processes once and deploy them on any Teradata platform, with up to 5x productivity gains using the resource skills they have today. In this architecture, customers can process data from traditional transaction systems as well as unstructured text and interaction data, such as from social media and web logs. Customers may choose to run data integration processes on Teradata platforms, Hadoop, or Informatica servers.

[Figure: Teradata Unified Architecture with Data Integration Optimization]

The key benefits of deploying the Teradata Unified Data Architecture with Informatica PowerCenter Big Data Edition are productivity gains of up to 5x over hand-coding at up to half the labor cost, reduced risk on Big Data projects, and protection against poor, sub-optimal decisions. The offering allows staff to focus on meeting business requirements rather than hand-coding transformation logic, so customers can innovate faster and generate new revenue streams.

Teradata and Informatica provide assessment service offerings to help customers get started quickly and make smart decisions about data integration optimization, reducing the risk of offloading the wrong work onto the wrong platform while lowering labor costs and delivering faster performance.

See the joint Informatica and Teradata press release.

Sam Tawfik

 

2 months & 10 questions on new Aster Big Analytics Appliance

Posted on: December 18th, 2012 by Teradata Aster

 

It’s been about two months since Teradata launched the Aster Big Analytics Appliance, and since then we have had the opportunity to showcase the appliance to various customers, prospects, partners, analysts, and journalists. We are pleased to report that since the launch the appliance has already received the “Ventana Big Data Technology of the Year” award and has been well received by industry experts and customers alike.

Over the past two months, starting with the launch tweetchat, we have received numerous inquiries about the appliance, so we think now is a good time to answer the top 10 most frequently asked questions about the new Teradata Aster offering. Without further ado, here are the top 10 questions and their answers:

WHAT IS THE TERADATA ASTER BIG ANALYTICS APPLIANCE?

The Aster Big Analytics Appliance is a powerful, ready-to-run platform that is pre-configured and optimized specifically for big data storage and analysis. A purpose-built, integrated hardware and software solution for analytics at big data scale, the appliance runs Teradata Aster’s patented SQL-MapReduce® and SQL-H technology on a time-tested, fully supported Teradata hardware platform. Depending on workload needs, it can be configured exclusively with Aster nodes, exclusively with Hortonworks Data Platform (HDP) Hadoop nodes, or with a mixture of Aster and Hadoop nodes. Additionally, integrated backup nodes are available for data protection and high availability.

WHO WILL BENEFIT MOST BY DEPLOYING THE APPLIANCE?

The appliance is designed for organizations looking for a turnkey integrated hardware and software solution to store, manage, and analyze structured and unstructured data (i.e., multi-structured data formats). The appliance meets the needs of both departmental and enterprise-wide buyers and can scale linearly to support massive data volumes.

WHY DO I NEED THIS APPLIANCE?

This appliance can help you gain valuable insights from all of your multi-structured data. Using these insights, you can optimize business processes to reduce cost and better serve your customers. More importantly, these insights can help you innovate by identifying new markets, new products, new business models, and so on. For example, by using the appliance, a telecommunications company can analyze multi-structured customer interaction data across multiple channels such as web, call center, and retail stores to identify the path customers take to churn. This insight can be used proactively to increase customer retention and improve customer satisfaction.
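As a toy illustration of that path-to-churn idea (the interaction data below is invented and the real analysis would run at far larger scale), counting the most common cross-channel paths that end in churn can be as simple as:

```python
from collections import Counter

# Made-up cross-channel interaction sequences, one tuple per churned customer.
churned_paths = [
    ("web", "call_center", "retail", "churn"),
    ("web", "call_center", "churn"),
    ("web", "call_center", "retail", "churn"),
]

# The most common path ending in churn is a candidate trigger for a
# proactive retention offer the next time a customer starts down it.
print(Counter(churned_paths).most_common(1))
# [(('web', 'call_center', 'retail', 'churn'), 2)]
```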

WHAT’S UNIQUE ABOUT THE APPLIANCE?

The appliance is an industry first in tightly integrating SQL-MapReduce®, SQL-H and Apache Hadoop, delivering a tightly integrated hardware and software solution to store, manage, and analyze big data. It provides integrated interfaces for analytics and administration, so all types of multi-structured data can be quickly and easily analyzed through SQL-based interfaces. This means that you can continue to use your favorite BI tools and all existing skill sets while deploying new data management and analytics technologies like Hadoop and MapReduce. Furthermore, the appliance delivers enterprise-class reliability, allowing technologies like Hadoop to be used for mission-critical applications with stringent SLA requirements.

WHY DID TERADATA BRING ASTER & HADOOP TOGETHER?

With the Aster Big Analytics Appliance, we are not just putting Aster and Hadoop in the same box. It is the industry’s first unified big analytics appliance, providing a powerful, ready-to-run big analytics and discovery platform that is pre-configured and optimized specifically for big data analysis. It provides intrinsic integration between the Aster Database and Apache Hadoop, and we believe that customers will benefit the most by having these two systems in the same appliance.

Teradata’s vision stems from the Unified Data Architecture, and the Aster Big Analytics Appliance offers customers the flexibility to configure the appliance to meet their needs. Hadoop is best for capturing, storing, and refining multi-structured data in batch, whereas Aster is a big analytics and discovery platform that helps derive new insights from all types of data. Depending on the customer’s needs, the appliance can be configured with all Aster nodes, all Hadoop nodes, or a mix of the two.

WHAT SKILLS DO I NEED TO DEPLOY THE APPLIANCE?

The Aster Big Analytics Appliance is an integrated hardware and software solution for big data analytics, storage, and management, designed as a plug-and-play solution that does not require special skill sets.

DOES THE APPLIANCE MAKE DATA SCIENTISTS OR DATA ANALYSTS IRRELEVANT?

Absolutely not. By integrating the hardware and software in an easy to use solution and providing easy to use interfaces for administration and analytics, the appliance allows data scientists to spend more time analyzing data.

In fact, with this simplified solution, your data scientists and analysts are freed from the constraints of data storage and management and can now spend their time on value-added insight generation that ultimately leads to greater fulfillment of your organization’s end goals.

HOW IS THE APPLIANCE PRICED?

Teradata doesn’t disclose product pricing as part of its standard business operating procedures. However, independent research conducted by industry analyst Dr. Richard Hackathorn, president and founder, Bolder Technology Inc., confirms that on a TCO and Time-to-Value basis the appliance presents a more attractive option vs. commonly available do-it-yourself solutions. http://teradata.com/News-Releases/2012/Teradata-Big-Analytics-Appliance-Enables-New-Business-Insights-on--All-Enterprise-Data/

WHAT OTHER ASTER DEPLOYMENT OPTIONS ARE AVAILABLE?

Besides deploying via the appliance, customers can also acquire and deploy Aster as a software-only solution on commodity hardware or in a public cloud.

WHERE CAN I GET MORE INFORMATION?

You can learn more about the Big Analytics Appliance via http://asterdata.com/big-analytics-appliance/  – home to release information, news about the appliance, product info (data sheet, solution brief, demo) and Aster Express tutorials.

 

Join the conversation on Twitter for additional Q&A with our experts:

Manan Goel @manangoel | Teradata Aster @asterdata

 

For additional information please contact Teradata at http://www.teradata.com/contact-us/