UDA

Take a Giant Step with Teradata QueryGrid

Posted on: April 29th, 2014 by Dan Graham

 

Teradata 15.0 has gotten tremendous interest from customers and the press because it enables SQL access to native JSON data. This heralds the end of the belief that data warehouses can’t handle unstructured data. But there’s an equally momentous innovation in this release called Teradata QueryGrid.

What is Teradata QueryGrid?
In Teradata’s Unified Data Architecture (UDA), there are three primary platforms: the data warehouse, the discovery platform, and the data platform. The huge gray arrows represent data flowing between these systems. A year or two ago, these arrows were extract files moved in batch mode.

Teradata QueryGrid is both a vision and a technology. The vision --simply said-- is that a business person connected to the Teradata Database or Aster Database can submit a single SQL query that joins data together from a second system for analysis. There’s no need to plead with the programmers to extract data and load it into another machine. The business person doesn’t have to care where the data is – they can simply combine relational tables in Teradata with tables or flat files found in Hadoop on demand. Imagine a data scientist working on an Aster discovery problem and needing data from Hadoop. By simply adjusting the queries she is already using, Hadoop data is fetched and combined with tables in the Aster Database. That should be a huge “WOW” all by itself but let’s look further.

You might be saying “That’s not new. We’ve had data virtualization queries for many years.” Teradata QueryGrid is indeed a form of data virtualization. But Teradata QueryGrid doesn’t suffer from the normal limitations of data virtualization such as slow performance, clogged networks, and security concerns.

Today, the vision is translated into reality as connections between Teradata Database and Hadoop as well as Aster Databases and Hadoop. Teradata QueryGrid also connects the Teradata Data Warehouse to Oracle databases. In the near future, it will extend to all combinations of UDA servers such as Teradata to Aster, Aster to Aster, Teradata to Teradata, and so on.

Seven League Boots for SQL
With QueryGrid, you can add a clause in a SQL statement that says “Call up Hadoop, pass Hive a SQL request, receive the Hive results, and join them to the data warehouse tables.” Running a single SQL statement spanning Hadoop and Teradata is amazing in itself – a giant step forward. Notice too that all the database security, advanced SQL functions, and system management in the Teradata or Aster system are supporting these queries. The only effort required is for the database administrator to set up a “view” that connects the systems. It’s self-service for the business user after that. Score: complexity zero, business users one.

Parallel Performance, Performance, Performance
Historically, data virtualization tools lack the ability to move data between systems in parallel. Such tools send a request to a remote database and the data comes back serially through an Ethernet wire. Teradata QueryGrid is built to connect to remote systems in parallel and exchange data through many network connections simultaneously. Wanna move a terabyte per minute? With the right configurations it can be done. Parallel processing by both systems makes this incredibly fast. I know of no data virtualization system that does this today.
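To get a feel for why a terabyte per minute needs many connections at once, here is a rough back-of-the-envelope sketch. The 10 Gbit/s link speed and the assumption that every link runs at full line rate are illustrative choices, not figures from Teradata.

```python
import math

def links_needed(bytes_per_minute, link_gbit_per_s):
    """Return (required Gbit/s, minimum number of links at full line rate)."""
    gbit_per_s = bytes_per_minute * 8 / 60 / 1e9
    return gbit_per_s, math.ceil(gbit_per_s / link_gbit_per_s)

# Assumption: 1 TB = 10**12 bytes, and each link is a 10 Gbit/s connection.
rate, links = links_needed(1e12, 10)
print(f"{rate:.1f} Gbit/s sustained, at least {links} parallel links")
```

A single serial connection cannot get close to that rate, which is why the exchange has to be spread across many connections.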

Inevitably, the Hadoop cluster will have a different number of servers compared to the Teradata or Aster MPP systems. The Teradata and Aster systems start the parallel data exchange by matching up units of parallelism between the two systems. That is, all the Teradata parallel workers (called AMPs) connect to a buddy Hadoop worker node for maximum throughput. Anytime the configuration changes, the worker match-up changes too. This is non-trivial, rocket-science-class technology. Trust me – you don’t want to do this yourself, and the worst situation would be to expose this to the business users. But Teradata QueryGrid does it all for you, completely invisible to the user.
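The match-up idea can be illustrated with a toy sketch. This is not how QueryGrid actually assigns workers (the source does not describe the algorithm); it simply assumes a round-robin pairing, and the names `match_workers` and `load_per_node` are hypothetical.

```python
from collections import Counter

def match_workers(amps, hadoop_nodes):
    """Pair every AMP with a Hadoop node, spreading AMPs as evenly as possible.

    Assumption: plain round-robin assignment, chosen only for illustration.
    """
    if not hadoop_nodes:
        raise ValueError("at least one Hadoop node is required")
    return {amp: hadoop_nodes[i % len(hadoop_nodes)] for i, amp in enumerate(amps)}

def load_per_node(pairing):
    """Count how many AMPs each Hadoop node is serving."""
    return dict(Counter(pairing.values()))

amps = [f"amp-{i}" for i in range(8)]
nodes = [f"hadoop-{i}" for i in range(3)]

pairing = match_workers(amps, nodes)
print(load_per_node(pairing))

# When the cluster grows, the match-up is simply recomputed.
regrown = match_workers(amps, nodes + ["hadoop-3"])
print(load_per_node(regrown))
```

Even in this simplified form, the pairing has to be redone whenever either side changes size, which is the bookkeeping the business user never sees.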

Put Data in the Data Lake FAST
Imagine complex predictive analytics using R® or SAS® are run inside the Teradata data warehouse as part of a merger and acquisition project. In this case, we want to pass the results to the Hadoop Data Lake, where they are combined with temporary data from the company being acquired. With moderately simple SQL stuffed in a database view, the answers calculated by the Teradata Database can be sent to Hadoop to help finish up some reports. Bi-directional data exchange is another breakthrough in Teradata QueryGrid, new in release 15.0. The common thread in all these innovations is that the data moves from the memory of one system to the memory of the other. No extracts, no landing the data on disk until the final processing step – and sometimes not even then.

Push-down Processing
What we don’t want to do is transfer terabytes of data from Hadoop and throw away 90% of it since it’s not relevant. To minimize data movement, Teradata QueryGrid sends the remote system SQL filters that eliminate records and columns that aren’t needed. An example constraint could be “We only want records for single women age 30-40 with an average account balance over $5000. Oh, and only send us the account number, account type, and address.” This way, the Hadoop system discards unnecessary data so it doesn’t flood the network with data that will be thrown away. After all the processing is done in Hadoop, data is joined in the data warehouse, summarized, and delivered to the user’s favorite business intelligence tool.
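The push-down idea can be sketched in a few lines. This is only an illustration of filtering and trimming columns on the remote side before anything crosses the network; the function `remote_scan`, the field names, and the sample rows are all made up for the example.

```python
def remote_scan(rows, predicate, columns):
    """Runs on the remote (Hadoop) side: drop unwanted rows, keep only requested columns."""
    for row in rows:
        if predicate(row):
            yield {col: row[col] for col in columns}

# Hypothetical rows sitting in Hadoop.
hadoop_rows = [
    {"account_number": 1001, "account_type": "savings", "address": "12 Elm St",
     "gender": "F", "marital_status": "single", "age": 34, "avg_balance": 7200},
    {"account_number": 1002, "account_type": "checking", "address": "9 Oak Ave",
     "gender": "F", "marital_status": "single", "age": 45, "avg_balance": 9000},
    {"account_number": 1003, "account_type": "savings", "address": "3 Pine Rd",
     "gender": "M", "marital_status": "single", "age": 35, "avg_balance": 8000},
    {"account_number": 1004, "account_type": "checking", "address": "7 Ash Ln",
     "gender": "F", "marital_status": "married", "age": 31, "avg_balance": 6000},
    {"account_number": 1005, "account_type": "savings", "address": "5 Fir Ct",
     "gender": "F", "marital_status": "single", "age": 38, "avg_balance": 4000},
]

def wanted(row):
    """The constraint from the text: single women age 30-40, average balance over $5000."""
    return (row["gender"] == "F" and row["marital_status"] == "single"
            and 30 <= row["age"] <= 40 and row["avg_balance"] > 5000)

# Only the surviving rows and the three requested columns are shipped.
shipped = list(remote_scan(hadoop_rows, wanted,
                           ["account_number", "account_type", "address"]))
print(shipped)
```

Five wide rows go in, and one narrow row comes out, so the network carries only what the join actually needs.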

Teradata QueryGrid delivers some important benefits:
• It’s easy to use: any user with any BI tool can do it
• Low DBA labor: it’s mostly setting up views and testing them once
• High performance: reducing hours to minutes means more accuracy and faster turnaround for demanding users
• Cross-system data on demand: don’t get stuck in the programmers’ work queue
• Teradata/Aster strengths: security, workload management, system management
• Minimum data movement improves performance and reduces network use
• Move the processing to the data

Big data is now taking giant steps through your analytic architecture --frictionless, invisible, and in parallel. Nice boots!

Teradata’s UDA is to Data as Prius is to Engines

Posted on: November 12th, 2013 by Teradata Aster

 

I’ve been working in the analytics and database market for 12 years. One of the most interesting pieces of that journey has been seeing how the market is ever-shifting. Both the technology and business trends during these short 12 years have massively changed not only the tech landscape today, but also the future evolution of analytic technology. From a “buzz” perspective, I’ve seen “corporate initiatives” and “big ideas” come and go. Everything from “e-business intelligence,” which was a popular term when I first started working at Business Objects in 2001, to corporate performance management (CPM) and “the balanced scorecard.” From business process management (BPM) to “big data”, and now the architectures and tools that everyone is talking about.

The one golden thread that ties each of these terms, ideas and innovations together is that each is aiming to solve the questions related to what we are today calling “big data.” At the core of it all, we are searching for the right way to enable the explosion of data and analytics that today’s organizations are faced with, to simply be harnessed and understood. People call this the “logical data warehouse”, “big data architecture”, “next-generation data architecture”, “modern data architecture”, “unified data architecture”, or (I just saw last week) “unified data platform”.  What is all the fuss about, and what is really new?  My goal in this post and the next few will be to explain how the customers I work with are attacking the “big data” problem. We call it the Teradata Unified Data Architecture, but whatever you call it, the goals and concepts remain the same.

Mark Beyer from Gartner is credited with coining the term “logical data warehouse” and there is an interesting story and explanation. A nice summary of the term is,

“The logical data warehouse is the next significant evolution of information integration because it includes ALL of its progenitors and demands that each piece of previously proven engineering in the architecture should be used in its best and most appropriate place. …

And

… The logical data warehouse will finally provide the information services platform for the applications of the highly competitive companies and organizations in the early 21st Century.”

The idea of this next-generation architecture is simple: When organizations put ALL of their data to work, they can make smarter decisions.

It sounds easy, but as data volumes and data types explode, so does the need for more tools in your toolbox to help make sense of it all. Within your toolbox, data is NOT all nails and you definitely need to be armed with more than a hammer.

In my view, enterprise data architectures are evolving to let organizations capture more data. The data was previously untapped because the hardware costs required to store and process the enormous amount of data were simply too high. However, the declining costs of hardware (thanks to Moore’s law) have opened the door for more data (types, volumes, etc.) and processing technologies to be successful. But no single technology can be engineered and optimized for every dimension of analytic processing, including scale, performance, or concurrent workloads.

Thus, organizations are creating best-of-breed architectures by taking advantage of new technologies and workload-specific platforms such as MapReduce, Hadoop, MPP data warehouses, discovery platforms and event processing, and putting them together into a seamless, transparent and powerful analytic environment. This modern enterprise architecture enables users to get deep business insights and allows ALL data to be available to an organization, creating competitive advantage while lowering the total system cost.

But why not just throw all your data into files and put a search engine like Google on top? Why not just build a data warehouse and extend it with support for “unstructured” data? Because, in the world of big data, the one-size-fits-all approach simply doesn’t work.

Different technologies are more efficient at solving different analytical or processing problems. To steal an analogy from Dave Schrader—a colleague of mine—it’s not unlike a hybrid car. The Toyota Prius can average 47 mpg with hybrid (gas and electric) vs. 24 mpg with a “typical” gas-only car – almost double! But you do not pay twice as much for the car.

How’d they do it? Toyota engineered a system that uses gas when I need to accelerate fast (and also to recharge the battery at the same time), electric mostly when driving around town, and braking to recharge the battery.

Three components integrated seamlessly – the driver doesn’t need to know how it works.  It is the same idea with the Teradata UDA, which is a hybrid architecture for extracting the most insights per unit of time – at least doubling your insight capabilities at reasonable cost. And, business users don’t need to know all of the gory details. Teradata builds analytic engines—much like the hybrid drive train Toyota builds— that are optimized and used in combinations with different ecosystem tools depending on customer preferences and requirements, within their overall data architecture.

In the case of the hybrid car, battery power and braking systems, which recharge the battery, are the “new innovations” combined with gas-powered engines. Similarly, there are several innovations in data management and analytics that are shaping the unified data architecture, such as discovery platforms and Hadoop. Each customer’s architecture is different depending on requirements and preferences, but the Teradata Unified Data Architecture recommends three core components for a comprehensive architecture – a data platform (often called “Data Lake”), a discovery platform and an integrated data warehouse. There are other components such as event processing, search, and streaming which can be used in data architectures, but I’ll focus on the three core areas in this blog post.

Data Lakes

In many ways, this is not unlike the operational data store we’ve seen between transactional systems and the data warehouse, but the data lake is bigger and less structured. Any file can be “dumped” in the lake with no attention to data integration or transformation. New technologies like Hadoop provide a file-based approach to capturing large amounts of data without requiring ETL in advance. This enables large-scale data processing for data refining, structuring, and exploring data prior to downstream analysis in workload-specific systems, which are used to discover new insights and then move those insights into business operations for use by hundreds of end-users and applications.

Discovery Platforms

Discovery platforms are a new kind of workload-specific system, optimized to perform multiple analytic techniques in a single workflow, combining SQL with statistics, MapReduce, graph, or text analysis to look at data from multiple perspectives. The goal is to ultimately provide more granular and accurate insights to users about their business. Discovery platforms enable a faster investigative analytical process to find new patterns in data and to identify different types of fraud or consumer behavior that traditional data mining approaches may have missed.

Integrated Data Warehouses

With all the excitement about what’s new, companies quickly forget the value of consistent, integrated data for reuse across the enterprise. The integrated data warehouse has become a mission-critical operational system which is the point of value realization or “operationalization” for information. The data within a massively parallel data warehouse has been cleansed, and provides a consistent source of data for enterprise analytics. By integrating relevant data from across the entire organization, a couple of key goals are achieved. First, users can answer the kind of sophisticated, impactful questions that require cross-functional analyses. Second, they can answer questions more completely because relevant data is available across all levels of the organization. Data lakes (Hadoop) and discovery platforms complement the data warehouse by enriching it with new data and new insights that can now be delivered to thousands of users and applications with consistent performance (i.e., they get the information they need quickly).

A critical part of incorporating these novel approaches to data management and analytics is putting new insights and technologies into production in reliable, secure and manageable ways for organizations.  Fundamentals of master data management, metadata, security, data lineage, integrated data and reuse all still apply!

The excitement of experimenting with new technologies is fading. More and more, our customers are asking us about ways to put the power of new systems (and the insights they provide) into large-scale operation and production. This requires unified system management and monitoring, intelligent query routing, metadata about incoming data and the transformations applied throughout the data processing and analytical process, and role-based security that respects and applies data privacy, encryption and other policies required. This is where I will spend a good bit of time on my next blog post.

Anna Littick and the Unified Data Architecture — Part 2

Posted on: October 16th, 2013 by Dan Graham

 

Ring ring ringtone.
Dan: “Hello. This is Dan at Teradata. How can I help you today?”

Anna: “Hi Dan. It’s Anna Littick from Sunshine-Stores calling again. Can we finish our conversation?”

Dan: “Oh yeah, hi Anna. Sure. Where did we leave off?”

Anna: “Well, you remember our new CFO – Xavier Money -- wants us to move everything to Hadoop because he thinks it’s all free. You and I were ticking through his perceptions.”

Dan: “Yes. I think we got through the first two but not numbers 3 and 4. Here’s what I remember:
1. Hadoop replaces the data warehouse
2. Hadoop is a landing zone and archive
3. Hadoop is a database
4. Hadoop does deep analytics.”

Anna: “Yep. So how do I respond to Xavier about those two?”

Dan: “Well, I guess we should start with ‘what is a database?’ I’ll try to keep this simple. A database has these characteristics:
• High performance data access
• Robust high availability
• A data model that isolates the schema from the application
• ACID properties

There’s a lot more to a database but these are the minimums. High speed is the name of the game for databases. To be fast, data has to be restructured and indexed, and queries have to go through a cost-based optimizer. Hive and Impala do a little restructuring of data but are a long way off from sophisticated indexes, partitioning, and optimizers. Those things take many years – each. For example, Teradata Database has multiple kinds of indexes like join indexes, aggregate indexes, hash indexes, and sparse indexes.”

Anna: “Ouch. What about the other stuff? Does Hive or Impala have that?”

Dan: “Well, high performance isn’t interesting if the data is not available. Between planned and unplanned downtime, a database has to hit 99.99% uptime or better to be mission critical. That’s roughly 53 minutes of downtime a year. Hundreds of hardware, software, and installation features have to mature to get there. I’m guessing a well-built Hadoop cluster is around 99% uptime. But just running out of memory in an application causes the cluster to crash. There’s a lot of work to be done in Hadoop.”
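As a quick check on the uptime arithmetic above, here is a small sketch that converts an uptime percentage into yearly downtime. The only assumption is an average year of 365.25 days.

```python
MINUTES_PER_YEAR = 365.25 * 24 * 60  # average year, including leap days

def downtime_minutes_per_year(uptime_percent):
    """Minutes per year a system is unavailable at the given uptime percentage."""
    return (1 - uptime_percent / 100) * MINUTES_PER_YEAR

four_nines = downtime_minutes_per_year(99.99)
two_nines = downtime_minutes_per_year(99)

print(f"99.99% uptime: about {four_nines:.0f} minutes of downtime a year")
print(f"99% uptime: about {two_nines / 60 / 24:.2f} days of downtime a year")
```

The gap between the two is roughly a factor of one hundred: under an hour a year versus several days.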

“Second, isolating the application programs from the schema is opposite Hadoop’s strategic direction of schema-on-read. They don’t want fixed data types and data rules enforcement. On the upside this means Hadoop has a lot of flexibility – especially with complex data that changes a lot. On the downside, we have to trust every programmer to validate and transform every data field correctly at runtime. It’s dangerous and exciting at the same time. The schema-on-read works great with some kinds of data, but the majority of data works better with a fixed schema.”
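The trade-off between a fixed schema and schema-on-read can be sketched as follows. This is a toy illustration, not Hadoop or Teradata code; the schema, field names, and sample records are invented for the example.

```python
from datetime import date

# Hypothetical schema: each field maps to the parser that validates it.
SCHEMA = {"account_id": int, "opened": date.fromisoformat, "balance": float}

def parse_record(raw):
    """Convert each raw string field; raises KeyError or ValueError on bad data."""
    return {field: parser(raw[field]) for field, parser in SCHEMA.items()}

def load_schema_on_write(raw_records):
    """Fixed schema: validate once at load time and reject bad records up front."""
    stored, rejected = [], []
    for raw in raw_records:
        try:
            stored.append(parse_record(raw))
        except (KeyError, ValueError):
            rejected.append(raw)
    return stored, rejected

def total_balance_schema_on_read(raw_records):
    """Schema-on-read: everything is stored as-is, so each reader must cope with bad rows."""
    total = 0.0
    for raw in raw_records:
        try:
            total += parse_record(raw)["balance"]
        except (KeyError, ValueError):
            continue  # every program makes its own choice about bad data
    return total

raw_records = [
    {"account_id": "1", "opened": "2013-10-01", "balance": "600.00"},
    {"account_id": "2", "opened": "10/02/2013", "balance": "250.50"},  # bad date format
    {"account_id": "x3", "opened": "2013-10-03", "balance": "99"},     # bad id
]

stored, rejected = load_schema_on_write(raw_records)
print(len(stored), "stored,", len(rejected), "rejected at load time")
print("total on read:", total_balance_schema_on_read(raw_records))
```

With the fixed schema, the bad rows are caught once by one piece of code. With schema-on-read, the same checks have to be repeated, consistently, in every program that touches the data.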

Anna: “I’ll have to think about that one. I like the ‘no rules’ flexibility but I don’t like having to scrub the incoming data every time. I already spend too much time preparing data for predictive analytics.”

Dan: “Last are the ACID properties. It’s a complex topic you should look at on Wikipedia. It boils down to trusting the data as it’s updated. If a change is made to an account balance, ACID ensures all the changes are applied or none, that no one else can change it at the same time you do, and that the changes are 100% recoverable across any kind of failure. Imagine you and your spouse each at an ATM withdrawing $500 when there’s only $600 in the account. The database can’t give both of you $500 – that’s ACID at work. Neither Hadoop, Hive, Impala, nor any other project has plans to build the huge ACID infrastructure and become a true database. A Hadoop system isn’t so good at updating data in place.”
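The ATM scenario can be sketched in a few lines. This toy example only illustrates the "all or nothing, one at a time" part of ACID using an in-memory lock; it says nothing about durability or recovery, and the `Account` class is hypothetical rather than any database's API.

```python
import threading

class Account:
    def __init__(self, balance):
        self.balance = balance
        self._lock = threading.Lock()

    def withdraw(self, amount):
        """Check and debit as one indivisible step; True means cash was dispensed."""
        with self._lock:
            if self.balance >= amount:
                self.balance -= amount
                return True
            return False

account = Account(600)
results = []

def atm_request():
    # Each spouse asks for $500 at the same moment.
    results.append(account.withdraw(500))

threads = [threading.Thread(target=atm_request) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(sorted(results), account.balance)
```

Whichever request arrives first succeeds and the other is refused, so the account never goes negative. Without the lock around the check and the debit, both requests could see $600 and both could be paid.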

“According to Curt Monash, ‘Developing a good DBMS requires 5-7 years and tens of millions of dollars. That’s if things go extremely well.’ (1)”

Anna: “OK, Hadoop and Hive and Impala aren’t a database. So what? Who cares what you call it?”

Dan: “Well, a lot of end users, BI tools, ETL tools, and skills are expecting Hadoop to behave like a database. That’s not fair. It was not built for that purpose. Because it is not a database, Hadoop lacks a lot of functionality, but that forces Hadoop to innovate and differentiate its strengths. Let’s not forget Hadoop’s progress in basic search indexing, archival of cold data, simple reporting at scale, and image processing. We’re at the beginning of a lot of innovation and it’s exciting.”

Anna: “OK. I’ll trust you on that. What about deep analytics? That’s what I care about most.”

Dan: “So Anna, off the record, you being a data scientist and all that. Do people tease you about your name? I mean Anna Littick the data scientist? I Googled you and you’re not the only one. ”

Anna: “Yes. Some guys around here think it’s funny. Apparently childishness isn’t limited to children. So during meetings I throw words at them like Markov Chains, Neural Networks, and edges in graph partitions. They pretend to understand --they nod a lot. Those guys never tease me again. [laugh]”

Dan: “Hey, those advanced analytics you mentioned are powerful stuff. You should hear David Simmen talk at our PARTNERS conference on Sunday. He’s teaching about our new graph engine that handles millions of vertices and billions of edges. It sounds like you would enjoy it.”

Anna: “Well, it looks like I have approval to go, especially since PARTNERS is here in Dallas. Enough about me. What about deep analytics in Hadoop?”

Dan: “Right. OK, well first I have to tell you we do a lot of predictive and prescriptive analytics in-database with Teradata. I suspect you’ve been using SAS algorithms in-database already. The parallelism makes a huge difference in accuracy. What you probably haven’t seen is our Aster Database where you can run map-reduce algorithms under the control of SQL for fast, iterative discovery. It can run dozens of complex analytic algorithms including map-reduce algorithms in parallel. And we just added the graph engine I mentioned in our 6.0 release. And one thing it does that Hadoop doesn’t is you can use your BI tools, SAS procs, and map-reduce all in one SQL statement. It’s ultra cool.”

Anna: “OK. I think I’ll go to David’s session. But what about Hadoop? Can it do deep analytics?”

Dan: “Yes. Both Aster and Hadoop can run complex predictive and prescriptive analytics in parallel. They can both do statistics, random forests, Markov Chains, and all the basics like naïve Bayes and regressions. If an algorithm is hard to do in SQL, these platforms can handle it.”

Anna [impatient]: “OK. I’ll take the bait. What’s the difference between Aster and Hadoop?”

Dan: “Well, Aster has a database underneath its SQL-MapReduce so you can use the BI tools interactively. There is also a lot of emphasis on behavioral analysis so the product has things like Teradata Aster nPath time-series analysis to visualize patterns of behavior and detect many kinds of consumer churn events or fraud. Aster has more than 80 algorithms packaged with it as well as SAS support. Sorry, I had to slip that Aster commercial in. It’s in my contract --sort of. Maybe. If I had a contract.”

Anna: “And what about Hadoop?”

Dan: “Hadoop is more of a do-it-yourself platform. There are tools like Apache Mahout for data mining. It doesn’t have as many algorithms as Aster so you often find yourself getting algorithms from university research or GitHub and implementing them yourself. Some Teradata customers have implemented Markov Chains on Hadoop because it’s much easier to work with than SQL for that kind of algorithm. So data scientists have more tools than ever with Teradata in-database algorithms, Aster SQL-MapReduce, SAS, Hadoop/Mahout, and others. That’s what our Unified Data Architecture does for you – it matches workloads to the best platform for that task.”

Anna: “OK. I think I’ve got enough information to help our new CFO. He may not like me bursting his ‘free-free-free’ monastic chant. But just because we can eliminate some initial software costs doesn't mean we will save any money. I’ve got to get him thinking of the big picture for big data. You called it UDA, right?”

Dan: “Right. Anna, I’m glad I could help, if only just a little. And I’ll send you a list of sessions at Teradata PARTNERS where you can hear from experts about their Hadoop implementations – and Aster. See you at PARTNERS.”

Title | Company | Day | Time | Comment
----- | ------- | --- | ---- | -------
Aster Analytics: Delivering results with R Desktop | Teradata | Sun | 9:30 | RevolutionR
Do’s and Don’ts of using Hadoop in practice | Otto | Sun | 1:00 | Hadoop
Graph Analysis with Teradata Aster Discovery Platform | Teradata | Sun | 2:30 | Graph
Hadoop and the Data Warehouse: When to use Which | Teradata | Sun | 4:00 | Hadoop
The Voices of Experience: A Big Data Panel of Experts | Otto, Wells Fargo | Wed | 9:30 | Hadoop
An Integrated Approach to Big Data Analytics using Teradata and Hadoop | PayPal | Wed | 11:00 | Hadoop
TCOD: A Framework for the Total Cost of Big Data | WinterCorp | Wed | 11:00 | Costs

 1 Curt Monash, DBMS development and other subjects, March 18, 2013

Anna Littick and the Unified Data Architecture — Part 1

Posted on: October 8th, 2013 by Dan Graham

 

Ring ring ringtone.
Dan: “Hello. This is Dan at Teradata. How can I help you today?”

Anna: “Hi Dan. I’m Anna Littick at Sunshine-Stores in Dallas. I believe we swapped some emails and you said I should call you.”

Dan: “Oh yeah. You’re the data scientist Ph.D that said you were a little confused by all the Hadoop-la. Yeah, I remember. Anyway, how can I help?”

Anna: “Well, a new CFO is running our IT division. He keeps saying Hadoop is free and wants to move everything to Hadoop. To me it seems risky.”

Dan: “Yes, we’ve seen this happen elsewhere. The CIO cracks under budget pressure when some evangelist claims he can do everything for free with Hadoop. Hadoop fever always runs its course until reality sets in again after several months.”

Anna: “Well, I guess we have the fever. If you remember my email, the CFO is causing internal debates that never seem to end. Let me list our points of debate again quickly:
1. Hadoop replaces the data warehouse
2. Hadoop is a landing zone and archive
3. Hadoop is a database
4. Hadoop does deep analytics.”

Dan: “OK, let’s take Hadoop replaces the data warehouse. You know the old adage ‘If it sounds too good to be true, then it probably is.’ Well, the biggest data warehouse strengths are managing multiple subject areas that are fully integrated. Subject-areas mean sales, inventory, customers, financials, and so on. Every one of these subjects has dozens – even 100s -- of sophisticated data tables. Subject areas are defined in a data model so they can be integrated and consistent. For example, data formats and values are standardized – like account types, country names, postal codes, or gender. We can’t tolerate invalid data in those fields. It also means throwing away duplicates, checking date formats, and ensuring valid relationships between historical events. Hadoop might hold onto all the data, but it’s not organized, cleansed, and tightly integrated into subject areas. Hadoop doesn’t do any of that – it’s all do-it-yourself programming. Check out Gartner’s definition to see that Hadoop is not a data warehouse. (1)  Wikipedia has the same definitions under top-down design as well.”
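To make the "do-it-yourself programming" concrete, here is a small sketch of the kind of cleansing Dan describes: standardizing values, checking dates, and throwing away duplicates. The lookup table, field names, and sample records are invented for illustration, and a real warehouse load would involve far more rules than this.

```python
from datetime import datetime

# Hypothetical standardization table for country names.
COUNTRY_NAMES = {
    "usa": "United States",
    "u.s.": "United States",
    "united states": "United States",
    "deutschland": "Germany",
    "germany": "Germany",
}

def cleanse(records):
    """Standardize countries, validate dates, and drop duplicates."""
    seen = set()
    clean, rejected = [], []
    for rec in records:
        country = COUNTRY_NAMES.get(rec["country"].strip().lower())
        try:
            opened = datetime.strptime(rec["opened"], "%Y-%m-%d").date()
        except ValueError:
            opened = None
        if country is None or opened is None:
            rejected.append(rec)  # invalid data is not allowed into the subject area
            continue
        key = (rec["customer_id"], country, opened)
        if key in seen:
            continue  # same customer record after standardization: a duplicate
        seen.add(key)
        clean.append({"customer_id": rec["customer_id"], "country": country, "opened": opened})
    return clean, rejected

records = [
    {"customer_id": 1, "country": "USA", "opened": "2013-01-15"},
    {"customer_id": 1, "country": "U.S.", "opened": "2013-01-15"},         # duplicate once standardized
    {"customer_id": 2, "country": "Deutschland", "opened": "2013-02-30"},  # impossible date
    {"customer_id": 3, "country": "Atlantis", "opened": "2013-03-01"},     # unknown country
]

clean, rejected = cleanse(records)
print(len(clean), "clean,", len(rejected), "rejected")
```

In a data warehouse these rules live in the data model and the load process. On Hadoop, someone has to write and maintain code like this for every field in every subject area.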

Anna: “Interesting. Like everyone else, I just took that for granted. Of course I’m a programmer and I never make mistakes. [snicker] But if I tell the CFO that, he’ll ignore me. Give me some upside, some things Hadoop does well so he will take me seriously.”

Dan: “Well, let’s start with the most obvious. When I first talked to Amr Awadallah, CTO at Cloudera, he told me ‘Hadoop’s biggest differentiators come from capturing massive amounts of raw data and querying that data for 5-10 years – and all that at a low cost.’ So Hadoop is both a landing zone and an archive for files. Hadoop can manage a few million files on huge, low-cost hard disk drives. With a little effort, Hadoop and Hive can query data that’s kept for 7, 8, even 10 years. Tape backups can do that but tape is sloooowww. Imagine getting a regulatory request from the Texas governor saying ‘Show us all your hiring by ethnicity, income, promotions, and raises going back to 2005.’ Most DBAs won’t keep data in the data warehouse that’s more than 5-7 years old because of costs. Hadoop provides a low-cost archive and basic query capabilities.”

Anna: “Cool. It sounds like Hadoop would be a good place for ETL processing, right?”

Dan: “That’s a tricky question. A lot of companies are stampeding towards Hadoop as an ETL tool. Yet Gartner clearly states that Hadoop lacks the functions of an ETL engine (2). At Teradata we have seen some great Hadoop ETL successes and some failures as well. I believe vendors like Informatica and IBM DataStage will do more data integration projects with Hadoop. They have the MDM, data lineage, metadata, and oodles of transformers. Hadoop mostly has do-it-yourself programming. I’m guessing the ETL vendors will have integrated so well with Hadoop in a few years that you will usually use them together.”

Anna: “OK, so we need to keep our ETL and data warehouse, then add Hadoop where it has strengths.”

Dan: “Agreed. That’s what we have seen the visionary customers and fast followers doing. Those customers have been asking us to make Teradata products and Hadoop work well together. This is driving us to invest a ton of money into what we call the Teradata Unified Data Architecture (UDA). UDA is hardware and software platforms optimized for specific workloads plus data flow between them for an ideal, best-of-breed analytic workplace.”

Anna: “Looks like it’s time for me to have a heart-to-heart chat with our new CFO. His name is Xavier Money. Isn’t that hilarious?”

Dan: “Oh yeah. Two ironies in one day.”

Anna: “What?”

Dan: “Oh nothing, just thinking of something else. How about I send you an email about our PARTNERS conference where you can hear these topics directly from Teradata customers like yourself? Real customer stories of hurdles and results are invaluable. Pay extra attention to the WinterCorp session on Big Data Cost of Ownership – your CFO will want to hear that one.”

Anna: “Thanks, I’ve got to run. Maybe we can finish up our chat in a couple days. I’ll call you.” 

1 Gartner, Of Data Warehouses, Operational Data Stores, Data Marts and Data Outhouses, Dec 2005
2 Gartner, Hadoop Is Not a Data Integration Solution, January 2013

 

About one year ago, Teradata Aster launched a powerful new way of integrating a database with Hadoop. With Aster SQL-H™, users of the Teradata Aster Discovery Platform got the ability to issue SQL and SQL-MapReduce® queries directly on Hadoop data as if that data had been in Aster all along. This level of simplicity and performance was unprecedented, and it enabled BI & SQL analysts that knew nothing about Hadoop to access Hadoop data and discover new information through Teradata Aster.

This innovation was not a one-off. Teradata has put forward the most complete vision for a data and analytics architecture in the 21st century. We call that the Unified Data Architecture™. The UDA combines Teradata, Teradata Aster & Hadoop into a best-of-breed, tightly integrated ecosystem of workload-specific platforms that provide customers the most powerful and cost-effective environment for their analytical needs. With Aster SQL-H™, Teradata provided a level of software integration between Aster & Hadoop that was, and still is, unchallenged in the industry.

 

Teradata Unified Data Architecture™

Today, Teradata makes another leap in making its Unified Data Architecture™ vision a reality. We are announcing SQL-H™ for Teradata, bringing the best SQL engine for data warehousing and analytics to Hadoop. From now on, Enterprises that use Hadoop to store large amounts of data will be able to utilize Teradata's analytics and data warehousing capabilities to directly query Hadoop data securely through ANSI standard SQL and BI tools by leveraging the open source Hortonworks HCatalog project. This is fundamentally the best and tightest integration between a data warehouse engine and Hadoop that exists in the market today. Let me explain why.

It is interesting to consider Teradata's approach versus alternatives. If one wants to execute SQL on Hadoop, with the intent of building Data Warehouses out of Hadoop data, there are not many realistic options. Most databases have very poor integration with Hadoop, and require Hadoop experts to manage the overall system - not a viable option for most Enterprises due to cost. SQL-H™ removes this requirement for Teradata/Hadoop deployments. Another "option" is the set of SQL-on-Hadoop tools that have started to emerge; but unfortunately, they are about a decade away from becoming sufficiently mature to handle true Data Warehousing workloads. Finally, the approach of taking a database and shoving it inside Hadoop has significant issues since it suffers from the worst of both worlds – Hadoop activity has to be limited so that it doesn't disrupt the database, data is duplicated between HDFS and the database store, and performance of the database is lower than that of a stand-alone version.

In contrast, a Teradata/Hadoop deployment with SQL-H™ offers the best of both worlds: unprecedented performance and reliability in the Teradata layer; seamless BI & SQL access to Hadoop data via SQL-H™; and it frees up Hadoop to perform data processing tasks at full efficiency.

Teradata is committed to being the strategic advisor of the Enterprise when it comes to Data Warehousing and Big Data. Through its Unified Data Architecture™ and today's announcement on Teradata SQL-H™, it provides even more performance, flexibility and cost-effective options to Enterprises eager to use data as a competitive advantage.

Teradata and Informatica Data Integration Optimization

Posted on: March 6th, 2013 by Sam Tawfik

 

Teradata and Informatica created a joint offering to optimize data integration processes by leveraging the right platform for the right transformation job. The offering combines the powerful Teradata analytical infrastructure called the Teradata Unified Data Architecture (UDA), which includes Teradata, Teradata Aster, and Hadoop, with the Informatica PowerCenter Big Data Edition.

Today, customers are creating new processes to manage and integrate Big Data for deeper and richer analytics. Customers are also evaluating options to optimize their data integration processes for faster deployment and scalable performance at reduced costs. And because we are talking about Big Data, Hadoop is increasingly being considered for data integration processing.

But as is always the case when deploying or evaluating new technologies, it is important to understand the current business and technical challenges in order to ensure the deployment of the appropriate solution. The business challenges include the increasing cost of data processing as data volumes grow and new types of data emerge, and the risk of deploying the wrong platform. Technical challenges related to big data analytics include finding the right skills, the complexity of deploying new technologies into production data centers, the need for more real-time data, and the lack of enterprise capabilities such as security.

The new Teradata and Informatica offering helps customers optimize their data integration processes and deploy a single Unified Data Architecture. The new offering allows companies to design data integration processes once and deploy on any Teradata platform with up to 5x productivity gains using the resource skills they have today. In this architecture customers can process data from the traditional transaction systems as well as unstructured text and interaction data such as from social media and web logs. Customers may choose to run data integration processes on Teradata Platforms, Hadoop, or Informatica servers.

Teradata Unified Architecture with Data Integration Optimization

The key benefits of deploying the Teradata Unified Data Architecture with the Informatica PowerCenter Big Data Edition are productivity gains of up to 5x over hand-coding at up to half the labor cost, reduced risk in Big Data projects, and protection against poor and sub-optimal decisions. The offering allows staff to focus on meeting business requirements rather than on hand-coding transformation logic, so customers can innovate faster and generate new revenue streams.

Teradata and Informatica provide service assessment offerings to help customers get started quickly and make smart decisions about data integration optimization. These assessments reduce the risk of offloading the wrong work onto the wrong platform while reducing labor costs and delivering faster performance.

See the joint Informatica and Teradata press release link

Sam Tawfik