June, 2014

MongoDB and Teradata QueryGrid – Even Better Together

Posted on: June 19th, 2014 by Dan Graham


It wasn’t so long ago that NoSQL products were considered competitors to relational databases (RDBMS). Well, for some workloads they still are. But Teradata is an analytic RDBMS, which makes it quite different from, and complementary to, MongoDB. Hence, we are teaming up for the benefit of mutual customers.

The collaboration of MongoDB with Teradata represents a virtuous cycle, a symbiotic exchange of value. This virtuous cycle starts when data is exported from MongoDB to Teradata’s Data Warehouse, where it is analyzed and enriched, then sent back to MongoDB to be exploited further. Let me give an example.

An eCommerce retailer builds a website to sell clothing, toys, etc. They use MongoDB because of the flexibility to manage constantly changing web pages, product offers, and marketing campaigns. This front-office application exports JSON data to the back-office data warehouse throughout the business day. Automated processes analyze the data and enrich it, calculating next best offers, buyer propensities, consumer profitability scores, inventory depletions, dynamic discounts, and fraud detection. Managers and data scientists also sift through sales results looking for trends and opportunities using dashboards, predictive analytics, visualization, and OLAP. Throughout the day, the data warehouse sends analysis results back to MongoDB, where they are used to enhance the visitor experience and improve sales. Then we do it again. It’s a cycle with positive benefits for the front and back office.

Teradata Data Warehouses have been used in this scenario many times by telecommunications companies, banks, retailers, and others. But several things are different when working with MongoDB. First, MongoDB uses JSON data. This is crucial for frequently changing data formats where new fields are added daily. Historically, RDBMSs did not support semi-structured JSON data. Furthermore, changing a database schema to keep up with frequently changing JSON formats took weeks to get through governance committees.

Nowadays, the Teradata Data Warehouse ingests native JSON and accesses it through simple SQL commands. Furthermore, once a column in a table is defined as JSON, the frequently changing JSON structures flow right into the data warehouse without spending weeks in governance committees. Cool! This is a big and necessary step forward for the data warehouse. Teradata Data Warehouses can ingest and analyze JSON data easily using whichever BI or ETL tool our customers prefer.
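To illustrate why a JSON-typed column sidesteps schema changes, here is a minimal sketch that uses Python’s built-in sqlite3 module as a stand-in for the warehouse. It is not Teradata’s JSON data type or its SQL syntax; the table name `web_events`, the helper `field_values`, and the sample fields are all hypothetical. Each document is stored whole in one text column, so a document that arrives with a new field needs no ALTER TABLE.

```python
import json
import sqlite3

# Hypothetical table: one relational key plus one column holding the raw JSON document.
# SQLite stands in for the warehouse here; this is not Teradata's JSON type.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE web_events (event_id INTEGER PRIMARY KEY, doc TEXT NOT NULL)")

# Yesterday's documents, plus one from today that carries a new "coupon" field.
docs = [
    {"sku": "TOY-1", "price": 19.99},
    {"sku": "SHIRT-7", "price": 24.50},
    {"sku": "TOY-1", "price": 17.99, "coupon": "SUMMER10"},
]

# No schema change is needed for the new field: the whole document goes in one column.
conn.executemany(
    "INSERT INTO web_events (doc) VALUES (?)",
    [(json.dumps(d),) for d in docs],
)

def field_values(conn, field):
    """Return `field` from every stored document, in insert order (None where absent)."""
    rows = conn.execute("SELECT doc FROM web_events ORDER BY event_id")
    return [json.loads(doc).get(field) for (doc,) in rows]

print(field_values(conn, "sku"))     # ['TOY-1', 'SHIRT-7', 'TOY-1']
print(field_values(conn, "coupon"))  # [None, None, 'SUMMER10']
```

A warehouse with native JSON support would let SQL reach into the document directly; this sketch parses it in Python so it does not depend on whether the local SQLite build includes its JSON functions.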

Another difference is that MongoDB is a scale-out system, growing to tens or hundreds of server nodes in a cluster. Hmmm. Teradata systems are also scale-out systems. So how would you exchange data between Teradata Data Warehouse server nodes and MongoDB server nodes? The simple answer is to export JSON to flat files and import them into the other system. Mutual customers are already doing this. Can we do better than import/export? Can we add an interactive, dynamic data exchange? Yes, and this is the near-term goal of our partnership: connecting Teradata QueryGrid to MongoDB clusters.
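The flat-file route can be sketched with the Python standard library alone. This is a minimal sketch under stated assumptions: the documents are already in memory as Python dicts (a real export would read them from MongoDB, for example with the mongoexport tool), and the file uses newline-delimited JSON, one document per line. The function names, sample documents, and file name are hypothetical.

```python
import json
import os
import tempfile

def export_ndjson(docs, path):
    """Write each document as one JSON object per line (newline-delimited JSON)."""
    with open(path, "w", encoding="utf-8") as f:
        for doc in docs:
            f.write(json.dumps(doc, sort_keys=True) + "\n")
    return len(docs)

def import_ndjson(path):
    """Read the file back into a list of documents, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]

# Hypothetical order documents from the website; only some carry "gift_wrap".
orders = [
    {"order_id": 1, "sku": "TOY-1", "qty": 2},
    {"order_id": 2, "sku": "SHIRT-7", "qty": 1, "gift_wrap": True},
]

# Export to a temporary flat file, then load it back as the receiving system would.
path = os.path.join(tempfile.mkdtemp(), "orders.ndjson")
written = export_ndjson(orders, path)
loaded = import_ndjson(path)
print(written, loaded == orders)  # 2 True
```

Because each file is a point-in-time copy that must be moved and loaded in a separate step, this approach is batch-oriented rather than interactive.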

Teradata QueryGrid and MongoDB

Teradata QueryGrid is a capability in the data warehouse that allows a business user to issue requests via popular business intelligence tools such as SAS®, Tableau®, or MicroStrategy®. The user issues a query that runs inside the Teradata Data Warehouse. This query reaches across the network to the MongoDB cluster. JSON data is brought back, joined to relational tables, sorted, summarized, analyzed, and displayed to the business user. All of this happens exceptionally fast and is completely invisible to the business user. It’s easy! We like easy.

QueryGrid can also be bidirectional, putting the results of an analysis back into the MongoDB server nodes. The two companies are working on hooking up Teradata QueryGrid right now, and we expect to have the solution early in 2015.

The business benefit of connecting Teradata QueryGrid to MongoDB is that data can be exchanged in near real time. That is, a business user can run a query that exchanges data with MongoDB in seconds (or a few minutes if the data volume is huge). This means new promotions and pricing can be deployed from the data warehouse to MongoDB with a few mouse clicks. It means marketing people can analyze consumer behavior on the retail website throughout the day and make adjustments to increase sales minutes later. And of course, applications involving mobile phones, sensors, banking, telecommunications, healthcare, and other areas will get value from this partnership too.

So why does the leading NoSQL vendor partner with the best-in-class analytic RDBMS? Because they are highly complementary solutions that together create a virtuous cycle of value for both. MongoDB and Teradata are already working together well at some sites. And soon we will do even better.

Come visit our booth at MongoDB World and attend the session “The Top 5 Things to Know About Integrating MongoDB into Your Data Warehouse” in the Riverside Suite at 3:10 p.m. on June 24. You can read more about the partnership between Teradata and MongoDB in this news release issued earlier today. Also, check out the MongoDB blog.

PS: The MongoDB people have been outstanding to work with on all levels. Kudos to Edouard, Max, Sandeep, Rebecca, and others. Great people!


It happens every few years and it’s happening again. A new technology comes along and a significant segment of the IT and business community wants to toss out everything we’ve learned over the past 60 years and start fresh. We “discover” that we’ve been wasting time applying unnecessary rigor and bureaucracy to our projects. No longer should we have to wait three to six months or longer to deliver technical solutions to the business. We can turn these things around in three to six days or even less.

In the mid-1990s, I was part of a team that developed a “pilot” object-oriented, client-server (remember when these were the hot buzzwords?) application to replenish raw materials for a manufacturing function. We were upending the traditional mainframe world by delivering a solution quickly and iteratively with a small team. When the end users started using the application in real life, it was clear they were going to rely on it to do their jobs every day. Wait, was this a pilot or…? I would come into work in the morning, walk into a special room that housed the application and database servers, check the logs, note any errors, make whatever fixes were needed, re-run jobs, and so on.

It wasn’t long before this work began to interfere with my next project, and the end users became frustrated when I wasn’t available to fix problems quickly. It took us a while and several conversations with operations to determine that “production” didn’t just mean “the mainframe”. “Production” meant that people were relying on the solution on a regular basis to do their jobs. So we backtracked and started talking about what kind of availability guarantees we could make, how backup and recovery should work, how we could transition monitoring and maintenance to operations, and so on. In other words, we realized what we needed was a traditional IT project that just happened to leverage newer technologies.

This same scenario is happening today with Hadoop and related tools. When I visit client organizations, a frightening number of them have at least one serious person saying something like, “I really don’t think ‘data warehousing’ makes sense anymore. It takes too long. We should put all our data in Hadoop and let our end users access whatever they want.” It is indeed a great idea to establish an environment that enables exploration and quick-turnaround analysis against raw data and production data. But to position this approach as a core data and analytics strategy is nothing short of professional malpractice.

The problem is that people are confusing experimentation with IT projects. There is a place for both, and there always has been. Experimentation (or discovery, research, ad-hoc analysis, or whatever term you wish to use) should have lightweight processes and data management practices – it requires prioritization of analysis activity, security and privacy policies and implementation, some understanding of available data, and so on, but it should not be overburdened with the typical rigor required of projects that are building solutions destined for production. Once a prototype is ready to be used on a regular basis for important business functions, that solution should be built through a rigorous IT project leveraging an appropriate – dare I say it – solution development life cycle (SDLC), along with a comprehensive enterprise architecture plan including, yes, a data warehouse that provides integrated, shared, and trusted production data.

An experimental prototype should never be “promoted” to a production environment. That’s what a project is for. Experimentation can be accomplished with Hadoop, relational technology, Microsoft Office, and many other technologies. These same technologies can also be used for production solutions. So, it’s not that “things are done differently and more quickly in Hadoop”. Instead, it’s more appropriate to say that experimentation is different from an IT project, regardless of technology.

Yes, we should do everything we can to reduce unnecessary paperwork and to speed up delivery using proper objective setting, scoping, and agile development techniques. But that is different from abandoning rigor altogether. In fact, using newer technologies in IT projects requires more attention to detail, not less, because we have to take the maturity of the technology into consideration. Can it meet the service level needs of a particular solution? This needs to be asked and examined formally within the project.

Attempting to build production solutions using ad-hoc, experimental data preparation and analysis techniques is like building a modern skyscraper with a grass hut mentality. It just doesn’t make any sense.

Guest Blogger Kevin Lewis is responsible for Teradata’s Strategy and Governance practice. Prior to joining Teradata in 2007, he was responsible for initiating and leading enterprise data management at Publix Super Markets. Since joining Teradata, he has advised dozens of clients in all major industries.