Solutions around today's analytic ecosystem are too technically driven without focusing on business values. The buzzwords seem to over-compensate the reality of implementation and cost of ownership. I challenge you to view your analytic architecture using pluralism and secularity. Without such a view of this world your resume will fill out nicely but your business values will suffer.
In my previous role, prior to joining Teradata, I was given the task of trying to move "all" of our organization’s BI data to Hadoop. I will share my approach - how best-in-class solutions come naturally when pluralism and secularity are used to support a business-first environment.
Big data has exposed some great insights into what we can, should, and need to do with our data. However, this space is filled with radical opinions and the pressure to "draw a line in the sand" between time-proven methodologies and what we know as "big data." Some may view these spaces moving in opposite directions; however, these spaces will collide. The question is not "if" but "when." What are we doing now to prepare for this inevitability? Hadapt seems to be moving in the right direction in terms of leadership between the two spaces.
I found many of the data sets in relational databases to be lacking in structure, highly transient, and loosely coupled. Data scientists needed to have quick access to data sets to perform their hypothesis testing.
Continuously requesting IT to rerun their ETL processes was highly inefficient. A data scientist once asked me "Why can't we just dump the data in a Linux mount for exploration?" Schema-on-write was too restrictive as the data scientists could not predefine the attributes for the data set for ingestion. As the data sets became more complex and unstructured, the ETL processes became exponentially more complicated and performance was hindered.
I also found during this exercise that my traditional BI analysts were perplexed with formulating questions about the data. One of the reasons was that businesses did not know what questions to ask. This is a common challenge in the big data ecosystem. We are used to knowing our data and being able to come up with incredible questions about it. The BI analyst's world has been disrupted as they now need to ask "What insights/answers do I have about my data?" – (according to IIya Katsov in one of his blogs).
The product owner of Hadoop was convinced that the entire dataset should be hosted on Amazon Web Services (S3) which would allow our analytics (via Elastiv Map Reduce) to perform at incredible speeds. However, due to various ISO guidelines, the data sets had to be encrypted at rest and in transit which degraded performance by approximately 30 percent.
Without an access path model, logical model, or unified model, business users and data scientists were left with little appetite for unified analytics. Data scientists were on their own guidelines for integrated/ federated/governed/liberated post-discovery analytical sets.
Communication with the rest of the organization became an unattainable goal. The models which came out of discovery were not federated across the organization as there was a disconnect between the data scientists, data architects, Hadoop engineers, and data stewards -- who spoke different languages. Data scientists were creating amazing predictive models and at the same time data stewards were looking for tools to help them provide insight in prediction for the SAME DATA.
Using NoSQL for a specific question on a dataset required a new collection set. To maintain and govern the numerous collections became a burden. There had to be a better way to answer many questions without having a linear relationship to the number of collections instantiated. The answer may be within access path modeling.
Another challenge I faced was when users wanted a graphical representation of the data and the embedded relationships or lack thereof. Are they asking for a data model? The users would immediately say no, since they read in a blog somewhere that data modeling is not required using NoSQL technology.
At the end of this entire implementation I found myself needing to integrate these various platforms for the sake of providing a business-first solution. Maybe the line in the sand isn't a business-first approach? Those that drive Pluralism (a condition or system in which two or more states, groups, principles, sources of authority, etc., coexist) and Secularity (not being devoted to a specific technology or data 'religion') within their analytic ecosystem -- can truly deliver a business-first solution approach while avoiding the proverbial "silver bullet" architecture solutions.
In my coming post, I will share some of the practices for access path modeling within Big Data and how it supports pluralism and secularity within a business-first analytic ecosystem.
Sunile Manjee is a Product Manager in Teradata’s Architecture and Modeling Solutions team. Big Data solutions are his specialty, along with the architecture to support a unified data vision. He has over 12 years of IT experience as a Big Data architect, DW architect, application architect, IT team lead, and 3gl/4gl programmer.