This is a guest post with exclusive content by Bill Inmon. Bill Inmon “is an American computer scientist, recognized by many as the father of the data warehouse. Inmon wrote the first book, first magazine column, held the first conference, and was the first to offer classes in data warehousing.” — Wikipedia.
Our critical points:
- Data warehousing requires data integration
- Integration is complex, risky, hard to do, imprecise, and requires research
- A data warehouse's major value is having a foundation of integrated data
Data warehouses are the whack-a-mole of technology. Like the carnival game in which a mole sticks its head up out of a random hole and you take a whack at it, data warehousing just keeps popping up, sometimes in the unlikeliest of places.
This whack-a-mole act is especially impressive because there is no vendor or organization behind data warehousing. Data warehouses are supported solely by the end user. There is no committee, no company, no organization that sits around and makes decisions about any data warehouse. The data warehouse has a life of its own.
So, who tried to kill the data warehouse? Who kept taking swings at the ever-reappearing mole that kept randomly popping out of the hole?
Gunning for Data Warehouses
What’s the problem with the data warehouse? Why did people want to kill the data warehouse? Well, there were a lot of issues. But the primary issue is that people dread data integration. Data warehousing requires data integration. Integration is complex, risky, hard to do, imprecise, and requires research. Integrating data requires using your brain and using elbow grease. Vendors and most IT professionals just hate doing that.
Corporations had huge silos of information that couldn't communicate. These silos were an impediment to analytical processing across the enterprise. The only way to break these silos apart was to integrate the data found in them and place the integrated data into a data warehouse. There simply was no other way.
No ifs, ands, or buts.
But vendors and most IT professionals just didn't have the backbone or the intellect to integrate the siloed data. So, the silos remained, and extracting enterprise-level data continued to be an elusive, unreachable goal.
Vendors would rather walk across a bed of fiery red-hot coals barefoot than go back and integrate data. The problem is that the major value of a data warehouse is in having a foundation of integrated data.
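To make concrete what integrating siloed data involves, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the two silos (`BILLING` and `SUPPORT`), their field names, and the name-matching rule are hypothetical, and real integration requires far more research into what each field actually means.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical rows from two application silos. They describe overlapping
# customers but use different field names, spellings, and units.
BILLING = [
    {"cust_id": "C-001", "name": "ACME CORP", "balance_cents": 125000},
    {"cust_id": "C-002", "name": "Globex", "balance_cents": 4000},
]
SUPPORT = [
    {"customer": "acme corp ", "open_tickets": 3},
    {"customer": "Initech", "open_tickets": 1},
]


@dataclass
class Customer:
    """One integrated, enterprise-level view of a customer."""
    key: str
    name: str
    balance_usd: Optional[float] = None
    open_tickets: Optional[int] = None


def normalize_name(raw: str) -> str:
    """Assumed matching rule: collapse whitespace and ignore case."""
    return " ".join(raw.split()).casefold()


def integrate(billing, support):
    """Merge both silos into one record per customer, keyed by normalized name."""
    customers = {}
    for row in billing:
        key = normalize_name(row["name"])
        customers[key] = Customer(
            key=key,
            name=row["name"].strip(),
            balance_usd=row["balance_cents"] / 100,  # convert cents to dollars
        )
    for row in support:
        key = normalize_name(row["customer"])
        # Reuse the billing record when the names match; otherwise create one.
        cust = customers.setdefault(
            key, Customer(key=key, name=row["customer"].strip())
        )
        cust.open_tickets = row["open_tickets"]
    return customers


if __name__ == "__main__":
    for customer in integrate(BILLING, SUPPORT).values():
        print(customer)
```

Even in this toy case, the hard part is not copying rows but deciding that "ACME CORP" and "acme corp " are the same customer and that cents must become dollars. That decision-making is the research and elbow grease the integration step demands.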
A Murder of Data
There have been several major attempts at exterminating and/or bypassing the data warehouse:
- Dimensional modeling and star joins. Ralph Kimball introduced the idea of a data mart. Ralph stated that you can just build a data mart directly from an application. With the Kimball approach, there was no need for one of those messy, hard-to-build data warehouses.
- ETL (Extract, Transform, Load) changed to ELT (Extract, Load, Transform). The vendors of the world gave us ELT. ELT was a descendant of ETL. The problem with ELT was that you did the E and you did the L, and conveniently forgot to do the T. In doing so, you just copied data from one place to the next. There was no need for a data warehouse with ELT.
- Big Data. Big Data came along and proclaimed that with Big Data you didn’t need a data warehouse. Large mainframe vendors, Cloudera et al., said that with Big Data there was no need for a data warehouse. You could just conveniently store your data in Big Data and that was it.
- Data lake. Data lakes came along and proclaimed that all you needed was a data lake. There was no need to go through all that creepy and complex stuff you have to do with a data warehouse. Just dump all your data in a data lake and that was it.
- Data mesh/mash. Data mesh/data mash came along and said all you needed to do was have some fancy connections of data and, when you did that, there was no need for a data warehouse.
- Data scientists disdained data warehouses. Data scientists learned all these statistical algorithms in school, but when they got into the real world, they spent 95% of their time wrestling with data. Even so, the data scientists thought that data warehouses were beneath them.
- Squeezing data into a data warehouse. A data warehouse was just a bunch of data squeezed together. But no one wanted to get their hands dirty squeezing that data.
Some of these efforts were well funded and well advertised. Other efforts were merely casual. But all of them failed to kill the data warehouse.
In fact, some of these efforts actually added to and bolstered data warehouse architecture.
The Resuscitation of Data Warehouses
People began to realize — data warehouses wouldn't die. Data warehousing was not dead. In fact, people found that adding data marts to a data warehouse was a very good thing to do. Data marts allowed you to customize data and, at the same time, ensure the data's integrity. So, Ralph Kimball’s contribution of data marts and the dimensional model unintentionally added to a data warehouse's value.
Big Data added a dimension of scalability for data warehouses that had not existed before. The data in the data warehouse with a low probability of access fit quite conveniently in Big Data. The people who promulgated Big Data never saw it that way, but that was a positive consequence of Big Data: one more unintentional value-add for data warehouses.
The people who pushed for data lakes inadvertently pushed new kinds of data into the data warehouse. With a data lake, analog, IoT, and textual data all found their way into a data warehouse.
The people who tried to kill data warehouses ended up unintentionally expanding the capabilities and usefulness of data warehouses.
So, here lies the data warehouse — RIP.
Resilient Information Processing, not Rest In Peace.
The data warehouse lives on despite the best efforts to kill or ignore it.
Bill Inmon, the father of the data warehouse, has authored 65 books and was named by Computerworld as one of the ten most influential people in the history of computing. Bill's company, Forest Rim Technology, is a Castle Rock, Colorado company. Bill Inmon and Forest Rim Technology provide a service to companies by helping them hear their customers' voices. See more at www.forestrimtech.com