Enterprise Data Warehouse Overview

Key Points

  • Luv Aggarwal (IBM Data Platform Solution Engineer) explains that an enterprise data warehouse (EDW) is a purpose‑specific, organized collection of clean business data, distinct from a data lake’s raw dump and a data mart’s domain‑specific subset.
  • The EDW serves as the organization’s single source of truth, ingesting diverse raw data from transactional systems, relational databases, CRMs, ERPs, supply‑chain feeds, etc., and converting it into high‑quality, analytics‑ready data via ETL processes.
  • Once loaded, the warehouse enables business analysts, data scientists, and data engineers to perform reporting, BI, predictive analytics, and machine learning using built‑in tools or external platforms.
  • The talk outlines three common deployment models for data warehouses (on‑premises, cloud‑based, and hybrid), each offering different trade‑offs for scalability, control, and cost.
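
The ETL flow summarized in the key points can be sketched in a few lines of Python. This is only an illustration, not the tooling discussed in the video: an in-memory SQLite database stands in for the warehouse, and the sample rows, the `dim_customer` table, and all function names are hypothetical.

```python
import sqlite3

# Hypothetical raw rows, as they might arrive from a CRM export (made-up data).
RAW_CRM_ROWS = [
    {"customer_id": "101", "name": "  Ada Lovelace ", "region": "emea", "revenue": "1200.50"},
    {"customer_id": "102", "name": "Grace Hopper", "region": "AMER", "revenue": "980"},
    {"customer_id": "", "name": "Unknown", "region": "amer", "revenue": "15"},  # missing key
    {"customer_id": "101", "name": "Ada Lovelace", "region": "EMEA", "revenue": "1200.50"},  # duplicate
]


def extract():
    """Extract: a real pipeline would read from source systems here."""
    return list(RAW_CRM_ROWS)


def transform(rows):
    """Transform: drop rows without a key, normalize fields, remove duplicates."""
    cleaned = {}
    for row in rows:
        if not row["customer_id"]:
            continue  # skip records that cannot be identified
        key = int(row["customer_id"])
        cleaned[key] = (key, row["name"].strip(), row["region"].upper(), float(row["revenue"]))
    return sorted(cleaned.values())


def load(conn, rows):
    """Load: write the cleaned rows into the (stand-in) warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS dim_customer ("
        "customer_id INTEGER PRIMARY KEY, name TEXT, region TEXT, revenue REAL)"
    )
    conn.executemany("INSERT OR REPLACE INTO dim_customer VALUES (?, ?, ?, ?)", rows)
    conn.commit()


def run_pipeline():
    """Run extract, transform, and load against an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    load(conn, transform(extract()))
    return conn


def revenue_by_region(conn):
    """Example analytics query an analyst might run once the data is loaded."""
    return conn.execute(
        "SELECT region, SUM(revenue) FROM dim_customer GROUP BY region ORDER BY region"
    ).fetchall()
```

Calling `revenue_by_region(run_pipeline())` returns one total per region, computed only from the cleaned, deduplicated rows. That is the point of transforming before loading.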

Full Transcript

# Enterprise Data Warehouse Overview

**Source:** [https://www.youtube.com/watch?v=k4tK2ttdSDg](https://www.youtube.com/watch?v=k4tK2ttdSDg)
**Duration:** 00:08:20

## Sections

- [00:00:00](https://www.youtube.com/watch?v=k4tK2ttdSDg&t=0s) **Enterprise Data Warehouse Basics** - Luv Aggarwal explains the distinction between data lakes, warehouses, and marts, emphasizing that a data warehouse is a purpose‑specific, organized collection serving as an organization’s single source of truth.
- [00:03:32](https://www.youtube.com/watch?v=k4tK2ttdSDg&t=212s) **On-Premise Data Warehouse Options** - The speaker outlines three on‑premises deployment styles (commodity hardware using MPP or SMP, and purpose‑built appliances), highlighting their architectures, benefits like control and performance, and the upfront cost trade‑off.
- [00:06:43](https://www.youtube.com/watch?v=k4tK2ttdSDg&t=403s) **Hybrid On-Prem and Cloud Data Warehousing** - The segment explains how combining on‑premises and cloud data warehouses lets enterprises leverage cloud‑born data, support disaster‑recovery, and maintain mission‑critical workloads.

## Full Transcript
0:00 Hey, what's up, everyone? My name is Luv Aggarwal and I'm a Data Platform Solution Engineer for IBM.

0:06 Data warehouses. Their prevalence across enterprises has grown significantly over the past 20+ years. But with multiple modern advancements, the numerous options out there are now much more complex.

0:19 So, let's talk about what an enterprise data warehouse, or "EDW", is. So, first and foremost, there's often confusion between "data lakes" and "data warehouses" and even "data marts".

0:46 So, I like to think of a data warehouse as being more purpose-specific than a data lake. So, while a data lake is a great place to dump all sorts of raw, structured and unstructured data in a quick way to clean and organize later, a data warehouse, on the other hand, is a large collection of organized and clean business data, ready to help an organization make decisions.

1:09 And a data mart is like a subset of a data warehouse that's more specific to a particular business domain. So, for example, you could have a finance data mart.

1:19 But for today, let's focus on the data warehouse. So, we'll get rid of data lakes and data marts, and we'll make this a little bit bigger.

1:27 So, the data warehouse serves as the single source of truth for an organization across multiple knowledge domains. And data in the warehouse comes from multiple different source systems and is transformed from raw data to high quality data, optimized for analytics via various different ETL, or "Extract, Transform and Load" tools.

1:58 So, as I mentioned, data that's in our source systems can be in different types. It could be transactional systems, it can be relational databases, and they can cover a wide variety of business domains.
2:12 So, the data could cover things like customer data from our CRMs. We could have sales data. We could have data from our ERP systems. We could even have supply chain data. And the list goes on and on. Right.

2:34 So, once data has been cleaned, transformed and loaded into our data warehouse, it's now ready for us to expose to our users, who can then start to take it and do analytics and machine learning on these data sets.

2:52 So, who are our users? Our users can be folks like business analysts. We can have data scientists. We could even have data engineers. And these folks can now start leveraging these data sets, either using the built-in analytics tools in the data warehouse or using a variety of different business intelligence or predictive analytics and machine learning platforms.

3:34 OK, so now that we know what an enterprise data warehouse is, let's talk about the different ways in which it can be implemented. So, three common ways in which a data warehouse can be deployed.

3:46 The first way is on-premises. Now, a couple different ways in which an on-prem data warehouse can be configured: we could have our data warehouse running on commodity hardware. Now, this could be set up and structured using either MPP, or "Massively Parallel Processing", architecture where we just add more compute nodes as our workload grows, or using SMP, or "Symmetric Multi-Processing", architecture where, typically, we have a tightly coupled, multi-CPU system that shares resources from one common operating system.

4:30 Now, the other way is through a purpose-built appliance format. Now, this is typically an integrated stack of CPU, memory, storage and software, all purpose-built and optimized for a data warehouse workload from a single vendor.

4:51 So, what are some of the benefits of having an on-prem data warehouse?
4:56 Well, first you get to maintain complete control over the entire tech stack, right? Second, you can leverage your local network speeds and perhaps avoid some bandwidth challenges typically associated with the cloud. You can also leverage high availability, and we can maintain strict governance and regulatory compliance. But on the other hand, an on-prem data warehouse does come with an upfront investment and the need for ongoing support and maintenance.

5:33 Now, the other way in which a data warehouse can be deployed is through a cloud-based data warehouse, where our data warehouse is delivered as a managed SaaS offering via the multiple public cloud providers.

5:50 So, moving data warehouses to the cloud is the next frontier for a lot of enterprises and for valid reasons. Some of the benefits include being able to free up resources to focus on other high value analytics tasks, right, instead of just managing systems.

6:10 Another benefit can also be the ability to scale easily. Right, because we don't have to go out and procure new hardware and we get to leverage automatic upgrades. Right. Now, on the other hand, oftentimes a cloud-based data warehouse can take a performance hit due to how it's fine-tuned for that specific workload, and there can be some unanticipated high costs due to how a cloud data warehouse is scaled.

6:44 OK, the third option is actually a hybrid approach. So, this takes the best of on-prem and cloud and brings them together. And a lot of enterprises choose to run both their on-prem and cloud data warehouses in conjunction. And this can be done for a couple of different reasons.

7:05 So, one benefit can be that this allows us to explore new use-cases. Right. So as an enterprise, we may have certain data sources that were born in the cloud.
So, it can be beneficial to start leveraging a cloud data warehouse for analytics against those use-cases while still maintaining their mission-critical workloads on-prem.

7:30 Another benefit can be for a disaster recovery and backup scenario. This is where we would use both our environments in conjunction for DR and backup reasons.

7:44 So, if we take a step back, we can see that we've barely started to scratch the surface of enterprise data warehouses and how they fit into an overall enterprise architecture. But I hope this video has given us a good idea of how data warehouses fit in and what they're used for. Thank you. If you have any questions, please drop us a line below. If you want to see more videos like this in the future, please like and subscribe. And don't forget, if you want to learn more about any of the IBM data solutions we've discussed today, please feel free to check out the link below.
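
The MPP idea from the on-premises segment (adding compute nodes as the workload grows) can be illustrated with a toy sketch. It rests on loud assumptions: threads stand in for separate compute nodes, each "node" owns one partition of the rows, and a coordinator merges the partial results. The function names and the byte-sum partitioning rule are hypothetical and do not come from the video or any particular product.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def partition_rows(rows, node_count):
    """Assign each (region, amount) row to one node by a stable key-based rule."""
    partitions = [[] for _ in range(node_count)]
    for region, amount in rows:
        # Sum of the key's bytes gives a deterministic assignment across runs.
        node = sum(region.encode("utf-8")) % node_count
        partitions[node].append((region, amount))
    return partitions


def local_aggregate(partition):
    """Work done independently on one node: total amounts per region."""
    totals = Counter()
    for region, amount in partition:
        totals[region] += amount
    return totals


def mpp_sum_by_region(rows, node_count):
    """Scatter rows to nodes, aggregate in parallel, then merge the partials."""
    partitions = partition_rows(rows, node_count)
    with ThreadPoolExecutor(max_workers=node_count) as pool:
        partials = list(pool.map(local_aggregate, partitions))
    merged = Counter()
    for partial in partials:
        merged.update(partial)  # Counter.update adds values for matching keys
    return dict(merged)
```

Changing `node_count` changes how the work is split but not the merged answer, which is the property that lets an MPP system grow by adding nodes. An SMP system, by contrast, would run the same aggregation on a single machine whose CPUs share memory and one operating system.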