Learning Library

Data Products Explained with Grocery Analogy

Key Points

  • Organizations are overwhelmed by data silos, limited access, low data literacy, and trust concerns, which hinder timely, reliable insights for AI and analytics.
  • A data product is a curated bundle of multiple data assets designed to be easily discovered and consumed, similar to a grocery item composed of several ingredients.
  • Core attributes of a data product include multi‑asset composition, reusability across varied use‑cases, and a clearly defined domain (e.g., sales, HR, operations) to aid discoverability.
  • Data products reside in a centralized marketplace or catalog, enabling users to find, access, and trust the right data in a format that’s ready for analysis or model training.
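
To make these attributes concrete, here is a minimal Python sketch of a data product record and a domain-based catalog lookup. The names (`DataProduct`, `Catalog`, `find_by_domain`) and the sample assets are hypothetical, chosen only for illustration; the video does not prescribe any schema or tool.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class DataProduct:
    """Hypothetical record for a curated bundle of data assets."""
    name: str
    domain: str                 # e.g. "sales", "hr", "operations"
    assets: tuple[str, ...]     # the underlying tables, files, or models
    description: str = ""       # the "packaging" a consumer reads first

    def __post_init__(self) -> None:
        # The video describes a data product as built from multiple assets.
        if len(self.assets) < 2:
            raise ValueError("a data product bundles multiple assets")


@dataclass
class Catalog:
    """Hypothetical in-memory marketplace that looks products up by domain."""
    _products: dict[str, DataProduct] = field(default_factory=dict)

    def publish(self, product: DataProduct) -> None:
        self._products[product.name] = product

    def find_by_domain(self, domain: str) -> list[DataProduct]:
        # Case-insensitive match, returned in a stable name order.
        wanted = domain.lower()
        return sorted(
            (p for p in self._products.values() if p.domain.lower() == wanted),
            key=lambda p: p.name,
        )


catalog = Catalog()
catalog.publish(DataProduct("customer_360", "sales", ("crm_accounts", "web_orders")))
catalog.publish(DataProduct("headcount", "hr", ("employees", "org_chart")))

sales_products = [p.name for p in catalog.find_by_domain("Sales")]
print(sales_products)
```

Because the same record can be returned to any consumer who searches its domain, one published product can serve several use cases, which is the reusability point above.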

Full Transcript

**Source:** [https://www.youtube.com/watch?v=7w7_QWPS9L8](https://www.youtube.com/watch?v=7w7_QWPS9L8)
**Duration:** 00:05:54

## Sections

- [00:00:00](https://www.youtube.com/watch?v=7w7_QWPS9L8&t=0s) **Understanding Data Products and Their Challenges** - The speaker introduces data products, outlines their key characteristics with a grocery‑store analogy, and highlights current market pains such as data silos, accessibility issues, and data‑literacy challenges.
- [00:03:14](https://www.youtube.com/watch?v=7w7_QWPS9L8&t=194s) **Data Product Access & Quality Governance** - The speaker likens managing data products to grocery store practices, explaining how access controls protect sensitive information and how quality and lifecycle checks ensure data remains fresh and usable.

## Full Transcript

0:00 Today, we're going to be talking about data products, a trending topic in the data management space. At a high level, we'll cover what data products are, what key characteristics make up a data product, and where these data products live. Throughout this video, we'll use a grocery store shopping analogy to simplify the topics and connect the dots.

0:20 But before we do, I'd like to set the stage with our market perspective. Right now, organizations have vast amounts of data, and there's more demand for AI and data-driven insights than ever before. With that come some complications, or struggles.

0:39 First is data silos: having different data, assets, and databases that are not easily accessible or visible to the rest of the organization. Think of a cookie jar in a particular room of a house. Only people within that room have access to the cookie jar.

0:56 Now, since we're talking about access: consumers really struggle to get access to the right data at the right time, when they need it.

1:08 And once they finally get their hands on the right data, there's an issue with being able to understand it. They spend a lot of time massaging the data into a format that they and the lines of business can understand and actually derive value from. That's data literacy.

1:28 Now, once end consumers find the data they need, get access to it, and have it in a format they can understand, there's a question of the quality of that data. Can I trust this data? Is this the most up-to-date version of the data? What types of transformations happened along the way as this was delivered to me?
And in today's world of generative AI and traditional machine learning, you really need to be able to trust your data to be able to trust the outputs of your machine learning models.

1:56 So, now that we've covered the market landscape and some of the pain points that organizations face today, I'm going to define what a data product is by running through its key characteristics.

2:07 First, multiple assets. A data product is not made up of one asset; it's made up of multiple different assets, just as products in a grocery store are not made up of one ingredient but of multiple ingredients.

2:20 Next, data products are meant to be reusable for multiple different use cases. Just like how you can buy a bag of apples: you can eat one apple, use three of them to make an apple pie, and use the remainder to make applesauce. It's reusable for multiple different use cases.

2:36 Next, data products need to have a defined domain. This helps end users coming into the marketplace find the products they're looking for. A domain could be sales, human resources, or operations, just like how a grocery store has different departments and different aisles.

2:55 Data products are meant to come in user-friendly packaging. This packaging explains to the end user how to use the product, its terms and conditions, and its value. Similarly, when we look at the packaging of a product in a grocery store, we can see its ingredients and its expiration date. It gives us information about what that product is. The same applies to data products.

3:20 Next is access control. When you're working with data products, you're working with a lot of different data and assets.
Some of this data could be client information: social security numbers, credit card numbers, addresses. This is all information that you're going to want to protect and govern. We do that through different levels of access control, so that if an end consumer is trying to grab a particular data product that has PII (personally identifiable information) in it, they have to request that information. It's the same as if I'm in the grocery store and I want to buy some wine: I'll probably have to walk up to a clerk and ask them to unlock the wine cabinet, or I'll have to show my ID as I check out. That's access control. The same applies to data products.

4:07 Next is quality and lifecycle management. When you go into a grocery store, you know the produce on the shelves is not going to be expired. It's going to be FDA approved. The store employees make sure they're taking the expired products off the shelves and ensuring what's there is actually fresh. The same applies to data products within the marketplace. You don't want a bunch of data products that are poor quality or no longer usable.

4:36 So the data producers who create these data products are ultimately responsible for ensuring the quality of this data and for following certain service-level agreements, saying, "Hey, I'll refresh this data product every 30 days." This ensures good lifecycle management of the data product and that the data products inside the marketplace are of high quality and usable.

4:56 Last but not least is delivery channels. You would want to have multiple different delivery channels within your marketplace for a variety of reasons.
First and foremost, different consumers are going to need different delivery mechanisms. Some are going to need to download the data product. Some just need to view it. It's just like customers in a grocery store: some are going to want curbside pickup, others are going to want self-checkout, and some are going to use a company to order groceries online and have them delivered. It's really important to have a variety of delivery mechanisms because, at the end of the day, you're working with a variety of end consumers.

5:33 So, to wrap up: adopting a data product approach helps organizations break down data silos and enables their end users to access high-quality data that they can understand. Ultimately, this unlocks the full potential of an organization's data and enables them to make more informed decisions with better business outcomes.

5:53 Thanks for watching.
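
The access-control and lifecycle points in the transcript can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a description of any particular marketplace: the `ProductListing` class, the `contains_pii` flag, the `approved_users` set, and the sample dates are all hypothetical, and the 30-day default is borrowed from the speaker's refresh example.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import date, timedelta


@dataclass
class ProductListing:
    """Hypothetical marketplace entry; every field name is an assumption."""
    name: str
    contains_pii: bool
    last_refreshed: date
    refresh_every: timedelta = timedelta(days=30)  # from the speaker's example
    approved_users: set[str] = field(default_factory=set)

    def can_access(self, user: str) -> bool:
        # Products without sensitive data are open; PII needs an approved request.
        return not self.contains_pii or user in self.approved_users

    def is_fresh(self, today: date) -> bool:
        # Fresh while the agreed refresh interval has not elapsed.
        return today - self.last_refreshed <= self.refresh_every


listing = ProductListing(
    name="customer_contacts",
    contains_pii=True,
    last_refreshed=date(2024, 1, 1),
)

print(listing.can_access("ana"))            # ana has not requested access yet
listing.approved_users.add("ana")           # a data owner approves her request
print(listing.can_access("ana"))
print(listing.is_fresh(date(2024, 1, 31)))  # 30 days after the last refresh
print(listing.is_fresh(date(2024, 2, 1)))   # 31 days after the last refresh
```

Keeping the refresh interval on the listing itself means a marketplace could flag or hide stale products automatically, much like store staff pulling expired items off the shelf.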