Data Contracts to Prevent Downstream Errors

Key Points

  • The speaker's son, a new data engineer, learned that his downstream users were missing critical data; the problem originated in an upstream system, not with his own team.
  • The speaker recommends using **data contracts**—formal agreements between data producers and consumers—to improve documentation, data quality, and service‑level agreements.
  • Implementing data contracts helps lower AI costs by preventing “garbage‑in‑garbage‑out” scenarios and reducing the need for frequent model retraining.
  • The **Open Data Contract Standard**, backed by the Linux Foundation, defines eight sections (demographics, dataset & schema, quality rules, pricing, stakeholders, security, SLA, and custom properties) to structure these agreements.
  • Applying such contracts would have given the speaker's son clear quality rules and SLAs, preventing the issue and ensuring downstream users received the data they needed.
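
As a rough illustration of the eight sections listed above, a contract could be modeled as a small Python structure. This is only a sketch: the section names come from the talk, but every field name and value inside them (`sales_orders`, `max_delay_hours`, and so on) is a hypothetical placeholder and not the official Open Data Contract Standard schema.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class DataContract:
    """Eight sections as named in the talk; inner keys are hypothetical."""

    demographics: dict[str, Any]            # name, version, descriptive info
    dataset_and_schema: dict[str, str]      # column name -> type
    quality_rules: list[str] = field(default_factory=list)
    pricing: dict[str, Any] = field(default_factory=dict)   # experimental, per the talk
    stakeholders: list[str] = field(default_factory=list)
    security: dict[str, Any] = field(default_factory=dict)
    sla: dict[str, Any] = field(default_factory=dict)
    custom_properties: dict[str, Any] = field(default_factory=dict)

    def missing_sections(self) -> list[str]:
        """Return the names of sections that were left empty."""
        return [name for name, value in vars(self).items() if not value]


# Hypothetical example contract with only some sections filled in.
contract = DataContract(
    demographics={"name": "sales_orders", "version": "1.0.0"},
    dataset_and_schema={"order_id": "string", "amount": "decimal"},
    quality_rules=["order_id is never null"],
    sla={"max_delay_hours": 24},
)

print(contract.missing_sections())
# ['pricing', 'stakeholders', 'security', 'custom_properties']
```

A check like `missing_sections()` gives producer and consumer a quick way to see which parts of the agreement are still undefined before anyone depends on the data.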

Full Transcript

**Source:** [https://www.youtube.com/watch?v=-n3OD-ml_k0](https://www.youtube.com/watch?v=-n3OD-ml_k0)
**Duration:** 00:03:01

## Sections

- [00:00:00](https://www.youtube.com/watch?v=-n3OD-ml_k0&t=0s) **Upstream Data Contracts Resolve Breakdowns** - The speaker explains that data contracts (agreements between data producers and consumers), structured according to the Open Data Contract Standard, improve documentation, quality, and SLAs, preventing downstream data shortages and reducing AI retraining costs.

## Full Transcript

My son started a new job as a data engineer. The other day, he called me in the middle of the afternoon, a little panicked, and he never really calls. His downstream users were not very happy. The problem: they weren't getting the data they needed for sensitive reports. But little did they know that the issue was not with his team; it was coming from the upstream system.

Does this sound familiar to you? I have heard similar stories many, many, many times. One way to solve this issue is to use data contracts.

So what is a data contract? It's an agreement between a data producer and one or many data consumers, and they share a data contract.

So why do we use a data contract? Because we want better documentation, we want better data quality, and we want better SLAs. And the ultimate goal of that is to really lower the cost of AI: you don't have to retrain your models, and you get better data in your system, so you avoid garbage in, garbage out.

To do that, we follow a standard called the Open Data Contract Standard, and it is a standard backed by the Linux Foundation. It is composed of eight sections. Demographics, which is really your name, version, and detailed information about your data contract. Then you've got your dataset and schema, representing what the data is about, associated with your data quality rules. You've got the pricing section, which currently is experimental, but if you want to share your data within your organization or outside your organization, you can specify rules. You've got stakeholders, where you see how the contract has evolved through the different people involved in the creation and maintenance of the contract. You've got rules for security and access, service-level agreements, and custom properties for future reference and extension.

So, coming back to my son: this would have helped with his problem. By having better data quality rules and better SLAs, he would have prevented the issues and given his customers the data as they expected. And hopefully, next time he calls me, it's just to say, "Hello, Papa."

Thanks for watching. Before you leave, please remember to hit like and subscribe.
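
The quality rules and SLAs mentioned in the talk could be enforced where upstream data is handed to the consuming team. The sketch below is one possible way to do that, not the method described by the speaker: the `CONTRACT` dictionary, its keys, the column names, and the 24-hour limit are all assumptions made for illustration.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical contract fragment: keys and values are illustrative only.
CONTRACT = {
    "required_columns": ["order_id", "amount"],   # quality rule: no nulls
    "sla_max_delay": timedelta(hours=24),         # SLA: data at most 24h old
}


def check_batch(rows, delivered_at, now):
    """Return a list of contract violations for one upstream delivery."""
    violations = []
    # Quality rule: every required column must be present and non-null.
    for i, row in enumerate(rows):
        for column in CONTRACT["required_columns"]:
            if row.get(column) is None:
                violations.append(f"row {i}: missing {column}")
    # SLA: the delivery must not be older than the agreed maximum delay.
    if now - delivered_at > CONTRACT["sla_max_delay"]:
        violations.append("SLA breached: delivery older than 24h")
    return violations


now = datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc)
rows = [
    {"order_id": "A1", "amount": 10.0},
    {"order_id": "A2", "amount": None},   # violates the quality rule
]

print(check_batch(rows, now - timedelta(hours=30), now))
# ['row 1: missing amount', 'SLA breached: delivery older than 24h']
```

Running a check like this on each delivery would surface an upstream problem at the boundary between teams, before downstream report users notice missing data.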