
Data Observability Explained with Train Analogy

Key Points

  • Ryan introduces the IBM Technology Channel video, asks viewers to like, subscribe, and share, and promises a train‑analogy demo to illustrate data pipelines and observability.
  • He outlines the rapid evolution of software engineering over the past 5‑8 years—CI/CD, DevOps, infrastructure‑as‑code, cloud microservices—making observability a standard practice for application performance monitoring (APM).
  • Ryan points out that just as every organization became a software company, today every organization is becoming a data company, leading many software engineers to transition into data‑engineering roles.
  • This shift has created a new “data observability” movement, applying the same monitoring, tracing, and alerting principles from APM to data pipelines to ensure data quality, reliability, and timely issue resolution.

Full Transcript

# Data Observability Explained with Train Analogy

**Source:** [https://www.youtube.com/watch?v=jfg9wBJBtKk](https://www.youtube.com/watch?v=jfg9wBJBtKk)
**Duration:** 00:11:57

## Sections

- [00:00:00](https://www.youtube.com/watch?v=jfg9wBJBtKk&t=0s) **Data Observability Intro with Train Analogy** - Ryan from IBM introduces a video on data observability, encouraging channel interaction, teasing a train-themed illustration of pipelines, and briefly contextualizing the trend within the rise of CI/CD in software engineering.
- [00:03:02](https://www.youtube.com/watch?v=jfg9wBJBtKk&t=182s) **Bridging Software and Data Engineering** - The speaker explains how software engineers are shifting into data-engineering roles, using familiar coding tools to build data pipelines; but unlike application development, data engineering lacks mature observability, making data observability the next essential capability.
- [00:06:04](https://www.youtube.com/watch?v=jfg9wBJBtKk&t=364s) **Observability Challenges in Data Pipelines** - The speaker explains how data and ML pipelines aim to deliver trustworthy data, but frequent failures force engineers to spend half their time on maintenance, highlighting the need for observability, illustrated with a train-movement analogy.
- [00:09:05](https://www.youtube.com/watch?v=jfg9wBJBtKk&t=545s) **Data Pipeline Observability and Lineage** - The speaker explains how monitoring process quality, data quality, and lineage in data pipelines enables rapid detection of issues and alerts downstream users to prevent cascading failures.

## Full Transcript
0:00 Hey, everyone, this is Ryan with IBM. 0:02 Excited you came by the IBM Technology Channel today. 0:04 We're gonna be talking about what is data observability, 0:07 one of the hottest topics in the data space today. 0:10 But before we get there, I want to remind you, please subscribe, like the channel, interact with us in the comments. 0:15 It really helps us produce the videos you want to see from us in the future. 0:20 So go ahead. 0:21 Like it, share it. 0:22 Send it to your Mom and Dad. 0:23 Let's get the word out there.

0:25 All right. 0:26 So before we get into data observability, I want to tease you a little bit. 0:30 I'm going to use a train analogy later on in this demonstration. 0:36 So follow along, especially if you are really excited about trains like I am. I love trains. 0:42 I used to play with trains all the time during Christmas time with my grandpa growing up. 0:46 This is going to be a cool example to show you how we're connecting everything together with data pipelines, data engineering and also observability. 0:53 So we're going to get back to that promise.

0:56 Okay. So let's give a quick history lesson of what's going on in the industry and how we're getting to this data observability moment that we're seeing right now. 1:05 And it really comes down to something comparable to the software engineering growth and explosion that's happened over the last 5 to 8 years, which is that software engineers are basically ruling the world. 1:23 These developers are ruling with frameworks and methodologies like CI and CD. 1:30 They're pioneering things like DevOps. 1:33 They're doing things like infrastructure as code, all these really advanced things in the software development space. 1:42 It's really blown up. 1:43 And these are all table stakes nowadays.
1:45 If you're a software engineer, you're basically doing all these things, and you're doing that while building applications: building applications in the cloud, building microservices. 1:57 So all these things are great. 1:59 And recently, around five years ago, there was a blow-up in the space around observability itself. 2:04 There's another video that we've done at IBM around observability, and I encourage you to check that out. 2:10 And it's really around this idea of application performance monitoring, or APM. 2:18 APM is really about being able to detect problems and performance issues in your application so developers can be alerted right away and go resolve something. 2:27 Whether a server was down or you had an application hiccup in production, you know right away through the trace logs and issues exactly where to go to fix that, right? 2:38 And that's awesome that they've got all these tools kind of helping them out.

2:42 Well, what's going on now is this: it used to be that every company was becoming a software company. 2:47 Every company is now a software company. 2:48 Well, now every company is becoming a data company. 2:52 And what's going on is that a lot of these software engineering folks, or developers or engineers, are actually moving to become data engineers. 3:02 And so they're using a lot of the same skill sets: code-heavy skill sets like Python, and methodologies like DevOps, all these things that they're used to as software engineers. 3:12 Now, becoming data engineers, they take the same things we're doing around continuous delivery for application development into data development within the organization today.
3:26 So where software engineers really care about the applications that they're building, like cloud microservices and applications out there, what data engineers are actually really focused on is their data pipelines. 3:40 These data pipelines are the things that are moving the data from the source all the way to the end state of the consumer. 3:46 And they're in charge of around 80 to 90% of all the data flow within their organization today. 3:52 But the problem is this: whereas software engineers have APM, application performance monitoring, as an observability tool to help them find things right away, alert them, resolve them, data engineering really isn't there yet. 4:05 And this is where observability comes into play. 4:09 Data observability is the next step for data engineers to operationalize any incident detection that they have within their data pipelines.

4:19 So let's walk through, and we're going to use the train analogy as a pipeline. 4:24 So I view this train as an actual pipeline that's going to be moving down the tracks here. 4:29 But this is what engineers will do. 4:31 They are building pipelines, they're orchestrating the data from the source data all the way to their end consumer. 4:39 And they're going to have hundreds to thousands of different data pipelines that they're using. 4:43 Most commonly, they're using open source tools like Apache Airflow to take the data and move it. 4:49 And just to illustrate real quick, this becomes a complicated problem, because what you're doing is taking all different types of sources that you maybe can and cannot control, from third-party applications, from, you know, web server APIs, things like that...
5:08 You're taking these things, and then we're expected to funnel them into storage areas like a data warehouse, 5:17 or something that's unstructured, like a data lake.

5:23 So, data engineers. 5:24 Again, they're in charge of moving this data so that eventually, when you get the data into the right state, it can power the data products that you're building today. 5:36 And those can be things like business analytics: finance, marketing, sales, having predictive analytics around, you know, the success of their business and making decisions off of that data. 5:47 That's one use case. 5:48 Another use case is actually building data products themselves: building applications that are mobile or web or have a high volume of transactions associated with them. 5:58 Like, for example, a sports betting company. 6:02 That would be an example of building a data product. 6:04 ML and deep learning pipelines are another one: using these pipelines to really drive and take the business to the next step in their AI journey for trustworthy data. 6:15 So with all this, at the end, the goal, what they're really trying to do, is to deliver the data in a trustworthy way to eventually get it to their consumers.

6:28 Now, this would be great if everything worked well in these pipelines, right? 6:33 But I hate to break it to you, that's not how things work. 6:36 We also know that data engineers are spending around 50% of their time actually maintaining these pipelines, because things break, and things are constantly holding them back from building these really cool, data-driven products in their organization today. 6:53 So where does observability come in? 6:55 Well, it all goes back to being able to monitor and observe a data pipeline.
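The source-to-warehouse flow described above can be sketched as a tiny pipeline in plain Python. The function names and inline records are hypothetical; in practice an orchestrator like Apache Airflow would schedule each of these steps as a task in a DAG rather than chaining plain function calls.

```python
# Minimal sketch of a data pipeline: source -> transform -> consumer.
# All names and data are illustrative, not from the video.

def extract():
    # Pull raw records from a source we may or may not control
    # (hypothetical inline data standing in for an API or file).
    return [{"id": 1, "amount": "19.99"}, {"id": 2, "amount": "5.00"}]

def transform(rows):
    # Normalize types before loading into the warehouse/lake.
    return [{"id": r["id"], "amount": float(r["amount"])} for r in rows]

def load(rows):
    # Deliver to the end consumer (here, just report the final state).
    return {"loaded": len(rows), "total": sum(r["amount"] for r in rows)}

def run_pipeline():
    # One pipeline run, end to end; an engineer may own thousands of these.
    return load(transform(extract()))
```

A real deployment would wrap each step with retries, scheduling, and, as the video argues, observability hooks.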
7:00 So let's walk through and examine examples of the problems these data engineering teams will face, with this train analogy that I wrote up here. 7:07 So the first one is this: if you're on a train and you're sitting there waiting for it to move, the first thing you're going to ask is, is the train moving or not? 7:17 You want to understand right away: hey, is the train moving? 7:19 And that relates back to a pipeline. 7:21 Is the pipeline actually operational? 7:23 Is the pipeline executing, or is it failing, or is it halted? 7:27 Is it stalled? If the data pipeline is not moving correctly, or not moving at all, this data cannot get to the end consumer.

7:35 The next question is, well, how fast is this train going? 7:38 You know, is it going at 90%, 80%, 20%? 7:43 Did it take an hour when it should've taken 5 minutes? Did it take 10 hours when it should've taken one hour? 7:48 This is a problem. 7:49 It's a problem if we don't know exactly when the data is going to be operational and running in orchestration with the pipelines. 7:59 But also, if we have a data SLA, the data needs to get there at a certain time. 8:03 Well, we've got to know that; we've got to know when that's going to be a problem. 8:06 Right. So that's the first part.

8:08 The second part is around the cargo on the train. 8:11 So all trains, hopefully, are carrying cargo. 8:13 Maybe they've got cars, maybe they've got computers, whatever it is. 8:17 Well, the next question is: okay, our pipeline is running really well, but what's going on with the actual data sets there? 8:24 That's basically the cargo on that pipeline. 8:27 This is what gets into understanding if there is anything going on with the data at the data set level.
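The "is the train moving, and how fast" questions above amount to process-quality checks on a pipeline run. A minimal sketch, assuming made-up run states and illustrative duration/SLA thresholds (none of these come from the video):

```python
from datetime import timedelta

# Sketch of process-quality monitoring for one pipeline run.
# States, expected duration, and SLA are hypothetical assumptions.

def check_run(state, duration, expected=timedelta(minutes=5),
              sla=timedelta(hours=1)):
    """Return a list of alert messages for a single pipeline run."""
    alerts = []
    if state in ("failed", "stalled"):
        # "Is the train moving?" - the pipeline is not operational.
        alerts.append(f"pipeline is not moving: state={state}")
    elif duration > sla:
        # The data cannot reach the consumer by the agreed time.
        alerts.append("data SLA missed")
    elif duration > 2 * expected:
        # "How fast is the train going?" - running far slower than usual.
        alerts.append("pipeline running far slower than expected")
    return alerts
```

A data observability tool would evaluate checks like these automatically on every run and route the alerts to the data engineering team.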
8:33 So for example, there could have been a schema change that we had no idea about. 8:39 Say we were expecting ten columns and we got 11, or we were expecting ten columns and we got nine. 8:44 That's a problem, because that data is going to impact something downstream that's going to consume that data set. 8:50 We also could have seen a null record come through. 8:54 Or let's say there was a really high-value piece of data that we expected to come through every single day, and it didn't happen. 9:00 That's a problem. 9:01 Or the data values actually changed; if these were all supposed to be ones and they came through as X's, that's a problem. 9:05 It's going to corrupt the data downstream, the data we're moving in these data pipelines.

9:11 So the first one is really around the process quality. 9:13 The second one is around the data quality within the actual pipeline. 9:17 And the next one is around the lineage. 9:21 If you are on a train, the train is going to go somewhere. 9:25 Eventually it's going to drop off the cargo; it's going to go on different tracks. 9:27 It may hook up to different trains; if you're going from Georgia to New York City, maybe there's even a stop in Delaware, as an example. 9:35 This really gets into data lineage, which is how things are connected to dependent pipelines and data sets downstream. 9:46 So for example, say we know the pipeline is working well, but then we find that there's a problem in the actual data within the pipeline. 9:55 The next question is: how does this problem here impact something downstream, another data set that's consuming from another pipeline? 10:05 So we want to know right away, if this fails, that it's going to impact this person down here, so we can let them know. 10:12 And this is what that observability is all about.
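The dataset-level "cargo" checks described above, a surprise schema change or a null in a high-value field, can be sketched like this. The expected column set and field names are hypothetical, chosen only to illustrate the two failure modes:

```python
# Sketch of data-quality ("cargo") checks on a data set.
# EXPECTED_COLUMNS and the field names are illustrative assumptions.

EXPECTED_COLUMNS = {"id", "customer", "amount"}

def check_dataset(rows):
    """Return a list of data-quality issues found in the rows."""
    issues = []
    for i, row in enumerate(rows):
        cols = set(row)
        if cols != EXPECTED_COLUMNS:
            # Schema drift: expected ten columns, got eleven (or nine).
            issues.append(f"row {i}: schema drift {sorted(cols ^ EXPECTED_COLUMNS)}")
        if row.get("amount") is None:
            # A null in a high-value field we expect every day.
            issues.append(f"row {i}: null amount")
    return issues
```

Catching these at the data set level is what prevents a corrupted value from silently flowing into every downstream consumer.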
10:15 What we're doing is operationalizing the incident detection. 10:20 So whenever we see a problem at the source, the warehouse, or even downstream at the product level, we want to be able to alert up to the data engineering team, so they can be notified right away when a problem occurs, fix it, and prevent it from impacting downstream consumers and ultimately having big, costly impacts on the business. 10:44 And so really, the tenets of data observability are to detect earlier, catching those incidents right away at the source; resolve faster, knowing exactly where they are and how they affect other people downstream; and then ultimately deliver that trust in the data to hit your expected data SLAs for your end consumers.

11:03 So let's wrap it up. 11:05 We talked a little bit about what's going on in the industry, with a lot of software engineers moving into data and data engineering. 11:10 They need the tools to help them operationalize any incidents that occur within their data pipelines. 11:17 We gave a simplistic example of a very complex workflow that engineers are constantly dealing with, which is pumping source data all the way to their end consumers, and dealing with tons of different tools in between that could introduce a bad issue or a dirty-data problem downstream to the consumers. 11:37 And we also had a little fun talking about how this connects to a train analogy.

11:42 So I really hope you enjoyed this video about what is data observability, and check back with the channel for more videos in the future. 11:49 Thanks, everyone. 11:50 If you liked this video and want to see more like it, please like and subscribe. 11:55 If you have any questions, please drop them in the comments below.
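The lineage idea from the transcript, knowing which downstream data sets a failure will impact so their owners can be alerted, can be sketched as a walk over a dependency graph. The graph contents and node names here are made up purely for illustration:

```python
from collections import deque

# Sketch of lineage-based impact analysis: given a dependency graph of
# pipelines and data sets, find everything downstream of a failing node.
# The LINEAGE graph is a hypothetical example, not from the video.

LINEAGE = {
    "orders_source": ["orders_clean"],
    "orders_clean": ["sales_dashboard", "ml_features"],
    "ml_features": ["churn_model"],
}

def downstream_of(node, graph=LINEAGE):
    """Breadth-first walk returning every node downstream of `node`."""
    impacted, queue = set(), deque(graph.get(node, []))
    while queue:
        current = queue.popleft()
        if current not in impacted:
            impacted.add(current)
            queue.extend(graph.get(current, []))
    return impacted
```

With this set in hand, an observability tool can notify the owners of each impacted data product before the bad data cascades to them.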