
Enterprise Data Streaming Architecture Overview

Key Points

  • Data is likened to “the new oil,” and harnessing the massive, fast‑moving streams that enterprises generate (e.g., a 737 aircraft produces ~20 TB in an hour) is critical for informed, competitive decision‑making.
  • A streaming architecture consists of three core layers: **origin** (the source of continuous data, often paired with a messaging protocol like MQTT), **processor** (where the data is filtered, analyzed, and contextualized), and **destination** (where the refined data is stored or presented for downstream consumers).
  • The primary advantage of this architecture is minimizing data staleness—delivering value as quickly as possible, often described as “real‑time” streaming, to enable rapid insight and action.
  • The presenter will later provide a deeper technical dive into the underlying mechanisms that power these streaming pipelines.
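The three layers above can be sketched as a minimal pipeline. This is an illustrative stand-in, not the presenter's implementation: the record fields, threshold, and the `pump-7` asset name are all hypothetical.

```python
# Minimal sketch of the three streaming layers: origin -> processor -> destination.
# Field names, threshold, and asset metadata are illustrative only.

def origin():
    """Origin: continuously emits raw telemetry records (a finite sample here)."""
    yield {"ts": 0, "temp_c": 71.0}
    yield {"ts": 1, "temp_c": 72.5}
    yield {"ts": 2, "temp_c": 98.3}   # unusually hot reading

def processor(records, threshold=95.0):
    """Processor: filter to the readings worth keeping, then add context."""
    for rec in records:
        if rec["temp_c"] >= threshold:          # filter out the benign bulk
            rec["asset"] = "pump-7"             # enrich (hypothetical metadata)
            yield rec

def destination(records):
    """Destination: land refined records where downstream consumers read them."""
    return list(records)

stored = destination(processor(origin()))
print(stored)   # only the anomalous, enriched reading survives
```

In a real deployment each layer would be a separate system (a broker, a stream engine, a store); chaining generators just makes the data flow explicit.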

Full Transcript

**Source:** [https://www.youtube.com/watch?v=aBIxpJ1_EyY](https://www.youtube.com/watch?v=aBIxpJ1_EyY)
**Duration:** 00:09:23

## Sections

- [00:00:00](https://www.youtube.com/watch?v=aBIxpJ1_EyY&t=0s) **Streaming Data: The New Oil** - The speaker introduces data streaming concepts, emphasizing the massive, fast‑moving data generated by enterprises (e.g., a 737 plane) and outlines a three‑part streaming architecture.
- [00:04:37](https://www.youtube.com/watch?v=aBIxpJ1_EyY&t=277s) **Enriching and Analyzing Streaming Sensor Data** - The speaker outlines a three‑step process: filtering high‑velocity sensor streams, adding contextual metadata such as asset and location, and then applying machine‑learning techniques to detect patterns.
- [00:07:49](https://www.youtube.com/watch?v=aBIxpJ1_EyY&t=469s) **Horizontal Scaling of Data Processing** - The speaker describes how a processing engine can horizontally expand across multiple compute nodes (adding processing sections, destinations, or receivers) to handle data spikes, keep up with wire speed, and maximize real‑time value.

## Full Transcript
[0:00] All around us, every day, is data, and in some cases this data is moving really fast and carries a lot of information. In 2006 a phrase was coined: data is the new oil. The mathematician behind it really hit the nail on the head, because when you look at an enterprise and all the data it generates and creates, using that data to make better-informed business decisions is absolutely paramount to being a leader and an innovator in your given area of the business. Just to give you an example, a 737 plane creates about 20 terabytes of data in just one hour of use. Now, a lot of that information can be fairly benign, but imagine, if you will, that you were tasked with the problem: we need to leverage all of this information, all of this data, and some of it is just voluminous. How do we do it? Well, let's talk about that.

[1:28] Today we're going to be talking about streaming and data streaming concepts. I'm going to talk first about an architecture for streaming data in an enterprise, and then in a future video we'll do a deeper dive on what really happens under the covers. But first, let's get started. In a streaming architecture there are essentially three areas. First of all, you have an origin, and that's where the data actually comes from. It could be a sensor, it could be a machine itself, it could be anything that produces or emits data, and remember, this data is coming all the time, constantly. Sometimes the origin is paired with a messaging system, for example MQTT technology, that allows that telemetry to get delivered to some other system. So we have an origin. The next thing we have is the processor.
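As the talk notes, a record typically arrives with just a timestamp and rudimentary readings like temperature and pressure. A stdlib-only sketch of such an origin follows; in practice these records would usually arrive over a broker such as MQTT, but the generator here is a dependency-free stand-in, and the sensor id and field names are hypothetical.

```python
import itertools
import random
import time

def sensor_origin(sensor_id, seed=0):
    """Stand-in for an origin: an endless stream of raw telemetry records.
    Real deployments would typically deliver these via a broker (e.g. MQTT)."""
    rng = random.Random(seed)
    for _ in itertools.count():
        yield {
            "sensor": sensor_id,
            "ts": time.time(),                       # just a timestamp...
            "temp_c": 70.0 + rng.gauss(0, 1.5),      # ...and rudimentary readings
            "pressure_kpa": 101.0 + rng.gauss(0, 0.5),
        }

# Ingest: pull a small window from the never-ending stream.
window = list(itertools.islice(sensor_origin("s-42"), 5))
```

Note the stream itself is unbounded; `islice` is only used here so the sketch terminates.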
[2:34] The processor is the place in the overall architecture where we take action and handle the data: in some cases we trim it down, but in many cases we try to understand the story in the data we're given. Lastly is the destination. The destination is where we land the data in this streaming architecture, so that consumers downstream can leverage it at their own pace.

[3:09] The key value point of a streaming architecture is to avoid staleness. If you plot value against time, the value of the data starts high and then falls off quickly. What we do with a streaming architecture is capitalize on that early window: the ability to maximize our value in the lowest amount of time. Many call that real time, but it is essentially streaming.

[3:52] So let's dig in. From an origin perspective, we take that information and deliver it into a system; we'll essentially call this step ingest. We take the data from the origins into our streaming analytics and streaming platform. The next thing is the processor. When you look at what happens in the processor, the most typical things are: first we filter, then we enrich, then we analyze. In these three steps, we have the ability to take all this voluminous data, at the speed it's coming over the wire, and filter it to get rid of the things we're not interested in. Then we add context, such as: where is this data coming from? What machine?
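The filter and enrich steps can be sketched on a small batch of records. The asset registry below is hypothetical; it stands in for whatever system of record maps a sensor id to the asset and location the sensor itself cannot report.

```python
# Filter and enrich steps on raw sensor records; registry contents are illustrative.
ASSET_REGISTRY = {
    "s-42": {"asset": "compressor-3", "site": "plant-east"},
}

def filter_records(records, temp_limit=90.0):
    """Filter: keep only the readings outside the normal band."""
    return [r for r in records if r["temp_c"] > temp_limit]

def enrich(records, registry):
    """Enrich: attach asset and location context the sensor can't send itself."""
    return [{**r, **registry.get(r["sensor"], {})} for r in records]

raw = [
    {"sensor": "s-42", "ts": 1, "temp_c": 71.2},
    {"sensor": "s-42", "ts": 2, "temp_c": 96.8},
]
interesting = enrich(filter_records(raw), ASSET_REGISTRY)
print(interesting)
# [{'sensor': 's-42', 'ts': 2, 'temp_c': 96.8,
#   'asset': 'compressor-3', 'site': 'plant-east'}]
```

Filtering before enriching keeps the expensive context lookup off the records that are about to be discarded anyway.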
[5:10] What location? What is it currently doing in the business? What are the operations? None of that comes from the sensor. Actually, when the record arrives, most of the time it's just going to have a timestamp and some rudimentary readings like temperature and pressure, something of that nature. What we need to add is context: where is this coming from, and what is this sensor actually attached to? It could be a vehicle. It could be your vehicle. When we're looking at this, we need to put it into context.

[5:42] Then the next step is to analyze it. This is where we apply machine learning, potentially traditional AI, maybe generative AI, to look at the readings, whether they be temperature readings or pressure readings, over time and try to find patterns: either they're going up, they're going down, whatever we're interested in. Obviously, if we're looking at costs and money-related things, we want to make decisions when it's optimal for us to make a purchase, so that the cost is down. But if we're operating a machine and the temperature is going up, that might be leading to a failure, so we want to get to it sooner rather than later. Once we analyze this information, we get context about what the data is telling us and what is happening in the data. Then, lastly, we egress that information for somebody else who might be interested in another area of the business.

[6:57] But overall, a streaming architecture contains a way for you to capitalize on maximum value. We're doing this in real time, at wire speed, as the data is coming across. We're ingesting it from the origins.
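The "readings going up over time" analysis the speaker describes can be sketched with a simple moving-average trend check. This is deliberately a stand-in for the ML/AI techniques mentioned in the talk; the function name and window size are illustrative.

```python
def rising_trend(readings, window=3):
    """Analyze: flag a series whose moving average climbs across every window.
    A simple stand-in for the ML/AI pattern detection described in the talk."""
    avgs = [sum(readings[i:i + window]) / window
            for i in range(len(readings) - window + 1)]
    # True only if each windowed average is strictly higher than the last.
    return all(b > a for a, b in zip(avgs, avgs[1:]))

temps = [70.1, 70.4, 71.2, 72.9, 75.0, 78.3]   # a machine steadily heating up
print(rising_trend(temps))   # True -> worth flagging before it becomes a failure
```

Averaging over a window first smooths out single-sample noise, so one jittery reading does not break or fake a trend.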
[7:11] We're processing it: filtering, enriching, and analyzing. And then we're egressing the points of interest. You know, I had a mentor tell me a long time ago about how companies had become data hoarders. This is a system and an architecture that can help you avoid being a data hoarder: instead of keeping hundreds of thousands of records that have the same reading, you persist only the records that have the anomaly, or have the variant, that are points of interest, the ones that could really impact a maintenance decision or an operations decision. So we store the things we're interested in for later use.

[7:49] Now, one last point: how does this scale? What does this look like? In many cases you can have a processing engine where, when we look at an instance of this, we can have multiple engines scaled across different compute, so that we're scaling horizontally to take in that amount of data and keep up with the wire speed. In many cases we're not talking about that level of data; we just need to be able to scale for spikes in data, and then the engine itself will scale out and have n process sections, or n destinations, or n receivers to take in our origin data. Either way, we scale to meet the speed of the data so that we can always keep our eye on the north star, which is maximizing the value in the real time that the data is emitted. Thanks for watching. I hope this was helpful.
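One common way to realize the horizontal scaling described above is to partition the incoming stream so that n engine instances each handle a slice of it. A minimal sketch, assuming hash partitioning by sensor id (the talk does not specify a partitioning scheme, and the record fields are hypothetical):

```python
from collections import defaultdict

def partition(records, n_workers):
    """Hash-partition a stream so n engine instances share the load, while
    all records from one sensor land on the same worker (per-sensor order
    and per-sensor state then stay on a single instance)."""
    shards = defaultdict(list)
    for rec in records:
        shards[hash(rec["sensor"]) % n_workers].append(rec)
    return shards

records = [{"sensor": f"s-{i % 4}", "temp_c": 70 + i} for i in range(12)]
shards = partition(records, n_workers=3)
total = sum(len(v) for v in shards.values())
print(total)   # all 12 records assigned, none duplicated
```

Adding capacity for a spike then means raising `n_workers` (more receivers or process sections); keying by sensor keeps each sensor's history on one worker, which matters for trend analysis like the rising-temperature check.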