Distinguishing Observability, Monitoring, and APM

9m • devops • tutorial • intermediate

Sections

00:00:00 Differentiating Observability, Monitoring, APM - The speaker explains how observability, monitoring, and APM differ by illustrating their roles using a legacy Java EE application example.
00:03:40 Collect, Monitor, Analyze Observability Workflow - The speaker outlines a three‑step observability approach—collecting Kubernetes metrics and logs, visualizing them through dashboards for monitoring, and analyzing the data to troubleshoot application bugs.
00:07:02 Automation, Context, and Action in Observability - The speaker emphasizes the need for automated context provisioning during upgrades, tracing issues back to source, and a structured analyze‑and‑fix cycle using modern observability tools.

Full Transcript

Source: https://www.youtube.com/watch?v=CAQ_a2-9UOI • Duration: 00:09:40

These days I hear the terms observability, monitoring, and APM (application performance management) thrown around seemingly interchangeably, but they actually mean quite different things. So let's dive in head first and look at an example of exactly how they differ.
To start, I'm going to take a Java EE application. It's kind of old school; we'll go back maybe a decade. Let's say this Java EE app has some components that actually power it. Something important to remember here: although we might be using SOA, or service-oriented architecture, this is not exactly microservices, so the components aren't communicating over REST APIs. That gives you some inherent advantages. For example, you can use the Java EE framework to output log files, which will probably all land in the same directory with matching timestamps, so things are good. In addition, you could use an APM solution, which is a kind of one-size-fits-all, set-and-forget tool: you install it, and it gathers rich analytics, data, and metrics about the running services within the application.
So essentially what we've done is make our system observable, so that our Ops teams were able to look into it, identify problems, and figure out whether anything needed to be done. For the business objectives back then, this was good enough. But it tends to fall apart very quickly when you move to a more cloud-native approach, where you have multiple runtimes and multiple layers in the architecture.
So let's say we have an example app here. We'll start with Node as a front end. Let's say we also have a Java back-end application, and finally a Python app that does some data processing.
So let's see how these things work with each other. The front-end app probably talks to the Java app, and also to the Python app for some data processing. The Java app probably communicates with a database, and the Python app probably talks to the Java app for CRUD operations. This is my quick sketch, a dummy layout for a microservices-based application. You can take it a step further and say that this is all running within Kubernetes, so we've got these container-based applications running in a cluster.
Immediately, the first problem I can see is that with multiple runtimes, we now have to think about multiple different agents or ways to collect data. Instead of just one APM tool, we might have to start pulling in several, so how do we consolidate all that data? That's a challenge.
In addition, think about logging. Each of these runtimes is probably outputting logs to a different place, and we have to figure out how to consolidate all of them. Maybe we use a log streaming service. Regardless, you can see the complexity starting to grow.
And finally, as you add more services and microservices components to this architecture, say a user comes in, tries to access one of these services, and runs into an error. You need to trace that request through the multiple services. Unless you have the right infrastructure in place, such as headers on requests or a way to handle WebSockets, things are going to get messy, and you can see how the technical complexity grows quite large.
So here's where observability comes in and differentiates itself from standard APM tools. It takes a more holistic, cloud-native approach to things like logging and monitoring.
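To make "headers on requests" and log consolidation concrete, here is a minimal Python sketch of trace-context propagation. It assumes the W3C Trace Context `traceparent` header format (`version-traceid-parentid-flags`); the video does not name a specific standard. The three services are stand-in Python functions with hypothetical names (`node_frontend`, `python_data`, `java_backend`), not real HTTP handlers. Each hop keeps the trace ID, mints a new span ID, and writes a JSON log line tagged with that trace ID, so logs from different runtimes can later be joined in one place.

```python
import json
import re
import secrets
import time

# W3C Trace Context "traceparent": version-traceid-parentid-flags (lowercase hex).
TRACEPARENT_RE = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def new_traceparent() -> str:
    """Start a new trace at the edge of the system."""
    trace_id = secrets.token_hex(16)  # 16 random bytes -> 32 hex chars
    span_id = secrets.token_hex(8)    # 8 random bytes -> 16 hex chars
    return f"00-{trace_id}-{span_id}-01"  # flags 01 = sampled


def child_traceparent(incoming: str) -> str:
    """Keep the caller's trace ID and flags, but mint a new span ID for this hop."""
    match = TRACEPARENT_RE.match(incoming)
    if match is None:
        return new_traceparent()  # missing or malformed header: start a fresh trace
    trace_id, _caller_span_id, flags = match.groups()
    return f"00-{trace_id}-{secrets.token_hex(8)}-{flags}"


def log_event(service: str, traceparent: str, message: str) -> str:
    """Build one JSON log line tagged with the trace ID so logs can be joined later."""
    trace_id = traceparent.split("-")[1]
    return json.dumps({"ts": time.time(), "service": service,
                       "trace_id": trace_id, "msg": message})


# Hypothetical stand-ins for the three services; `headers` mimics HTTP request headers.
def java_backend(headers: dict, logs: list) -> None:
    logs.append(log_event("java-backend", headers["traceparent"], "CRUD read"))


def python_data(headers: dict, logs: list) -> None:
    logs.append(log_event("python-data", headers["traceparent"], "processing data"))
    # The outgoing call to the Java app forwards the trace in the header.
    java_backend({"traceparent": child_traceparent(headers["traceparent"])}, logs)


def node_frontend(logs: list) -> str:
    traceparent = new_traceparent()
    logs.append(log_event("node-frontend", traceparent, "user request"))
    python_data({"traceparent": child_traceparent(traceparent)}, logs)
    return traceparent


if __name__ == "__main__":
    logs: list = []
    root = node_frontend(logs)
    for line in logs:
        print(line)
    # Every line shares the root trace ID, whichever service wrote it.
    print({json.loads(line)["trace_id"] for line in logs} == {root.split("-")[1]})
```

In a real deployment an instrumentation library (for example an OpenTelemetry SDK) would normally create and forward these headers for you; the hand-rolled version only shows what travels between services.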
So I'll say there are three major steps for any sort of observability solution. We'll start with the first one, which we'll call collect, because we need to collect data. Then we'll go to monitor, which is where monitoring itself fits in. And finally we'll end with analyze: actually doing something with the data you have.
With the collect step, let's say that we've actually made our system observable. The great thing is that with Kubernetes you get some CPU and memory data automatically. So let's say we get some of that, we get logs from the applications all streaming to the same location, and we even get other things like high-availability numbers or average latency: things we want to be able to track and monitor.
That brings me to my next step. Once we have this data available, we need to actually do something with it, or at least visualize it, even if we're not solving problems yet. So maybe we create some dashboards to monitor the health of our application, and say we create multiple dashboards to track different services or different business objectives, such as high availability versus latency.
Now the final thing I want to talk about here is what we do next. Say we found a bug in the application by looking at our monitoring dashboards, and we need to dive in deeper and fix the problem with the Node app. An observability solution should allow you to do just that. It actually takes it a step further, because these days with Kubernetes you're getting a lot of that information from the Kubernetes layer. So this is something I want to quickly pause and talk about.
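The collect and monitor steps can be sketched in a few lines: raw request records come in, and per-service numbers like the average latency and availability mentioned here come out, ready to plot on a dashboard. This is a minimal Python illustration under assumed definitions: a request counts toward availability when its hypothetical `ok` field is true, and the 99% availability and 250 ms latency health thresholds are arbitrary. It is not how any particular product computes these numbers.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass
class RequestRecord:
    service: str
    latency_ms: float
    ok: bool  # assumption: False means the request failed (e.g. an HTTP 5xx)


def summarize(records: list) -> dict:
    """Group raw request records by service and compute dashboard numbers."""
    by_service = defaultdict(list)
    for record in records:
        by_service[record.service].append(record)
    return {
        service: {
            "requests": len(rs),
            "avg_latency_ms": mean(r.latency_ms for r in rs),
            "availability": sum(r.ok for r in rs) / len(rs),
        }
        for service, rs in by_service.items()
    }


def unhealthy(summary: dict, min_availability: float = 0.99,
              max_avg_latency_ms: float = 250.0) -> list:
    """Services breaching either (arbitrary, illustrative) health threshold."""
    return sorted(
        service for service, stats in summary.items()
        if stats["availability"] < min_availability
        or stats["avg_latency_ms"] > max_avg_latency_ms
    )


records = [
    RequestRecord("node-frontend", 120.0, True),
    RequestRecord("node-frontend", 80.0, True),
    RequestRecord("python-data", 300.0, True),
    RequestRecord("python-data", 500.0, False),
]
summary = summarize(records)
print(summary["python-data"])  # {'requests': 2, 'avg_latency_ms': 400.0, 'availability': 0.5}
print(unhealthy(summary))      # ['python-data']
```

Keeping the summary per service mirrors the idea of separate dashboards per service or per business objective; a dashboard tool would typically compute the same aggregates over sliding time windows rather than a fixed list.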
With APM tools in the past, the focus was really on resource constraints: CPU usage, memory usage, that kind of thing. These days that has been offloaded to the Kubernetes layer. So observability took APM and evolved it to the next stage, pulling it a step up and enabling users to focus on things like SLOs and SLIs, service level objectives and service level indicators. These let you focus on things that matter to your business, like making sure that latencies are low or that application uptime is high. So I think those are the three crucial steps for any sort of observability solution.
Let's take a step back again. These things can be hard to set up on your own, pulling together all the different open source projects and capabilities, so you might be looking at an enterprise observability solution. When you're comparing competitors and building out your enterprise observability capability, I would look at three main things. Let's start with automation.
Every step of the way, we need to make sure automation is there to make things easier. Let's say our dev team pushes out a new version of the Node app, going from v1 to v2, and they inadvertently introduce a bug: instead of making a bulk API call, they now make individual API calls to the Python app. In our monitoring dashboard, our Ops team says, "Something's wrong, the DB app is getting a lot of requests. What's going on?" You need to be able to automatically go back, trace through the requests, and identify what happened.
That brings me to my second point, which is context. It's always important to have that context.
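The v1-to-v2 regression described here, one bulk call replaced by many individual calls, is the kind of pattern trace data can surface automatically. Below is a minimal Python sketch, assuming each span carries a trace ID, span ID, parent ID, and service name; `Span`, `call_edges`, `fan_out_suspects`, and `make_trace` are hypothetical names, and the threshold of five calls per request is an arbitrary choice. It counts caller-to-callee calls within each trace and flags pairs that fan out unusually often.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Span:
    trace_id: str
    span_id: str
    parent_id: Optional[str]  # None for the root span of a request
    service: str


def call_edges(spans: list) -> Counter:
    """Count (trace, caller, callee) calls by resolving each span's parent."""
    by_id = {(s.trace_id, s.span_id): s for s in spans}
    edges: Counter = Counter()
    for span in spans:
        if span.parent_id is None:
            continue
        parent = by_id.get((span.trace_id, span.parent_id))
        if parent is not None and parent.service != span.service:
            edges[(span.trace_id, parent.service, span.service)] += 1
    return edges


def fan_out_suspects(spans: list, max_calls_per_trace: int = 5) -> dict:
    """Flag caller->callee pairs called more than the threshold within one request."""
    suspects: dict = {}
    for (_trace_id, caller, callee), count in call_edges(spans).items():
        if count > max_calls_per_trace:
            suspects[(caller, callee)] = max(suspects.get((caller, callee), 0), count)
    return suspects


def make_trace(trace_id: str, items: int, bulk: bool) -> list:
    """Hypothetical trace: the Node front end asks the Python app about `items` records."""
    spans = [Span(trace_id, "root", None, "node-frontend")]
    calls = 1 if bulk else items  # v1: one bulk call; v2: one call per item
    spans += [Span(trace_id, f"py-{i}", "root", "python-data") for i in range(calls)]
    return spans


v1 = make_trace("trace-v1", items=50, bulk=True)
v2 = make_trace("trace-v2", items=50, bulk=False)
print(fan_out_suspects(v1))  # {}
print(fan_out_suspects(v2))  # {('node-frontend', 'python-data'): 50}
```

Summing the same caller-to-callee counts across all traces, and dropping the trace ID, yields a rough service dependency graph, which is one way to get the broader architectural view the speaker describes later.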
Automation is important here because, when upgrading to the new version of Node, you want to make sure the right agent is automatically installed and the instrumentation is in place, so your dev team doesn't have to do that. And as new services get added, you want your monitoring dashboards to be updated automatically as well. Context is extremely crucial because, as in this example, we needed to trace that request back to the source of the problem.
Once we've traced that request back to the source with the context we have, the third thing here, and I think probably one of the most important, is action. What do we actually do now? That brings me to the last step, the analyze phase, which, remember, we described as an evolution from traditional APM tools to the way observability tools implement it today. When you get to this step, you'll probably want to look at the SLIs within the Node app and maybe dive in deeper. Maybe you identify that you need to look at application trace logs. So you look in the trace logs, identify some problems, figure out what the fix is, and tell your dev team. The last step here is fix, and then rinse and repeat for any other issues that come up in the future.
So I think enterprise observability is extremely crucial when we look at the bigger picture, because it's not just about having the individual pieces, which, like I said, can be quite hard to set up with a purely open source approach. You want to think about automation, to make sure things are set up seamlessly and reduce the overhead on your side.
You also want to make sure you have context, to see how services work with each other, maybe even by generating dependency graphs for the broader view, because you might not always have a light board like this to see the architecture so cleanly. And finally, you want to be able to take action when you do find a problem. So make sure your observability solution has a way to automatically pull together data from multiple sources and multiple services, and then figure out what's valid and necessary for you to make that fix happen.
IBM has invested in making sure our clients can effectively set up enterprise observability with the recent acquisition of Instana. To learn more about the acquisition, or to see a showcase of the capabilities, be sure to check out the links in the description below.
As always, thanks for watching our videos. If you liked the video or have any questions or comments, be sure to drop a like and a question or comment below. Be sure to subscribe and stay tuned for more videos like this in the future. Thank you.