# Distinguishing Observability, Monitoring, and APM

**Source:** [https://www.youtube.com/watch?v=CAQ_a2-9UOI](https://www.youtube.com/watch?v=CAQ_a2-9UOI)
**Duration:** 00:09:40

## Sections

- [00:00:00](https://www.youtube.com/watch?v=CAQ_a2-9UOI&t=0s) **Differentiating Observability, Monitoring, APM** - The speaker explains how observability, monitoring, and APM differ by illustrating their roles using a legacy Java EE application example.
- [00:03:40](https://www.youtube.com/watch?v=CAQ_a2-9UOI&t=220s) **Collect, Monitor, Analyze Observability Workflow** - The speaker outlines a three-step observability approach: collecting Kubernetes metrics and logs, visualizing them through dashboards for monitoring, and analyzing the data to troubleshoot application bugs.
- [00:07:02](https://www.youtube.com/watch?v=CAQ_a2-9UOI&t=422s) **Automation, Context, and Action in Observability** - The speaker emphasizes the need for automated context provisioning during upgrades, tracing issues back to source, and a structured analyze-and-fix cycle using modern observability tools.

## Full Transcript
These days I hear the terms Observability, Monitoring, and APM,
or Application Performance Management, thrown around seemingly interchangeably,
but these terms actually mean quite different things. So let's dive in head first and see an
example of how exactly these things differ. So to start I'm going to start with kind of a Java EE
application, it's kind of old school, we'll go back you know maybe a decade. And let's say
that we've got some components in this Java EE app that actually power it. So something important to
remember here: although we might be using an SOA, or service-oriented architecture, this is not exactly
microservices. So they're not communicating over REST APIs. So you have some inherent advantages
here. For example, you can take advantage of the Java EE framework to
output log files which will probably all come out into the same directory and the timestamps
match up so things are good. In addition, you could take advantage of something like an APM
solution, which is kind of a one-size-fits-all, set-and-forget tool: you install it and it'll
kind of get rich analytics and data and metrics about the running services within the application.
So essentially what we've done is we've made our system observable so that you know our
Ops teams were then able to kind of look into it and identify problems and figure out you
know if anything needed to be done. So for the business objectives back then this was essentially
good enough, but this tends to fall apart very quickly when you start to move to a more cloud
native approach where you have multiple runtimes and multiple kind of layers to the architecture.
So let's say we have an example app here. So we'll say we'll start with Node as a front end. Let's
say we also have a Java backend application. And then finally let's say we also have a Python app
which is doing some data processing. So let's see how these things work with each other so
the front-end app probably talks to the Java app and also the Python app for some data processing.
The Java app probably communicates with a database and then the Python app probably talks to the Java
app for kind of CRUD operations. So this is kind of my quick sketch, kind of a dummy layout for a
microservices based application. You can take it a step further and even say that this is all running
within Kubernetes. So we've got these container-based applications running in a cluster.
So immediately the first problem I can see here is that with multiple runtimes
we now have to think about multiple different agents or ways to collect data.
So instead of just one APM tool we might have to start thinking about pulling in multiple,
so how would we consolidate all that data? That's a challenge.
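
The layout sketched above can be written down as data, which makes the "multiple runtimes" problem concrete. This is a minimal sketch: the service names (`frontend`, `backend`, `processor`, `database`) are hypothetical labels, while the runtimes and call edges follow the description in the talk.

```python
# Hypothetical service names; runtimes and call edges follow the sketch in the talk.
SERVICES = {
    "frontend": {"runtime": "node", "calls": ["backend", "processor"]},
    "backend": {"runtime": "java", "calls": ["database"]},
    "processor": {"runtime": "python", "calls": ["backend"]},
}


def runtimes_to_instrument(services):
    """Return the distinct runtimes; each one needs its own agent or collector."""
    return sorted({spec["runtime"] for spec in services.values()})


def callers_of(services, target):
    """Return the services that call `target` directly."""
    return sorted(name for name, spec in services.items() if target in spec["calls"])


if __name__ == "__main__":
    print(runtimes_to_instrument(SERVICES))  # ['java', 'node', 'python']
    print(callers_of(SERVICES, "backend"))   # ['frontend', 'processor']
```

Three services already mean three runtimes to instrument, which is the consolidation challenge the speaker is pointing at.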
In addition, let's think about things like logging. So each of these runtimes is probably
outputting logs in a different place, and you know, we have to figure out how we consolidate
all those. Maybe we use a log streaming service. Regardless you can see the complexity starts to
grow. And finally, as you add more services and microservices components to this architecture,
say a user comes in and tries to actually access one of these services and they run into an error:
you need to trace that request through the multiple services. Well, unless you have the
right architecture and infrastructure in place, you know, something like headers on requests,
maybe a way to handle web sockets, things are going to start to get messy and you can see how
the technical complexity grows quite large. So here's where Observability comes in and actually
differentiates itself from kind of standard APM tools. It thinks about the more holistic,
cloud-native approach for being able to do things like logging and monitoring and that
kind of thing. So I'll say there's three major steps for any sort of Observability solution.
We'll start with the first one we'll call it collect, because we need to collect data.
Then we'll go to monitor, and we'll talk about this because this is you know part of monitoring.
And finally we'll end with analyze, kind of doing something with the actual data that you have. So
with the collect step, first thing, let's say that we actually made our system observable.
So the great thing is with Kubernetes you get some CPU and memory data automatically. So let's say
we get some of that, we get some logs from the application all streaming to the same location
and let's say we even get some other stuff like high availability numbers or average latency,
you know things that we want to be able to track and monitor.
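
As an illustration of logs "all streaming to the same location", here is a minimal sketch that merges per-service log lines into one timeline. It assumes, purely for illustration, that every service writes lines as an ISO 8601 timestamp, a space, and a message; real runtimes usually need their formats normalized first. The service names and log messages are made up.

```python
import heapq
from datetime import datetime


def parse_line(service, line):
    """Split an 'ISO-timestamp message' line into (timestamp, service, message)."""
    stamp, message = line.split(" ", 1)
    return (datetime.fromisoformat(stamp), service, message)


def merge_logs(streams):
    """Merge per-service log lines (each already in time order) into one timeline."""
    parsed = [
        [parse_line(service, line) for line in lines]
        for service, lines in streams.items()
    ]
    # heapq.merge lazily interleaves the sorted inputs by tuple order (timestamp first).
    return list(heapq.merge(*parsed))


# Made-up sample data for three hypothetical services.
SAMPLE_STREAMS = {
    "frontend": [
        "2021-03-01T10:00:00+00:00 GET /report",
        "2021-03-01T10:00:03+00:00 returned 500 to user",
    ],
    "backend": ["2021-03-01T10:00:01+00:00 query started"],
    "processor": ["2021-03-01T10:00:02+00:00 job failed"],
}

if __name__ == "__main__":
    for stamp, service, message in merge_logs(SAMPLE_STREAMS):
        print(stamp.isoformat(), service, message)
```

Once the lines share one timeline, the failure in one service can be read next to the error the user saw in another.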
So that brings me to my next step. So once we have this data available
we need to be able to actually do something with it, at least visualizing it, even if we're not
actually solving problems yet. What do we do with this data? Well, maybe we create
some dashboards to be able to monitor the health of our application, and say we create
multiple dashboards to be able to track different services or kind of different business objectives,
high availability versus latency, that kind of thing. Now the final thing that I want to talk
about here is: what do we do next? So say we found some bug in the application by kind of looking
at our monitoring dashboards and we need to dive in deeper and fix the problem with the Node app.
Well the great thing about that is an Observability solution should allow you to do
just that, it allows you to actually take it even a step further because these days with Kubernetes
you're getting a lot of that information from the Kubernetes layer. So this is something I want to
quickly pause and talk about. So with APM tools in the past, they were really focused on
things like resource constraints: CPU usage, memory usage, that kind of thing. These days that's been
offloaded to the Kubernetes layer, so you know Observability kind of took APM and evolved it
to the next stage, pulled it a step up and enables our users to focus on things like
SLOs and SLIs, Service Level Objectives and Service Level Indicators.
So these will enable you to actually focus on things that matter to your business.
So things like making sure that latencies are low or that application uptime is
high. So I think that's kind of the crucial three steps for any sort of observability solution.
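
To make the SLI and SLO distinction concrete: an SLI is a measured value (for example availability or average latency), and an SLO is the target that value is held to. The sketch below computes two SLIs from request records and checks them against targets. The `Request` type, the sample data, and the target numbers are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Request:
    latency_ms: float
    ok: bool


def availability(requests):
    """SLI: fraction of requests that succeeded (assumes a non-empty list)."""
    return sum(r.ok for r in requests) / len(requests)


def average_latency_ms(requests):
    """SLI: mean request latency in milliseconds (assumes a non-empty list)."""
    return sum(r.latency_ms for r in requests) / len(requests)


# Hypothetical targets, chosen only for illustration.
SLOS = {"availability": 0.99, "average_latency_ms": 200.0}


def check_slos(requests, slos=SLOS):
    """Return a dict mapping each SLO name to whether the measured SLI meets it."""
    return {
        "availability": availability(requests) >= slos["availability"],
        "average_latency_ms": average_latency_ms(requests) <= slos["average_latency_ms"],
    }


# Made-up sample: 98 fast successes and 2 slow failures.
SAMPLE = [Request(100.0, True)] * 98 + [Request(1200.0, False)] * 2

if __name__ == "__main__":
    print(check_slos(SAMPLE))
```

In this sample the latency objective is met while the availability objective is missed, which is the kind of business-level signal the speaker contrasts with raw CPU and memory numbers.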
Let's take a step back again. These things can be hard to set up on your own
with open source projects and capabilities pulling all the different things together,
so you might be looking at an Enterprise Observability Solution and so when you're
comparing competitors and looking at building out your enterprise observability capability
I would look at kind of three main things. Now let's start with automation.
Now every step of the way we need to make sure that automation is there to make things easier
so let's say that our dev team pushes out a new version of the Node app, going from v1 to v2.
Now let's say they inadvertently introduced a bug. Instead of making a bulk API call they now
make individual API calls to the Python app. So in our monitoring dashboard our Ops team's like,
"Oh guys, something's wrong, the DB app is getting a lot of requests, what's going on?" Well, you need to
be able to kind of automatically go back and trace through the requests and identify what happened.
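
The v1 to v2 scenario can be simulated to show how a propagated trace ID lets you walk back from "too many downstream requests" to the caller that caused them. This is only a sketch: no real network calls are made, the header name `X-Trace-Id` and the in-memory `SPANS` list are hypothetical stand-ins for a real tracing setup, and the service names are made up.

```python
import uuid
from collections import Counter

TRACE_HEADER = "X-Trace-Id"  # hypothetical header name
SPANS = []  # (trace_id, caller, callee) records; stands in for a tracing backend


def with_trace(headers):
    """Reuse the incoming trace ID, or start a new trace at the edge."""
    out = dict(headers)
    out.setdefault(TRACE_HEADER, uuid.uuid4().hex)
    return out


def call(caller, callee, headers):
    """Record one simulated service-to-service call; return the headers to forward."""
    headers = with_trace(headers)
    SPANS.append((headers[TRACE_HEADER], caller, callee))
    return headers


def frontend_v1(items):
    """v1: all items go to the processor in one bulk call."""
    headers = call("frontend", "processor", {})
    call("processor", "backend", headers)
    return headers[TRACE_HEADER]


def frontend_v2(items):
    """v2 (the bug): one call per item, each fanning out downstream."""
    headers = with_trace({})
    for _ in items:
        call("frontend", "processor", headers)
        call("processor", "backend", headers)
    return headers[TRACE_HEADER]


def calls_per_edge(trace_id):
    """Count calls on each caller-to-callee edge within a single trace."""
    return Counter((caller, callee) for tid, caller, callee in SPANS if tid == trace_id)


if __name__ == "__main__":
    items = list(range(50))
    print(calls_per_edge(frontend_v1(items)))
    print(calls_per_edge(frontend_v2(items)))
```

Because every span carries the same trace ID, the jump from one downstream call per user request to one per item is attributable to the front end rather than to the service that shows the load.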
That actually brings me to my second point as well, which is context. It's always important,
I can spell, to have that context. So automation is important here because when upgrading to the
new version of Node you want to make sure that the right agent is automatically installed and kind
of the instrumentation is in place so your dev team doesn't quite have to do that, and
as new services get added you want your monitoring dashboards to be automatically updated as well.
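
One way to picture dashboards that update themselves is to derive the panels from the list of discovered services instead of maintaining them by hand. The sketch below assumes a made-up panel format (plain dictionaries) and hypothetical metric names; it is not the configuration format of any particular dashboard tool.

```python
def build_panels(services, metrics=("latency_ms", "error_rate")):
    """Return one panel definition per (service, metric) pair."""
    return [
        {"title": f"{service}: {metric}", "service": service, "metric": metric}
        for service in sorted(services)
        for metric in metrics
    ]


if __name__ == "__main__":
    discovered = {"frontend", "backend", "processor"}
    print(len(build_panels(discovered)))

    # A newly deployed (hypothetical) service gets panels on the next regeneration.
    discovered.add("billing")
    print(len(build_panels(discovered)))
```

Regenerating the panels whenever the service list changes keeps the dashboards in step with deployments without asking the dev team to touch them.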
And that context is extremely crucial as with this example we needed to be able to trace that
request back to the source of the problem. So once we've traced that request back to the source with
that context that we have, the third step here, and I think probably one of the most important,
is action. What do we actually do now? And that brings me to my last step here, the analyze phase,
which remember we talked about was kind of an evolution of traditional
APM tools to kind of the way that Observability tools implement that today.
So when you get to this step you'll probably want to look at maybe the SLIs within the Node app.
Maybe dive in deeper, right. So maybe you look in and you identify that you need to look at
application trace logs. So you look in the trace logs and you identify some problems and you figure
out what the fix is, you tell it to your dev team, and you know, maybe the last step here is fix,
and then rinse and repeat for any other issues that might come up in the future.
So I think Enterprise Observability is extremely crucial here when we're kind of looking at
the bigger picture because it's not just about having the individual pieces,
which again like I said might be quite hard to set up with purely open source approaches,
but you want to think about automation to make sure things are kind of set up seamlessly to
reduce the overhead on your side. Make sure you have context to be able to see how services work
with each other, maybe even generate things like dependency graphs to see the broader view, because
you might not always have a light board like this to see the architecture so cleanly. And
finally being able to take action when you do find a problem. So making sure that your Observability
solution has a way to automatically pull together data from multiple sources, multiple services,
and then figure out what's valid and necessary for you to be able to make that fix happen. So
IBM is invested in making sure our clients can effectively set up Enterprise Observability
with the recent acquisition of Instana. To learn more about the acquisition,
or to get a showcase of the capabilities be sure to check out the links in the description below.
As always thanks for watching our videos. If you liked the video or have any questions or comments,
be sure to drop a like and a question or comment below. Be sure to subscribe and
stay tuned for more videos like this in the future. Thank you.