# Decoupling Cloud Applications with Kafka

**Source:** [https://www.youtube.com/watch?v=aj9CDZm0Glc](https://www.youtube.com/watch?v=aj9CDZm0Glc)

**Duration:** 00:09:16

## Summary

- Real‑time experiences in modern cloud apps are delivered by Apache Kafka, an open‑source distributed streaming platform that continuously produces and consumes data streams.
- Kafka’s clustered architecture provides high throughput, ordered record handling, strong data accuracy, replication, and fault‑tolerance, ensuring low‑latency performance at scale.
- Traditional monolithic integrations (e.g., checkout → shipment) become cumbersome as applications grow, creating tight inter‑service dependencies and slowing development.
- By streaming events (e.g., checkout events) that any service can subscribe to, Kafka cleanly decouples system components, eliminating complex point‑to‑point integrations.
- This event‑driven, message‑based approach works both for retrofitting existing applications and for building cloud‑native services from the ground up.

## Sections

- [00:00:00](https://www.youtube.com/watch?v=aj9CDZm0Glc&t=0s) **Apache Kafka Enables Real‑Time Streaming** - Whitney Lee describes how Kafka’s distributed, fault‑tolerant, ordered streaming architecture delivers low‑latency, high‑volume data processing for modern cloud applications, contrasting it with legacy integration approaches.
- [00:03:17](https://www.youtube.com/watch?v=aj9CDZm0Glc&t=197s) **Kafka Use Cases: Decoupling, Tracking, Analytics** - The speaker explains how Kafka streams events such as checkout actions, ride‑share driver locations, and user activity, allowing independent services to subscribe and react, thereby illustrating decoupled architecture, real‑time tracking, and analytics collection.
- [00:06:34](https://www.youtube.com/watch?v=aj9CDZm0Glc&t=394s) **Kafka APIs Overview and Integration** - The passage outlines Kafka’s consumer, streams, and connector APIs, explaining how persistent topics are consumed, transformed in real time, and how reusable connectors simplify integrating external data sources.

## Full Transcript
Users of modern day cloud applications expect a real-time experience.
How is this achieved?
My name is Whitney Lee, I'm a cloud developer here at IBM.
Apache Kafka is an open source, distributed streaming platform
that allows for the development of real-time event-driven applications.
Specifically, it allows developers to make applications that continuously produce
and consume streams of data records.
Now, Kafka is distributed.
It runs as a cluster that can span multiple servers or even multiple data centers.
The records that are produced are replicated and partitioned in such a way
that allows for a high volume of users to use the application simultaneously
without any perceptible lag in performance.
So, with that, Apache Kafka is super fast.
It also maintains a very high level of accuracy with the data records,
and Apache Kafka maintains the order of their occurrence,
and, finally, because it's replicated,
Apache Kafka is also resilient and fault-tolerant.
So, these characteristics all together add up to an extremely powerful platform.
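To make the partitioning and ordering ideas concrete, here is a minimal in-memory sketch. It is a toy model, not Kafka's client API: the partition count, the `partition_for` helper, and the sample records are all assumptions made for illustration, and `zlib.crc32` is only a stand-in for whatever hash a real client uses.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 3  # assumed value, chosen only for this illustration


def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a record key to a partition with a stable hash.

    crc32 is a stand-in; it is not the hash that Kafka's own clients use.
    """
    return zlib.crc32(key.encode("utf-8")) % num_partitions


# Each partition is an append-only list, so records keep their arrival order.
partitions = defaultdict(list)

records = [
    ("user-1", "added item"),
    ("user-2", "added item"),
    ("user-1", "checked out"),
    ("user-2", "removed item"),
]
for key, value in records:
    partitions[partition_for(key)].append((key, value))

# Every record for a given key lands in the same partition, in produced order.
user1_partition = partitions[partition_for("user-1")]
user1_events = [value for key, value in user1_partition if key == "user-1"]
print(user1_events)
```

Spreading keys across partitions is what lets the work be shared across servers, while keeping one key in one partition preserves the order of that key's records.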
Let's talk about some use-cases for this.
Or, actually, before we do, let's talk about how applications used to be made
before event streaming was on the scene.
If the developer wanted to make a retail application, for example,
they might make a checkout,
and then, with that checkout, when it happens, they want it to trigger a shipment.
So, a user checks out and then the order gets shipped.
They need to write an integration for that to happen,
consider the shape of the data,
the way the data is transported, and the format of the data,
but it's only one integration, so it's not a huge deal.
But, as the application grows, maybe we want to add
an automated email receipt when a checkout happens,
or maybe we want to add an update to the inventory
when a checkout happens.
As front and back end services get added, and the application grows,
more and more integrations need to get built and it can get very messy.
Not only that, but the teams in charge of each of the services
are now reliant upon each other before they can make any changes
and development is slow.
So, one great use case for Apache Kafka is decoupling system dependencies.
So, with Apache Kafka, all the hard integrations go away
and, instead, what we do is the checkout will stream events.
So, every time a checkout happens, that will get streamed,
and the checkout is not concerned with who's listening to that stream.
It's broadcasting those events.
Then the other services - email, shipment, inventory,
they subscribe to that stream, they choose to listen to that one,
and then they get the information they need and it triggers them to act accordingly.
So, this is how Kafka can decouple your system dependencies
and it is also a good use-case for how Kafka can be used for messaging.
So, even if this application was built from the ground up
as a cloud-native application, it could still be built in this way,
and use messaging to move the checkout experience along.
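The decoupling described above can be sketched with a small in-memory model. This is not the Kafka API: the `Topic` class, the handler functions, and the event fields are hypothetical, and the sketch pushes events to callbacks for brevity, whereas real Kafka consumers pull records from the broker.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Topic:
    """A toy in-memory topic: an append-only list plus subscriber callbacks."""

    name: str
    events: List[dict] = field(default_factory=list)
    subscribers: List[Callable[[dict], None]] = field(default_factory=list)

    def subscribe(self, handler: Callable[[dict], None]) -> None:
        self.subscribers.append(handler)

    def publish(self, event: dict) -> None:
        # The publisher only appends and notifies; it knows nothing about
        # what each subscriber does with the event.
        self.events.append(event)
        for handler in self.subscribers:
            handler(event)


sent_emails: List[str] = []
shipments: List[int] = []
inventory: Dict[str, int] = {"sku-42": 10}


def send_receipt(event: dict) -> None:
    sent_emails.append(event["email"])


def create_shipment(event: dict) -> None:
    shipments.append(event["order_id"])


def update_inventory(event: dict) -> None:
    inventory[event["sku"]] -= event["quantity"]


checkout_topic = Topic("checkout")
for service in (send_receipt, create_shipment, update_inventory):
    checkout_topic.subscribe(service)

# The checkout service emits one event; three services react independently.
checkout_topic.publish(
    {"order_id": 1, "email": "a@example.com", "sku": "sku-42", "quantity": 2}
)
print(sent_emails, shipments, inventory)
```

Adding a fourth service here means writing one more handler and subscribing it; the checkout code does not change.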
Another use case for Apache Kafka could be location tracking.
An example of this might be a ride share service.
So, a driver in a ride share service using the application
would turn on their app and maybe every, let's say, every second
a new event would get emitted with their current location.
This can be used by the application on a smaller scale,
say, to let an individual user know how close their particular ride is
or on a large scale, to calculate surge pricing
or to show a user a map before they choose which ride they want.
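As a rough sketch of the small-scale case, the snippet below folds a stream of location events into each driver's latest position and then picks the driver closest to a rider. The event fields, the flat x/y coordinates, and the `nearest_driver` helper are assumptions for illustration; a real service would use geographic coordinates and a proper distance calculation.

```python
import math
from typing import Dict, Tuple

# Hypothetical location events, in the order they were emitted.
location_events = [
    {"driver": "d1", "x": 0.0, "y": 0.0},
    {"driver": "d2", "x": 5.0, "y": 5.0},
    {"driver": "d1", "x": 1.0, "y": 1.0},
    {"driver": "d2", "x": 4.0, "y": 4.0},
]

# Later events overwrite earlier ones, leaving each driver's latest position.
latest: Dict[str, Tuple[float, float]] = {}
for event in location_events:
    latest[event["driver"]] = (event["x"], event["y"])


def nearest_driver(rider: Tuple[float, float],
                   positions: Dict[str, Tuple[float, float]]) -> str:
    """Return the driver whose latest position is closest to the rider."""
    return min(positions, key=lambda d: math.dist(rider, positions[d]))


rider = (3.0, 3.0)
closest = nearest_driver(rider, latest)
print(closest)
```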
Another way to use Apache Kafka, another use-case
would be data gathering.
This can be used
in a simple way just to collect analytics, to optimize your website,
or it can be used in a more complex way
with a music streaming service, for example.
Where one user, every song they listen to can be a stream of records,
and your application could use that stream
to give real-time recommendations to that user.
Or, it can take the data records from all the users,
aggregate them, and then come up with a list of an artist's top songs.
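The aggregation across all users might look like the following sketch. The event shape and the `top_songs` helper are hypothetical; the point is only that a stream of listen events can be reduced to a ranked list.

```python
from collections import Counter
from typing import List, Tuple

# Hypothetical listen events from several users.
listen_events = [
    {"user": "u1", "artist": "A", "song": "s1"},
    {"user": "u2", "artist": "A", "song": "s2"},
    {"user": "u1", "artist": "A", "song": "s1"},
    {"user": "u3", "artist": "B", "song": "s9"},
    {"user": "u3", "artist": "A", "song": "s1"},
]


def top_songs(events: List[dict], artist: str, n: int = 2) -> List[Tuple[str, int]]:
    """Count plays per song for one artist and return the n most played."""
    counts = Counter(e["song"] for e in events if e["artist"] == artist)
    return counts.most_common(n)


artist_a_top = top_songs(listen_events, "A")
print(artist_a_top)
```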
So, this is in no way exhaustive,
but these are some very interesting use-cases
to show how powerful Kafka is and the things that you can do with it,
but let's give an overview of how Kafka works.
Kafka is built on four core APIs.
The first one is the "producer" API.
The producer API
allows your application to produce, to make, these streams of data.
So, it creates the records and produces them to topics.
A "topic" is an ordered list of events.
Now the topic can persist to disk -
that's where it can be saved for just a matter of minutes if it's going to be consumed immediately
or you can have it saved for hours, days, or even forever.
As long as you have enough storage space, the topics are persisted to physical storage.
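Time-based retention can be pictured with a toy append-only log. The `RetainedLog` class and its timestamps are assumptions for illustration; a real broker manages retention through its own configuration and on-disk files rather than record by record in memory.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class RetainedLog:
    """Toy append-only log that drops records older than a retention window."""

    retention_seconds: float
    records: List[Tuple[float, str]] = field(default_factory=list)

    def append(self, timestamp: float, value: str) -> None:
        self.records.append((timestamp, value))

    def expire(self, now: float) -> None:
        # Keep only records whose timestamp falls inside the window.
        cutoff = now - self.retention_seconds
        self.records = [(ts, v) for ts, v in self.records if ts >= cutoff]


log = RetainedLog(retention_seconds=60)
log.append(0, "a")
log.append(30, "b")
log.append(90, "c")

log.expire(now=100)  # cutoff is 40, so "a" and "b" are dropped
remaining = [value for _, value in log.records]
print(remaining)
```

A very large retention window models the "keep forever" case, limited only by available storage.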
Then we have the consumer API.
The consumer API subscribes to one or more topics
and listens and ingests that data.
It can subscribe to topics in real time
or it can consume those old data records that are saved to the topic.
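The difference between listening in real time and replaying saved records comes down to where a consumer starts reading. The `ToyConsumer` class below is a hypothetical in-memory model of that idea, not the Kafka consumer API.

```python
from typing import List


class ToyConsumer:
    """Reads from a shared list and remembers its own position (offset)."""

    def __init__(self, log: List[str], start: str = "earliest") -> None:
        self.log = log
        # "earliest" replays everything saved; "latest" sees only new records.
        self.offset = 0 if start == "earliest" else len(log)

    def poll(self) -> List[str]:
        new_records = self.log[self.offset:]
        self.offset = len(self.log)
        return new_records


topic_log = ["e0", "e1", "e2"]  # records already saved to the topic

replayer = ToyConsumer(topic_log, start="earliest")
live = ToyConsumer(topic_log, start="latest")

topic_log.append("e3")  # a new record arrives after both consumers exist

replayed = replayer.poll()
fresh = live.poll()
print(replayed, fresh)
```

Because each consumer tracks its own offset, reading does not remove records, and two consumers can be at different points in the same topic.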
Now producers can produce directly to consumers
and that works for a simple Kafka application where the data doesn't change,
but to transform that data, what we need is the streams API.
The streams API is very powerful.
It leverages the producer and the consumer APIs.
So, it will consume from a topic or topics
and then it will analyze, aggregate, or otherwise transform the data
in real time, and then produce the resulting streams
to a topic - either the same topics or to new topics.
This is really at the core of what makes Kafka so amazing, and what powers
the more complex use-cases like the location tracking or the data gathering.
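The consume, transform, produce loop can be sketched in a few lines. This is a toy model using plain lists as topics, not the Kafka Streams library; `process_stream` and the event fields are hypothetical names chosen for the example.

```python
from collections import defaultdict
from typing import Dict, List


def process_stream(input_topic: List[dict], output_topic: List[dict]) -> List[dict]:
    """Consume checkout events and produce a running total per customer."""
    totals: Dict[str, float] = defaultdict(float)
    for event in input_topic:
        customer = event["customer"]
        totals[customer] += event["amount"]
        # Each input record yields one transformed record on the output topic.
        output_topic.append({"customer": customer, "running_total": totals[customer]})
    return output_topic


checkouts = [
    {"customer": "c1", "amount": 10.0},
    {"customer": "c2", "amount": 5.0},
    {"customer": "c1", "amount": 2.5},
]

totals_topic = process_stream(checkouts, [])
print(totals_topic[-1])
```

The output is itself a stream, so another processor or consumer could subscribe to it in turn.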
Finally, we have the connector API.
The connector API enables developers to write connectors,
which are reusable producers and consumers.
So, in a Kafka cluster many developers
might need to integrate the same type of data source,
like a MongoDB, for example.
Well, not every single developer should have to write that integration,
what the connector API allows
is for that integration to get written once, the code is there,
and then all the developer needs to do is configure it
in order to get that data source into their cluster.
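The "write once, configure many times" idea might be sketched as follows. Everything here is hypothetical: `ToySourceConnector`, the config keys, and the fake in-memory database stand in for a real connector and data source, and none of these names come from Kafka itself.

```python
from typing import Callable, Dict, Iterable, List


class ToySourceConnector:
    """Reusable integration: written once, then driven purely by config."""

    def __init__(self, read_source: Callable[[Dict[str, str]], Iterable[dict]]) -> None:
        self._read_source = read_source

    def run(self, config: Dict[str, str], topics: Dict[str, List[dict]]) -> int:
        # Copy every row from the configured source into the configured topic.
        destination = topics.setdefault(config["topic"], [])
        count = 0
        for row in self._read_source(config):
            destination.append(row)
            count += 1
        return count


# Stand-in for an external database; a real connector would query a server.
FAKE_DATABASE = {
    "orders": [{"id": 1}, {"id": 2}],
    "users": [{"id": 7}],
}


def read_fake_collection(config: Dict[str, str]) -> Iterable[dict]:
    return FAKE_DATABASE[config["collection"]]


connector = ToySourceConnector(read_fake_collection)
topics: Dict[str, List[dict]] = {}

# Two teams reuse the same connector code with different configuration.
orders_copied = connector.run({"collection": "orders", "topic": "db.orders"}, topics)
users_copied = connector.run({"collection": "users", "topic": "db.users"}, topics)
print(orders_copied, users_copied)
```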
So, modern day cloud application users expect a real-time experience
and Kafka is what's behind that technology.
Thank you! If you have questions please drop us a line below.
If you want to see more videos like this in the future
please like and subscribe
and don't forget:
you can grow your skills and earn a badge with IBM Cloud Labs
which are free, browser-based interactive Kubernetes labs.