# Control and Data Plane Architecture for Cloud Databases

**Source:** [https://www.youtube.com/watch?v=Ep1QW-wOmgc](https://www.youtube.com/watch?v=Ep1QW-wOmgc)
**Duration:** 00:14:21

## Summary

- The control plane/data‑plane distinction is a fundamental design principle for scalable cloud services, influencing everything from routers to Kubernetes‑based platforms.
- In a managed database service, user‑facing clients interact with the data plane for read/write operations, while administrative actions (e.g., backups, version upgrades) are handled through a control‑plane API.
- Automation of resource‑intensive tasks, often via Kubernetes operators, executes user intents on the data plane and records actions in a metadata store that supports billing, user management, and other admin functions.
- This separation enhances security and operational isolation, allowing the platform to protect data access while still exposing flexible management capabilities.

## Sections

- [00:00:00](https://www.youtube.com/watch?v=Ep1QW-wOmgc&t=0s) **Control vs Data Plane in Managed Databases** - The speaker explains how control‑plane and data‑plane architectures underpin scalable cloud services, using a managed PostgreSQL offering to illustrate their roles and trade‑offs.
- [00:03:05](https://www.youtube.com/watch?v=Ep1QW-wOmgc&t=185s) **Isolated Metadata for Secure Scalable Services** - The speaker describes how using a dedicated metadata store for backup requests, supporting billing and user management, provides natural network isolation for security and allows independent scaling of control‑plane services versus data‑plane resources in a hosted managed database, a pattern that extends to other cloud platforms like object storage.
- [00:06:30](https://www.youtube.com/watch?v=Ep1QW-wOmgc&t=390s) **Separating Control and Data Planes for DNS** - The speaker explains how isolating the control plane from the data plane enables infrastructure‑specific optimizations and security when designing a DNS hosting service that provisions records via an API and resolves queries at massive scale.
- [00:09:46](https://www.youtube.com/watch?v=Ep1QW-wOmgc&t=586s) **Benefits of Control/Data Plane Separation** - The speaker outlines how applying control‑plane and data‑plane architecture principles to networking and storage platforms yields security enhancements, independent scalability, and performance gains such as lower latency and higher throughput.
- [00:13:03](https://www.youtube.com/watch?v=Ep1QW-wOmgc&t=783s) **Control vs Data Plane Trade‑offs** - The speaker explains how separating control and data planes adds code maintenance and expands the attack surface, weighing the pros and cons and advising when this architecture is appropriate for simple APIs versus complex data‑service platforms.

## Full Transcript
If you're going to be designing a cloud based data services platform at scale,
you're going to want to know about the control plane and data plane
system architecture design principles.
These software design principles can be found in the construction and design
of hardware systems and software systems alike.
From routers to software defined networking platforms
to storage based cloud platforms
and even in Kubernetes, where it's built into the framework itself.
The control plane and data plane are going to be important themes
as we look at 3 different cloud based platforms.
And as we work through each of them,
you can see how the advantages and disadvantages come out in each.
So let's start off today by talking about a hosted, managed database platform.
Let's say we're offering our customers instances of databases such as Postgres.
The user can connect to their database through a database client.
Maybe it's their application,
maybe they're running a client on their local machine,
really whatever they want to do with it.
That's up to them.
You can learn more about managed database services, by the way,
in this video over here.
Now this instance is going to be running on our data plane infrastructure.
The user can pull data from it, write data to it.
But when they want to interact with the instance itself,
the service that we're running on their behalf,
they're going to want to interact with an API.
A simple interface where they can do more administrative level tasks,
such as upgrading the version of the database, requesting backups, things like that.
So let's say that the customer wants to create a backup.
They're going to hit the API first,
and then that API is going to reach into the data plane
and they're going to create themselves a task.
Now this is going to be a pretty resource intensive task.
So let's have a set of automation
that's going to execute this task on behalf of the user's request.
This automation might be comprised of Kubernetes operators, for example.
That's what we do in cloud databases at IBM.
And you can learn more about operators and Kubernetes operators
in this video over here.
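
The backup flow described above can be sketched roughly as follows. This is a minimal illustration, not how any particular platform implements it: `ControlPlaneAPI`, `DataPlane`, `BackupTask`, and `run_automation` are hypothetical names, and the in-memory queue stands in for whatever task mechanism (for example, Kubernetes custom resources watched by an operator) a real system would use.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, Dict, List


@dataclass
class BackupTask:
    instance_id: str
    status: str = "pending"


@dataclass
class DataPlane:
    # Hypothetical stand-in for the infrastructure running customer databases.
    tasks: Deque[BackupTask] = field(default_factory=deque)
    backups: Dict[str, List[str]] = field(default_factory=dict)


class ControlPlaneAPI:
    """Administrative entry point; it never serves customer reads or writes."""

    def __init__(self, data_plane: DataPlane) -> None:
        self.data_plane = data_plane

    def request_backup(self, instance_id: str) -> BackupTask:
        # Record the user's intent as a task instead of doing the work inline.
        task = BackupTask(instance_id)
        self.data_plane.tasks.append(task)
        return task


def run_automation(data_plane: DataPlane) -> int:
    """Stand-in for an operator loop: drain pending tasks and execute them."""
    completed = 0
    while data_plane.tasks:
        task = data_plane.tasks.popleft()
        snapshots = data_plane.backups.setdefault(task.instance_id, [])
        snapshots.append(f"{task.instance_id}-backup-{len(snapshots) + 1}")
        task.status = "done"
        completed += 1
    return completed


data_plane = DataPlane()
api = ControlPlaneAPI(data_plane)
task = api.request_backup("pg-123")  # "pg-123" is an invented instance ID
run_automation(data_plane)
print(task.status, data_plane.backups["pg-123"])
```

The point of the split is that the API call returns as soon as the intent is recorded, while the resource-intensive work happens later in the automation.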
So we looked at when the user expresses the intent of creating a backup on their instance.
Let's talk about some of the administrative things we need to do on the platform side.
So they created a backup.
Well, we billed them for that because we're a business, right?
We're going to store that backup request in a metadata store,
and that metadata store might support a billing system.
The metadata store might also support user management.
You'll find that these administrative components
are going to be common themes through a number of our platforms that we talk about today.
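
As a rough sketch of how one metadata store might back both billing and user management, assuming an in-memory store and an invented price: `MetadataStore`, `BackupRecord`, and `PRICE_CENTS_PER_GB` are hypothetical, and a real platform would use a durable database here.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class BackupRecord:
    customer_id: str
    instance_id: str
    size_gb: int


class MetadataStore:
    """Control-plane store; holds facts about instances, not customer data."""

    def __init__(self) -> None:
        self._backups: List[BackupRecord] = []
        self._users: Dict[str, str] = {}

    def add_user(self, customer_id: str, name: str) -> None:
        self._users[customer_id] = name

    def user_name(self, customer_id: str) -> str:
        return self._users[customer_id]

    def record_backup(self, record: BackupRecord) -> None:
        self._backups.append(record)

    def backups_for(self, customer_id: str) -> List[BackupRecord]:
        return [r for r in self._backups if r.customer_id == customer_id]


PRICE_CENTS_PER_GB = 3  # invented price, for illustration only


def backup_charge_cents(store: MetadataStore, customer_id: str) -> int:
    """Billing reads the same records that the backup request wrote."""
    total_gb = sum(r.size_gb for r in store.backups_for(customer_id))
    return total_gb * PRICE_CENTS_PER_GB


store = MetadataStore()
store.add_user("cust-1", "Example Corp")
store.record_backup(BackupRecord("cust-1", "pg-123", 10))
store.record_backup(BackupRecord("cust-1", "pg-456", 4))
store.record_backup(BackupRecord("cust-2", "pg-789", 100))
print(store.user_name("cust-1"), backup_charge_cents(store, "cust-1"))
```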
We've just set up the infrastructure for a hosted managed database service.
Now, a few advantages are going to start to bring themselves to light in this architecture alone.
One of them is going to be security.
Let's say the user is going to
try to reach into their database and use that as an attack vector
to pull all of the data about the other customers of your platform.
They're going to get contained here
because as they try to reach into the billing data
or the metadata store,
they're going to be prevented by this natural network isolation.
Similarly, you're going to have the advantage of scaling
the infrastructure independently for your data plane versus your control plane.
So let's say you're hosting your database service on a virtual machine.
If you need to host another one why not just add another virtual machine?
Now, you don't have to replicate all of this administrative overhead
running as services in your control plane
in each of the machines that are running your data plane.
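
To make the independent scaling concrete, here is a back-of-the-envelope sketch. All of the numbers and names (`plan_fleet`, one instance per VM, 200 admin requests per second per replica, a floor of two replicas) are assumptions for illustration rather than figures from the talk.

```python
import math
from dataclasses import dataclass


@dataclass
class FleetPlan:
    data_plane_vms: int
    control_plane_replicas: int


def plan_fleet(
    db_instances: int,
    admin_requests_per_sec: int,
    instances_per_vm: int = 1,              # assumed: one database per VM
    admin_requests_per_replica: int = 200,  # assumed capacity of one API replica
) -> FleetPlan:
    # The data plane grows with the number of hosted instances.
    data_plane_vms = math.ceil(db_instances / instances_per_vm)
    # The control plane grows with administrative traffic only;
    # the floor of 2 replicas is an assumed redundancy minimum.
    needed = math.ceil(admin_requests_per_sec / admin_requests_per_replica)
    return FleetPlan(data_plane_vms, max(2, needed))


small = plan_fleet(db_instances=10, admin_requests_per_sec=50)
large = plan_fleet(db_instances=500, admin_requests_per_sec=50)
print(small, large)
```

Under these assumptions, growing from 10 to 500 instances adds data plane VMs but leaves the control plane at the same size, because the administrative load did not change.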
Let's see what we can do with other cloud based platforms such as object storage.
So instead of databases,
the instances that we're going to be hosting for our customers
are going to be object storage instances
where the user is going to be storing buckets.
So these buckets are going to contain whatever the customer wants to put in there:
videos, text files, blobs of who knows what.
It's up to the customer again.
We're going to see how the data plane can offer
us infrastructure optimizations in this case as well.
So by having this natural separation,
we can now provide a thing like a CDN to our data plane,
which will be helpful for the user to read large amounts of data
very quickly, which will utilize caching
and localization to improve reads.
So a CDN will be something that's going to be very helpful for object storage,
where it might not have been as intuitive how it would be applied to a database service.
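
The caching idea behind a CDN can be sketched as a read-through cache in front of the object store. This is a simplification under stated assumptions: `ObjectStore` and `EdgeCache` are hypothetical classes, and the sketch leaves out expiry, invalidation, and geographic placement, which a real CDN has to handle.

```python
from typing import Dict, Tuple

ObjectKey = Tuple[str, str]  # (bucket, object key)


class ObjectStore:
    """Hypothetical origin store on the data plane."""

    def __init__(self) -> None:
        self._objects: Dict[ObjectKey, bytes] = {}
        self.reads = 0  # counts how often the origin is actually hit

    def put(self, bucket: str, key: str, data: bytes) -> None:
        self._objects[(bucket, key)] = data

    def get(self, bucket: str, key: str) -> bytes:
        self.reads += 1
        return self._objects[(bucket, key)]


class EdgeCache:
    """Read-through cache: serve locally if possible, else fetch from origin."""

    def __init__(self, origin: ObjectStore) -> None:
        self.origin = origin
        self._cache: Dict[ObjectKey, bytes] = {}

    def get(self, bucket: str, key: str) -> bytes:
        cache_key = (bucket, key)
        if cache_key not in self._cache:
            self._cache[cache_key] = self.origin.get(bucket, key)
        return self._cache[cache_key]


origin = ObjectStore()
origin.put("videos", "intro.mp4", b"\x00" * 1024)
edge = EdgeCache(origin)
first = edge.get("videos", "intro.mp4")   # miss: goes to the origin
second = edge.get("videos", "intro.mp4")  # hit: served from the cache
print(origin.reads)
```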
We'll also want to consider load balancing.
With the object storage platform,
you're going to be receiving thousands to hundreds of thousands of times more requests
and traffic to your data plane, reading and writing to these object stores,
much larger packet sizes,
and many more packets coming into this data plane
than you would arriving into your control plane.
So in this case, you can optimize for what infrastructure is present on your data plane.
This is going to handle 10x, 100x, 1000x gigabytes per second
versus this one's load balancer.
So here already, we're starting to see how it's pretty advantageous
to have isolation between the data plane and the control plane
and optimizations that we can apply to
one set of the infrastructure versus the other.
Now let's talk about a networking service that we can
design using the same infrastructure principles,
the control plane and data plane.
In this case we're going to be hosting a DNS platform.
So we're going to be resolving queries
either from within the private network or public network
to this DNS service.
Customers are going to provision DNS instances,
and they're going to be creating their own set of DNS records.
So the API is going to look a little different
from when it was providing the database service
and creating or modifying database instances
and object storage instances.
In this case, we might expose endpoints that will
allow the user to create A records, CNAME records, etc.,
and then these records are going to be propagated into the data plane
where the DNS resolver, the instance that they provisioned on our platform,
will then be able to resolve DNS queries according to how
the user has configured their instance to do so.
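
A minimal sketch of that flow, where records are created through the control plane, propagated, and then resolved on the data plane. The class names and the `propagate` function are hypothetical, the addresses come from a range reserved for documentation, and real propagation is asynchronous rather than a single function call.

```python
from typing import Dict, Optional, Tuple

RecordKey = Tuple[str, str]  # (name, record type)


class DnsControlPlane:
    """Holds the records the customer created through the API."""

    def __init__(self) -> None:
        self.records: Dict[RecordKey, str] = {}

    def create_record(self, name: str, rtype: str, value: str) -> None:
        self.records[(name.lower(), rtype.upper())] = value


class DnsResolver:
    """Data-plane instance; answers only from what has been propagated to it."""

    def __init__(self) -> None:
        self.zone: Dict[RecordKey, str] = {}

    def load(self, records: Dict[RecordKey, str]) -> None:
        self.zone = dict(records)

    def resolve(self, name: str, rtype: str = "A", max_hops: int = 8) -> Optional[str]:
        name, rtype = name.lower(), rtype.upper()
        for _ in range(max_hops):  # bounded, so alias loops cannot hang
            if (name, rtype) in self.zone:
                return self.zone[(name, rtype)]
            alias = self.zone.get((name, "CNAME"))
            if alias is None:
                return None
            name = alias.lower()
        return None


def propagate(control: DnsControlPlane, resolver: DnsResolver) -> None:
    # Stand-in for the network hop from control plane to data plane.
    resolver.load(control.records)


control = DnsControlPlane()
resolver = DnsResolver()
control.create_record("www.example.com", "A", "192.0.2.10")
control.create_record("blog.example.com", "CNAME", "www.example.com")

before = resolver.resolve("www.example.com")  # not propagated yet
propagate(control, resolver)
after = resolver.resolve("www.example.com")
aliased = resolver.resolve("blog.example.com")
print(before, after, aliased)
```

Until `propagate` runs, the resolver has no answer, which is the delay between creating a record and seeing it resolve.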
So with the DNS service,
now we can also talk about how the security
of this system can be optimized in the data plane versus the control plane,
but in a slightly different way from how I described it earlier.
We've already covered how there's a natural network isolation
between the control plane and data plane infrastructure.
But let's think about what we want to support in terms of traffic
for the data plane versus the control plane for the DNS platform.
In this case, we can actually now support UDP.
So UDP requests are going to be hitting the DNS resolver,
the instance that the customer has provisioned on our platform,
and that will optimize for latency and throughput,
at the expense of some consistency
if you consider the UDP protocol.
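
To show what a UDP request to a resolver looks like on the wire, here is a sketch that builds a standard DNS query and defines a sender with a timeout and retries. The packet layout follows the standard DNS message format, while the function names and the retry policy are illustrative choices. UDP gives no delivery guarantee, so it is the client that has to retry, and `send_query` is defined but not called so the example needs no network access.

```python
import socket
import struct


def encode_name(name: str) -> bytes:
    """Encode a domain name as length-prefixed labels ending in a zero byte."""
    out = b""
    for label in name.rstrip(".").split("."):
        raw = label.encode("ascii")
        if not 0 < len(raw) <= 63:
            raise ValueError(f"invalid label: {label!r}")
        out += bytes([len(raw)]) + raw
    return out + b"\x00"


def build_query(name: str, query_id: int, qtype: int = 1) -> bytes:
    """Build a DNS query; qtype 1 is an A record, class 1 is IN."""
    flags = 0x0100  # standard query with recursion desired
    header = struct.pack("!HHHHHH", query_id, flags, 1, 0, 0, 0)
    question = encode_name(name) + struct.pack("!HH", qtype, 1)
    return header + question


def send_query(server: str, packet: bytes, timeout: float = 2.0, retries: int = 2) -> bytes:
    """Send over UDP port 53; a lost datagram shows up only as a timeout."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        for _ in range(retries + 1):
            sock.sendto(packet, (server, 53))
            try:
                reply, _addr = sock.recvfrom(512)
                return reply
            except socket.timeout:
                continue  # no acknowledgement in UDP, so just try again
    raise TimeoutError(f"no reply from {server}")


packet = build_query("example.com", query_id=0x1234)
print(len(packet), packet.hex())
```

A single small datagram with no connection setup is what keeps latency low, and the cost is that the sender cannot tell a dropped request from a slow one.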
Versus if you look at the control plane and the types of protocols that we want to support there
and what we want to be optimizing for in this case,
you're going to want to have a higher level of consistency in this case,
as opposed to the latency.
And you'll sacrifice some latency as well.
When you create these A records
and you have them propagate into the data plane
it will take a little while because it has to make the network jump
and it has to propagate throughout the system.
So we just worked through 3 different examples
on both the networking platform side of things,
as well as the storage platform.
There are some advantages and disadvantages that are going to arise
from applying the control plane / data plane system architecture design principles to your platform,
and you might consider these as somewhat of a double edged sword.
Let's talk about some of those advantages that we saw as common themes through each of these examples.
You're going to have some security gains for sure.
We talked about it with DNS.
An example is allowing UDP only where you want it, in the data plane.
You can focus on the problems inherent to your control plane infrastructure
versus your data plane infrastructure.
You can really drill down on those and make sure that your system's locked down.
We talked about scalability.
The ability to independently scale the infrastructure
on one side of your system versus the other.
It'll simplify things for you,
and it'll make you more ready to support that increased demand
in the future once you start gaining all of those new customers.
Then you have performance improvements such as latency.
And this ties into the kinds of problems
that we talked about earlier, how, let's say in the data plane,
you want to allow a greater throughput of requests
and respond with a greater volume of responses
in the same amount of time, versus your control plane.
And then more of a meta advantage that you get from this
is the ability to allocate your engineers, your work resources,
your management around these infrastructural planes
and the components within them.
So on the control plane you might have a control plane team
with subject matter experts that focus on things like the API.
Your front end guy will focus on that.
Or somebody who knows about the billing management system
will be your go-to guy for that.
Then you might have somebody on your data plane side who
will focus on the operators or the set of automations,
whether it's Jenkins or Travis, things like that.
And lastly, and really, you can slice and dice it however you want,
you can have engineers focused on the software
that's going to be providing the service itself.
So that helps you organize your team.
There are some disadvantages as well.
The other side of the sword.
And those will ...
well, let's use a different color for that.
And those will be your overhead.
So two types of infrastructure
call for the need to support the complexity of that infrastructure.
Bridging these two types of infrastructure
calls for more code and maintenance of that code.
And another thing to consider is your increase to the attack vector,
or your attack surface, excuse me.
More infrastructure, and more types of infrastructure,
give your adversaries, the people that want to take down your platform,
the ability to exploit different types of problems
that are inherent to each of your two infrastructural planes.
We covered some of the pros and cons, both sides of this double edged sword.
Let's think about whether this system architecture design principle
is applicable for your system.
Are you going to be hosting something simple
like a pet store API,
or are you going to be hosting a platform
which will offer your users data services
where they can run their own analytics, their own data processing systems?
If it's more towards the latter, you're going to want to consider
the control plane / data plane system architecture design principles
to be applied for your platform.
If you liked this video and want to see more like it,
please like and subscribe!
If you have any questions or want to share your thoughts about this topic,
please leave a comment below.