
Ensuring Consistent Distributed Data with etcd

6m • databases • tutorial • intermediate

Key Points

etcd is an open‑source, fully replicated key‑value store that acts as the single source of truth for Kubernetes state, configuration, and metadata.
It achieves strong consistency by using the Raft consensus algorithm, where a leader node coordinates writes and only commits them after a majority of the cluster's members (the leader included) have persisted the change.
Clients can read from or write to any cluster node; followers pass writes to the leader and check with the leader before answering a read, so the most up‑to‑date value is returned.
The system is highly available: if the leader fails, the remaining nodes hold an election to promptly select a new leader, so the cluster keeps operating without a single point of failure as long as a majority of its members remain available.
This replication and consensus design enables etcd to provide reliable, low‑latency data storage across distributed environments.
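The leader election in the key points follows rules from Raft. Below is a minimal, single-process sketch assuming a toy model, not etcd's implementation: the `Node` class, `run_election`, and the log positions are hypothetical, and timeouts, heartbeats, and networking are left out. It shows two Raft rules: a member grants at most one vote per term, and only to a candidate whose log is at least as up to date as its own (the video describes this informally as voting "based on availability"); the candidate wins only with votes from a majority of the whole cluster, its own vote included.

```python
from dataclasses import dataclass, field


def majority(cluster_size: int) -> int:
    """Votes (or acknowledgements) needed out of cluster_size members."""
    return cluster_size // 2 + 1


@dataclass
class Node:
    """Hypothetical toy member; only the fields the vote rule needs."""
    name: str
    last_log_term: int   # term of the newest entry in this node's log
    last_log_index: int  # position of the newest entry in this node's log
    current_term: int = 0
    voted_for: dict = field(default_factory=dict)  # term -> candidate voted for

    def request_vote(self, term: int, candidate: str,
                     cand_last_term: int, cand_last_index: int) -> bool:
        """Toy version of Raft's RequestVote rule."""
        if term < self.current_term:
            return False  # candidate is behind on terms
        self.current_term = term
        previous = self.voted_for.get(term)
        if previous is not None and previous != candidate:
            return False  # at most one vote per term
        # Grant only if the candidate's log is at least as up to date:
        # a later last term wins; with equal terms, the longer log wins.
        if (cand_last_term, cand_last_index) < (self.last_log_term, self.last_log_index):
            return False
        self.voted_for[term] = candidate
        return True


def run_election(candidate: Node, reachable_peers: list, cluster_size: int) -> bool:
    """Candidate starts a new term, votes for itself, and asks reachable peers."""
    candidate.current_term += 1
    term = candidate.current_term
    candidate.voted_for[term] = candidate.name
    votes = 1
    for peer in reachable_peers:
        if peer.request_vote(term, candidate.name,
                             candidate.last_log_term, candidate.last_log_index):
            votes += 1
    return votes >= majority(cluster_size)


def surviving_followers():
    # Four-node cluster as in the video; the leader has failed.
    # "c" missed the most recent write, so its log is one entry shorter.
    return (Node("a", 1, 5, 1), Node("b", 1, 5, 1), Node("c", 1, 4, 1))


a, b, c = surviving_followers()
print(run_election(c, [a, b], cluster_size=4))  # False: a and b refuse the stale log

a, b, c = surviving_followers()
print(run_election(a, [b, c], cluster_size=4))  # True: 3 of 4 votes is a majority
```

With four members the majority is three, so after the leader fails all three survivors must take part, and the stale member `c` cannot win. A four-member cluster therefore tolerates one failure, the same as a three-member cluster.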

Source: https://www.youtube.com/watch?v=OmphHSaO1sE
Duration: 00:06:19

Sections

00:00:00 Untitled Section
00:03:23 etcd: Consistent, Highly Available Store. The speaker explains that etcd provides strong consistency, automatic leader election for high availability, fast writes constrained by disk speed, TLS-secured communication, a simple HTTP/JSON API, and a watch feature that Kubernetes uses to detect state drift.

Full Transcript

0:00 How can you ensure that your data is stored consistently and reliably across a distributed system? My name is Whitney Lee, and I'm a Cloud Developer here at IBM.

0:09 etcd is an open source key-value data store used to manage and store data that helps keep distributed systems running. etcd is best known for being one of the core components of Kubernetes, where it stores and manages Kubernetes state data, configuration data, and metadata. etcd can be relied upon to be a single source of truth at any given point in time.

0:41 Today I'm going to go over some of the features of etcd that allow it to be so effective in this way.

0:48 etcd is fully replicated. This means that every node in an etcd cluster has access to the full data store.

0:59 etcd is also reliably consistent. Every data read in an etcd cluster is going to return the most recent data write. Let's talk about how this works. etcd is built on top of the Raft algorithm, which is used for distributed consensus.

1:25 So, let's make a very simple etcd cluster of only four nodes. An etcd cluster always has a leader, and the other nodes in the cluster are followers. It's a key-value data store, so in this case, at key 1 we have the value 7.

1:44 Let's say a web application comes in and lets the leader node know that at key 1 we want to store the value 17 instead of 7. The leader node does not change its own local data store; instead, it forwards that request to each of the followers. When a follower changes its local data store, it reports that back to the leader, so the leader knows. When our leader node can see that the majority of the nodes have been updated to the most current data, that's when the leader will update its own data store and return a successful write to the client.

2:29 Now, the client doesn't actually have to concern itself with which node in the cluster is the leader.
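The write path walked through above can be sketched as a toy model, assuming one process and no real networking; `Member`, `write`, `member`, and the `reachable` flag are hypothetical names. One detail differs from the simplified narration: in Raft the leader also appends the entry to its own persistent log first and counts itself toward the majority. What it defers until the entry is committed is applying the change to its key-value state, which roughly corresponds to "updating its own data store" in the video.

```python
from dataclasses import dataclass, field


@dataclass
class Member:
    """Hypothetical toy cluster member: a persisted log plus applied key-value state."""
    name: str
    log: list = field(default_factory=list)    # persisted (key, value) entries
    state: dict = field(default_factory=dict)  # key-value data after applying committed entries
    reachable: bool = True                     # False simulates a slow or partitioned follower

    def append(self, entry: tuple) -> bool:
        """Persist the entry and acknowledge it, if the leader can reach this member."""
        if not self.reachable:
            return False
        self.log.append(entry)
        return True

    def apply(self, entry: tuple) -> None:
        key, value = entry
        self.state[key] = value


def write(leader: Member, followers: list, key, value) -> bool:
    """Return True (success to the client) only once a majority has persisted the entry."""
    entry = (key, value)
    leader.log.append(entry)  # the leader's own persisted copy counts toward the majority
    acks = 1 + sum(f.append(entry) for f in followers)
    cluster_size = 1 + len(followers)
    if acks < cluster_size // 2 + 1:
        return False  # not committed; this toy never retries
    leader.apply(entry)  # committed: now the leader updates its key-value state
    for f in followers:
        # Real Raft followers learn about the commit from the leader's next message.
        if f.log and f.log[-1] == entry:
            f.apply(entry)
    return True


def member(name: str, reachable: bool = True) -> Member:
    # Every member starts with key 1 = 7, as in the video.
    return Member(name, log=[(1, 7)], state={1: 7}, reachable=reachable)


leader = member("leader")
followers = [member("f1"), member("f2"), member("f3", reachable=False)]
print(write(leader, followers, 1, 17))  # True: 3 of 4 members persisted the entry
print([f.state[1] for f in followers])  # [17, 17, 7]: f3 has not caught up yet
```

The unreachable follower `f3` still holds 7, like the node that "hasn't updated yet" in the read example the video turns to next. If two of the three followers were unreachable, only 2 of 4 members would hold the entry, and `write` would return `False` without changing the leader's state.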
The client can make read and write requests to any node in the cluster.

2:40 So, let's say (this all happens over a matter of milliseconds) that the client makes a read request to the node that hasn't updated yet and asks, "What's the value at key 1?" This follower node knows it's a follower and knows it's not authorized to answer the client directly. So it forwards that request to the leader node, which responds that the cluster's current value at key 1 is 17, and the client gets a response of 17. And that's how etcd is replicated.

3:23 So every node in the cluster has access to the full data store, and it's consistent: every data read is going to return the most recent data write.

3:32 etcd is also highly available. This means that there's no single point of failure in the etcd cluster. It can gracefully tolerate network partitions and hardware failure too.

3:53 So, let's say that our leader node goes down. The followers can declare themselves candidates, and they'll hold an election where each one votes based on availability, and a new node will be elected the leader. That leader will go on to manage the replication for the cluster, and the data is unaffected.

4:18 etcd is also fast. etcd is benchmarked at 10,000 writes per second. With that said, etcd does persist data to disk, so etcd's performance is tied to your storage disk speed.

4:37 etcd is secure. etcd uses transport layer security with optional SSL client certificate authentication. etcd stores vital and highly sensitive configuration data, so it's important to keep it protected.

4:53 Finally, etcd is simple to use. A web application can read and write data to etcd using simple HTTP/JSON tools.

5:12 The other thing to talk about in etcd that's important is the watch function. Kubernetes leverages this.
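The HTTP/JSON access mentioned at 4:53 can be sketched in Python. In etcd v3 the native API is gRPC, and etcd also serves a JSON gateway in front of it. This sketch assumes etcd v3.4 or later listening without TLS on `http://localhost:2379` (earlier 3.x releases used a different path prefix), and the helper names are hypothetical. The gateway expects keys and values as base64-encoded strings, so key `1` becomes `MQ==` and value `17` becomes `MTc=`.

```python
import base64
import json
import urllib.request

# Assumption: etcd v3.4+ serving its JSON gateway on the default client port, no TLS.
ETCD_URL = "http://localhost:2379"


def b64(text: str) -> str:
    """Keys and values are sent to the JSON gateway as base64-encoded bytes."""
    return base64.b64encode(text.encode("utf-8")).decode("ascii")


def put_body(key: str, value: str) -> dict:
    """Request body for POST /v3/kv/put."""
    return {"key": b64(key), "value": b64(value)}


def range_body(key: str) -> dict:
    """Request body for POST /v3/kv/range (a single-key read)."""
    return {"key": b64(key)}


def values_from_range(response: dict) -> list:
    """Decode values from a range response; "kvs" may be missing when no key matches."""
    return [
        base64.b64decode(kv.get("value", "")).decode("utf-8")
        for kv in response.get("kvs", [])
    ]


def post(path: str, body: dict) -> dict:
    """Send one JSON request to the gateway and parse the JSON reply."""
    request = urllib.request.Request(
        ETCD_URL + path,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=5) as response:
        return json.load(response)


def demo() -> list:
    """Mirror the video's example: store 17 at key 1, then read it back."""
    post("/v3/kv/put", put_body("1", "17"))
    return values_from_range(post("/v3/kv/range", range_body("1")))
```

Call `demo()` only against a running etcd; the body-building and decoding helpers work offline.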
5:19 So, as I talked about at the beginning, etcd stores Kubernetes configuration data and its state data. etcd can use this watch function to compare these to each other. If they ever go out of sync, etcd will let the Kubernetes API know, and the Kubernetes API will reconfigure the cluster accordingly.

5:49 etcd can be used to store your data reliably and consistently across your distributed system. Thank you. If you have questions, please drop us a line below. If you want to see more videos like this in the future, please like and subscribe. And don't forget you can grow your skills and earn a badge with IBM CloudLabs, which are free, browser-based, interactive Kubernetes labs.
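In Kubernetes, the comparison described at 5:31 happens on the Kubernetes side: the API server is the component that talks to etcd and watches it, and controllers watch the API server and act when desired and observed state differ. etcd's watch itself only streams change events, each tagged with a revision. The toy sketch below keeps those roles separate, assuming an in-memory store; `ToyStore`, `ToyController`, and the `/desired/` and `/observed/` keys are hypothetical and are not the keys Kubernetes uses.

```python
class ToyStore:
    """Hypothetical in-memory stand-in for etcd: it stores data and reports changes."""

    def __init__(self):
        self.data = {}
        self.revision = 0   # etcd tags every change with a cluster-wide revision
        self.watchers = []  # (key prefix, callback) pairs

    def watch(self, prefix, callback):
        self.watchers.append((prefix, callback))

    def put(self, key, value):
        self.revision += 1
        self.data[key] = value
        for prefix, callback in self.watchers:
            if key.startswith(prefix):
                callback(key, value, self.revision)


class ToyController:
    """Hypothetical controller: compares desired and observed state on every change."""

    def __init__(self, store):
        self.store = store
        self.actions = []  # (revision, name, observed, desired) drift records
        store.watch("/desired/", self.on_change)
        store.watch("/observed/", self.on_change)

    def on_change(self, key, value, revision):
        name = key.rsplit("/", 1)[-1]
        desired = self.store.data.get(f"/desired/{name}")
        observed = self.store.data.get(f"/observed/{name}")
        if desired is not None and desired != observed:
            # A real controller would act here (for example, start a replica).
            self.actions.append((revision, name, observed, desired))


store = ToyStore()
controller = ToyController(store)
store.put("/desired/web", 3)   # revision 1: nothing observed yet, so drift
store.put("/observed/web", 3)  # revision 2: in sync, no action
store.put("/observed/web", 2)  # revision 3: a replica went away, so drift
print(controller.actions)      # [(1, 'web', None, 3), (3, 'web', 2, 3)]
```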