Installation
# Download binary (Linux)
ETCD_VER=v3.5.13
curl -L https://github.com/etcd-io/etcd/releases/download/${ETCD_VER}/etcd-${ETCD_VER}-linux-amd64.tar.gz \
| tar xz
sudo mv etcd-${ETCD_VER}-linux-amd64/etcd* /usr/local/bin/
# macOS
brew install etcd
# Docker (single node)
docker run -d \
--name etcd \
-p 2379:2379 \
-p 2380:2380 \
quay.io/coreos/etcd:v3.5.13 \
etcd \
--advertise-client-urls http://localhost:2379 \
--listen-client-urls http://0.0.0.0:2379
# Verify
etcd --version
etcdctl version
Set ETCDCTL API Version
# Use API v3 (the etcdctl default since etcd 3.4; the v2 API is deprecated)
export ETCDCTL_API=3
# Or prefix every command
ETCDCTL_API=3 etcdctl get foo
Configuration
Single-Node etcd.yaml
name: "etcd-node-1"
data-dir: /var/lib/etcd
# Client communication
listen-client-urls: https://0.0.0.0:2379
advertise-client-urls: https://etcd-node-1.example.com:2379
# Peer communication
listen-peer-urls: https://0.0.0.0:2380
initial-advertise-peer-urls: https://etcd-node-1.example.com:2380
# Cluster bootstrap
initial-cluster: etcd-node-1=https://etcd-node-1.example.com:2380
initial-cluster-token: etcd-cluster-prod
initial-cluster-state: new
# TLS: clients
client-transport-security:
  cert-file: /etc/etcd/pki/server.crt
  key-file: /etc/etcd/pki/server.key
  trusted-ca-file: /etc/etcd/pki/ca.crt
  client-cert-auth: true
# TLS: peers (inside this block the key is client-cert-auth;
# peer-client-cert-auth is the name of the command-line flag)
peer-transport-security:
  cert-file: /etc/etcd/pki/peer.crt
  key-file: /etc/etcd/pki/peer.key
  trusted-ca-file: /etc/etcd/pki/ca.crt
  client-cert-auth: true
# Performance
heartbeat-interval: 100
election-timeout: 1000
snapshot-count: 10000
max-request-bytes: 1572864 # 1.5 MiB
quota-backend-bytes: 8589934592 # 8 GiB
# Logging
log-level: info
logger: zap
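The numeric settings in this file are easier to audit in human units. The sketch below converts the two byte values and checks the ratio between the two timing values. The function name and dictionary layout are illustrative, and the "at least 5x" rule reflects my understanding of etcd's startup validation rather than anything stated in this document, so treat it as an assumption.

```python
def summarize_tuning(cfg):
    """Return human-readable sizes and a timing sanity check for an etcd config dict."""
    mib = 1024 ** 2
    gib = 1024 ** 3
    return {
        "max_request_mib": cfg["max-request-bytes"] / mib,
        "quota_gib": cfg["quota-backend-bytes"] / gib,
        "election_to_heartbeat_ratio": cfg["election-timeout"] / cfg["heartbeat-interval"],
        # Assumption: etcd rejects an election timeout below 5x the heartbeat interval.
        "timing_ok": cfg["election-timeout"] >= 5 * cfg["heartbeat-interval"],
    }


# Values copied from the single-node configuration above.
config = {
    "heartbeat-interval": 100,
    "election-timeout": 1000,
    "max-request-bytes": 1572864,
    "quota-backend-bytes": 8589934592,
}

summary = summarize_tuning(config)
for name, value in summary.items():
    print(f"{name}: {value}")
```

Keeping the check in a small function makes it easy to run against several candidate configurations before rolling one out.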
Three-Node Cluster Config
# Node 1
etcd --name node1 \
--data-dir /var/lib/etcd \
--listen-client-urls http://0.0.0.0:2379 \
--advertise-client-urls http://node1:2379 \
--listen-peer-urls http://0.0.0.0:2380 \
--initial-advertise-peer-urls http://node1:2380 \
--initial-cluster node1=http://node1:2380,node2=http://node2:2380,node3=http://node3:2380 \
--initial-cluster-token cluster-token \
--initial-cluster-state new
# Nodes 2 and 3: same flags, but set --name, --advertise-client-urls, and
# --initial-advertise-peer-urls to that node's own name and hostname
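Because only a few flags differ between members, the per-node command lines can be generated instead of hand-edited. This is a minimal sketch that assumes each node's hostname equals its member name, as in the node1 example; the helper names `initial_cluster` and `node_flags` are hypothetical.

```python
def initial_cluster(nodes, scheme="http", peer_port=2380):
    """Build the shared --initial-cluster value from the member names."""
    return ",".join(f"{name}={scheme}://{name}:{peer_port}" for name in nodes)


def node_flags(name, nodes, scheme="http", client_port=2379, peer_port=2380,
               token="cluster-token"):
    """Return the etcd flags for one member; only name-derived values differ per node."""
    return [
        f"--name {name}",
        "--data-dir /var/lib/etcd",
        f"--listen-client-urls {scheme}://0.0.0.0:{client_port}",
        f"--advertise-client-urls {scheme}://{name}:{client_port}",
        f"--listen-peer-urls {scheme}://0.0.0.0:{peer_port}",
        f"--initial-advertise-peer-urls {scheme}://{name}:{peer_port}",
        f"--initial-cluster {initial_cluster(nodes, scheme, peer_port)}",
        f"--initial-cluster-token {token}",
        "--initial-cluster-state new",
    ]


nodes = ["node1", "node2", "node3"]
for name in nodes:
    # Print one ready-to-paste command per member.
    print("etcd " + " \\\n  ".join(node_flags(name, nodes)) + "\n")
```

Every member receives the same `--initial-cluster` and `--initial-cluster-token` values, which is what lets the three processes recognize each other as one cluster at bootstrap.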
etcdctl Connection Flags
# Shorthand env vars to avoid repeating flags
export ETCDCTL_API=3
export ETCDCTL_ENDPOINTS=https://node1:2379,https://node2:2379,https://node3:2379
export ETCDCTL_CACERT=/etc/etcd/pki/ca.crt
export ETCDCTL_CERT=/etc/etcd/pki/client.crt
export ETCDCTL_KEY=/etc/etcd/pki/client.key
Core Commands
Key-Value Operations
| Command | Description |
|---|---|
| `etcdctl put key value` | Set a key |
| `etcdctl get key` | Get a key |
| `etcdctl get key --print-value-only` | Value only |
| `etcdctl get key --hex` | Raw bytes as hex |
| `etcdctl get "" --prefix --keys-only` | List all keys |
| `etcdctl get /app/ --prefix` | Get keys with prefix |
| `etcdctl get key1 key2` | Get key range [key1, key2) |
| `etcdctl del key` | Delete a key |
| `etcdctl del /app/ --prefix` | Delete all keys with prefix |
| `etcdctl del key1 key2` | Delete key range [key1, key2) |
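A prefix query is a range query in disguise: the client turns the prefix into a half-open range [prefix, end) by incrementing the last byte that is below 0xff. The sketch below mirrors that computation as I understand it from the Go client; the function names are illustrative and not part of any etcd library.

```python
def prefix_range_end(prefix: bytes) -> bytes:
    """Return the smallest key that sorts after every key starting with prefix."""
    end = bytearray(prefix)
    for i in reversed(range(len(end))):
        if end[i] < 0xFF:
            end[i] += 1
            return bytes(end[: i + 1])
    # Empty prefix or only 0xff bytes: there is no finite upper bound,
    # so etcd uses the single byte 0x00 to mean "through the last key".
    return b"\x00"


def in_range(key: bytes, start: bytes, end: bytes) -> bool:
    """Check membership in the half-open range [start, end)."""
    return start <= key and (end == b"\x00" or key < end)


end = prefix_range_end(b"/app/")
print(end)
print(in_range(b"/app/config", b"/app/", end))
print(in_range(b"/apps", b"/app/", end))
```

The same half-open convention explains why `etcdctl get key1 key2` returns `key1` but not `key2`.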
Watching Keys
| Command | Description |
|---|---|
| `etcdctl watch key` | Watch a key for changes |
| `etcdctl watch /app/ --prefix` | Watch all keys with prefix |
| `etcdctl watch key --rev=5` | Watch from revision 5 |
| `etcdctl watch -i` | Interactive watch mode |
| `etcdctl watch key --prev-kv` | Show previous value on change |
Cluster Management
| Command | Description |
|---|---|
| `etcdctl endpoint status` | Status of each endpoint |
| `etcdctl endpoint status --write-out=table` | Pretty table |
| `etcdctl endpoint health` | Health check |
| `etcdctl member list` | List cluster members |
| `etcdctl member list --write-out=table` | Pretty table |
| `etcdctl member add node4 --peer-urls=http://node4:2380` | Add member |
| `etcdctl member remove <member-id>` | Remove member |
| `etcdctl member update <id> --peer-urls=...` | Update peer URL |
| `etcdctl move-leader <member-id>` | Transfer leadership |
Snapshots and Backup
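| Command | Description |
|---|---|
| `etcdctl snapshot save backup.db` | Save snapshot from a running member |
| `etcdctl snapshot status backup.db` | Snapshot info (etcd 3.5 deprecates this in favor of `etcdutl snapshot status`) |
| `etcdctl snapshot status backup.db --write-out=table` | Pretty status |
| `etcdctl snapshot restore backup.db --data-dir=/var/lib/etcd-restore` | Restore snapshot (etcd 3.5 deprecates this in favor of `etcdutl snapshot restore`) |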
Compaction and Defragmentation
| Command | Description |
|---|---|
| `etcdctl compaction <revision>` | Compact to revision |
| `etcdctl defrag` | Defragment storage on the configured endpoints |
| `etcdctl defrag --cluster` | Defrag all members |
Auth and RBAC
| Command | Description |
|---|---|
| `etcdctl auth enable` | Enable authentication |
| `etcdctl auth disable` | Disable authentication |
| `etcdctl user add alice` | Create user |
| `etcdctl user get alice` | Get user info |
| `etcdctl user list` | List users |
| `etcdctl user delete alice` | Delete user |
| `etcdctl user grant-role alice viewer` | Assign role to user |
| `etcdctl user revoke-role alice viewer` | Remove role from user |
| `etcdctl role add viewer` | Create role |
| `etcdctl role get viewer` | Get role permissions |
| `etcdctl role list` | List roles |
| `etcdctl role delete viewer` | Delete role |
| `etcdctl role grant-permission viewer read /app/ --prefix=true` | Grant read on prefix (without `--prefix` only the single key `/app/` is covered) |
| `etcdctl role revoke-permission viewer /app/ --prefix=true` | Revoke the prefix permission |
Advanced Usage
Leases (TTL Keys)
# Create a lease with 60s TTL
etcdctl lease grant 60
# Returns: lease 694d5765fc0a1567 granted with TTL(60s)
# Attach key to lease
etcdctl put session/user123 "data" --lease=694d5765fc0a1567
# Keep-alive (renew) a lease
etcdctl lease keep-alive 694d5765fc0a1567
# Revoke lease (immediately expires all attached keys)
etcdctl lease revoke 694d5765fc0a1567
# Get lease info
etcdctl lease timetolive 694d5765fc0a1567
etcdctl lease timetolive 694d5765fc0a1567 --keys # show attached keys
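etcdctl prints and accepts lease IDs in hexadecimal, while client libraries generally handle them as 64-bit integers, so scripts that mix the two need a conversion step. The sketch below parses the grant output shown above; the function name is hypothetical and the regular expression assumes exactly that output format.

```python
import re


def parse_lease_grant(output: str):
    """Extract (lease_id, ttl_seconds) from 'lease <hex> granted with TTL(<n>s)'."""
    match = re.search(r"lease ([0-9a-f]+) granted with TTL\((\d+)s\)", output)
    if match is None:
        raise ValueError(f"unexpected lease grant output: {output!r}")
    # The ID is hexadecimal on the command line; convert it to an integer.
    return int(match.group(1), 16), int(match.group(2))


lease_id, ttl = parse_lease_grant("lease 694d5765fc0a1567 granted with TTL(60s)")
print(lease_id, ttl)

# Convert back to the form etcdctl expects for --lease, keep-alive, and revoke.
print(format(lease_id, "x"))
```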
Transactions (Atomic Compare-and-Swap)
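# Interactive transaction (prompts for compares, then success and failure requests)
etcdctl txn --interactive
# Non-interactive: stdin holds three blocks, each ended by a blank line, in the
# order compares, success requests, failure requests (no section labels).
# create("lock") = "0" is true only while the key does not exist;
# a value() compare always fails on a missing key.
etcdctl txn <<'EOF'
create("lock") = "0"

put lock "owner-1"

get lock

EOF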
Using the Go client:
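// Compare-and-swap (CAS) in Go: take "lock" only if the key does not exist yet.
// Assumes client (*clientv3.Client), ctx, and leaseID are already set up, and that
// fmt, log, and clientv3 "go.etcd.io/etcd/client/v3" are imported.
// A CreateRevision of 0 means the key is absent; a Value compare always fails on a missing key.
resp, err := client.Txn(ctx).
	If(clientv3.Compare(clientv3.CreateRevision("lock"), "=", 0)).
	Then(clientv3.OpPut("lock", "owner-1", clientv3.WithLease(leaseID))).
	Else(clientv3.OpGet("lock")).
	Commit()
if err != nil {
	log.Fatalf("txn failed: %v", err)
}
if resp.Succeeded {
	fmt.Println("Lock acquired")
} else {
	// The Else branch ran, so Responses[0] holds the result of OpGet("lock").
	kvs := resp.Responses[0].GetResponseRange().Kvs
	fmt.Println("Lock already held by:", string(kvs[0].Value))
}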
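The compare-then-branch shape of a transaction can be hard to picture from client code alone. The toy model below is an in-memory stand-in, not the etcd API: it shows why "create revision equals 0" is a reliable test for "key does not exist". The class and method names are invented for illustration, and the model is single-threaded, whereas etcd evaluates the compare and the chosen branch as one atomic step.

```python
class ToyStore:
    """In-memory stand-in for etcd's key space; illustrative only."""

    def __init__(self):
        self.revision = 0
        self.data = {}  # key -> (value, create_revision)

    def create_revision(self, key):
        # A key that does not exist has create revision 0.
        return self.data[key][1] if key in self.data else 0

    def acquire(self, key, owner):
        """If create_revision(key) == 0, put owner; otherwise report the holder."""
        if self.create_revision(key) == 0:
            self.revision += 1
            self.data[key] = (owner, self.revision)
            return True, owner
        return False, self.data[key][0]


store = ToyStore()
first = store.acquire("lock", "owner-1")
second = store.acquire("lock", "owner-2")
print(first)
print(second)
```

The first caller wins and the second learns who holds the lock, which is the same outcome the success and failure branches of the transaction produce.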
Watching with Python Client
import etcd3

client = etcd3.client(host='localhost', port=2379)

# Put and get (values come back as bytes)
client.put('/app/config', '{"timeout": 30}')
value, metadata = client.get('/app/config')
print(value.decode())

# Watch prefix: stop after a fixed number of events (illustrative stop condition)
max_events = 10
events_iterator, cancel = client.watch_prefix('/app/')
for seen, event in enumerate(events_iterator, start=1):
    print(f"Event: {event.key.decode()} = {event.value.decode()}")
    if seen >= max_events:
        cancel()
        break

# Lease: attached keys are deleted when the lease expires or is revoked
lease = client.lease(60)
client.put('/lock/resource', 'owner', lease=lease)
lease.refresh()
lease.revoke()

# Transaction: version == 0 means the key does not exist yet
# (a value comparison always fails on a missing key)
status, responses = client.transaction(
    compare=[client.transactions.version('/lock') == 0],
    success=[client.transactions.put('/lock', 'owner')],
    failure=[client.transactions.get('/lock')]
)
Snapshot Restore Procedure
# 1. Save a snapshot from a healthy member while etcd is still running
#    (snapshot save accepts exactly one endpoint)
SNAPSHOT=/backup/etcd-snapshot-$(date +%F).db
etcdctl snapshot save "$SNAPSHOT"
# 2. Verify the snapshot
etcdctl snapshot status "$SNAPSHOT" --write-out=table
# 3. Copy the snapshot to every node, then stop etcd on ALL nodes
systemctl stop etcd
# 4. Move the old data directory aside; restore refuses a non-empty --data-dir
mv /var/lib/etcd /var/lib/etcd.bak
# 5. Restore on EACH node (different --name and --initial-advertise-peer-urls per node)
#    etcd 3.5 deprecates this subcommand in favor of: etcdutl snapshot restore
etcdctl snapshot restore "$SNAPSHOT" \
--name=node1 \
--data-dir=/var/lib/etcd \
--initial-cluster=node1=http://node1:2380,node2=http://node2:2380,node3=http://node3:2380 \
--initial-cluster-token=cluster-token \
--initial-advertise-peer-urls=http://node1:2380
# 6. Start etcd on all nodes
systemctl start etcd
# 7. Verify
etcdctl endpoint status --write-out=table
Compaction Strategy
# Get current revision
etcdctl endpoint status --write-out=json | jq '.[0].Status.header.revision'
# Keep roughly the last 10,000 revisions and compact everything older.
# For time-based retention (for example the last hour), start etcd with
# --auto-compaction-mode=periodic --auto-compaction-retention=1h instead.
CURRENT_REV=$(etcdctl endpoint status --write-out=json | jq -r '.[0].Status.header.revision')
etcdctl compaction $((CURRENT_REV - 10000)) # assumes CURRENT_REV is greater than 10000
# Defrag after compaction to reclaim disk space
etcdctl defrag --endpoints=https://node1:2379,https://node2:2379,https://node3:2379
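The shell arithmetic above produces a zero or negative target on a young cluster. A small guard avoids issuing that request. The sketch below reads the same JSON path the jq expression uses; the sample document is a trimmed, made-up stand-in for real `endpoint status` output, and the function name is hypothetical.

```python
import json


def compaction_target(status_json: str, keep: int = 10_000):
    """Return the revision to compact to, or None when there is too little history."""
    endpoints = json.loads(status_json)
    # Same path as the jq expression: .[0].Status.header.revision
    current = int(endpoints[0]["Status"]["header"]["revision"])
    target = current - keep
    return target if target > 0 else None


# Hypothetical, heavily trimmed sample of `etcdctl endpoint status --write-out=json`.
sample = '[{"Endpoint": "https://node1:2379", "Status": {"header": {"revision": 125000}}}]'

print(compaction_target(sample))
print(compaction_target(sample, keep=200_000))
```

A wrapper script would skip the `etcdctl compaction` call whenever the function returns `None`.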
TLS Setup with cfssl
# Install cfssl
go install github.com/cloudflare/cfssl/cmd/cfssl@latest
go install github.com/cloudflare/cfssl/cmd/cfssljson@latest
# Generate CA
cfssl gencert -initca ca-csr.json | cfssljson -bare ca
# Generate server cert
cfssl gencert -ca=ca.pem -ca-key=ca-key.pem \
-config=ca-config.json -profile=server \
server-csr.json | cfssljson -bare server
# Generate peer cert
cfssl gencert -ca=ca.pem -ca-key=ca-key.pem \
-config=ca-config.json -profile=peer \
peer-csr.json | cfssljson -bare peer
# Generate client cert
cfssl gencert -ca=ca.pem -ca-key=ca-key.pem \
-config=ca-config.json -profile=client \
client-csr.json | cfssljson -bare client
Common Workflows
Distributed Lock
# Acquire lock (etcdctl lock uses lease-based locking)
etcdctl lock my-lock
# Run command under lock
etcdctl lock my-lock -- bash -c 'echo "I have the lock"; sleep 5'
Leader Election
# Campaign for leadership (blocks until leader)
etcdctl elect my-election leader1
# In another terminal
etcdctl elect my-election leader2 # waits until leader1 exits
Kubernetes etcd Backup
# Typical K8s etcd backup (run on control plane node)
ETCDCTL_API=3 etcdctl snapshot save /backup/k8s-etcd.db \
--endpoints=https://127.0.0.1:2379 \
--cacert=/etc/kubernetes/pki/etcd/ca.crt \
--cert=/etc/kubernetes/pki/etcd/server.crt \
--key=/etc/kubernetes/pki/etcd/server.key
# Schedule via cron
# 0 2 * * * ETCDCTL_API=3 etcdctl snapshot save /backup/k8s-etcd-$(date +\%F).db ...
Monitor etcd with Prometheus
# By default etcd serves /metrics on the client port (2379).
# The 2381 targets below assume each member runs with
# --listen-metrics-urls=https://0.0.0.0:2381
# prometheus.yml
scrape_configs:
  - job_name: etcd
    scheme: https
    tls_config:
      ca_file: /etc/prometheus/etcd-ca.crt
      cert_file: /etc/prometheus/etcd-client.crt
      key_file: /etc/prometheus/etcd-client.key
    static_configs:
      - targets: ['node1:2381', 'node2:2381', 'node3:2381']
Key metrics to watch:
| Metric | What it signals |
|---|---|
| `etcd_server_leader_changes_seen_total` | Leader churn (instability) |
| `etcd_server_proposals_failed_total` | Consensus failures |
| `etcd_disk_wal_fsync_duration_seconds` | Disk write latency |
| `etcd_network_peer_round_trip_time_seconds` | Inter-node latency |
| `etcd_mvcc_db_total_size_in_bytes` | DB size (watch quota) |
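To make "watch quota" concrete, the DB size gauge can be compared with the configured backend quota. The sketch below parses Prometheus text-format output for an unlabeled gauge; the sample text and its value are invented for illustration, the helper names are hypothetical, and the 2 GiB figure is the default quota mentioned in the tips.

```python
def read_gauge(metrics_text: str, name: str) -> float:
    """Return the value of an unlabeled metric from Prometheus text-format output."""
    for line in metrics_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        parts = line.split()
        if parts[0] == name:
            return float(parts[1])
    raise KeyError(name)


def db_usage_ratio(metrics_text: str, quota_bytes: int) -> float:
    """Fraction of the backend quota currently used by the database file."""
    return read_gauge(metrics_text, "etcd_mvcc_db_total_size_in_bytes") / quota_bytes


# Invented sample of what a /metrics response might contain.
sample_metrics = """\
# TYPE etcd_mvcc_db_total_size_in_bytes gauge
etcd_mvcc_db_total_size_in_bytes 1.073741824e+09
"""

ratio = db_usage_ratio(sample_metrics, 2 * 1024 ** 3)
print(f"{ratio:.0%} of quota used")
```

In practice the same ratio is usually expressed as a Prometheus alerting rule; the Python form is only meant to show the arithmetic.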
Tips and Best Practices
- Always set `ETCDCTL_API=3`: installs older than etcd 3.4 default to v2, which has a completely different data model.
- Snapshot daily and test restores: an untested backup is not a backup; practice the restore procedure before you need it.
- Watch the DB size (`etcd_mvcc_db_total_size_in_bytes`): the default quota is 2 GiB, and once it is exceeded etcd raises a NOSPACE alarm and stops accepting writes; compact and defrag regularly.
- Three nodes is the minimum for HA: a 3-node cluster tolerates one failure; a 5-node cluster tolerates two.
- etcd is not a general-purpose database: it is designed for small configuration data (requests are capped at 1.5 MiB by default, and 8 GiB is the suggested maximum backend quota); do not store large blobs.
- Separate etcd from the workload: run etcd on dedicated nodes with fast SSDs; disk latency directly affects leader election stability.
- Enable TLS everywhere, both client-to-server and peer-to-peer: `client-cert-auth: true` rejects clients that lack a certificate signed by the trusted CA.
- Use linearizable reads when stale reads are unacceptable: `--consistency=l` is the etcdctl default and is confirmed by a quorum; `--consistency=s` (serializable) is answered locally by whichever member receives the request and can return stale data.
- Never remove a member without telling the cluster first (`etcdctl member remove`): a node that simply disappears still counts toward quorum, so one more failure can leave the cluster unable to commit writes.
- Heartbeat and election timeouts: keep `heartbeat-interval` at 100 ms and `election-timeout` at 1000 ms on a LAN; increase both by 5–10x for high-latency links.
- Auth needs a root user first: run `etcdctl user add root` before `etcdctl auth enable`; enabling auth fails while no root user exists.
- Compaction alone does not free disk space: follow it with `etcdctl defrag` to return the freed pages to the filesystem, and defragment one member at a time because defrag blocks that member while it runs.
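The failure-tolerance figures in the tips follow from majority quorum: a cluster of n members needs floor(n/2) + 1 of them to agree, and it tolerates losing the rest. A short sketch of that arithmetic (function names are illustrative):

```python
def quorum(members: int) -> int:
    """Smallest number of members that forms a majority."""
    return members // 2 + 1


def fault_tolerance(members: int) -> int:
    """How many members can fail while a majority remains."""
    return members - quorum(members)


for n in (1, 2, 3, 4, 5):
    print(f"{n} members: quorum {quorum(n)}, tolerates {fault_tolerance(n)} failure(s)")
```

The table this prints shows why even member counts are rarely worth it: four members tolerate the same single failure as three while requiring a larger quorum.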