etcd

Installation

# Download binary (Linux)
ETCD_VER=v3.5.13
curl -L https://github.com/etcd-io/etcd/releases/download/${ETCD_VER}/etcd-${ETCD_VER}-linux-amd64.tar.gz \
  | tar xz
sudo mv etcd-${ETCD_VER}-linux-amd64/etcd* /usr/local/bin/

# macOS
brew install etcd

# Docker (single node)
docker run -d \
  --name etcd \
  -p 2379:2379 \
  -p 2380:2380 \
  quay.io/coreos/etcd:v3.5.13 \
  etcd \
  --advertise-client-urls http://0.0.0.0:2379 \
  --listen-client-urls http://0.0.0.0:2379

# Verify
etcd --version
etcdctl version

Set ETCDCTL API Version

# Always use API v3 (v2 is deprecated)
export ETCDCTL_API=3

# Or prefix every command
ETCDCTL_API=3 etcdctl get foo

Configuration

Single-Node `etcd.yaml`

name: "etcd-node-1"
data-dir: /var/lib/etcd

# Client communication
listen-client-urls: https://0.0.0.0:2379
advertise-client-urls: https://etcd-node-1.example.com:2379

# Peer communication
listen-peer-urls: https://0.0.0.0:2380
initial-advertise-peer-urls: https://etcd-node-1.example.com:2380

# Cluster bootstrap
initial-cluster: etcd-node-1=https://etcd-node-1.example.com:2380
initial-cluster-token: etcd-cluster-prod
initial-cluster-state: new

# TLS — clients
client-transport-security:
  cert-file: /etc/etcd/pki/server.crt
  key-file: /etc/etcd/pki/server.key
  trusted-ca-file: /etc/etcd/pki/ca.crt
  client-cert-auth: true

# TLS — peers
peer-transport-security:
  cert-file: /etc/etcd/pki/peer.crt
  key-file: /etc/etcd/pki/peer.key
  trusted-ca-file: /etc/etcd/pki/ca.crt
  peer-client-cert-auth: true

# Performance
heartbeat-interval: 100
election-timeout: 1000
snapshot-count: 10000
max-request-bytes: 1572864   # 1.5MB
quota-backend-bytes: 8589934592  # 8GB

# Logging
log-level: info
logger: zap
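
The byte-valued settings above are easier to audit when they are derived rather than typed as literals. A minimal Python sketch, assuming binary units (MiB/GiB) as the literal values imply; the helper names and the 10x timing ratio (taken from the 100/1000 pair in this config) are illustrative assumptions, not etcd APIs.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3


def mib(n: float) -> int:
    """Convert mebibytes to bytes."""
    return int(n * MIB)


def gib(n: float) -> int:
    """Convert gibibytes to bytes."""
    return int(n * GIB)


def timing_ok(heartbeat_ms: int, election_ms: int, min_ratio: int = 10) -> bool:
    """Check that the election timeout is at least min_ratio heartbeats long.

    min_ratio=10 is an assumption that mirrors the 100 ms / 1000 ms pair above.
    """
    return election_ms >= min_ratio * heartbeat_ms


settings = {
    "max-request-bytes": mib(1.5),
    "quota-backend-bytes": gib(8),
}

for name, value in settings.items():
    print(f"{name}: {value}")
print("timing ok:", timing_ok(heartbeat_ms=100, election_ms=1000))
```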

Three-Node Cluster Config

# Node 1
etcd --name node1 \
  --data-dir /var/lib/etcd \
  --listen-client-urls http://0.0.0.0:2379 \
  --advertise-client-urls http://node1:2379 \
  --listen-peer-urls http://0.0.0.0:2380 \
  --initial-advertise-peer-urls http://node1:2380 \
  --initial-cluster node1=http://node1:2380,node2=http://node2:2380,node3=http://node3:2380 \
  --initial-cluster-token cluster-token \
  --initial-cluster-state new

# Nodes 2 and 3: same flags, changing only --name, --advertise-client-urls,
# and --initial-advertise-peer-urls; --initial-cluster must be identical on every node
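
Because the per-node flags differ only in the node name and advertised URLs, they can be generated instead of hand-edited. A small Python sketch, assuming the hostnames equal the member names (node1, node2, node3) as in the command above; `initial_cluster` and `node_flags` are hypothetical helper names.

```python
def initial_cluster(nodes, scheme="http", peer_port=2380):
    """Build the --initial-cluster value shared by every member."""
    return ",".join(f"{n}={scheme}://{n}:{peer_port}" for n in nodes)


def node_flags(name, nodes, scheme="http", client_port=2379, peer_port=2380,
               token="cluster-token"):
    """Return the etcd argument list for one member of a new cluster."""
    return [
        "--name", name,
        "--data-dir", "/var/lib/etcd",
        "--listen-client-urls", f"{scheme}://0.0.0.0:{client_port}",
        "--advertise-client-urls", f"{scheme}://{name}:{client_port}",
        "--listen-peer-urls", f"{scheme}://0.0.0.0:{peer_port}",
        "--initial-advertise-peer-urls", f"{scheme}://{name}:{peer_port}",
        "--initial-cluster", initial_cluster(nodes, scheme, peer_port),
        "--initial-cluster-token", token,
        "--initial-cluster-state", "new",
    ]


NODES = ["node1", "node2", "node3"]

for node in NODES:
    print("etcd " + " ".join(node_flags(node, NODES)))
```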

etcdctl Connection Flags

# Shorthand env vars to avoid repeating flags
export ETCDCTL_API=3
export ETCDCTL_ENDPOINTS=https://node1:2379,https://node2:2379,https://node3:2379
export ETCDCTL_CACERT=/etc/etcd/pki/ca.crt
export ETCDCTL_CERT=/etc/etcd/pki/client.crt
export ETCDCTL_KEY=/etc/etcd/pki/client.key

Core Commands

Key-Value Operations

Command	Description
`etcdctl put key value`	Set a key
`etcdctl get key`	Get a key
`etcdctl get key --print-value-only`	Value only
`etcdctl get key --hex`	Raw bytes as hex
`etcdctl get "" --prefix --keys-only`	List all keys
`etcdctl get /app/ --prefix`	Get keys with prefix
`etcdctl get key1 key2`	Get key range [key1, key2)
`etcdctl del key`	Delete a key
`etcdctl del /app/ --prefix`	Delete all keys with prefix
`etcdctl del key1 key2`	Delete key range
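
Prefix queries are range queries underneath: the range end is the prefix with its last byte incremented, and the end key is exclusive. The Python sketch below models that rule on plain byte strings; `prefix_range_end` and `in_range` are illustrative names, and the `b"\x00"` "no upper bound" convention is an assumption modeled on etcd's range semantics.

```python
def prefix_range_end(prefix: bytes) -> bytes:
    """Return the exclusive end key that covers every key starting with prefix."""
    end = bytearray(prefix)
    # Increment the last byte that is not 0xFF and drop everything after it.
    for i in reversed(range(len(end))):
        if end[i] < 0xFF:
            end[i] += 1
            return bytes(end[: i + 1])
    # Empty or all-0xFF prefix: there is no finite end key.
    return b"\x00"


def in_range(key: bytes, start: bytes, end: bytes) -> bool:
    """Half-open range check [start, end); b"\\x00" as end means no upper bound."""
    if end == b"\x00":
        return key >= start
    return start <= key < end


end = prefix_range_end(b"/app/")
print(end)
print(in_range(b"/app/config", b"/app/", end))
print(in_range(b"/apple", b"/app/", end))
```

The same half-open rule explains why `etcdctl get key1 key2` returns key1 but not key2.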

Watching Keys

Command	Description
`etcdctl watch key`	Watch a key for changes
`etcdctl watch /app/ --prefix`	Watch all keys with prefix
`etcdctl watch key --rev=5`	Watch from revision 5
`etcdctl watch -i`	Interactive watch mode
`etcdctl watch key --prev-kv`	Show previous value on change

Cluster Management

Command	Description
`etcdctl endpoint status`	Status of each endpoint
`etcdctl endpoint status --write-out=table`	Pretty table
`etcdctl endpoint health`	Health check
`etcdctl member list`	List cluster members
`etcdctl member list --write-out=table`	Pretty table
`etcdctl member add node4 --peer-urls=http://node4:2380`	Add member
`etcdctl member remove <member-id>`	Remove member
`etcdctl member update <id> --peer-urls=...`	Update peer URL
`etcdctl move-leader <member-id>`	Transfer leadership

Snapshots and Backup

Command	Description
`etcdctl snapshot save backup.db`	Save snapshot
`etcdctl snapshot status backup.db`	Snapshot info
`etcdctl snapshot status backup.db --write-out=table`	Pretty status
`etcdctl snapshot restore backup.db --data-dir=/var/lib/etcd-restore`	Restore snapshot

Compaction and Defragmentation

Command	Description
`etcdctl compaction <revision>`	Compact to revision
`etcdctl defrag`	Defragment storage
`etcdctl defrag --cluster`	Defrag all members

Auth and RBAC

Command	Description
`etcdctl auth enable`	Enable authentication
`etcdctl auth disable`	Disable authentication
`etcdctl user add alice`	Create user
`etcdctl user get alice`	Get user info
`etcdctl user list`	List users
`etcdctl user delete alice`	Delete user
`etcdctl user grant-role alice viewer`	Assign role to user
`etcdctl user revoke-role alice viewer`	Remove role from user
`etcdctl role add viewer`	Create role
`etcdctl role get viewer`	Get role permissions
`etcdctl role list`	List roles
`etcdctl role delete viewer`	Delete role
`etcdctl role grant-permission viewer read /app/ --prefix=true`	Grant read on prefix
`etcdctl role revoke-permission viewer /app/ --prefix=true`	Revoke permission on prefix

Advanced Usage

Leases (TTL Keys)

# Create a lease with 60s TTL
etcdctl lease grant 60
# Returns: lease 694d5765fc0a1567 granted with TTL(60s)

# Attach key to lease
etcdctl put session/user123 "data" --lease=694d5765fc0a1567

# Keep-alive (renew) a lease
etcdctl lease keep-alive 694d5765fc0a1567

# Revoke lease (immediately expires all attached keys)
etcdctl lease revoke 694d5765fc0a1567

# Get lease info
etcdctl lease timetolive 694d5765fc0a1567
etcdctl lease timetolive 694d5765fc0a1567 --keys  # show attached keys
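
The lease lifecycle above (grant, attach, keep-alive, revoke, expiry) can be modeled in a few lines. This is a toy in-memory sketch with an injectable clock, not the etcd implementation; `LeaseTable` and its method names are hypothetical.

```python
import time


class LeaseTable:
    """Toy model: keys attached to a lease vanish when the lease expires."""

    def __init__(self, now=time.monotonic):
        self._now = now
        self._leases = {}   # lease id -> [ttl seconds, deadline]
        self._keys = {}     # key -> (value, lease id)
        self._next_id = 1

    def grant(self, ttl):
        lease_id = self._next_id
        self._next_id += 1
        self._leases[lease_id] = [ttl, self._now() + ttl]
        return lease_id

    def put(self, key, value, lease_id):
        self._expire()
        if lease_id not in self._leases:
            raise KeyError("lease not found")
        self._keys[key] = (value, lease_id)

    def keep_alive(self, lease_id):
        self._expire()
        ttl = self._leases[lease_id][0]
        self._leases[lease_id][1] = self._now() + ttl  # reset to a full TTL

    def revoke(self, lease_id):
        self._leases.pop(lease_id, None)
        self._keys = {k: v for k, v in self._keys.items() if v[1] != lease_id}

    def get(self, key):
        self._expire()
        entry = self._keys.get(key)
        return entry[0] if entry else None

    def _expire(self):
        now = self._now()
        expired = [lid for lid, (_ttl, deadline) in self._leases.items()
                   if deadline <= now]
        for lid in expired:
            self.revoke(lid)


table = LeaseTable()
lease = table.grant(60)
table.put("session/user123", "data", lease)
print(table.get("session/user123"))
table.revoke(lease)
print(table.get("session/user123"))
```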

Transactions (Atomic Compare-and-Swap)

# etcdctl interactive transaction
etcdctl txn --interactive

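# Or non-interactively via stdin. The input is three groups separated by blank
# lines: compares, success requests, failure requests. The "compares:" style
# labels are prompts shown in interactive mode only and are not part of the input.
# create("lock") = "0" holds only while the key does not exist; a value
# comparison always fails on a missing key.
etcdctl txn <<'EOF'
create("lock") = "0"

put lock "owner-1"

get lock

EOF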

Using the Go client:

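// Compare-and-swap (CAS) in Go: take the lock only if the key does not exist.
// Assumes client, ctx, and leaseID are already set up. A value comparison
// always fails on a missing key, so compare the create revision against 0.
resp, err := client.Txn(ctx).
    If(clientv3.Compare(clientv3.CreateRevision("lock"), "=", 0)).
    Then(clientv3.OpPut("lock", "owner-1", clientv3.WithLease(leaseID))).
    Else(clientv3.OpGet("lock")).
    Commit()
if err != nil {
    log.Fatal(err)
}

if resp.Succeeded {
    fmt.Println("Lock acquired")
} else {
    // Value is a []byte, so format it as a string
    fmt.Printf("Lock already held by: %s\n", resp.Responses[0].GetResponseRange().Kvs[0].Value)
}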
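
To make the If/Then/Else flow concrete, here is a toy in-memory model in Python. It is a single-process sketch with hypothetical names (`ToyStore`, `try_lock`), not the etcd API; it assumes an absent key reports create revision 0, which is the condition an acquire-if-absent transaction tests.

```python
import threading


class ToyStore:
    """Evaluates a compare and applies one of two op lists atomically."""

    def __init__(self):
        self._lock = threading.Lock()
        self._rev = 0
        self._data = {}  # key -> (value, create revision)

    def create_revision(self, key):
        entry = self._data.get(key)
        return entry[1] if entry else 0

    def txn(self, compare, success, failure):
        # The lock stands in for the atomicity that the server provides.
        with self._lock:
            ok = compare(self)
            results = [self._apply(op) for op in (success if ok else failure)]
            return ok, results

    def _apply(self, op):
        if op[0] == "put":
            _, key, value = op
            self._rev += 1
            created = self.create_revision(key) or self._rev
            self._data[key] = (value, created)
            return None
        if op[0] == "get":
            entry = self._data.get(op[1])
            return entry[0] if entry else None
        raise ValueError(f"unknown op: {op[0]}")


def try_lock(store, key, owner):
    """Put owner under key only if the key has never been created."""
    return store.txn(
        compare=lambda s: s.create_revision(key) == 0,
        success=[("put", key, owner)],
        failure=[("get", key)],
    )


store = ToyStore()
print(try_lock(store, "lock", "owner-1"))
print(try_lock(store, "lock", "owner-2"))
```

The second call takes the failure branch and returns the current holder instead of overwriting it.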

Watching with Python Client

import etcd3

client = etcd3.client(host='localhost', port=2379)

# Put and get
client.put('/app/config', '{"timeout": 30}')
value, metadata = client.get('/app/config')
print(value.decode())

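# Watch prefix; stop after a fixed number of events (the loop blocks until events arrive)
max_events = 10
events_iterator, cancel = client.watch_prefix('/app/')
for count, event in enumerate(events_iterator, start=1):
    print(f"Event: {event.key.decode()} = {event.value.decode()}")
    if count >= max_events:
        cancel()
        break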

# Lease
lease = client.lease(60)
client.put('/lock/resource', 'owner', lease=lease)
lease.refresh()
lease.revoke()

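# Transaction: write '/lock' only if the key does not exist yet
# (create revision 0); a value comparison fails on a missing key
status, responses = client.transaction(
    compare=[client.transactions.create('/lock') == 0],
    success=[client.transactions.put('/lock', 'owner')],
    failure=[client.transactions.get('/lock')]
)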

Snapshot Restore Procedure

# 1. Save a snapshot from a healthy member while etcd is still running
SNAPSHOT=/backup/etcd-snapshot-$(date +%F).db
etcdctl snapshot save "$SNAPSHOT"

# 2. Verify snapshot
etcdctl snapshot status "$SNAPSHOT" --write-out=table

# 3. Stop etcd on ALL nodes
systemctl stop etcd

# 4. Copy the snapshot to EACH node, move the old data directory aside (restore
#    needs a data directory that does not already hold data), then restore with
#    that node's own --name and --initial-advertise-peer-urls.
#    In v3.5, etcdctl snapshot status/restore are deprecated in favor of etcdutl.
mv /var/lib/etcd /var/lib/etcd.bak
etcdctl snapshot restore "$SNAPSHOT" \
  --name=node1 \
  --data-dir=/var/lib/etcd \
  --initial-cluster=node1=http://node1:2380,node2=http://node2:2380,node3=http://node3:2380 \
  --initial-cluster-token=cluster-token \
  --initial-advertise-peer-urls=http://node1:2380

# 5. Start etcd on all nodes
systemctl start etcd

# 6. Verify
etcdctl endpoint status --write-out=table

Compaction Strategy

# Get current revision
etcdctl endpoint status --write-out=json | jq '.[0].Status.header.revision'

# Compact manually, keeping roughly the last 10,000 revisions.
# For time-based retention, etcd can compact on its own:
#   --auto-compaction-mode=periodic --auto-compaction-retention=1h
CURRENT_REV=$(etcdctl endpoint status --write-out=json | jq -r '.[0].Status.header.revision')
etcdctl compaction $((CURRENT_REV - 10000))    # keep ~10k revisions

# Defrag after compaction to reclaim disk space
etcdctl defrag --endpoints=https://node1:2379,https://node2:2379,https://node3:2379
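
The shell arithmetic above goes negative on a young cluster with fewer than 10,000 revisions. A Python sketch of the same calculation with that case guarded; it assumes the JSON shape implied by the `jq` path (`.[N].Status.header.revision`), and the sample payload is trimmed, illustrative data rather than real output.

```python
import json


def compaction_target(status_json, keep=10_000):
    """Return the revision to compact to, or None if there is nothing to compact.

    Uses the lowest revision reported across endpoints so the target is never
    ahead of any member.
    """
    statuses = json.loads(status_json)
    current = min(s["Status"]["header"]["revision"] for s in statuses)
    target = current - keep
    return target if target > 0 else None


# Illustrative, trimmed sample of `etcdctl endpoint status --write-out=json`
sample = json.dumps([
    {"Endpoint": "https://node1:2379", "Status": {"header": {"revision": 25000}}},
    {"Endpoint": "https://node2:2379", "Status": {"header": {"revision": 25003}}},
])

print(compaction_target(sample))
print(compaction_target(sample, keep=50_000))
```

A caller would pass the returned revision to `etcdctl compaction` and skip the step when the result is `None`.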

TLS Setup with cfssl

# Install cfssl
go install github.com/cloudflare/cfssl/cmd/cfssl@latest
go install github.com/cloudflare/cfssl/cmd/cfssljson@latest

# Generate CA
cfssl gencert -initca ca-csr.json | cfssljson -bare ca

# Generate server cert
cfssl gencert -ca=ca.pem -ca-key=ca-key.pem \
  -config=ca-config.json -profile=server \
  server-csr.json | cfssljson -bare server

# Generate peer cert
cfssl gencert -ca=ca.pem -ca-key=ca-key.pem \
  -config=ca-config.json -profile=peer \
  peer-csr.json | cfssljson -bare peer

# Generate client cert
cfssl gencert -ca=ca.pem -ca-key=ca-key.pem \
  -config=ca-config.json -profile=client \
  client-csr.json | cfssljson -bare client
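
The commands above reference a `ca-config.json` with `server`, `peer`, and `client` profiles that is not shown. A Python sketch that generates one possible version; the one-year expiry and the exact usage lists are assumptions to adapt, not values taken from this guide.

```python
import json


def ca_config(expiry="8760h"):
    """Build a cfssl signing config with server, peer, and client profiles."""

    def profile(*usages):
        return {
            "expiry": expiry,
            "usages": ["signing", "key encipherment", *usages],
        }

    return {
        "signing": {
            "default": {"expiry": expiry},
            "profiles": {
                # Serves TLS to clients
                "server": profile("server auth"),
                # Peers both accept and initiate connections
                "peer": profile("server auth", "client auth"),
                # Presented by etcdctl and other clients
                "client": profile("client auth"),
            },
        }
    }


# Redirect this output to ca-config.json before running cfssl gencert
print(json.dumps(ca_config(), indent=2))
```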

Common Workflows

Distributed Lock

# Acquire lock (etcdctl lock uses lease-based locking)
etcdctl lock my-lock

# Run command under lock
etcdctl lock my-lock -- bash -c 'echo "I have the lock"; sleep 5'

Leader Election

# Campaign for leadership (blocks until leader)
etcdctl elect my-election leader1

# In another terminal
etcdctl elect my-election leader2  # waits until leader1 exits

Kubernetes etcd Backup

# Typical K8s etcd backup (run on control plane node)
ETCDCTL_API=3 etcdctl snapshot save /backup/k8s-etcd.db \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key

# Schedule via cron
# 0 2 * * * ETCDCTL_API=3 etcdctl snapshot save /backup/k8s-etcd-$(date +\%F).db ...

Monitor etcd with Prometheus

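# etcd serves /metrics on the client URL (port 2379) by default. The targets
# below assume each member also sets --listen-metrics-urls=https://0.0.0.0:2381.
# prometheus.yml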
scrape_configs:
  - job_name: etcd
    scheme: https
    tls_config:
      ca_file: /etc/prometheus/etcd-ca.crt
      cert_file: /etc/prometheus/etcd-client.crt
      key_file: /etc/prometheus/etcd-client.key
    static_configs:
      - targets: ['node1:2381', 'node2:2381', 'node3:2381']

Key metrics to watch:

Metric	What it signals
`etcd_server_leader_changes_seen_total`	Leader churn (instability)
`etcd_server_proposals_failed_total`	Consensus failures
`etcd_disk_wal_fsync_duration_seconds`	Disk write latency
`etcd_network_peer_round_trip_time_seconds`	Inter-node latency
`etcd_mvcc_db_total_size_in_bytes`	DB size (watch quota)
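
For a quick check without a full Prometheus setup, the text exposition format can be parsed directly. A Python sketch that reads a metrics payload and reports DB size as a fraction of the quota; the sample text is illustrative, the 2 GiB quota is the default mentioned in the tips below, and the parser assumes samples without trailing timestamps.

```python
def parse_samples(text):
    """Parse Prometheus text format into {name (with labels): float value}."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        name, _, raw = line.rpartition(" ")
        values[name] = float(raw)
    return values


def db_usage_ratio(metrics_text, quota_bytes):
    """Fraction of the backend quota used by the DB file."""
    size = parse_samples(metrics_text)["etcd_mvcc_db_total_size_in_bytes"]
    return size / quota_bytes


# Illustrative payload, not captured from a real member
SAMPLE = """\
# TYPE etcd_mvcc_db_total_size_in_bytes gauge
etcd_mvcc_db_total_size_in_bytes 1.610612736e+09
# TYPE etcd_server_leader_changes_seen_total counter
etcd_server_leader_changes_seen_total 3
"""

DEFAULT_QUOTA = 2 * 1024 ** 3

ratio = db_usage_ratio(SAMPLE, DEFAULT_QUOTA)
print(f"DB uses {ratio:.0%} of quota")
```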

Tips and Best Practices

Always set ETCDCTL_API=3: etcdctl releases before v3.4 default to the v2 API, which has a completely different data model.
Snapshot daily and test restores: an untested backup is not a backup, so practice the restore procedure before you need it.
Watch the DB size (etcd_mvcc_db_total_size_in_bytes): the default backend quota is 2GB, and a member that exceeds it raises a NOSPACE alarm and stops accepting writes, so compact and defrag regularly.
Three nodes is the minimum for HA: a 3-node cluster tolerates one failure and a 5-node cluster tolerates two.
etcd is not a general-purpose database: it is designed for small configuration data (requests are capped at 1.5MB by default, and about 8GB is the practical ceiling for the backend), so do not store large blobs.
Separate etcd from the workload: run etcd on dedicated nodes with fast SSDs, because disk latency directly affects leader election stability.
Enable TLS everywhere, both client-to-server and peer-to-peer: client-cert-auth: true rejects clients that do not present a certificate signed by the trusted CA.
Reads are linearizable by default (--consistency=l) and are confirmed through the leader: switch to --consistency=s only when slightly stale reads served by the local member are acceptable.
Never remove a member without telling the cluster first (etcdctl member remove): abruptly removing nodes can cost the cluster its quorum.
Heartbeat and election timeouts: heartbeat-interval at 100ms and election-timeout at 1000ms suit a LAN; on high-latency links, raise heartbeat-interval to about the peer round-trip time and keep election-timeout at roughly 10 times that.
Auth needs a root user first: run etcdctl user add root before etcdctl auth enable, which refuses to run until a root user exists.
Compaction alone does not free disk space: follow it with etcdctl defrag to release pages back to the filesystem, one member at a time, because defrag blocks the member while it runs.
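
The failure-tolerance figures in the tips follow from majority quorum arithmetic. A short Python sketch (the function names are illustrative):

```python
def quorum(members: int) -> int:
    """Smallest majority of a cluster with the given member count."""
    return members // 2 + 1


def fault_tolerance(members: int) -> int:
    """How many members can fail while a majority remains."""
    return members - quorum(members)


for n in range(1, 8):
    print(f"{n} members: quorum {quorum(n)}, tolerates {fault_tolerance(n)} failure(s)")
```

An even member count raises the quorum without raising the tolerance (4 members tolerate one failure, the same as 3), which is why odd cluster sizes are preferred.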
