Ceph Cheat Sheet

Overview

Ceph is an open-source distributed storage platform that provides unified object storage (RADOS Gateway/RGW, S3/Swift compatible), block storage (RBD for VMs and containers), and file system storage (CephFS, POSIX-compliant) on a single cluster. Built on RADOS (Reliable Autonomic Distributed Object Store), Ceph distributes data across commodity hardware using the CRUSH algorithm, eliminating single points of failure. It automatically rebalances data when nodes are added or removed and self-heals when hardware fails.

Ceph’s architecture consists of Monitors (MON) for cluster state and consensus, Managers (MGR) for metrics and dashboards, Object Storage Daemons (OSD) that store the actual data on disks, and Metadata Servers (MDS) for CephFS file metadata. Modern Ceph deployments use cephadm, which runs each daemon in a container and manages the cluster over SSH. Ceph is a widely used storage backend for OpenStack and Kubernetes deployments, running at petabyte scale in production at enterprises, research institutions, and cloud providers.

Installation

Cephadm Bootstrap (Recommended)

# Install cephadm
curl --silent --remote-name --location https://download.ceph.com/rpm-reef/el9/noarch/cephadm
chmod +x cephadm
sudo mv cephadm /usr/local/bin/

# Bootstrap new cluster
sudo cephadm bootstrap \
  --mon-ip 10.0.0.1 \
  --initial-dashboard-user admin \
  --initial-dashboard-password changeme \
  --dashboard-password-noupdate

# Install ceph CLI tools
sudo cephadm install ceph-common

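# Add hosts to cluster (install the cluster's SSH public key on each new host first)
ssh-copy-id -f -i /etc/ceph/ceph.pub root@node2
ssh-copy-id -f -i /etc/ceph/ceph.pub root@node3
sudo ceph orch host add node2 10.0.0.2
sudo ceph orch host add node3 10.0.0.3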

# Add all available disks as OSDs
sudo ceph orch apply osd --all-available-devices
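
After bootstrap and OSD creation the cluster usually needs a few minutes to settle. A minimal sketch of a polling helper, assuming `ceph health` is runnable on this node; `wait_for_health_ok` and its defaults are hypothetical, not part of Ceph:

```shell
# Poll `ceph health` until it reports HEALTH_OK or the attempts run out.
# wait_for_health_ok is a hypothetical helper; args: attempts, delay in seconds.
wait_for_health_ok() {
  local attempts="${1:-30}"
  local delay="${2:-10}"
  local i=1
  local status=""
  while [ "$i" -le "$attempts" ]; do
    status="$(ceph health 2>/dev/null || true)"
    case "$status" in
      HEALTH_OK*)
        echo "cluster healthy after $i check(s)"
        return 0
        ;;
    esac
    sleep "$delay"
    i=$(( i + 1 ))
  done
  echo "still not healthy: ${status:-no response}" >&2
  return 1
}

# Usage: wait_for_health_ok 60 5
```

The helper only reads cluster state, so it is safe to rerun; a non-zero exit means the last reported status was something other than HEALTH_OK.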

Core Commands

Command	Description
`ceph status`	Cluster health and summary
`ceph health detail`	Detailed health information
`ceph osd status`	OSD status overview
`ceph osd tree`	OSD tree with hosts and weights
`ceph df`	Cluster disk usage
`ceph osd pool ls`	List storage pools
`ceph mon stat`	Monitor status
`ceph mgr services`	Manager endpoints (dashboard URL)
`ceph orch ls`	List orchestrated services
`ceph orch ps`	List running daemons
`ceph pg stat`	Placement group statistics

Cluster Health and Status

# Quick status
ceph -s

# Detailed health
ceph health detail

# Watch cluster events in real time
ceph -w

# OSD commit/apply latency
ceph osd perf

# Cluster capacity
ceph df detail

# OSD utilization
ceph osd df tree

Pool Management

# Create replicated pool
ceph osd pool create mypool 128 128 replicated

# Create erasure-coded pool (more space efficient)
ceph osd pool create ec-pool 128 128 erasure

# Set pool replication size
ceph osd pool set mypool size 3
ceph osd pool set mypool min_size 2

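# Tag the pool with the application that will use it (choose one per pool)
ceph osd pool application enable mypool rbd       # block storage
# ceph osd pool application enable mypool rgw     # object storage
# ceph osd pool application enable mypool cephfs  # file system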

# Pool statistics
ceph osd pool stats mypool

# Set pool quotas
ceph osd pool set-quota mypool max_bytes 1099511627776  # 1 TiB
ceph osd pool set-quota mypool max_objects 1000000

# List pools with details
ceph osd pool ls detail

# Delete pool (name given twice; monitors must allow pool deletion)
ceph config set mon mon_allow_pool_delete true
ceph osd pool delete mypool mypool --yes-i-really-really-mean-it
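
The `128 128` arguments above are the pool's `pg_num` and `pgp_num`. When the PG autoscaler is not in use, a commonly cited rule of thumb is (OSD count × ~100) / replica size, rounded to a power of two. A rough sketch of that arithmetic; `suggest_pg_num` is a hypothetical helper, it rounds up, and it assumes a single pool holding most of the data:

```shell
# Rule-of-thumb PG count: (OSDs * target PGs per OSD) / replica size,
# rounded up to the next power of two.
# suggest_pg_num is a hypothetical helper; args: osd_count, pool_size, [pgs_per_osd].
suggest_pg_num() {
  local osds="$1"
  local size="$2"
  local per_osd="${3:-100}"
  local raw="$(( osds * per_osd / size ))"
  local pg=1
  while [ "$pg" -lt "$raw" ]; do
    pg=$(( pg * 2 ))
  done
  echo "$pg"
}

# Usage: ceph osd pool create mypool "$(suggest_pg_num 3 3)"
```

With 3 OSDs and size 3 this yields 128, the value used in the examples above. Treat the result as a starting point only.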

Block Storage (RBD)

# Create RBD image
rbd create --size 100G mypool/myimage

# List images
rbd ls mypool

# Show image info
rbd info mypool/myimage

# Map image to block device
sudo rbd map mypool/myimage
# Creates /dev/rbd0

# Format and mount
sudo mkfs.xfs /dev/rbd0
sudo mkdir -p /mnt/rbd
sudo mount /dev/rbd0 /mnt/rbd

# Create snapshot
rbd snap create mypool/myimage@snap1

# List snapshots
rbd snap ls mypool/myimage

# Clone from snapshot (protect the snapshot first)
rbd snap protect mypool/myimage@snap1
rbd clone mypool/myimage@snap1 mypool/clone1

# Resize image
rbd resize --size 200G mypool/myimage

# Unmap
sudo umount /mnt/rbd
sudo rbd unmap /dev/rbd0
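
`rbd resize` changes only the image size; the filesystem on it has to be grown separately. A minimal sketch for the XFS case used above, assuming the image is mapped and mounted and that the kernel client has picked up the new size; `grow_rbd_xfs` is a hypothetical helper:

```shell
# Grow an RBD image, then grow the XFS filesystem mounted from it.
# grow_rbd_xfs is a hypothetical helper; args: image spec, new size, mount point.
grow_rbd_xfs() {
  local image="$1"
  local new_size="$2"
  local mountpoint="$3"
  rbd resize --size "$new_size" "$image" || return 1
  # xfs_growfs works online and takes the mount point, not the device.
  sudo xfs_growfs "$mountpoint"
}

# Usage: grow_rbd_xfs mypool/myimage 200G /mnt/rbd
```

Other filesystems need their own tool at the second step (for example `resize2fs` for ext4).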

File System (CephFS)

# Create CephFS
ceph fs volume create myfs

# List file systems
ceph fs ls

# Show status
ceph fs status myfs

# Mount CephFS (kernel client; secretfile=<path> keeps the key out of shell history)
sudo mount -t ceph mon1:6789:/ /mnt/cephfs -o name=admin,secret=<key>

# Mount CephFS (FUSE client)
sudo ceph-fuse /mnt/cephfs

# Create subvolume
ceph fs subvolume create myfs mysub

# Set quota on directory
setfattr -n ceph.quota.max_bytes -v 107374182400 /mnt/cephfs/data  # 100 GiB
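
Quota values are raw byte counts, which are easy to mistype. A small sketch that derives them from GiB; `gib_to_bytes` and `set_dir_quota_gib` are hypothetical helpers, and the directory must sit on a mounted CephFS:

```shell
# Convert GiB to bytes (1 GiB = 1024^3 bytes).
gib_to_bytes() {
  echo "$(( $1 * 1024 * 1024 * 1024 ))"
}

# Set a CephFS directory quota given in GiB.
# set_dir_quota_gib is a hypothetical helper; args: directory, size in GiB.
set_dir_quota_gib() {
  local dir="$1"
  local gib="$2"
  setfattr -n ceph.quota.max_bytes -v "$(gib_to_bytes "$gib")" "$dir"
}

# Usage: set_dir_quota_gib /mnt/cephfs/data 100
# Check: getfattr -n ceph.quota.max_bytes /mnt/cephfs/data
```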

Object Storage (RGW)

# Deploy RGW service
ceph orch apply rgw myrgw --placement="3 node1 node2 node3" --port=8080

# Create RGW user
radosgw-admin user create --uid=myuser --display-name="My User" --access-key=MYACCESSKEY --secret-key=MYSECRETKEY

# List users
radosgw-admin user list

# Create bucket with aws-cli (s3cmd also works); authenticate with the RGW user's keys
aws --endpoint-url http://localhost:8080 s3 mb s3://mybucket

# Get user info
radosgw-admin user info --uid=myuser

# Usage statistics (requires rgw_enable_usage_log=true)
radosgw-admin usage show --uid=myuser
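
The `aws` command above needs the RGW user's keys and the endpoint on every call. A minimal sketch of a wrapper that supplies both through the standard `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` environment variables; `rgw_s3`, `RGW_ENDPOINT`, `RGW_ACCESS_KEY`, and `RGW_SECRET_KEY` are hypothetical names, and the defaults are the placeholder values from this section:

```shell
# Defaults mirror the placeholder values used above; override for a real cluster.
RGW_ENDPOINT="${RGW_ENDPOINT:-http://localhost:8080}"

# Run an `aws s3` subcommand against RGW with the user's keys.
rgw_s3() {
  AWS_ACCESS_KEY_ID="${RGW_ACCESS_KEY:-MYACCESSKEY}" \
  AWS_SECRET_ACCESS_KEY="${RGW_SECRET_KEY:-MYSECRETKEY}" \
    aws --endpoint-url "$RGW_ENDPOINT" s3 "$@"
}

# Usage:
#   rgw_s3 mb s3://mybucket
#   rgw_s3 cp ./file.txt s3://mybucket/
#   rgw_s3 ls s3://mybucket
```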

Configuration

CRUSH Rules

# Show CRUSH map
ceph osd crush dump

# Create rule for SSD placement
ceph osd crush rule create-replicated ssd-rule default host ssd

# Apply rule to pool
ceph osd pool set fast-pool crush_rule ssd-rule

# Set OSD device class (clear an existing class first with `ceph osd crush rm-device-class`)
ceph osd crush set-device-class ssd osd.0 osd.1 osd.2

Ceph Dashboard

# Enable dashboard
ceph mgr module enable dashboard
ceph dashboard create-self-signed-cert

# Get dashboard URL
ceph mgr services

# Set credentials
ceph dashboard ac-user-create admin -i <password-file> administrator

# Enable monitoring stack
ceph mgr module enable prometheus
ceph orch apply prometheus
ceph orch apply grafana
ceph orch apply alertmanager

Advanced Usage

Erasure Coding Profiles

# Create erasure code profile (k=4 data, m=2 parity)
ceph osd erasure-code-profile set myprofile k=4 m=2 crush-failure-domain=host

# Create pool with profile
ceph osd pool create ec-data 128 128 erasure myprofile

# Show profile
ceph osd erasure-code-profile get myprofile
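
The space saving of a profile follows from k and m: each object is split into k data chunks plus m coding chunks, so usable capacity is roughly raw × k / (k + m). A back-of-the-envelope sketch comparing that with replication; `ec_usable` and `replica_usable` are hypothetical helpers using integer arithmetic, and they ignore metadata overhead and full-ratio headroom:

```shell
# Approximate usable capacity of an erasure-coded pool: raw * k / (k + m).
# Args: raw capacity (any unit), k, m.
ec_usable() {
  local raw="$1"
  local k="$2"
  local m="$3"
  echo "$(( raw * k / (k + m) ))"
}

# Approximate usable capacity of a replicated pool: raw / replica count.
# Args: raw capacity (any unit), replica count.
replica_usable() {
  echo "$(( $1 / $2 ))"
}

# Usage: ec_usable 600 4 2        (k=4, m=2 on 600 TB raw)
#        replica_usable 600 3     (size=3 on 600 TB raw)
```

For 600 TB raw, k=4 m=2 gives about 400 TB usable while 3-way replication gives about 200 TB. With `crush-failure-domain=host`, the profile needs at least k + m hosts (6 here) to place every chunk.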

OSD Management

# Add OSD manually
ceph orch daemon add osd node1:/dev/sdb

# Remove OSD (graceful)
ceph osd out osd.5
ceph orch osd rm 5

# Check OSD removal status
ceph orch osd rm status

# Set OSD flags
ceph osd set noout       # Keep down OSDs from being marked out (no rebalancing) during maintenance
ceph osd unset noout     # Re-enable

# Repair inconsistent PGs
ceph pg repair 1.2a
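
It is easy to forget to clear `noout` after maintenance. A minimal sketch of a wrapper that sets the flag, runs a command, and always clears the flag afterwards; `with_noout` is a hypothetical helper, not a Ceph command:

```shell
# Run a maintenance command with noout set, then clear the flag
# whether or not the command succeeded.
with_noout() {
  local rc=0
  ceph osd set noout || return 1
  "$@" || rc=$?
  ceph osd unset noout
  return "$rc"
}

# Usage: with_noout ssh node2 sudo reboot
```

The wrapper returns the exit status of the wrapped command, so a failed maintenance step is still visible to the caller.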

Performance Tuning

# Throttle OSD recovery (under the mClock scheduler these may be ignored unless osd_mclock_override_recovery_settings is true)
ceph config set osd osd_recovery_max_active 3
ceph config set osd osd_recovery_sleep 0.1

# Enable BlueStore compression
ceph osd pool set mypool compression_algorithm snappy
ceph osd pool set mypool compression_mode aggressive

# Set PG autoscaler
ceph osd pool set mypool pg_autoscale_mode on

Troubleshooting

Issue	Solution
`HEALTH_WARN: too few PGs per OSD`	Increase PG count or enable pg_autoscale_mode
OSD keeps crashing	Check `ceph crash ls` and `ceph crash info <id>`; review OSD logs (`cephadm logs --name osd.X`, or `/var/log/ceph/` when file logging is enabled)
Slow operations	Run `ceph daemon osd.X dump_ops_in_flight`; check disk latency
Recovery blocking I/O	Throttle recovery: `ceph config set osd osd_recovery_max_active 1`
`HEALTH_ERR: X osds down`	Check hardware, network, and OSD logs; restart with `ceph orch daemon restart osd.X`
Inconsistent PGs	Run `ceph pg repair <pgid>` after checking `ceph health detail`
Full OSDs	Act on the nearfull warning (threshold set with `ceph osd set-nearfull-ratio`); add disks or reweight with `ceph osd reweight`