Thanos Cheat Sheet
Overview
Thanos is an open-source project that extends Prometheus with long-term storage, high availability, and a global query view across multiple Prometheus instances. It integrates with existing Prometheus deployments through a sidecar that runs alongside each Prometheus server and uploads its TSDB blocks to object storage (S3, GCS, Azure Blob, MinIO). Thanos Query provides a single query interface that fans each query out in parallel to all connected Prometheus instances and, through the Store Gateway, to object storage.
The Thanos architecture consists of several components: Sidecar (uploads blocks and serves recent data from its local Prometheus), Store Gateway (serves historical data from object storage), Query (aggregates data from all sources with deduplication), Compactor (compacts and downsamples blocks in object storage and applies retention), Ruler (evaluates recording and alerting rules), and Receive (implements the Prometheus remote-write API for push-based ingestion). Each component scales independently, so the same building blocks serve deployments ranging from a few Prometheus servers to thousands of instances across many clusters.
Installation
Binary Installation
```bash
# Download Thanos
wget https://github.com/thanos-io/thanos/releases/download/v0.35.0/thanos-0.35.0.linux-amd64.tar.gz
tar xzf thanos-0.35.0.linux-amd64.tar.gz
sudo mv thanos-0.35.0.linux-amd64/thanos /usr/local/bin/
# Verify
thanos --version
```
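To pin a different release or architecture without editing three lines by hand, the download URL can be built from variables. This is a minimal sketch that assumes the release asset naming shown above (`thanos-<version>.linux-<arch>.tar.gz`) also holds for the version you pick; `thanos_release_url` is a hypothetical helper name, not part of Thanos.

```bash
# Hypothetical helper: build the release tarball URL for a version/arch.
# Assumes the asset naming used in the install commands above.
thanos_release_url() {
  local version="${1:?usage: thanos_release_url VERSION [ARCH]}"
  local arch="${2:-amd64}"
  printf 'https://github.com/thanos-io/thanos/releases/download/v%s/thanos-%s.linux-%s.tar.gz\n' \
    "$version" "$version" "$arch"
}

# Usage (needs network access):
#   VERSION=0.35.0
#   wget "$(thanos_release_url "$VERSION")"
#   tar xzf "thanos-${VERSION}.linux-amd64.tar.gz"
#   sudo mv "thanos-${VERSION}.linux-amd64/thanos" /usr/local/bin/
```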
Helm (Kubernetes)
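```bash
# Bitnami Thanos chart
helm repo add bitnami https://charts.bitnami.com/bitnami
# --set-file passes the multi-line objstore.yaml verbatim;
# --set would try to parse its commas and newlines as values
helm install thanos bitnami/thanos \
  --namespace monitoring --create-namespace \
  --set-file objstoreConfig=objstore.yaml
```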
Object Store Configuration
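```yaml
# objstore.yaml (AWS S3)
type: S3
config:
  bucket: thanos-metrics
  endpoint: s3.amazonaws.com
  region: us-east-1
  access_key: AKIAIOSFODNN7EXAMPLE
  secret_key: wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
```
```yaml
# objstore.yaml (GCS)
type: GCS
config:
  bucket: thanos-metrics
  # service_account expects the key's JSON content, not a file path.
  # To use the key file, leave service_account unset and set
  # GOOGLE_APPLICATION_CREDENTIALS=/etc/thanos/gcs-sa.json instead.
```
```yaml
# objstore.yaml (MinIO / S3-compatible)
type: S3
config:
  bucket: thanos-metrics
  endpoint: minio.monitoring:9000
  access_key: minioadmin
  secret_key: minioadmin
  insecure: true
```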
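Hardcoding keys in objstore.yaml, as in the S3 examples above, makes the file a secret. One option is to render it at deploy time from environment variables. This is a sketch under stated assumptions: `write_s3_objstore` and the `THANOS_*` variable names are hypothetical, and leaving the keys out when they are unset assumes your Thanos S3 client then falls back to its default credential chain (environment variables or an IAM/instance role).

```bash
# Hypothetical helper: render an S3 objstore.yaml from environment variables.
# THANOS_BUCKET is required; endpoint and region default to the values above.
write_s3_objstore() {
  local out="${1:?usage: write_s3_objstore OUTPUT_FILE}"
  : "${THANOS_BUCKET:?set THANOS_BUCKET}"
  : "${THANOS_S3_ENDPOINT:=s3.amazonaws.com}"
  : "${THANOS_S3_REGION:=us-east-1}"
  {
    printf 'type: S3\n'
    printf 'config:\n'
    printf '  bucket: %s\n' "$THANOS_BUCKET"
    printf '  endpoint: %s\n' "$THANOS_S3_ENDPOINT"
    printf '  region: %s\n' "$THANOS_S3_REGION"
    # Write static keys only when both are present (assumption: otherwise
    # the S3 client uses its default credential chain).
    if [ -n "${AWS_ACCESS_KEY_ID:-}" ] && [ -n "${AWS_SECRET_ACCESS_KEY:-}" ]; then
      printf '  access_key: "%s"\n' "$AWS_ACCESS_KEY_ID"
      printf '  secret_key: "%s"\n' "$AWS_SECRET_ACCESS_KEY"
    fi
  } > "$out"
  chmod 600 "$out"   # the file may contain credentials
}

# Usage:
#   THANOS_BUCKET=thanos-metrics write_s3_objstore /etc/thanos/objstore.yaml
```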
Components
Sidecar (alongside Prometheus)
```bash
thanos sidecar \
  --tsdb.path=/var/prometheus/data \
  --prometheus.url=http://localhost:9090 \
  --objstore.config-file=/etc/thanos/objstore.yaml \
  --grpc-address=0.0.0.0:10901 \
  --http-address=0.0.0.0:10902
```
Prometheus config for Thanos:
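```yaml
# prometheus.yml - required settings
global:
  scrape_interval: 15s
  external_labels:
    cluster: production
    replica: prometheus-0   # must differ between HA replicas
# Enable the admin API for the sidecar: --web.enable-admin-api
# Start Prometheus with: --storage.tsdb.min-block-duration=2h --storage.tsdb.max-block-duration=2h
# (equal min/max durations disable local compaction, which sidecar uploads require)
```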
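Deduplication only works if every HA replica carries the same `cluster` label but a distinct `replica` label. A common way to get that is to render the global block per instance, for example from the pod hostname in a StatefulSet. This is a sketch: `render_prometheus_global` is a hypothetical helper, and it emits only the `global` section, so scrape configs still have to be appended separately.

```bash
# Hypothetical helper: render the Thanos-relevant global block of prometheus.yml.
# cluster is shared by all replicas; replica must be unique per instance.
render_prometheus_global() {
  local cluster="${1:?usage: render_prometheus_global CLUSTER REPLICA}"
  local replica="${2:?usage: render_prometheus_global CLUSTER REPLICA}"
  cat <<EOF
global:
  scrape_interval: 15s
  external_labels:
    cluster: ${cluster}
    replica: ${replica}
EOF
}

# Usage, e.g. in a pod whose hostname is prometheus-0:
#   render_prometheus_global production "$(hostname)" > /etc/prometheus/prometheus.yml
```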
Query (Global Querier)
```bash
thanos query \
  --http-address=0.0.0.0:9090 \
  --grpc-address=0.0.0.0:10901 \
  --store=prometheus-sidecar-1:10901 \
  --store=prometheus-sidecar-2:10901 \
  --store=thanos-store-gateway:10901 \
  --store=thanos-ruler:10901 \
  --query.replica-label=replica \
  --query.auto-downsampling
```
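As the number of sidecars grows, keeping one `--store` line per endpoint in a long command gets error-prone. A small sketch of generating the flags from a list instead; `store_flags` is a hypothetical helper, and the `mapfile` usage in the comment requires bash.

```bash
# Hypothetical helper: print one --store=<endpoint> flag per argument.
store_flags() {
  local ep
  for ep in "$@"; do
    printf '%s\n' "--store=$ep"
  done
}

# Usage (bash), with the endpoints from the command above:
#   endpoints=(prometheus-sidecar-1:10901 prometheus-sidecar-2:10901
#              thanos-store-gateway:10901 thanos-ruler:10901)
#   mapfile -t flags < <(store_flags "${endpoints[@]}")
#   thanos query --http-address=0.0.0.0:9090 --grpc-address=0.0.0.0:10901 \
#     --query.replica-label=replica --query.auto-downsampling "${flags[@]}"
```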
Store Gateway (Object Storage)
```bash
thanos store \
  --data-dir=/var/thanos/store \
  --objstore.config-file=/etc/thanos/objstore.yaml \
  --grpc-address=0.0.0.0:10901 \
  --http-address=0.0.0.0:10902 \
  --index-cache-size=500MB \
  --chunk-pool-size=2GB
```
Compactor
```bash
thanos compact \
  --data-dir=/var/thanos/compact \
  --objstore.config-file=/etc/thanos/objstore.yaml \
  --http-address=0.0.0.0:10902 \
  --retention.resolution-raw=30d \
  --retention.resolution-5m=90d \
  --retention.resolution-1h=365d \
  --compact.concurrency=2 \
  --downsample.concurrency=2 \
  --wait
```
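The retention values above grow with coarser resolution (raw 30d, 5m 90d, 1h 365d), so long-range queries can still use downsampled data after raw samples expire. Below is a sketch of a pre-flight check for that ordering. It is a convention this example follows, not something the compactor is claimed to enforce; the helpers are hypothetical, handle whole-day values only, and treat `0d` (which Thanos interprets as "keep forever") as unbounded.

```bash
# Hypothetical helper: convert "<N>d" to a day count; 0d means forever.
days_of() {
  local v="$1" n
  case "$v" in
    *d) n="${v%d}" ;;
    *) return 1 ;;
  esac
  case "$n" in
    ''|*[!0-9]*) return 1 ;;
  esac
  if [ "$n" -eq 0 ]; then echo 999999999; else echo "$n"; fi
}

# Succeeds when raw <= 5m <= 1h retention; returns 2 on unparsable input.
check_retention_order() {
  local raw m5 h1
  raw="$(days_of "$1")" || return 2
  m5="$(days_of "$2")" || return 2
  h1="$(days_of "$3")" || return 2
  [ "$raw" -le "$m5" ] && [ "$m5" -le "$h1" ]
}

# Usage:
#   check_retention_order 30d 90d 365d && thanos compact ... \
#     --retention.resolution-raw=30d --retention.resolution-5m=90d \
#     --retention.resolution-1h=365d
```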
Ruler
```bash
# Quote the glob so Thanos, not the shell, expands it
thanos rule \
  --data-dir=/var/thanos/ruler \
  --objstore.config-file=/etc/thanos/objstore.yaml \
  --rule-file='/etc/thanos/rules/*.yaml' \
  --query=thanos-query:9090 \
  --alertmanagers.url=http://alertmanager:9093 \
  --grpc-address=0.0.0.0:10901 \
  --http-address=0.0.0.0:10902 \
  --label='ruler_cluster="production"'
```
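The Ruler reads rule files in the standard Prometheus rule format. A sketch that writes one recording rule and one alert into the directory matched by `--rule-file`; the function name, group name, rule names, and expressions are illustrative only.

```bash
# Hypothetical helper: write an example Prometheus-format rules file.
write_example_rules() {
  local dir="${1:?usage: write_example_rules DIR}"
  mkdir -p "$dir"
  # Quoted 'EOF' keeps {{ $labels... }} templates literal.
  cat > "$dir/example.yaml" <<'EOF'
groups:
  - name: example
    interval: 1m
    rules:
      - record: job:up:sum
        expr: sum by (job) (up)
      - alert: TargetDown
        expr: up == 0
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "{{ $labels.job }} target {{ $labels.instance }} is down"
EOF
}

# Usage:
#   write_example_rules /etc/thanos/rules
#   promtool check rules /etc/thanos/rules/example.yaml   # if promtool is installed
```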
Receive (Push-Based Ingestion)
```bash
thanos receive \
  --tsdb.path=/var/thanos/receive \
  --objstore.config-file=/etc/thanos/objstore.yaml \
  --grpc-address=0.0.0.0:10901 \
  --http-address=0.0.0.0:10902 \
  --remote-write.address=0.0.0.0:19291 \
  --label='receive_cluster="production"' \
  --receive.hashrings-file=/etc/thanos/hashrings.json
```
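The hashrings file tells each receiver which peers share the incoming series. A sketch that renders a single `default` hashring as a JSON array of `{hashring, endpoints}` objects; the receiver names are assumptions, and each receiver also needs to know which entry is itself (recent versions use `--receive.local-endpoint`; check the flags of your version). The commented snippet shows the Prometheus side pointing at the `--remote-write.address` port.

```bash
# Hypothetical helper: write hashrings.json with one "default" ring
# listing the receivers' gRPC endpoints.
write_hashrings() {
  local out="${1:?usage: write_hashrings FILE ENDPOINT...}"
  shift
  [ "$#" -gt 0 ] || return 1   # an empty ring is almost certainly a mistake
  local first=1 ep
  {
    printf '[\n  {\n    "hashring": "default",\n    "endpoints": ['
    for ep in "$@"; do
      if [ "$first" -eq 1 ]; then first=0; else printf ', '; fi
      printf '"%s"' "$ep"
    done
    printf ']\n  }\n]\n'
  } > "$out"
}

# Usage:
#   write_hashrings /etc/thanos/hashrings.json \
#     thanos-receive-0:10901 thanos-receive-1:10901 thanos-receive-2:10901
#
# prometheus.yml on the sending side:
#   remote_write:
#     - url: http://thanos-receive:19291/api/v1/receive
```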
Kubernetes Deployment
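```yaml
# Thanos Sidecar in Prometheus StatefulSet
# (ConfigMap/Secret names and the volume size are placeholders)
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: prometheus
spec:
  serviceName: prometheus
  selector:
    matchLabels:
      app: prometheus
  template:
    metadata:
      labels:
        app: prometheus
    spec:
      containers:
        - name: prometheus
          image: prom/prometheus:v2.51.0
          args:
            - "--config.file=/etc/prometheus/prometheus.yml"
            # args replace the image defaults, so set the TSDB path explicitly
            - "--storage.tsdb.path=/prometheus"
            - "--storage.tsdb.min-block-duration=2h"
            - "--storage.tsdb.max-block-duration=2h"
            - "--storage.tsdb.retention.time=6h"
            - "--web.enable-lifecycle"
          volumeMounts:
            - name: data
              mountPath: /prometheus
            - name: prometheus-config
              mountPath: /etc/prometheus
        - name: thanos-sidecar
          image: quay.io/thanos/thanos:v0.35.0
          args:
            - sidecar
            - "--tsdb.path=/prometheus"
            - "--prometheus.url=http://localhost:9090"
            - "--objstore.config-file=/etc/thanos/objstore.yaml"
          ports:
            - name: grpc
              containerPort: 10901
            - name: http
              containerPort: 10902
          volumeMounts:
            # the sidecar must see the same TSDB directory as Prometheus
            - name: data
              mountPath: /prometheus
            - name: objstore-config
              mountPath: /etc/thanos
      volumes:
        - name: prometheus-config
          configMap:
            name: prometheus-config      # placeholder: holds prometheus.yml
        - name: objstore-config
          secret:
            secretName: thanos-objstore  # placeholder: holds objstore.yaml
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 50Gi                # placeholder size
```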
Core Commands
| Command | Description |
|---|---|
| thanos sidecar | Run sidecar alongside Prometheus |
| thanos query | Run global query layer |
| thanos store | Run store gateway for object storage |
| thanos compact | Run compactor for downsampling and retention |
| thanos rule | Run ruler for alerting/recording rules |
| thanos receive | Run push-based receiver |
| thanos tools bucket verify | Verify object store bucket integrity |
| thanos tools bucket inspect | Inspect blocks in bucket |
| thanos tools bucket ls | List blocks in bucket |
Bucket Tools
```bash
# List blocks in bucket
thanos tools bucket ls --objstore.config-file=objstore.yaml
# Verify bucket integrity
thanos tools bucket verify --objstore.config-file=objstore.yaml
# Inspect block details
thanos tools bucket inspect --objstore.config-file=objstore.yaml
# Delete blocks marked for deletion and clean up aborted partial uploads
thanos tools bucket cleanup --objstore.config-file=objstore.yaml
```
Advanced Usage
Query Frontend (Caching)
```bash
thanos query-frontend \
  --http-address=0.0.0.0:9090 \
  --query-frontend.downstream-url=http://thanos-query:9090 \
  --query-range.split-interval=24h \
  --query-range.max-retries-per-request=3 \
  --query-frontend.log-queries-longer-than=10s \
  --cache-compression-type=snappy
```
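The command above enables compression for the results cache but does not say which cache to use. A sketch that writes an in-memory response-cache config; the `IN-MEMORY` type, its keys, and the `--query-range.response-cache-config-file` flag follow the Thanos query-frontend docs as best recalled here, so verify them against your version. The helper name and the 512MB default are assumptions.

```bash
# Hypothetical helper: write an in-memory results-cache config.
write_frontend_cache_config() {
  local out="${1:?usage: write_frontend_cache_config FILE [MAX_SIZE]}"
  local max_size="${2:-512MB}"
  cat > "$out" <<EOF
type: IN-MEMORY
config:
  max_size: "${max_size}"
  max_size_items: 0
  validity: 0s
EOF
}

# Usage:
#   write_frontend_cache_config /etc/thanos/frontend-cache.yaml 1GB
#   thanos query-frontend ... \
#     --query-range.response-cache-config-file=/etc/thanos/frontend-cache.yaml
```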
Multi-Cluster Setup
```bash
# Cluster A: Prometheus + Sidecar with labels
# external_labels: {cluster: "cluster-a", replica: "0"}
# Cluster B: Prometheus + Sidecar with labels
# external_labels: {cluster: "cluster-b", replica: "0"}
# Central Query across clusters
thanos query \
  --store=cluster-a-sidecar:10901 \
  --store=cluster-b-sidecar:10901 \
  --store=store-gateway:10901 \
  --query.replica-label=replica
```
Troubleshooting
| Issue | Solution |
|---|---|
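| Sidecar not uploading blocks | Ensure Prometheus runs with both --storage.tsdb.min-block-duration=2h and --storage.tsdb.max-block-duration=2h, and that the sidecar has --objstore.config-file set |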
| Gaps in historical data | Run thanos tools bucket verify and check compactor logs |
| Query returns duplicates | Set --query.replica-label matching Prometheus replica external label |
| Store gateway high memory | Reduce --index-cache-size; enable --store.enable-index-header-lazy-reader |
| Compactor halted | Check for overlapping blocks; use thanos tools bucket verify --repair |
| Slow queries on old data | Ensure compactor downsampling is working; enable query frontend caching |
| Object store permission errors | Verify IAM/SA credentials in objstore.yaml config |