
Thanos Cheat Sheet

Overview

Thanos is an open-source project that extends Prometheus with long-term storage, high availability, and a global query view across multiple Prometheus instances. It integrates with existing Prometheus deployments by running a sidecar alongside each Prometheus server, and the sidecar uploads TSDB blocks to object storage (S3, GCS, Azure Blob, MinIO). Thanos Query provides a single query interface that fans each query out to all connected Prometheus instances (through their sidecars) and to object storage (through the Store Gateway).

The Thanos architecture consists of several components: Sidecar (uploads blocks and serves real-time data), Store Gateway (serves historical data from object storage), Query (aggregates data from all sources with deduplication), Compactor (downsamples and compacts blocks for efficiency), Ruler (evaluates recording and alerting rules), and Receive (implements Prometheus remote-write for push-based ingestion). Each component is independently scalable, making Thanos suitable for deployments ranging from a few servers to thousands of Prometheus instances across multiple clusters.

Installation

Binary Installation

# Download Thanos
wget https://github.com/thanos-io/thanos/releases/download/v0.35.0/thanos-0.35.0.linux-amd64.tar.gz
tar xzf thanos-0.35.0.linux-amd64.tar.gz
sudo mv thanos-0.35.0.linux-amd64/thanos /usr/local/bin/

# Verify
thanos --version

Helm (Kubernetes)

# Bitnami Thanos chart
helm repo add bitnami https://charts.bitnami.com/bitnami
helm install thanos bitnami/thanos \
  --namespace monitoring --create-namespace \
  --set objstoreConfig="$(cat objstore.yaml)"

Object Store Configuration

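# objstore.yaml (AWS S3). Each block below is an alternative objstore.yaml; use one per file.
type: S3
config:
  bucket: thanos-metrics
  endpoint: s3.amazonaws.com
  region: us-east-1
  # Example keys only; omit both to fall back to the default AWS credential chain (e.g. IAM roles)
  access_key: AKIAIOSFODNN7EXAMPLE
  secret_key: wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY

# objstore.yaml (GCS)
type: GCS
config:
  bucket: thanos-metrics
  # service_account takes the key's JSON content, not a file path. To use a key file,
  # leave it unset and export GOOGLE_APPLICATION_CREDENTIALS=/etc/thanos/gcs-sa.json

# objstore.yaml (MinIO / S3-compatible)
type: S3
config:
  bucket: thanos-metrics
  endpoint: minio.monitoring:9000
  access_key: minioadmin
  secret_key: minioadmin
  insecure: true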
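Hardcoding keys as above is convenient for testing, but in real deployments you may prefer to render objstore.yaml at deploy time. The Python sketch below does that for the S3 variant, assuming hypothetical environment variable names (THANOS_S3_BUCKET, THANOS_S3_ENDPOINT, THANOS_S3_REGION, THANOS_S3_ACCESS_KEY, THANOS_S3_SECRET_KEY); unset keys are simply left out of the file. Values are written with json.dumps because a JSON string is also a valid double-quoted YAML scalar, which avoids hand-escaping.

```python
import json
import os


def render_s3_objstore(env=None):
    """Render an S3-type objstore.yaml from environment variables.

    The THANOS_S3_* variable names are hypothetical; adapt them to your setup.
    Keys that are unset or empty are omitted instead of written as "".
    """
    env = os.environ if env is None else env
    fields = {
        "bucket": env.get("THANOS_S3_BUCKET"),
        "endpoint": env.get("THANOS_S3_ENDPOINT"),
        "region": env.get("THANOS_S3_REGION"),
        "access_key": env.get("THANOS_S3_ACCESS_KEY"),
        "secret_key": env.get("THANOS_S3_SECRET_KEY"),
    }
    if not fields["bucket"] or not fields["endpoint"]:
        raise ValueError("THANOS_S3_BUCKET and THANOS_S3_ENDPOINT are required")
    lines = ["type: S3", "config:"]
    for key, value in fields.items():
        if value:
            # json.dumps yields a double-quoted string, which YAML also accepts.
            lines.append(f"  {key}: {json.dumps(value)}")
    return "\n".join(lines) + "\n"


# Example with an explicit mapping instead of the real environment.
example = render_s3_objstore({
    "THANOS_S3_BUCKET": "thanos-metrics",
    "THANOS_S3_ENDPOINT": "s3.amazonaws.com",
    "THANOS_S3_REGION": "us-east-1",
})
print(example, end="")
```

In a container entrypoint you would call render_s3_objstore() with no argument (it then reads os.environ) and write the result to /etc/thanos/objstore.yaml before starting the Thanos component.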

Components

Sidecar (alongside Prometheus)

thanos sidecar \
  --tsdb.path=/var/prometheus/data \
  --prometheus.url=http://localhost:9090 \
  --objstore.config-file=/etc/thanos/objstore.yaml \
  --grpc-address=0.0.0.0:10901 \
  --http-address=0.0.0.0:10902

Prometheus config for Thanos:

# prometheus.yml - required settings
global:
  external_labels:
    cluster: production
    replica: prometheus-0
  scrape_interval: 15s

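# Start Prometheus with equal min/max block durations so local compaction is disabled
# and the sidecar can upload blocks:
#   --storage.tsdb.min-block-duration=2h --storage.tsdb.max-block-duration=2h
# Enable the admin API for the sidecar: --web.enable-admin-api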
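A quick preflight check can catch the two settings the sidecar depends on: equal min/max block durations and non-empty external labels. The helper below is a hypothetical sketch, not part of Thanos; it compares the duration values as text, so 2h and 120m would be reported as different even though they are equivalent.

```python
def check_prometheus_for_sidecar(args, external_labels):
    """Return problems that would stop the Thanos sidecar from uploading blocks.

    `args` is the Prometheus command line as a list of strings and
    `external_labels` is the global.external_labels mapping from prometheus.yml.
    """
    # Collect --name=value flags into a dict keyed by flag name.
    flags = {}
    for arg in args:
        if arg.startswith("--") and "=" in arg:
            name, value = arg[2:].split("=", 1)
            flags[name] = value

    problems = []
    min_block = flags.get("storage.tsdb.min-block-duration")
    max_block = flags.get("storage.tsdb.max-block-duration")
    if min_block is None or max_block is None:
        problems.append("set both --storage.tsdb.min-block-duration and --storage.tsdb.max-block-duration")
    elif min_block != max_block:
        problems.append(f"block durations differ ({min_block} vs {max_block}); local compaction stays on")
    if not external_labels:
        problems.append("global.external_labels is empty; the sidecar needs unique external labels")
    return problems


ok = check_prometheus_for_sidecar(
    ["--storage.tsdb.min-block-duration=2h", "--storage.tsdb.max-block-duration=2h"],
    {"cluster": "production", "replica": "prometheus-0"},
)
bad = check_prometheus_for_sidecar(["--storage.tsdb.min-block-duration=2h"], {})
print(ok)
print(bad)
```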

Query (Global Querier)

thanos query \
  --http-address=0.0.0.0:9090 \
  --grpc-address=0.0.0.0:10901 \
  --store=prometheus-sidecar-1:10901 \
  --store=prometheus-sidecar-2:10901 \
  --store=thanos-store-gateway:10901 \
  --store=thanos-ruler:10901 \
  --query.replica-label=replica \
  --query.auto-downsampling
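To illustrate what --query.replica-label=replica does, the sketch below groups series by their labels with the replica label removed, so the copies scraped by prometheus-0 and prometheus-1 collapse into one series while different cluster values stay separate. It is a simplification: Thanos merges the samples of the replicas, whereas this sketch just keeps the replica with the most samples. The (labels, samples) representation is an assumption for illustration.

```python
from collections import defaultdict


def dedup_series(series, replica_label="replica"):
    """Collapse series that differ only by `replica_label`.

    `series` is a list of (labels_dict, samples_list) pairs. Simplified:
    keeps the single replica with the most samples instead of merging them.
    """
    groups = defaultdict(list)
    for labels, samples in series:
        # Grouping key: the sorted label set without the replica label.
        key = tuple(sorted((k, v) for k, v in labels.items() if k != replica_label))
        groups[key].append(samples)

    result = []
    for key, candidates in groups.items():
        best = max(candidates, key=len)
        result.append((dict(key), best))
    return result


series = [
    ({"__name__": "up", "cluster": "production", "replica": "prometheus-0"}, [1, 1, 1]),
    ({"__name__": "up", "cluster": "production", "replica": "prometheus-1"}, [1, 1]),
    ({"__name__": "up", "cluster": "staging", "replica": "prometheus-0"}, [0, 1]),
]
deduped = dedup_series(series)
for labels, samples in deduped:
    print(labels, samples)
```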

Store Gateway (Object Storage)

thanos store \
  --data-dir=/var/thanos/store \
  --objstore.config-file=/etc/thanos/objstore.yaml \
  --grpc-address=0.0.0.0:10901 \
  --http-address=0.0.0.0:10902 \
  --index-cache-size=500MB \
  --chunk-pool-size=2GB

Compactor

thanos compact \
  --data-dir=/var/thanos/compact \
  --objstore.config-file=/etc/thanos/objstore.yaml \
  --http-address=0.0.0.0:10902 \
  --retention.resolution-raw=30d \
  --retention.resolution-5m=90d \
  --retention.resolution-1h=365d \
  --compact.concurrency=2 \
  --downsample.concurrency=2 \
  --wait
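The retention flags apply per resolution. In a simplified model of the compactor, a block is deleted once its newest sample is older than the retention configured for that block's resolution, so the same time range can survive at 5m or 1h resolution after the raw data is gone. The sketch below encodes that rule with the values from the command above and also counts points per series per day, assuming the 15s scrape interval from the prometheus.yml example. Each downsampled point stores several aggregates (such as count, sum, min and max), so storage shrinks less than the point counts suggest.

```python
from datetime import datetime, timedelta, timezone

# Retention per resolution, mirroring the --retention.resolution-* flags above.
RETENTION = {
    "raw": timedelta(days=30),
    "5m": timedelta(days=90),
    "1h": timedelta(days=365),
}

# Spacing between points at each resolution (raw assumes a 15s scrape interval).
STEP = {
    "raw": timedelta(seconds=15),
    "5m": timedelta(minutes=5),
    "1h": timedelta(hours=1),
}


def is_expired(resolution, block_max_time, now):
    """A block is deleted once its newest sample is older than its retention."""
    return block_max_time < now - RETENTION[resolution]


def points_per_day(resolution):
    """Samples (raw) or aggregate points (downsampled) per series per day."""
    return timedelta(days=1) // STEP[resolution]


now = datetime(2024, 6, 1, tzinfo=timezone.utc)
block_end = now - timedelta(days=45)  # newest sample in a hypothetical block
for res in ("raw", "5m", "1h"):
    state = "expired" if is_expired(res, block_end, now) else "kept"
    print(res, state, points_per_day(res))
```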

Ruler

thanos rule \
  --data-dir=/var/thanos/ruler \
  --objstore.config-file=/etc/thanos/objstore.yaml \
  --rule-file=/etc/thanos/rules/*.yaml \
  --query=thanos-query:9090 \
  --alertmanagers.url=http://alertmanager:9093 \
  --grpc-address=0.0.0.0:10901 \
  --http-address=0.0.0.0:10902 \
  --label='ruler_cluster="production"'
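The files matched by --rule-file use the standard Prometheus rule format. As a sketch, the Python helper below renders one group with a recording rule and an alert; the group name, rule names and expressions are hypothetical examples, and every value goes through json.dumps so PromQL strings stay valid YAML scalars.

```python
import json


def render_rule_file(group, recording_rules, alerts):
    """Render a Prometheus-format rule file for `thanos rule --rule-file`.

    recording_rules: list of (record_name, expr)
    alerts: list of (alert_name, expr, for_duration)
    Values are emitted as JSON strings, which YAML reads as double-quoted scalars.
    """
    lines = ["groups:", f"  - name: {json.dumps(group)}", "    rules:"]
    for record, expr in recording_rules:
        lines.append(f"      - record: {json.dumps(record)}")
        lines.append(f"        expr: {json.dumps(expr)}")
    for name, expr, for_ in alerts:
        lines.append(f"      - alert: {json.dumps(name)}")
        lines.append(f"        expr: {json.dumps(expr)}")
        lines.append(f"        for: {json.dumps(for_)}")
    return "\n".join(lines) + "\n"


rules_yaml = render_rule_file(
    "example",
    [("job:up:sum", "sum by (job) (up)")],
    [("InstanceDown", "up == 0", "5m")],
)
print(rules_yaml, end="")
```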

Receive (Push-Based Ingestion)

thanos receive \
  --tsdb.path=/var/thanos/receive \
  --objstore.config-file=/etc/thanos/objstore.yaml \
  --grpc-address=0.0.0.0:10901 \
  --http-address=0.0.0.0:10902 \
  --remote-write.address=0.0.0.0:19291 \
  --label='receive_cluster="production"' \
  --receive.hashrings-file=/etc/thanos/hashrings.json
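--receive.hashrings-file expects a JSON list of hashrings, each listing the receivers' gRPC endpoints and, optionally, the tenants it serves (a hashring without tenants accepts all of them). The sketch below generates one default hashring for a three-pod StatefulSet; the service name thanos-receive, the monitoring namespace and the DNS pattern are assumptions about your deployment.

```python
import json


def build_hashrings(replicas, name="default", service="thanos-receive",
                    namespace="monitoring", grpc_port=10901):
    """Build a single-hashring config for a StatefulSet of Thanos receivers.

    Service and namespace names are hypothetical; each endpoint is the
    stable pod DNS name plus the receiver's gRPC port.
    """
    endpoints = [
        f"{service}-{i}.{service}.{namespace}.svc.cluster.local:{grpc_port}"
        for i in range(replicas)
    ]
    # An empty tenants list lets this hashring accept any tenant.
    return [{"hashring": name, "endpoints": endpoints, "tenants": []}]


config = build_hashrings(3)
print(json.dumps(config, indent=2))
```

Redirect the printed JSON to /etc/thanos/hashrings.json (or mount it from a ConfigMap). Prometheus then pushes data with a remote_write URL such as http://thanos-receive:19291/api/v1/receive, where the host is your (hypothetical) receiver service and the port matches --remote-write.address.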

Kubernetes Deployment

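# Thanos Sidecar in Prometheus StatefulSet
# (ConfigMap/Secret names and the volume size are placeholders)
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: prometheus
spec:
  serviceName: prometheus
  selector:
    matchLabels:
      app: prometheus
  template:
    metadata:
      labels:
        app: prometheus
    spec:
      containers:
        - name: prometheus
          image: prom/prometheus:v2.51.0
          args:
            - "--config.file=/etc/prometheus/prometheus.yml"
            # Set explicitly so it matches the sidecar's --tsdb.path
            - "--storage.tsdb.path=/prometheus"
            - "--storage.tsdb.min-block-duration=2h"
            - "--storage.tsdb.max-block-duration=2h"
            - "--storage.tsdb.retention.time=6h"
            - "--web.enable-lifecycle"
          volumeMounts:
            - name: data
              mountPath: /prometheus
            - name: prometheus-config
              mountPath: /etc/prometheus

        - name: thanos-sidecar
          image: quay.io/thanos/thanos:v0.35.0
          args:
            - sidecar
            - "--tsdb.path=/prometheus"
            - "--prometheus.url=http://localhost:9090"
            - "--objstore.config-file=/etc/thanos/objstore.yaml"
          ports:
            - name: grpc
              containerPort: 10901
            - name: http
              containerPort: 10902
          volumeMounts:
            - name: data
              mountPath: /prometheus
            - name: objstore-config
              mountPath: /etc/thanos
              readOnly: true
      volumes:
        - name: prometheus-config
          configMap:
            name: prometheus-config
        - name: objstore-config
          secret:
            secretName: thanos-objstore
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 20Gi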

Core Commands

| Command | Description |
|---|---|
| thanos sidecar | Run sidecar alongside Prometheus |
| thanos query | Run global query layer |
| thanos store | Run store gateway for object storage |
| thanos compact | Run compactor for downsampling and retention |
| thanos rule | Run ruler for alerting/recording rules |
| thanos receive | Run push-based receiver |
| thanos tools bucket verify | Verify object store bucket integrity |
| thanos tools bucket inspect | Inspect blocks in bucket |
| thanos tools bucket ls | List blocks in bucket |

Bucket Tools

# List blocks in bucket
thanos tools bucket ls --objstore.config-file=objstore.yaml

# Verify bucket integrity
thanos tools bucket verify --objstore.config-file=objstore.yaml

# Inspect block details
thanos tools bucket inspect --objstore.config-file=objstore.yaml

# Clean up partial uploads
thanos tools bucket cleanup --objstore.config-file=objstore.yaml
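The block IDs printed by bucket ls are ULIDs. Their first 10 characters encode, in Crockford base32, the millisecond timestamp at which the block was written, which helps when spotting stale or unexpected uploads. This is the creation time, not the time range of the samples inside (bucket inspect shows that). A minimal decoder, shown with an example ULID rather than a real block ID:

```python
from datetime import datetime, timedelta, timezone

# Crockford base32 alphabet used by ULIDs (no I, L, O, U).
CROCKFORD = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def ulid_timestamp_ms(block_id):
    """Decode the 48-bit millisecond timestamp from the first 10 chars of a ULID."""
    if len(block_id) != 26:
        raise ValueError(f"not a ULID: {block_id!r}")
    value = 0
    for ch in block_id[:10].upper():
        # str.index raises ValueError for characters outside the alphabet.
        value = value * 32 + CROCKFORD.index(ch)
    return value


def ulid_created_at(block_id):
    """Block creation time as a UTC datetime (exact, no float rounding)."""
    return EPOCH + timedelta(milliseconds=ulid_timestamp_ms(block_id))


created = ulid_created_at("01ARYZ6S41TSV4RRFFQ69G5FAV")
print(created.isoformat())
```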

Advanced Usage

Query Frontend (Caching)

thanos query-frontend \
  --http-address=0.0.0.0:9090 \
  --query-frontend.downstream-url=http://thanos-query:9090 \
  --query-range.split-interval=24h \
  --query-range.max-retries-per-request=3 \
  --query-frontend.log-queries-longer-than=10s \
  --cache-compression-type=snappy
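--query-range.split-interval=24h makes the frontend break a long range query into day-sized sub-queries that can run in parallel and be cached separately. The sketch below shows the splitting idea, assuming boundaries fall on multiples of the interval since the Unix epoch (midnight UTC for 24h); it ignores the step alignment the real frontend also performs.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def split_range(start, end, interval=timedelta(hours=24)):
    """Split [start, end] into sub-ranges that never cross an interval boundary.

    Boundaries are multiples of `interval` since the Unix epoch, so 24h
    splits break at midnight UTC. Query step alignment is ignored here.
    """
    parts = []
    cur = start
    while cur < end:
        # End of the interval bucket that `cur` falls into.
        boundary = EPOCH + ((cur - EPOCH) // interval + 1) * interval
        part_end = min(boundary, end)
        parts.append((cur, part_end))
        cur = part_end
    return parts


parts = split_range(
    datetime(2024, 1, 1, 12, tzinfo=timezone.utc),
    datetime(2024, 1, 3, 6, tzinfo=timezone.utc),
)
for sub_start, sub_end in parts:
    print(sub_start.isoformat(), "->", sub_end.isoformat())
```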

Multi-Cluster Setup

# Cluster A: Prometheus + Sidecar with labels
# external_labels: {cluster: "cluster-a", replica: "0"}

# Cluster B: Prometheus + Sidecar with labels
# external_labels: {cluster: "cluster-b", replica: "0"}

# Central Query across clusters
thanos query \
  --store=cluster-a-sidecar:10901 \
  --store=cluster-b-sidecar:10901 \
  --store=store-gateway:10901 \
  --query.replica-label=replica
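As the number of clusters grows, generating the querier flags from an inventory avoids copy-paste mistakes. The helper below is a hypothetical sketch that returns an argv list (usable with subprocess) instead of running anything; cluster and host names are placeholders. Each cluster still needs a distinct cluster external label, because only the replica label is stripped during deduplication.

```python
import shlex


def build_query_command(sidecars, store_gateways, replica_label="replica"):
    """Build a `thanos query` argv list from sidecar and store-gateway addresses.

    `sidecars` maps a (hypothetical) cluster name to its sidecar gRPC address.
    Clusters are sorted so the generated command is stable between runs.
    """
    argv = ["thanos", "query"]
    for cluster in sorted(sidecars):
        argv.append(f"--store={sidecars[cluster]}")
    for address in store_gateways:
        argv.append(f"--store={address}")
    argv.append(f"--query.replica-label={replica_label}")
    return argv


argv = build_query_command(
    {"cluster-b": "cluster-b-sidecar:10901", "cluster-a": "cluster-a-sidecar:10901"},
    ["store-gateway:10901"],
)
print(shlex.join(argv))
```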

Troubleshooting

| Issue | Solution |
|---|---|
| Sidecar not uploading blocks | Start Prometheus with equal block durations: --storage.tsdb.min-block-duration=2h and --storage.tsdb.max-block-duration=2h |
| Gaps in historical data | Run thanos tools bucket verify and check compactor logs |
| Query returns duplicates | Set --query.replica-label to the name of the replica external label used by Prometheus |
| Store gateway high memory | Reduce --index-cache-size; enable --store.enable-index-header-lazy-reader |
| Compactor halted | Check for overlapping blocks; use thanos tools bucket verify --repair |
| Slow queries on old data | Ensure compactor downsampling is working; enable query frontend caching |
| Object store permission errors | Verify the IAM/service-account credentials in objstore.yaml |