Grafana Tempo Cheat Sheet

Overview

Grafana Tempo is an open-source, high-scale distributed tracing backend designed for cost efficiency and operational simplicity. Unlike traditional tracing systems that require dedicated databases, Tempo stores traces in object storage (S3, GCS, Azure Blob, MinIO) and only requires an index for trace IDs, dramatically reducing infrastructure costs. It accepts traces in multiple formats: OpenTelemetry (OTLP), Jaeger, Zipkin, and OpenCensus protocols, making it compatible with virtually any instrumentation library.

Tempo integrates deeply with the Grafana observability stack. It connects with Loki for trace-to-log correlation and Prometheus/Mimir for trace-to-metrics linking through exemplars. TraceQL, Tempo’s query language, enables searching traces by span attributes, duration, and structural patterns without full-text indexing. Tempo can run as a single binary for development or in microservices mode (distributor, ingester, compactor, querier, query-frontend) for production deployments handling millions of spans per second.

Installation

Docker

docker run -d --name tempo \
  -p 3200:3200 \
  -p 4317:4317 \
  -p 4318:4318 \
  -v $(pwd)/tempo-config.yaml:/etc/tempo/config.yaml \
  grafana/tempo:2.5.0 \
  -config.file=/etc/tempo/config.yaml

Helm (Kubernetes)

helm repo add grafana https://grafana.github.io/helm-charts
helm repo update

# Distributed deployment
helm install tempo grafana/tempo-distributed \
  --namespace tracing --create-namespace \
  -f tempo-values.yaml

# Simple single-binary
helm install tempo grafana/tempo \
  --namespace tracing --create-namespace

Binary

wget https://github.com/grafana/tempo/releases/download/v2.5.0/tempo_2.5.0_linux_amd64.tar.gz
tar xzf tempo_2.5.0_linux_amd64.tar.gz
sudo mv tempo-linux-amd64 /usr/local/bin/tempo

tempo -config.file=config.yaml
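
Whichever install method is used, it can help to confirm that Tempo has finished starting before sending traces. The following is a minimal sketch that polls the /ready endpoint on the HTTP port (3200 in the examples above). The function names, retry count, and delay are illustrative assumptions, not part of Tempo.

```python
import time
import urllib.error
import urllib.request


def is_ready(base_url, opener=urllib.request.urlopen):
    """Return True if GET {base_url}/ready answers with HTTP 200."""
    try:
        with opener(f"{base_url}/ready", timeout=2) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or a non-2xx answer (HTTPError).
        return False


def wait_until_ready(base_url, attempts=30, delay=1.0, check=is_ready):
    """Poll the readiness endpoint until it succeeds or attempts run out."""
    for _ in range(attempts):
        if check(base_url):
            return True
        time.sleep(delay)
    return False


# Example call (assumes the Docker port mapping shown earlier):
#   wait_until_ready("http://localhost:3200")
```

The `check` and `opener` parameters exist only so the polling logic can be exercised without a running Tempo instance.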

Configuration

Local Development Config

# tempo-config.yaml
server:
  http_listen_port: 3200

distributor:
  receivers:
    otlp:
      protocols:
        grpc:
          endpoint: 0.0.0.0:4317
        http:
          endpoint: 0.0.0.0:4318
    jaeger:
      protocols:
        thrift_http:
          endpoint: 0.0.0.0:14268
        grpc:
          endpoint: 0.0.0.0:14250
    zipkin:
      endpoint: 0.0.0.0:9411

storage:
  trace:
    backend: local
    local:
      path: /var/tempo/traces
    wal:
      path: /var/tempo/wal
    pool:
      max_workers: 100

metrics_generator:
  registry:
    external_labels:
      source: tempo
  storage:
    path: /var/tempo/generator/wal
    remote_write:
      - url: http://prometheus:9090/api/v1/write
        send_exemplars: true

overrides:
  defaults:
    metrics_generator:
      processors: [service-graphs, span-metrics]

Production Config (S3)

server:
  http_listen_port: 3200

distributor:
  receivers:
    otlp:
      protocols:
        grpc:
          endpoint: 0.0.0.0:4317
        http:
          endpoint: 0.0.0.0:4318

ingester:
  max_block_duration: 5m
  max_block_bytes: 10485760

storage:
  trace:
    backend: s3
    s3:
      bucket: tempo-traces
      endpoint: s3.amazonaws.com
      region: us-east-1
      # ${VAR} references are expanded only when Tempo is started with -config.expand-env=true
      access_key: ${AWS_ACCESS_KEY_ID}
      secret_key: ${AWS_SECRET_ACCESS_KEY}
    wal:
      path: /var/tempo/wal
    block:
      bloom_filter_false_positive: 0.05

compactor:
  compaction:
    block_retention: 720h  # 30 days
  ring:
    kvstore:
      store: memberlist

querier:
  max_concurrent_queries: 20

query_frontend:
  search:
    max_duration: 720h
  trace_by_id:
    query_shards: 50

memberlist:
  join_members:
    - tempo-0:7946
    - tempo-1:7946
    - tempo-2:7946

TraceQL Query Language

Basic Queries

# Find traces by service name
{ resource.service.name = "api-gateway" }

# Find error spans
{ status = error }

# Find spans with specific attribute
{ span.http.method = "POST" && span.http.status_code >= 400 }

# Find slow spans
{ duration > 2s }

# Find spans by name
{ name = "HTTP GET /api/users" }

Structural Queries

# Direct child: db.query spans whose immediate parent is slow
{ duration > 1s } > { name = "db.query" }

# Direct parent-child relationship between two services
{ resource.service.name = "frontend" } > { resource.service.name = "backend" }

# Sibling spans
{ name = "auth" } ~ { name = "fetch-user" }

# Ancestor relationship (any depth)
{ resource.service.name = "api" } >> { span.db.system = "postgresql" && duration > 500ms }
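
The difference between the child operator (>) and the descendant operator (>>) is easiest to see on a small span tree. The sketch below is a toy model, not how Tempo evaluates queries: the Span class and both helper functions are hypothetical and only mimic the matching rule, where > looks one level down and >> looks at any depth.

```python
from dataclasses import dataclass, field


@dataclass
class Span:
    name: str
    children: list = field(default_factory=list)


def descendants(span):
    """Yield every span below `span`, at any depth."""
    for child in span.children:
        yield child
        yield from descendants(child)


def walk(root):
    """Yield the root span followed by all of its descendants."""
    yield root
    yield from descendants(root)


def child_matches(root, left, right):
    """Toy `{left} > {right}`: right-hand spans that are direct children."""
    return [c.name for s in walk(root) if left(s) for c in s.children if right(c)]


def descendant_matches(root, left, right):
    """Toy `{left} >> {right}`: right-hand spans anywhere below a left match."""
    return [d.name for s in walk(root) if left(s) for d in descendants(s) if right(d)]


# frontend -> backend -> db.query
tree = Span("frontend", [Span("backend", [Span("db.query")])])
is_frontend = lambda s: s.name == "frontend"
is_db = lambda s: s.name == "db.query"

print(child_matches(tree, is_frontend, is_db))       # []
print(descendant_matches(tree, is_frontend, is_db))  # ['db.query']
```

Because db.query sits two levels below frontend, only the descendant form matches it.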

Aggregate Queries

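# These are TraceQL metrics queries; they aggregate over matching spans, not whole traces.

# Count spans over time by service
{ } | count_over_time() by (resource.service.name)

# Average duration by endpoint (avg_over_time may need a Tempo release newer than 2.5.0)
{ span.http.route != nil } | avg_over_time(duration) by (span.http.route)

# P95 latency
{ resource.service.name = "api" } | quantile_over_time(duration, 0.95)

# Rate of error spans by service
{ status = error } | rate() by (resource.service.name)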

Sending Traces

OpenTelemetry SDK (Python)

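from opentelemetry import trace
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter

# service.name is what TraceQL exposes as resource.service.name
provider = TracerProvider(resource=Resource.create({"service.name": "my-service"}))
# insecure=True sends plaintext gRPC to Tempo's OTLP receiver on port 4317
exporter = OTLPSpanExporter(endpoint="http://tempo:4317", insecure=True)
provider.add_span_processor(BatchSpanProcessor(exporter))
trace.set_tracer_provider(provider)

tracer = trace.get_tracer("my-service")
with tracer.start_as_current_span("my-operation"):
    # your code here
    pass

# Flush spans still buffered by the batch processor before the process exits
provider.shutdown()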

OpenTelemetry Collector Config

# otel-collector-config.yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317

exporters:
  otlp:
    endpoint: tempo:4317
    tls:
      insecure: true

service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [otlp]

API Endpoints

| Endpoint | Description |
| --- | --- |
| GET /api/traces/{traceID} | Retrieve trace by ID |
| GET /api/search | Search traces |
| GET /api/search/tags | List available tag names |
| GET /api/search/tag/{tag}/values | List values for a tag |
| GET /api/v2/search/tags | List tags (v2 with scope) |
| GET /ready | Readiness check |
| GET /metrics | Prometheus metrics |
| GET /status/config | Current configuration |

# Search traces via API
curl "http://tempo:3200/api/search?q=%7Bresource.service.name%3D%22api%22%7D&limit=10"

# Get trace by ID
curl "http://tempo:3200/api/traces/abc123def456"

# List tags
curl "http://tempo:3200/api/search/tags"
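
The search call above can also be made from Python, which avoids percent-encoding the TraceQL query by hand. This is a minimal sketch using only the standard library. The function names are illustrative, and the exact shape of the JSON response is not described in this cheat sheet, so the code simply returns the decoded body.

```python
import json
import urllib.parse
import urllib.request


def build_search_url(base_url, traceql, limit=10):
    """Build a /api/search URL with the TraceQL query percent-encoded."""
    params = urllib.parse.urlencode({"q": traceql, "limit": limit})
    return f"{base_url}/api/search?{params}"


def search_traces(base_url, traceql, limit=10, opener=urllib.request.urlopen):
    """Run the search and return the decoded JSON body."""
    with opener(build_search_url(base_url, traceql, limit), timeout=10) as resp:
        return json.load(resp)


# Produces the same URL as the curl example:
#   build_search_url("http://tempo:3200", '{resource.service.name="api"}')
```

Keeping URL construction separate from the HTTP call makes the encoding easy to check without a running Tempo instance.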

Advanced Usage

Metrics Generator (RED Metrics from Traces)

metrics_generator:
  ring:
    kvstore:
      store: memberlist
  processor:
    service_graphs:
      dimensions: [http.method, http.status_code]
      max_items: 10000
    span_metrics:
      dimensions: [http.method, http.route, http.status_code]
  registry:
    external_labels:
      source: tempo
  storage:
    path: /var/tempo/generator/wal
    remote_write:
      - url: http://mimir:9009/api/v1/push

Grafana Datasource Configuration

# Grafana provisioning
apiVersion: 1
datasources:
  - name: Tempo
    type: tempo
    url: http://tempo:3200
    jsonData:
      tracesToLogs:
        datasourceUid: loki
        tags: ['service.name']
      tracesToMetrics:
        datasourceUid: prometheus
      serviceMap:
        datasourceUid: prometheus
      search:
        hide: false

Troubleshooting

| Issue | Solution |
| --- | --- |
| Traces not appearing | Verify distributor receivers are configured; check ingester logs |
| Trace not found | Wait for ingester flush (default 5 min); check object storage connectivity |
| High ingester memory | Reduce max_block_duration and max_block_bytes |
| Search returning no results | Ensure search is enabled in query-frontend; check tag indexing |
| TraceQL syntax errors | Use Grafana Explore with TraceQL autocomplete for validation |
| Spans dropped | Compare tempo_distributor_spans_received_total with tempo_discarded_spans_total |
| Object store timeout | Increase storage timeouts; check network connectivity to S3/GCS |
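
For the dropped-spans check, the two counters can be compared directly from the text returned by GET /metrics. The sketch below parses Prometheus text exposition format by hand and assumes sample lines carry no trailing timestamp. The default metric names are assumptions about what a given Tempo version exports, so they are passed as parameters and should be verified against the actual /metrics output.

```python
def sum_counter(metrics_text, metric_name):
    """Sum all samples of one counter across its label sets."""
    total = 0.0
    for line in metrics_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and HELP/TYPE comments
        # A sample line is `<name>{labels} <value>`; the name ends at `{` or a space.
        name = line.split("{", 1)[0].split(" ", 1)[0]
        if name == metric_name:
            # Assumes no trailing timestamp, so the value is the last field.
            total += float(line.rsplit(" ", 1)[1])
    return total


def discard_ratio(
    metrics_text,
    received="tempo_distributor_spans_received_total",  # assumed metric name
    discarded="tempo_discarded_spans_total",            # assumed metric name
):
    """Return discarded / received, or 0.0 when nothing was received."""
    received_total = sum_counter(metrics_text, received)
    if not received_total:
        return 0.0
    return sum_counter(metrics_text, discarded) / received_total
```

A ratio that stays above zero suggests spans are being rejected, for example by rate limits, and the labels on the discarded counter are the place to look for the reason.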