Grafana Tempo Cheat Sheet
Overview
Grafana Tempo is an open-source, high-scale distributed tracing backend designed for cost efficiency and operational simplicity. Unlike traditional tracing systems that require dedicated databases, Tempo stores traces in object storage (S3, GCS, Azure Blob, MinIO) and only requires an index for trace IDs, which keeps infrastructure costs low. It ingests traces over the OpenTelemetry (OTLP), Jaeger, Zipkin, and OpenCensus protocols, so it works with most common instrumentation libraries.
Tempo integrates deeply with the Grafana observability stack. It connects with Loki for trace-to-log correlation and Prometheus/Mimir for trace-to-metrics linking through exemplars. TraceQL, Tempo’s query language, enables searching traces by span attributes, duration, and structural patterns without full-text indexing. Tempo can run as a single binary for development or in microservices mode (distributor, ingester, compactor, querier, query-frontend) for production deployments handling millions of spans per second.
Installation
Docker
docker run -d --name tempo \
-p 3200:3200 \
-p 4317:4317 \
-p 4318:4318 \
-v $(pwd)/tempo-config.yaml:/etc/tempo/config.yaml \
grafana/tempo:2.5.0 \
-config.file=/etc/tempo/config.yaml
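Once the container is started, it can be useful to wait for the readiness endpoint before sending traces. The following is a minimal sketch using only the Python standard library; it assumes port 3200 is published on localhost as in the command above, and the helper names `is_ready` and `wait_until_ready` are illustrative, not part of Tempo.

```python
import time
import urllib.error
import urllib.request


def is_ready(base_url: str, timeout: float = 2.0) -> bool:
    """Return True if GET {base_url}/ready answers with HTTP 200."""
    try:
        with urllib.request.urlopen(f"{base_url}/ready", timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or a non-2xx status: not ready yet.
        return False


def wait_until_ready(probe, attempts: int = 30, delay: float = 1.0) -> bool:
    """Call probe() up to `attempts` times, sleeping `delay` seconds between tries."""
    for attempt in range(attempts):
        if probe():
            return True
        if attempt < attempts - 1:
            time.sleep(delay)
    return False


# Example call (hypothetical local address):
# wait_until_ready(lambda: is_ready("http://localhost:3200"))
```

Passing the probe as a callable keeps the retry loop independent of the network call, so the same loop can be reused for other health checks.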
Helm (Kubernetes)
helm repo add grafana https://grafana.github.io/helm-charts
helm repo update
# Distributed deployment
helm install tempo grafana/tempo-distributed \
--namespace tracing --create-namespace \
-f tempo-values.yaml
# Simple single-binary
helm install tempo grafana/tempo \
--namespace tracing --create-namespace
Binary
wget https://github.com/grafana/tempo/releases/download/v2.5.0/tempo_2.5.0_linux_amd64.tar.gz
tar xzf tempo_2.5.0_linux_amd64.tar.gz
sudo mv tempo-linux-amd64 /usr/local/bin/tempo
tempo -config.file=config.yaml
Configuration
Local Development Config
# tempo-config.yaml
server:
  http_listen_port: 3200
distributor:
  receivers:
    otlp:
      protocols:
        grpc:
          endpoint: 0.0.0.0:4317
        http:
          endpoint: 0.0.0.0:4318
    jaeger:
      protocols:
        thrift_http:
          endpoint: 0.0.0.0:14268
        grpc:
          endpoint: 0.0.0.0:14250
    zipkin:
      endpoint: 0.0.0.0:9411
storage:
  trace:
    backend: local
    local:
      path: /var/tempo/traces
    wal:
      path: /var/tempo/wal
    pool:
      max_workers: 100
metrics_generator:
  registry:
    external_labels:
      source: tempo
  storage:
    path: /var/tempo/generator/wal
    remote_write:
      - url: http://prometheus:9090/api/v1/write
        send_exemplars: true
overrides:
  defaults:
    metrics_generator:
      processors: [service-graphs, span-metrics]
Production Config (S3)
server:
  http_listen_port: 3200
distributor:
  receivers:
    otlp:
      protocols:
        grpc:
          endpoint: 0.0.0.0:4317
        http:
          endpoint: 0.0.0.0:4318
ingester:
  max_block_duration: 5m
  max_block_bytes: 10485760 # 10 MiB
storage:
  trace:
    backend: s3
    s3:
      bucket: tempo-traces
      endpoint: s3.amazonaws.com
      region: us-east-1
      # ${VAR} references are only expanded when Tempo is started with -config.expand-env=true
      access_key: ${AWS_ACCESS_KEY_ID}
      secret_key: ${AWS_SECRET_ACCESS_KEY}
    wal:
      path: /var/tempo/wal
    block:
      bloom_filter_false_positive: 0.05
compactor:
  compaction:
    block_retention: 720h # 30 days
  ring:
    kvstore:
      store: memberlist
querier:
  max_concurrent_queries: 20
query_frontend:
  search:
    max_duration: 720h
  trace_by_id:
    query_shards: 50
memberlist:
  join_members:
    - tempo-0:7946
    - tempo-1:7946
    - tempo-2:7946
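The duration and size values in this config are easier to sanity-check once converted to everyday units. The sketch below is a small standard-library helper; `duration_seconds` is a hypothetical name and it only handles the simple `s`/`m`/`h` durations used above, not every form Go's duration parser accepts.

```python
import re

_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600}


def duration_seconds(text: str) -> int:
    """Convert a simple Go-style duration such as '5m' or '1h30m' to seconds."""
    parts = re.findall(r"(\d+)([smh])", text)
    # Reject anything the pattern did not consume completely.
    if not parts or "".join(n + u for n, u in parts) != text:
        raise ValueError(f"unsupported duration: {text!r}")
    return sum(int(n) * _UNIT_SECONDS[u] for n, u in parts)


retention_days = duration_seconds("720h") / 86400   # compactor block_retention
block_cut_minutes = duration_seconds("5m") / 60     # ingester max_block_duration
block_mib = 10485760 / (1024 * 1024)                # ingester max_block_bytes

print(f"block_retention    = {retention_days:g} days")
print(f"max_block_duration = {block_cut_minutes:g} minutes")
print(f"max_block_bytes    = {block_mib:g} MiB")
```

With the values above, retention works out to 30 days and the block size limit to 10 MiB.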
TraceQL Query Language
Basic Queries
# Find traces by service name
{ resource.service.name = "api-gateway" }
# Find error spans
{ status = error }
# Find spans with specific attribute
{ span.http.method = "POST" && span.http.status_code >= 400 }
# Find slow spans
{ duration > 2s }
# Find spans by name
{ name = "HTTP GET /api/users" }
Structural Queries
# Find db.query spans whose direct parent is slow
{ duration > 1s } > { name = "db.query" }
# Child span relationship (direct parent to child)
{ resource.service.name = "frontend" } > { resource.service.name = "backend" }
# Sibling spans
{ name = "auth" } ~ { name = "fetch-user" }
# Ancestor relationship (any depth)
{ resource.service.name = "api" } >> { span.db.system = "postgresql" && duration > 500ms }
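The structural operators differ in how far apart the two matched spans may be: `>` requires a direct parent-child link, `>>` accepts any depth, and `~` requires a shared parent. The following toy model in plain Python illustrates those semantics on a four-span trace. It is an illustration only, not how Tempo evaluates queries, and the `Span` class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Span:
    span_id: str
    parent_id: Optional[str]
    name: str


def _ancestors(span, by_id):
    """Yield the parent, grandparent, ... of a span."""
    current = by_id.get(span.parent_id)
    while current is not None:
        yield current
        current = by_id.get(current.parent_id)


def child(spans, lhs, rhs):
    """{lhs} > {rhs}: rhs spans whose direct parent matches lhs."""
    by_id = {s.span_id: s for s in spans}
    result = []
    for s in spans:
        parent = by_id.get(s.parent_id)
        if rhs(s) and parent is not None and lhs(parent):
            result.append(s)
    return result


def descendant(spans, lhs, rhs):
    """{lhs} >> {rhs}: rhs spans with at least one ancestor matching lhs."""
    by_id = {s.span_id: s for s in spans}
    return [s for s in spans if rhs(s) and any(lhs(a) for a in _ancestors(s, by_id))]


def sibling(spans, lhs, rhs):
    """{lhs} ~ {rhs}: rhs spans that share a parent with another span matching lhs."""
    return [
        s for s in spans
        if rhs(s) and s.parent_id is not None and any(
            lhs(o) and o.parent_id == s.parent_id and o.span_id != s.span_id
            for o in spans
        )
    ]


def named(name):
    """Build a predicate equivalent to { name = "<name>" }."""
    return lambda span: span.name == name


trace = [
    Span("1", None, "HTTP GET /api/users"),
    Span("2", "1", "auth"),
    Span("3", "1", "fetch-user"),
    Span("4", "3", "db.query"),
]

direct = child(trace, named("HTTP GET /api/users"), named("db.query"))
any_depth = descendant(trace, named("HTTP GET /api/users"), named("db.query"))
siblings = sibling(trace, named("auth"), named("fetch-user"))
print([s.name for s in direct], [s.name for s in any_depth], [s.name for s in siblings])
```

In this trace `db.query` sits two levels below the root span, so the child operator finds nothing while the descendant operator matches it; `auth` and `fetch-user` share the root as parent, so the sibling operator matches `fetch-user`.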
Aggregate Queries
# Count matching spans per service over time (TraceQL metrics query)
{ } | count_over_time() by (resource.service.name)
# Average span duration by endpoint (avg_over_time may require a recent Tempo release)
{ span.http.route != nil } | avg_over_time(duration) by (span.http.route)
# P95 latency
{ resource.service.name = "api" } | quantile_over_time(duration, 0.95)
# Rate of error spans by service
{ status = error } | rate() by (resource.service.name)
Sending Traces
OpenTelemetry SDK (Python)
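from opentelemetry import trace
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
# service.name is what TraceQL matches as resource.service.name
provider = TracerProvider(resource=Resource.create({"service.name": "my-service"}))
# Plain-text gRPC to Tempo's OTLP receiver
exporter = OTLPSpanExporter(endpoint="http://tempo:4317", insecure=True)
provider.add_span_processor(BatchSpanProcessor(exporter))
trace.set_tracer_provider(provider)
tracer = trace.get_tracer("my-service")
with tracer.start_as_current_span("my-operation"):
    # your code here
    pass
# Flush buffered spans before the process exits
provider.shutdown()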
OpenTelemetry Collector Config
# otel-collector-config.yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
exporters:
  otlp:
    endpoint: tempo:4317
    tls:
      insecure: true
service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [otlp]
API Endpoints
| Endpoint | Description |
|---|---|
| GET /api/traces/{traceID} | Retrieve trace by ID |
| GET /api/search | Search traces |
| GET /api/search/tags | List available tag names |
| GET /api/search/tag/{tag}/values | List values for a tag |
| GET /api/v2/search/tags | List tags (v2 with scope) |
| GET /ready | Readiness check |
| GET /metrics | Prometheus metrics |
| GET /status/config | Current configuration |
# Search traces via API
curl "http://tempo:3200/api/search?q=%7Bresource.service.name%3D%22api%22%7D&limit=10"
# Get trace by ID
curl "http://tempo:3200/api/traces/abc123def456"
# List tags
curl "http://tempo:3200/api/search/tags"
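The search URL above carries a percent-encoded TraceQL query, which is tedious to write by hand. As a minimal sketch, the standard library can build it from the readable query; `build_search_url` and `search` are hypothetical helper names, and `http://tempo:3200` is the same placeholder host used in the curl examples.

```python
import json
import urllib.parse
import urllib.request


def build_search_url(base_url: str, traceql: str, limit: int = 10) -> str:
    """Return a /api/search URL with the TraceQL query percent-encoded."""
    params = urllib.parse.urlencode({"q": traceql, "limit": limit})
    return f"{base_url}/api/search?{params}"


def search(base_url: str, traceql: str, limit: int = 10) -> dict:
    """Run the search and decode the JSON response body."""
    url = build_search_url(base_url, traceql, limit)
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


url = build_search_url("http://tempo:3200", '{resource.service.name="api"}')
print(url)
```

The printed URL is the same one passed to curl in the search example; `search()` is defined but not called here because it needs a reachable Tempo instance.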
Advanced Usage
Metrics Generator (RED Metrics from Traces)
metrics_generator:
  ring:
    kvstore:
      store: memberlist
  processor:
    service_graphs:
      dimensions: [http.method, http.status_code]
      max_items: 10000
    span_metrics:
      dimensions: [http.method, http.route, http.status_code]
  registry:
    external_labels:
      source: tempo
  storage:
    path: /var/tempo/generator/wal
    remote_write:
      - url: http://mimir:9009/api/v1/push
Grafana Datasource Configuration
# Grafana provisioning
apiVersion: 1
datasources:
  - name: Tempo
    type: tempo
    url: http://tempo:3200
    jsonData:
      tracesToLogs:
        datasourceUid: loki
        tags: ['service.name']
      tracesToMetrics:
        datasourceUid: prometheus
      serviceMap:
        datasourceUid: prometheus
      search:
        hide: false
Troubleshooting
| Issue | Solution |
|---|---|
| Traces not appearing | Verify distributor receivers are configured; check ingester logs |
| trace not found | Wait for ingester flush (default 5 min); check object storage connectivity |
| High ingester memory | Reduce max_block_duration and max_block_bytes |
| Search returning no results | Ensure search is enabled in query-frontend; check tag indexing |
| TraceQL syntax errors | Use Grafana Explore with TraceQL autocomplete for validation |
| Spans dropped | Compare the tempo_distributor_spans_received_total and tempo_discarded_spans_total metrics; the reason label on the latter shows why spans were discarded |
| Object store timeout | Increase storage timeouts; check network connectivity to S3/GCS |
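For the dropped-spans check, the two counters can be compared directly from the text served at /metrics. The sketch below parses Prometheus text exposition with the standard library only. It assumes the discard counter is exposed as `tempo_discarded_spans_total`, that samples carry no trailing timestamp, and the sample payload (including its label values and numbers) is made up for illustration.

```python
def sum_metric(exposition: str, metric: str) -> float:
    """Sum all samples of `metric` in Prometheus text exposition format."""
    total = 0.0
    for line in exposition.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        # The metric name ends at the first '{' (labels) or space (no labels).
        name = line.split("{", 1)[0].split(" ", 1)[0]
        if name == metric:
            total += float(line.rsplit(" ", 1)[1])
    return total


# Illustrative sample only; real output comes from GET /metrics.
SAMPLE = """\
# TYPE tempo_distributor_spans_received_total counter
tempo_distributor_spans_received_total{tenant="single-tenant"} 10000
# TYPE tempo_discarded_spans_total counter
tempo_discarded_spans_total{reason="rate_limited",tenant="single-tenant"} 400
tempo_discarded_spans_total{reason="trace_too_large",tenant="single-tenant"} 100
"""

received = sum_metric(SAMPLE, "tempo_distributor_spans_received_total")
discarded = sum_metric(SAMPLE, "tempo_discarded_spans_total")
ratio = discarded / received if received else 0.0
print(f"discarded/received = {ratio:.2%}")
```

A ratio that keeps growing between scrapes suggests ingestion limits are being hit; in a running setup the same comparison is usually done with a Prometheus query rather than by parsing text.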