Pyrra Cheat Sheet
Overview
Pyrra is an open-source SLO (Service Level Objective) monitoring tool that tracks error budgets, burn rates, and SLO compliance from Prometheus metrics and presents them in a web dashboard. It runs either on Kubernetes or standalone in filesystem mode, and it generates multi-window, multi-burn-rate Prometheus recording and alerting rules from short SLO definitions written as Kubernetes custom resources or YAML files.
The web UI plots each SLO's error budget over time, so engineers and stakeholders can see how much budget remains and how quickly it is being consumed. It also shows historical SLO performance, highlights periods of elevated error rates, and lets you drill into the time windows where budget was burned.
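To make the error budget concrete, the sketch below converts an SLO target and window into the amount of full downtime the budget allows. It is an illustrative calculation only, not Pyrra code; the function name `error_budget_minutes` is hypothetical.

```python
def error_budget_minutes(target_percent: float, window_days: int) -> float:
    """Minutes of total outage that a target tolerates over the window."""
    budget_fraction = 1 - target_percent / 100  # e.g. 99.9 -> 0.001
    return budget_fraction * window_days * 24 * 60


# Compare a few common targets over a 30-day window.
for target in (99.0, 99.9, 99.95, 99.99):
    minutes = error_budget_minutes(target, 30)
    print(f"{target}% over 30d allows {minutes:.1f} min of full outage")
```

For a 99.9% target over 30 days this works out to roughly 43 minutes, which is why each extra "nine" shrinks the budget by a factor of ten.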
Installation
Kubernetes Installation via Helm
# Add Pyrra Helm repo
helm repo add pyrra https://pyrra-dev.github.io/pyrra
helm repo update
# Install Pyrra (API + dashboard)
helm install pyrra pyrra/pyrra \
--namespace monitoring \
--create-namespace \
--set prometheusUrl="http://prometheus-server.monitoring:9090" \
--set prometheusExternalUrl="http://prometheus.example.com"
# Verify installation
kubectl get pods -n monitoring -l app.kubernetes.io/name=pyrra
Kubernetes Manifest Installation
# Install CRDs
kubectl apply -f https://raw.githubusercontent.com/pyrra-dev/pyrra/main/config/crd/bases/pyrra.dev_servicelevelobjectives.yaml
# Install Pyrra components
kubectl apply -f https://raw.githubusercontent.com/pyrra-dev/pyrra/main/config/default/pyrra.yaml
# Or deploy with specific Prometheus URL
cat <<EOF | kubectl apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  name: pyrra-api
  namespace: monitoring
spec:
  replicas: 1
  selector:
    matchLabels:
      app: pyrra-api
  template:
    metadata:
      labels:
        app: pyrra-api
    spec:
      containers:
        - name: pyrra
          image: ghcr.io/pyrra-dev/pyrra:latest
          args:
            - api
            - --prometheus-url=http://prometheus-server:9090
            - --api-url=http://pyrra-kubernetes:9444
          ports:
            - containerPort: 9099
              name: http
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: pyrra-kubernetes
  namespace: monitoring
spec:
  replicas: 1
  selector:
    matchLabels:
      app: pyrra-kubernetes
  template:
    metadata:
      labels:
        app: pyrra-kubernetes
    spec:
      serviceAccountName: pyrra
      containers:
        - name: pyrra
          image: ghcr.io/pyrra-dev/pyrra:latest
          args:
            - kubernetes
            - --prometheus-url=http://prometheus-server:9090
          ports:
            - containerPort: 9444
              name: http
EOF
Standalone (Filesystem Mode)
# Run Pyrra without Kubernetes (filesystem mode)
docker run -d \
--name pyrra \
-p 9099:9099 \
-v "$(pwd)/slos:/etc/pyrra/slos" \
-v "$(pwd)/output:/etc/pyrra/output" \
ghcr.io/pyrra-dev/pyrra:latest \
filesystem \
--prometheus-url=http://prometheus:9090 \
--config-files=/etc/pyrra/slos \
--prometheus-folder=/etc/pyrra/output
# Binary installation
curl -L "https://github.com/pyrra-dev/pyrra/releases/latest/download/pyrra_linux_amd64" -o /usr/local/bin/pyrra
chmod +x /usr/local/bin/pyrra
# Run in filesystem mode
pyrra filesystem \
--prometheus-url=http://localhost:9090 \
--config-files=./slos/ \
--prometheus-folder=./prometheus-rules/
Core Commands — SLO Definitions
Kubernetes CRD (ServiceLevelObjective)
# availability-slo.yaml
apiVersion: pyrra.dev/v1alpha1
kind: ServiceLevelObjective
metadata:
  name: payment-api-availability
  namespace: monitoring
  labels:
    team: payments
    tier: "0"
spec:
  target: "99.9"
  window: 30d
  description: "99.9% of payment API requests return successfully"
  indicator:
    ratio:
      errors:
        metric: http_requests_total{job="payment-api",code=~"5.."}
      total:
        metric: http_requests_total{job="payment-api"}
  alerting:
    name: PaymentAPIAvailability
    disabled: false
    burnrates: true
# Apply the SLO
kubectl apply -f availability-slo.yaml
# List all SLOs
kubectl get servicelevelobjectives -n monitoring
# Describe an SLO
kubectl describe slo payment-api-availability -n monitoring
# Delete an SLO
kubectl delete slo payment-api-availability -n monitoring
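A ratio indicator compares how much the error series grew with how much the total series grew over the window. The sketch below shows that arithmetic on plain numbers; it is not Pyrra's implementation, the function name is hypothetical, and treating a window without traffic as fully available is an assumption made here.

```python
def availability(errors_increase: float, total_increase: float) -> float:
    """Availability from the growth of the error and total counters."""
    if total_increase <= 0:
        # Assumption for this sketch: no traffic means nothing failed.
        return 1.0
    return 1 - errors_increase / total_increase


# Example: 120 failed requests out of 200,000 in the window.
sli = availability(120, 200_000)
print(f"availability: {sli:.4%}, meets 99.9% target: {sli * 100 >= 99.9}")
```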
Latency SLO
# latency-slo.yaml
apiVersion: pyrra.dev/v1alpha1
kind: ServiceLevelObjective
metadata:
  name: checkout-api-latency
  namespace: monitoring
  labels:
    team: commerce
spec:
  target: "99.0"
  window: 30d
  description: "99% of checkout requests complete within 500ms"
  indicator:
    latency:
      success:
        metric: http_request_duration_seconds_bucket{job="checkout-api",le="0.5"}
      total:
        metric: http_request_duration_seconds_count{job="checkout-api"}
  alerting:
    name: CheckoutAPILatency
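The latency indicator relies on Prometheus histogram buckets being cumulative: the `le="0.5"` bucket already counts every request that finished within 500ms, and the `+Inf` bucket equals the total count. A minimal sketch of that ratio, using made-up bucket values and a hypothetical function name:

```python
def latency_sli(bucket_counts: dict[str, float], threshold: str) -> float:
    """Fraction of requests at or below the threshold bucket."""
    if threshold not in bucket_counts:
        # The SLO can only use a threshold that exists as a bucket boundary.
        raise KeyError(f"histogram has no le={threshold!r} bucket")
    total = bucket_counts["+Inf"]
    if total == 0:
        return 1.0  # assumption for this sketch: no requests, no violations
    return bucket_counts[threshold] / total


# Illustrative cumulative bucket counts, not real measurements.
buckets = {"0.1": 7000, "0.5": 9900, "1": 9990, "+Inf": 10000}
print(f"requests within 500ms: {latency_sli(buckets, '0.5'):.2%}")
```

Asking for a threshold such as `"0.3"` raises `KeyError` here, which mirrors the bucket-mismatch problem: the `le` value in the SLO has to match a bucket the application actually exports.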
Boolean/Probe SLO
# probe-slo.yaml
apiVersion: pyrra.dev/v1alpha1
kind: ServiceLevelObjective
metadata:
  name: website-uptime
  namespace: monitoring
spec:
  target: "99.95"
  window: 30d
  description: "Website responds to health checks 99.95% of the time"
  indicator:
    bool_gauge:
      metric: probe_success{job="blackbox",instance="https://www.example.com"}
  alerting:
    name: WebsiteUptime
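With a boolean gauge, each scrape contributes a 0 or a 1, so uptime is the share of successful samples and the probe interval sets how coarse the measurement is. The sketch below illustrates both points; the function names and the sample intervals are assumptions for illustration.

```python
import math


def uptime(samples: list[int]) -> float:
    """Fraction of probe samples that succeeded (each sample is 0 or 1)."""
    if not samples:
        raise ValueError("no probe samples in the window")
    return sum(samples) / len(samples)


def failed_probes_allowed(target_percent: float, window_days: int,
                          interval_seconds: int) -> int:
    """Whole number of failed probes the error budget tolerates."""
    probes = window_days * 24 * 3600 // interval_seconds
    return math.floor(probes * (1 - target_percent / 100))


print(f"uptime: {uptime([1] * 1999 + [0]):.4%}")
for interval in (60, 300):
    allowed = failed_probes_allowed(99.95, 30, interval)
    print(f"{interval}s interval: {allowed} failed probes allowed in 30d")
```

A longer interval leaves fewer samples, so each failed probe consumes a larger share of the budget.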
Filesystem Mode SLO
# slos/api-gateway.yaml (for filesystem mode)
apiVersion: pyrra.dev/v1alpha1
kind: ServiceLevelObjective
metadata:
  name: api-gateway-availability
  labels:
    team: platform
    environment: production
spec:
  target: "99.95"
  window: 30d
  description: "API Gateway availability"
  indicator:
    ratio:
      errors:
        metric: envoy_cluster_upstream_rq{response_code_class!="2xx",cluster="api-gateway"}
      total:
        metric: envoy_cluster_upstream_rq{cluster="api-gateway"}
  alerting:
    name: APIGatewayAvailability
    burnrates: true
Configuration
Helm Values
# pyrra-values.yaml
image:
  repository: ghcr.io/pyrra-dev/pyrra
  tag: latest
prometheusUrl: "http://prometheus-server.monitoring:9090"
prometheusExternalUrl: "https://prometheus.example.com"
api:
  replicas: 2
  resources:
    requests:
      cpu: 100m
      memory: 128Mi
    limits:
      cpu: 500m
      memory: 256Mi
kubernetes:
  replicas: 1
  resources:
    requests:
      cpu: 100m
      memory: 128Mi
service:
  type: ClusterIP
  port: 9099
ingress:
  enabled: true
  className: nginx
  hosts:
    - host: pyrra.internal.company.com
      paths:
        - path: /
          pathType: Prefix
  tls:
    - secretName: pyrra-tls
      hosts:
        - pyrra.internal.company.com
# RBAC for reading/writing SLO CRDs
rbac:
  create: true
Prometheus Integration
# prometheus.yml — Include Pyrra-generated rules
rule_files:
- "/etc/pyrra/output/*.yaml"
# If using Prometheus Operator, Pyrra generates PrometheusRule CRs
# that are automatically discovered
Generated Recording Rules
# Pyrra generates these recording rules automatically:
# Error ratio at different windows
# pyrra:slo:payment-api-availability:error_ratio_rate5m
# pyrra:slo:payment-api-availability:error_ratio_rate30m
# pyrra:slo:payment-api-availability:error_ratio_rate1h
# pyrra:slo:payment-api-availability:error_ratio_rate2h
# pyrra:slo:payment-api-availability:error_ratio_rate6h
# pyrra:slo:payment-api-availability:error_ratio_rate1d
# pyrra:slo:payment-api-availability:error_ratio_rate3d
# pyrra:slo:payment-api-availability:error_ratio_rate30d
# Burn rate alerts (multi-window, multi-burn-rate)
# Fast burn: 14.4x burn rate over 1h (and 5m confirmation)
# Medium burn: 6x burn rate over 6h (and 30m confirmation)
# Slow burn: 3x burn rate over 1d (and 2h confirmation)
# Low burn: 1x burn rate over 3d (and 6h confirmation)
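The multi-window, multi-burn-rate idea can be sketched in a few lines: each alert has a burn-rate factor, a long window, and a short confirmation window, and it fires only when the error ratio in both windows exceeds factor times the budget. The code below uses the factor and window pairs listed in this cheat sheet and a 30-day SLO window; the class and function names are hypothetical, and this is not Pyrra's rule generator.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BurnWindow:
    factor: float       # burn-rate multiple that triggers the alert
    long_hours: float   # main evaluation window
    short_hours: float  # confirmation window


# Window pairs as listed in this cheat sheet.
WINDOWS = (
    BurnWindow(14.4, 1, 5 / 60),
    BurnWindow(6, 6, 0.5),
    BurnWindow(3, 24, 2),
    BurnWindow(1, 72, 6),
)


def threshold(target_percent: float, factor: float) -> float:
    """Error ratio above which the given burn-rate factor is exceeded."""
    return factor * (1 - target_percent / 100)


def budget_spent(window: BurnWindow, slo_window_hours: float = 30 * 24) -> float:
    """Share of the whole budget used if the factor holds for the long window."""
    return window.factor * window.long_hours / slo_window_hours


def fires(window: BurnWindow, target_percent: float,
          long_ratio: float, short_ratio: float) -> bool:
    """Alert only when both the long and the short window exceed the threshold."""
    limit = threshold(target_percent, window.factor)
    return long_ratio > limit and short_ratio > limit


for w in WINDOWS:
    print(f"{w.factor}x over {w.long_hours}h: error ratio threshold "
          f"{threshold(99.9, w.factor):.4f}, budget spent {budget_spent(w):.0%}")
```

Requiring the short window as well means an alert stops firing soon after the error rate recovers, instead of lingering until the long window drains.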
Advanced Usage
Multiple Services SLO Collection
# Define SLOs for all critical services
---
apiVersion: pyrra.dev/v1alpha1
kind: ServiceLevelObjective
metadata:
  name: auth-service-availability
  namespace: monitoring
  labels:
    team: identity
    tier: "0"
spec:
  target: "99.99"
  window: 30d
  indicator:
    ratio:
      errors:
        metric: grpc_server_handled_total{grpc_service="auth.AuthService",grpc_code!="OK"}
      total:
        metric: grpc_server_handled_total{grpc_service="auth.AuthService"}
  alerting:
    name: AuthServiceAvailability
---
apiVersion: pyrra.dev/v1alpha1
kind: ServiceLevelObjective
metadata:
  name: search-service-latency
  namespace: monitoring
  labels:
    team: search
    tier: "1"
spec:
  target: "95.0"
  window: 30d
  indicator:
    latency:
      success:
        metric: http_request_duration_seconds_bucket{job="search-api",le="0.3"}
      total:
        metric: http_request_duration_seconds_count{job="search-api"}
  alerting:
    name: SearchServiceLatency
---
apiVersion: pyrra.dev/v1alpha1
kind: ServiceLevelObjective
metadata:
  name: notification-service-availability
  namespace: monitoring
  labels:
    team: messaging
    tier: "1"
spec:
  target: "99.5"
  window: 30d
  indicator:
    ratio:
      errors:
        metric: notifications_send_total{status="failed"}
      total:
        metric: notifications_send_total
  alerting:
    name: NotificationServiceAvailability
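When many services follow the same pattern, the manifests can be generated instead of hand-written. The sketch below builds ratio SLOs as Python dictionaries and prints them as a JSON `List`, which `kubectl apply -f -` accepts just like YAML. The helper names `ratio_slo` and `as_list` are hypothetical; the field layout mirrors the manifests above.

```python
import json


def ratio_slo(name: str, team: str, tier: str, target: str,
              errors: str, total: str, alert_name: str,
              namespace: str = "monitoring", window: str = "30d") -> dict:
    """Build one ratio-based ServiceLevelObjective manifest."""
    return {
        "apiVersion": "pyrra.dev/v1alpha1",
        "kind": "ServiceLevelObjective",
        "metadata": {
            "name": name,
            "namespace": namespace,
            "labels": {"team": team, "tier": tier},
        },
        "spec": {
            "target": target,  # kept as a string, as in the YAML manifests
            "window": window,
            "indicator": {
                "ratio": {
                    "errors": {"metric": errors},
                    "total": {"metric": total},
                }
            },
            "alerting": {"name": alert_name},
        },
    }


def as_list(slos: list[dict]) -> dict:
    """Wrap manifests in a v1 List so they can be applied in one call."""
    return {"apiVersion": "v1", "kind": "List", "items": slos}


slos = [
    ratio_slo("notification-service-availability", "messaging", "1", "99.5",
              'notifications_send_total{status="failed"}',
              "notifications_send_total",
              "NotificationServiceAvailability"),
]
print(json.dumps(as_list(slos), indent=2))
```

Saved as a script (the file name is up to you), its output can be piped straight into `kubectl apply -f -`.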
Grafana Integration
# Pyrra provides its own web UI but also works with Grafana
# PromQL queries for Grafana dashboards:
# Current error budget remaining (%)
# 100 * (1 - (pyrra:slo:payment-api-availability:error_ratio_rate30d / (1 - 0.999)))
# Burn rate (how fast budget is consumed; 1 = exactly on budget)
# pyrra:slo:payment-api-availability:error_ratio_rate1h / (1 - 0.999)
# Equivalent minutes of full outage over the 30d window
# pyrra:slo:payment-api-availability:error_ratio_rate30d * 30 * 24 * 60
# SLO compliance (boolean: 1 = meeting target)
# pyrra:slo:payment-api-availability:error_ratio_rate30d <= bool (1 - 0.999)
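To sanity-check what these panels should display, the same arithmetic can be run on sample error ratios. This is a plain-Python sketch of the formulas, with hypothetical function names and invented input values; it does not query Prometheus.

```python
def budget_remaining_percent(error_ratio_30d: float, target_percent: float) -> float:
    """Share of the error budget still unspent, in percent."""
    budget = 1 - target_percent / 100
    return 100 * (1 - error_ratio_30d / budget)


def burn_rate(error_ratio_1h: float, target_percent: float) -> float:
    """How many times faster than sustainable the budget is being spent."""
    return error_ratio_1h / (1 - target_percent / 100)


def is_compliant(error_ratio_30d: float, target_percent: float) -> bool:
    """True while the 30d error ratio stays within the budget."""
    return error_ratio_30d <= 1 - target_percent / 100


# Invented sample values for a 99.9% target.
print(f"budget remaining: {budget_remaining_percent(0.0004, 99.9):.1f}%")
print(f"burn rate: {burn_rate(0.002, 99.9):.1f}x")
print(f"compliant: {is_compliant(0.0004, 99.9)}")
```

A negative remaining budget is a valid result and means the objective has already been missed for the window.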
API Endpoints
# Pyrra exposes an API for querying SLO data
# List all SLOs
curl -s "http://pyrra:9099/api/objectives" | jq '.[] | {name: .labels.pyrra_dev_name, target, window}'
# Get specific SLO details
curl -s "http://pyrra:9099/api/objectives/payment-api-availability" | jq '.'
# Get SLO status (current error budget)
curl -s "http://pyrra:9099/api/objectives/payment-api-availability/status" | jq '{
availability: .availability,
budget: .budget,
burnrate: .burnrate
}'
# Get alerts for an SLO
curl -s "http://pyrra:9099/api/objectives/payment-api-availability/alerts" | jq '.'
Troubleshooting
| Issue | Cause | Solution |
|---|---|---|
| SLO dashboard shows no data | Prometheus URL misconfigured | Verify --prometheus-url points to correct Prometheus instance |
| Recording rules not generated | CRD not installed or controller not running | Run kubectl get crds \| grep pyrra and check the controller pods |
| Error budget always at 100% | SLI metrics returning 0 errors | Verify error metric labels match actual metrics |
| Alerts not firing | Alerting rules not loaded by Prometheus | Check Prometheus rule_files path or PrometheusRule CRs |
| Dashboard shows wrong window | spec.window differs from the intended period | Check the window field in the SLO spec (for example 30d) and re-apply the SLO |
| Latency SLO not working | Histogram bucket mismatch | Verify histogram has the le bucket specified in the SLO |
| Bool gauge SLO inaccurate | Probe interval too long | Increase probe frequency for more accurate measurement |
| API returning 500 | Prometheus query timeout | Increase Prometheus query timeout or reduce SLO window |
# Debug: check Pyrra controller logs
kubectl logs -n monitoring -l app.kubernetes.io/name=pyrra-kubernetes --tail=50
# Debug: check API server logs
kubectl logs -n monitoring -l app.kubernetes.io/name=pyrra-api --tail=50
# Verify CRD is installed
kubectl get crds | grep pyrra
# Check SLO status
kubectl get servicelevelobjectives -n monitoring -o wide
# Verify Prometheus has the recording rules
curl -s "http://prometheus:9090/api/v1/rules" | jq '.data.groups[] | select(.name | contains("pyrra"))'
# Test SLI queries directly in Prometheus
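# Error query (-G with --data-urlencode stops curl from globbing {} and [] and encodes the PromQL):
curl -s -G "http://prometheus:9090/api/v1/query" --data-urlencode 'query=sum(rate(http_requests_total{code=~"5.."}[5m]))'
# Total query:
curl -s -G "http://prometheus:9090/api/v1/query" --data-urlencode 'query=sum(rate(http_requests_total[5m]))'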
# Access Pyrra UI
kubectl port-forward -n monitoring svc/pyrra 9099:9099
# Open http://localhost:9099
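When scripting the SLI test queries, remember that PromQL contains braces, brackets, and quotes that must be percent-encoded before they go into a URL. The sketch below builds an instant-query URL for the Prometheus HTTP API (`/api/v1/query`) with the standard library; the function name and the base URL are assumptions, and no request is actually sent.

```python
from urllib.parse import urlencode


def instant_query_url(base_url: str, promql: str) -> str:
    """Return a Prometheus instant-query URL with the PromQL safely encoded."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


error_query = 'sum(rate(http_requests_total{code=~"5.."}[5m]))'
url = instant_query_url("http://prometheus:9090", error_query)
print(url)
```

The resulting URL can be passed to any HTTP client without further escaping.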