Pyrra Cheat Sheet

Overview

Pyrra is an open-source SLO (Service Level Objective) monitoring tool that provides a dashboard for tracking error budgets, burn rates, and SLO compliance using Prometheus metrics. It runs either as a Kubernetes-native application or standalone in filesystem mode, and it generates multi-window, multi-burn-rate Prometheus recording and alerting rules from SLO definitions written as Kubernetes custom resources or plain YAML files.

Pyrra's web UI displays SLO compliance as error budget charts, so engineers and stakeholders can see how much error budget remains and how quickly it is being consumed. The dashboard shows historical SLO performance, highlights periods of elevated error rates, and lets you drill into the specific time windows where budget was burned.
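
To make "error budget" concrete, here is a minimal Python sketch of the arithmetic behind it. The function name `error_budget` is hypothetical and not part of Pyrra. It converts an availability target and a window into the amount of full downtime that target tolerates.

```python
from datetime import timedelta


def error_budget(target_percent: float, window: timedelta) -> timedelta:
    """Return how long a service may be fully unavailable within the window."""
    allowed_failure_fraction = 1 - target_percent / 100
    return window * allowed_failure_fraction


# A 99.9% target over 30 days leaves roughly 43.2 minutes of budget.
budget = error_budget(99.9, timedelta(days=30))
print(round(budget.total_seconds() / 60, 1))
```

The same calculation applies to request-based SLOs if you read the result as a share of requests rather than a share of time.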

Installation

Kubernetes Installation via Helm

# Add Pyrra Helm repo
helm repo add pyrra https://pyrra-dev.github.io/pyrra
helm repo update

# Install Pyrra (API + dashboard)
helm install pyrra pyrra/pyrra \
  --namespace monitoring \
  --create-namespace \
  --set prometheusUrl="http://prometheus-server.monitoring:9090" \
  --set prometheusExternalUrl="http://prometheus.example.com"

# Verify installation
kubectl get pods -n monitoring -l app.kubernetes.io/name=pyrra

Kubernetes Manifest Installation

# Install CRDs
kubectl apply -f https://raw.githubusercontent.com/pyrra-dev/pyrra/main/config/crd/bases/pyrra.dev_servicelevelobjectives.yaml

# Install Pyrra components
kubectl apply -f https://raw.githubusercontent.com/pyrra-dev/pyrra/main/config/default/pyrra.yaml

# Or deploy with specific Prometheus URL
cat <<EOF | kubectl apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  name: pyrra-api
  namespace: monitoring
spec:
  replicas: 1
  selector:
    matchLabels:
      app: pyrra-api
  template:
    metadata:
      labels:
        app: pyrra-api
    spec:
      containers:
        - name: pyrra
          image: ghcr.io/pyrra-dev/pyrra:latest
          args:
            - api
            - --prometheus-url=http://prometheus-server:9090
            - --api-url=http://pyrra-kubernetes:9444
          ports:
            - containerPort: 9099
              name: http
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: pyrra-kubernetes
  namespace: monitoring
spec:
  replicas: 1
  selector:
    matchLabels:
      app: pyrra-kubernetes
  template:
    metadata:
      labels:
        app: pyrra-kubernetes
    spec:
      serviceAccountName: pyrra
      containers:
        - name: pyrra
          image: ghcr.io/pyrra-dev/pyrra:latest
          args:
            - kubernetes
            - --prometheus-url=http://prometheus-server:9090
          ports:
            - containerPort: 9444
              name: http
EOF

Standalone (Filesystem Mode)

# Run Pyrra without Kubernetes (filesystem mode)
docker run -d \
  --name pyrra \
  -p 9099:9099 \
  -v "$(pwd)/slos:/etc/pyrra/slos" \
  -v "$(pwd)/output:/etc/pyrra/output" \
  ghcr.io/pyrra-dev/pyrra:latest \
  filesystem \
  --prometheus-url=http://prometheus:9090 \
  --config-files=/etc/pyrra/slos \
  --prometheus-folder=/etc/pyrra/output

# Binary installation
curl -L "https://github.com/pyrra-dev/pyrra/releases/latest/download/pyrra_linux_amd64" -o /usr/local/bin/pyrra
chmod +x /usr/local/bin/pyrra

# Run in filesystem mode
pyrra filesystem \
  --prometheus-url=http://localhost:9090 \
  --config-files=./slos/ \
  --prometheus-folder=./prometheus-rules/

Core Commands — SLO Definitions

Kubernetes CRD (ServiceLevelObjective)

# availability-slo.yaml
apiVersion: pyrra.dev/v1alpha1
kind: ServiceLevelObjective
metadata:
  name: payment-api-availability
  namespace: monitoring
  labels:
    team: payments
    tier: "0"
spec:
  target: "99.9"
  window: 30d
  description: "99.9% of payment API requests return successfully"
  indicator:
    ratio:
      errors:
        metric: http_requests_total{job="payment-api",code=~"5.."}
      total:
        metric: http_requests_total{job="payment-api"}
  alerting:
    name: PaymentAPIAvailability
    disabled: false
    burnrates: true

# Apply the SLO
kubectl apply -f availability-slo.yaml

# List all SLOs
kubectl get servicelevelobjectives -n monitoring

# Describe an SLO
kubectl describe slo payment-api-availability -n monitoring

# Delete an SLO
kubectl delete slo payment-api-availability -n monitoring
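
A ratio indicator boils down to "rate of errors divided by rate of all requests" over some window. The Python sketch below builds that kind of PromQL expression from the two metric selectors in the SLO above. It is an illustration of the idea only: `error_ratio_query` is a hypothetical helper, and the expressions Pyrra actually generates may differ in grouping and naming.

```python
def error_ratio_query(errors_metric: str, total_metric: str, window: str) -> str:
    """Build a PromQL expression for the error ratio over a single window."""
    return (
        f"sum(rate({errors_metric}[{window}]))"
        f" / sum(rate({total_metric}[{window}]))"
    )


query = error_ratio_query(
    'http_requests_total{job="payment-api",code=~"5.."}',
    'http_requests_total{job="payment-api"}',
    "5m",
)
print(query)
```

Pasting the printed expression into the Prometheus UI is a quick way to check that both selectors match real series before applying the SLO.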

Latency SLO

# latency-slo.yaml
apiVersion: pyrra.dev/v1alpha1
kind: ServiceLevelObjective
metadata:
  name: checkout-api-latency
  namespace: monitoring
  labels:
    team: commerce
spec:
  target: "99.0"
  window: 30d
  description: "99% of checkout requests complete within 500ms"
  indicator:
    latency:
      success:
        metric: http_request_duration_seconds_bucket{job="checkout-api",le="0.5"}
      total:
        metric: http_request_duration_seconds_count{job="checkout-api"}
  alerting:
    name: CheckoutAPILatency
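
For a latency SLO, the "errors" are the requests that did not land in the chosen histogram bucket. The Python sketch below shows that arithmetic with plain numbers. The function names are hypothetical, and treating a window with no traffic as "no errors" is an assumption made for the example.

```python
def latency_error_ratio(within_threshold: float, total: float) -> float:
    """Fraction of requests slower than the threshold (bucket count vs. total count)."""
    if total <= 0:
        return 0.0  # assumption: no traffic means no budget is burned
    return 1 - within_threshold / total


def meets_target(error_ratio: float, target_percent: float) -> bool:
    """True when the error ratio stays within the budget implied by the target."""
    return error_ratio <= 1 - target_percent / 100


# 9,950 of 10,000 checkout requests finished within 500ms: 0.5% slow, target allows 1%.
ratio = latency_error_ratio(9_950, 10_000)
print(meets_target(ratio, 99.0))
```

If the histogram has no bucket with the exact `le` value named in the SLO, the success selector matches nothing and the SLO cannot be evaluated.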

Boolean/Probe SLO

# probe-slo.yaml
apiVersion: pyrra.dev/v1alpha1
kind: ServiceLevelObjective
metadata:
  name: website-uptime
  namespace: monitoring
spec:
  target: "99.95"
  window: 30d
  description: "Website responds to health checks 99.95% of the time"
  indicator:
    bool_gauge:
      metric: probe_success{job="blackbox",instance="https://www.example.com"}
  alerting:
    name: WebsiteUptime
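
With a probe-based SLO, the probe interval decides how coarse the measurement is. The Python sketch below assumes availability is approximated as the share of successful probe samples; how Pyrra evaluates bool gauges internally is not shown in this sheet. The helper `allowed_failed_probes` is hypothetical and estimates how many failed probes the budget tolerates.

```python
def allowed_failed_probes(
    target_percent: float,
    window_seconds: float,
    probe_interval_seconds: float,
) -> float:
    """Number of failed probes that fit into the error budget for the window."""
    total_probes = window_seconds / probe_interval_seconds
    return total_probes * (1 - target_percent / 100)


THIRTY_DAYS = 30 * 24 * 60 * 60

# 99.95% over 30 days: about 21.6 failed probes at a 60s interval,
# but only about 4.3 at a 5-minute interval.
print(round(allowed_failed_probes(99.95, THIRTY_DAYS, 60), 1))
print(round(allowed_failed_probes(99.95, THIRTY_DAYS, 300), 1))
```

The fewer probes fit into the budget, the more a single failed probe distorts the result, which is why a long probe interval makes this kind of SLO inaccurate.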

Filesystem Mode SLO

# slos/api-gateway.yaml (for filesystem mode)
apiVersion: pyrra.dev/v1alpha1
kind: ServiceLevelObjective
metadata:
  name: api-gateway-availability
  labels:
    team: platform
    environment: production
spec:
  target: "99.95"
  window: 30d
  description: "API Gateway availability"
  indicator:
    ratio:
      errors:
        metric: envoy_cluster_upstream_rq{response_code_class!="2xx",cluster="api-gateway"}
      total:
        metric: envoy_cluster_upstream_rq{cluster="api-gateway"}
  alerting:
    name: APIGatewayAvailability
    burnrates: true

Configuration

Helm Values

# pyrra-values.yaml
image:
  repository: ghcr.io/pyrra-dev/pyrra
  tag: latest

prometheusUrl: "http://prometheus-server.monitoring:9090"
prometheusExternalUrl: "https://prometheus.example.com"

api:
  replicas: 2
  resources:
    requests:
      cpu: 100m
      memory: 128Mi
    limits:
      cpu: 500m
      memory: 256Mi

kubernetes:
  replicas: 1
  resources:
    requests:
      cpu: 100m
      memory: 128Mi

service:
  type: ClusterIP
  port: 9099

ingress:
  enabled: true
  className: nginx
  hosts:
    - host: pyrra.internal.company.com
      paths:
        - path: /
          pathType: Prefix
  tls:
    - secretName: pyrra-tls
      hosts:
        - pyrra.internal.company.com

# RBAC for reading/writing SLO CRDs
rbac:
  create: true

Prometheus Integration

# prometheus.yml — Include Pyrra-generated rules
rule_files:
  - "/etc/pyrra/output/*.yaml"

# If using Prometheus Operator, Pyrra generates PrometheusRule CRs
# that are automatically discovered

Generated Recording Rules

# Pyrra generates these recording rules automatically:

# Error ratio at different windows
# pyrra:slo:payment-api-availability:error_ratio_rate5m
# pyrra:slo:payment-api-availability:error_ratio_rate30m
# pyrra:slo:payment-api-availability:error_ratio_rate1h
# pyrra:slo:payment-api-availability:error_ratio_rate2h
# pyrra:slo:payment-api-availability:error_ratio_rate6h
# pyrra:slo:payment-api-availability:error_ratio_rate1d
# pyrra:slo:payment-api-availability:error_ratio_rate3d
# pyrra:slo:payment-api-availability:error_ratio_rate30d

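# Burn rate alerts (multi-window, multi-burn-rate), most to least urgent.
# Each alert fires only when both the long and the short (confirmation) window exceed the threshold.
# Exact factors and windows can differ by Pyrra version and SLO window; check the generated rules.
# Fast burn:   14.4x burn rate over 1h (and 5m confirmation)
# Medium burn: 6x burn rate over 6h (and 30m confirmation)
# Slow burn:   3x burn rate over 1d (and 2h confirmation)
# Low burn:    1x burn rate over 3d (and 6h confirmation)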
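
The multi-window, multi-burn-rate logic can be sketched in a few lines of Python. The factors and windows below are copied from the list above. The class and function names are hypothetical, and the real evaluation happens inside Prometheus alerting rules rather than in code like this.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BurnRateWindow:
    factor: float      # multiple of the "just on budget" error ratio
    long_window: str   # main evaluation window
    short_window: str  # confirmation window


WINDOWS = [
    BurnRateWindow(14.4, "1h", "5m"),
    BurnRateWindow(6, "6h", "30m"),
    BurnRateWindow(3, "1d", "2h"),
    BurnRateWindow(1, "3d", "6h"),
]


def firing_windows(target_percent: float, error_ratios: dict[str, float]) -> list[BurnRateWindow]:
    """Return the windows whose long AND short error ratios exceed factor * budget."""
    budget = 1 - target_percent / 100
    return [
        w
        for w in WINDOWS
        if error_ratios[w.long_window] > w.factor * budget
        and error_ratios[w.short_window] > w.factor * budget
    ]


# A sharp 2% error spike in the last hour against a 99.9% target.
observed = {"5m": 0.02, "30m": 0.02, "1h": 0.02, "2h": 0.004, "6h": 0.003, "1d": 0.001, "3d": 0.0005}
print([w.factor for w in firing_windows(99.9, observed)])  # only the 14.4x window fires
```

Requiring the short window as well keeps an alert from staying active long after a brief spike has ended.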

Advanced Usage

Multiple Services SLO Collection

# Define SLOs for all critical services
---
apiVersion: pyrra.dev/v1alpha1
kind: ServiceLevelObjective
metadata:
  name: auth-service-availability
  namespace: monitoring
  labels:
    team: identity
    tier: "0"
spec:
  target: "99.99"
  window: 30d
  indicator:
    ratio:
      errors:
        metric: grpc_server_handled_total{grpc_service="auth.AuthService",grpc_code!="OK"}
      total:
        metric: grpc_server_handled_total{grpc_service="auth.AuthService"}
  alerting:
    name: AuthServiceAvailability
---
apiVersion: pyrra.dev/v1alpha1
kind: ServiceLevelObjective
metadata:
  name: search-service-latency
  namespace: monitoring
  labels:
    team: search
    tier: "1"
spec:
  target: "95.0"
  window: 30d
  indicator:
    latency:
      success:
        metric: http_request_duration_seconds_bucket{job="search-api",le="0.3"}
      total:
        metric: http_request_duration_seconds_count{job="search-api"}
  alerting:
    name: SearchServiceLatency
---
apiVersion: pyrra.dev/v1alpha1
kind: ServiceLevelObjective
metadata:
  name: notification-service-availability
  namespace: monitoring
  labels:
    team: messaging
    tier: "1"
spec:
  target: "99.5"
  window: 30d
  indicator:
    ratio:
      errors:
        metric: notifications_send_total{status="failed"}
      total:
        metric: notifications_send_total
  alerting:
    name: NotificationServiceAvailability
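
When many services follow the same pattern, the manifests can be generated instead of hand-written. The Python sketch below renders ratio SLOs using only the fields shown in the examples above. `RatioSLO`, `render`, and `render_all` are hypothetical helpers, and the output is built with plain string formatting, so it assumes metric selectors that are valid as unquoted YAML values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RatioSLO:
    name: str
    team: str
    target: str
    errors_metric: str
    total_metric: str


def render(slo: RatioSLO, namespace: str = "monitoring", window: str = "30d") -> str:
    """Render one ServiceLevelObjective manifest as YAML text."""
    lines = [
        "apiVersion: pyrra.dev/v1alpha1",
        "kind: ServiceLevelObjective",
        "metadata:",
        f"  name: {slo.name}",
        f"  namespace: {namespace}",
        "  labels:",
        f"    team: {slo.team}",
        "spec:",
        f'  target: "{slo.target}"',
        f"  window: {window}",
        "  indicator:",
        "    ratio:",
        "      errors:",
        f"        metric: {slo.errors_metric}",
        "      total:",
        f"        metric: {slo.total_metric}",
    ]
    return "\n".join(lines) + "\n"


def render_all(slos: list[RatioSLO]) -> str:
    """Join several manifests into one multi-document YAML stream."""
    return "---\n".join(render(slo) for slo in slos)


slos = [
    RatioSLO(
        "auth-service-availability", "identity", "99.99",
        'grpc_server_handled_total{grpc_service="auth.AuthService",grpc_code!="OK"}',
        'grpc_server_handled_total{grpc_service="auth.AuthService"}',
    ),
    RatioSLO(
        "notification-service-availability", "messaging", "99.5",
        'notifications_send_total{status="failed"}',
        "notifications_send_total",
    ),
]
print(render_all(slos))
```

The printed stream can be piped to `kubectl apply -f -` or written to the filesystem-mode SLO directory.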

Grafana Integration

# Pyrra provides its own web UI but also works with Grafana

# PromQL queries for Grafana dashboards:

# Error budget remaining as a fraction (1 = untouched, 0 = exhausted; multiply by 100 for %)
# 1 - (pyrra:slo:payment-api-availability:error_ratio_rate30d / (1 - 0.999))

# Burn rate (1 = the budget lasts exactly the full window)
# pyrra:slo:payment-api-availability:error_ratio_rate1h / (1 - 0.999)

# Error budget consumed, expressed as equivalent minutes of full outage in the 30d window
# pyrra:slo:payment-api-availability:error_ratio_rate30d * 30 * 24 * 60

# SLO compliance (1 = meeting target, 0 = violating); the bool modifier returns 0/1 instead of filtering
# pyrra:slo:payment-api-availability:error_ratio_rate30d <= bool (1 - 0.999)
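
The same quantities are easy to sanity-check outside Grafana. The Python sketch below mirrors the budget and burn-rate expressions above and adds a rough time-to-exhaustion estimate. All function names are hypothetical, and the estimate assumes the current burn rate stays constant.

```python
def budget_remaining(error_ratio_window: float, target_percent: float) -> float:
    """Fraction of the error budget still available (1.0 = untouched)."""
    return 1 - error_ratio_window / (1 - target_percent / 100)


def burn_rate(error_ratio: float, target_percent: float) -> float:
    """How many times faster than 'exactly on budget' errors are occurring."""
    return error_ratio / (1 - target_percent / 100)


def hours_until_exhausted(remaining: float, current_burn_rate: float, window_hours: float) -> float:
    """Hours until the budget reaches zero if the burn rate does not change."""
    if current_burn_rate <= 0:
        return float("inf")
    return remaining * window_hours / current_burn_rate


remaining = budget_remaining(0.0004, 99.9)  # about 0.6, i.e. 60% left
rate = burn_rate(0.005, 99.9)               # about 5x
print(round(hours_until_exhausted(remaining, rate, 30 * 24), 1))  # about 86.4 hours
```

A burn rate of 1 over the whole window uses up exactly the full budget, so `window_hours / burn_rate` is the time a full budget would last.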

API Endpoints

# Pyrra exposes an API for querying SLO data.
# Endpoint paths have changed between Pyrra versions; verify them against your deployment.

# List all SLOs
curl -s "http://pyrra:9099/api/objectives" | jq '.[] | {name: .labels.pyrra_dev_name, target, window}'

# Get specific SLO details
curl -s "http://pyrra:9099/api/objectives/payment-api-availability" | jq '.'

# Get SLO status (current error budget)
curl -s "http://pyrra:9099/api/objectives/payment-api-availability/status" | jq '{
  availability: .availability,
  budget: .budget,
  burnrate: .burnrate
}'

# Get alerts for an SLO
curl -s "http://pyrra:9099/api/objectives/payment-api-availability/alerts" | jq '.'

Troubleshooting

| Issue | Cause | Solution |
| --- | --- | --- |
| SLO dashboard shows no data | Prometheus URL misconfigured | Verify --prometheus-url points to the correct Prometheus instance |
| Recording rules not generated | CRD not installed or controller not running | Confirm the CRD exists (kubectl get crds, filtered for pyrra) and check the controller pods |
| Error budget always at 100% | SLI metrics returning 0 errors | Verify error metric labels match actual metrics |
| Alerts not firing | Alerting rules not loaded by Prometheus | Check Prometheus rule_files path or PrometheusRule CRs |
| Dashboard shows wrong window | Window not matching SLO spec | Verify that window (for example 30d) in the SLO spec is the intended value |
| Latency SLO not working | Histogram bucket mismatch | Verify the histogram has the le bucket specified in the SLO |
| Bool gauge SLO inaccurate | Probe interval too long | Increase probe frequency for more accurate measurement |
| API returning 500 | Prometheus query timeout | Increase Prometheus query timeout or reduce SLO window |

# Debug: check Pyrra controller logs
kubectl logs -n monitoring -l app.kubernetes.io/name=pyrra-kubernetes --tail=50

# Debug: check API server logs
kubectl logs -n monitoring -l app.kubernetes.io/name=pyrra-api --tail=50

# Verify CRD is installed
kubectl get crds | grep pyrra

# Check SLO status
kubectl get servicelevelobjectives -n monitoring -o wide

# Verify Prometheus has the recording rules
curl -s "http://prometheus:9090/api/v1/rules" | jq '.data.groups[] | select(.name | contains("pyrra"))'

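# Test SLI queries directly in Prometheus
# (-G with --data-urlencode keeps curl from interpreting {} and [] and URL-encodes the query)
# Error query:
curl -s -G "http://prometheus:9090/api/v1/query" \
  --data-urlencode 'query=sum(rate(http_requests_total{code=~"5.."}[5m]))'
# Total query:
curl -s -G "http://prometheus:9090/api/v1/query" \
  --data-urlencode 'query=sum(rate(http_requests_total[5m]))'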

# Access Pyrra UI
kubectl port-forward -n monitoring svc/pyrra 9099:9099
# Open http://localhost:9099
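
For repeated checks, the SLI test queries can be scripted. The Python sketch below uses only the standard library and the Prometheus instant-query endpoint (`/api/v1/query`). The helper names are hypothetical, and the base URL is the same placeholder host used in the commands above.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def instant_query_url(base_url: str, promql: str) -> str:
    """Build a properly URL-encoded instant-query URL for Prometheus."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


def run_instant_query(base_url: str, promql: str, timeout: float = 10.0) -> list:
    """Run the query and return the result list, raising on a non-success status."""
    with urlopen(instant_query_url(base_url, promql), timeout=timeout) as response:
        payload = json.load(response)
    if payload.get("status") != "success":
        raise RuntimeError(f"Prometheus query failed: {payload}")
    return payload["data"]["result"]


# Building the URL needs no network access; braces and brackets are percent-encoded.
print(instant_query_url(
    "http://prometheus:9090",
    'sum(rate(http_requests_total{code=~"5.."}[5m]))',
))
```

An empty result list from `run_instant_query` for the error query usually means the label selectors match no series, which is the common cause of an error budget stuck at 100%.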