Flagger Cheat Sheet

Overview

Flagger is a progressive delivery operator for Kubernetes that automates the promotion of canary deployments using service mesh routing, ingress controllers, or custom traffic management. It monitors metrics from providers like Prometheus, Datadog, or CloudWatch and progressively shifts traffic to new versions while running automated analysis.

Flagger works with Istio, Linkerd, App Mesh, NGINX, Contour, Gloo, and other service meshes and ingress controllers. From your target deployment it creates the primary deployment and the primary and canary services automatically, manages traffic splitting, runs automated tests via webhooks, and rolls back on failure, all driven by Kubernetes custom resources.

Installation

# Install with Helm
helm repo add flagger https://flagger.app
helm upgrade -i flagger flagger/flagger \
  --namespace flagger-system \
  --create-namespace \
  --set meshProvider=istio \
  --set metricsServer=http://prometheus.monitoring:9090

# Install Flagger load tester (optional)
helm upgrade -i flagger-loadtester flagger/loadtester \
  --namespace flagger-system

# Or install with kubectl and Kustomize (the Istio overlay deploys Flagger into istio-system)
kubectl apply -k github.com/fluxcd/flagger//kustomize/istio

# Verify
kubectl -n flagger-system get pods

Core Concepts

Resource        Description
Canary          Main CRD defining deployment, analysis, and promotion
MetricTemplate  Custom metric queries for analysis
AlertProvider   Notification configuration (Slack, Teams, etc.)

Canary Configuration

Basic Canary (Istio)

apiVersion: flagger.app/v1beta1
kind: Canary
metadata:
  name: my-app
  namespace: prod
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  service:
    port: 80
    targetPort: 8080
    gateways:
      - public-gateway.istio-system.svc.cluster.local
    hosts:
      - app.example.com
  analysis:
    interval: 1m
    threshold: 5
    maxWeight: 50
    stepWeight: 10
    metrics:
      - name: request-success-rate
        thresholdRange:
          min: 99
        interval: 1m
      - name: request-duration
        thresholdRange:
          max: 500
        interval: 1m
    webhooks:
      - name: load-test
        url: http://flagger-loadtester.flagger-system/
        timeout: 5s
        metadata:
          cmd: "hey -z 1m -q 10 -c 2 http://my-app-canary.prod:80/"
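
The analysis settings above determine how long a rollout takes. As a rough sketch, the minimum time to promote a healthy canary is interval × (maxWeight / stepWeight), and a failing canary is rolled back after interval × threshold. The function names below are illustrative helpers, not part of Flagger or kubectl, and the arguments are the values from the Canary above.

```shell
#!/bin/sh
# Illustrative helpers (hypothetical names, not part of Flagger).

# canary_weights MAX_WEIGHT STEP_WEIGHT
# Prints the traffic weights the canary steps through.
canary_weights() {
  weights=""
  w=$2
  while [ "$w" -le "$1" ]; do
    weights="${weights:+$weights }$w"
    w=$((w + $2))
  done
  echo "$weights"
}

# promotion_minutes INTERVAL_MIN MAX_WEIGHT STEP_WEIGHT
# Minimum minutes to promote: interval * (maxWeight / stepWeight).
promotion_minutes() {
  echo $(( $1 * ($2 / $3) ))
}

# rollback_minutes INTERVAL_MIN THRESHOLD
# Minutes of failed checks before rollback: interval * threshold.
rollback_minutes() {
  echo $(( $1 * $2 ))
}

canary_weights 50 10       # 10 20 30 40 50
promotion_minutes 1 50 10  # 5
rollback_minutes 1 5       # 5
```

With interval: 1m, maxWeight: 50, and stepWeight: 10, the canary needs at least five successful checks before promotion, and five failed checks (threshold: 5) trigger a rollback.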

Blue-Green (Kubernetes)

apiVersion: flagger.app/v1beta1
kind: Canary
metadata:
  name: my-app
  namespace: prod
spec:
  provider: kubernetes
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  service:
    port: 80
  analysis:
    interval: 1m
    threshold: 3
    iterations: 10
    metrics:
      - name: request-success-rate
        thresholdRange:
          min: 99
        interval: 1m
    webhooks:
      - name: acceptance-test
        type: pre-rollout
        url: http://flagger-loadtester.flagger-system/
        timeout: 30s
        metadata:
          type: bash
          cmd: "curl -s http://my-app-canary.prod:80/healthz | grep ok"

NGINX Ingress Canary

apiVersion: flagger.app/v1beta1
kind: Canary
metadata:
  name: my-app
spec:
  provider: nginx
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  ingressRef:
    apiVersion: networking.k8s.io/v1
    kind: Ingress
    name: my-app
  service:
    port: 80
  analysis:
    interval: 1m
    threshold: 5
    maxWeight: 50
    stepWeight: 10
    metrics:
      - name: request-success-rate
        thresholdRange:
          min: 99
        interval: 1m

Custom Metrics

MetricTemplate

apiVersion: flagger.app/v1beta1
kind: MetricTemplate
metadata:
  name: error-rate
  namespace: flagger-system
spec:
  provider:
    type: prometheus
    address: http://prometheus.monitoring:9090
  query: |
    100 - sum(
      rate(http_requests_total{
        namespace="{{ namespace }}",
        pod=~"{{ target }}-[0-9a-zA-Z]+(-[0-9a-zA-Z]+)",
        status!~"5.*"
      }[{{ interval }}])
    ) / sum(
      rate(http_requests_total{
        namespace="{{ namespace }}",
        pod=~"{{ target }}-[0-9a-zA-Z]+(-[0-9a-zA-Z]+)"
      }[{{ interval }}])
    ) * 100

# Reference in Canary
analysis:
  metrics:
    - name: error-rate
      templateRef:
        name: error-rate
        namespace: flagger-system
      thresholdRange:
        max: 1
      interval: 1m
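
To make the query concrete, here is the same arithmetic on made-up request rates: the percentage of requests that returned a 5xx status, compared against thresholdRange.max. This is a sketch under assumptions: the helper names are hypothetical, the sample rates are invented, and the check assumes a value equal to max still passes.

```shell
#!/bin/sh
# Illustrative helpers (hypothetical names); the numbers are invented sample rates.

# error_rate_percent NON_5XX_RATE TOTAL_RATE
# Mirrors the template query: 100 - non_5xx / total * 100.
error_rate_percent() {
  awk -v ok="$1" -v total="$2" 'BEGIN {
    if (total + 0 == 0) { print "no-data"; exit }
    printf "%.2f\n", 100 - ok / total * 100
  }'
}

# check_max VALUE MAX
# Prints "pass" when VALUE does not exceed MAX, otherwise "fail".
check_max() {
  awk -v v="$1" -v max="$2" 'BEGIN {
    if (v + 0 <= max + 0) print "pass"; else print "fail"
  }'
}

error_rate_percent 99.5 100  # 0.50
check_max 0.50 1             # pass
check_max 2.50 1             # fail
```

An empty result (no traffic during the interval) is a separate case from a high error rate, which is why the sketch reports "no-data" instead of dividing by zero.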

Configuration

Webhooks

analysis:
  webhooks:
    # Pre-rollout test
    - name: smoke-test
      type: pre-rollout
      url: http://flagger-loadtester.flagger-system/
      timeout: 15s
      metadata:
        type: bash
        cmd: "curl -sf http://my-app-canary.prod:80/healthz"

    # Load test during canary
    - name: load-test
      type: rollout
      url: http://flagger-loadtester.flagger-system/
      timeout: 5s
      metadata:
        cmd: "hey -z 1m -q 10 -c 2 http://my-app-canary.prod:80/"

    # Post-rollout notification
    - name: notify
      type: post-rollout
      url: http://flagger-loadtester.flagger-system/
      metadata:
        type: bash
        cmd: "echo 'Deployment complete'"

    # Confirm promotion (manual gate)
    - name: confirm
      type: confirm-promotion
      url: http://flagger-loadtester.flagger-system/gate/check
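
Each webhook is an HTTP POST with a JSON body that names the canary, its namespace, and the current phase, plus the metadata map from the Canary spec. As documented, a 200-202 response lets the analysis advance; a timeout or any other status halts it and counts as a failed check. The sketch below only models that contract. The helper names are hypothetical and the metadata map is left empty for brevity.

```shell
#!/bin/sh
# Illustrative helpers (hypothetical names, not part of Flagger).

# webhook_payload NAME NAMESPACE PHASE
# Prints a JSON body shaped like the one Flagger posts to a webhook URL.
webhook_payload() {
  printf '{"name":"%s","namespace":"%s","phase":"%s","metadata":{}}\n' "$1" "$2" "$3"
}

# webhook_verdict HTTP_STATUS
# Prints how the analysis reacts to a webhook response code.
webhook_verdict() {
  case "$1" in
    200|201|202) echo "advance" ;;
    *)           echo "halt" ;;
  esac
}

webhook_payload my-app prod Progressing
webhook_verdict 200  # advance
webhook_verdict 500  # halt

# To exercise a hook by hand from inside the cluster (assumes the loadtester
# service from the install step):
#   curl -s -d "$(webhook_payload my-app prod Progressing)" \
#     http://flagger-loadtester.flagger-system/gate/check
```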

Alert Providers

apiVersion: flagger.app/v1beta1
kind: AlertProvider
metadata:
  name: slack
  namespace: flagger-system
spec:
  type: slack
  channel: deployments
  address: https://hooks.slack.com/services/xxx/yyy/zzz

# Reference in Canary
spec:
  analysis:
    alerts:
      - name: slack
        severity: info
        providerRef:
          name: slack
          namespace: flagger-system

Advanced Usage

A/B Testing

apiVersion: flagger.app/v1beta1
kind: Canary
metadata:
  name: my-app
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  service:
    port: 80
  analysis:
    interval: 1m
    threshold: 10
    iterations: 10
    match:
      - headers:
          x-canary:
            exact: "insider"
      - headers:
          cookie:
            regex: "^(.*?;)?(canary=always)(;.*)?$"
    metrics:
      - name: request-success-rate
        thresholdRange:
          min: 99
        interval: 1m

HPA Autoscaler Reference

spec:
  autoscalerRef:
    apiVersion: autoscaling/v2
    kind: HorizontalPodAutoscaler
    name: my-app

Manual Gating

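# Gates are served by the load tester and checked by the confirm-promotion webhook (/gate/check)
# Open the gate to allow promotion
kubectl -n flagger-system exec deploy/flagger-loadtester -- \
  curl -s -d '{"name": "my-app", "namespace": "prod"}' http://localhost:8080/gate/open

# Close the gate to hold promotion
kubectl -n flagger-system exec deploy/flagger-loadtester -- \
  curl -s -d '{"name": "my-app", "namespace": "prod"}' http://localhost:8080/gate/close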

Monitoring Rollouts

# Watch canary status
kubectl -n prod get canary my-app -w

# Describe for events
kubectl -n prod describe canary my-app

# Check Flagger logs
kubectl -n flagger-system logs deployment/flagger -f

# List all canaries
kubectl get canaries --all-namespaces
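
In a CI pipeline it is often useful to block until a rollout finishes. The sketch below polls .status.phase and maps it to an outcome. The function names are hypothetical, and the grouping of phases (Initialized and Succeeded as finished, Failed as failed, everything else as still running) is an assumption to adapt to your workflow.

```shell
#!/bin/sh
# Illustrative helpers (hypothetical names, not part of Flagger).

# classify_phase PHASE
# Maps a Canary phase to: done, failed, or in-progress.
classify_phase() {
  case "$1" in
    Initialized|Succeeded) echo "done" ;;
    Failed)                echo "failed" ;;
    *)                     echo "in-progress" ;;
  esac
}

# canary_phase NAMESPACE NAME
# Reads the current phase; needs kubectl and cluster access.
canary_phase() {
  kubectl -n "$1" get canary "$2" -o jsonpath='{.status.phase}'
}

# wait_for_canary NAMESPACE NAME [POLL_SECONDS]
# Polls until the canary is done or failed; returns non-zero on failure.
wait_for_canary() {
  while :; do
    state=$(classify_phase "$(canary_phase "$1" "$2")")
    [ "$state" = "in-progress" ] || break
    sleep "${3:-10}"
  done
  echo "$state"
  [ "$state" = "done" ]
}

classify_phase Progressing  # in-progress
classify_phase Succeeded    # done
classify_phase Failed       # failed
```

A pipeline step could then call `wait_for_canary prod my-app` and fail the build when it returns non-zero. The loop has no overall deadline, so wrap it in your CI system's step timeout.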

Troubleshooting

Issue                         Solution
Canary stuck in Progressing   Check metrics and webhook responses
Metric query returns no data  Verify Prometheus query and label selectors
Traffic not shifting          Check service mesh configuration
Webhook timeout               Increase timeout; check loadtester pod
Rollback loop                 Check if metric thresholds are too strict
Primary not updated           Verify Flagger controller has RBAC for target namespace

# Debug Flagger controller
kubectl -n flagger-system logs deployment/flagger --tail=100

# Check canary conditions
kubectl -n prod get canary my-app -o jsonpath='{.status.conditions}'

# View canary events
kubectl -n prod events --for canary/my-app

# Restart Flagger
kubectl -n flagger-system rollout restart deployment/flagger

# Check created resources
kubectl -n prod get deploy,svc,virtualservice -l app=my-app