Flagger Cheat Sheet
Overview
Flagger is a progressive delivery operator for Kubernetes that automates the promotion of canary deployments using service mesh routing, ingress controllers, or custom traffic management. It monitors metrics from providers like Prometheus, Datadog, or CloudWatch and progressively shifts traffic to new versions while running automated analysis.
Flagger works with Istio, Linkerd, App Mesh, NGINX, Contour, Gloo, and other service meshes and ingress controllers. It creates the canary and primary deployments automatically, manages traffic splitting, runs automated tests via webhooks, and rolls back on failure — all driven by Kubernetes custom resources.
Installation
# Install with Helm
helm repo add flagger https://flagger.app
helm upgrade -i flagger flagger/flagger \
--namespace flagger-system \
--create-namespace \
--set meshProvider=istio \
--set metricsServer=http://prometheus.monitoring:9090
# Install Flagger load tester (optional)
helm upgrade -i flagger-loadtester flagger/loadtester \
--namespace flagger-system
# Install with kubectl
kubectl apply -k github.com/fluxcd/flagger/kustomize/istio
# Verify
kubectl -n flagger-system get pods
Core Concepts
| Resource | Description |
|---|---|
Canary | Main CRD defining deployment, analysis, and promotion |
MetricTemplate | Custom metric queries for analysis |
AlertProvider | Notification configuration (Slack, Teams, etc.) |
Canary Configuration
Basic Canary (Istio)
apiVersion: flagger.app/v1beta1
kind: Canary
metadata:
name: my-app
namespace: prod
spec:
targetRef:
apiVersion: apps/v1
kind: Deployment
name: my-app
service:
port: 80
targetPort: 8080
gateways:
- public-gateway.istio-system.svc.cluster.local
hosts:
- app.example.com
analysis:
interval: 1m
threshold: 5
maxWeight: 50
stepWeight: 10
metrics:
- name: request-success-rate
thresholdRange:
min: 99
interval: 1m
- name: request-duration
thresholdRange:
max: 500
interval: 1m
webhooks:
- name: load-test
url: http://flagger-loadtester.flagger-system/
timeout: 5s
metadata:
cmd: "hey -z 1m -q 10 -c 2 http://my-app-canary.prod:80/"
Blue-Green (Kubernetes)
apiVersion: flagger.app/v1beta1
kind: Canary
metadata:
name: my-app
spec:
targetRef:
apiVersion: apps/v1
kind: Deployment
name: my-app
service:
port: 80
analysis:
interval: 1m
threshold: 3
iterations: 10
metrics:
- name: request-success-rate
thresholdRange:
min: 99
interval: 1m
webhooks:
- name: acceptance-test
type: pre-rollout
url: http://flagger-loadtester.flagger-system/
timeout: 30s
metadata:
type: bash
cmd: "curl -s http://my-app-canary.prod:80/healthz | grep ok"
NGINX Ingress Canary
apiVersion: flagger.app/v1beta1
kind: Canary
metadata:
name: my-app
spec:
provider: nginx
targetRef:
apiVersion: apps/v1
kind: Deployment
name: my-app
ingressRef:
apiVersion: networking.k8s.io/v1
kind: Ingress
name: my-app
service:
port: 80
analysis:
interval: 1m
threshold: 5
maxWeight: 50
stepWeight: 10
metrics:
- name: request-success-rate
thresholdRange:
min: 99
interval: 1m
Custom Metrics
MetricTemplate
apiVersion: flagger.app/v1beta1
kind: MetricTemplate
metadata:
name: error-rate
namespace: flagger-system
spec:
provider:
type: prometheus
address: http://prometheus.monitoring:9090
query: |
100 - sum(
rate(http_requests_total{
namespace="{{ namespace }}",
pod=~"{{ target }}-[0-9a-zA-Z]+",
status!~"5.*"
}[{{ interval }}])
) / sum(
rate(http_requests_total{
namespace="{{ namespace }}",
pod=~"{{ target }}-[0-9a-zA-Z]+",
}[{{ interval }}])
) * 100
# Reference in Canary
analysis:
metrics:
- name: error-rate
templateRef:
name: error-rate
namespace: flagger-system
thresholdRange:
max: 1
interval: 1m
Configuration
Webhooks
analysis:
webhooks:
# Pre-rollout test
- name: smoke-test
type: pre-rollout
url: http://flagger-loadtester.flagger-system/
timeout: 15s
metadata:
type: bash
cmd: "curl -sf http://my-app-canary.prod:80/healthz"
# Load test during canary
- name: load-test
type: rollout
url: http://flagger-loadtester.flagger-system/
timeout: 5s
metadata:
cmd: "hey -z 1m -q 10 -c 2 http://my-app-canary.prod:80/"
# Post-rollout notification
- name: notify
type: post-rollout
url: http://flagger-loadtester.flagger-system/
metadata:
type: bash
cmd: "echo 'Deployment complete'"
# Confirm promotion (manual gate)
- name: confirm
type: confirm-promotion
url: http://flagger-loadtester.flagger-system/gate/check
Alert Providers
apiVersion: flagger.app/v1beta1
kind: AlertProvider
metadata:
name: slack
namespace: flagger-system
spec:
type: slack
channel: deployments
address: https://hooks.slack.com/services/xxx/yyy/zzz
# Reference in Canary
spec:
analysis:
alerts:
- name: slack
severity: info
providerRef:
name: slack
namespace: flagger-system
Advanced Usage
A/B Testing
apiVersion: flagger.app/v1beta1
kind: Canary
metadata:
name: my-app
spec:
targetRef:
apiVersion: apps/v1
kind: Deployment
name: my-app
service:
port: 80
analysis:
interval: 1m
threshold: 10
iterations: 10
match:
- headers:
x-canary:
exact: "insider"
- headers:
cookie:
regex: "^(.*?;)?(canary=always)(;.*)?$"
metrics:
- name: request-success-rate
thresholdRange:
min: 99
interval: 1m
HPA Autoscaler Reference
spec:
autoscalerRef:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
name: my-app
Manual Gating
# Open the gate for promotion
kubectl -n prod set annotation canary/my-app flagger.app/confirm-promotion=true
# Close the gate (rollback)
kubectl -n prod set annotation canary/my-app flagger.app/confirm-promotion=false
Monitoring Rollouts
# Watch canary status
kubectl -n prod get canary my-app -w
# Describe for events
kubectl -n prod describe canary my-app
# Check Flagger logs
kubectl -n flagger-system logs deployment/flagger -f
# List all canaries
kubectl get canaries --all-namespaces
Troubleshooting
| Issue | Solution |
|---|---|
| Canary stuck in Progressing | Check metrics and webhook responses |
| Metric query returns no data | Verify Prometheus query and label selectors |
| Traffic not shifting | Check service mesh configuration |
| Webhook timeout | Increase timeout; check loadtester pod |
| Rollback loop | Check if metric thresholds are too strict |
| Primary not updated | Verify Flagger controller has RBAC for target namespace |
# Debug Flagger controller
kubectl -n flagger-system logs deployment/flagger --tail=100
# Check canary conditions
kubectl -n prod get canary my-app -o jsonpath='{.status.conditions}'
# View canary events
kubectl -n prod events --for canary/my-app
# Restart Flagger
kubectl -n flagger-system rollout restart deployment/flagger
# Check created resources
kubectl -n prod get deploy,svc,virtualservice -l app=my-app