Komodor Cheat Sheet

Overview

Komodor is a Kubernetes troubleshooting and monitoring platform. It tracks changes across deployments, configurations, infrastructure, and code to give visibility into the whole Kubernetes stack. It correlates those changes with issues such as pod failures, performance degradation, and service disruptions, so teams can identify root causes without manually sifting through logs, events, and metrics across multiple tools.

The platform provides a unified timeline view that shows what changed, when, by whom, and what impact the change had on services. Komodor integrates with CI/CD pipelines (GitHub Actions, GitLab CI, Jenkins, ArgoCD), monitoring tools (Datadog, Prometheus, PagerDuty), and communication platforms (Slack, Teams) to build a complete picture of system changes. Its automated root cause analysis and recommendations are intended to reduce mean time to resolution (MTTR) for Kubernetes incidents.

Installation

Agent Installation via Helm

# Add Komodor Helm repo
helm repo add komodor https://helm-charts.komodor.io
helm repo update

# Install Komodor agent
helm install komodor-agent komodor/komodor-agent \
  --namespace komodor \
  --create-namespace \
  --set apiKey="your-komodor-api-key" \
  --set clusterName="production-us-east" \
  --set watcher.enableAgentTaskExecution=true

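# Alternative: install with Prometheus metrics collection
# (run this instead of the command above; a release name can only be installed once,
# so use "helm upgrade" to change an existing komodor-agent release)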
helm install komodor-agent komodor/komodor-agent \
  --namespace komodor \
  --create-namespace \
  --set apiKey="your-komodor-api-key" \
  --set clusterName="production-us-east" \
  --set metrics.enabled=true

# Verify installation
kubectl get pods -n komodor

# Check agent status
kubectl logs -n komodor -l app=komodor-agent --tail=20
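
In a script or CI job, the two manual checks above can be replaced by a single wait on the rollout. The following is a minimal sketch, not an official procedure. The function name `wait_for_agent` is hypothetical, and it assumes the chart creates a Deployment named `komodor-agent`, the name used by the troubleshooting commands in this cheat sheet.

```shell
# Block until the agent Deployment finishes rolling out, or fail after a timeout.
# Assumption: the Deployment is named "komodor-agent" in namespace "komodor".
wait_for_agent() {
  local timeout="${1:-120s}"
  if kubectl rollout status deployment/komodor-agent \
       --namespace komodor --timeout="$timeout"; then
    echo "agent ready"
  else
    # On failure, show the most recent agent logs to help with diagnosis.
    echo "agent not ready after $timeout; recent logs:" >&2
    kubectl logs --namespace komodor -l app=komodor-agent --tail=20 >&2
    return 1
  fi
}
```

Calling `wait_for_agent 180s` right after `helm install` makes the pipeline step fail when the agent does not become ready in time.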

Helm Values Configuration

# komodor-values.yaml
apiKey: "your-api-key"
clusterName: "production-us-east"

watcher:
  enableAgentTaskExecution: true
  resources:
    requests:
      cpu: 100m
      memory: 256Mi
    limits:
      cpu: 500m
      memory: 512Mi

  # Namespace filtering
  namespacesDenylist:
    - kube-system
  # Or use allowlist
  # namespacesAllowlist:
  #   - production
  #   - staging

  # Resource types to watch
  watchedResources:
    deployment: true
    statefulset: true
    daemonset: true
    job: true
    cronjob: true
    pod: true
    service: true
    configmap: true
    secret: false  # Disable secret watching for security
    ingress: true
    hpa: true
    pdb: true

metrics:
  enabled: true
  serviceMonitor:
    enabled: false

# Communication settings
communications:
  slack:
    enabled: true
  teams:
    enabled: true

Upgrade Agent

# Upgrade to latest version
helm repo update
helm upgrade komodor-agent komodor/komodor-agent \
  --namespace komodor \
  --values komodor-values.yaml

# Verify upgrade
kubectl get pods -n komodor -w

Core Commands — API

Service and Resource Management

# Set API credentials
export KOMODOR_API_KEY="your-api-key"
export KOMODOR_API="https://app.komodor.com/api/v1"

# List monitored services
curl -s "$KOMODOR_API/services" \
  -H "Authorization: Bearer $KOMODOR_API_KEY" | jq '.services[] | {name, namespace, cluster}'

# Get service details
curl -s "$KOMODOR_API/services/SERVICE_ID" \
  -H "Authorization: Bearer $KOMODOR_API_KEY" | jq '.'

# Get service events timeline
curl -s "$KOMODOR_API/services/SERVICE_ID/events?from=2026-05-17T00:00:00Z&to=2026-05-18T23:59:59Z" \
  -H "Authorization: Bearer $KOMODOR_API_KEY" | jq '.events[] | {type, message, timestamp}'

# Get deployment history
curl -s "$KOMODOR_API/services/SERVICE_ID/deploys" \
  -H "Authorization: Bearer $KOMODOR_API_KEY" | jq '.deploys[] | {version, status, triggeredBy, timestamp}'

# Get pod status for a service
curl -s "$KOMODOR_API/services/SERVICE_ID/pods" \
  -H "Authorization: Bearer $KOMODOR_API_KEY" | jq '.pods[] | {name, status, restarts, node}'
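
The calls above repeat the same base URL and header, and `curl -s` prints an HTTP error body as if it were data. A small wrapper can reduce the repetition and surface failures. This is a sketch only: the helper names `komodor_url` and `komodor_get` are hypothetical, and the endpoint paths are the ones listed in this cheat sheet, not verified against the live API.

```shell
# Build a full URL from a path and optional key=value query parameters.
# Parameters are appended as given; no URL encoding is performed.
komodor_url() {
  local path="$1" url sep param
  shift
  url="${KOMODOR_API%/}/${path#/}"
  sep='?'
  for param in "$@"; do
    url="${url}${sep}${param}"
    sep='&'
  done
  printf '%s\n' "$url"
}

# GET an endpoint. --fail makes curl exit non-zero on HTTP 4xx/5xx,
# so a failed request stops a pipeline instead of feeding jq an error body.
komodor_get() {
  curl --silent --show-error --fail \
    -H "Authorization: Bearer ${KOMODOR_API_KEY:?KOMODOR_API_KEY is not set}" \
    "$(komodor_url "$@")"
}
```

With `KOMODOR_API` and `KOMODOR_API_KEY` exported as shown earlier, `komodor_get services/SERVICE_ID/pods | jq '.pods[]'` is equivalent to the last call above.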

Event and Change Tracking

# Get all events across cluster
curl -s "$KOMODOR_API/events?cluster=production-us-east&from=2026-05-18T00:00:00Z" \
  -H "Authorization: Bearer $KOMODOR_API_KEY" | jq '.events[] | {type, resource, summary, time}'

# Filter events by type
curl -s "$KOMODOR_API/events?cluster=production-us-east&type=deploy" \
  -H "Authorization: Bearer $KOMODOR_API_KEY"

# Get config changes
curl -s "$KOMODOR_API/events?cluster=production-us-east&type=config_change" \
  -H "Authorization: Bearer $KOMODOR_API_KEY"

# Get infrastructure events (node issues, HPA, etc.)
curl -s "$KOMODOR_API/events?cluster=production-us-east&type=infrastructure" \
  -H "Authorization: Bearer $KOMODOR_API_KEY"

# Search events
curl -s "$KOMODOR_API/events?search=OOMKilled&from=2026-05-11T00:00:00Z" \
  -H "Authorization: Bearer $KOMODOR_API_KEY"
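
The `from` values above are hardcoded dates. For recurring queries it is usually easier to compute them relative to the current time. A minimal sketch, assuming GNU `date` (BSD/macOS `date` uses a different syntax); the function name `iso_ago` is hypothetical.

```shell
# Print a UTC ISO 8601 timestamp for a relative expression such as "7 days ago".
# Assumption: GNU date, which accepts free-form expressions via -d.
iso_ago() {
  date -u -d "$1" '+%Y-%m-%dT%H:%M:%SZ'
}

# Example: search the last 7 days for OOMKilled events
# (endpoint and parameters as listed in this cheat sheet).
# curl -s "$KOMODOR_API/events?search=OOMKilled&from=$(iso_ago '7 days ago')" \
#   -H "Authorization: Bearer $KOMODOR_API_KEY"
```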

Health and Availability

# Get cluster health overview
curl -s "$KOMODOR_API/clusters/production-us-east/health" \
  -H "Authorization: Bearer $KOMODOR_API_KEY" | jq '{healthy, unhealthy, warning, total}'

# Get unhealthy workloads
curl -s "$KOMODOR_API/clusters/production-us-east/workloads?status=unhealthy" \
  -H "Authorization: Bearer $KOMODOR_API_KEY" | jq '.workloads[] | {name, namespace, issue}'

# Get availability metrics for a service
curl -s "$KOMODOR_API/services/SERVICE_ID/availability?window=7d" \
  -H "Authorization: Bearer $KOMODOR_API_KEY"

Configuration

CI/CD Integration (GitHub Actions)

# .github/workflows/deploy.yml — Notify Komodor of deployments
name: Deploy and Notify Komodor
on:
  push:
    branches: [main]

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Deploy to Kubernetes
        run: kubectl apply -f k8s/

      - name: Notify Komodor
        uses: komodorio/komodor-github-action@v1
        with:
          apiKey: ${{ secrets.KOMODOR_API_KEY }}
          service: "payment-api"
          cluster: "production-us-east"
          namespace: "production"
          status: "success"
          deploymentVersion: ${{ github.sha }}
          description: "Deployed commit ${{ github.sha }}"

ArgoCD Integration

# Komodor automatically detects ArgoCD deployments
# when the agent is installed in the same cluster

# Additional ArgoCD configuration in Helm values:
argocd:
  enabled: true
  # Komodor will track ArgoCD Application resources
  # and correlate them with deployment changes

Alert and Notification Configuration

# Configure in Komodor dashboard or via API

# Slack notification rule
curl -X POST "$KOMODOR_API/notifications/rules" \
  -H "Authorization: Bearer $KOMODOR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Critical Pod Failures",
    "enabled": true,
    "conditions": {
      "event_types": ["pod_failure", "oom_kill", "crash_loop"],
      "namespaces": ["production"],
      "severity": ["critical", "high"]
    },
    "channels": [
      {
        "type": "slack",
        "channel": "#k8s-critical",
        "webhook_url": "https://hooks.slack.com/services/T00/B00/xxx"
      }
    ]
  }'

# PagerDuty integration
curl -X POST "$KOMODOR_API/integrations/pagerduty" \
  -H "Authorization: Bearer $KOMODOR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "integration_key": "pagerduty-routing-key",
    "severity_mapping": {
      "critical": "critical",
      "high": "error",
      "medium": "warning"
    }
  }'

Annotation-Based Configuration

# Add Komodor annotations to deployments
apiVersion: apps/v1
kind: Deployment
metadata:
  name: payment-api
  namespace: production
  annotations:
    # Link to source code
    app.komodor.com/git.repository: "https://github.com/org/payment-api"
    app.komodor.com/git.ref: "main"
    
    # Team ownership
    app.komodor.com/team: "payments"
    
    # Service tier
    app.komodor.com/tier: "critical"
    
    # Related resources
    app.komodor.com/relates-to: "postgres-payment,redis-cache"
    
    # Deploy tracking
    app.komodor.com/deploy.link: "https://github.com/org/payment-api/actions/runs/12345"
    app.komodor.com/deploy.user: "deployer@company.com"

Advanced Usage

Automated Remediation

# Komodor supports automated actions via the agent

# Enable agent task execution in Helm values
# watcher.enableAgentTaskExecution: true

# Available automated actions:
# - Rollback deployment to previous version
# - Restart pods
# - Scale deployment
# - Cordon/uncordon nodes

# Configure auto-rollback rule
curl -X POST "$KOMODOR_API/automation/rules" \
  -H "Authorization: Bearer $KOMODOR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Auto-rollback on crash loop",
    "enabled": true,
    "trigger": {
      "type": "crash_loop_backoff",
      "threshold": 5,
      "window_minutes": 10
    },
    "action": {
      "type": "rollback",
      "to": "previous_stable"
    },
    "scope": {
      "clusters": ["production-us-east"],
      "namespaces": ["production"],
      "labels": {"tier": "critical"}
    }
  }'
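
The `trigger` block pairs a `threshold` with `window_minutes`. This cheat sheet does not say how the platform evaluates them, so the following only illustrates one plausible reading: fire when at least `threshold` events occurred within the last `window_minutes`. It is not Komodor's implementation, and the function name `should_trigger` is hypothetical.

```shell
# Return 0 (trigger) when at least THRESHOLD of the given epoch timestamps
# fall within the last WINDOW_MINUTES before NOW; return 1 otherwise.
# Usage: should_trigger NOW THRESHOLD WINDOW_MINUTES TIMESTAMP...
should_trigger() {
  local now="$1" threshold="$2" window_minutes="$3"
  local cutoff count ts
  shift 3
  cutoff=$(( now - window_minutes * 60 ))
  count=0
  for ts in "$@"; do
    # Count only events inside the window (cutoff, now].
    if [ "$ts" -gt "$cutoff" ] && [ "$ts" -le "$now" ]; then
      count=$(( count + 1 ))
    fi
  done
  [ "$count" -ge "$threshold" ]
}
```

Under this reading, five restarts spread over the last ten minutes would trigger the rollback, while four restarts plus one older event would not.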

Multi-Cluster Management

# Install agent on each cluster with unique cluster names
for cluster in production-us-east production-eu-west staging; do
  kubectl config use-context "$cluster"
  helm install komodor-agent komodor/komodor-agent \
    --namespace komodor \
    --create-namespace \
    --set apiKey="$KOMODOR_API_KEY" \
    --set clusterName="$cluster"
done

# Query across clusters
curl -s "$KOMODOR_API/events?from=2026-05-18T00:00:00Z" \
  -H "Authorization: Bearer $KOMODOR_API_KEY" | jq '.events[] | {cluster, type, message}'

# Get cross-cluster service topology
curl -s "$KOMODOR_API/topology?clusters=production-us-east,production-eu-west" \
  -H "Authorization: Bearer $KOMODOR_API_KEY"
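
The install loop above switches the global kubectl context and fails on a second run because the release already exists. A possible alternative is `helm upgrade --install` together with Helm's `--kube-context` flag. This is a sketch under two assumptions: each kube context has the same name as its cluster (as in the loop above), and the function name `install_agent` and the `DRY_RUN` variable are hypothetical.

```shell
# Install or upgrade the agent in one kube context without changing
# the current kubectl context. With DRY_RUN=1 the command is printed
# (API key masked) instead of executed.
install_agent() {
  local cluster="$1"
  local key="${KOMODOR_API_KEY:?KOMODOR_API_KEY is not set}"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    key='***'          # never echo the real key
    set -- echo helm
  else
    set -- helm
  fi
  "$@" upgrade --install komodor-agent komodor/komodor-agent \
    --kube-context "$cluster" \
    --namespace komodor --create-namespace \
    --set "apiKey=${key}" \
    --set "clusterName=${cluster}"
}

# for cluster in production-us-east production-eu-west staging; do
#   install_agent "$cluster"
# done
```

Running the loop with `DRY_RUN=1` first shows the exact Helm commands before anything is changed.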

Cost Analysis

# Get cost insights (if cost module enabled)
curl -s "$KOMODOR_API/costs/overview?window=30d" \
  -H "Authorization: Bearer $KOMODOR_API_KEY" | jq '{totalCost, byNamespace}'

# Get per-service cost
curl -s "$KOMODOR_API/costs/services?window=7d&sort=cost_desc" \
  -H "Authorization: Bearer $KOMODOR_API_KEY" | jq '.services[] | {name, monthlyCost, efficiency}'

# Right-sizing recommendations
curl -s "$KOMODOR_API/costs/recommendations" \
  -H "Authorization: Bearer $KOMODOR_API_KEY" | jq '.recommendations[] | {service, currentCost, recommendedCost, savings}'

Troubleshooting

| Issue | Cause | Solution |
|---|---|---|
| Agent not connecting | API key invalid or network blocked | Verify API key and allow outbound to *.komodor.com:443 |
| Events missing | Namespace filtered out | Check namespacesDenylist in Helm values |
| Deployments not tracked | Agent RBAC insufficient | Verify ClusterRole has watch/list on deployments |
| Slack notifications not sending | Webhook URL expired | Regenerate Slack webhook and update in Komodor |
| Metrics not showing | metrics.enabled set to false | Set metrics.enabled=true in Helm values |
| Config changes not detected | ConfigMap watching disabled | Set watchedResources.configmap: true in Helm values |
| ArgoCD apps not correlated | ArgoCD integration not enabled | Set argocd.enabled: true in Helm values |
| High agent resource usage | Too many resources being watched | Filter namespaces and resource types |

# Check agent logs
kubectl logs -n komodor -l app=komodor-agent --tail=100

# Verify agent connectivity (assumes wget is available in the agent image)
kubectl exec -n komodor deploy/komodor-agent -- wget -qO- https://app.komodor.com/health

# Check agent version
kubectl get deploy -n komodor komodor-agent -o jsonpath='{.spec.template.spec.containers[0].image}'

# Restart agent
kubectl rollout restart deployment -n komodor komodor-agent

# Verify RBAC permissions cluster-wide (without -A only the current namespace is checked)
kubectl auth can-i list deployments -A --as=system:serviceaccount:komodor:komodor-agent

# Debug: check events are being captured
kubectl get events -A --sort-by='.lastTimestamp' | tail -20
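
The RBAC check above covers one verb on one resource, while the table asks for both watch and list. A loop can cover several resources at once. This is a sketch: the function name `check_agent_rbac` is hypothetical, the service account name is the one used above, and the resource list is an illustrative subset of `watchedResources`.

```shell
# Report every list/watch permission the agent's service account is missing.
# Returns 0 when all checks pass and 1 when at least one permission is missing.
check_agent_rbac() {
  local sa="system:serviceaccount:komodor:komodor-agent"
  local failed=0 verb resource
  for resource in deployments statefulsets daemonsets pods configmaps; do
    for verb in list watch; do
      # "kubectl auth can-i" exits non-zero when the answer is "no".
      if ! kubectl auth can-i "$verb" "$resource" --as="$sa" \
           --all-namespaces >/dev/null 2>&1; then
        echo "MISSING: $verb $resource"
        failed=1
      fi
    done
  done
  return "$failed"
}
```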