Coroot Observability

Open-source, eBPF-based observability and APM tool that provides zero-instrumentation metrics, logs, traces, and continuous profiling for Kubernetes and Docker environments.

Installation

Docker Compose (Fastest)

# One-command deployment with ClickHouse and Prometheus
curl -fsS https://raw.githubusercontent.com/coroot/coroot/main/deploy/docker-compose.yaml | \
  docker compose -f - up -d

# Access the UI at http://localhost:8080

Kubernetes (Helm)

# Add the Coroot Helm repository
helm repo add coroot https://coroot.github.io/helm-charts
helm repo update coroot

# Install the Coroot Operator
helm install -n coroot --create-namespace coroot-operator coroot/coroot-operator

# Deploy the Community Edition
helm install -n coroot coroot coroot/coroot-ce

# Alternatively, deploy with ClickHouse replication (use this instead of the command above; both use the release name "coroot")
helm install -n coroot coroot coroot/coroot-ce \
  --set "clickhouse.shards=2,clickhouse.replicas=2"

# Port-forward to access the UI
kubectl port-forward -n coroot service/coroot-coroot 8080:8080

# Access the UI at http://localhost:8080

Docker Swarm

# Deploy the Coroot stack
curl -fsS https://raw.githubusercontent.com/coroot/coroot/main/deploy/docker-swarm-stack.yaml | \
  docker stack deploy -c - coroot

Ubuntu/Debian

# Install the Coroot server
curl -sfL https://raw.githubusercontent.com/coroot/coroot/main/deploy/install.sh | \
  BOOTSTRAP_PROMETHEUS_URL="http://PROMETHEUS_IP:9090" \
  BOOTSTRAP_REFRESH_INTERVAL=15s \
  BOOTSTRAP_CLICKHOUSE_ADDRESS=CLICKHOUSE_IP:9000 \
  sh -

RHEL/CentOS

# The same installer also works on RHEL-based distributions
curl -sfL https://raw.githubusercontent.com/coroot/coroot/main/deploy/install.sh | \
  BOOTSTRAP_PROMETHEUS_URL="http://PROMETHEUS_IP:9090" \
  BOOTSTRAP_REFRESH_INTERVAL=15s \
  BOOTSTRAP_CLICKHOUSE_ADDRESS=CLICKHOUSE_IP:9000 \
  sh -

Node Agent Installation

Docker

# Run the node agent as a privileged container
docker run --detach --name coroot-node-agent \
  --pull=always --privileged --pid host \
  -v /sys/kernel/tracing:/sys/kernel/tracing:rw \
  -v /sys/kernel/debug:/sys/kernel/debug:rw \
  -v /sys/fs/cgroup:/host/sys/fs/cgroup:ro \
  ghcr.io/coroot/coroot-node-agent \
  --cgroupfs-root=/host/sys/fs/cgroup \
  --collector-endpoint=http://COROOT_IP:8080

Linux (systemd)

# Install the node agent on bare metal or VMs
curl -sfL https://raw.githubusercontent.com/coroot/coroot-node-agent/main/install.sh | \
  COLLECTOR_ENDPOINT=http://COROOT_IP:8080 \
  SCRAPE_INTERVAL=15s \
  sh -

Kubernetes (via the Helm Operator)

# The node agent is deployed automatically by the Coroot Operator
# No separate installation is needed when using Helm

Basic Commands

Command	Description
`docker compose up -d`	Start Coroot with Docker Compose
`docker compose down`	Stop all Coroot services
`docker compose logs -f`	Follow the Coroot logs
`helm install -n coroot coroot coroot/coroot-ce`	Install Coroot on Kubernetes
`helm upgrade -n coroot coroot coroot/coroot-ce`	Upgrade Coroot
`helm uninstall coroot -n coroot`	Remove Coroot from the cluster
`kubectl port-forward svc/coroot-coroot 8080:8080 -n coroot`	Access the UI

Configuration Parameters

Server Configuration

Variable	Description	Default
`BOOTSTRAP_PROMETHEUS_URL`	Prometheus server endpoint	Required
`BOOTSTRAP_REFRESH_INTERVAL`	Metrics collection interval	`15s`
`BOOTSTRAP_CLICKHOUSE_ADDRESS`	ClickHouse server address	Required
`LISTEN_ADDRESS`	HTTP listen address	`:8080`
`DATA_DIR`	Data directory path	`/var/lib/coroot`
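
The table marks two variables as required, so it can help to check them before piping the installer into `sh`. The following is a minimal sketch of such a pre-flight check. The function name `check_bootstrap_env` is hypothetical and not part of Coroot; only the variable names come from the table above.

```shell
#!/bin/sh
# check_bootstrap_env (hypothetical helper, not shipped with Coroot):
# prints each required variable that is unset or empty to stderr
# and returns non-zero if at least one is missing.
check_bootstrap_env() {
  missing=0
  for name in BOOTSTRAP_PROMETHEUS_URL BOOTSTRAP_CLICKHOUSE_ADDRESS; do
    # Indirectly read the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing required variable: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

A possible use is `check_bootstrap_env && curl -sfL ... | sh -`, so that the installer only runs once both values are set in the current shell.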

Node Agent Configuration

Flag	Description	Default
`--collector-endpoint`	Coroot server endpoint	Required
`--cgroupfs-root`	Cgroup filesystem root path	`/sys/fs/cgroup`
`--scrape-interval`	Metrics scrape interval	`15s`
`--log-level`	Logging verbosity	`info`

Architecture Components

Component	Role
Coroot Server	Central dashboard, analysis engine, alerting
Node Agent	eBPF-based metric and log collection on every node
Cluster Agent	Database monitoring (MySQL, PostgreSQL, Redis)
ClickHouse	Storage for metrics, logs, traces, and profiles
Prometheus	Metrics scraping and remote write

Key Features

Zero-Instrumentation Observability

Feature	Description
Automatic Discovery	Services are discovered automatically via eBPF, with no code changes required
Service Map	Live topology map showing all service dependencies
Distributed Tracing	Request tracing across microservices without an SDK
Log Collection	Automatic log collection and pattern clustering
Continuous Profiling	CPU/memory profiling with one-click activation

Monitoring Capabilities

Capability	Description
SLO Tracking	Define and monitor Service Level Objectives
Issue Detection	Built-in inspections aim to identify over 80% of issues automatically
Deployment Tracking	Track Kubernetes deployments and rollbacks
Cost Monitoring	Resource cost analysis for AWS, GCP, and Azure
Network Analysis	TCP connection metrics, DNS latency, retransmits

Supported Protocols (eBPF)

Protocol	Collected Metrics
HTTP/HTTPS	Latency, Error Rate, Throughput
gRPC	Method-level Latency und Errors
PostgreSQL	Query Latency, Connections, Errors
MySQL	Query Performance, Slow Queries
Redis	Command Latency, Hit/Miss Ratio
MongoDB	Operation Latency, Connections
Kafka	Producer/Consumer Lag, Throughput
DNS	Resolution Latency, Failure Rate

Helm Chart Values

Common Overrides

# Custom ClickHouse Sizing
helm install coroot coroot/coroot-ce \
  --set clickhouse.shards=3 \
  --set clickhouse.replicas=2 \
  --set clickhouse.storage=100Gi

# Custom Prometheus Settings
helm install coroot coroot/coroot-ce \
  --set prometheus.storage=50Gi \
  --set prometheus.retention=30d

# Enable Ingress
helm install coroot coroot/coroot-ce \
  --set ingress.enabled=true \
  --set ingress.host=coroot.example.com
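
Long `--set` lists are easy to mistype, so the same overrides can be kept in a values file instead. The sketch below writes such a file. The helper name `write_coroot_values` is hypothetical, and the keys are copied from the `--set` flags above rather than verified against the chart, so compare them with the output of `helm show values coroot/coroot-ce` before relying on them.

```shell
#!/bin/sh
# write_coroot_values FILE (hypothetical helper):
# writes a Helm values file that mirrors the --set flags shown above.
# The key names are assumptions taken from those flags, not from the chart itself.
write_coroot_values() {
  cat > "$1" <<'EOF'
clickhouse:
  shards: 3
  replicas: 2
  storage: 100Gi
prometheus:
  storage: 50Gi
  retention: 30d
ingress:
  enabled: true
  host: coroot.example.com
EOF
}
```

After `write_coroot_values values.yaml`, the file can be applied with `helm upgrade --install -n coroot coroot coroot/coroot-ce -f values.yaml`, which keeps the overrides in one reviewable place.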

Alerting Configuration

Alert Type	Description
SLO Breach	Triggered when an SLO target is at risk
Latency Spike	p99 latency exceeds the threshold
Error Rate	Error percentage exceeds the threshold
Resource	CPU, memory, or disk usage anomaly
Deployment	Failed or degraded deployment detected

Notification Channels

Channel	Configuration
Slack	Webhook URL
PagerDuty	Integration Key
Opsgenie	API Key
Email	SMTP Settings
Webhook	Custom HTTP Endpoint

Troubleshooting

Issue	Solution
No data visible	Check that the node agent's `--collector-endpoint` points to the Coroot server
Missing services	Check that the node agent runs with `--privileged` and `--pid host`
eBPF not loaded	Make sure the kernel is version 4.16 or later and has BTF support
High memory usage	Increase `--scrape-interval` (scrape less often) or limit the monitored namespaces
ClickHouse connection	Check that ClickHouse is running and reachable on port 9000
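
The kernel requirement in the table can be checked on a node before deploying the agent. The following is a minimal sketch with two hypothetical helpers, `kernel_at_least` and `has_btf`. The 4.16 threshold is the figure from the table above, so confirm the minimum for your agent version in the node agent's documentation. `/sys/kernel/btf/vmlinux` is where kernels built with BTF usually expose their type information.

```shell
#!/bin/sh
# kernel_at_least MAJOR MINOR [RELEASE] (hypothetical helper):
# succeeds when RELEASE (default: output of `uname -r`) is at least MAJOR.MINOR.
kernel_at_least() {
  want_major=$1
  want_minor=$2
  release=${3:-$(uname -r)}
  # Split "5.15.0-91-generic" into major=5 and minor=15.
  major=${release%%.*}
  rest=${release#*.}
  minor=${rest%%[!0-9]*}
  [ "$major" -gt "$want_major" ] && return 0
  [ "$major" -eq "$want_major" ] && [ "$minor" -ge "$want_minor" ]
}

# has_btf [PATH] (hypothetical helper):
# succeeds when BTF type information is readable at PATH
# (default: /sys/kernel/btf/vmlinux).
has_btf() {
  [ -r "${1:-/sys/kernel/btf/vmlinux}" ]
}
```

On a node, `kernel_at_least 4 16 && has_btf || echo "kernel may not support the agent's eBPF programs"` gives a quick first indication before looking at the agent logs.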

Best Practices

Deploy node agents on every node in the cluster for complete visibility
Use ClickHouse replication for production deployments (minimum 2 replicas)
Set meaningful SLO targets before relying on automatic alerting
Start with Docker Compose for evaluation, then migrate to Helm for production
Configure Prometheus remote write to persist metrics across pod restarts
Use the built-in profiler to identify CPU/memory hotspots before scaling
Enable deployment tracking to correlate performance changes with releases