
Coroot Observability

Open-source, eBPF-based observability and APM tool for Kubernetes and Docker environments. It provides zero-instrumentation metrics, logs, traces, and continuous profiling.

Installation

Docker Compose (Quickest)

# One-command deployment with ClickHouse and Prometheus
curl -fsS https://raw.githubusercontent.com/coroot/coroot/main/deploy/docker-compose.yaml | \
  docker compose -f - up -d

# Access UI at http://localhost:8080
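The containers may take a while to become ready after `up -d`. Below is a minimal readiness-wait sketch. It assumes `curl` is installed and that the UI returns a non-error status for a plain GET on the URL above. `wait_for_url` is a hypothetical helper name and is not part of Coroot.

```shell
# Hypothetical helper: poll URL until it returns a non-error HTTP status,
# or give up after ATTEMPTS tries.
# Usage: wait_for_url URL [ATTEMPTS] [DELAY_SECONDS]
wait_for_url() {
  url=$1
  attempts=${2:-30}   # assumption: 30 tries is enough for a first start
  delay=${3:-2}       # seconds to sleep between tries
  i=1
  while [ "$i" -le "$attempts" ]; do
    # -f: treat HTTP >= 400 as failure; -sS: quiet; --max-time bounds each try
    if curl -fsS -o /dev/null --max-time 5 "$url" 2>/dev/null; then
      echo "ready: $url"
      return 0
    fi
    # Sleep only if another attempt will follow
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  echo "not ready after $attempts attempts: $url" >&2
  return 1
}
```

For example, `wait_for_url http://localhost:8080 60 2` tries 60 times, two seconds apart. Because the number of attempts is bounded, a script cannot hang forever when a container fails to start.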

Kubernetes (Helm)

# Add Coroot Helm repository
helm repo add coroot https://coroot.github.io/helm-charts
helm repo update coroot

# Install the Coroot operator
helm install -n coroot --create-namespace coroot-operator coroot/coroot-operator

# Deploy Community Edition
helm install -n coroot coroot coroot/coroot-ce

# Or, instead of the previous command, deploy with ClickHouse replication (2 shards, 2 replicas each)
helm install -n coroot coroot coroot/coroot-ce \
  --set "clickhouse.shards=2,clickhouse.replicas=2"

# Port forward to access UI
kubectl port-forward -n coroot service/coroot-coroot 8080:8080

# Access UI at http://localhost:8080
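When you script the replicated install, it helps to validate the shard and replica counts before they reach Helm. The sketch below uses a hypothetical helper, `clickhouse_set_args`. It only builds the comma-separated `--set` string used above. The chart keys are taken unchanged from that command.

```shell
# Hypothetical helper: validate counts and print the --set value
# for ClickHouse sharding/replication.
# Usage: clickhouse_set_args SHARDS REPLICAS
clickhouse_set_args() {
  if [ "$#" -ne 2 ]; then
    echo "usage: clickhouse_set_args SHARDS REPLICAS" >&2
    return 1
  fi
  for n in "$1" "$2"; do
    # Reject empty strings and anything containing a non-digit
    case $n in
      ''|*[!0-9]*) echo "not a positive integer: '$n'" >&2; return 1 ;;
    esac
    # Digits only, but 0 is still not a usable count
    if [ "$n" -lt 1 ]; then
      echo "must be at least 1: $n" >&2
      return 1
    fi
  done
  printf 'clickhouse.shards=%s,clickhouse.replicas=%s\n' "$1" "$2"
}
```

Usage: `args=$(clickhouse_set_args 2 2) && helm install -n coroot coroot coroot/coroot-ce --set "$args"`. The `&&` stops Helm from running with an empty `--set` value if validation fails.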

Docker Swarm

# Deploy Coroot stack
curl -fsS https://raw.githubusercontent.com/coroot/coroot/main/deploy/docker-swarm-stack.yaml | \
  docker stack deploy -c - coroot

Ubuntu/Debian

# Install Coroot server
curl -sfL https://raw.githubusercontent.com/coroot/coroot/main/deploy/install.sh | \
  BOOTSTRAP_PROMETHEUS_URL="http://PROMETHEUS_IP:9090" \
  BOOTSTRAP_REFRESH_INTERVAL=15s \
  BOOTSTRAP_CLICKHOUSE_ADDRESS=CLICKHOUSE_IP:9000 \
  sh -

RHEL/CentOS

# Same installer works for RHEL-based distributions
curl -sfL https://raw.githubusercontent.com/coroot/coroot/main/deploy/install.sh | \
  BOOTSTRAP_PROMETHEUS_URL="http://PROMETHEUS_IP:9090" \
  BOOTSTRAP_REFRESH_INTERVAL=15s \
  BOOTSTRAP_CLICKHOUSE_ADDRESS=CLICKHOUSE_IP:9000 \
  sh -
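If you run the installer with the `PROMETHEUS_IP` or `CLICKHOUSE_IP` placeholders still in place, the server cannot reach its backends. The pre-flight sketch below uses a hypothetical helper, `check_bootstrap_env`. It checks only the shape of the two addresses, not whether they are reachable.

```shell
# Hypothetical pre-flight helper: fail if a bootstrap address is empty,
# still contains a placeholder, or has an unexpected shape.
check_bootstrap_env() {
  case ${BOOTSTRAP_PROMETHEUS_URL:-} in
    '') echo "BOOTSTRAP_PROMETHEUS_URL is empty" >&2; return 1 ;;
    *PROMETHEUS_IP*) echo "replace PROMETHEUS_IP with a real host" >&2; return 1 ;;
    http://*|https://*) ;;
    *) echo "BOOTSTRAP_PROMETHEUS_URL must start with http:// or https://" >&2; return 1 ;;
  esac
  case ${BOOTSTRAP_CLICKHOUSE_ADDRESS:-} in
    '') echo "BOOTSTRAP_CLICKHOUSE_ADDRESS is empty" >&2; return 1 ;;
    *CLICKHOUSE_IP*) echo "replace CLICKHOUSE_IP with a real host" >&2; return 1 ;;
    *:[0-9]*) ;;   # looks like host:port
    *) echo "BOOTSTRAP_CLICKHOUSE_ADDRESS must be host:port" >&2; return 1 ;;
  esac
  return 0
}

# Example (the IPs are stand-ins for your own hosts). Exported variables
# are inherited by the `sh` that runs the installer:
#   export BOOTSTRAP_PROMETHEUS_URL="http://10.0.0.5:9090"
#   export BOOTSTRAP_CLICKHOUSE_ADDRESS=10.0.0.6:9000
#   check_bootstrap_env && curl -sfL https://raw.githubusercontent.com/coroot/coroot/main/deploy/install.sh | \
#     BOOTSTRAP_REFRESH_INTERVAL=15s sh -
```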

Node Agent Installation

Docker

# Run node agent as privileged container
docker run --detach --name coroot-node-agent \
  --pull=always --privileged --pid host \
  -v /sys/kernel/tracing:/sys/kernel/tracing:rw \
  -v /sys/kernel/debug:/sys/kernel/debug:rw \
  -v /sys/fs/cgroup:/host/sys/fs/cgroup:ro \
  ghcr.io/coroot/coroot-node-agent \
  --cgroupfs-root=/host/sys/fs/cgroup \
  --collector-endpoint=http://COROOT_IP:8080
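With `-v`, Docker does not fail when a host path is missing. It silently creates the missing path as an empty directory, so the agent can start without the tracing or cgroup filesystems it expects. The sketch below checks the bind-mounted host paths before running the command above. `check_host_paths` is a hypothetical helper, and it checks only that each directory exists, not that tracefs or debugfs is actually mounted there.

```shell
# Hypothetical pre-flight helper: report every argument that is not an
# existing directory on the host.
check_host_paths() {
  missing=0
  for p in "$@"; do
    if [ ! -d "$p" ]; then
      echo "missing: $p" >&2
      missing=$((missing + 1))
    fi
  done
  # Succeed only when every path was found
  [ "$missing" -eq 0 ]
}

# Example: the three host paths mounted by the docker run command above
#   check_host_paths /sys/kernel/tracing /sys/kernel/debug /sys/fs/cgroup
```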

Linux (systemd)

# Install node agent on bare-metal or VMs
curl -sfL https://raw.githubusercontent.com/coroot/coroot-node-agent/main/install.sh | \
  COLLECTOR_ENDPOINT=http://COROOT_IP:8080 \
  SCRAPE_INTERVAL=15s \
  sh -

Kubernetes (via Helm operator)

# Node agent is automatically deployed by the Coroot operator
# No separate installation needed when using Helm

Basic Commands

Command | Description
docker compose up -d | Start Coroot with Docker Compose
docker compose down | Stop all Coroot services
docker compose logs -f | Follow Coroot logs
helm install -n coroot coroot coroot/coroot-ce | Install Coroot on Kubernetes
helm upgrade -n coroot coroot coroot/coroot-ce | Upgrade Coroot
helm uninstall coroot -n coroot | Remove Coroot from the cluster
kubectl port-forward svc/coroot-coroot 8080:8080 -n coroot | Access the Coroot UI

Configuration Parameters

Server Configuration

Variable | Description | Default
BOOTSTRAP_PROMETHEUS_URL | Prometheus server endpoint | Required
BOOTSTRAP_REFRESH_INTERVAL | Metrics collection interval | 15s
BOOTSTRAP_CLICKHOUSE_ADDRESS | ClickHouse server address | Required
LISTEN_ADDRESS | HTTP listen address | :8080
DATA_DIR | Data directory path | /var/lib/coroot
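The required-or-default pattern in this table maps directly onto shell parameter expansion. The sketch below prints a KEY=VALUE file for the three BOOTSTRAP_* variables, using the 15s default listed above. `coroot_env` is a hypothetical helper. The sketch does not establish whether your Coroot installation reads such a file.

```shell
# Hypothetical helper: print KEY=VALUE lines for the server's bootstrap
# variables. Required values must already be set; the refresh interval
# falls back to the 15s default.
coroot_env() {
  # Subshell: if a required variable is empty, ${VAR:?msg} makes the shell
  # exit. Inside a subshell that becomes a non-zero return value instead
  # of terminating the calling script.
  (
    printf '%s=%s\n' \
      BOOTSTRAP_PROMETHEUS_URL "${BOOTSTRAP_PROMETHEUS_URL:?is required}" \
      BOOTSTRAP_CLICKHOUSE_ADDRESS "${BOOTSTRAP_CLICKHOUSE_ADDRESS:?is required}" \
      BOOTSTRAP_REFRESH_INTERVAL "${BOOTSTRAP_REFRESH_INTERVAL:-15s}"
  )
}
```

For example, `coroot_env > coroot.env` produces a file in the format accepted by `docker run --env-file` and by a systemd `EnvironmentFile=` directive.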

Node Agent Configuration

Flag | Description | Default
--collector-endpoint | Coroot server endpoint | Required
--cgroupfs-root | Cgroup filesystem root path | /sys/fs/cgroup
--scrape-interval | Metrics scrape interval | 15s
--log-level | Logging verbosity | info

Architecture Components

Component | Role
Coroot Server | Central dashboard, analysis engine, alerting
Node Agent | eBPF-based metric/log collection on each node
Cluster Agent | Database monitoring (MySQL, PostgreSQL, Redis)
ClickHouse | Metrics, logs, traces, and profiles storage
Prometheus | Metrics scraping and remote write

Key Features

Zero-Instrumentation Observability

Feature | Description
Automatic Discovery | Services auto-discovered via eBPF; no code changes needed
Service Map | Live topology map showing all service dependencies
Distributed Tracing | Request tracing across microservices without an SDK
Log Collection | Automatic log gathering and pattern clustering
Continuous Profiling | CPU/memory profiling with one-click activation

Monitoring Capabilities

Capability | Description
SLO Tracking | Define and monitor Service Level Objectives
Issue Detection | Automatic identification of 80%+ of issues
Deployment Tracking | Track Kubernetes deployments and rollbacks
Cost Monitoring | AWS, GCP, Azure resource cost analysis
Network Analysis | TCP connection metrics, DNS latency, retransmits

Supported Protocols (eBPF)

Protocol | Metrics Collected
HTTP/HTTPS | Latency, error rate, throughput
gRPC | Method-level latency and errors
PostgreSQL | Query latency, connections, errors
MySQL | Query performance, slow queries
Redis | Command latency, hit/miss ratio
MongoDB | Operation latency, connections
Kafka | Producer/consumer lag, throughput
DNS | Resolution latency, failure rate

Helm Chart Values

Common Overrides

# Each example below is a standalone install; combine the --set flags to apply several overrides at once

# Custom ClickHouse sizing
helm install -n coroot coroot coroot/coroot-ce \
  --set clickhouse.shards=3 \
  --set clickhouse.replicas=2 \
  --set clickhouse.storage=100Gi

# Custom Prometheus settings
helm install -n coroot coroot coroot/coroot-ce \
  --set prometheus.storage=50Gi \
  --set prometheus.retention=30d

# Enable ingress
helm install -n coroot coroot coroot/coroot-ce \
  --set ingress.enabled=true \
  --set ingress.host=coroot.example.com
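Once overrides accumulate, a values file is easier to review and reuse than repeated `--set` flags. The sketch below writes the overrides above as YAML, which is equivalent to those `--set` flags under Helm's dotted-key rules. `write_coroot_values` and the file name `coroot-values.yaml` are hypothetical. The key names are copied from the examples above, so confirm them against `helm show values coroot/coroot-ce` first.

```shell
# Hypothetical helper: write the --set overrides above as a Helm values file.
# Usage: write_coroot_values [PATH]   (defaults to ./coroot-values.yaml)
write_coroot_values() {
  out=${1:-coroot-values.yaml}
  # Quoted 'EOF' keeps the YAML literal (no shell expansion inside)
  cat > "$out" <<'EOF'
clickhouse:
  shards: 3
  replicas: 2
  storage: 100Gi
prometheus:
  storage: 50Gi
  retention: 30d
ingress:
  enabled: true
  host: coroot.example.com
EOF
}
```

Usage: `write_coroot_values && helm install -n coroot coroot coroot/coroot-ce -f coroot-values.yaml`. Pass the same `-f` file to `helm upgrade` so that later upgrades keep these overrides.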

Alerting Configuration

Alert Type | Description
SLO Breach | Triggered when SLO target is at risk
Latency Spike | p99 latency exceeds threshold
Error Rate | Error percentage exceeds threshold
Resource | CPU, memory, or disk usage anomaly
Deployment | Failed or degraded deployment detected

Notification Channels

Channel | Configuration
Slack | Webhook URL
PagerDuty | Integration key
Opsgenie | API key
Email | SMTP settings
Webhook | Custom HTTP endpoint

Troubleshooting

Issue | Solution
No data appearing | Check that the node agent's --collector-endpoint points to the Coroot server
Missing services | Verify that the node agent runs with --privileged and --pid host
eBPF not loading | Ensure kernel version 4.16+ with BTF support
High memory usage | Increase --scrape-interval (scrape less often) or limit monitored namespaces
ClickHouse connection | Verify that ClickHouse is running and reachable on port 9000
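You can check the 4.16 kernel minimum before deploying the agent. The sketch below uses a hypothetical helper, `kernel_at_least_4_16`, which parses `uname -r` output. It checks only the version, not BTF. As a separate quick check, kernels built with BTF normally expose `/sys/kernel/btf/vmlinux`.

```shell
# Hypothetical helper: succeed if a kernel release string (default: the
# running kernel) is 4.16 or newer. Returns 2 if the string can't be parsed.
kernel_at_least_4_16() {
  release=${1:-$(uname -r)}
  ver=${release%%-*}      # drop distro suffix: "5.15.0-91-generic" -> "5.15.0"
  case $ver in
    *.*) ;;
    *) echo "cannot parse kernel release: $release" >&2; return 2 ;;
  esac
  major=${ver%%.*}        # "5"
  rest=${ver#*.}          # "15.0"
  minor=${rest%%.*}       # "15"
  case $major$minor in
    ''|*[!0-9]*) echo "cannot parse kernel release: $release" >&2; return 2 ;;
  esac
  if [ "$major" -gt 4 ] || { [ "$major" -eq 4 ] && [ "$minor" -ge 16 ]; }; then
    return 0
  fi
  return 1
}
```

For example, `kernel_at_least_4_16 || echo "kernel too old for the node agent"` checks the running kernel.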

Best Practices

  • Deploy node agents on every node in your cluster for complete visibility
  • Use ClickHouse replication for production deployments (minimum 2 replicas)
  • Set meaningful SLO targets before relying on automatic alerting
  • Start with Docker Compose for evaluation, migrate to Helm for production
  • Configure Prometheus remote write to persist metrics beyond pod restarts
  • Use the built-in profiler to identify CPU/memory hotspots before scaling
  • Enable deployment tracking to correlate performance changes with releases