Grafana Loki Cheat Sheet

Overview

Grafana Loki is a horizontally scalable, highly available log aggregation system designed to be cost-effective and easy to operate. Unlike traditional log systems (Elasticsearch/ELK) that index the full text of every log line, Loki only indexes metadata (labels) about your logs and stores compressed log content in cheap object storage. This approach dramatically reduces storage costs and operational complexity while still allowing fast querying through its LogQL query language.

Loki is modeled after Prometheus and uses the same label-based approach for organizing and querying data. Logs are collected by agents like Promtail, Grafana Alloy, or Fluentd/Fluent Bit and sent to Loki via HTTP push. Loki can run as a single binary (monolithic mode), as microservices for high-scale deployments, or in Simple Scalable Deployment (SSD) mode that balances simplicity with scalability. It integrates seamlessly with Grafana for visualization and alerting, and pairs naturally with Prometheus metrics and Tempo traces for full observability.
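
To make the HTTP push path concrete, here is a minimal Python sketch of what an agent sends to Loki's `/loki/api/v1/push` endpoint: a JSON body with one stream (a label set) and its timestamped lines. The helper names `build_push_payload` and `push`, and the base URL `http://localhost:3100`, are assumptions for illustration; real agents also batch, retry, and compress.

```python
import json
import time
import urllib.request


def build_push_payload(labels, lines, now_ns=None):
    """Build the JSON body for POST /loki/api/v1/push.

    labels: dict of label name -> value identifying one stream.
    lines:  log lines; in this sketch they all share one timestamp.
    """
    # Loki expects timestamps as strings of nanoseconds since the Unix epoch.
    ts = str(now_ns if now_ns is not None else time.time_ns())
    return {
        "streams": [
            {"stream": dict(labels), "values": [[ts, line] for line in lines]}
        ]
    }


def push(base_url, payload):
    """Send the payload to Loki (hypothetical helper; base_url is an assumption)."""
    req = urllib.request.Request(
        base_url.rstrip("/") + "/loki/api/v1/push",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=5) as resp:
        return resp.status  # Loki replies 204 No Content on success


# Example (needs a running Loki, so it is left commented out):
# push("http://localhost:3100", build_push_payload({"job": "demo"}, ["hello"]))
```

Only the labels in `stream` are indexed; the line text is stored as compressed chunk content, which is why keeping the label set small matters.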

Installation

Docker

# Run Loki
docker run -d --name loki \
  -p 3100:3100 \
  -v $(pwd)/loki-config.yaml:/etc/loki/local-config.yaml \
  grafana/loki:3.0.0

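# Run Promtail (log collector)
# The image's default config path is /etc/promtail/config.yml, so pass the mounted file explicitly
docker run -d --name promtail \
  -v /var/log:/var/log \
  -v $(pwd)/promtail-config.yaml:/etc/promtail/config.yaml \
  grafana/promtail:3.0.0 \
  -config.file=/etc/promtail/config.yaml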

Helm (Kubernetes)

# Add Grafana Helm repo
helm repo add grafana https://grafana.github.io/helm-charts
helm repo update

# Simple Scalable Deployment
# Value names follow the grafana/loki chart's values.yaml; check them against your chart version
helm install loki grafana/loki \
  --namespace loki --create-namespace \
  --set loki.storage.type=s3 \
  --set loki.storage.s3.endpoint=minio.minio:9000 \
  --set loki.storage.bucketNames.chunks=loki-data \
  --set loki.storage.s3.accessKeyId=minioadmin \
  --set loki.storage.s3.secretAccessKey=minioadmin

# Install Promtail as DaemonSet
helm install promtail grafana/promtail --namespace loki

Binary

# Download Loki
wget https://github.com/grafana/loki/releases/download/v3.0.0/loki-linux-amd64.zip
unzip loki-linux-amd64.zip
sudo mv loki-linux-amd64 /usr/local/bin/loki

# Download LogCLI
wget https://github.com/grafana/loki/releases/download/v3.0.0/logcli-linux-amd64.zip
unzip logcli-linux-amd64.zip
sudo mv logcli-linux-amd64 /usr/local/bin/logcli

Configuration

Loki Config (Monolithic)

# loki-config.yaml
auth_enabled: false

server:
  http_listen_port: 3100

common:
  path_prefix: /loki
  storage:
    filesystem:
      chunks_directory: /loki/chunks
      rules_directory: /loki/rules
  replication_factor: 1
  ring:
    kvstore:
      store: inmemory

schema_config:
  configs:
    - from: 2024-01-01
      store: tsdb
      object_store: filesystem
      schema: v13
      index:
        prefix: index_
        period: 24h

limits_config:
  reject_old_samples: true
  reject_old_samples_max_age: 168h
  max_entries_limit_per_query: 5000
  ingestion_rate_mb: 10
  ingestion_burst_size_mb: 20

Promtail Config

# promtail-config.yaml
server:
  http_listen_port: 9080

positions:
  filename: /tmp/positions.yaml

clients:
  - url: http://loki:3100/loki/api/v1/push

scrape_configs:
  - job_name: system
    static_configs:
      - targets: [localhost]
        labels:
          job: varlogs
          host: myserver
          __path__: /var/log/*.log

  - job_name: docker
    docker_sd_configs:
      - host: unix:///var/run/docker.sock
        refresh_interval: 5s
    relabel_configs:
      # Docker reports container names with a leading slash; the regex strips it
      - source_labels: ['__meta_docker_container_name']
        regex: '/(.*)'
        target_label: 'container'

  - job_name: journal
    journal:
      max_age: 12h
      labels:
        job: systemd-journal
    relabel_configs:
      - source_labels: ['__journal__systemd_unit']
        target_label: 'unit'

LogQL Query Language

Log Stream Selection

# Select by label
{job="nginx"}
{namespace="production", container="api"}
{host=~"web-.*"}
{level!="debug"}

Log Pipeline (Filtering)

# Line filter
{job="nginx"} |= "error"
{job="nginx"} != "healthcheck"
{job="nginx"} |~ "status=[45]\\d\\d"
{job="nginx"} !~ "GET /favicon"

# JSON parsing
{job="api"} | json | status >= 400
{job="api"} | json | line_format "{{.method}} {{.path}} {{.status}}"

# Logfmt parsing
{job="app"} | logfmt | level="error" | duration > 5s

# Pattern parsing
{job="nginx"} | pattern `<ip> - - <_> "<method> <uri> <_>" <status> <size>`
  | status >= 400

Metric Queries

# Per-second error rate, averaged over 1 minute (use count_over_time for a per-minute count)
rate({job="nginx"} |= "error" [1m])

# Bytes rate
bytes_rate({job="nginx"} [5m])

# Top 10 paths by request count
topk(10, sum by (path) (rate({job="nginx"} | json [5m])))

# Error rate percentage
sum(rate({job="nginx"} |= "error" [5m])) / sum(rate({job="nginx"} [5m])) * 100

# P99 latency from log lines
quantile_over_time(0.99, {job="api"} | json | unwrap duration [5m]) by (endpoint)
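
The metric queries above can also be run outside Grafana through Loki's `/loki/api/v1/query_range` HTTP endpoint. The Python sketch below only builds the request URL, which shows how the LogQL expression gets URL-encoded. The helper name `query_range_url` and the base URL are assumptions for illustration; `start` and `end` are passed as nanosecond Unix timestamps.

```python
from urllib.parse import urlencode


def query_range_url(base_url, logql, start_ns, end_ns, step="60s", limit=None):
    """Build a GET URL for /loki/api/v1/query_range (hypothetical helper)."""
    params = {
        "query": logql,          # LogQL expression, URL-encoded by urlencode
        "start": str(start_ns),  # range start, nanoseconds since the Unix epoch
        "end": str(end_ns),      # range end, nanoseconds since the Unix epoch
        "step": step,            # resolution of the returned metric series
    }
    if limit is not None:
        params["limit"] = str(limit)  # only meaningful for log (non-metric) queries
    return base_url.rstrip("/") + "/loki/api/v1/query_range?" + urlencode(params)


url = query_range_url(
    "http://localhost:3100",
    'rate({job="nginx"}[5m])',
    1704067200000000000,
    1704070800000000000,
)
print(url)
```

The resulting URL can be fetched with any HTTP client; when `auth_enabled` is true, the request also needs an `X-Scope-OrgID` header naming the tenant.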

LogCLI Usage

| Command | Description |
| --- | --- |
| logcli query '{job="nginx"}' | Query logs |
| logcli labels | List available labels |
| logcli labels job | List values for a label |
| logcli series '{job="nginx"}' | List log streams |
| logcli instant-query 'rate({job="nginx"}[5m])' | Run metric query |

# Configure LogCLI
export LOKI_ADDR=http://localhost:3100

# Query last hour
logcli query '{job="nginx"}' --since=1h --limit=100

# Tail logs
logcli query '{job="nginx"} |= "error"' --tail

# Output as JSON
logcli query '{job="api"}' --output=jsonl

# Query with time range
logcli query '{job="nginx"}' \
  --from="2024-01-01T00:00:00Z" \
  --to="2024-01-02T00:00:00Z"
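
The `--output=jsonl` mode is convenient for post-processing. The Python sketch below counts entries per label value, assuming each output line is a JSON object with `labels`, `line`, and `timestamp` fields; check that shape against your LogCLI version. The function name `count_by_label` is hypothetical.

```python
import json
from collections import Counter


def count_by_label(jsonl_text, label):
    """Count log entries per value of `label` in logcli --output=jsonl text."""
    counts = Counter()
    for raw in jsonl_text.splitlines():
        raw = raw.strip()
        if not raw:
            continue
        try:
            entry = json.loads(raw)
        except json.JSONDecodeError:
            continue  # skip anything that is not JSON, such as status messages
        if not isinstance(entry, dict):
            continue
        # Assumed shape: {"labels": {...}, "line": "...", "timestamp": "..."}
        value = (entry.get("labels") or {}).get(label, "<missing>")
        counts[value] += 1
    return counts


sample = "\n".join([
    '{"labels":{"job":"api","level":"error"},"line":"boom","timestamp":"2024-01-01T00:00:00Z"}',
    '{"labels":{"job":"api","level":"info"},"line":"ok","timestamp":"2024-01-01T00:00:01Z"}',
    '{"labels":{"job":"api","level":"error"},"line":"boom again","timestamp":"2024-01-01T00:00:02Z"}',
])
print(count_by_label(sample, "level"))
```

In practice the text would come from standard input (for example `sys.stdin.read()`) with the LogCLI output piped into the script.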

Advanced Usage

Loki Alerting Rules

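# /loki/rules/fake/alerts.yaml (local rule storage expects a per-tenant subdirectory; "fake" is the tenant ID when auth_enabled: false)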
groups:
  - name: high-error-rate
    rules:
      - alert: HighErrorRate
        expr: |
          sum(rate({job="api"} |= "error" [5m])) by (service)
            /
          sum(rate({job="api"} [5m])) by (service)
            > 0.05
        for: 10m
        labels:
          severity: critical
        annotations:
          summary: "High error rate on {{ $labels.service }}"

S3 Backend Storage

# Requires object_store: s3 in the active schema_config entry
storage_config:
  tsdb_shipper:
    active_index_directory: /loki/tsdb-index
    cache_location: /loki/tsdb-cache
  aws:
    s3: s3://access_key:secret_key@us-east-1/loki-chunks
    s3forcepathstyle: true

Recording Rules

groups:
  - name: nginx_metrics
    interval: 1m
    rules:
      - record: nginx:requests:rate5m
        expr: sum(rate({job="nginx"} [5m]))
      - record: nginx:errors:rate5m
        expr: sum(rate({job="nginx"} |= "error" [5m]))

Troubleshooting

| Issue | Solution |
| --- | --- |
| No logs appearing | Check Promtail targets at :9080/targets; verify the Loki URL in the clients config |
| entry out of order | Entries in a stream must arrive close to chronological order; confirm unordered_writes: true (the default since Loki 2.4) and that late entries fall within the accepted out-of-order window |
| Query timeout | Add more specific label matchers; reduce the time range; increase query_timeout |
| High memory usage | Reduce max_entries_limit_per_query; add caching; move to Simple Scalable Deployment mode |
| too many outstanding requests | Increase max_outstanding_per_tenant or add more read replicas |
| Label cardinality too high | Avoid dynamic labels (IPs, UUIDs); use structured metadata instead |
| Chunks not being flushed | Check the ingester's chunk_idle_period and chunk_retain_period settings |
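
To see why dynamic labels are a cardinality problem: every unique combination of label values is a separate stream, so the worst-case stream count is the product of the distinct values per label. The Python sketch below computes that upper bound. The function name and the sample label sets are made up for illustration, and real stream counts are usually lower because not every combination occurs.

```python
from math import prod


def max_streams(label_values):
    """Upper bound on stream count: product of distinct values per label."""
    return prod(len(set(values)) for values in label_values.values())


static_labels = {
    "job": ["api", "web"],
    "env": ["prod", "staging"],
    "level": ["debug", "info", "warn", "error"],
}
print(max_streams(static_labels))  # 2 * 2 * 4 = 16

# Adding a per-client label multiplies the bound by its number of distinct values.
with_ip = dict(static_labels, client_ip=[f"10.0.{i // 256}.{i % 256}" for i in range(10_000)])
print(max_streams(with_ip))  # 16 * 10000 = 160000
```

Moving `client_ip` out of the label set, for instance into structured metadata or the log line itself, keeps the bound at 16 while the value stays queryable at read time.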