Apache Druid Cheat Sheet
Overview
Apache Druid is a high-performance, real-time analytics database designed for workflows where fast queries and ingest are important. It combines ideas from time-series databases, data warehouses, and search systems to deliver sub-second OLAP queries at scale. Druid stores data in a custom columnar format with automatic indexing, bitmap indexes, and dictionary encoding, enabling it to scan billions of rows in milliseconds.
Druid excels at powering user-facing analytics applications, real-time dashboards, and operational monitoring where query latency matters. It supports both streaming ingestion from Kafka/Kinesis and batch ingestion from HDFS/S3. The architecture separates query processing, data management, and ingestion into independent services that can be scaled independently. Druid integrates with BI tools via its SQL interface and supports native JSON queries for advanced analytics.
Installation
Docker Quickstart
# Clone Druid repository for docker-compose
git clone https://github.com/apache/druid.git
cd druid
git checkout 29.0.1
# Start with Docker Compose (single-server micro-quickstart)
docker compose -f distribution/docker/docker-compose.yml up -d
# Access Druid Console at http://localhost:8888
Standalone Installation
# Download Druid
wget https://dlcdn.apache.org/druid/29.0.1/apache-druid-29.0.1-bin.tar.gz
tar -xzf apache-druid-29.0.1-bin.tar.gz
cd apache-druid-29.0.1
# Start single-server (micro-quickstart)
./bin/start-druid
# Or start with specific configuration
./bin/start-druid -c conf/druid/single-server/medium
Kubernetes via Helm
helm repo add druid https://druid-io.github.io/druid-helm/
helm install druid druid/druid \
--namespace druid \
--create-namespace \
--set image.tag=29.0.1
SQL Queries
-- Basic aggregation query
SELECT
DATE_TRUNC('hour', __time) AS hour,
country,
COUNT(*) AS event_count,
SUM(revenue) AS total_revenue,
AVG(latency_ms) AS avg_latency
FROM events
WHERE __time >= TIMESTAMP '2024-06-01'
AND __time < TIMESTAMP '2024-07-01'
GROUP BY 1, 2
ORDER BY total_revenue DESC
LIMIT 100;
-- Approximate count distinct
SELECT
page,
APPROX_COUNT_DISTINCT(user_id) AS unique_visitors,
COUNT(*) AS page_views
FROM web_events
WHERE __time >= CURRENT_TIMESTAMP - INTERVAL '7' DAY
GROUP BY page
HAVING COUNT(*) > 100
ORDER BY unique_visitors DESC;
-- Time-series with granularity
SELECT
TIME_FLOOR(__time, 'PT5M') AS five_min_bucket,
COUNT(*) AS requests,
APPROX_QUANTILE_DS(response_time, 0.99) AS p99_response
FROM api_requests
WHERE __time >= CURRENT_TIMESTAMP - INTERVAL '1' HOUR
GROUP BY 1
ORDER BY 1;
-- Top-N query
SELECT
product_category,
SUM(sales_amount) AS total_sales
FROM sales
WHERE __time >= TIMESTAMP '2024-01-01'
GROUP BY product_category
ORDER BY total_sales DESC
LIMIT 10;
Ingestion
Kafka Streaming Ingestion
{
"type": "kafka",
"spec": {
"ioConfig": {
"type": "kafka",
"consumerProperties": {
"bootstrap.servers": "kafka:9092"
},
"topic": "events",
"inputFormat": {
"type": "json"
},
"useEarliestOffset": true
},
"tuningConfig": {
"type": "kafka",
"maxRowsPerSegment": 5000000,
"maxRowsInMemory": 100000,
"intermediatePersistPeriod": "PT10M"
},
"dataSchema": {
"dataSource": "events",
"timestampSpec": {
"column": "event_time",
"format": "iso"
},
"dimensionsSpec": {
"dimensions": [
"user_id",
"event_type",
"country",
{"name": "device", "type": "string"},
{"name": "is_mobile", "type": "long"}
]
},
"granularitySpec": {
"segmentGranularity": "HOUR",
"queryGranularity": "MINUTE",
"rollup": true
},
"metricsSpec": [
{"type": "count", "name": "count"},
{"type": "longSum", "name": "total_events", "fieldName": "event_count"},
{"type": "doubleSum", "name": "revenue", "fieldName": "amount"},
{"type": "HLLSketchBuild", "name": "unique_users", "fieldName": "user_id"}
]
}
}
}
Batch Ingestion (S3)
{
"type": "index_parallel",
"spec": {
"ioConfig": {
"type": "index_parallel",
"inputSource": {
"type": "s3",
"uris": ["s3://bucket/data/2024/06/"],
"properties": {
"accessKeyId": {"type": "environment", "variable": "AWS_ACCESS_KEY_ID"},
"secretAccessKey": {"type": "environment", "variable": "AWS_SECRET_ACCESS_KEY"}
}
},
"inputFormat": {
"type": "parquet"
}
},
"tuningConfig": {
"type": "index_parallel",
"maxNumConcurrentSubTasks": 4,
"partitionsSpec": {
"type": "hashed",
"numShards": 8,
"partitionDimensions": ["country"]
}
},
"dataSchema": {
"dataSource": "historical_events",
"timestampSpec": {"column": "timestamp", "format": "iso"},
"dimensionsSpec": {"dimensions": ["user_id", "event_type", "country"]},
"granularitySpec": {"segmentGranularity": "DAY", "queryGranularity": "HOUR"}
}
}
}
API Commands
| Endpoint | Description |
|---|---|
POST /druid/v2/sql | Execute SQL query |
POST /druid/indexer/v1/supervisor | Submit streaming ingestion supervisor |
GET /druid/indexer/v1/supervisor | List all supervisors |
POST /druid/indexer/v1/supervisor/{id}/suspend | Suspend a supervisor |
POST /druid/indexer/v1/supervisor/{id}/resume | Resume a supervisor |
POST /druid/indexer/v1/supervisor/{id}/terminate | Terminate a supervisor |
POST /druid/indexer/v1/task | Submit a batch ingestion task |
GET /druid/indexer/v1/task/{id}/status | Get task status |
GET /druid/coordinator/v1/datasources | List all datasources |
DELETE /druid/coordinator/v1/datasources/{ds} | Disable a datasource |
POST /druid/coordinator/v1/datasources/{ds} | Enable a datasource |
GET /status | Get server status and version |
# Execute SQL query via API
curl -X POST http://localhost:8888/druid/v2/sql \
-H "Content-Type: application/json" \
-d '{"query": "SELECT COUNT(*) FROM events WHERE __time >= CURRENT_TIMESTAMP - INTERVAL '\''1'\'' HOUR"}'
# Submit ingestion spec
curl -X POST http://localhost:8888/druid/indexer/v1/supervisor \
-H "Content-Type: application/json" \
-d @kafka-ingestion.json
# Check supervisor status
curl http://localhost:8888/druid/indexer/v1/supervisor/events/status | jq
Configuration
Common Runtime Properties
# common.runtime.properties
druid.extensions.loadList=["druid-kafka-indexing-service","druid-s3-extensions","druid-hdfs-storage","druid-parquet-extensions"]
# Deep storage (S3)
druid.storage.type=s3
druid.storage.bucket=druid-segments
druid.storage.baseKey=druid/segments
druid.s3.accessKey=...
druid.s3.secretKey=...
# Metadata storage (PostgreSQL)
druid.metadata.storage.type=postgresql
druid.metadata.storage.connector.connectURI=jdbc:postgresql://postgres:5432/druid
druid.metadata.storage.connector.user=druid
druid.metadata.storage.connector.password=druid
# ZooKeeper
druid.zk.service.host=zk1:2181,zk2:2181,zk3:2181
Advanced Usage
Rollup and Pre-Aggregation
{
"granularitySpec": {
"segmentGranularity": "HOUR",
"queryGranularity": "MINUTE",
"rollup": true
},
"metricsSpec": [
{"type": "count", "name": "count"},
{"type": "longSum", "name": "total_bytes", "fieldName": "bytes"},
{"type": "doubleMin", "name": "min_latency", "fieldName": "latency"},
{"type": "doubleMax", "name": "max_latency", "fieldName": "latency"},
{"type": "thetaSketch", "name": "unique_users", "fieldName": "user_id", "size": 16384}
]
}
Compaction
{
"type": "compact",
"dataSource": "events",
"interval": "2024-06-01/2024-07-01",
"tuningConfig": {
"type": "index_parallel",
"maxRowsPerSegment": 5000000,
"maxNumConcurrentSubTasks": 4,
"partitionsSpec": {
"type": "range",
"partitionDimensions": ["country"],
"targetRowsPerSegment": 5000000
}
}
}
Multi-Stage Query Engine (MSQ)
-- Insert using MSQ (for batch ETL within Druid)
INSERT INTO daily_summary
SELECT
TIME_FLOOR(__time, 'P1D') AS __time,
country,
COUNT(*) AS event_count,
SUM(revenue) AS total_revenue
FROM events
WHERE __time >= TIMESTAMP '2024-06-01'
AND __time < TIMESTAMP '2024-07-01'
GROUP BY 1, 2
PARTITIONED BY DAY
CLUSTERED BY country;
-- Replace (overwrite) segments
REPLACE INTO daily_summary OVERWRITE ALL
SELECT ... FROM events
PARTITIONED BY DAY;
Troubleshooting
| Issue | Solution |
|---|---|
| Slow queries | Check segment sizes (target ~5M rows). Verify partition pruning on __time |
| Ingestion lag | Increase maxRowsInMemory and task count. Check Kafka consumer lag |
| Segments not loading | Check coordinator logs. Verify deep storage connectivity and Historical capacity |
| OOM on Historical nodes | Increase -Xmx in jvm.config. Check segment cache size configuration |
| Query timeout | Increase druid.server.http.defaultQueryTimeout. Optimize query to use rollup |
| Supervisor in UNHEALTHY state | Check Kafka connectivity. Restart supervisor via API |
| Compaction not running | Verify auto-compaction is configured. Check coordinator duty cycle |
| High query latency variance | Check for segment skew. Use range partitioning for better distribution |
| Metadata DB connection issues | Verify PostgreSQL/MySQL connectivity. Check connection pool settings |
| ZooKeeper session expired | Check ZK cluster health. Increase druid.zk.service.sessionTimeoutMs |