
Apache Pulsar Cheat Sheet

Overview

Apache Pulsar is a cloud-native, distributed messaging and streaming platform originally created at Yahoo and now an Apache top-level project. It provides a unified messaging model supporting both queueing and streaming use cases with multi-tenancy, geo-replication, and tiered storage built in.

Pulsar separates serving (brokers) from storage (Apache BookKeeper), enabling independent scaling. It supports multiple subscription types (exclusive, shared, failover, key-shared), schema enforcement, message deduplication, and delayed message delivery. Pulsar Functions provide lightweight serverless compute directly on the messaging layer.
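Deduplication works because each producer attaches an increasing sequence ID to its messages. The broker remembers the highest ID it has persisted for each producer name and discards anything at or below it. The toy model below shows why a send that is retried after a lost acknowledgement is stored only once. `DedupLog` and `publish` are hypothetical names, not part of any Pulsar client or broker API.

```python
from dataclasses import dataclass, field


@dataclass
class DedupLog:
    """Toy model of broker-side deduplication (not Pulsar's implementation)."""
    last_seq: dict[str, int] = field(default_factory=dict)
    stored: list[tuple[str, int, str]] = field(default_factory=list)

    def publish(self, producer: str, seq_id: int, payload: str) -> bool:
        # Duplicate if the sequence ID is not higher than the last one
        # persisted for the same producer name.
        if seq_id <= self.last_seq.get(producer, -1):
            return False
        self.last_seq[producer] = seq_id
        self.stored.append((producer, seq_id, payload))
        return True


log = DedupLog()
log.publish("p1", 0, "a")
log.publish("p1", 1, "b")
log.publish("p1", 1, "b")  # retry after a lost ack: dropped
print(len(log.stored))  # 2
```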

Installation

Standalone Mode (Development)

# Download Pulsar
wget https://archive.apache.org/dist/pulsar/pulsar-3.3.0/apache-pulsar-3.3.0-bin.tar.gz
tar xvfz apache-pulsar-3.3.0-bin.tar.gz
cd apache-pulsar-3.3.0

# Start standalone mode (broker, bookie, and metadata store in a single process)
bin/pulsar standalone

# Verify (standalone mode names its cluster "standalone")
bin/pulsar-admin brokers list standalone

Docker

docker run -d --name pulsar \
  -p 6650:6650 \
  -p 8080:8080 \
  apachepulsar/pulsar:3.3.0 \
  bin/pulsar standalone

Docker Compose (Full Cluster)

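services:
  zookeeper:
    image: apachepulsar/pulsar:3.3.0
    environment:
      - metadataStoreUrl=zk:zookeeper:2181
    command: bash -c "bin/apply-config-from-env.py conf/zookeeper.conf && bin/generate-zookeeper-config.sh conf/zookeeper.conf && exec bin/pulsar zookeeper"
    healthcheck:
      test: ["CMD", "bin/pulsar-zookeeper-ruok.sh"]
      interval: 10s
  pulsar-init:  # one-shot job that writes the cluster metadata
    image: apachepulsar/pulsar:3.3.0
    command: bin/pulsar initialize-cluster-metadata --cluster cluster-a --zookeeper zookeeper:2181 --configuration-store zookeeper:2181 --web-service-url http://broker:8080 --broker-service-url pulsar://broker:6650
    depends_on:
      zookeeper:
        condition: service_healthy
  bookie:
    image: apachepulsar/pulsar:3.3.0
    environment:
      - clusterName=cluster-a
      - zkServers=zookeeper:2181
      - metadataServiceUri=metadata-store:zk:zookeeper:2181
      - advertisedAddress=bookie
    command: bash -c "bin/apply-config-from-env.py conf/bookkeeper.conf && exec bin/pulsar bookie"
    depends_on:
      pulsar-init:
        condition: service_completed_successfully
  broker:
    image: apachepulsar/pulsar:3.3.0
    environment:
      - metadataStoreUrl=zk:zookeeper:2181
      - clusterName=cluster-a
      - managedLedgerDefaultEnsembleSize=1
      - managedLedgerDefaultWriteQuorum=1
      - managedLedgerDefaultAckQuorum=1
      - advertisedAddress=broker
      - advertisedListeners=external:pulsar://127.0.0.1:6650
    command: bash -c "bin/apply-config-from-env.py conf/broker.conf && exec bin/pulsar broker"
    ports: ["6650:6650", "8080:8080"]
    depends_on:
      - bookie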

Core CLI Commands

Tenant and Namespace Management

| Command | Description |
|---|---|
| bin/pulsar-admin tenants list | List all tenants |
| bin/pulsar-admin tenants create my-tenant | Create a tenant |
| bin/pulsar-admin namespaces list my-tenant | List namespaces in a tenant |
| bin/pulsar-admin namespaces create my-tenant/my-ns | Create a namespace |
| bin/pulsar-admin namespaces delete my-tenant/my-ns | Delete a namespace |
| bin/pulsar-admin namespaces policies my-tenant/my-ns | Show namespace policies |

Topic Management

# Create a partitioned topic
bin/pulsar-admin topics create-partitioned-topic \
  persistent://my-tenant/my-ns/my-topic -p 4

# List topics
bin/pulsar-admin topics list my-tenant/my-ns

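# Get topic stats (non-partitioned topic, or a single partition)
bin/pulsar-admin topics stats persistent://my-tenant/my-ns/my-topic

# Get aggregated stats for a partitioned topic
bin/pulsar-admin topics partitioned-stats persistent://my-tenant/my-ns/my-topic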

# Peek at messages
bin/pulsar-admin topics peek-messages \
  persistent://my-tenant/my-ns/my-topic -s my-sub -n 10

# Skip messages on a subscription
bin/pulsar-admin topics skip \
  persistent://my-tenant/my-ns/my-topic -s my-sub -n 100

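# Delete a non-partitioned topic
bin/pulsar-admin topics delete persistent://my-tenant/my-ns/my-topic

# Delete a partitioned topic together with all of its partitions
bin/pulsar-admin topics delete-partitioned-topic persistent://my-tenant/my-ns/my-topic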
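Every command above addresses topics by a fully qualified name of the form `domain://tenant/namespace/topic`, and a topic created with `-p 4` is backed by four ordinary topics whose names end in `-partition-0` through `-partition-3`. The sketch below splits such a name and lists the partition names; `parse_topic`, `partition_names`, and `TopicName` are hypothetical helpers written for illustration, not part of a Pulsar client.

```python
from typing import NamedTuple


class TopicName(NamedTuple):
    domain: str  # "persistent" or "non-persistent"
    tenant: str
    namespace: str
    topic: str


def parse_topic(name: str) -> TopicName:
    """Split a fully qualified name such as persistent://tenant/ns/topic."""
    domain, sep, rest = name.partition("://")
    if not sep or domain not in ("persistent", "non-persistent"):
        raise ValueError(f"not a fully qualified topic name: {name!r}")
    parts = rest.split("/", 2)
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected tenant/namespace/topic in {name!r}")
    return TopicName(domain, *parts)


def partition_names(name: str, partitions: int) -> list[str]:
    # Each partition is an ordinary topic with a "-partition-<index>" suffix.
    return [f"{name}-partition-{i}" for i in range(partitions)]


t = parse_topic("persistent://my-tenant/my-ns/my-topic")
print(t.tenant, t.namespace, t.topic)  # my-tenant my-ns my-topic
print(partition_names("persistent://my-tenant/my-ns/my-topic", 4)[-1])
# persistent://my-tenant/my-ns/my-topic-partition-3
```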

Producing and Consuming

# Produce messages
bin/pulsar-client produce persistent://my-tenant/my-ns/my-topic \
  -m "Hello Pulsar" -n 10

# Consume messages
bin/pulsar-client consume persistent://my-tenant/my-ns/my-topic \
  -s my-subscription -n 10

# Consume on a Shared subscription; -n 0 keeps consuming indefinitely
bin/pulsar-client consume persistent://my-tenant/my-ns/my-topic \
  -s my-shared-sub -t Shared -n 0

Configuration

Broker Configuration (conf/broker.conf)

# Cluster name
clusterName=my-cluster

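# Metadata and configuration stores (ZooKeeper); these keys replace the
# deprecated zookeeperServers and configurationStoreServers
metadataStoreUrl=zk:zk1:2181,zk2:2181,zk3:2181
configurationMetadataStoreUrl=zk:zk1:2181,zk2:2181,zk3:2181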

# Broker settings
brokerServicePort=6650
webServicePort=8080

# Message retention (4320 min = 3 days; 10240 MB = 10 GB)
defaultRetentionTimeInMinutes=4320
defaultRetentionSizeInMB=10240

# Backlog quota
backlogQuotaDefaultLimitGB=10
backlogQuotaDefaultRetentionPolicy=producer_request_hold

# Deduplication
brokerDeduplicationEnabled=true

# Max message size in bytes (5 MB, the default)
maxMessageSize=5242880

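# Managed ledger replication: ensemble size >= write quorum >= ack quorum;
# an ensemble of 2 needs at least 2 available bookies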
managedLedgerDefaultEnsembleSize=2
managedLedgerDefaultWriteQuorum=2
managedLedgerDefaultAckQuorum=2
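A few of these settings are easy to get wrong: BookKeeper requires ensemble size >= write quorum >= ack quorum >= 1, and the retention and message-size values are plain numbers whose units live only in the key names. The sketch below parses `key=value` lines like the ones above and checks the quorum rule. `parse_properties` and `check_ledger_quorums` are hypothetical helpers, and the parser handles only simple `key=value` lines, not every Java properties-file feature.

```python
def parse_properties(text: str) -> dict[str, str]:
    """Parse simple key=value lines, skipping blanks and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def check_ledger_quorums(props: dict[str, str]) -> list[str]:
    # Rule checked here: ensemble >= write quorum >= ack quorum >= 1.
    e = int(props["managedLedgerDefaultEnsembleSize"])
    w = int(props["managedLedgerDefaultWriteQuorum"])
    a = int(props["managedLedgerDefaultAckQuorum"])
    problems = []
    if e < w:
        problems.append(f"ensemble ({e}) < write quorum ({w})")
    if w < a:
        problems.append(f"write quorum ({w}) < ack quorum ({a})")
    if a < 1:
        problems.append("ack quorum must be at least 1")
    return problems


conf = parse_properties("""
defaultRetentionTimeInMinutes=4320
defaultRetentionSizeInMB=10240
maxMessageSize=5242880
managedLedgerDefaultEnsembleSize=2
managedLedgerDefaultWriteQuorum=2
managedLedgerDefaultAckQuorum=2
""")
print(check_ledger_quorums(conf))                            # []
print(int(conf["defaultRetentionTimeInMinutes"]) / 60 / 24)  # 3.0 (days)
print(int(conf["maxMessageSize"]) / 1024 / 1024)             # 5.0 (MiB)
```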

Namespace Policies

# Set retention policy
bin/pulsar-admin namespaces set-retention my-tenant/my-ns \
  --size 10G --time 7d

# Set backlog quota
bin/pulsar-admin namespaces set-backlog-quota my-tenant/my-ns \
  --limit 10G --policy producer_request_hold

# Set message TTL (in seconds; 3600 = 1 hour)
bin/pulsar-admin namespaces set-message-ttl my-tenant/my-ns --messageTTL 3600

# Set replication clusters
bin/pulsar-admin namespaces set-clusters my-tenant/my-ns \
  --clusters us-east,us-west,eu-central

# Enable deduplication
bin/pulsar-admin namespaces set-deduplication my-tenant/my-ns --enable

# Set schema validation
bin/pulsar-admin namespaces set-schema-validation-enforce \
  my-tenant/my-ns --enable
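The policy commands above take human-readable sizes and durations such as `10G` and `7d`. The sketch below converts them to bytes and seconds, which makes it easier to compare a namespace policy with the broker-wide defaults in `broker.conf`. Two assumptions apply: size suffixes use 1024-based multipliers, and durations always carry a unit suffix. `parse_size` and `parse_duration` are hypothetical helpers, not pulsar-admin internals.

```python
import re

# Assumed 1024-based size multipliers and s/m/h/d/w time suffixes.
SIZE_UNITS = {"": 1, "K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}
TIME_UNITS = {"s": 1, "m": 60, "h": 3600, "d": 86_400, "w": 7 * 86_400}


def parse_size(value: str) -> int:
    """Convert a size such as '10G' to bytes."""
    m = re.fullmatch(r"(\d+)([KMGT]?)", value.strip().upper())
    if m is None:
        raise ValueError(f"bad size: {value!r}")
    return int(m.group(1)) * SIZE_UNITS[m.group(2)]


def parse_duration(value: str) -> int:
    """Convert a duration such as '7d' to seconds (unit suffix required)."""
    m = re.fullmatch(r"(\d+)([smhdw])", value.strip())
    if m is None:
        raise ValueError(f"bad duration: {value!r}")
    return int(m.group(1)) * TIME_UNITS[m.group(2)]


print(parse_size("10G") // 1024**2)  # 10240 (MB, as in defaultRetentionSizeInMB)
print(parse_duration("7d") // 60)    # 10080 (minutes)
print(parse_duration("1h"))          # 3600 (seconds, the --messageTTL value used above)
```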

Subscription Types

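| Type | Description | Use case |
|---|---|---|
| Exclusive | Only one consumer may attach to the subscription | Ordered processing |
| Shared | Messages are distributed round-robin across consumers; no ordering guarantee | Parallel processing |
| Failover | One active consumer; a standby takes over if it disconnects | High availability with ordering |
| Key_Shared | Messages with the same key always go to the same consumer | Ordered per-key processing |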
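The toy dispatcher below models the delivery behaviour in the table. It is a deliberately simplified model and not Pulsar's implementation. Real Key_Shared assigns hash ranges to consumers rather than using the `crc32`-modulo rule here, and Failover chooses its active consumer by priority and consumer name, not by list position. All names are hypothetical.

```python
import zlib
from itertools import cycle

Message = tuple[str, str]  # (key, payload)


def dispatch(messages: list[Message], consumers: list[str], sub_type: str) -> dict[str, list[Message]]:
    """Return {consumer: [messages]} for one subscription (toy model)."""
    if sub_type == "Exclusive" and len(consumers) > 1:
        raise ValueError("an Exclusive subscription accepts a single consumer")
    out: dict[str, list[Message]] = {c: [] for c in consumers}
    if sub_type in ("Exclusive", "Failover"):
        # One active consumer receives everything, in order.
        out[consumers[0]].extend(messages)
    elif sub_type == "Shared":
        # Round-robin: spreads load, gives no ordering guarantee.
        turn = cycle(consumers)
        for msg in messages:
            out[next(turn)].append(msg)
    elif sub_type == "Key_Shared":
        # A stable hash of the key pins each key to one consumer.
        for msg in messages:
            idx = zlib.crc32(msg[0].encode()) % len(consumers)
            out[consumers[idx]].append(msg)
    else:
        raise ValueError(f"unknown subscription type: {sub_type}")
    return out


msgs = [("user-1", "a"), ("user-2", "b"), ("user-1", "c"), ("user-3", "d")]
print(dispatch(msgs, ["c1", "c2"], "Shared")["c1"])  # [('user-1', 'a'), ('user-1', 'c')]
```

With Shared, `user-1`'s two messages happen to land on the same consumer only because of the round-robin order. With Key_Shared, they are guaranteed to land together, and in order.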

Advanced Usage

Pulsar Functions

# Deploy a function
bin/pulsar-admin functions create \
  --tenant my-tenant --namespace my-ns --name my-func \
  --inputs persistent://my-tenant/my-ns/input-topic \
  --output persistent://my-tenant/my-ns/output-topic \
  --jar my-function.jar \
  --classname com.example.MyFunction

# List functions
bin/pulsar-admin functions list --tenant my-tenant --namespace my-ns

# Get function status
bin/pulsar-admin functions status \
  --tenant my-tenant --namespace my-ns --name my-func

# Delete a function
bin/pulsar-admin functions delete \
  --tenant my-tenant --namespace my-ns --name my-func
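The `--jar`/`--classname` pair deploys a Java function. Pulsar also runs Python functions, including a "native" form that needs no SDK import: a module-level `process` function that receives each input message and whose return value is published to the output topic. A minimal sketch follows. The file name `exclamation.py` is illustrative, and such a file would be passed with `--py` in place of `--jar`.

```python
# exclamation.py (illustrative file name)
# Native Python function: called once per message from the --inputs topic(s);
# the returned value is published to the --output topic.
def process(input):
    return f"{input}!"


print(process("hello"))  # hello!
```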

Tiered Storage (Offloading)

# Configure S3 offloader in broker.conf
# managedLedgerOffloadDriver=aws-s3
# s3ManagedLedgerOffloadBucket=pulsar-offload
# s3ManagedLedgerOffloadRegion=us-east-1

# Set offload threshold on namespace
bin/pulsar-admin namespaces set-offload-threshold \
  my-tenant/my-ns --size 10G

# Trigger a manual offload, keeping at most 10G of the topic in BookKeeper
bin/pulsar-admin topics offload \
  persistent://my-tenant/my-ns/my-topic -s 10G
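Offloading works ledger by ledger: the newest data stays in BookKeeper up to the threshold, and older closed ledgers move to tiered storage. The sketch below is a simplified model of that selection. It assumes that the ledger being written is never offloaded and that the ledger pushing the running total over the threshold is offloaded along with everything older. The exact boundary rule in the broker may differ, and `ledgers_to_offload` is a hypothetical name.

```python
def ledgers_to_offload(ledger_sizes: list[int], threshold_bytes: int) -> list[int]:
    """Return indexes (oldest-first) of ledgers that would be offloaded.

    ledger_sizes lists ledger sizes oldest-first; the last entry is the
    ledger currently being written and always stays in BookKeeper.
    """
    if not ledger_sizes:
        return []
    kept = ledger_sizes[-1]
    cutoff = len(ledger_sizes) - 1  # ledgers before this index are offloaded
    # Walk backwards from the newest closed ledger, keeping while under the threshold.
    for i in range(len(ledger_sizes) - 2, -1, -1):
        if kept + ledger_sizes[i] > threshold_bytes:
            break
        kept += ledger_sizes[i]
        cutoff = i
    return list(range(cutoff))


GiB = 1024**3
print(ledgers_to_offload([4 * GiB] * 4 + [2 * GiB], 10 * GiB))  # [0, 1]
```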

Geo-Replication

# Enable replication on a namespace (the clusters must be registered and allowed for the tenant)
bin/pulsar-admin namespaces set-clusters my-tenant/my-ns \
  --clusters us-east,eu-west

# Check replication status
bin/pulsar-admin topics stats persistent://my-tenant/my-ns/my-topic \
  | jq '.replication'
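For alerting, the same `replication` section can be checked from a script instead of `jq`. The sketch below flags remote clusters that are disconnected or whose replication backlog exceeds a limit. The JSON is a hypothetical, trimmed excerpt: the cluster names and numbers are made up, and the field names (`replicationBacklog`, `connected`) are what recent versions report, but check them against your own `topics stats` output. `lagging_replicators` is a hypothetical helper.

```python
import json

# Hypothetical, trimmed `topics stats` output; values are made up.
STATS = """
{
  "msgRateIn": 120.0,
  "replication": {
    "eu-west": {"msgRateOut": 118.5, "replicationBacklog": 42, "connected": true},
    "us-west": {"msgRateOut": 0.0, "replicationBacklog": 9100, "connected": false}
  }
}
"""


def lagging_replicators(stats_json: str, max_backlog: int) -> list[str]:
    """Return remote clusters that are disconnected or too far behind."""
    replication = json.loads(stats_json).get("replication", {})
    return sorted(
        cluster
        for cluster, r in replication.items()
        if not r.get("connected", False) or r.get("replicationBacklog", 0) > max_backlog
    )


print(lagging_replicators(STATS, max_backlog=1000))  # ['us-west']
```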

Monitoring

# Broker metrics endpoint
curl http://localhost:8080/metrics

# Key metrics to watch
# pulsar_broker_topics_count
# pulsar_subscription_back_log
# pulsar_throughput_in / pulsar_throughput_out
# pulsar_storage_size
# pulsar_msg_backlog
# bookkeeper_server_ADD_ENTRY_REQUEST (bookie write latency)
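The `/metrics` endpoint returns the Prometheus text format, which Prometheus normally scrapes. For a quick check without a monitoring stack, the text can be parsed directly. The sketch below extracts `pulsar_msg_backlog` samples and reports topics above a threshold. The sample lines are hypothetical (topic names and values are made up), and the parser covers only simple `name{labels} value` lines, not the full exposition format.

```python
import re

# Hypothetical excerpt of `curl http://localhost:8080/metrics` output.
SAMPLE = """\
# TYPE pulsar_msg_backlog gauge
pulsar_msg_backlog{cluster="standalone",namespace="my-tenant/my-ns",topic="persistent://my-tenant/my-ns/orders"} 125000
pulsar_msg_backlog{cluster="standalone",namespace="my-tenant/my-ns",topic="persistent://my-tenant/my-ns/audit"} 12
pulsar_broker_topics_count{cluster="standalone"} 2
"""

LINE = re.compile(r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>[^}]*)\})?\s+(?P<value>\S+)')
LABEL = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def samples(text, metric):
    """Yield (labels, value) for every sample of one metric."""
    for line in text.splitlines():
        if line.startswith("#"):
            continue  # skip HELP/TYPE comments
        m = LINE.match(line)
        if m and m.group("name") == metric:
            labels = dict(LABEL.findall(m.group("labels") or ""))
            yield labels, float(m.group("value"))


def backlog_alerts(text, threshold):
    """Topics whose pulsar_msg_backlog exceeds the threshold."""
    return sorted(
        labels.get("topic", "?")
        for labels, value in samples(text, "pulsar_msg_backlog")
        if value > threshold
    )


print(backlog_alerts(SAMPLE, 10_000))  # ['persistent://my-tenant/my-ns/orders']
```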

Troubleshooting

| Issue | Solution |
|---|---|
| Broker fails to start | Check ZooKeeper connectivity; verify clusterName matches metadata |
| Messages not consumed | Verify subscription exists; check consumer subscription type |
| Backlog growing | Scale consumers; check consumer errors in logs; increase parallelism |
| Topic creation fails | Verify tenant/namespace exists; check authorization |
| High publish latency | Check BookKeeper health; ensure sufficient bookie nodes |
| Out of disk | Configure tiered storage offloading; adjust retention policies |
| Schema compatibility error | Check schema compatibility strategy; use BACKWARD for safe evolution |
| Geo-replication lag | Monitor replication throughput; check cross-region network latency |
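Under BACKWARD compatibility, consumers using the new schema must be able to read data written with the old one. The sketch below is a simplified, Avro-style version of that rule, intended only to show why most schema compatibility errors come from adding a field without a default. Under this model, a field added in the new schema needs a default, a field kept in both schemas must keep its type (type promotion is not modelled), and removing a field is allowed. `Field` and `is_backward_compatible` are hypothetical; Pulsar delegates the real check to the schema type's own rules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    name: str
    type: str
    has_default: bool = False


def is_backward_compatible(old: list[Field], new: list[Field]) -> bool:
    """Simplified check: can a reader using `new` decode data written with `old`?"""
    old_by_name = {f.name: f for f in old}
    for f in new:
        prev = old_by_name.get(f.name)
        if prev is None:
            # Old data lacks this field, so the reader needs a default.
            if not f.has_default:
                return False
        elif prev.type != f.type:
            return False
    return True  # fields dropped from `new` are simply ignored


v1 = [Field("id", "string"), Field("amount", "double")]
v2 = v1 + [Field("currency", "string", has_default=True)]
v3 = v1 + [Field("currency", "string")]
print(is_backward_compatible(v1, v2), is_backward_compatible(v1, v3))  # True False
```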