PagerDuty Cheatsheet

Installation

CLI Tools Installation

| Platform | Method | Command |
|---|---|---|
| Ubuntu/Debian | Python CLI | sudo apt-get install python3 python3-pip && pip3 install pdpyras pd-cli |
| Ubuntu/Debian | Node.js CLI | curl -fsSL https://deb.nodesource.com/setup_lts.x \| sudo -E bash - && sudo apt-get install -y nodejs && npm install -g pagerduty-cli |
| macOS | Homebrew (Python) | brew install python3 && pip3 install pdpyras pd-cli |
| macOS | Homebrew (Node) | brew install node && npm install -g pagerduty-cli |
| Windows | Python | pip install pdpyras pd-cli |
| Windows | Chocolatey | choco install python && pip install pdpyras pd-cli |
| Any Platform | Docker | docker pull pagerduty/pdagent |

PagerDuty Agent Installation

| Platform | Command |
|---|---|
| Ubuntu/Debian | curl -s https://packages.pagerduty.com/GPG-KEY-pagerduty \| sudo apt-key add - && echo "deb https://packages.pagerduty.com/pdagent deb/" \| sudo tee /etc/apt/sources.list.d/pdagent.list && sudo apt-get update && sudo apt-get install pdagent |
| RHEL/CentOS | sudo rpm --import https://packages.pagerduty.com/GPG-KEY-pagerduty && sudo yum install pdagent |
| Start Agent | sudo systemctl start pdagent && sudo systemctl enable pdagent |

Basic Commands

Authentication & Setup

| Command | Description |
|---|---|
| pd login | Authenticate and configure API token interactively |
| export PDTOKEN=your_api_token | Set API token via environment variable |
| pd rest:get /users/me | Test authentication and get current user info |
| pd user:set user@example.com | Set default user for operations |

Incident Management

| Command | Description |
|---|---|
| pd incident:list | List all incidents |
| pd incident:list --status triggered | List only triggered (active) incidents |
| pd incident:list --status acknowledged | List acknowledged incidents |
| pd incident:get --id INCIDENT_ID | Get detailed information about a specific incident |
| pd incident:ack --id INCIDENT_ID | Acknowledge an incident |
| pd incident:resolve --id INCIDENT_ID | Resolve an incident |
| pd incident:notes --id INCIDENT_ID --note "Message" | Add a note to an incident |
| pd incident:reassign --id INCIDENT_ID --user user@example.com | Reassign an incident to a different user |
| pd incident:priority --id INCIDENT_ID --priority P1 | Set incident priority (P1-P5) |
| pd incident:snooze --id INCIDENT_ID --duration 3600 | Snooze an incident for the given number of seconds |

Service Management

| Command | Description |
|---|---|
| pd service:list | List all services |
| pd service:get --id SERVICE_ID | Get service details |
| pd service:disable --id SERVICE_ID | Disable a service |
| pd service:enable --id SERVICE_ID | Enable a service |
| pd service:integration:list --service-id SERVICE_ID | List integrations for a service |

User & On-Call Management

| Command | Description |
|---|---|
| pd user:list | List all users in the account |
| pd user:get --id USER_ID | Get user details |
| pd oncall:list | List current on-call users |
| pd user:contact:list --user-id USER_ID | List a user's contact methods |
| pd user:notification:list --user-id USER_ID | List a user's notification rules |

PagerDuty Agent Commands

| Command | Description |
|---|---|
| pd-send -k KEY -t trigger -d "Description" | Trigger a new incident via the agent |
| pd-send -k KEY -t acknowledge -i incident_key | Acknowledge an incident via the agent |
| pd-send -k KEY -t resolve -i incident_key | Resolve an incident via the agent |
| sudo systemctl status pdagent | Check agent service status |
| sudo journalctl -u pdagent -f | View agent logs in real time |

Advanced Usage

Advanced Incident Operations

| Command | Description |
|---|---|
| pd incident:create --title "Issue" --service-id SID --urgency high --priority P1 | Create an incident with full details |
| pd incident:merge --source-ids ID1,ID2 --target-id MAIN_ID | Merge multiple incidents into one |
| pd incident:list --since 2024-01-01T00:00:00Z --until 2024-01-31T23:59:59Z | List incidents within a date range |
| pd incident:list --service-ids SID1,SID2 --urgencies high | Filter incidents by service and urgency |
| pd incident:list --json \| jq -r '.incidents[].id' | Extract incident IDs using jq |
| pd incident:list --status triggered --json \| jq -r '.incidents[].id' \| xargs -I {} pd incident:ack --id {} | Bulk acknowledge all triggered incidents |

REST API Operations (curl)

| Command | Description |
|---|---|
| curl -X GET "https://api.pagerduty.com/incidents" -H "Authorization: Token token=$PDTOKEN" -H "Accept: application/vnd.pagerduty+json;version=2" | List incidents via REST API |
| curl -X POST "https://api.pagerduty.com/incidents" -H "Authorization: Token token=$PDTOKEN" -H "Content-Type: application/json" -H "From: user@example.com" -d '{"incident":{...}}' | Create incident via REST API |
| curl -X GET "https://api.pagerduty.com/oncalls" -H "Authorization: Token token=$PDTOKEN" | Get current on-call entries via API |
| curl -X PUT "https://api.pagerduty.com/incidents/$ID" -H "Authorization: Token token=$PDTOKEN" -H "Content-Type: application/json" -H "From: user@example.com" -d '{"incident":{"type":"incident_reference","status":"resolved"}}' | Update incident status via API (JSON body and From header, as in the create call) |
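
For scripts that outgrow curl, the same requests can be assembled in Python. The following is a minimal sketch using only the standard library; `build_rest_request` is a hypothetical helper name, `PABC123` is a placeholder incident ID, and the headers mirror the curl commands above. The request is only built here, not sent, so the sketch runs offline.

```python
import json
import urllib.request


def build_rest_request(method, path, token, from_email=None, body=None,
                       api_base="https://api.pagerduty.com"):
    """Build (but do not send) a PagerDuty REST API request.

    Hypothetical helper: it mirrors the headers used in the curl examples.
    """
    headers = {
        "Authorization": f"Token token={token}",
        "Accept": "application/vnd.pagerduty+json;version=2",
    }
    data = None
    if body is not None:
        # Write operations carry a JSON body, so declare its content type.
        headers["Content-Type"] = "application/json"
        data = json.dumps(body).encode("utf-8")
    if from_email:
        # Email of the user on whose behalf the change is made.
        headers["From"] = from_email
    return urllib.request.Request(api_base + path, data=data,
                                  headers=headers, method=method)


# Resolve an incident (placeholder ID and token).
req = build_rest_request(
    "PUT", "/incidents/PABC123", "TOKEN", from_email="user@example.com",
    body={"incident": {"type": "incident_reference", "status": "resolved"}},
)
# urllib.request.urlopen(req) would send it.
```

Keeping request construction separate from sending makes the headers easy to inspect or unit-test before anything touches a live account.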

Advanced Agent Operations

| Command | Description |
|---|---|
| pd-send -k KEY -t trigger -d "High CPU" -s error -i key123 | Send alert with severity and incident key |
| pd-send -k KEY -t trigger -d "Alert" -f severity=critical -f host=web01 | Send alert with custom fields |
| echo '{"routing_key":"KEY","event_action":"trigger","payload":{"summary":"Alert","source":"web01","severity":"error"}}' \| curl -X POST https://events.pagerduty.com/v2/enqueue -H "Content-Type: application/json" -d @- | Send Events API v2 alert (summary, source, and severity are all required) |

Schedule Management

| Command | Description |
|---|---|
| pd schedule:list | List all schedules |
| pd schedule:show --id SCHEDULE_ID | Show schedule details with on-call users |
| pd schedule:override --id SCHEDULE_ID --user USER_ID --start START --end END | Create a schedule override |

Escalation Policy Management

| Command | Description |
|---|---|
| pd escalation:list | List all escalation policies |
| pd escalation:get --id EP_ID | Get escalation policy details |

Analytics & Reporting

| Command | Description |
|---|---|
| pd analytics:incidents --since 2024-01-01 --until 2024-01-31 | Get incident analytics for a date range |
| pd incident:list --json \| jq '[.incidents[] \| {id, created_at, status, urgency}]' | Extract incident data for custom reporting |

Configuration

Environment Variables

# Set API token
export PDTOKEN="your_api_token_here"

# Set default region (for EU accounts)
export PD_API_BASE="https://api.eu.pagerduty.com"

# Set default user email
export PD_USER_EMAIL="user@example.com"
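
Custom scripts can read the same variables. Below is a minimal sketch, assuming the three variable names shown above; `load_pagerduty_config` is a hypothetical helper, and whether the CLI tools themselves honor every one of these names is not confirmed here.

```python
import os


def load_pagerduty_config(env=None):
    """Collect PagerDuty settings from environment variables.

    `env` defaults to os.environ; passing a plain dict makes testing easy.
    """
    env = os.environ if env is None else env
    token = env.get("PDTOKEN")
    if not token:
        raise RuntimeError("PDTOKEN is not set")
    return {
        "token": token,
        # Fall back to the default endpoint when PD_API_BASE is unset;
        # strip a trailing slash so paths can be appended safely.
        "api_base": env.get("PD_API_BASE", "https://api.pagerduty.com").rstrip("/"),
        "user_email": env.get("PD_USER_EMAIL"),
    }
```

Failing early on a missing token gives a clearer error than a 401 response several calls later.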

API Token Generation

  1. Log in to the PagerDuty web interface
  2. Navigate to Configuration → API Access
  3. Click Create New API Key
  4. Choose User Token or Account Token
  5. Copy the token and store it securely

Integration Keys

# Integration keys are service-specific
# Find them at: Service → Integrations → Integration Key

# Use in agent:
pd-send -k "your_integration_key" -t trigger -d "Alert message"

# Use in Events API v2:
curl -X POST https://events.pagerduty.com/v2/enqueue \
  -H "Content-Type: application/json" \
  -d '{
    "routing_key": "your_integration_key",
    "event_action": "trigger",
    "payload": {
      "summary": "Server down",
      "severity": "critical",
      "source": "prod-server-01"
    }
  }'
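
The payload rules can be checked before an event is sent. The following is a sketch of a validator for Events API v2 bodies; `build_event` is a hypothetical helper, and the required fields it enforces are the ones named in this cheatsheet (`summary`, `severity`, `source`) plus `dedup_key` for follow-up actions.

```python
VALID_ACTIONS = {"trigger", "acknowledge", "resolve"}
VALID_SEVERITIES = {"critical", "error", "warning", "info"}


def build_event(routing_key, action, summary=None, source=None,
                severity=None, dedup_key=None):
    """Return an Events API v2 body, raising ValueError on missing fields."""
    if action not in VALID_ACTIONS:
        raise ValueError(f"invalid event_action: {action!r}")
    event = {"routing_key": routing_key, "event_action": action}
    if dedup_key:
        event["dedup_key"] = dedup_key
    if action == "trigger":
        if not (summary and source and severity):
            raise ValueError("trigger needs summary, source and severity")
        if severity not in VALID_SEVERITIES:
            raise ValueError(f"invalid severity: {severity!r}")
        event["payload"] = {"summary": summary, "source": source,
                            "severity": severity}
    elif not dedup_key:
        # acknowledge/resolve must point at an existing alert.
        raise ValueError(f"{action} needs a dedup_key")
    return event


event = build_event("your_integration_key", "trigger",
                    summary="Server down", source="prod-server-01",
                    severity="critical")
```

Serializing `event` with `json.dumps` yields the same body as the curl example above.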

PagerDuty Agent Configuration

# Agent config location: /etc/pdagent.conf

# View current configuration
cat /etc/pdagent.conf

# Common settings:
# - pid_file: /var/run/pdagent/pdagent.pid
# - log_dir: /var/log/pdagent
# - outqueue_dir: /var/lib/pdagent/outqueue

Service Configuration Example

{
  "service": {
    "name": "Production API",
    "description": "Main production API service",
    "escalation_policy": {
      "id": "ESCALATION_POLICY_ID",
      "type": "escalation_policy_reference"
    },
    "alert_creation": "create_alerts_and_incidents",
    "incident_urgency_rule": {
      "type": "constant",
      "urgency": "high"
    },
    "auto_resolve_timeout": 14400,
    "acknowledgement_timeout": 1800
  }
}
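
Both timeouts are expressed in seconds: 14400 is four hours and 1800 is thirty minutes. The following is a small sketch that builds the same body from `timedelta` values so the units stay readable; `service_payload` is a hypothetical helper, not part of any PagerDuty library.

```python
from datetime import timedelta


def service_payload(name, description, escalation_policy_id,
                    auto_resolve=timedelta(hours=4),
                    ack_timeout=timedelta(minutes=30)):
    """Build a service definition like the JSON example above."""
    return {
        "service": {
            "name": name,
            "description": description,
            "escalation_policy": {
                "id": escalation_policy_id,
                "type": "escalation_policy_reference",
            },
            "alert_creation": "create_alerts_and_incidents",
            "incident_urgency_rule": {"type": "constant", "urgency": "high"},
            # The API expects whole seconds for both timeouts.
            "auto_resolve_timeout": int(auto_resolve.total_seconds()),
            "acknowledgement_timeout": int(ack_timeout.total_seconds()),
        }
    }


payload = service_payload("Production API", "Main production API service",
                          "ESCALATION_POLICY_ID")
```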

Common Use Cases

Use Case 1: Trigger and Resolve Incident from Monitoring

# Trigger incident when issue detected
pd-send -k R0123456789ABCDEF0123456789ABCDEF \
  -t trigger \
  -d "Database connection pool exhausted" \
  -s critical \
  -i db_pool_incident_001

# Add context as incident develops
pd-send -k R0123456789ABCDEF0123456789ABCDEF \
  -t trigger \
  -d "Connection count: 500/500" \
  -i db_pool_incident_001

# Resolve when fixed
pd-send -k R0123456789ABCDEF0123456789ABCDEF \
  -t resolve \
  -i db_pool_incident_001
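
Deduplication only works when every event for the same problem carries the same key. One way to get that is to derive the key from stable attributes of the alert, as in the sketch below; the `host:check` scheme and the `incident_key` name are illustrative assumptions, not a PagerDuty convention.

```python
import hashlib


def incident_key(host, check):
    """Derive a stable, bounded-length key from the alert's identity."""
    raw = f"{host}:{check}".lower()
    # Hashing keeps the key short and free of awkward characters.
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()[:32]


key = incident_key("db01", "connection_pool")
```

The trigger, any follow-up, and the resolve event can then all pass the same value to `-i`.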

Use Case 2: Check Who's On-Call Before Deployment

# Get current on-call engineers
pd oncall:list --json | jq -r '.oncalls[] | "\(.escalation_policy.summary): \(.user.summary)"'

# Get on-call for specific escalation policy
pd oncall:list --escalation-policy-ids EP123456 --json | jq -r '.oncalls[].user.summary'

# Check schedule for next 7 days
# (-d '+7 days' is GNU date syntax; on macOS/BSD use: date -u -v+7d +%Y-%m-%dT%H:%M:%SZ)
pd schedule:show --id SCHEDULE_ID --since "$(date -u +%Y-%m-%dT%H:%M:%SZ)" --until "$(date -u -d '+7 days' +%Y-%m-%dT%H:%M:%SZ)"
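
Date arithmetic flags differ between GNU and BSD `date`, so a script that has to run on both Linux and macOS may prefer to compute the window elsewhere. The following is a minimal Python sketch; `utc_window` is a hypothetical helper that returns `--since`/`--until` values in the same ISO 8601 format.

```python
from datetime import datetime, timedelta, timezone

ISO_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def utc_window(days, now=None):
    """Return (since, until) strings spanning `days` from now.

    Positive `days` looks ahead, negative looks back.
    """
    now = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    other = now + timedelta(days=days)
    since, until = sorted((now, other))
    return since.strftime(ISO_FORMAT), until.strftime(ISO_FORMAT)


since, until = utc_window(7)
```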

Use Case 3: Bulk Incident Management During Outage

# Get all triggered incidents for a service
INCIDENTS=$(pd incident:list --service-ids SERVICE_ID --status triggered --json | jq -r '.incidents[].id')

# Acknowledge all incidents
echo "$INCIDENTS" | xargs -I {} pd incident:ack --id {}

# Add note to all incidents
echo "$INCIDENTS" | xargs -I {} pd incident:notes --id {} --note "Mass outage - investigating root cause"

# Resolve all incidents after fix
echo "$INCIDENTS" | xargs -I {} pd incident:resolve --id {}

Use Case 4: Create Incident with Conference Bridge

# Create high-priority incident with Zoom link
curl -X POST "https://api.pagerduty.com/incidents" \
  -H "Authorization: Token token=$PDTOKEN" \
  -H "Content-Type: application/json" \
  -H "Accept: application/vnd.pagerduty+json;version=2" \
  -H "From: oncall@example.com" \
  -d '{
    "incident": {
      "type": "incident",
      "title": "Production database outage",
      "service": {
        "id": "SERVICE_ID",
        "type": "service_reference"
      },
      "urgency": "high",
      "priority": {
        "id": "PRIORITY_P1_ID",
        "type": "priority_reference"
      },
      "body": {
        "type": "incident_body",
        "details": "Primary database cluster unresponsive"
      },
      "conference_bridge": {
        "conference_number": "+1-415-555-1212,,,,1234#",
        "conference_url": "https://zoom.us/j/1234567890"
      }
    }
  }'

Use Case 5: Generate Weekly Incident Report

# Get incidents from last week
# (-d '7 days ago' is GNU date syntax; on macOS/BSD use: date -u -v-7d +%Y-%m-%dT%H:%M:%SZ)
LAST_WEEK=$(date -u -d '7 days ago' +%Y-%m-%dT%H:%M:%SZ)
NOW=$(date -u +%Y-%m-%dT%H:%M:%SZ)

pd incident:list --since $LAST_WEEK --until $NOW --json | \
  jq -r '.incidents[] | [.created_at, .urgency, .status, .title] | @csv' > weekly_incidents.csv

# Count incidents by service
pd incident:list --since $LAST_WEEK --until $NOW --json | \
  jq -r '.incidents[] | .service.summary' | sort | uniq -c | sort -rn

# Calculate mean time to acknowledge (result in minutes)
# Only incidents that still carry an acknowledgement are counted: the API
# returns an empty .acknowledgements list for triggered and resolved incidents.
pd incident:list --since $LAST_WEEK --until $NOW --json | \
  jq '[.incidents[] | select((.acknowledgements | length) > 0) |
    ((.acknowledgements[0].at | fromdateiso8601) - (.created_at | fromdateiso8601))] |
    if length > 0 then add / length / 60 else null end'
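
The same calculation is easier to guard against missing data in Python. The following is a sketch, assuming the incident list has already been loaded from JSON and that each incident has a `created_at` timestamp and, where it has been acknowledged, an `acknowledgements[0].at` timestamp. Incidents without an acknowledgement are skipped, and `mean_time_to_ack_minutes` is a hypothetical helper name.

```python
from datetime import datetime


def parse_ts(ts):
    """Parse an ISO 8601 timestamp such as 2024-01-01T12:00:00Z."""
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def mean_time_to_ack_minutes(incidents):
    """Average minutes from creation to first acknowledgement, or None."""
    deltas = []
    for incident in incidents:
        acks = incident.get("acknowledgements") or []
        if not acks:
            continue  # never acknowledged, or the list was cleared
        delta = parse_ts(acks[0]["at"]) - parse_ts(incident["created_at"])
        deltas.append(delta.total_seconds())
    if not deltas:
        return None
    return sum(deltas) / len(deltas) / 60


sample = [
    {"created_at": "2024-01-01T12:00:00Z",
     "acknowledgements": [{"at": "2024-01-01T12:05:00Z"}]},
    {"created_at": "2024-01-02T08:00:00Z",
     "acknowledgements": [{"at": "2024-01-02T08:15:00Z"}]},
    {"created_at": "2024-01-03T09:00:00Z", "acknowledgements": []},
]
mtta = mean_time_to_ack_minutes(sample)
```

Returning `None` for an empty sample avoids the division by zero that a bare average would hit.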

Best Practices

  • Use incident keys for deduplication: Always provide consistent incident keys (-i flag) to prevent duplicate alerts for the same issue
  • Set appropriate urgencies: Use high urgency for critical production issues, low for non-urgent notifications to avoid alert fatigue
  • Leverage auto-resolution: Configure services with auto_resolve_timeout so incidents left open longer than the configured number of seconds are resolved automatically; have monitoring send a resolve event when it detects recovery
  • Implement escalation policies: Create multi-level escalation policies to ensure incidents reach someone who can respond
  • Add context to incidents: Include relevant details in incident descriptions, notes, and custom fields to speed up resolution
  • Use schedule overrides: Plan for vacations and schedule changes by creating overrides rather than modifying base schedules
  • Tag and categorize incidents: Use consistent tagging for incidents to enable better reporting and trend analysis
  • Test integrations regularly: Send test alerts to verify monitoring integrations are working correctly
  • Review incident analytics: Regularly analyze MTTA (Mean Time to Acknowledge) and MTTR (Mean Time to Resolve) metrics
  • Document runbooks: Link incidents to runbooks and documentation to help responders quickly resolve common issues
  • Use status pages: Keep stakeholders informed by connecting incidents to status pages for transparent communication

Troubleshooting

| Issue | Solution |
|---|---|
| Authentication fails with "Invalid token" | Verify token with pd rest:get /users/me. Generate new token at Configuration → API Access. Ensure token has correct permissions. |
| Agent not sending events | Check agent status: sudo systemctl status pdagent. View logs: sudo journalctl -u pdagent -f. Verify integration key is correct. Test connectivity: curl https://events.pagerduty.com/health |
| Incidents not triggering | Verify service is enabled: pd service:get --id SERVICE_ID. Check integration key matches. Ensure service has valid escalation policy assigned. |
| No notifications received | Check user contact methods: pd user:contact:list --user-id USER_ID. Verify notification rules: pd user:notification:list --user-id USER_ID. Test contact method in PagerDuty UI. |
| CLI returns "Service Unavailable" | Check PagerDuty status at status.pagerduty.com. Verify API endpoint (use https://api.eu.pagerduty.com for EU accounts). Check network connectivity and firewall rules. |
| Duplicate incidents created | Use consistent incident keys with -i flag. Configure alert grouping in service settings. Set appropriate deduplication time windows. |
| Schedule shows wrong on-call person | Verify timezone settings in schedule configuration. Check for active overrides: pd schedule:show --id SCHEDULE_ID. Ensure schedule layers are configured correctly. |
| API rate limit exceeded | Implement exponential backoff in scripts. Use bulk operations where possible. Cache frequently accessed data. Check rate limit headers in API responses. |
| Events API v2 returns 400 error | Validate JSON payload structure. Ensure routing_key (not integration_key) is used. Check required fields: summary, severity, source. Verify event_action is valid (trigger/acknowledge/resolve). |
| Cannot resolve incident | Check if incident is already resolved. Verify user has permissions to resolve. Ensure incident ID is correct. Try via web UI to rule out API issues. |
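
The exponential backoff suggested for rate limiting can be sketched as follows. This is a generic retry loop, not PagerDuty-specific code: `call_with_backoff`, its parameters, and the delay values are illustrative assumptions, and the caller supplies both the request function and the check for a rate-limited result (for example, HTTP status 429).

```python
import random
import time


def call_with_backoff(func, is_rate_limited, max_attempts=5,
                      base_delay=1.0, max_delay=30.0, sleep=time.sleep):
    """Call func() until it is no longer rate limited, doubling the wait."""
    for attempt in range(max_attempts):
        result = func()
        if not is_rate_limited(result):
            return result
        if attempt < max_attempts - 1:
            # 1s, 2s, 4s, ... capped at max_delay, plus up to 10% jitter
            # so parallel scripts do not retry in lockstep.
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(delay + random.uniform(0, delay * 0.1))
    raise RuntimeError(f"still rate limited after {max_attempts} attempts")


# Simulated responses: two rate-limited replies, then success.
responses = iter([429, 429, 200])
waits = []
status = call_with_backoff(lambda: next(responses),
                           lambda code: code == 429,
                           sleep=waits.append)
```

Injecting `sleep` keeps the example fast to run; real scripts can leave the default `time.sleep` in place.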

Quick Reference: Event Severity Levels

| Severity | Use Case |
|---|---|
| critical | Service outage, data loss, security breach |
| error | Service degradation, failed jobs, errors affecting users |
| warning | Potential issues, threshold breaches, degraded performance |
| info | Informational events, successful deployments, routine notifications |

Quick Reference: Incident Priorities

| Priority | Response Time | Use Case |
|---|---|---|
| P1 | Immediate | Complete service outage, critical security incident |
| P2 | < 30 minutes | Major feature broken, significant performance degradation |
| P3 | < 2 hours | Minor feature issues, isolated customer impact |
| P4 | < 8 hours | Small bugs, cosmetic issues |
| P5 | Next business day | Enhancement requests, documentation updates |
P5 Next business day Enhancement requests, documentation updates