PagerDuty Cheatsheet

Installation

CLI Tools Installation

Platform	Method	Command
Ubuntu/Debian	Python CLI	`sudo apt-get install python3 python3-pip && pip3 install pdpyras pd-cli`
Ubuntu/Debian	Node.js CLI	`curl -fsSL https://deb.nodesource.com/setup_lts.x | sudo -E bash - && sudo apt-get install -y nodejs && npm install -g pagerduty-cli`
macOS	Homebrew (Python)	`brew install python3 && pip3 install pdpyras pd-cli`
macOS	Homebrew (Node)	`brew install node && npm install -g pagerduty-cli`
Windows	Python	`pip install pdpyras pd-cli`
Windows	Chocolatey	`choco install python && pip install pdpyras pd-cli`
Any Platform	Docker	`docker pull pagerduty/pdagent`

PagerDuty Agent Installation

Platform	Command
Ubuntu/Debian	`curl -s https://packages.pagerduty.com/GPG-KEY-pagerduty | sudo apt-key add - && echo "deb https://packages.pagerduty.com/pdagent deb/" | sudo tee /etc/apt/sources.list.d/pdagent.list && sudo apt-get update && sudo apt-get install pdagent`
RHEL/CentOS	`sudo rpm --import https://packages.pagerduty.com/GPG-KEY-pagerduty && sudo yum install pdagent`
Start Agent	`sudo systemctl start pdagent && sudo systemctl enable pdagent`

Basic Commands

Authentication & Setup

Command	Description
`pd login`	Authenticate and configure API token interactively
`export PDTOKEN=your_api_token`	Set API token via environment variable
`pd rest:get /users/me`	Test authentication and get current user info
`pd user:set user@example.com`	Set default user for operations

Incident Management

Command	Description
`pd incident:list`	List all incidents
`pd incident:list --status triggered`	List only triggered (active) incidents
`pd incident:list --status acknowledged`	List acknowledged incidents
`pd incident:get --id INCIDENT_ID`	Get detailed information about specific incident
`pd incident:ack --id INCIDENT_ID`	Acknowledge an incident
`pd incident:resolve --id INCIDENT_ID`	Resolve an incident
`pd incident:notes --id INCIDENT_ID --note "Message"`	Add note to incident
`pd incident:reassign --id INCIDENT_ID --user user@example.com`	Reassign incident to different user
`pd incident:priority --id INCIDENT_ID --priority P1`	Set incident priority (P1-P5)
`pd incident:snooze --id INCIDENT_ID --duration 3600`	Snooze incident for specified seconds

Service Management

Command	Description
`pd service:list`	List all services
`pd service:get --id SERVICE_ID`	Get service details
`pd service:disable --id SERVICE_ID`	Disable a service
`pd service:enable --id SERVICE_ID`	Enable a service
`pd service:integration:list --service-id SERVICE_ID`	List integrations for a service

User & On-Call Management

Command	Description
`pd user:list`	List all users in account
`pd user:get --id USER_ID`	Get user details
`pd oncall:list`	List current on-call users
`pd user:contact:list --user-id USER_ID`	List user's contact methods
`pd user:notification:list --user-id USER_ID`	List user's notification rules

PagerDuty Agent Commands

Command	Description
`pd-send -k KEY -t trigger -d "Description"`	Trigger new incident via agent
`pd-send -k KEY -t acknowledge -i incident_key`	Acknowledge incident via agent
`pd-send -k KEY -t resolve -i incident_key`	Resolve incident via agent
`sudo systemctl status pdagent`	Check agent service status
`sudo journalctl -u pdagent -f`	View agent logs in real-time

Advanced Usage

Advanced Incident Operations

Command	Description
`pd incident:create --title "Issue" --service-id SID --urgency high --priority P1`	Create incident with full details
`pd incident:merge --source-ids ID1,ID2 --target-id MAIN_ID`	Merge multiple incidents into one
`pd incident:list --since 2024-01-01T00:00:00Z --until 2024-01-31T23:59:59Z`	List incidents within date range
`pd incident:list --service-ids SID1,SID2 --urgencies high`	Filter incidents by service and urgency
`pd incident:list --json | jq -r '.incidents[].id'`	Extract incident IDs using jq
`pd incident:list --status triggered --json | jq -r '.incidents[].id' | xargs -I {} pd incident:ack --id {}`	Bulk acknowledge all triggered incidents

REST API Operations (curl)

Command	Description
`curl -X GET "https://api.pagerduty.com/incidents" -H "Authorization: Token token=$PDTOKEN" -H "Accept: application/vnd.pagerduty+json;version=2"`	List incidents via REST API
`curl -X POST "https://api.pagerduty.com/incidents" -H "Authorization: Token token=$PDTOKEN" -H "Content-Type: application/json" -H "From: user@example.com" -d '{"incident":{...}}'`	Create incident via REST API
`curl -X GET "https://api.pagerduty.com/oncalls" -H "Authorization: Token token=$PDTOKEN"`	Get on-call schedule via API
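`curl -X PUT "https://api.pagerduty.com/incidents/$ID" -H "Authorization: Token token=$PDTOKEN" -H "Accept: application/vnd.pagerduty+json;version=2" -H "Content-Type: application/json" -H "From: user@example.com" -d '{"incident":{"type":"incident_reference","status":"resolved"}}'`	Update incident status via API (the `From` header must be the email of a valid account user)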
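
The curl calls in this table repeat the same headers each time. A minimal sketch of a wrapper that sets them once, assuming `PDTOKEN` holds the API token; the function name `pd_api` and the `PD_FROM_EMAIL` variable are hypothetical, and reusing `PD_API_BASE` as the base URL for curl is an assumption:

```shell
# Hypothetical wrapper around curl for PagerDuty REST API v2 calls.
#   $1 HTTP method, $2 path (for example /incidents), $3 JSON body (optional)
# Assumes PDTOKEN is set. PD_FROM_EMAIL is an illustrative variable for the
# From header used by write requests.
pd_api() (
  method=$1
  path=$2
  body=${3:-}
  base=${PD_API_BASE:-https://api.pagerduty.com}

  # Collect the curl arguments in the positional parameters.
  set -- -sS -X "$method" "$base$path" \
    -H "Authorization: Token token=$PDTOKEN" \
    -H "Accept: application/vnd.pagerduty+json;version=2" \
    -H "Content-Type: application/json"

  # Add the From header only when an email was configured.
  if [ -n "${PD_FROM_EMAIL:-}" ]; then
    set -- "$@" -H "From: $PD_FROM_EMAIL"
  fi
  # Attach a request body only when one was supplied.
  if [ -n "$body" ]; then
    set -- "$@" -d "$body"
  fi

  curl "$@"
)
```

With this in place, `pd_api GET /oncalls` lists on-call entries, and `PD_FROM_EMAIL=user@example.com pd_api PUT "/incidents/$ID" '{"incident":{"type":"incident_reference","status":"resolved"}}'` resolves an incident.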

Advanced Agent Operations

Command	Description
`pd-send -k KEY -t trigger -d "High CPU" -s error -i key123`	Send alert with severity and incident key
`pd-send -k KEY -t trigger -d "Alert" -f severity=critical -f host=web01`	Send alert with custom fields
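`echo '{"routing_key":"KEY","event_action":"trigger","payload":{"summary":"Alert","severity":"error","source":"web01"}}' | curl -X POST https://events.pagerduty.com/v2/enqueue -H "Content-Type: application/json" -d @-`	Send Events API v2 alert (`summary`, `severity`, and `source` are required)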

Schedule Management

Command	Description
`pd schedule:list`	List all schedules
`pd schedule:show --id SCHEDULE_ID`	Show schedule details with on-call users
`pd schedule:override --id SCHEDULE_ID --user USER_ID --start START --end END`	Create schedule override

Escalation Policy Management

Command	Description
`pd escalation:list`	List all escalation policies
`pd escalation:get --id EP_ID`	Get escalation policy details

Analytics & Reporting

Command	Description
`pd analytics:incidents --since 2024-01-01 --until 2024-01-31`	Get incident analytics for date range
`pd incident:list --json | jq '[.incidents[] | {id, created_at, status, urgency}]'`	Extract incident data for custom reporting

Configuration

Environment Variables

# Set API token
export PDTOKEN="your_api_token_here"

# Set default region (for EU accounts)
export PD_API_BASE="https://api.eu.pagerduty.com"

# Set default user email
export PD_USER_EMAIL="user@example.com"
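
Scripts that rely on these variables tend to fail with unclear errors when the token is missing. A minimal guard, assuming the `PDTOKEN` variable name used above; the function name `require_pdtoken` is hypothetical:

```shell
# Hypothetical helper: fail early when PDTOKEN is unset or empty.
require_pdtoken() {
  if [ -z "${PDTOKEN:-}" ]; then
    echo "PDTOKEN is not set; export it before calling the API" >&2
    return 1
  fi
}
```

Calling `require_pdtoken || exit 1` at the top of a script stops it before any request is sent without credentials.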

API Token Generation

1. Log in to the PagerDuty web interface
2. Navigate to Configuration → API Access (Integrations → API Access Keys in the current web UI)
3. Click Create New API Key
4. Choose User Token or Account Token
5. Copy the token and store it securely

Integration Keys

# Integration keys are service-specific
# Find them at: Service → Integrations → Integration Key

# Use in agent:
pd-send -k "your_integration_key" -t trigger -d "Alert message"

# Use in Events API v2:
curl -X POST https://events.pagerduty.com/v2/enqueue \
  -H "Content-Type: application/json" \
  -d '{
    "routing_key": "your_integration_key",
    "event_action": "trigger",
    "payload": {
      "summary": "Server down",
      "severity": "critical",
      "source": "prod-server-01"
    }
  }'
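
Hand-written JSON breaks as soon as a summary contains a double quote. The sketch below builds the same trigger payload from shell arguments and rejects severities outside the four values the Events API v2 accepts. The function names `json_escape` and `pd_event_payload` are hypothetical, and the escaping is deliberately minimal:

```shell
# Escape backslashes and double quotes so a value can sit inside a JSON string.
# Sketch only: control characters such as newlines are not handled.
json_escape() {
  printf '%s' "$1" | sed 's/\\/\\\\/g; s/"/\\"/g'
}

# Print an Events API v2 trigger payload on stdout.
#   $1 routing key, $2 summary, $3 severity, $4 source, $5 dedup key (optional)
pd_event_payload() (
  case $3 in
    critical|error|warning|info) ;;
    *) echo "invalid severity: $3" >&2; exit 1 ;;
  esac

  # Include dedup_key only when one was given.
  dedup=
  if [ -n "${5:-}" ]; then
    dedup=$(printf '"dedup_key":"%s",' "$(json_escape "$5")")
  fi

  printf '{"routing_key":"%s","event_action":"trigger",%s"payload":{"summary":"%s","severity":"%s","source":"%s"}}\n' \
    "$(json_escape "$1")" "$dedup" "$(json_escape "$2")" "$3" "$(json_escape "$4")"
)
```

The output can be piped straight to the endpoint: `pd_event_payload "$KEY" "Server down" critical prod-server-01 | curl -X POST https://events.pagerduty.com/v2/enqueue -H "Content-Type: application/json" -d @-`.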

PagerDuty Agent Configuration

# Agent config location: /etc/pdagent.conf

# View current configuration
cat /etc/pdagent.conf

# Common settings:
# - pid_file: /var/run/pdagent/pdagent.pid
# - log_dir: /var/log/pdagent
# - outqueue_dir: /var/lib/pdagent/outqueue

Service Configuration Example

{
  "service": {
    "name": "Production API",
    "description": "Main production API service",
    "escalation_policy": {
      "id": "ESCALATION_POLICY_ID",
      "type": "escalation_policy_reference"
    },
    "alert_creation": "create_alerts_and_incidents",
    "incident_urgency_rule": {
      "type": "constant",
      "urgency": "high"
    },
    "auto_resolve_timeout": 14400,
    "acknowledgement_timeout": 1800
  }
}
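
A definition like this can be submitted to the REST API's `POST /services` endpoint. A minimal sketch, assuming the JSON is saved to a file and `PDTOKEN` is set; the function name `pd_create_service` is hypothetical:

```shell
# Hypothetical helper: create a service from a JSON file shaped like the
# example above. $1 is the path to the file. Assumes PDTOKEN is set.
pd_create_service() {
  if [ ! -r "$1" ]; then
    echo "cannot read service definition: $1" >&2
    return 1
  fi
  # --data-binary sends the file contents unchanged as the request body.
  curl -sS -X POST "https://api.pagerduty.com/services" \
    -H "Authorization: Token token=$PDTOKEN" \
    -H "Accept: application/vnd.pagerduty+json;version=2" \
    -H "Content-Type: application/json" \
    --data-binary @"$1"
}
```

For example, `pd_create_service service.json` sends the file, where `service.json` is an illustrative file name.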

Common Use Cases

Use Case 1: Trigger and Resolve Incident from Monitoring

# Trigger incident when issue detected
pd-send -k R0123456789ABCDEF0123456789ABCDEF \
  -t trigger \
  -d "Database connection pool exhausted" \
  -s critical \
  -i db_pool_incident_001

# Add context as incident develops
pd-send -k R0123456789ABCDEF0123456789ABCDEF \
  -t trigger \
  -d "Connection count: 500/500" \
  -i db_pool_incident_001

# Resolve when fixed
pd-send -k R0123456789ABCDEF0123456789ABCDEF \
  -t resolve \
  -i db_pool_incident_001
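
The trigger and resolve calls above can be tied to the result of a health check, so a cron job or monitoring hook sends the right event by itself. A minimal sketch using only the `pd-send` flags shown in this document; the function name `pd_check` and the `PD_SEND` override variable are hypothetical:

```shell
# Hypothetical wrapper: run a health check and send a trigger or resolve
# event through pd-send, reusing one incident key so repeats deduplicate.
#   $1 integration key, $2 incident key, remaining args: the check command
# PD_SEND (illustrative) overrides the sender command, which helps in tests.
pd_check() (
  key=$1
  incident_key=$2
  shift 2
  sender=${PD_SEND:-pd-send}

  if "$@"; then
    # Check passed: send a resolve event for this incident key.
    "$sender" -k "$key" -t resolve -i "$incident_key"
  else
    # Check failed: trigger (or add to) the incident for this key.
    "$sender" -k "$key" -t trigger -d "Check failed: $*" -i "$incident_key"
  fi
)
```

A call such as `pd_check "$KEY" db_pool_incident_001 pg_isready -h db01` would trigger when the check exits non-zero and resolve when it passes; the `pg_isready` check is only an example.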

Use Case 2: Check Who's On-Call Before Deployment

# Get current on-call engineers
pd oncall:list --json | jq -r '.oncalls[] | "\(.escalation_policy.summary): \(.user.summary)"'

# Get on-call for specific escalation policy
pd oncall:list --escalation-policy-ids EP123456 --json | jq -r '.oncalls[].user.summary'

# Check schedule for next 7 days (GNU date syntax; on macOS/BSD use `date -u -v+7d` instead of `-d '+7 days'`)
pd schedule:show --id SCHEDULE_ID --since "$(date -u +%Y-%m-%dT%H:%M:%SZ)" --until "$(date -u -d '+7 days' +%Y-%m-%dT%H:%M:%SZ)"

Use Case 3: Bulk Incident Management During Outage

# Get all triggered incidents for a service
INCIDENTS=$(pd incident:list --service-ids SERVICE_ID --status triggered --json | jq -r '.incidents[].id')

# Acknowledge all incidents
echo "$INCIDENTS" | xargs -I {} pd incident:ack --id {}

# Add note to all incidents
echo "$INCIDENTS" | xargs -I {} pd incident:notes --id {} --note "Mass outage - investigating root cause"

# Resolve all incidents after fix
echo "$INCIDENTS" | xargs -I {} pd incident:resolve --id {}
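
Bulk actions are easier to trust when they can be previewed first and when one failure does not go unnoticed. A minimal sketch of a loop that can replace the `xargs` calls above, built on the same `pd incident:*` subcommands; the function name `pd_each_incident` and the `PD_DRY_RUN` variable are hypothetical:

```shell
# Hypothetical helper: read incident IDs from stdin, one per line, and run a
# pd subcommand on each. Blank lines are skipped. With PD_DRY_RUN=1 the
# commands are printed instead of executed.
#   $1 pd subcommand (for example incident:ack), remaining args: extra flags
pd_each_incident() (
  subcommand=$1
  shift
  if [ "$#" -gt 0 ]; then extra=" $*"; else extra=; fi
  failures=0

  while IFS= read -r id; do
    [ -n "$id" ] || continue
    if [ "${PD_DRY_RUN:-0}" = 1 ]; then
      echo "pd $subcommand --id $id$extra"
    elif ! pd "$subcommand" --id "$id" "$@" </dev/null; then
      # Keep going, but remember that something failed.
      echo "failed: $id" >&2
      failures=$((failures + 1))
    fi
  done

  # Succeed only if every command succeeded.
  [ "$failures" -eq 0 ]
)
```

Running `echo "$INCIDENTS" | PD_DRY_RUN=1 pd_each_incident incident:ack` prints the commands for review; dropping `PD_DRY_RUN=1` executes them.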

Use Case 4: Create Incident with Conference Bridge

# Create high-priority incident with Zoom link
curl -X POST "https://api.pagerduty.com/incidents" \
  -H "Authorization: Token token=$PDTOKEN" \
  -H "Content-Type: application/json" \
  -H "Accept: application/vnd.pagerduty+json;version=2" \
  -H "From: oncall@example.com" \
  -d '{
    "incident": {
      "type": "incident",
      "title": "Production database outage",
      "service": {
        "id": "SERVICE_ID",
        "type": "service_reference"
      },
      "urgency": "high",
      "priority": {
        "id": "PRIORITY_P1_ID",
        "type": "priority_reference"
      },
      "body": {
        "type": "incident_body",
        "details": "Primary database cluster unresponsive"
      },
      "conference_bridge": {
        "conference_number": "+1 415-555-1212,,,,1234#",
        "conference_url": "https://zoom.us/j/1234567890"
      }
    }
  }'

Use Case 5: Generate Weekly Incident Report

# Get incidents from last week (GNU date syntax; on macOS/BSD use `date -u -v-7d` instead of `-d '7 days ago'`)
LAST_WEEK=$(date -u -d '7 days ago' +%Y-%m-%dT%H:%M:%SZ)
NOW=$(date -u +%Y-%m-%dT%H:%M:%SZ)

pd incident:list --since "$LAST_WEEK" --until "$NOW" --json | \
  jq -r '.incidents[] | [.created_at, .urgency, .status, .title] | @csv' > weekly_incidents.csv

# Count incidents by service
pd incident:list --since "$LAST_WEEK" --until "$NOW" --json | \
  jq -r '.incidents[] | .service.summary' | sort | uniq -c | sort -rn

# Calculate mean time to acknowledge, in minutes
# (an incident's acknowledgements list may be empty, for example once it is
# resolved; such incidents are skipped, and the result is null if none qualify)
pd incident:list --since "$LAST_WEEK" --until "$NOW" --json | \
  jq '[.incidents[]
       | select((.acknowledgements // []) | length > 0)
       | (.acknowledgements[0].at | fromdateiso8601) - (.created_at | fromdateiso8601)]
      | if length > 0 then add / length / 60 else null end'
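
The `date -d` calls in this document use GNU syntax, which the `date` shipped with macOS and the BSDs does not accept. A minimal sketch of a helper that tries the GNU form first and falls back to the BSD `-v` form; the function name `iso_days_ago` is hypothetical:

```shell
# Print the UTC timestamp N days in the past in ISO 8601 form.
# Tries GNU date syntax first, then the BSD/macOS form.
#   $1 number of days
iso_days_ago() {
  date -u -d "$1 days ago" +%Y-%m-%dT%H:%M:%SZ 2>/dev/null ||
    date -u -v-"$1"d +%Y-%m-%dT%H:%M:%SZ
}
```

With it, the report window becomes `LAST_WEEK=$(iso_days_ago 7)` and `NOW=$(iso_days_ago 0)`.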

Best Practices

Use incident keys for deduplication: Always provide consistent incident keys (-i flag) to prevent duplicate alerts for the same issue
Set appropriate urgencies: Use high urgency for critical production issues, low for non-urgent notifications to avoid alert fatigue
Leverage auto-resolution: Configure services with `auto_resolve_timeout` so incidents left open are resolved automatically after a set number of seconds, and have monitoring send `resolve` events when it detects recovery
Implement escalation policies: Create multi-level escalation policies to ensure incidents reach someone who can respond
Add context to incidents: Include relevant details in incident descriptions, notes, and custom fields to speed up resolution
Use schedule overrides: Plan for vacations and schedule changes by creating overrides rather than modifying base schedules
Tag and categorize incidents: Use consistent tagging for incidents to enable better reporting and trend analysis
Test integrations regularly: Send test alerts to verify monitoring integrations are working correctly
Review incident analytics: Regularly analyze MTTA (Mean Time to Acknowledge) and MTTR (Mean Time to Resolve) metrics
Document runbooks: Link incidents to runbooks and documentation to help responders quickly resolve common issues
Use status pages: Keep stakeholders informed by connecting incidents to status pages for transparent communication

Troubleshooting

Issue	Solution
Authentication fails with "Invalid token"	Verify token with `pd rest:get /users/me`. Generate new token at Configuration → API Access. Ensure token has correct permissions.
Agent not sending events	Check agent status: `sudo systemctl status pdagent`. View logs: `sudo journalctl -u pdagent -f`. Verify integration key is correct. Test connectivity: `curl https://events.pagerduty.com/health`
Incidents not triggering	Verify service is enabled: `pd service:get --id SERVICE_ID`. Check integration key matches. Ensure service has valid escalation policy assigned.
No notifications received	Check user contact methods: `pd user:contact:list --user-id USER_ID`. Verify notification rules: `pd user:notification:list --user-id USER_ID`. Test contact method in PagerDuty UI.
CLI returns "Service Unavailable"	Check PagerDuty status at status.pagerduty.com. Verify API endpoint (use `https://api.eu.pagerduty.com` for EU accounts). Check network connectivity and firewall rules.
Duplicate incidents created	Use consistent incident keys with `-i` flag. Configure alert grouping in service settings. Set appropriate deduplication time windows.
Schedule shows wrong on-call person	Verify timezone settings in schedule configuration. Check for active overrides: `pd schedule:show --id SCHEDULE_ID`. Ensure schedule layers are configured correctly.
API rate limit exceeded	Implement exponential backoff in scripts. Use bulk operations where possible. Cache frequently accessed data. Check rate limit headers in API responses.
Events API v2 returns 400 error	Validate JSON payload structure. Ensure `routing_key` (not integration_key) is used. Check required fields: `summary`, `severity`, `source`. Verify `event_action` is valid (trigger/acknowledge/resolve).
Cannot resolve incident	Check if incident is already resolved. Verify user has permissions to resolve. Ensure incident ID is correct. Try via web UI to rule out API issues.
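
The exponential backoff suggested for rate limit errors can be sketched as a small retry wrapper. This is an illustration rather than a tuned implementation; the function name `retry_with_backoff` is hypothetical, and the delay simply doubles after each failed attempt:

```shell
# Hypothetical helper: retry a command with exponentially growing delays.
#   $1 max attempts, $2 initial delay in seconds, remaining args: the command
retry_with_backoff() (
  max_attempts=$1
  delay=$2
  shift 2
  attempt=1

  while :; do
    if "$@"; then
      exit 0
    fi
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "giving up after $attempt attempts: $*" >&2
      exit 1
    fi
    sleep "$delay"
    # Double the wait before the next attempt.
    delay=$((delay * 2))
    attempt=$((attempt + 1))
  done
)
```

When wrapping curl, pass `-f` so HTTP errors such as 429 produce a non-zero exit status, for example `retry_with_backoff 5 1 curl -fsS "https://api.pagerduty.com/incidents" -H "Authorization: Token token=$PDTOKEN"`.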

Quick Reference: Event Severity Levels

Severity	Use Case
`critical`	Service outage, data loss, security breach
`error`	Service degradation, failed jobs, errors affecting users
`warning`	Potential issues, threshold breaches, degraded performance
`info`	Informational events, successful deployments, routine notifications

Quick Reference: Incident Priorities

Priority	Response Time	Use Case
`P1`	Immediate	Complete service outage, critical security incident
`P2`	< 30 minutes	Major feature broken, significant performance degradation
`P3`	< 2 hours	Minor feature issues, isolated customer impact
`P4`	< 8 hours	Small bugs, cosmetic issues
`P5`	Next business day	Enhancement requests, documentation updates