PagerDuty Cheatsheet

Installation

CLI Tools Installation

| Platform | Method | Command |
| --- | --- | --- |
| Ubuntu/Debian | Python CLI | `sudo apt-get install python3 python3-pip && pip3 install pdpyras pd-cli` |
| Ubuntu/Debian | Node.js CLI | `curl -fsSL https://deb.nodesource.com/setup_lts.x \| …` |
| macOS | Homebrew (Python) | `brew install python3 && pip3 install pdpyras pd-cli` |
| macOS | Homebrew (Node) | `brew install node && npm install -g pagerduty-cli` |
| Windows | Python | `pip install pdpyras pd-cli` |
| Windows | Chocolatey | `choco install python && pip install pdpyras pd-cli` |
| Any Platform | Docker | `docker pull pagerduty/pdagent` |

PagerDuty Agent Installation

| Platform | Command |
| --- | --- |
| Ubuntu/Debian | `curl -s https://packages.pagerduty.com/GPG-KEY-pagerduty \| …` |
| RHEL/CentOS | `sudo rpm --import https://packages.pagerduty.com/GPG-KEY-pagerduty && sudo yum install pdagent` |
| Start Agent | `sudo systemctl start pdagent && sudo systemctl enable pdagent` |

Basic Commands

Authentication & Setup

| Command | Description |
| --- | --- |
| `pd login` | Authenticate and configure API token interactively |
| `export PDTOKEN=your_api_token` | Set API token via environment variable |
| `pd rest:get /users/me` | Test authentication and get current user info |
| `pd user:set user@example.com` | Set default user for operations |

Incident Management

| Command | Description |
| --- | --- |
| `pd incident:list` | List all incidents |
| `pd incident:list --status triggered` | List only triggered (active) incidents |
| `pd incident:list --status acknowledged` | List acknowledged incidents |
| `pd incident:get --id INCIDENT_ID` | Get detailed information about a specific incident |
| `pd incident:ack --id INCIDENT_ID` | Acknowledge an incident |
| `pd incident:resolve --id INCIDENT_ID` | Resolve an incident |
| `pd incident:notes --id INCIDENT_ID --note "Message"` | Add a note to an incident |
| `pd incident:reassign --id INCIDENT_ID --user user@example.com` | Reassign an incident to a different user |
| `pd incident:priority --id INCIDENT_ID --priority P1` | Set incident priority (P1-P5) |
| `pd incident:snooze --id INCIDENT_ID --duration 3600` | Snooze an incident for the given number of seconds |

Service Management

| Command | Description |
| --- | --- |
| `pd service:list` | List all services |
| `pd service:get --id SERVICE_ID` | Get service details |
| `pd service:disable --id SERVICE_ID` | Disable a service |
| `pd service:enable --id SERVICE_ID` | Enable a service |
| `pd service:integration:list --service-id SERVICE_ID` | List integrations for a service |

User & On-Call Management

| Command | Description |
| --- | --- |
| `pd user:list` | List all users in the account |
| `pd user:get --id USER_ID` | Get user details |
| `pd oncall:list` | List current on-call users |
| `pd user:contact:list --user-id USER_ID` | List a user’s contact methods |
| `pd user:notification:list --user-id USER_ID` | List a user’s notification rules |

PagerDuty Agent Commands

| Command | Description |
| --- | --- |
| `pd-send -k KEY -t trigger -d "Description"` | Trigger a new incident via the agent |
| `pd-send -k KEY -t acknowledge -i incident_key` | Acknowledge an incident via the agent |
| `pd-send -k KEY -t resolve -i incident_key` | Resolve an incident via the agent |
| `sudo systemctl status pdagent` | Check agent service status |
| `sudo journalctl -u pdagent -f` | View agent logs in real time |

Advanced Usage

Advanced Incident Operations

| Command | Description |
| --- | --- |
| `pd incident:create --title "Issue" --service-id SID --urgency high --priority P1` | Create an incident with full details |
| `pd incident:merge --source-ids ID1,ID2 --target-id MAIN_ID` | Merge multiple incidents into one |
| `pd incident:list --since 2024-01-01T00:00:00Z --until 2024-01-31T23:59:59Z` | List incidents within a date range |
| `pd incident:list --service-ids SID1,SID2 --urgencies high` | Filter incidents by service and urgency |
| `pd incident:list --json \| jq -r '.incidents[].id'` | Extract incident IDs as plain text |
| `pd incident:list --status triggered --json \| jq -r '.incidents[].id' \| …` | Pipe the IDs of triggered incidents into a follow-up command |

REST API Operations (curl)

| Command | Description |
| --- | --- |
| `curl -X GET "https://api.pagerduty.com/incidents" -H "Authorization: Token token=$PDTOKEN" -H "Accept: application/vnd.pagerduty+json;version=2"` | List incidents via REST API |
| `curl -X POST "https://api.pagerduty.com/incidents" -H "Authorization: Token token=$PDTOKEN" -H "Content-Type: application/json" -H "From: user@example.com" -d '{"incident":{...}}'` | Create an incident via REST API |
| `curl -X GET "https://api.pagerduty.com/oncalls" -H "Authorization: Token token=$PDTOKEN"` | Get current on-call entries via API |
| `curl -X PUT "https://api.pagerduty.com/incidents/$ID" -H "Authorization: Token token=$PDTOKEN" -H "Content-Type: application/json" -H "From: user@example.com" -d '{"incident":{"type":"incident_reference","status":"resolved"}}'` | Update incident status via API (`From` must be the email of a valid account user) |
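
The header set repeated in the curl rows above can be wrapped once. The following is a minimal sketch, assuming bash: `pd_api` and the `PD_DRY_RUN` switch are hypothetical names introduced here, while `PDTOKEN`, `PD_API_BASE`, and `PD_USER_EMAIL` are the variables from the Configuration section.

```shell
# pd_api METHOD PATH [JSON_BODY]
# Hypothetical wrapper around curl for the PagerDuty REST API.
# PD_DRY_RUN (an assumption of this sketch): when set, the curl arguments
# are printed one per line instead of being sent.
pd_api() {
  local method=$1 path=$2 body=${3:-}
  local base=${PD_API_BASE:-https://api.pagerduty.com}
  if [ -z "${PDTOKEN:-}" ]; then
    echo "PDTOKEN is not set" >&2
    return 1
  fi
  local args=(
    -sS -X "$method" "${base}${path}"
    -H "Authorization: Token token=${PDTOKEN}"
    -H "Accept: application/vnd.pagerduty+json;version=2"
  )
  # Only send a body and Content-Type when a JSON payload was given.
  if [ -n "$body" ]; then
    args+=(-H "Content-Type: application/json" -d "$body")
  fi
  # The From header identifies the acting user on write requests.
  if [ -n "${PD_USER_EMAIL:-}" ]; then
    args+=(-H "From: ${PD_USER_EMAIL}")
  fi
  if [ -n "${PD_DRY_RUN:-}" ]; then
    printf '%s\n' "${args[@]}"
  else
    curl "${args[@]}"
  fi
}
```

For example, `PD_DRY_RUN=1 pd_api GET /incidents` shows what would be sent, and `pd_api GET /oncalls` performs the request.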

Advanced Agent Operations

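| Command | Description |
| --- | --- |
| `pd-send -k KEY -t trigger -d "High CPU" -s error -i key123` | Send alert with severity and incident key |
| `pd-send -k KEY -t trigger -d "Alert" -f severity=critical -f host=web01` | Send alert with custom fields |
| `echo '{"routing_key":"KEY","event_action":"trigger","payload":{"summary":"Alert","severity":"error","source":"web01"}}' \| curl -X POST https://events.pagerduty.com/v2/enqueue -H "Content-Type: application/json" -d @-` | Send an event directly to the Events API v2 (`summary`, `severity`, and `source` are required) |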

Schedule Management

| Command | Description |
| --- | --- |
| `pd schedule:list` | List all schedules |
| `pd schedule:show --id SCHEDULE_ID` | Show schedule details with on-call users |
| `pd schedule:override --id SCHEDULE_ID --user USER_ID --start START --end END` | Create a schedule override |

Escalation Policy Management

| Command | Description |
| --- | --- |
| `pd escalation:list` | List all escalation policies |
| `pd escalation:get --id EP_ID` | Get escalation policy details |

Analytics & Reporting

| Command | Description |
| --- | --- |
| `pd analytics:incidents --since 2024-01-01 --until 2024-01-31` | Get incident analytics for a date range |
| `pd incident:list --json \| jq '[.incidents[] \| …` | Post-process incident JSON with jq |

Configuration

Environment Variables

# Set API token
export PDTOKEN="your_api_token_here"

# Set default region (for EU accounts)
export PD_API_BASE="https://api.eu.pagerduty.com"

# Set default user email
export PD_USER_EMAIL="user@example.com"
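
A small guard can catch a missing token before any command runs, rather than surfacing later as an authentication error. This is only a sketch; `require_pd_token` is a hypothetical helper name, not part of the pd CLI.

```shell
# Hypothetical helper: fail early when PDTOKEN is empty or unset.
require_pd_token() {
  if [ -z "${PDTOKEN:-}" ]; then
    echo "PDTOKEN is not set; export it before calling the API" >&2
    return 1
  fi
}
```

Typical use at the top of a script: `require_pd_token && pd rest:get /users/me`.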

API Token Generation

  1. Log into PagerDuty web interface
  2. Navigate to Configuration → API Access
  3. Click Create New API Key
  4. Choose User Token or Account Token
  5. Copy token and save securely

Integration Keys

# Integration keys are service-specific
# Find them at: Service → Integrations → Integration Key

# Use in agent:
pd-send -k "your_integration_key" -t trigger -d "Alert message"

# Use in Events API v2:
curl -X POST https://events.pagerduty.com/v2/enqueue \
  -H "Content-Type: application/json" \
  -d '{
    "routing_key": "your_integration_key",
    "event_action": "trigger",
    "payload": {
      "summary": "Server down",
      "severity": "critical",
      "source": "prod-server-01"
    }
  }'
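
Hand-written payloads are prone to quoting mistakes and missing required fields. The sketch below, assuming bash, builds a trigger event and rejects invalid input before anything is sent. `json_escape` and `pd_trigger_json` are hypothetical helper names, and the escaping covers only backslashes and double quotes.

```shell
# Escape backslashes and double quotes for use inside a JSON string.
# Limitation of this sketch: newlines and control characters are not handled.
json_escape() {
  local s=$1
  s=${s//\\/\\\\}
  s=${s//\"/\\\"}
  printf '%s' "$s"
}

# pd_trigger_json ROUTING_KEY SUMMARY SEVERITY SOURCE [DEDUP_KEY]
# Prints a trigger event for the Events API v2, or fails on invalid input.
pd_trigger_json() {
  local key=$1 summary=$2 severity=$3 src=$4 dedup=""
  if [ -z "$key" ] || [ -z "$summary" ] || [ -z "$src" ]; then
    echo "routing key, summary, and source are required" >&2
    return 1
  fi
  case $severity in
    critical|error|warning|info) ;;
    *) echo "invalid severity: $severity" >&2; return 1 ;;
  esac
  # dedup_key is optional; it is emitted only when a fifth argument is given.
  if [ -n "${5:-}" ]; then
    dedup=$(printf '"dedup_key":"%s",' "$(json_escape "$5")")
  fi
  printf '{"routing_key":"%s","event_action":"trigger",%s"payload":{"summary":"%s","severity":"%s","source":"%s"}}\n' \
    "$(json_escape "$key")" "$dedup" "$(json_escape "$summary")" "$severity" "$(json_escape "$src")"
}
```

The output can be piped to the endpoint shown above: `pd_trigger_json "your_integration_key" "Server down" critical prod-server-01 | curl -X POST https://events.pagerduty.com/v2/enqueue -H "Content-Type: application/json" -d @-`.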

PagerDuty Agent Configuration

# Agent config location: /etc/pdagent.conf

# View current configuration
cat /etc/pdagent.conf

# Common settings:
# - pid_file: /var/run/pdagent/pdagent.pid
# - log_dir: /var/log/pdagent
# - outqueue_dir: /var/lib/pdagent/outqueue

Service Configuration Example

{
  "service": {
    "name": "Production API",
    "description": "Main production API service",
    "escalation_policy": {
      "id": "ESCALATION_POLICY_ID",
      "type": "escalation_policy_reference"
    },
    "alert_creation": "create_alerts_and_incidents",
    "incident_urgency_rule": {
      "type": "constant",
      "urgency": "high"
    },
    "auto_resolve_timeout": 14400,
    "acknowledgement_timeout": 1800
  }
}

Common Use Cases

Use Case 1: Trigger and Resolve Incident from Monitoring

# Trigger incident when issue detected
pd-send -k R0123456789ABCDEF0123456789ABCDEF \
  -t trigger \
  -d "Database connection pool exhausted" \
  -s critical \
  -i db_pool_incident_001

# Add context as incident develops
pd-send -k R0123456789ABCDEF0123456789ABCDEF \
  -t trigger \
  -d "Connection count: 500/500" \
  -i db_pool_incident_001

# Resolve when fixed
pd-send -k R0123456789ABCDEF0123456789ABCDEF \
  -t resolve \
  -i db_pool_incident_001
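
The trigger and resolve steps above can be tied to a health check so that one call handles both directions. This is a minimal sketch, assuming bash; `check_and_notify` and the `PD_SEND` override are hypothetical names, and it sends a resolve on every successful check, which assumes that resolving a key with no open incident is acceptable for your integration.

```shell
# check_and_notify INTEGRATION_KEY INCIDENT_KEY DESCRIPTION COMMAND [ARGS...]
# Runs COMMAND as a health check: triggers on failure, resolves on success.
# PD_SEND is a hypothetical override for this sketch (defaults to pd-send).
check_and_notify() {
  local key=$1 incident_key=$2 description=$3
  shift 3
  if "$@"; then
    "${PD_SEND:-pd-send}" -k "$key" -t resolve -i "$incident_key"
  else
    "${PD_SEND:-pd-send}" -k "$key" -t trigger -d "$description" -i "$incident_key"
  fi
}
```

For example, `check_and_notify "$KEY" db_pool_incident_001 "Database unreachable" pg_isready -h db01` triggers when the check command exits non-zero; the check command and host are placeholders.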

Use Case 2: Check Who’s On-Call Before Deployment

# Get current on-call engineers
pd oncall:list --json | jq -r '.oncalls[] | "\(.escalation_policy.summary): \(.user.summary)"'

# Get on-call for specific escalation policy
pd oncall:list --escalation-policy-ids EP123456 --json | jq -r '.oncalls[].user.summary'

# Check schedule for next 7 days (date -d is GNU syntax)
pd schedule:show --id SCHEDULE_ID --since "$(date -u +%Y-%m-%dT%H:%M:%SZ)" --until "$(date -u -d '+7 days' +%Y-%m-%dT%H:%M:%SZ)"

Use Case 3: Bulk Incident Management During Outage

# Get all triggered incidents for a service
INCIDENTS=$(pd incident:list --service-ids SERVICE_ID --status triggered --json | jq -r '.incidents[].id')

# Acknowledge all incidents
echo "$INCIDENTS" | xargs -I {} pd incident:ack --id {}

# Add note to all incidents
echo "$INCIDENTS" | xargs -I {} pd incident:notes --id {} --note "Mass outage - investigating root cause"

# Resolve all incidents after fix
echo "$INCIDENTS" | xargs -I {} pd incident:resolve --id {}
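
During an outage it helps to know which IDs failed instead of only seeing an overall exit status. The loop below is a sketch of an alternative to the `xargs` calls above, assuming bash; `bulk_incident` and the `PD_CMD` override are hypothetical names.

```shell
# bulk_incident ACTION   (incident IDs are read from stdin, one per line)
# ACTION is the part after "incident:", for example ack or resolve.
# PD_CMD is a hypothetical override for this sketch (defaults to pd).
bulk_incident() {
  local action=$1 id failed=0
  # The "|| [ -n "$id" ]" part keeps a final line without a trailing newline.
  while IFS= read -r id || [ -n "$id" ]; do
    [ -n "$id" ] || continue   # skip blank lines
    # </dev/null keeps the command from consuming the ID list on stdin.
    if ! "${PD_CMD:-pd}" "incident:${action}" --id "$id" </dev/null; then
      echo "failed: $id" >&2
      failed=$((failed + 1))
    fi
  done
  # Succeed only when every incident was processed.
  [ "$failed" -eq 0 ]
}
```

Usage: `echo "$INCIDENTS" | bulk_incident ack`.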

Use Case 4: Create Incident with Conference Bridge

# Create high-priority incident with Zoom link
curl -X POST "https://api.pagerduty.com/incidents" \
  -H "Authorization: Token token=$PDTOKEN" \
  -H "Content-Type: application/json" \
  -H "Accept: application/vnd.pagerduty+json;version=2" \
  -H "From: oncall@example.com" \
  -d '{
    "incident": {
      "type": "incident",
      "title": "Production database outage",
      "service": {
        "id": "SERVICE_ID",
        "type": "service_reference"
      },
      "urgency": "high",
      "priority": {
        "id": "PRIORITY_P1_ID",
        "type": "priority_reference"
      },
      "body": {
        "type": "incident_body",
        "details": "Primary database cluster unresponsive"
      },
      "conference_bridge": {
        "conference_number": "+1-415-555-1212,,,,1234#",
        "conference_url": "https://zoom.us/j/1234567890"
      }
    }
  }'

Use Case 5: Generate Weekly Incident Report

# Get incidents from last week
LAST_WEEK=$(date -u -d '7 days ago' +%Y-%m-%dT%H:%M:%SZ)
NOW=$(date -u +%Y-%m-%dT%H:%M:%SZ)

pd incident:list --since $LAST_WEEK --until $NOW --json | \
  jq -r '.incidents[] | [.created_at, .urgency, .status, .title] | @csv' > weekly_incidents.csv

# Count incidents by service
pd incident:list --since $LAST_WEEK --until $NOW --json | \
  jq -r '.incidents[] | .service.summary' | sort | uniq -c | sort -rn

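# Calculate mean time to acknowledge (MTTA) in minutes.
# The acknowledgements list is typically empty for triggered and resolved
# incidents, so only incidents that still carry an acknowledgement are counted;
# for full history use pd analytics:incidents or incident log entries.
pd incident:list --since "$LAST_WEEK" --until "$NOW" --json | \
  jq '[.incidents[]
       | select((.acknowledgements // []) | length > 0)
       | (.acknowledgements[0].at | fromdateiso8601) - (.created_at | fromdateiso8601)]
      | if length > 0 then add / length / 60 else null end'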
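
The `date -d` calls in these snippets are GNU syntax; the `date` shipped with macOS and the BSDs uses `-v` instead. The helper below is a sketch that tries both forms; `iso_days_ago` is a hypothetical name.

```shell
# iso_days_ago N
# Prints the UTC timestamp N days in the past in ISO 8601 form.
# Tries GNU date first, then falls back to the BSD/macOS -v syntax.
iso_days_ago() {
  local days=$1 fmt='+%Y-%m-%dT%H:%M:%SZ'
  date -u -d "${days} days ago" "$fmt" 2>/dev/null ||
    date -u -v-"${days}"d "$fmt"
}
```

With it, the report window becomes `LAST_WEEK=$(iso_days_ago 7)` and `NOW=$(iso_days_ago 0)`.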

Best Practices

  • Use incident keys for deduplication: Always provide consistent incident keys (-i flag) to prevent duplicate alerts for the same issue
  • Set appropriate urgencies: Use high urgency for critical production issues, low for non-urgent notifications to avoid alert fatigue
  • Leverage auto-resolution: Configure services with auto_resolve_timeout so incidents left open longer than the configured number of seconds are resolved automatically
  • Implement escalation policies: Create multi-level escalation policies to ensure incidents reach someone who can respond
  • Add context to incidents: Include relevant details in incident descriptions, notes, and custom fields to speed up resolution
  • Use schedule overrides: Plan for vacations and schedule changes by creating overrides rather than modifying base schedules
  • Tag and categorize incidents: Use consistent tagging for incidents to enable better reporting and trend analysis
  • Test integrations regularly: Send test alerts to verify monitoring integrations are working correctly
  • Review incident analytics: Regularly analyze MTTA (Mean Time to Acknowledge) and MTTR (Mean Time to Resolve) metrics
  • Document runbooks: Link incidents to runbooks and documentation to help responders quickly resolve common issues
  • Use status pages: Keep stakeholders informed by connecting incidents to status pages for transparent communication

Troubleshooting

| Issue | Solution |
| --- | --- |
| Authentication fails with “Invalid token” | Verify the token with `pd rest:get /users/me`. Generate a new token at Configuration → API Access. Ensure the token has the correct permissions. |
| Agent not sending events | Check agent status: `sudo systemctl status pdagent`. View logs: `sudo journalctl -u pdagent -f`. Verify the integration key is correct. Test connectivity: `curl https://events.pagerduty.com/health` |
| Incidents not triggering | Verify the service is enabled: `pd service:get --id SERVICE_ID`. Check that the integration key matches. Ensure the service has a valid escalation policy assigned. |
| No notifications received | Check user contact methods: `pd user:contact:list --user-id USER_ID`. Verify notification rules: `pd user:notification:list --user-id USER_ID`. Test the contact method in the PagerDuty UI. |
| CLI returns “Service Unavailable” | Check PagerDuty status at status.pagerduty.com. Verify the API endpoint (use `https://api.eu.pagerduty.com` for EU accounts). Check network connectivity and firewall rules. |
| Duplicate incidents created | Use consistent incident keys with the `-i` flag. Configure alert grouping in service settings. Set appropriate deduplication time windows. |
| Schedule shows wrong on-call person | Verify timezone settings in the schedule configuration. Check for active overrides: `pd schedule:show --id SCHEDULE_ID`. Ensure schedule layers are configured correctly. |
| API rate limit exceeded | Implement exponential backoff in scripts. Use bulk operations where possible. Cache frequently accessed data. Check rate limit headers in API responses. |
| Events API v2 returns 400 error | Validate the JSON payload structure. Ensure `routing_key` (not `integration_key`) is used. Check required fields: `summary`, `severity`, `source`. Verify `event_action` is valid (trigger/acknowledge/resolve). |
| Cannot resolve incident | Check whether the incident is already resolved. Verify the user has permission to resolve. Ensure the incident ID is correct. Try via the web UI to rule out API issues. |
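
The exponential backoff suggested for rate limits can be sketched as a generic retry wrapper. It assumes bash; `retry_with_backoff` is a hypothetical helper name. Note that it retries on any non-zero exit status, not only on rate-limit responses.

```shell
# retry_with_backoff MAX_ATTEMPTS INITIAL_DELAY_SECONDS COMMAND [ARGS...]
# Re-runs COMMAND until it succeeds, doubling the delay after each failure.
retry_with_backoff() {
  local max_attempts=$1 delay=$2 attempt=1
  shift 2
  until "$@"; do
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "giving up after $attempt attempts: $*" >&2
      return 1
    fi
    sleep "$delay"
    delay=$((delay * 2))
    attempt=$((attempt + 1))
  done
}
```

Example: `retry_with_backoff 5 1 curl -fsS "https://api.pagerduty.com/incidents" -H "Authorization: Token token=$PDTOKEN"`. The `-f` flag makes curl exit non-zero on HTTP error responses, which is what lets the wrapper notice them.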

Quick Reference: Event Severity Levels

| Severity | Use Case |
| --- | --- |
| critical | Service outage, data loss, security breach |
| error | Service degradation, failed jobs, errors affecting users |
| warning | Potential issues, threshold breaches, degraded performance |
| info | Informational events, successful deployments, routine notifications |

Quick Reference: Incident Priorities

| Priority | Response Time | Use Case |
| --- | --- | --- |
| P1 | Immediate | Complete service outage, critical security incident |
| P2 | < 30 minutes | Major feature broken, significant performance degradation |
| P3 | < 2 hours | Minor feature issues, isolated customer impact |
| P4 | < 8 hours | Small bugs, cosmetic issues |