OpsGenie Cheat Sheet

Overview

OpsGenie is an incident alerting and on-call management platform, acquired by Atlassian, that centralizes alerts from monitoring tools, ticketing systems, and custom integrations. It routes each alert to the on-call responders through SMS, phone calls, push notifications, and email, so that a critical incident reaches someone who can act on it.

OpsGenie provides flexible on-call scheduling, automatic escalation policies, and team-based routing rules that adapt to complex organizational structures. It integrates natively with over 200 tools including Jira, PagerDuty, Datadog, Prometheus, and Slack, making it a central hub for incident response workflows across the entire DevOps toolchain.

Installation

CLI Installation

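# Install OpsGenie CLI (lamp) via npm
# Note: the official lamp CLI is a Go binary; verify that this npm package is the tool you expect before installing
npm install -g opsgenie-cli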

# Install via Homebrew (macOS)
brew install opsgenie-lamp

# Install lamp CLI from binary
curl -L https://github.com/opsgenie/opsgenie-lamp/releases/latest/download/lamp-linux-amd64 -o /usr/local/bin/lamp
chmod +x /usr/local/bin/lamp

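# Configure API key: set apiKey in lamp's lamp.conf file, or pass --apiKey on each command
# (run `lamp --help` to confirm the options your version supports)
lamp listAlerts --apiKey "your-api-key-here"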

Docker Integration Agent

# Run OpsGenie Marid integration agent (legacy; superseded by Edge Connector below)
docker run -d \
  --name opsgenie-marid \
  -e OPSGENIE_API_KEY=your-api-key \
  -v /opt/opsgenie/marid/conf:/opt/opsgenie/marid/conf \
  opsgenie/marid:latest

# Run OpsGenie Edge Connector
docker run -d \
  --name opsgenie-edge \
  -e OEC_API_KEY=your-api-key \
  -v /opt/oec/conf:/opt/oec/conf \
  atlassian/opsgenie-oec:latest

Terraform Provider

# Install Terraform OpsGenie provider
terraform {
  required_providers {
    opsgenie = {
      source  = "opsgenie/opsgenie"
      version = "~> 0.6"
    }
  }
}

provider "opsgenie" {
  api_key = var.opsgenie_api_key
  api_url = "api.opsgenie.com"  # or api.eu.opsgenie.com for EU
}

Core Commands — Alert Management

Creating and Managing Alerts

# Create a new alert
lamp createAlert --message "Database connection pool exhausted" \
  --priority P1 \
  --description "Connection pool on prod-db-01 reached 100%" \
  --tags "database,production,critical"

# Create alert with team routing
lamp createAlert --message "High CPU on web servers" \
  --teams "platform-team" \
  --priority P2 \
  --alias "high-cpu-web-cluster"

# Acknowledge an alert
lamp acknowledge --id "alert-id-here"

# Close an alert
lamp closeAlert --id "alert-id-here" \
  --note "Resolved by scaling up connection pool"

# Add note to alert
lamp addNote --id "alert-id-here" \
  --note "Investigating — checking connection pool metrics"

# Snooze an alert
lamp snooze --id "alert-id-here" \
  --endDate "2026-05-18T14:00:00Z"

Querying Alerts

# List open alerts
lamp listAlerts --query "status: open"

# List alerts by priority
lamp listAlerts --query "priority: P1 AND status: open"

# List alerts for a specific team
lamp listAlerts --query "teams: platform-team AND createdAt > 1716000000"

# Get alert details
lamp getAlert --id "alert-id-here"

# List up to 100 open alerts carrying a given tag
lamp listAlerts --query "tag: production AND status: open" --limit 100

Core Commands — REST API

Alert API

# Create alert via API
curl -X POST 'https://api.opsgenie.com/v2/alerts' \
  -H 'Authorization: GenieKey YOUR_API_KEY' \
  -H 'Content-Type: application/json' \
  -d '{
    "message": "Disk space critical on prod-app-01",
    "alias": "disk-space-prod-app-01",
    "description": "Root partition at 95% usage",
    "responders": [
      {"type": "team", "name": "infrastructure-team"}
    ],
    "priority": "P1",
    "tags": ["disk", "production", "infrastructure"]
  }'

# Get alert by ID
curl -X GET 'https://api.opsgenie.com/v2/alerts/ALERT_ID' \
  -H 'Authorization: GenieKey YOUR_API_KEY'

# Close alert via API
curl -X POST 'https://api.opsgenie.com/v2/alerts/ALERT_ID/close' \
  -H 'Authorization: GenieKey YOUR_API_KEY' \
  -H 'Content-Type: application/json' \
  -d '{"note": "Issue resolved after disk cleanup"}'

# Escalate alert to next responder
curl -X POST 'https://api.opsgenie.com/v2/alerts/ALERT_ID/escalate' \
  -H 'Authorization: GenieKey YOUR_API_KEY' \
  -H 'Content-Type: application/json' \
  -d '{"escalation": {"name": "critical-escalation"}}'
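The `alias` field in the create request above is what OpsGenie uses to deduplicate alerts, so it helps to derive it the same way every time a check fires. The following is a minimal sketch, assuming a hypothetical helper named `make_alias` that is not part of any OpsGenie tool; it normalizes a check name and a host into one stable string.

```shell
# Hypothetical helper (not an OpsGenie command): build a stable alias
# from a check name and a host so repeated firings map to one alert.
make_alias() {
  # $1 = check name, $2 = host
  # Lowercase, collapse runs of non-alphanumerics to "-", trim edge dashes.
  printf '%s-%s' "$1" "$2" \
    | tr '[:upper:]' '[:lower:]' \
    | sed -e 's/[^a-z0-9][^a-z0-9]*/-/g' -e 's/^-//' -e 's/-$//'
}

make_alias "Disk Space" "prod-app-01"
```

The result can be substituted into the `alias` value of the JSON payload. Any normalization scheme works as long as the same condition always yields the same alias.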

On-Call and Schedule API

# Get current on-call participants
curl -X GET 'https://api.opsgenie.com/v2/schedules/SCHEDULE_NAME/on-calls?scheduleIdentifierType=name' \
  -H 'Authorization: GenieKey YOUR_API_KEY'

# List all schedules
curl -X GET 'https://api.opsgenie.com/v2/schedules' \
  -H 'Authorization: GenieKey YOUR_API_KEY'

# Create an on-call override
curl -X POST 'https://api.opsgenie.com/v2/schedules/SCHEDULE_NAME/overrides?scheduleIdentifierType=name' \
  -H 'Authorization: GenieKey YOUR_API_KEY' \
  -H 'Content-Type: application/json' \
  -d '{
    "user": {"type": "user", "username": "jane@example.com"},
    "startDate": "2026-05-18T09:00:00Z",
    "endDate": "2026-05-19T09:00:00Z"
  }'

# Get who is on call for a specific time
curl -X GET 'https://api.opsgenie.com/v2/schedules/SCHEDULE_NAME/on-calls?scheduleIdentifierType=name&date=2026-05-20T12:00:00Z' \
  -H 'Authorization: GenieKey YOUR_API_KEY'
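The override and on-call requests above take ISO 8601 timestamps with a `Z` suffix, so generating them in UTC avoids time zone mismatches. Here is a small sketch under stated assumptions: `utc_timestamp` and `override_body` are hypothetical helper names, and the body builder does no JSON escaping, so it only suits simple usernames.

```shell
# Hypothetical helpers for building an override request body in UTC.

utc_timestamp() {
  # Current time in UTC, formatted like 2026-05-18T09:00:00Z
  date -u +%Y-%m-%dT%H:%M:%SZ
}

override_body() {
  # $1 = username, $2 = start (UTC ISO 8601), $3 = end (UTC ISO 8601)
  # No JSON escaping is performed; inputs are assumed to be plain strings.
  printf '{"user": {"type": "user", "username": "%s"}, "startDate": "%s", "endDate": "%s"}' \
    "$1" "$2" "$3"
}

override_body "jane@example.com" "$(utc_timestamp)" "2026-05-19T09:00:00Z"
```

The printed JSON can be passed to `curl -d` in the override request.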

Configuration

Integration Configuration (Prometheus Example)

# OpsGenie Alertmanager integration
receivers:
  - name: 'opsgenie-critical'
    opsgenie_configs:
      - api_key: 'your-opsgenie-api-key'
        api_url: 'https://api.opsgenie.com'
        message: '{{ .GroupLabels.alertname }}'
        description: '{{ range .Alerts }}{{ .Annotations.description }}{{ end }}'
        priority: '{{ if eq .GroupLabels.severity "critical" }}P1{{ else }}P3{{ end }}'
        tags: 'prometheus,{{ .GroupLabels.severity }}'
        responders:
          - type: team
            name: sre-team

Terraform Team and Routing Configuration

# Create a team
resource "opsgenie_team" "platform" {
  name        = "Platform Engineering"
  description = "Platform engineering team"

  member {
    id   = opsgenie_user.admin.id
    role = "admin"
  }

  member {
    id   = opsgenie_user.engineer.id
    role = "user"
  }
}

# Create escalation policy
resource "opsgenie_escalation" "critical" {
  name = "critical-escalation"

  rules {
    condition   = "if-not-acked"
    notify_type = "default"
    delay       = 5

    recipient {
      type = "team"
      id   = opsgenie_team.platform.id
    }
  }

  rules {
    condition   = "if-not-acked"
    notify_type = "default"
    delay       = 15

    recipient {
      type = "user"
      id   = opsgenie_user.manager.id
    }
  }
}

# Create routing rule
resource "opsgenie_team_routing_rule" "high_priority" {
  name    = "high-priority-routing"
  team_id = opsgenie_team.platform.id
  order   = 0

  criteria {
    type = "match-all-conditions"
    conditions {
      field          = "priority"
      operation      = "equals"
      expected_value = "P1"
    }
  }

  notify {
    type = "escalation"
    name = opsgenie_escalation.critical.name
    id   = opsgenie_escalation.critical.id
  }
}

Advanced Usage

Heartbeat Monitoring

# Create a heartbeat (detect silent failures)
# The interval must exceed the gap between pings; a nightly job needs more than 24 hours
curl -X POST 'https://api.opsgenie.com/v2/heartbeats' \
  -H 'Authorization: GenieKey YOUR_API_KEY' \
  -H 'Content-Type: application/json' \
  -d '{
    "name": "backup-job-heartbeat",
    "description": "Nightly backup job health check",
    "interval": 25,
    "intervalUnit": "hours",
    "enabled": true,
    "ownerTeam": {"name": "infrastructure-team"}
  }'

# Send heartbeat ping (call from your cron job)
curl -X GET 'https://api.opsgenie.com/v2/heartbeats/backup-job-heartbeat/ping' \
  -H 'Authorization: GenieKey YOUR_API_KEY'

# Disable heartbeat during maintenance
curl -X POST 'https://api.opsgenie.com/v2/heartbeats/backup-job-heartbeat/disable' \
  -H 'Authorization: GenieKey YOUR_API_KEY'
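A heartbeat only detects silent failures if the ping is skipped when the job fails. The sketch below wraps a job so the ping is sent only on a zero exit code. The function names and the `OPSGENIE_API_KEY` environment variable are assumptions for illustration; the ping URL is the one shown above.

```shell
# Sketch: ping the heartbeat only when the wrapped job succeeds.
# OPSGENIE_API_KEY is an assumed environment variable holding the API key.

ping_heartbeat() {
  # $1 = heartbeat name
  curl -fsS -X GET "https://api.opsgenie.com/v2/heartbeats/$1/ping" \
    -H "Authorization: GenieKey ${OPSGENIE_API_KEY}" >/dev/null
}

run_with_heartbeat() {
  # $1 = heartbeat name, remaining arguments = the job command
  hb_name=$1
  shift
  if "$@"; then
    ping_heartbeat "$hb_name"
  else
    hb_status=$?
    echo "job failed with exit code $hb_status; heartbeat not sent" >&2
    return "$hb_status"
  fi
}
```

A cron entry could then call something like `run_with_heartbeat backup-job-heartbeat /path/to/backup.sh`, where the script path is a placeholder. If the job fails or never runs, no ping arrives and the heartbeat expires.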

Alert Policy Automation

# Create an alert policy (auto-tagging)
curl -X POST 'https://api.opsgenie.com/v2/policies' \
  -H 'Authorization: GenieKey YOUR_API_KEY' \
  -H 'Content-Type: application/json' \
  -d '{
    "name": "Auto-tag database alerts",
    "type": "alert",
    "enabled": true,
    "filter": {
      "type": "match-all-conditions",
      "conditions": [
        {"field": "message", "operation": "contains", "expectedValue": "database"}
      ]
    },
    "tags": ["database", "auto-tagged"],
    "priority": "P2"
  }'

# Create a maintenance window
curl -X POST 'https://api.opsgenie.com/v1/maintenance' \
  -H 'Authorization: GenieKey YOUR_API_KEY' \
  -H 'Content-Type: application/json' \
  -d '{
    "description": "Scheduled database maintenance",
    "time": {
      "type": "schedule",
      "startDate": "2026-05-20T02:00:00Z",
      "endDate": "2026-05-20T06:00:00Z"
    },
    "rules": [
      {
        "entity": {"type": "integration", "id": "integration-id"},
        "state": "disabled"
      }
    ]
  }'

Incident Management

# Create an incident
curl -X POST 'https://api.opsgenie.com/v1/incidents/create' \
  -H 'Authorization: GenieKey YOUR_API_KEY' \
  -H 'Content-Type: application/json' \
  -d '{
    "message": "Major outage — payment processing down",
    "description": "All payment transactions failing since 14:30 UTC",
    "priority": "P1",
    "impactedServices": ["payment-service-id"],
    "responders": [
      {"type": "team", "id": "sre-team-id"}
    ],
    "tags": ["outage", "payments", "p1"],
    "notifyStakeholders": true,
    "statusPageEntry": {
      "title": "Payment Processing Disruption"
    }
  }'

# Resolve an incident
curl -X POST 'https://api.opsgenie.com/v1/incidents/INCIDENT_ID/resolve' \
  -H 'Authorization: GenieKey YOUR_API_KEY' \
  -H 'Content-Type: application/json' \
  -d '{"note": "Root cause identified and fix deployed"}'

Troubleshooting

| Issue | Cause | Solution |
|---|---|---|
| Alerts not received | API key invalid or expired | Regenerate key in Settings > API Key Management |
| Duplicate alerts firing | Missing alias deduplication | Set a unique alias field so repeated alerts are deduplicated |
| Escalation not triggering | Escalation policy misconfigured | Verify escalation rules and team membership |
| Integration not forwarding | Webhook URL changed or blocked | Check integration logs in OpsGenie dashboard |
| Heartbeat false alarms | Network latency to API | Increase heartbeat interval or add retry logic |
| On-call override not working | Time zone mismatch | Always use UTC in API calls; check user timezone settings |
| Alert actions failing | OEC agent not running | Restart Edge Connector: docker restart opsgenie-edge |
| Rate limiting errors (429) | Too many API calls | Implement exponential backoff; limits depend on plan, user count, and API domain |

# Debug integration connectivity
curl -v 'https://api.opsgenie.com/v2/alerts/count?query=status%3Aopen' \
  -H 'Authorization: GenieKey YOUR_API_KEY'

# Verify Edge Connector agent health
docker logs opsgenie-edge --tail 50

# Test webhook delivery
curl -X POST 'https://api.opsgenie.com/v2/alerts' \
  -H 'Authorization: GenieKey YOUR_API_KEY' \
  -H 'Content-Type: application/json' \
  -d '{"message": "Test alert — please ignore", "priority": "P5", "tags": ["test"]}'
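
The troubleshooting table recommends exponential backoff for 429 responses. A minimal sketch of that idea follows, assuming hypothetical function names, a 1-second base delay, a 30-second cap, and an `OPSGENIE_API_KEY` environment variable. It adds no jitter, which a production client would likely want.

```shell
# Sketch: retry a GET request on HTTP 429 with capped exponential backoff.

backoff_delay() {
  # $1 = attempt number (1-based), $2 = base seconds, $3 = cap seconds
  delay=$2
  i=1
  while [ "$i" -lt "$1" ]; do
    delay=$((delay * 2))
    i=$((i + 1))
  done
  if [ "$delay" -gt "$3" ]; then
    delay=$3
  fi
  echo "$delay"
}

get_with_backoff() {
  # $1 = URL, $2 = maximum number of attempts
  # Prints the final HTTP status code; returns 1 if every attempt got 429.
  attempt=1
  while [ "$attempt" -le "$2" ]; do
    code=$(curl -s -o /dev/null -w '%{http_code}' "$1" \
      -H "Authorization: GenieKey ${OPSGENIE_API_KEY}")
    if [ "$code" != "429" ]; then
      echo "$code"
      return 0
    fi
    sleep "$(backoff_delay "$attempt" 1 30)"
    attempt=$((attempt + 1))
  done
  echo "429"
  return 1
}
```

With these defaults the waits between attempts are 1, 2, 4, 8, 16, and then 30 seconds, for example when calling `get_with_backoff 'https://api.opsgenie.com/v2/alerts/count' 5`.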