Nomad

Comprehensive HashiCorp Nomad commands and workflows for workload orchestration, job scheduling, and cluster management.

Installation & Setup

Command	Description
`nomad version`	Show Nomad version
`nomad agent -dev`	Start development agent
`nomad agent -config=nomad.hcl`	Start with configuration
`nomad server members`	List server members
`nomad node status`	List client nodes
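A file passed to `nomad agent -config` can be small. The sketch below assumes a single machine that runs both the server and client roles. That suits local testing but offers no fault tolerance. The data directory path is illustrative.

```hcl
# nomad.hcl: single-node agent acting as both server and client (testing only).
datacenter = "dc1"
data_dir   = "/opt/nomad/data"
bind_addr  = "0.0.0.0"

server {
  enabled          = true
  bootstrap_expect = 1 # a lone server elects itself leader
}

client {
  enabled = true
}
```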

Job Management

Job Operations

Command	Description
`nomad job run example.nomad`	Submit job
`nomad job status`	List all jobs
`nomad job status example`	Show job details
`nomad job stop example`	Stop job
`nomad job stop -purge example`	Stop and purge job

Job Planning and Validation

Command	Description
`nomad job plan example.nomad`	Plan job changes
`nomad job validate example.nomad`	Validate job file
`nomad job inspect example`	Inspect job configuration
`nomad job history example`	Show job history

Job Scaling

Command	Description
`nomad job scale example 5`	Set the count of a single-group job to 5
`nomad job scale example group 3`	Set the count of task group `group` to 3
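A group can declare bounds for its count in a `scaling` block. As far as we know, Nomad rejects manual `nomad job scale` requests outside `min`/`max`, and the Nomad Autoscaler reads the same block. This sketch reuses the `example` job and `group` names from the table. The values are illustrative, and the Autoscaler `policy` body is omitted.

```hcl
job "example" {
  group "group" {
    count = 3

    # Allowed range for this group's count.
    scaling {
      min = 1
      max = 10
    }

    # ... network, service and task blocks omitted
  }
}
```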

Allocation Management

Allocation Operations

Command	Description
`nomad job allocs example`	List a job's allocations
`nomad alloc status ALLOC_ID`	Show allocation details
`nomad alloc logs ALLOC_ID`	Show allocation logs
`nomad alloc logs -f ALLOC_ID`	Follow allocation logs
`nomad alloc exec ALLOC_ID /bin/bash`	Execute command in allocation
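`nomad alloc logs` reads the stdout stream that Nomad captures for each task. Add `-stderr` to read stderr instead. The task's `logs` block controls how much history is retained. The following sketch uses a hypothetical task with illustrative values.

```hcl
task "server" {
  driver = "docker"

  # Keep at most 5 rotated files of 10 MB each, per stream (stdout and stderr).
  logs {
    max_files     = 5
    max_file_size = 10
  }

  # ... config and resources as usual
}
```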

Allocation Debugging

Command	Description
`nomad alloc fs ALLOC_ID`	List allocation files
`nomad alloc fs ALLOC_ID /path/to/file`	Read allocation file
`nomad alloc restart ALLOC_ID`	Restart allocation
`nomad alloc stop ALLOC_ID`	Stop allocation (Nomad reschedules a replacement)
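`nomad alloc restart` is a manual action. When a task fails by itself, the client restarts it according to the group's `restart` block. Once those attempts are used up, the allocation is marked failed and becomes eligible for rescheduling. The following sketch uses illustrative values.

```hcl
group "web" {
  # Up to 2 restarts within 30 minutes, 15s apart. With mode = "fail",
  # the allocation then fails instead of waiting for the interval to reset.
  restart {
    attempts = 2
    interval = "30m"
    delay    = "15s"
    mode     = "fail"
  }

  # ... network, service and task blocks as usual
}
```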

Node Management

Node Operations

Command	Description
`nomad node status`	List all nodes
`nomad node status NODE_ID`	Show node details
`nomad node drain -enable NODE_ID`	Drain node (default deadline: 1h)
`nomad node eligibility -disable NODE_ID`	Disable node scheduling
`nomad node eligibility -enable NODE_ID`	Enable node scheduling

Node Maintenance

Command	Description
`nomad node drain -enable -deadline 30m NODE_ID`	Drain with deadline
`nomad node drain -disable NODE_ID`	Cancel drain
`nomad node meta apply -node-id NODE_ID key=value`	Set node metadata
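When a node drains, a service job's `migrate` block controls how its allocations move to other nodes. The sketch below spells out each setting with illustrative values.

```hcl
group "web" {
  migrate {
    max_parallel     = 1        # allocations moved off the node at a time
    health_check     = "checks" # judge replacements by their service checks
    min_healthy_time = "10s"    # replacement must stay healthy this long
    healthy_deadline = "5m"     # marked unhealthy if not healthy by then
  }

  # ... network, service and task blocks as usual
}
```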

Namespace Management

Command	Description
`nomad namespace list`	List namespaces
`nomad namespace status default`	Show namespace details
`nomad namespace apply -description="Dev environment" dev`	Create namespace
`nomad namespace delete dev`	Delete namespace
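A job is placed in a namespace through its `namespace` attribute. Without that attribute, `-namespace` on the CLI or the `NOMAD_NAMESPACE` environment variable selects the namespace, and if none of these is set the job goes to `default`. The sketch below assumes the `dev` namespace created above.

```hcl
job "web" {
  # Submit into the "dev" namespace instead of "default".
  namespace = "dev"

  # ... groups and tasks as usual
}
```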

ACL Management

ACL Operations

Command	Description
`nomad acl bootstrap`	Bootstrap ACL system
`nomad acl token create -name="dev-token" -policy=dev-policy`	Create token
`nomad acl token list`	List tokens
`nomad acl token info TOKEN_ID`	Show token details

ACL Policies

Command	Description
`nomad acl policy apply dev-policy dev-policy.hcl`	Create/update policy
`nomad acl policy list`	List policies
`nomad acl policy info dev-policy`	Show policy details
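The `dev-policy.hcl` file passed to `nomad acl policy apply` might resemble the sketch below. It is a hypothetical least-privilege policy. It assumes the `dev` namespace exists and grants only job and log capabilities in that namespace.

```hcl
# dev-policy.hcl: hypothetical policy for developers working in "dev".
namespace "dev" {
  capabilities = ["list-jobs", "read-job", "submit-job", "read-logs"]
}

# Allow `nomad node status` without granting node write access.
node {
  policy = "read"
}
```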

Monitoring and Debugging

Cluster Operations

Command	Description
`nomad operator raft list-peers`	List Raft peers
`nomad operator snapshot save backup.snap`	Create snapshot
`nomad operator snapshot restore backup.snap`	Restore snapshot

Monitoring

Command	Description
`nomad monitor`	Stream agent logs
`nomad monitor -log-level=DEBUG`	Stream agent logs at debug level
`nomad status`	List jobs, or look up any object by ID prefix
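`nomad monitor` streams log lines, not metrics. For ongoing monitoring, agents can publish metrics through a `telemetry` block in the agent configuration. When Prometheus format is enabled, metrics are served at `/v1/metrics?format=prometheus`. This is a sketch; the interval is illustrative.

```hcl
telemetry {
  collection_interval        = "10s"
  disable_hostname           = true
  prometheus_metrics         = true
  publish_allocation_metrics = true # per-allocation CPU and memory usage
  publish_node_metrics       = true # per-client resource usage
}
```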

Job Specification Examples

Basic Web Service

job "web" {
  datacenters = ["dc1"]
  type = "service"

  group "web" {
    count = 3

    network {
      # Dynamic host port mapped to nginx's container port 80, so all
      # three instances can share a node without a port conflict.
      port "http" {
        to = 80
      }
    }

    service {
      name = "web"
      port = "http"

      check {
        type     = "http"
        path     = "/" # stock nginx serves no /health endpoint
        interval = "10s"
        timeout  = "2s"
      }
    }

    task "server" {
      driver = "docker"

      config {
        image = "nginx:latest"
        ports = ["http"]
      }

      resources {
        cpu    = 100
        memory = 128
      }
    }
  }
}

Batch Job

job "batch-job" {
  datacenters = ["dc1"]
  type = "batch"

  group "processing" {
    count = 1

    task "process" {
      driver = "docker"

      config {
        image = "alpine:latest"
        command = "sh"
        args = ["-c", "echo 'Processing data...' && sleep 30"]
      }

      resources {
        cpu    = 200
        memory = 256
      }
    }
  }
}

Periodic Job

job "backup" {
  datacenters = ["dc1"]
  type = "batch"

  periodic {
    cron             = "0 2 * * *"
    prohibit_overlap = true
  }

  group "backup" {
    task "backup-task" {
      driver = "docker"

      config {
        image = "backup-tool:latest"
        command = "/backup.sh"
      }

      resources {
        cpu    = 100
        memory = 256
      }
    }
  }
}

System Job

job "monitoring" {
  datacenters = ["dc1"]
  type = "system"

  group "monitoring" {
    task "node-exporter" {
      driver = "docker"

      config {
        image = "prom/node-exporter:latest"
        network_mode = "host"
        pid_mode = "host"
      }

      resources {
        cpu    = 50
        memory = 64
      }
    }
  }
}

Configuration Examples

Server Configuration

datacenter = "dc1"
data_dir = "/opt/nomad/data"
log_level = "INFO"
bind_addr = "0.0.0.0"

server {
  enabled = true
  bootstrap_expect = 3

  server_join {
    retry_join = ["10.0.1.10", "10.0.1.11", "10.0.1.12"]
  }
}

consul {
  address = "127.0.0.1:8500"
}

vault {
  enabled = true
  address = "https://vault.service.consul:8200"
}

acl {
  enabled = true
}

ui {
  enabled = true
}

Client Configuration

datacenter = "dc1"
data_dir = "/opt/nomad/data"
log_level = "INFO"
bind_addr = "0.0.0.0"

client {
  enabled = true

  server_join {
    retry_join = ["10.0.1.10", "10.0.1.11", "10.0.1.12"]
  }

  node_class = "compute"

  meta {
    "type" = "worker"
    "zone" = "us-east-1a"
  }
}

plugin "docker" {
  config {
    # Lets jobs run privileged containers; leave false unless a job needs it.
    allow_privileged = true

    # Lets jobs bind-mount host paths through the Docker driver.
    volumes {
      enabled = true
    }
  }
}

consul {
  address = "127.0.0.1:8500"
}

vault {
  enabled = true
  address = "https://vault.service.consul:8200"
}

Advanced Features

Constraints and Affinities

job "web" {
  constraint {
    attribute = "${attr.kernel.name}"
    value     = "linux"
  }

  affinity {
    attribute = "${node.class}"
    value     = "compute"
    weight    = 100
  }

  group "web" {
    constraint {
      attribute = "${meta.zone}"
      value     = "us-east-1a"
    }

    # ... rest of group configuration
  }
}

Volume Management

job "database" {
  group "db" {
    volume "data" {
      type      = "host"
      source    = "mysql_data"
      read_only = false
    }

    task "mysql" {
      driver = "docker"

      volume_mount {
        volume      = "data"
        destination = "/var/lib/mysql"
      }

      # The mysql image will not start without a root-password setting;
      # a randomly generated password is used here for illustration.
      env {
        MYSQL_RANDOM_ROOT_PASSWORD = "yes"
      }

      config {
        image = "mysql:8.0"
      }
    }
  }
}
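A `type = "host"` volume only resolves if a client registers a host volume with the matching name. The client-agent sketch below registers `mysql_data`. The host path is hypothetical, and the directory should already exist on the node.

```hcl
# Client agent configuration: expose a host directory as "mysql_data".
client {
  enabled = true

  host_volume "mysql_data" {
    path      = "/opt/mysql/data" # hypothetical path on the client node
    read_only = false
  }
}
```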

Service Discovery Integration

job "api" {
  group "api" {
    # Connect sidecars require bridge networking, and the "http" label must
    # be declared for the service to use it. Port 8080 is an assumption.
    network {
      mode = "bridge"

      port "http" {
        to = 8080
      }
    }

    service {
      name = "api"
      port = "http"

      tags = [
        "api",
        "v1.0",
        "traefik.enable=true",
        "traefik.http.routers.api.rule=Host(`api.example.com`)"
      ]

      check {
        type     = "http"
        path     = "/health"
        interval = "10s"
        timeout  = "2s"
      }

      connect {
        sidecar_service {
          proxy {
            upstreams {
              destination_name = "database"
              local_bind_port  = 5432
            }
          }
        }
      }
    }

    # ... task "api" definition omitted
  }
}

Best Practices

Job Design

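Resource Allocation: Set `cpu` (MHz) and `memory` (MB) in every task's `resources` block based on observed usage; tasks that exceed their memory allocation are OOM-killed
Health Checks: Define service checks that reflect real readiness, since deployments and drain migrations use them to judge health
Graceful Shutdown: Handle the task's `kill_signal` (SIGTERM by default for Docker tasks) and exit before `kill_timeout` expires (5s by default)
Logging: Write structured logs to stdout/stderr so `nomad alloc logs` and log shippers can read them
Configuration: Inject settings with `template` and `env` blocks instead of baking them into images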
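The shutdown and configuration practices above map to task attributes. In the sketch below, the image name and the `LOG_FORMAT` variable are hypothetical. The timeout is illustrative and should stay within the client's maximum kill timeout.

```hcl
task "app" {
  driver = "docker"

  # Signal sent when the task is stopped, and the grace period before
  # Nomad force-kills it.
  kill_signal  = "SIGTERM"
  kill_timeout = "20s"

  # Static settings exposed as environment variables.
  env {
    LOG_FORMAT = "json"
  }

  # Rendered into the task's local/ directory before the task starts.
  template {
    data        = <<EOH
alloc_id = "{{ env "NOMAD_ALLOC_ID" }}"
EOH
    destination = "local/app.conf"
  }

  config {
    image = "example/app:1.0" # hypothetical image
  }
}
```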

Cluster Management

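High Availability: Run three or five servers so Raft keeps quorum through one or two server failures
Backup Strategy: Take regular `nomad operator snapshot save` snapshots and store them off the servers
Monitoring: Track server health, Raft leadership, and job and allocation status
Capacity Planning: Size clients for the sum of task resource requests plus headroom for rescheduling during drains and failures
Security: Enable ACLs and mutual TLS for HTTP and RPC traffic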
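Mutual TLS is enabled with a `tls` block in every agent's configuration. The file paths below are hypothetical. Recent Nomad releases can generate a CA and certificates with `nomad tls ca create` and `nomad tls cert create`.

```hcl
tls {
  http = true # HTTPS for the API and UI
  rpc  = true # TLS between agents

  ca_file   = "/etc/nomad.d/tls/nomad-agent-ca.pem"
  cert_file = "/etc/nomad.d/tls/global-server-nomad.pem"
  key_file  = "/etc/nomad.d/tls/global-server-nomad-key.pem"

  # Reject peers whose certificate does not carry the expected server name.
  verify_server_hostname = true
}
```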

Operations

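Rolling Updates: Use the `update` block (`max_parallel`, health checks, `auto_revert`) for zero-downtime deployments
Canary Deployments: Set `canary` in the `update` block and promote with `nomad deployment promote` once the canaries are healthy
Resource Monitoring: Compare actual CPU and memory usage against requested resources
Log Aggregation: Ship allocation logs to a central store instead of relying on `nomad alloc logs`
Alerting: Alert on failed allocations, failed deployments, and loss of server quorum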
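Rolling updates and canaries are configured together in a group's `update` block. In the sketch below, a new version first runs as one canary next to the existing allocations. After promotion, the remaining allocations are replaced one at a time. The values are illustrative.

```hcl
group "web" {
  count = 3

  update {
    max_parallel      = 1     # replace one allocation at a time
    canary            = 1     # start one canary before touching the rest
    min_healthy_time  = "30s"
    healthy_deadline  = "5m"
    progress_deadline = "10m"
    auto_revert       = true  # roll back to the last stable version on failure
    auto_promote      = false # wait for `nomad deployment promote`
  }

  # ... network, service and task blocks as usual
}
```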

Security

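ACL Policies: Grant each token only the namespaces and capabilities it needs
Network Security: Use Consul service mesh (Connect) for mutual-TLS service-to-service traffic
Secrets Management: Fetch secrets from Vault at runtime with `template` blocks instead of storing them in job files
Image Security: Scan container images for vulnerabilities and pin image tags or digests
Audit Logging: Enable audit logging (a Nomad Enterprise feature) for compliance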
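A task can render a Vault secret at runtime with a `template` block. In the sketch below, the KV v2 path `secret/data/db` and its `password` field are hypothetical. The contents of the task's `vault` block depend on the Nomad version and on how the Vault integration is configured: it may need a `role` or, in older setups, `policies`.

```hcl
task "api" {
  driver = "docker"

  # Request Vault access for this task (block contents vary by version).
  vault {}

  # Render the secret into secrets/ and load it as environment variables.
  template {
    data        = <<EOH
DB_PASSWORD={{ with secret "secret/data/db" }}{{ .Data.data.password }}{{ end }}
EOH
    destination = "secrets/db.env"
    env         = true
  }

  config {
    image = "example/api:1.0" # hypothetical image
  }
}
```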