Nomad
Comprehensive HashiCorp Nomad commands and workflows for workload orchestration, job scheduling, and cluster management.
Installation & Setup
| Command | Description |
|---------|-------------|
| nomad version | Show Nomad version |
| nomad agent -dev | Start development agent |
| nomad agent -config=nomad.hcl | Start agent with a configuration file |
| nomad server members | List server members |
| nomad node status | List client nodes |
Job Management
Job Operations
| Command | Description |
|---------|-------------|
| nomad job run example.nomad | Submit job |
| nomad job status | List all jobs |
| nomad job status example | Show job details |
| nomad job stop example | Stop job |
| nomad job stop -purge example | Stop and purge job |
Job Planning and Validation
| Command | Description |
|---------|-------------|
| nomad job plan example.nomad | Plan job changes (dry run) |
| nomad job validate example.nomad | Validate job file |
| nomad job inspect example | Inspect job configuration |
| nomad job history example | Show job version history |
Job Scaling
| Command | Description |
|---------|-------------|
| nomad job scale example 5 | Scale the job's only task group to 5 instances |
| nomad job scale example group 3 | Scale the task group named "group" to 3 instances |
Allocation Management
Allocation Operations
| Command | Description |
|---------|-------------|
| nomad job allocs example | List allocations for a job |
| nomad alloc status ALLOC_ID | Show allocation details |
| nomad alloc logs ALLOC_ID | Show allocation logs |
| nomad alloc logs -f ALLOC_ID | Follow allocation logs |
| nomad alloc exec ALLOC_ID /bin/bash | Run a command in the allocation (add -task TASK_NAME for multi-task allocations) |
Allocation Debugging
| Command | Description |
|---------|-------------|
| nomad alloc fs ALLOC_ID | List allocation files |
| nomad alloc fs ALLOC_ID /path/to/file | Read allocation file |
| nomad alloc restart ALLOC_ID | Restart allocation |
| nomad alloc stop ALLOC_ID | Stop allocation |
Node Management
Node Operations
| Command | Description |
|---------|-------------|
| nomad node status | List all nodes |
| nomad node status NODE_ID | Show node details |
| nomad node drain -enable NODE_ID | Drain node (either -enable or -disable is required) |
| nomad node eligibility -disable NODE_ID | Disable node scheduling |
| nomad node eligibility -enable NODE_ID | Enable node scheduling |
Node Maintenance
| Command | Description |
|---------|-------------|
| nomad node drain -enable -deadline 30m NODE_ID | Drain with deadline |
| nomad node drain -disable NODE_ID | Cancel drain |
| nomad node meta apply -node-id NODE_ID key=value | Set node metadata |
Namespace Management
| Command | Description |
|---------|-------------|
| nomad namespace list | List namespaces |
| nomad namespace status default | Show namespace details |
| nomad namespace apply -description="Dev environment" dev | Create or update namespace |
| nomad namespace delete dev | Delete namespace |
ACL Management
ACL Operations
| Command | Description |
|---------|-------------|
| nomad acl bootstrap | Bootstrap ACL system |
| nomad acl token create -name="dev-token" -policy=dev-policy | Create token |
| nomad acl token list | List tokens |
| nomad acl token info TOKEN_ID | Show token details (by accessor ID) |
ACL Policies
| Command | Description |
|---------|-------------|
| nomad acl policy apply dev-policy dev-policy.hcl | Create/update policy |
| nomad acl policy list | List policies |
| nomad acl policy info dev-policy | Show policy details |
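The `dev-policy.hcl` file passed to `nomad acl policy apply` is an ACL policy document. The following is a minimal sketch of what such a file might contain, assuming the policy should grant job access to the `dev` namespace from the Namespace Management section; the exact capabilities are illustrative assumptions, not taken from the original.

```hcl
# dev-policy.hcl (hypothetical contents, adjust to your needs)

# Allow reading the "dev" namespace, plus submitting jobs and reading logs.
namespace "dev" {
  policy       = "read"
  capabilities = ["submit-job", "read-logs"]
}

# Allow listing and inspecting client nodes, but not modifying them.
node {
  policy = "read"
}
```

A token created with `-policy=dev-policy` would then be limited to these rules.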
Monitoring and Debugging
| Command | Description |
|---------|-------------|
| nomad operator raft list-peers | List Raft peers |
| nomad operator snapshot save backup.snap | Create snapshot |
| nomad operator snapshot restore backup.snap | Restore snapshot |
Monitoring
| Command | Description |
|---------|-------------|
| nomad monitor | Stream agent logs |
| nomad monitor -log-level=DEBUG | Stream agent logs at debug level |
| nomad status | List jobs (or show status for a given ID) |
Job Specification Examples
Basic Web Service
job "web" {
  datacenters = ["dc1"]
  type        = "service"

  group "web" {
    count = 3

    network {
      port "http" {
        static = 8080 # a static port allows only one allocation per node
        to     = 80   # nginx listens on port 80 inside the container
      }
    }

    service {
      name = "web"
      port = "http"

      check {
        type     = "http"
        path     = "/health" # the image must serve this path for the check to pass
        interval = "10s"
        timeout  = "2s"
      }
    }

    task "server" {
      driver = "docker"

      config {
        image = "nginx:latest"
        ports = ["http"]
      }

      resources {
        cpu    = 100 # MHz
        memory = 128 # MB
      }
    }
  }
}
Batch Job
job "batch-job" {
  datacenters = ["dc1"]
  type        = "batch"

  group "processing" {
    count = 1

    task "process" {
      driver = "docker"

      config {
        image   = "alpine:latest"
        command = "sh"
        args    = ["-c", "echo 'Processing data...' && sleep 30"]
      }

      resources {
        cpu    = 200
        memory = 256
      }
    }
  }
}
Periodic Job
job "backup" {
  datacenters = ["dc1"]
  type        = "batch"

  periodic {
    cron             = "0 2 * * *" # every day at 02:00
    prohibit_overlap = true
  }

  group "backup" {
    task "backup-task" {
      driver = "docker"

      config {
        image   = "backup-tool:latest"
        command = "/backup.sh"
      }

      resources {
        cpu    = 100
        memory = 256
      }
    }
  }
}
System Job
job "monitoring" {
  datacenters = ["dc1"]
  type        = "system"

  group "monitoring" {
    task "node-exporter" {
      driver = "docker"

      config {
        image        = "prom/node-exporter:latest"
        network_mode = "host"
        pid_mode     = "host"
      }

      resources {
        cpu    = 50
        memory = 64
      }
    }
  }
}
Configuration Examples
Server Configuration
datacenter = "dc1"
data_dir   = "/opt/nomad/data"
log_level  = "INFO"
bind_addr  = "0.0.0.0"

server {
  enabled          = true
  bootstrap_expect = 3

  server_join {
    retry_join = ["10.0.1.10", "10.0.1.11", "10.0.1.12"]
  }
}

consul {
  address = "127.0.0.1:8500"
}

vault {
  enabled = true
  address = "https://vault.service.consul:8200"
}

acl {
  enabled = true
}

ui {
  enabled = true
}
Client Configuration
datacenter = "dc1"
data_dir   = "/opt/nomad/data"
log_level  = "INFO"
bind_addr  = "0.0.0.0"

client {
  enabled = true

  server_join {
    retry_join = ["10.0.1.10", "10.0.1.11", "10.0.1.12"]
  }

  node_class = "compute"

  meta {
    "type" = "worker"
    "zone" = "us-east-1a"
  }
}

plugin "docker" {
  config {
    allow_privileged = true

    volumes {
      enabled = true
    }
  }
}

consul {
  address = "127.0.0.1:8500"
}

vault {
  enabled = true
  address = "https://vault.service.consul:8200"
}
Advanced Features
Constraints and Affinities
job "web" {
  constraint {
    attribute = "${attr.kernel.name}"
    value     = "linux"
  }

  affinity {
    attribute = "${node.class}"
    value     = "compute"
    weight    = 100
  }

  group "web" {
    constraint {
      attribute = "${meta.zone}"
      value     = "us-east-1a"
    }

    # ... rest of group configuration
  }
}
Volume Management
job "database" {
  group "db" {
    volume "data" {
      type      = "host"
      source    = "mysql_data" # must match a host_volume defined in the client configuration
      read_only = false
    }

    task "mysql" {
      driver = "docker"

      volume_mount {
        volume      = "data"
        destination = "/var/lib/mysql"
      }

      config {
        image = "mysql:8.0"
      }
    }
  }
}
Service Discovery Integration
job "api" {
  group "api" {
    # Connect sidecars require a group network block with mode = "bridge",
    # which must also define the "http" port referenced below.

    service {
      name = "api"
      port = "http"

      tags = [
        "api",
        "v1.0",
        "traefik.enable=true",
        "traefik.http.routers.api.rule=Host(`api.example.com`)"
      ]

      check {
        type     = "http"
        path     = "/health"
        interval = "10s"
        timeout  = "2s"
      }

      connect {
        sidecar_service {
          proxy {
            upstreams {
              destination_name = "database"
              local_bind_port  = 5432
            }
          }
        }
      }
    }

    # ... task configuration
  }
}
Best Practices
Job Design
- Resource Allocation: Set appropriate CPU and memory limits
- Health Checks: Implement comprehensive health checks
- Graceful Shutdown: Handle SIGTERM signals properly
- Logging: Use structured logging with proper levels
- Configuration: Use templates and environment variables
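The job design points above can be sketched in a single task definition. This is a minimal illustration, not a complete job: the image name, the `LOG_FORMAT` variable, and the rendered file path are hypothetical, and the timeout value is an assumption to be tuned per workload.

```hcl
job "web" {
  datacenters = ["dc1"]

  group "web" {
    task "server" {
      driver = "docker"

      config {
        image = "example/web:1.0" # hypothetical image
      }

      # Graceful shutdown: send SIGTERM, then wait up to 30s before killing.
      kill_signal  = "SIGTERM"
      kill_timeout = "30s"

      # Static configuration passed as environment variables.
      env {
        LOG_FORMAT = "json" # hypothetical variable read by the application
      }

      # Rendered configuration, exported to the task as environment variables.
      template {
        data        = "APP_INSTANCE={{ env \"NOMAD_ALLOC_INDEX\" }}\n"
        destination = "local/app.env"
        env         = true
      }

      # Explicit resource limits.
      resources {
        cpu    = 100 # MHz
        memory = 128 # MB
      }
    }
  }
}
```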
Cluster Management
- High Availability: Deploy multiple server nodes
- Backup Strategy: Regular snapshots and backups
- Monitoring: Monitor cluster health and job status
- Capacity Planning: Plan for resource requirements
- Security: Enable ACLs and use TLS
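As a rough sketch of the TLS and monitoring recommendations above, the agent configuration could be extended with `tls` and `telemetry` blocks. The certificate paths are hypothetical placeholders, and the settings shown are one reasonable starting point rather than a required layout.

```hcl
# Encrypt HTTP and RPC traffic between agents (certificate paths are hypothetical).
tls {
  http = true
  rpc  = true

  ca_file   = "/etc/nomad.d/tls/nomad-ca.pem"
  cert_file = "/etc/nomad.d/tls/server.pem"
  key_file  = "/etc/nomad.d/tls/server-key.pem"

  verify_server_hostname = true
  verify_https_client    = true # API and CLI clients must present a certificate
}

# Expose metrics for cluster and job monitoring.
telemetry {
  collection_interval        = "10s"
  prometheus_metrics         = true
  publish_allocation_metrics = true
  publish_node_metrics       = true
}
```

With `acl { enabled = true }` from the server configuration above, this covers the ACL and TLS halves of the security recommendation.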
Operations
- Rolling Updates: Use update strategies for zero downtime
- Canary Deployments: Test changes with canary deployments
- Resource Monitoring: Monitor resource usage
- Log Aggregation: Centralize log collection
- Alerting: Set up alerts for critical issues
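Rolling updates and canary deployments are both configured through the job's `update` block. The following is a minimal sketch; the timing values are assumptions and should be adjusted to how long the service takes to become healthy.

```hcl
job "web" {
  datacenters = ["dc1"]

  update {
    max_parallel     = 1     # replace one allocation at a time
    canary           = 1     # start one canary before touching existing allocations
    min_healthy_time = "30s" # how long an allocation must stay healthy
    healthy_deadline = "5m"  # give up on an allocation after this long
    auto_revert      = true  # roll back to the last stable version on failure
    auto_promote     = false # require manual promotion of the canary
  }

  # ... group and task configuration
}
```

With `auto_promote = false`, the deployment waits until the canary is promoted, for example with `nomad deployment promote DEPLOYMENT_ID`.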
Security
- ACL Policies: Implement least privilege access
- Network Security: Use service mesh for secure communication
- Secrets Management: Integrate with Vault for secrets
- Image Security: Scan container images for vulnerabilities
- Audit Logging: Enable audit logging for compliance (a Nomad Enterprise feature)
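For the Vault recommendation above, a task can request a Vault token and render secrets through a template. This is a sketch under several assumptions: the `api-read` Vault policy, the `secret/data/api` path (a KV version 2 mount), the `password` field, and the image name are all hypothetical. Clusters that use workload identity would typically set a `role` in the `vault` block instead of `policies`.

```hcl
job "api" {
  datacenters = ["dc1"]

  group "api" {
    task "api" {
      driver = "docker"

      config {
        image = "example/api:1.0" # hypothetical image
      }

      # Request a Vault token limited to the (hypothetical) "api-read" policy.
      vault {
        policies = ["api-read"]
      }

      # Render the secret into the task's secrets directory and export it
      # as an environment variable instead of hardcoding it in the job file.
      template {
        data        = "{{ with secret \"secret/data/api\" }}DB_PASSWORD={{ .Data.data.password }}{{ end }}\n"
        destination = "secrets/api.env"
        env         = true
      }
    }
  }
}
```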