Nomad¶
Comprehensive HashiCorp Nomad commands and workflows for workload orchestration, job scheduling, and cluster management.
Installation and Setup¶
Command | Description |
---|---|
`nomad version` | Show Nomad version |
`nomad agent -dev` | Start a development agent (combined server and client) |
`nomad agent -config=nomad.hcl` | Start an agent with a configuration file |
`nomad server members` | List server members |
`nomad node status` | List client nodes |
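For `nomad agent -config=nomad.hcl`, a minimal single-node configuration might look like the sketch below. It assumes one machine running both the server and client roles, which suits a lab but not production; the data directory path is an assumption.

```hcl
# nomad.hcl: one agent acting as both server and client (lab use only)
datacenter = "dc1"
data_dir   = "/opt/nomad/data"
bind_addr  = "0.0.0.0"

server {
  enabled          = true
  bootstrap_expect = 1 # a single server bootstraps the cluster by itself
}

client {
  enabled = true
}
```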
Job Management¶
Jobs¶
Command | Description |
---|---|
`nomad job run example.nomad` | Submit a job |
`nomad job status` | List all jobs |
`nomad job status example` | Show job details |
`nomad job stop example` | Stop a job |
`nomad job stop -purge example` | Stop a job and purge it from Nomad's state |
Job Planning and Validation¶
Command | Description |
---|---|
`nomad job plan example.nomad` | Dry-run the job and show the scheduler's planned changes |
`nomad job validate example.nomad` | Validate a job file |
`nomad job inspect example` | Show the registered job specification (JSON) |
`nomad job history example` | Show job version history |
Job Scaling¶
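Command | Description |
---|---|
`nomad job scale example 5` | Set the count to 5 (the group name may be omitted only if the job has a single task group) |
`nomad job scale example group 3` | Set the count of the task group named `group` to 3 |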
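`nomad job scale` changes a task group's `count`. A group can optionally declare a `scaling` block whose `min` and `max` bounds Nomad also honors during scaling operations. The sketch below reuses the names `example` and `group` from the table; the bounds are illustrative.

```hcl
job "example" {
  group "group" {
    count = 3

    # Bounds applied to `nomad job scale` requests and to autoscalers.
    scaling {
      enabled = true
      min     = 1
      max     = 10
    }

    # ... tasks omitted
  }
}
```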
Allocation Management¶
Allocations¶
Command | Description |
---|---|
`nomad job allocs example` | List a job's allocations |
`nomad alloc status ALLOC_ID` | Show allocation details |
`nomad alloc logs ALLOC_ID` | Show allocation logs |
`nomad alloc logs -f ALLOC_ID` | Follow allocation logs |
`nomad alloc exec ALLOC_ID /bin/bash` | Run a command inside the allocation |
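`nomad alloc logs` reads the task's captured stdout and stderr files, whose rotation is controlled per task by a `logs` block. A sketch with illustrative values (they match Nomad's defaults of 10 files of 10 MB); the task name is taken from the web service example.

```hcl
task "server" {
  driver = "docker"

  # Keep up to 10 rotated files of 10 MB each; stdout and stderr
  # are tracked separately, so each stream gets its own set.
  logs {
    max_files     = 10
    max_file_size = 10
  }

  # ... config and resources omitted
}
```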
Allocation Debugging¶
Command | Description |
---|---|
`nomad alloc fs ALLOC_ID` | List files in the allocation directory |
`nomad alloc fs ALLOC_ID /path/to/file` | Read a file from the allocation directory |
`nomad alloc restart ALLOC_ID` | Restart the allocation's tasks in place |
`nomad alloc stop ALLOC_ID` | Stop the allocation (Nomad reschedules it) |
Node Management¶
Node Operations¶
Command | Description |
---|---|
`nomad node status` | List all nodes |
`nomad node status NODE_ID` | Show node details |
`nomad node drain -enable NODE_ID` | Drain the node (migrate its allocations away) |
`nomad node eligibility -disable NODE_ID` | Disable scheduling on the node |
`nomad node eligibility -enable NODE_ID` | Enable scheduling on the node |
Node Maintenance¶
Command | Description |
---|---|
`nomad node drain -enable -deadline 30m NODE_ID` | Drain with a 30-minute deadline |
`nomad node drain -disable NODE_ID` | Cancel a drain |
`nomad node meta apply -node-id NODE_ID key=value` | Set dynamic node metadata |
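Draining a node migrates its allocations. For service jobs, a group's `migrate` block controls how quickly that happens; the values below are illustrative, not recommendations.

```hcl
group "web" {
  # Move one allocation at a time off a draining node; the next one
  # moves only after the replacement's checks pass for 10s.
  migrate {
    max_parallel     = 1
    health_check     = "checks"
    min_healthy_time = "10s"
    healthy_deadline = "5m"
  }

  # ... tasks omitted
}
```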
Namespaces¶
Command | Description |
---|---|
`nomad namespace list` | List namespaces |
`nomad namespace status default` | Show namespace details |
`nomad namespace apply -description="Dev environment" dev` | Create or update a namespace |
`nomad namespace delete dev` | Delete a namespace |
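A job is placed in a namespace through the job-level `namespace` attribute. A sketch assuming the `dev` namespace from the table already exists:

```hcl
job "web" {
  namespace   = "dev"
  datacenters = ["dc1"]

  # ... groups omitted
}
```

On the CLI side, the `-namespace` flag or the `NOMAD_NAMESPACE` environment variable selects the namespace for commands such as `nomad job status`.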
ACL Management¶
ACL Operations¶
Command | Description |
---|---|
`nomad acl bootstrap` | Bootstrap the ACL system |
`nomad acl token create -name="dev-token" -policy=dev-policy` | Create a token |
`nomad acl token list` | List tokens |
`nomad acl token info ACCESSOR_ID` | Show token details (by accessor ID) |
ACL Policies¶
Command | Description |
---|---|
`nomad acl policy apply dev-policy dev-policy.hcl` | Create or update a policy |
`nomad acl policy list` | List policies |
`nomad acl policy info dev-policy` | Show policy details |
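The `nomad acl policy apply` command above reads its rules from `dev-policy.hcl`. One possible version of that file, assuming the `dev` namespace exists and that developers should only read cluster state outside it:

```hcl
# dev-policy.hcl: full job control in "dev", read-only elsewhere
namespace "dev" {
  policy = "write"
}

namespace "default" {
  policy = "read"
}

node {
  policy = "read"
}

agent {
  policy = "read"
}
```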
Monitoring and Debugging¶
System Information¶
Command | Description |
---|---|
`nomad operator raft list-peers` | List Raft peers |
`nomad operator snapshot save backup.snap` | Save a state snapshot |
`nomad operator snapshot restore backup.snap` | Restore a state snapshot |
Monitoring¶
Command | Description |
---|---|
`nomad monitor` | Stream agent logs |
`nomad monitor -log-level=DEBUG` | Stream agent logs at DEBUG level |
`nomad status` | List jobs (shortcut for `nomad job status`) |
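`nomad monitor` streams agent logs; metrics come from the agent's `telemetry` block instead. A sketch that exposes Prometheus-format metrics (served at `/v1/metrics?format=prometheus`), with illustrative values:

```hcl
telemetry {
  collection_interval        = "1s"
  disable_hostname           = true
  prometheus_metrics         = true
  publish_allocation_metrics = true
  publish_node_metrics       = true
}
```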
Job Specification Examples¶
Basic Web Service¶
```hcl
job "web" {
  datacenters = ["dc1"]
  type        = "service"

  group "web" {
    # With a static host port, each of the 3 allocations needs its own client.
    count = 3

    network {
      port "http" {
        static = 8080
        to     = 80 # nginx listens on port 80 inside the container
      }
    }

    service {
      name = "web"
      port = "http"

      check {
        type     = "http"
        path     = "/" # stock nginx has no /health route
        interval = "10s"
        timeout  = "2s"
      }
    }

    task "server" {
      driver = "docker"

      config {
        image = "nginx:latest"
        ports = ["http"]
      }

      resources {
        cpu    = 100
        memory = 128
      }
    }
  }
}
```
Batch Job¶
```hcl
job "batch-job" {
  datacenters = ["dc1"]
  type        = "batch"

  group "processing" {
    count = 1

    task "process" {
      driver = "docker"

      config {
        image   = "alpine:latest"
        command = "sh"
        args    = ["-c", "echo 'Processing data...' && sleep 30"]
      }

      resources {
        cpu    = 200
        memory = 256
      }
    }
  }
}
```
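The batch job above relies on Nomad's default restart policy. A sketch of an explicit `restart` block for the `processing` group; the attempt count and timings are assumptions to tune per workload.

```hcl
job "batch-job" {
  group "processing" {
    # Restart a failed task up to 3 times within 10 minutes, 15s apart.
    # Mode "fail" then stops retrying and fails the allocation, after
    # which the job's reschedule policy decides whether to place it again.
    restart {
      attempts = 3
      interval = "10m"
      delay    = "15s"
      mode     = "fail"
    }

    # ... task "process" as above
  }
}
```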
Periodic Job¶
```hcl
job "backup" {
  datacenters = ["dc1"]
  type        = "batch"

  periodic {
    cron             = "0 2 * * *"
    prohibit_overlap = true
  }

  group "backup" {
    task "backup-task" {
      driver = "docker"

      config {
        image   = "backup-tool:latest"
        command = "/backup.sh"
      }

      resources {
        cpu    = 100
        memory = 256
      }
    }
  }
}
```
System Job¶
```hcl
job "monitoring" {
  datacenters = ["dc1"]
  type        = "system"

  group "monitoring" {
    task "node-exporter" {
      driver = "docker"

      config {
        image        = "prom/node-exporter:latest"
        network_mode = "host"
        pid_mode     = "host"
      }

      resources {
        cpu    = 50
        memory = 64
      }
    }
  }
}
```
Configuration Examples¶
Server Configuration¶
```hcl
datacenter = "dc1"
data_dir   = "/opt/nomad/data"
log_level  = "INFO"
bind_addr  = "0.0.0.0"

server {
  enabled          = true
  bootstrap_expect = 3

  server_join {
    retry_join = ["10.0.1.10", "10.0.1.11", "10.0.1.12"]
  }
}

consul {
  address = "127.0.0.1:8500"
}

vault {
  enabled = true
  address = "https://vault.service.consul:8200"
}

acl {
  enabled = true
}

ui {
  enabled = true
}
```
Client Configuration¶
```hcl
datacenter = "dc1"
data_dir   = "/opt/nomad/data"
log_level  = "INFO"
bind_addr  = "0.0.0.0"

client {
  enabled = true

  server_join {
    retry_join = ["10.0.1.10", "10.0.1.11", "10.0.1.12"]
  }

  node_class = "compute"

  meta {
    type = "worker"
    zone = "us-east-1a"
  }
}

plugin "docker" {
  config {
    allow_privileged = true

    volumes {
      enabled = true
    }
  }
}

consul {
  address = "127.0.0.1:8500"
}

vault {
  enabled = true
  address = "https://vault.service.consul:8200"
}
```
Advanced Features¶
Constraints and Affinities¶
```hcl
job "web" {
  constraint {
    attribute = "${attr.kernel.name}"
    value     = "linux"
  }

  affinity {
    attribute = "${node.class}"
    value     = "compute"
    weight    = 100
  }

  group "web" {
    constraint {
      attribute = "${meta.zone}"
      value     = "us-east-1a"
    }

    # ... rest of group configuration
  }
}
```
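Constraints filter nodes and affinities express preferences; a `spread` block additionally distributes a group's allocations across the values of an attribute. A sketch assuming clients set `meta.zone` as in the client configuration example; the count is illustrative.

```hcl
group "web" {
  count = 6

  # Balance allocations across the distinct values of meta.zone.
  spread {
    attribute = "${meta.zone}"
    weight    = 100
  }

  # ... tasks omitted
}
```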
Volume Management¶
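```hcl
job "database" {
  group "db" {
    # "mysql_data" must be declared as a host volume on the client.
    volume "data" {
      type      = "host"
      source    = "mysql_data"
      read_only = false
    }

    task "mysql" {
      driver = "docker"

      volume_mount {
        volume      = "data"
        destination = "/var/lib/mysql"
      }

      config {
        image = "mysql:8.0"
      }

      # The mysql image will not initialize an empty data directory
      # unless one of its root-password variables is set.
      env {
        MYSQL_RANDOM_ROOT_PASSWORD = "yes"
      }
    }
  }
}
```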
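The job's `source = "mysql_data"` refers to a host volume that each eligible client must declare in its agent configuration. A matching client-side sketch; the host path is an assumption, and the directory must already exist on the node.

```hcl
client {
  enabled = true

  host_volume "mysql_data" {
    path      = "/opt/mysql/data"
    read_only = false
  }
}
```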
Service Discovery Integration¶
```hcl
job "api" {
  group "api" {
    # Consul Connect sidecars require bridge networking.
    # Container port 8080 is an assumed value for the API.
    network {
      mode = "bridge"

      port "http" {
        to = 8080
      }
    }

    service {
      name = "api"
      port = "http"

      tags = [
        "api",
        "v1.0",
        "traefik.enable=true",
        "traefik.http.routers.api.rule=Host(`api.example.com`)"
      ]

      check {
        type     = "http"
        path     = "/health"
        interval = "10s"
        timeout  = "2s"
      }

      connect {
        sidecar_service {
          proxy {
            upstreams {
              destination_name = "database"
              local_bind_port  = 5432
            }
          }
        }
      }
    }

    # ... task definition omitted
  }
}
```
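With the upstream above, the task reaches `database` through its sidecar on `127.0.0.1:5432`, and Nomad also exposes that address as `NOMAD_UPSTREAM_ADDR_database` in the task environment. A sketch of the omitted task, assuming a hypothetical image `example/api:1.0` and a hypothetical variable name `DATABASE_ADDR`; Connect also needs Consul with Connect enabled and CNI plugins on the clients.

```hcl
task "api" {
  driver = "docker"

  config {
    image = "example/api:1.0" # hypothetical image
  }

  env {
    # Hypothetical variable read by the app; the sidecar forwards
    # localhost:5432 to the "database" service.
    DATABASE_ADDR = "127.0.0.1:5432"
  }
}
```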
Best Practices¶
Job Design¶
- **Resource Allocation**: Set appropriate CPU and memory limits
- **Health Checks**: Implement comprehensive health checks
- **Graceful Shutdown**: Handle SIGTERM signals properly
- **Logging**: Use structured logging with appropriate levels
- **Configuration**: Use templates and environment variables
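Two of the job-design points above map directly onto task settings: `kill_signal` and `kill_timeout` for graceful shutdown, and `template` with `env = true` for configuration. A sketch with illustrative values; `LOG_LEVEL` and `APP_PORT` are hypothetical variable names, and `NOMAD_PORT_http` assumes a port labeled `http`.

```hcl
task "server" {
  driver = "docker"

  # Send SIGTERM on stop, then SIGKILL after 30s if still running.
  # The client's max_kill_timeout (30s by default) caps this value.
  kill_signal  = "SIGTERM"
  kill_timeout = "30s"

  # Render configuration into environment variables before start.
  template {
    data        = <<EOF
LOG_LEVEL=info
APP_PORT={{ env "NOMAD_PORT_http" }}
EOF
    destination = "local/app.env"
    env         = true
  }

  # ... config and resources omitted
}
```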
Cluster Management¶
- **High Availability**: Run multiple server nodes (typically three or five)
- **Backup Strategy**: Take regular snapshots and backups
- **Monitoring**: Monitor cluster health and job status
- **Capacity Planning**: Size the cluster for resource requirements plus headroom for rescheduled allocations
- **Security**: Enable ACLs and use TLS
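The server configuration example already enables ACLs; TLS is configured with an agent `tls` block. A sketch assuming certificates created with `nomad tls ca create` and `nomad tls cert create -server` and copied to the paths shown (the paths are assumptions):

```hcl
tls {
  http = true
  rpc  = true

  ca_file   = "/etc/nomad.d/tls/nomad-agent-ca.pem"
  cert_file = "/etc/nomad.d/tls/global-server-nomad.pem"
  key_file  = "/etc/nomad.d/tls/global-server-nomad-key.pem"

  # Reject RPC peers whose certificate is not valid for a server role.
  verify_server_hostname = true
}
```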
Operations¶
- **Rolling Updates**: Use update strategies for zero-downtime deployments
- **Canary Deployments**: Test changes on canary allocations before promoting them
- **Resource Monitoring**: Monitor resource usage
- **Log Aggregation**: Centralize log collection
- **Alerting**: Set up alerts for critical issues
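Rolling updates and canaries are both configured through a group's `update` block. A sketch with one canary that must be promoted manually with `nomad deployment promote DEPLOYMENT_ID`; the timings are illustrative.

```hcl
group "web" {
  update {
    max_parallel     = 1       # replace one allocation at a time
    canary           = 1       # start one canary of the new version first
    min_healthy_time = "10s"
    healthy_deadline = "5m"
    auto_revert      = true    # roll back if the deployment fails
    auto_promote     = false   # wait for a manual promote
  }

  # ... tasks omitted
}
```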
Security¶
- **ACL Policies**: Grant least-privilege access
- **Network Security**: Use a service mesh for secure service-to-service communication
- **Secrets Management**: Integrate with Vault for secrets
- **Image Security**: Scan container images for vulnerabilities
- **Audit Logging**: Enable audit logging for compliance (a Nomad Enterprise feature)
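For Vault-backed secrets, a task requests Vault access with a `vault` block and renders secrets through `template`. A sketch assuming the Nomad 1.7+ workload-identity integration, a hypothetical Vault role `web`, and a KV v2 secret at the hypothetical path `secret/data/web`:

```hcl
task "server" {
  driver = "docker"

  # Assumption: workload-identity Vault integration; role name is hypothetical.
  vault {
    role = "web"
  }

  # Render the hypothetical secret into environment variables.
  template {
    data        = <<EOF
{{ with secret "secret/data/web" }}DB_PASSWORD={{ .Data.data.password }}{{ end }}
EOF
    destination = "secrets/db.env"
    env         = true
  }

  # ... config and resources omitted
}
```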