Running Duplicate Batch Jobs in HashiCorp Nomad
Two approaches to injecting variability into your Nomad batch job template without having to modify the template in the future.
By default, HashiCorp Nomad prevents duplicate batch jobs from executing. This is by design because duplicate job submissions could result in unnecessary work. From Nomad’s perspective, an unchanged job qualifies as a duplicate batch job.
However, there are times when a duplicate batch job or an unchanged job may be the correct approach. One example is a batch job that executes a calculation and outputs the results. In this scenario, the Nomad job specification likely does not need to change, and running the command nomad run for that job again would be the desired behavior. But because of Nomad’s default behavior, the resubmitted batch job would not be placed.
To get around this default behavior, you can use a couple of techniques to inject variation in ways that don't require you to alter the job’s content. This blog presents two approaches to injecting variability into your Nomad batch job template without having to modify the template in the future.
» Use a UUID as an Ever-Changing Value
The meta block of a Nomad job specification allows for user-defined arbitrary key-value pairs. By using HCL2 functions and the meta block, you can inject variation into a batch job without having to alter the job specification template. You can use the uuidv4() function to inject variation and thus ensure the job is unique every time you run the command nomad run.
To see how it works, create a file called uuid.nomad and copy the content below into it. This batch job runs the Hello World Docker example. Note how the meta block sets a key-value pair using the uuidv4() function:
job "uuid.nomad" {
  datacenters = ["dc1"]
  type = "batch"
  meta {
    run_uuid = "${uuidv4()}"
  }
  group "uuid" {
    task "hello-world" {
      driver = "docker"
      config {
        image = "hello-world:latest"
      }
    }
  }
}
Start a local Nomad server by issuing the command nomad agent -dev:
$ nomad agent -dev
==> No configuration files loaded
==> Starting Nomad agent...
==> Nomad agent configuration:
...
client: node registration complete
Ensure the Nomad server is up and running. Then navigate to the directory where you created the file uuid.nomad and issue the command nomad run uuid.nomad. This submits the batch job to Nomad:
$ nomad run uuid.nomad
==> Monitoring evaluation "44c8a150"
    Evaluation triggered by job "uuid.nomad"
==> Monitoring evaluation "44c8a150"
    Allocation "4fb444d4" created: node "fd14a894", group "uuid"
    Evaluation status changed: "pending" -> "complete"
==> Evaluation "44c8a150" finished with status "complete"
Check the status of the job allocation by using the nomad alloc status command:
$ nomad alloc status 4fb444d4
ID                  = 4fb444d4-3c5c-51d7-3820-c3752796aad7
Eval ID             = 44c8a150
Name                = uuid.nomad.uuid[0]
Node ID             = fd14a894
Node Name           = myDeskTop
Job ID              = uuid.nomad
Job Version         = 0
Client Status       = complete
Client Description  = All tasks have completed
Desired Status      = run
Desired Description = <none>
Created             = 2m30s ago
Modified            = 2m27s ago

Task "hello-world" is "dead"
Task Resources
CPU        Memory       Disk     Addresses
0/100 MHz  0 B/300 MiB  300 MiB

Task Events:
Started At     = 2021-05-28T19:47:41Z
Finished At    = 2021-05-28T19:47:41Z
Total Restarts = 0
Last Restart   = N/A

Recent Events:
Time                       Type        Description
2021-05-28T12:47:41-07:00  Terminated  Exit Code: 0
2021-05-28T12:47:41-07:00  Started     Task started by client
2021-05-28T12:47:38-07:00  Driver      Downloading image
2021-05-28T12:47:38-07:00  Task Setup  Building Task Directory
2021-05-28T12:47:38-07:00  Received    Task received by client
The output indicates a successful job with exit code 0. Submit the job again through the command nomad run uuid.nomad:
$ nomad run uuid.nomad
==> Monitoring evaluation "fd2e5e6d"
    Evaluation triggered by job "uuid.nomad"
    Allocation "9528b83d" created: node "fd14a894", group "uuid"
    Evaluation status changed: "pending" -> "complete"
==> Evaluation "fd2e5e6d" finished with status "complete"
The job ran again, bypassing the default behavior because the run_uuid value was different. You can verify that the job ran twice through the Nomad UI by looking at the jobs overview, where the Recent Allocations view shows that the two runs completed successfully.
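The same check can be approximated from the command line. The following is a minimal sketch, assuming the dev agent from earlier is still running and uuid.nomad is in the current directory. The names resubmit_uuid_job and NOMAD_BIN are hypothetical helpers introduced here for illustration; they are not part of Nomad.

```shell
#!/bin/sh
# NOMAD_BIN is a hypothetical override, not a Nomad setting: it defaults to the
# real CLI and can be pointed at `echo` for a dry run.
NOMAD_BIN="${NOMAD_BIN:-nomad}"

# Submit uuid.nomad the given number of times, then print the job status,
# whose output includes a list of the job's allocations.
resubmit_uuid_job() {
  runs_left="$1"
  while [ "$runs_left" -gt 0 ]; do
    # Stop early if a submission fails.
    "$NOMAD_BIN" run uuid.nomad || return 1
    runs_left=$((runs_left - 1))
  done
  "$NOMAD_BIN" job status uuid.nomad
}
```

Calling resubmit_uuid_job 2 submits the job twice. Because uuidv4() yields a new value on each submission, the status output should then list one completed allocation per run.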
» Use an HCL2 Variable
You can also inject variability by combining the meta block in a job specification with an HCL2 variable.
Start by creating a file named variable.nomad and copy the content below into the file. This batch job does exactly the same thing as the one in the uuid.nomad file, except that it uses a variable:
job "variable.nomad" {
  datacenters = ["dc1"]
  type = "batch"
  meta {
    run_index = "${floor(var.run_index)}"
  }
  group "variable" {
    task "hello-world" {
      driver = "docker"
      config {
        image = "hello-world:latest"
      }
    }
  }
}
variable "run_index" {
  type = number
  description = "An integer that, when changed from the current value, causes the job to restart."
  validation {
    condition = var.run_index == floor(var.run_index)
    error_message = "The run_index must be an integer."
  }
}
Go ahead and submit the batch job by running the command nomad run -var run_index=1 variable.nomad:
$ nomad run -var run_index=1 variable.nomad
==> Monitoring evaluation "387bfe35"
    Evaluation triggered by job "variable.nomad"
    Allocation "de54c080" created: node "185068cf", group "variable"
==> Monitoring evaluation "387bfe35"
    Evaluation status changed: "pending" -> "complete"
==> Evaluation "387bfe35" finished with status "complete"
Check the status of the job with the nomad alloc status command:
$ nomad alloc status de54
ID                  = de54c080-e3f3-cef3-1d9e-b1a4d956106c
Eval ID             = 387bfe35
Name                = variable.nomad.variable[0]
Node ID             = 185068cf
Node Name           = myDeskTop
Job ID              = variable.nomad
Job Version         = 0
Client Status       = complete
Client Description  = All tasks have completed
Desired Status      = run
Desired Description = <none>
Created             = 26s ago
Modified            = 24s ago

Task "hello-world" is "dead"
Task Resources
CPU        Memory       Disk     Addresses
0/100 MHz  0 B/300 MiB  300 MiB

Task Events:
Started At     = 2021-05-28T20:50:48Z
Finished At    = 2021-05-28T20:50:48Z
Total Restarts = 0
Last Restart   = N/A

Recent Events:
Time                       Type        Description
2021-05-28T13:50:48-07:00  Terminated  Exit Code: 0
2021-05-28T13:50:48-07:00  Started     Task started by client
2021-05-28T13:50:46-07:00  Driver      Downloading image
2021-05-28T13:50:46-07:00  Task Setup  Building Task Directory
2021-05-28T13:50:46-07:00  Received    Task received by client
The output reveals that the job completed successfully. If you submit the job again through the command nomad run -var run_index=1 variable.nomad, Nomad does not place a new allocation, because the index value provided is the same as in the previously submitted batch job. After three submissions of the same batch job, three evaluations were conducted but only one batch job was allocated, the first one.
For Nomad to run the job again, you need to provide a unique value. Go ahead and change the index to a value of 2 and issue the command nomad run -var run_index=2 variable.nomad:
$ nomad run -var run_index=2 variable.nomad
==> Monitoring evaluation "522bce96"
    Evaluation triggered by job "variable.nomad"
    Allocation "298d7cf7" created: node "185068cf", group "variable"
    Evaluation status changed: "pending" -> "complete"
==> Evaluation "522bce96" finished with status "complete"
This submission is accepted because it contains a unique value, an index value of 2. You can confirm the allocation was successful by visiting the Nomad UI or by running the command nomad alloc status with the new allocation ID (298d7cf7 in the output above).
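Picking a new run_index by hand for every submission can be scripted. The sketch below assumes that the current Unix time in seconds is an acceptable index: it is an integer, so it passes the validation block, and it grows on its own. The names submit_variable_job, epoch_run_index, and NOMAD_BIN are hypothetical helpers introduced for illustration, not part of Nomad.

```shell
#!/bin/sh
# NOMAD_BIN is a hypothetical override, not a Nomad setting: it defaults to the
# real CLI and can be pointed at `echo` for a dry run.
NOMAD_BIN="${NOMAD_BIN:-nomad}"

# Print the current Unix time in seconds, used here as a run_index
# that changes without manual bookkeeping.
epoch_run_index() {
  date +%s
}

# Submit variable.nomad with the run_index given as the first argument,
# falling back to the current epoch seconds when no argument is passed.
submit_variable_job() {
  "$NOMAD_BIN" run -var "run_index=${1:-$(epoch_run_index)}" variable.nomad
}
```

Calling submit_variable_job 2 reproduces the manual command above, while submit_variable_job with no argument uses the timestamp. One caveat of this design: two submissions within the same second would produce the same index, so the second one would be treated as a duplicate.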
» Next Steps
This post shared two approaches to injecting variability into your Nomad batch job template without having to modify the template in the future. There are many more Nomad tutorials available on the HashiCorp Learn Platform, where you can expand your Nomad knowledge and skills.