Spread and Affinities in Nomad
HashiCorp Nomad 0.9 introduces new scheduling features that give application owners and operators more fine-grained control over where their workloads are placed. This post describes these features in more detail.
» Background
The Nomad scheduler uses a bin packing algorithm to optimize resource utilization across a cluster. Application owners can use the constraint stanza to limit the set of nodes eligible for placement. Apart from the constraint stanza, Nomad 0.8 and prior versions did not give users any other fine-grained control over where their workloads would be placed in the cluster.
The new scheduling features in 0.9 provide additional flexibility in expressing placement preferences and allow operators to increase the failure tolerance of their workloads. These new features support use cases such as:
- Increasing the failure tolerance of a job by spreading its instances across multiple datacenters or physical racks, via the spread stanza.
- Targeting a specific class of nodes for specialized workloads, via the new affinity stanza.
» Spread Stanza
Nomad’s primary placement strategy is a bin packing algorithm. Bin packing reduces overall infrastructure costs by optimizing placement on nodes that are running existing workloads. One downside to bin packing is that it can lead to situations where too many instances of the same workload end up in a single datacenter, even if the job is configured to run in multiple datacenters. If there is a catastrophic failure at the datacenter level, this can cause a temporary outage until Nomad reschedules the workload onto another datacenter.
The spread stanza introduced in Nomad 0.9 solves this problem by allowing operators to distribute their workloads in a customized way based on attributes and/or client metadata. By using spread criteria in their job specification, Nomad job operators can ensure that failures across a domain such as datacenter or rack don't affect application availability.
The spread stanza can be specified at the job level as well as at the task group level. Job level spread criteria are inherited by all task groups in the job.
Example:
job "docs" {
datacenters = [“us-east1”, “us-east2”]
#Spread allocations over all datacenter
spread {
attribute = "${node.datacenter}"
}
group "test" {
count = 10
#Spread allocations over each rack based on desired percentage
spread {
attribute = "${meta.rack}"
target "r1" {
percent = 60
}
target "r2" {
percent = 40
}
}
}
}
In the above example, the job has a spread stanza based on the datacenter of the node. By default, Nomad uses a uniform spread strategy when a spread stanza does not specify target percentages. Nomad will prefer that each datacenter runs 5 instances of the job.
Spread stanzas can also have specific target percentages. The task group "test" in the above example specifies different target percentages for "r1" and "r2". Nomad will ensure that 60% of the instances are scheduled onto nodes in "r1" and 40% onto nodes in "r2".
The spread stanza also works when targets are partially specified. In the same example, if we removed the target for "r2" and there were more than two racks, Nomad would schedule 60% of the instances in "r1" and evenly spread the remaining instances across all other racks.
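As a minimal sketch of that partially specified case (assuming the same hypothetical meta.rack client metadata used in the example above), the group-level spread might look like this:

group "test" {
  count = 10

  # Only "r1" has an explicit target; the remaining percentage is
  # spread evenly across all other racks reported by client nodes.
  spread {
    attribute = "${meta.rack}"
    target "r1" {
      percent = 60
    }
  }
}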
Spread criteria are also treated as a soft preference by the Nomad scheduler. If no nodes match a given spread criterion, placement is still successful as long as the job's constraints and resource requirements can be satisfied.
For more details and examples, refer to our spread documentation.
» Affinity Stanza
As mentioned above, previous versions of Nomad have a constraint stanza which strictly filters where jobs are run based on attributes and client metadata. If no nodes are found to match, the placement does not succeed.
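For comparison, a minimal constraint sketch (using the kernel.name node attribute as an illustrative choice) strictly filters out any node that does not match, and placement fails if no node qualifies:

# Hard filter: only Linux nodes are eligible for placement
constraint {
  attribute = "${attr.kernel.name}"
  value     = "linux"
}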
The affinity stanza in Nomad 0.9 allows operators to express placement preferences for their jobs on particular types of nodes. The affinity stanza acts like a "soft constraint": Nomad will attempt to match the desired affinity, but placement will succeed even if no nodes match the desired criteria.
When scoring nodes for placement, Nomad will take any matching affinities into account so that nodes that match preferred criteria are scored higher. Scores from affinities are combined with other scoring factors such as bin packing.
Similar to the constraint stanza, the affinity stanza can be specified at the job level as well as at the task group and task levels. Job level affinities are inherited by all task groups in the job. Task level affinities are combined together with task group level affinities.
Example:
job "docs" {
#Prefer m4.xlarge nodes
affinity {
attribute = "${attr.platform.aws.instance-type}"
value = "m4.xlarge"
weight = 100
}
group "example" {
#Prefer the "r1" rack
affinity {
attribute = "${meta.rack}"
value = "r1"
weight = 50
}
task "server" {
..
}
}
In the above example, the job has an affinity for "m4.xlarge" nodes. This affinity applies to all task groups in the job. The task group also has an affinity for a specific rack, "r1". Nomad will add additional boosting factors to the scores of nodes that match these two affinities so that they are preferred for placement. However, if no nodes match these affinities, placement still succeeds.
Negative weights act like anti-affinities: they encourage Nomad to avoid placing allocations on matching nodes.
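For example, a sketch of an anti-affinity that steers allocations away from a hypothetical "r3" rack (again assuming meta.rack client metadata) might look like this:

# Negative weight: avoid nodes in rack "r3" when possible
affinity {
  attribute = "${meta.rack}"
  value     = "r3"
  weight    = -50
}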
The affinity stanza is useful for use cases like preferring a specific class of nodes for workloads with specialized requirements, as in the sketch below. For more details and examples, refer to our affinity documentation.
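As a sketch of that use case, a workload could prefer a specialized node class (assuming operators have configured node_class = "gpu" on those clients; the class name here is illustrative):

# Prefer nodes that operators have tagged with the "gpu" node class
affinity {
  attribute = "${node.class}"
  value     = "gpu"
  weight    = 100
}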
» Conclusion
Nomad 0.9 introduces two new stanzas, spread and affinity, that allow for advanced placement strategies and increased failure tolerance. With these features, Nomad 0.9 gives operators and job specification authors more fine-grained control over workload placement.