Nomad 1.7 improves Vault and Consul integrations, adds NUMA support

HashiCorp Nomad 1.7 — now generally available — introduces improved workload identity, improved Vault and Consul integrations, NUMA support, Nomad actions, and more.

Dec 07 2023, by Mike Nomitch

HashiCorp Nomad is a simple and flexible orchestrator used to deploy and manage containers and non-containerized applications across multiple cloud, on-premises, and edge environments. Today, we are excited to announce that Nomad 1.7 is now generally available.

Here’s what’s new in Nomad:

Improved workload identity
HashiCorp Vault integration improvements
HashiCorp Consul integration improvements
Vault and Consul setup helpers
Multi-cluster Vault and Consul (Enterprise)
Non-uniform memory access (NUMA) support (Enterprise)
Nomad actions
ACL roles and tokens UI
Distributed locks
High-availability autoscaler

»Improved workload identity

For several releases, Nomad has included workload identity tokens in tasks when users include identity blocks in their jobs.

identity {
  name = "aws"
  aud = ["aws"]
  file = true
  env = true
  ttl = "30m"
}

For each block added to a task, Nomad will mint a signed JSON Web Token (JWT) that declares information about the identity of a task. This includes the job ID/name, the task name, the group name, the allocation ID, and custom audience values specified in the block.

These JWTs can be accessed via a file or environment variables and will expire based on the time-to-live (TTL) defined in the job. Identity blocks also include a change_mode attribute that defines how the task should react when its identity token is replaced after expiry.
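
As a rough illustration, a task could read its token and look at the claims along these lines. The sketch assumes the identity shown above (name = "aws") is exposed as the NOMAD_TOKEN_aws environment variable or as a secrets/nomad_aws.jwt file; those names, and the claim names used in the usage line, are assumptions to check against the Nomad documentation. It only decodes the payload for inspection and does not verify the signature.

```python
import base64
import json
import os
from pathlib import Path


def load_identity_token(name: str, secrets_dir: str = "secrets") -> str:
    """Return the JWT for the named identity from the environment or a file.

    The variable and file naming used here are assumptions, not confirmed
    by this post.
    """
    token = os.environ.get(f"NOMAD_TOKEN_{name}")
    if token:
        return token.strip()
    return (Path(secrets_dir) / f"nomad_{name}.jwt").read_text().strip()


def decode_claims(token: str) -> dict:
    """Decode the JWT payload WITHOUT verifying the signature (inspection only)."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("expected header.payload.signature")
    payload = parts[1]
    payload += "=" * (-len(payload) % 4)  # restore stripped base64url padding
    return json.loads(base64.urlsafe_b64decode(payload))
```

Inside a task, `decode_claims(load_identity_token("aws")).get("aud")` would then return the audience list from the block, assuming the token is present.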

In previous versions of Nomad, this identity token was used for authentication within Nomad. In Nomad 1.7, these identity tokens can now be used by third parties to authenticate the identity of Nomad tasks.

Nomad will act as an OIDC provider, allowing OIDC consumers to verify workload identity tokens against a public signing key. This means that other tools such as Vault and Consul and cloud providers such as AWS, Azure, and Google Cloud can all accept Nomad workload identity tokens, validate their authenticity, and return credentials granting specific permissions depending on the identity of the task.
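
To make the consumer side a little more concrete, here is a minimal sketch of the claim checks a third party might run once it has already verified the token's signature against Nomad's public key. Signature verification itself is deliberately left out (a JWT library would normally handle it). The function name and the leeway parameter are hypothetical; aud, exp, and nbf are the standard registered JWT claims.

```python
import time
from typing import Optional


def claims_acceptable(
    claims: dict,
    expected_aud: str,
    now: Optional[float] = None,
    leeway: float = 0.0,
) -> bool:
    """Check audience and validity window of already signature-verified claims."""
    now = time.time() if now is None else now

    # "aud" may be a single string or a list of strings.
    aud = claims.get("aud", [])
    audiences = [aud] if isinstance(aud, str) else list(aud)
    if expected_aud not in audiences:
        return False

    # Reject tokens with no expiry or an expiry in the past.
    exp = claims.get("exp")
    if exp is None or now > exp + leeway:
        return False

    # Reject tokens that are not valid yet.
    nbf = claims.get("nbf")
    if nbf is not None and now < nbf - leeway:
        return False

    return True
```

A consumer would only go on to issue credentials when this returns True for its own audience value, such as "aws" in the identity block above.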

Nomad as an OIDC provider - workload identity

See the following video for a very quick example of this working in AWS:

This enhancement for identity tokens allows for dynamic credentials to be minted for Nomad tasks. The result is improved security and simplified management of integrations with Nomad workloads.

Using static credentials to authenticate Nomad tasks presents a security risk, even if you rotate your credentials regularly. Dynamic credentials improve your security posture by letting you provision new, temporary credentials for each workload and renew them at regular intervals.

Nomad administrators and developers won’t have to manage as many static tokens in Vault or Nomad variables. They also won’t be tempted to hardcode tokens into Nomad jobspecs.

Lastly, workload identity tokens greatly improve the simplicity and security of integrations with other HashiCorp products, closing key security gaps in your threat model.

»Vault integration improvements

Nomad 1.7 introduces a new Vault integration process that makes it easier for users to set up and manage Vault. Nomad 1.7+ users will no longer have to manage Vault ACL tokens when submitting Nomad jobs or provisioning Nomad clients. Instead, Nomad will automatically manage Vault tokens for the user using workload identity-based authorization.

In previous versions of Nomad, users would submit Vault tokens with Nomad jobs:

$ nomad run -vault-token=<REDACTED> my-job.hcl

Now, just submit a job that works with Vault, and Nomad will automatically authenticate into Vault using identity-based policies instead of manually provided tokens. This is both more secure and easier to manage.

$ nomad run my-job.hcl

Additionally, Vault tokens are no longer required for setting up a Nomad server agent or Nomad client agent. In fact, Nomad servers don't have to communicate directly with Vault at all, reducing networking constraints.

Nomad also now includes support for templated policies in Vault. Nomad information including task, group, job, and namespace can be interpolated into a Vault policy. This means that with just a single policy, you can potentially support every Nomad job.

For instance, the following Vault policy snippet would give every Nomad job read access to a unique path based on its namespace and job name:

path "secret/data/{{identity.entity.aliases.auth_jwt_X.metadata.nomad_namespace}}/{{identity.entity.aliases.auth_jwt_X.metadata.nomad_job_id}}/*" {
  capabilities = ["read"]
}
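
To see what that template resolves to for a given job, the following sketch expands the placeholders by hand. Vault performs this substitution itself; the helper below is purely illustrative, its name is hypothetical, and auth_jwt_X stands in for the JWT auth mount accessor exactly as in the snippet above.

```python
import re

POLICY_PATH = (
    "secret/data/"
    "{{identity.entity.aliases.auth_jwt_X.metadata.nomad_namespace}}/"
    "{{identity.entity.aliases.auth_jwt_X.metadata.nomad_job_id}}/*"
)


def render_policy_path(template: str, accessor: str, metadata: dict) -> str:
    """Replace each alias-metadata placeholder with the matching metadata value."""
    pattern = re.compile(
        r"\{\{identity\.entity\.aliases\."
        + re.escape(accessor)
        + r"\.metadata\.(\w+)\}\}"
    )
    # A missing metadata key raises KeyError rather than silently rendering "".
    return pattern.sub(lambda m: metadata[m.group(1)], template)


print(
    render_policy_path(
        POLICY_PATH,
        "auth_jwt_X",
        {"nomad_namespace": "default", "nomad_job_id": "example"},
    )
)
```

For a job named example in the default namespace, the policy would therefore apply to paths under secret/data/default/example/.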

To learn about our revamped Vault integration process, see our new Vault integration documentation and read the “Vault and Consul setup helpers” section further down in this post.

Lastly, users can configure their Nomad integration to use batch tokens instead of regular Vault tokens. Batch tokens reduce Vault load in large-scale deployments with numerous allocations. For more information, read our full explanation of batch tokens vs service tokens.

Our legacy Vault integration, using explicitly defined Vault tokens, is deprecated in Nomad 1.7 and is slated for removal in Nomad 1.9.

»Consul integration improvements

Nomad 1.7 also includes an improved integration with Consul based on workload identity tokens. Users of the new Consul integration will no longer have to provide Consul ACL tokens to Nomad when submitting jobs. This reduces the overall toil involved with setting up and maintaining Consul with Nomad.

In previous versions of Nomad, users would submit Consul tokens with Nomad jobs:

$ nomad run -consul-token=<REDACTED> my_job.hcl

Now, once Nomad and Consul are initially configured, end users can submit a job without any manual token management:

$ nomad run my_job.hcl

If Nomad agents are relying on Consul for discovering other Nomad agents in a cluster, a Consul token can still be passed into the Nomad agent configuration, but it is only used for this purpose. As such, the Consul ACL tokens in Nomad agents can be more finely scoped for just these permissions.

Overall, these changes will make it simpler for Nomad administrators and end users to integrate with Consul and reduce any overhead or security risk associated with static token management. To learn more, see our new Consul integration documentation and read the “Vault and Consul setup helpers” section further down in this post.

Our legacy Consul integration — using explicitly defined Consul tokens for Consul service and key-value (KV) storage management — is deprecated in Nomad 1.7 and is slated for removal in Nomad 1.9.

»Vault and Consul setup helpers

With the improved Vault and Consul integrations, the long-term overhead of using these tools goes down, but initial setup still involves writing authentication rules and ACL policies in Vault and Consul. In order to streamline this process, we’ve provided several methods to configure the integrations.

For initial testing, learning, or development, Nomad 1.7 includes new nomad setup commands in the Nomad CLI. These commands are the simplest way to try out the new integrations, but provide the least customization.

The Vault setup command will:

Enable JWT authentication on Vault
Optionally create a namespace for Nomad workloads
Set up a templated policy for Nomad workloads to access specific paths
Set up an auth role allowing Nomad-minted identity tokens to log in under the new policy
Output Nomad agent configuration changes to properly integrate with Vault

Just run one command:

$ nomad setup vault

Reload your Nomad agent with config supplied by the command, then run a Nomad job that uses Vault:

$ nomad run my-vault-enabled-job.nomad.hcl

The Consul setup command works similarly, setting up everything Consul needs for Nomad jobs to authenticate to Consul using workload identities for service and KV storage.

For more declarative and production-ready setups, we are also providing Terraform modules for Vault integration setup and Consul integration setup. These modules allow further customization beyond what the built-in setup command provides.

Lastly, users can customize their own setup manually by creating the necessary resources via the command line, via API, or via Terraform. See our Vault integration and Consul integration documentation for more information.

»Multi-cluster Vault and Consul (Enterprise)

Nomad users on the Enterprise Platform Package can also integrate with multiple Vault clusters or Consul clusters using a single Nomad cluster. Previously, there was a one-to-one relationship between Nomad and these other tools.

Nomad administrators can now define multiple integrations in the vault and consul portions of the Nomad agent configuration. Then, Nomad jobspec writers can pick which Consul or Vault cluster to use in their job. Nomad administrators can also set default clusters for each in Nomad namespace configuration, as well as deny access to certain clusters by namespace.

Let’s take a look at using multiple Vault clusters as an example. First, in the Nomad agent config, add two named vault blocks, one of which is the default cluster. Note that these blocks do not include Vault tokens, since tokens are no longer necessary.

vault {
  name = "default"
  default_cluster = true
  address = "https://vault.company.internal:8200"
 
  # remaining configuration unchanged
}
 
vault {
  name = "financial"
  default_cluster = false
  address = "https://vault.company.internal:8201"
 
  # remaining configuration unchanged
}

Nomad namespace configuration can now include information about which Vault cluster to use by default along with allow/deny lists.

name        = "finance-team"
description = "The namespace used by the finance team"
 
vault {
  default = "financial"
  # allowed = ["financial"]
  denied = ["default"]
}

For namespaces with multiple Vaults, users can opt into non-default Vault clusters using the cluster value in the vault block.

job "example" {
  datacenters = ["*"]
 
  vault {
    cluster = "financial"
  }
 
  …etc…
}
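
The interplay between the namespace settings and the job's cluster value can be pictured with a small sketch. This is a simplified model, not Nomad's implementation: the function name is hypothetical, and the fallback to a cluster called "default" and the order in which the allowed and denied lists are checked are assumptions.

```python
from typing import Optional


def resolve_vault_cluster(namespace_vault: dict, requested: Optional[str] = None) -> str:
    """Pick the Vault cluster for a job, or raise if the namespace forbids it.

    namespace_vault mirrors the vault block of the namespace configuration,
    for example {"default": "financial", "denied": ["default"]}.
    """
    # The job's explicit choice wins; otherwise use the namespace default.
    cluster = requested or namespace_vault.get("default", "default")

    allowed = namespace_vault.get("allowed")
    if allowed is not None and cluster not in allowed:
        raise PermissionError(f"cluster {cluster!r} is not in the allowed list")

    if cluster in namespace_vault.get("denied", []):
        raise PermissionError(f"cluster {cluster!r} is denied for this namespace")

    return cluster
```

With the finance-team namespace above, a job that omits the cluster value would land on financial, while a job that asks for the default cluster would be rejected.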

Multiple Consul clusters can be supported and configured in a similar manner. See our documentation on agent configuration, namespace configuration, job group configuration, and service configuration for details.

We believe these changes will enable Nomad to support a wider variety of architectures and deployments. Additionally, users who are in the process of rearchitecting their HashiCorp stack deployments or moving to HCP Consul or HCP Vault should be able to modify their stacks more iteratively.

»Non-uniform memory access support (Enterprise)

Since Nomad 1.1, Nomad users have been able to dedicate whole CPU cores to specific tasks by using the cores attribute in the resources block. With Nomad Enterprise 1.7, you can now include non-uniform memory access (NUMA) information in these scheduling decisions.

NUMA node communication before Nomad 1.7 and after.

Previously, if multiple processes from a single Nomad task were put on cores in different NUMA nodes, inter-process communication might have to cross NUMA nodes and take a performance hit accessing memory in a different socket. Now, users can add a numa block alongside cores to tell the Nomad scheduler to either prefer or require cores in the same NUMA node.

resources {
  cores = 8
  numa {
    affinity = "require"
  }
}

If the require value is used, Nomad will reject any placement options that do not have the specified number of cores free on the same NUMA node. If prefer is used, Nomad will attempt to place cores together, but will not guarantee this placement.
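
The difference between the two modes can be sketched with a toy placement function. This is only a simplified model of the behavior described above, not the Nomad scheduler: the function name and data layout are hypothetical, and the "none" value (no NUMA preference) is an assumption added for completeness.

```python
from typing import Dict, List, Optional


def place_cores(
    free: Dict[int, List[int]], cores: int, affinity: str = "none"
) -> Optional[List[int]]:
    """Pick core IDs from free cores grouped by NUMA node.

    Returns None when no placement satisfies the request.
    """
    if affinity not in ("none", "prefer", "require"):
        raise ValueError(f"unknown affinity: {affinity}")

    if affinity in ("prefer", "require"):
        # First try to fit the whole request inside a single NUMA node.
        for node in sorted(free):
            if len(free[node]) >= cores:
                return sorted(free[node])[:cores]
        if affinity == "require":
            return None  # no single node has enough free cores

    # "none", or "prefer" falling back: take cores from any node.
    pool = sorted(core for node in sorted(free) for core in free[node])
    return pool[:cores] if len(pool) >= cores else None
```

For example, with four free cores on node 0 and six on node 1, a request for eight cores fails under require but is spread across both nodes under prefer.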

For multi-core, latency-sensitive workloads, NUMA-aware scheduling can greatly increase the performance of your Nomad tasks. For more information, see the Nomad CPU concepts documentation.

»Nomad actions

Nomad allows users to execute commands in the context of an allocation using the nomad alloc exec command. This can be helpful in one-off scenarios when you need to debug or run ad hoc commands within a container or alongside an application.

Many Nomad users run the exec command to repeatedly do the same tasks. These users have to remember the commands they want to run and risk accidentally inputting the wrong command, sometimes in a highly privileged context. For these repeated tasks, Nomad actions now offer a simpler and safer user experience.

Nomad actions allow job writers to define a named command that can later be executed within the context of an allocation. This makes it easy for job writers to codify, share, and execute repeated tasks in Nomad.

Include the new action block in a jobspec’s task to define a command and arguments:

action "migrate" {
  command = "rake"
  args = ["db:migrate"]
}
 
action "rollback" {
  command = "rake"
  args = ["db:rollback"]
}

Then, once the job is up and running, any users with exec permissions can execute the action from the CLI, providing either an allocation ID or task and group names:

$ nomad action -alloc=a4nd1k -job=rails-app migrate
 
$ nomad action -group=core -task=app -job=rails-app rollback

They can execute actions from the UI as well:

Users can run a predefined action on single allocations or allow Nomad to pick an allocation at random.

»ACL roles and tokens UI

The Nomad UI now includes two new sections related to access control: roles and tokens.

ACL tokens have long been a part of Nomad, but until Nomad 1.7, Nomad administrators would have to manage tokens through the CLI or API. Now the Nomad UI includes pages to view, manage, and create ACL tokens. Users can list all existing tokens, disable tokens, and create new tokens from new dedicated pages.

ACL roles were added in Nomad 1.4 as an easier way of managing ACL policies. A role is a named set of policies that can be attached to an ACL token. The Nomad UI now also includes an index and details page for ACL roles, making it easier to create, manage, and delete ACL roles through one interface.

»Distributed locks

Some Nomad workloads run with a single leader instance and one or many followers. One of the difficulties of dynamically running these workloads is synchronizing which instance of a workload should be the leader. Traditionally, Nomad users have relied on Consul’s distributed locks or a third party datastore to act as a source of truth for which application is a leader.

Nomad 1.7 includes a built-in mechanism for distributed locks using Nomad variables. Locks can be created and claimed using the API, the CLI, or the Nomad API Go library. Let’s look at a quick example using the CLI and API.

First, create a lock using the nomad var lock command. This will create a lock at a specific variable path, periodically renew the lock, and run a given script. In this case, the script will simply run for 30 seconds and log each second.

$ nomad var lock -verbose demo ./sleep-and-log.sh
Writing to path "demo"
Using defaults for the lock
Attempting to acquire lock
Variable locked, ready to execute: ./sleep-and-log.sh
starting...
1
2
3
4
…

If you run the same command in another tab, it will recognize that the lock is currently taken by another process and it will wait until the original process finishes before running.

$ nomad var lock -verbose demo ./sleep-and-log.sh
Writing to path "demo"
Using defaults for the lock
Attempting to acquire lock

This CLI command provides a simple way to coordinate locking behavior for simple scripts across any Nomad-connected machine.

While the lock is in use, the Nomad variable will be marked with a “Lock” response.

$ nomad operator api /v1/var/demo | jq
{
  "CreateIndex": 50,
  "CreateTime": 1698437371232685000,
  "Lock": {
    "TTL": "15s",
    "LockDelay": "15s",
    "ID": "a7e677f7-5a2c-9236-27d1-2aaafb333663"
  },
  "ModifyIndex": 69,
  "ModifyTime": 1698437877117772000,
  "Namespace": "default",
  "Path": "demo"
}
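
The acquire, renew, and expire cycle behind that response can be modeled in a few lines. The class below is an in-memory stand-in for illustration only; it does not call the Nomad API, all of its names are hypothetical, and it leaves LockDelay out entirely. Time is passed in explicitly so the behavior is easy to follow.

```python
import uuid
from typing import Dict, Optional


class VariableLocks:
    """In-memory stand-in for lock state on variable paths (not the Nomad API)."""

    def __init__(self) -> None:
        self._locks: Dict[str, dict] = {}

    def acquire(self, path: str, ttl: float, now: float) -> Optional[str]:
        """Return a new lock ID, or None if a live lock is held on the path."""
        held = self._locks.get(path)
        if held and held["expires"] > now:
            return None
        lock_id = str(uuid.uuid4())
        self._locks[path] = {"id": lock_id, "expires": now + ttl}
        return lock_id

    def renew(self, path: str, lock_id: str, ttl: float, now: float) -> bool:
        """Extend the TTL; fails if the lock expired or belongs to someone else."""
        held = self._locks.get(path)
        if not held or held["id"] != lock_id or held["expires"] <= now:
            return False
        held["expires"] = now + ttl
        return True

    def release(self, path: str, lock_id: str) -> bool:
        """Drop the lock if the caller still owns it."""
        held = self._locks.get(path)
        if held and held["id"] == lock_id:
            del self._locks[path]
            return True
        return False
```

A holder that keeps renewing within the TTL keeps the lock; if it stops renewing (for instance because the process died), the lock lapses and a waiting process can acquire it on its next attempt.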

Apart from the CLI tool and API, the Nomad team is exploring adding distributed lock helpers directly into the Nomad jobspec to help enable leadership election. Feedback on this proposal is welcome on this GitHub issue.

This behavior allows applications on Nomad to perform complex locking behavior and leadership coordination using Nomad as a source of truth. One such application now using this behavior is the Nomad Autoscaler.

»High-availability autoscaler

The Nomad Autoscaler now includes a high-availability (HA) mode which allows multiple autoscaler instances to run in a cluster at once. A single instance will be elected leader while other follower instances stand by. If a leader instance fails, one of the follower instances will assume leadership and start making scaling decisions. The high-availability autoscaler ensures that a single node or allocation going down doesn’t affect critical autoscaling operations.

In order to launch the autoscaler in HA mode, you must be using version 0.4+ of the Nomad Autoscaler, be on Nomad 1.7, and allow variable write access for the autoscaler job. See the autoscaler documentation to learn more.

»More Nomad updates

Beyond these core improvements, additions in Nomad 1.7 and in the minor releases since Nomad 1.6 include:

CSI volumes can now be expanded by changing the capacity_min value of a volume.
Templates can be explicitly rerendered on task restart using the new render_templates field in the job restart block.
Added crons field for multiple cron expressions in a periodic job.
Support for destination_peer, destination_type, local_bind_socket_path, and local_bind_socket_mode added to Consul service mesh upstream config.
New transfer leadership command added to more easily switch Raft leader nodes.
Wildcards are now supported in the -namespace flag of nomad alloc commands.
Existing job information is now available in Sentinel policy checks in Nomad Enterprise.
Added unofficial s390x builds to Nomad Enterprise releases.
Added a new Variable tab to job details page in the Nomad UI.
Added support for Unix domain sockets in the Nomad API package.

»Community updates

If you’re familiar with Go or interested in learning or honing your Go skills, we invite you to join the group of Nomad contributors helping to improve the product.

Looking for a place to start? Head to the Nomad contribute page for a curated list of good first issues. If you’re a returning Nomad contributor looking for an interesting problem to tackle, take a glance at issues labeled “help-wanted” or “good first issue”. For help getting started, check out the Nomad contributing documentation or comment directly on the issue with any questions you have. Community members can also contribute integrations to Nomad or to the Nomad Pack Community Registry.

We also encourage users to go to the official Nomad Community Forums or join us for community office hours if they have Nomad questions or feedback. We would also like to recognize some of our community members for creating unofficial spaces for Nomad users to connect. Thank you to the communities on Gitter and the HashiCorp Community Discord.

»Getting started with Nomad 1.7

We encourage you to try out the new features in Nomad 1.7:

Download Nomad 1.7 from the project website.
Learn more about Nomad with tutorials on the HashiCorp Developer site.
Contribute to Nomad by submitting a pull request for a GitHub issue with the “help wanted” or “good first issue” label.
Participate in our community forums, office hours, and other events.