HashiCorp Nomad 0.4

We've released Nomad 0.4. Nomad is a distributed, scalable and highly available cluster manager and scheduler designed for both microservice and batch workloads.

Nomad 0.4 ships a number of new features focused on improving the operational aspects of the tool. Highlights include:

»Nomad Plan

Nomad plan shows you the changes in your job and whether or not Nomad could allocate it onto your cluster. This lets you verify that your changes will make it into the system and that your job will be allocated properly.

Unlike Terraform, Nomad plan is not a guarantee that the allocation will succeed, since it does not reserve resources for the upcoming change. It only checks that, at that point in time, the allocation would have succeeded. Operators should use this knowledge to make an informed decision about whether or not to run the job.

Nomad is a declarative system: you declare what to run and Nomad decides how to run it. You don't explicitly tell Nomad which server to run a job on, when to run it, and so on. This is important for a cluster manager: it lets Nomad make efficient use of resources, automatically migrate workloads on failure, etc.
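As a concrete illustration, a minimal job file along the lines of the example used below might look like the following sketch; the driver, image, and resource figures are assumptions for illustration rather than the exact file behind the output:

job "example" {
  datacenters = ["dc1"]

  group "cache" {
    # Three instances of the task, matching the plan output below.
    count = 3

    task "redis" {
      # Assumed driver and image; any supported driver works.
      driver = "docker"

      config {
        image   = "redis:3.2"
        command = "redis-server"
        args    = ["--port", "${NOMAD_PORT_db}"]
      }

      resources {
        cpu    = 250 # MHz
        memory = 256 # MB
        network {
          mbits = 10
          port "db" {}
        }
      }
    }
  }
}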

The downside, however, is not knowing how a job will behave when it is submitted. Operators are left wondering: are there enough resources to run this job? Will it update in place? Will it cause downtime, or will existing allocations be replaced with a rolling update? And so on.

nomad plan raises operational confidence by showing a point-in-time view of what Nomad would do. An example plan for an updated job is shown below:


$ nomad plan example.nomad
+/- Job: "example"
+/- Task Group: "cache" (3 create/destroy update)
  +/- Task: "redis" (forces create/destroy update)
    +/- Config {
        args[0]: "--port ${NOMAD_PORT_db}"
      + args[1]: "--loglevel verbose"
        command: "redis-server"
    }

Scheduler dry-run:
- All tasks successfully allocated.
- Rolling update, next evaluation will be in 10s.

Job Modify Index: 7

The output shows the changes to the job (in this case, adding a new argument to the command) and that the change to that task forces a create/destroy update, meaning existing allocations will be destroyed and new ones created.

Finally, at the bottom, you can see a "scheduler dry-run." This shows that the job allocated successfully and would be deployed as part of a rolling update policy at 10-second intervals.
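The Job Modify Index printed at the bottom acts as a check-and-set token: passing it back when submitting tells Nomad to apply the change only if the job has not been modified since the plan was generated. A sketch of that workflow, using the index from the plan above:

# Submit only if the job is still at modify index 7; if another change
# landed in the meantime, Nomad rejects the run and a fresh plan is needed.
$ nomad run -check-index 7 example.nomad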

Nomad plan can be used for new or existing jobs. It will never modify the state of your cluster, so it is a safe operation to run. You can read more about nomad plan in the documentation.

»Live Resource Utilization

Nomad can now report the actual resource utilization for tasks, nodes, and allocations.

All jobs in Nomad must declare how many resources they require: compute, memory, network, etc. It is often difficult to know what to request up front, so we recommend over-allocating resources initially and then tuning them down as you determine the actual resource usage. Determining this actual usage has until now been very difficult. With Nomad 0.4, you can easily inspect the resource usage for a job, task, or node.

The example below shows the resource usage for a task:


$ nomad alloc-status abcd1234
Task: "www"
CPU      Memory MB  Disk MB  IOPS  Addresses
100/250  212/256    300      0

In Nomad 0.3 and earlier, the CPU and memory would simply match the requested values in the job specification. With Nomad 0.4 the values shown are actual, live values from the allocation.
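Numbers like these can feed straight back into the job specification. As a hypothetical adjustment based on the usage above (the figures are illustrative only), an operator might trim the CPU request while leaving memory alone:

resources {
  # Observed usage was ~100 MHz of the 250 MHz requested, so the request
  # can come down; memory stays put since 212 of 256 MB is in use.
  cpu    = 150 # MHz
  memory = 256 # MB
}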

Nodes can show even more detailed information:


$ nomad node-status -stats abcd1234
...

Detailed CPU Stats
CPU    = cpu0
User   = 1.03%
System = 0.00%
Idle   = 98.97%

CPU    = cpu1
User   = 1.00%
System = 2.00%
Idle   = 93.00%

Detailed Memory Stats
Total     = 2.1 GB
Available = 1.9 GB
Used      = 227 MB
Free      = 1.4 GB

Detailed Disk Stats
Device         = /dev/mapper/ubuntu--1404--vbox--vg-root
MountPoint     = /
Size           = 41 GB
Used           = 3.4 GB
Available      = 36 GB
Used Percent   = 8.14%
Inodes Percent = 4.94%

All of this data is up to date and can be used by operators to better tune jobs, find poorly behaving applications, etc.

»Simpler Clustering

Nomad 0.4 brings two key changes that simplify creating and operating a cluster. Cluster creation is now automatic when using Consul: Nomad servers and clients auto-register services and health checks with Consul, and these registrations are then used to discover the Nomad servers. Given a federated Consul deployment, Nomad servers will even automatically federate across regions!
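With a local Consul agent running, little or no extra Nomad configuration is needed. The sketch below is an assumption for illustration: it spells out the agent's consul block with Consul's default local address, which is what Nomad falls back to when nothing is configured:

# Nomad agent configuration (client or server). With a Consul agent at the
# default local address, Nomad 0.4 registers itself and discovers the
# servers automatically, so clients need no hard-coded server list.
consul {
  address = "127.0.0.1:8500"
}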

To improve cluster stability as well as to simplify updating servers, Nomad servers now advertise the full set of Nomad servers to clients via heartbeats. These heartbeats occur roughly every 30 seconds, providing each client with a current view of the Nomad servers in its region. This allows Nomad servers to be immutably upgraded without any configuration change on the clients.

Previously, if you upgraded Nomad servers by bringing up new servers and retiring the old ones after Raft replication occurred, you would have to update every client's configuration to point at the new server addresses. This upgrade process was burdensome and error-prone.

In Nomad 0.4, servers notify clients of the full list of servers and thus allow you to roll the Nomad servers without any configuration change to any client.

»Conclusion

Nomad is still a very young project, but it has been exciting to watch adoption grow and to see Nomad running production workloads. Nomad 0.3 brought million-container scalability along with some big features. For Nomad 0.4, we chose to focus on features that improve operational confidence.

Moving forward, we have some big features planned: native Vault integration, deeper Consul integration, and more. Exact roadmap details will emerge closer to each release.

Visit the Nomad website to learn more.

