
HashiCorp Nomad 0.3

We are proud to announce the release of Nomad 0.3. Nomad is a distributed, scalable and highly available cluster manager and scheduler designed for both microservice and batch workloads.

This release introduces new features, hardens core components, and improves UX across the board on the path toward ensuring Nomad is ready to run in production. Feature highlights include:

  • Periodic jobs for running batch workloads on a cron schedule.
  • Log rotation and file system APIs for managing and inspecting task logs.
  • Job queues that hold jobs until cluster resources become available.

Please see the full Nomad 0.3 CHANGELOG for more details.

Download Nomad 0.3 here or read on to learn more about the major new features and improvements in Nomad 0.3.

»Periodic Jobs

Nomad 0.3 introduces periodic jobs, which allow users to run batch jobs on a schedule defined by cron expressions. Periodic jobs can be used in most environments to run workloads such as backups or ETLs. This feature has been heavily requested by the community since Nomad's introduction last year.

Nomad provides a distributed and fault-tolerant environment for running periodic jobs, whereas crontab-based execution systems suffer from the availability issues of single, ephemeral nodes.

The following example shows all that is needed to make a batch Nomad job run periodically, every 15 minutes:

job "backup" {
    ...
    periodic {
        cron = "*/15 * * * *"
    }
    ...
}
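
Filling out the elided portions, a complete periodic job might look like the following sketch. The task, driver, command, and resource details are illustrative assumptions, not part of the example above:

```hcl
job "backup" {
    datacenters = ["dc1"]
    type        = "batch"

    # Run this job on a cron schedule. Nomad's servers handle the
    # scheduling, so there is no single crontab node to fail.
    periodic {
        cron = "*/15 * * * *"
    }

    group "backup" {
        task "backup" {
            # Driver and command are hypothetical.
            driver = "exec"
            config {
                command = "/usr/local/bin/run-backup.sh"
            }
            resources {
                cpu    = 500  # MHz
                memory = 256  # MB
            }
        }
    }
}
```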

The nomad status command shows the past invocations of a periodic job and the timestamp of the next invocation.

$ nomad status backup
ID                   = backup
Name                 = backup
Type                 = batch
Priority             = 50
Datacenters          = dc1
Status               = running
Periodic             = true
Next Periodic Launch = 2016-02-25 00:40:00 +0000 UTC

Previously launched jobs:
ID                           Status
backup/periodic-1456359780  dead
backup/periodic-1456359840  dead
backup/periodic-1456360020  running

$ nomad status backup/periodic-1456360020
ID          = backup/periodic-1456360020
Name        = backup/periodic-1456360020
Type        = batch
Priority    = 50
Datacenters = dc1
Status      = running
Periodic    = false

==> Evaluations
ID        Priority  Triggered By  Status
d451a894  50        periodic-job  complete

==> Allocations
ID        Eval ID   Node ID   Task Group  Desired  Status
b763c7b8  d451a894  d551531b  cache       run      running

»Log Rotation and File System APIs

Log management is critical for operating and debugging applications in production. Nomad 0.3 solves two key aspects of logging:

  • Rotation of stdout and stderr log files.
  • Access to logs without the need to ssh onto the host machines.

Nomad 0.3 provides a log rotation configuration per task. This configuration allows users to control the size and retention of their logs. For example:

logs {
   max_files = 5
   max_file_size = 10
}

The above configuration would retain up to five log files each for stderr and stdout, rotating once a file reaches a size of 10 MB. As the application writes more logs, older log files are purged, limiting the storage needed for logs.
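
In a job file, the logs block lives inside a task stanza. A minimal sketch follows; the task name, driver, and image are illustrative assumptions:

```hcl
task "redis" {
    driver = "docker"
    config {
        image = "redis:latest"
    }

    # Keep at most five 10 MB files each for stdout and stderr,
    # bounding this task's log storage at roughly 100 MB total.
    logs {
        max_files     = 5
        max_file_size = 10
    }
}
```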

Along with log configuration, Nomad 0.3 introduces a new family of commands for viewing a task's file system. These can be used to view the logs or any other file.

For example, to view the files of a redis task running in allocation "c5598dc8":

$ nomad fs ls c5598dc8 alloc/logs/
Mode        Size    Modified Time          Name
-rw-r--r--  0 B     24/02/16 23:19:18 UTC  redis.stderr.0
-rw-r--r--  10 MB   24/02/16 23:22:54 UTC  redis.stdout.3
-rw-r--r--  10 MB   24/02/16 23:22:54 UTC  redis.stdout.4
-rw-r--r--  10 MB   24/02/16 23:22:54 UTC  redis.stdout.5
-rw-r--r--  10 MB   24/02/16 23:22:54 UTC  redis.stdout.6
-rw-r--r--  6.0 MB  24/02/16 23:22:54 UTC  redis.stdout.7

This output indicates that the Nomad client purged redis.stdout.0, redis.stdout.1, and redis.stdout.2, retaining the five most recent stdout log files.

To view one of the log files the following command can be used:

$ nomad fs cat c5598dc8 alloc/logs/redis.stdout.7
<LOG CONTENT>

Nomad's roadmap includes enhancements to the logging subsystem to support both streaming logs and remote log sinks.

»Job Queues

Job queues allow users to schedule jobs even when all resources are exhausted in the cluster. Once resources are added or become available, Nomad will re-evaluate and run the job.
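
As a hypothetical example, a task group asking for more CPU than any single node has free will sit in the queue rather than fail outright. The job below is a sketch; the driver, image, and resource numbers are illustrative:

```hcl
job "redis-cache" {
    datacenters = ["dc1"]
    type        = "service"

    group "cache" {
        task "redis" {
            driver = "docker"
            config {
                image = "redis:latest"
            }

            # If no node has this much unreserved CPU, Nomad creates a
            # blocked evaluation instead of failing the job; the job
            # runs once capacity is added or freed.
            resources {
                cpu    = 4000  # MHz
                memory = 256   # MB
            }
        }
    }
}
```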

The example below demonstrates a job that requires more resources than are currently available. This causes Nomad to create a "blocked" evaluation that will be processed when resource conditions change.

$ nomad run redis-cache.nomad
==> Monitoring evaluation "b58210a7"
    Evaluation triggered by job "redis-cache"
    Scheduling error for group "cache" (failed to find a node for placement)
    Allocation "eb78c6f3" status "failed" (0/1 nodes filtered)
      * Resources exhausted on 1 nodes
      * Dimension "cpu exhausted" exhausted on 1 nodes
    Evaluation status changed: "pending" -> "complete"
==> Evaluation "b58210a7" finished with status "complete"

$ nomad status redis-cache
ID          = redis-cache
Name        = redis-cache
Type        = service
Priority    = 50
Datacenters = dc1
Status      = pending
Periodic    = false

==> Evaluations
ID        Priority  Triggered By  Status
05263d94  50        job-register  blocked
b58210a7  50        job-register  complete

==> Allocations
ID        Eval ID   Node ID  Task Group  Desired  Status
eb78c6f3  b58210a7  <none>   cache       failed   failed

Nomad created a blocked evaluation because the cluster didn't have enough CPU resources to run the job. Once resources were freed up, the blocked evaluation triggered the scheduler to create a new allocation and run the job.

$ nomad status redis-cache
ID          = redis-cache
Name        = redis-cache
Type        = service
Priority    = 50
Datacenters = dc1
Status      = running
Periodic    = false

==> Evaluations
ID        Priority  Triggered By  Status
05263d94  50        job-register  complete
b58210a7  50        job-register  complete

==> Allocations
ID        Eval ID   Node ID  Task Group  Desired  Status
eb79c6fa  05263d94  d551531b cache       running  running
eb78c6f3  b58210a7  <none>   cache       failed   failed

Engineering effort was focused on making job queues extremely efficient. As a side effect of this work, the scheduler is significantly faster.

We will be highlighting the amazing performance improvements we have made in Nomad in an upcoming blog post.

»Upgrade Details

Nomad 0.3 has significant changes that must be understood before upgrading. Nomad's documentation provides upgrade instructions from version 0.2.3.

»Roadmap

Features that are currently planned for the next major release of Nomad are:

  • Support for persistent volumes across all supported drivers.
  • Support for multiple network interfaces and more flexible IP allocation schemes.
  • Enhancements to the logging subsystem to support streaming logs and remote sinks.

As always, we recommend upgrading and testing this release in an isolated environment. Please report any issues on GitHub.

