Cluster Scaling with the HashiCorp Nomad Autoscaler
Back in March, the HashiCorp Nomad team announced the tech preview release of our new project, the Nomad Autoscaler. It brought horizontal application autoscaling to your Nomad workloads, so you no longer have to manage your task group count values by hand.
Today we are happy to announce the new release of the Nomad Autoscaler, which is now in beta.
The highlight of this release is the long-awaited horizontal cluster autoscaling capability. This feature allows you to automatically add or remove clients from your Nomad cluster as your load changes, with initial support for AWS Autoscaling Groups. It’s built on top of the existing functionality of the Nomad Autoscaler, so it’s easy to get started.
» Getting Started with Cluster Scaling
With horizontal application autoscaling, the scaling policy is defined in the jobspec itself, using the new scaling block. With cluster scaling, there is no specific job to attach a policy to, so we added the ability to load policies from files. You can specify the directory where your policies are located using the -policy-dir flag or in the Nomad Autoscaler configuration file:
policy {
  dir = "..."
}
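As a reference point, a minimal agent configuration that combines the policy directory with the Nomad API address might look like the sketch below; the address and directory path are example values, not prescriptions:

# Minimal Nomad Autoscaler agent configuration (example values only).
nomad {
  # Address of the Nomad API endpoint the autoscaler should talk to.
  address = "http://127.0.0.1:4646"
}

policy {
  # Directory containing the scaling policy files described below.
  dir = "/etc/nomad-autoscaler/policies"
}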
Scaling policy files are written in HCL, the same syntax used to write Nomad jobspecs. Here is an example cluster scaling policy:
enabled = true
min     = 1
max     = 10

policy {
  cooldown            = "2m"
  evaluation_interval = "1m"

  check "cpu_allocated_percentage" {
    source = "prometheus"
    query  = "scalar(sum(nomad_client_allocated_cpu/(nomad_client_unallocated_cpu + nomad_client_allocated_cpu))/count(nomad_client_allocated_cpu))"

    strategy "target-value" {
      target = 70
    }
  }

  check "mem_allocated_percentage" {
    source = "nomad_apm"
    query  = "cpu_high-memory"

    strategy "target-value" {
      target = 70
    }
  }

  target "aws-asg" {
    dry-run             = "false"
    aws_asg_name        = "hashistack-nomad_client"
    node_class          = "hashistack"
    node_drain_deadline = "5m"
  }
}
If you're familiar with Nomad's support for application scaling policies, this is similar to how a scaling block would look in a jobspec. However, cluster autoscaling brings a few changes to scaling policies that are worth mentioning.
First, each policy can now have one or more check blocks. Previously, a policy could only look at a single metric value to make scaling decisions, which is very limiting for something as complex as deciding when to scale your cluster.
With multiple checks, you can now specify several queries, each retrieving a metric that is relevant to your infrastructure. The Nomad Autoscaler runs all of them and acts on the result it considers safest for the current situation.
As before, you can use one of the available APM plugins to read your metrics from different sources. Currently we support Prometheus and native Nomad metrics. We are working on adding support for more sources, and external plugins can be easily deployed alongside the autoscaler.
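For example, reading metrics from Prometheus requires an apm block in the autoscaler agent configuration. The following is a minimal sketch; the address shown is an assumption, so point it at your own Prometheus endpoint:

# APM plugin block in the Nomad Autoscaler agent configuration.
# The address is an example value; replace it with your own Prometheus server.
apm "prometheus" {
  driver = "prometheus"

  config = {
    address = "http://prometheus.example.com:9090"
  }
}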
The second important addition is the new aws-asg target plugin which, as the name suggests, is used to interact with an Autoscaling Group on AWS. When scaling your cluster, the Nomad Autoscaler takes care of the labor-intensive process of draining clients and adding or removing instances from your AWS ASG.
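The aws-asg plugin is enabled the same way, with a target block in the agent configuration. The sketch below assumes the plugin reads its region from this block and picks up AWS credentials from the standard SDK environment; the values are examples only:

# Target plugin block in the Nomad Autoscaler agent configuration.
# The region is an example value; credentials are assumed to come from the
# usual AWS environment (environment variables, instance profile, etc.).
target "aws-asg" {
  driver = "aws-asg"

  config = {
    aws_region = "us-east-1"
  }
}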
» Trying it Out For Yourself
We prepared a demo so you can try autoscaling a cluster for yourself. It uses HashiCorp Terraform and Packer to provision the entire infrastructure on AWS, so it’s easy to follow along.
» Let Us Know What You Think
As of today, the Nomad Autoscaler is out of tech preview and into its beta cycle. We are always happy to hear from our community, so if you have any questions, comments, feature requests, or any other type of feedback, feel free to file an issue or find us at our discussion forum.