Consul Auto-Join with Cloud Metadata

Mar 28 2017    Nic Jackson

We work in a world of distributed systems which operate in rapidly changing environments. Servers come and go, they move across regions and distribution groups, and somehow they need to communicate and connect to one another.

To solve this problem, HashiCorp created Consul, which, among many other things, enables service registration and service discovery. Application instances register themselves with Consul, and dependent instances query Consul to discover each other. Since Consul itself is a distributed system, this creates a chicken-and-egg problem: how do you bootstrap your service discovery?

Automation Challenges

How do you discover your service discovery? Traditionally this has been a challenge for distributed systems. The technique often involves spinning up a cluster in one operation and then, once the IP addresses are known, performing a second operation to join the nodes together. This two-step approach not only makes automation challenging, but also raises questions about how the system behaves when a node is lost. Autoscaling could bring another node online, but an operator would still need to manually join the node to the cluster.

Consul Auto-Join for EC2

Consul 0.7.1 introduced new functionality which allows agents to discover one another using cloud metadata. This blog post explores leveraging AWS metadata to auto-join and auto-scale a Consul cluster.

The latest documentation for Consul shows the new options we can specify in the Consul configuration file or as startup parameters (an example configuration follows the list):

  • -retry-join-ec2-tag-key - The Amazon EC2 instance tag key to filter on. When used with -retry-join-ec2-tag-value, Consul will attempt to join EC2 instances with the given tag key and value on startup.
  • -retry-join-ec2-tag-value - The Amazon EC2 instance tag value to filter on.
  • -retry-join-ec2-region - (Optional) The Amazon EC2 region to use. If not specified, Consul will use the local instance's EC2 metadata endpoint to discover the region.
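In the JSON configuration file these options live under a retry_join_ec2 block. A minimal sketch of such a configuration might look like the following; the tag key consul_join and the value true are placeholders chosen for this post, not required names:

{
  "retry_join_ec2": {
    "tag_key": "consul_join",
    "tag_value": "true"
  }
}

The region is omitted here because, as noted above, Consul can discover it from the local instance's EC2 metadata endpoint.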

The new feature requires permission to read the AWS instance state, and there are a variety of options available to grant these permissions (the environment-variable option is illustrated after this list).

  • Static credentials (from the config file)
  • Environment variables (AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY)
  • Shared credentials file (~/.aws/credentials or the path specified by AWS_SHARED_CREDENTIALS_FILE)
  • ECS task role metadata (container-specific)
  • EC2 instance role metadata
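If you opt for environment variables, the credentials are simply exported before the agent starts, for example (placeholder values):

export AWS_ACCESS_KEY_ID=<your access key id>
export AWS_SECRET_ACCESS_KEY=<your secret access key>

In this post, however, we rely on the EC2 instance role, so no static credentials are required.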

The startup process for the AWS instance is as follows (a sample agent command is shown after the list):

  • The instance bootstraps and installs consul
  • Init system starts consul with the configuration to join via EC2 metadata
  • On start, consul calls the EC2 API (ec2:DescribeInstances) to list instances and their tags
  • Consul extracts the private IP addresses of other EC2 instances which have the configured tag name and tag value from the metadata
  • Consul runs consul join on those private IP addresses
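For example, a server node's init system might launch the agent with flags along these lines; the tag key/value, data directory, and server count are placeholders for this sketch, not the exact values used in the example repository:

consul agent -server -bootstrap-expect=3 \
  -data-dir=/opt/consul \
  -retry-join-ec2-tag-key=consul_join \
  -retry-join-ec2-tag-value=true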

(Diagram: flow between the EC2 API/metadata and the Consul instances)

The method we are using in this example is the EC2 instance role metadata. By assigning the ec2:DescribeInstances permission to the instance's IAM role, we can give Consul the access it needs without granting any other control over the AWS account.

{ "Version": "2012-10-17", "Statement": [ { "Effect": "Allow", "Action": "ec2:DescribeInstances", "Resource": "*" } ] }

Auto-Joining in Action

The repository at https://github.com/hashicorp/consul-ec2-auto-join-example includes a Terraform configuration to demonstrate this functionality. To start and bootstrap the cluster, modify terraform.tfvars to add your AWS credentials and default region, then run terraform plan followed by terraform apply to create the cluster.

aws_region = "eu-west-1"

aws_access_key = "[AWS_ACCESS_KEY]"

aws_secret_key = "[AWS_SECRET]"

Once this is all up and running, you will see some output from Terraform showing the IP addresses of the created agents and servers.

Outputs:

clients = [
    34.253.136.132,
    34.252.238.49
]
servers = [
    34.251.206.78,
    34.249.242.227,
    34.253.133.165
]

After provisioning, it is possible to log in to one of the client nodes via SSH using one of the client IP addresses output by Terraform.

$ ssh ubuntu@34.253.136.132

The cluster should be auto-joined, since the instances share the same auto-join tag value.
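The tag itself is applied when the instances are created. In Terraform this is simply a tags map on the aws_instance resource; a minimal sketch follows, where the AMI, instance type, and the consul_join tag key are placeholder values rather than the exact ones used in the example repository:

resource "aws_instance" "consul_server" {
  ami                  = "ami-00000000"   # placeholder AMI
  instance_type        = "t2.micro"
  iam_instance_profile = "${aws_iam_instance_profile.consul.name}"

  # The tag key and value must match -retry-join-ec2-tag-key/-value
  tags = {
    Name        = "consul-blog-server"
    consul_join = "true"
  }
}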

(Screenshot: AWS console listing the instance tags used for Consul auto-join)

Running the consul members command will show all members of the cluster and their status (both clients and servers).

$ consul members
Node                  Address          Status  Type    Build  Protocol  DC
consul-blog-client-0  10.1.1.189:8301  alive   client  0.7.5  2         dc1
consul-blog-client-1  10.1.2.187:8301  alive   client  0.7.5  2         dc1
consul-blog-server-0  10.1.1.241:8301  alive   server  0.7.5  2         dc1
consul-blog-server-1  10.1.2.24:8301   alive   server  0.7.5  2         dc1
consul-blog-server-2  10.1.1.26:8301   alive   server  0.7.5  2         dc1

This cluster automatically bootstrapped with no human intervention, but what about failure scenarios?

Without the auto-join functionality, scaling Consul servers can be challenging and often involves operator participation. With auto-join, scaling up or down is incredibly easy; in fact, we do not have to do anything. To demonstrate this, edit the terraform.tfvars file to increase the number of servers to 5 and re-run terraform plan and terraform apply.
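Assuming the example repository exposes the server count as a variable (the variable name servers below is an assumption for this sketch), the scale-up is a one-line change in terraform.tfvars:

servers = 5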

$ terraform plan
Plan: 2 to add, 0 to change, 0 to destroy.

$ terraform apply
Apply complete! Resources: 2 added, 0 changed, 0 destroyed.

The state of your infrastructure has been saved to the path below. This state is required to modify and destroy your infrastructure, so keep it safe. To inspect the complete state use the terraform show command.

State path: terraform.tfstate

Outputs:

clients = [
    34.253.136.132,
    34.252.238.49
]
servers = [
    34.251.206.78,
    34.249.242.227,
    34.253.133.165,
    34.252.132.0,
    34.253.148.148
]

Run consul members again after the new servers have finished provisioning. It might take a few seconds for the new servers to join the cluster, but they will appear in the member list:

Node                  Address          Status  Type    Build  Protocol  DC
consul-blog-client-0  10.1.1.189:8301  alive   client  0.7.5  2         dc1
consul-blog-client-1  10.1.2.187:8301  alive   client  0.7.5  2         dc1
consul-blog-server-0  10.1.1.241:8301  alive   server  0.7.5  2         dc1
consul-blog-server-1  10.1.2.24:8301   alive   server  0.7.5  2         dc1
consul-blog-server-2  10.1.1.26:8301   alive   server  0.7.5  2         dc1
consul-blog-server-3  10.1.2.44:8301   alive   server  0.7.5  2         dc1
consul-blog-server-4  10.1.1.75:8301   alive   server  0.7.5  2         dc1

The same applies when scaling down: there is no need to manually remove nodes, as long as we stay above the originally configured minimum number of servers (3 in this example). To demonstrate this, decrease the number of servers in terraform.tfvars and run terraform plan and terraform apply again. The deprovisioned server nodes will show in the member list as failed, but the cluster remains fully operational.

Node                  Address          Status  Type    Build  Protocol  DC
consul-blog-client-0  10.1.1.189:8301  alive   client  0.7.5  2         dc1
consul-blog-client-1  10.1.2.187:8301  alive   client  0.7.5  2         dc1
consul-blog-server-0  10.1.1.241:8301  alive   server  0.7.5  2         dc1
consul-blog-server-1  10.1.2.24:8301   alive   server  0.7.5  2         dc1
consul-blog-server-2  10.1.1.26:8301   alive   server  0.7.5  2         dc1
consul-blog-server-3  10.1.2.44:8301   failed  server  0.7.5  2         dc1
consul-blog-server-4  10.1.1.75:8301   failed  server  0.7.5  2         dc1
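Consul will eventually reap failed members on its own, but if you want to clear them from the member list immediately you can remove them by name with the force-leave command, for example:

$ consul force-leave consul-blog-server-3
$ consul force-leave consul-blog-server-4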

Summary

The Consul EC2 auto-join functionality enables seamless bootstrapping and auto-scaling of Consul clusters by leveraging cloud metadata. This post demonstrated it on AWS EC2, but the same capability is also available for Google Cloud, and Consul's roadmap includes support for additional cloud providers. We hope you enjoy this new functionality and look forward to future improvements.
