HashiCorp Consul 0.6

Dec 07 2015 James Phillips

We are excited to release Consul 0.6, a major update with many new features and improvements. Consul is a modern datacenter runtime that provides service discovery, configuration, and orchestration capabilities in an easy-to-deploy Go binary. It is distributed, highly available, and proven to scale to tens of thousands of nodes with services across multiple datacenters.

This release has taken a few months to prepare as it involved core changes like migrating to a totally new in-memory database and adding a new network tomography subsystem that spans several internal layers. Despite all the major changes, we've worked hard to make upgrading from Consul 0.5.2 a standard upgrade that will usually just require an agent restart with the new binary.

There are a huge number of features, bug fixes, and improvements in Consul 0.6. Consul is also now 100% pure Go, making it even easier to build and deploy. Some of the feature highlights are described below.

You can download Consul 0.6 here or view the changelog.

Read on to learn more about the major new features in 0.6.

Network Tomography

Consul's new network tomography subsystem computes network coordinates for the nodes in a cluster, which can be used to estimate the round trip time (RTT) between any two nodes. The new consul rtt command exposes these estimates:

» Get the estimated RTT between two other nodes from a third node

$ consul rtt nyc3-server-1 nyc3-server-3
Estimated nyc3-server-1 <-> nyc3-server-3 rtt: 1.210 ms (using LAN coordinates)

» Get the estimated RTT from current server to one in another datacenter

$ consul rtt -wan sfo1-server-1.sfo1
Estimated sfo1-server-1.sfo1 <-> nyc3-server-2.nyc3 rtt: 79.291 ms (using WAN coordinates)

Note in the last command above that network coordinates are also available for the servers in the WAN pool. This is useful for applications such as failing over to the next closest datacenter if no services are available in the local datacenter.

Consul 0.6 also adds a ?near= parameter to the catalog and health HTTP endpoints, which sorts the returned results by their estimated round trip time from the given node.

» Get the full list of nodes sorted relative to a specific node by RTT

$ curl localhost:8500/v1/catalog/nodes?near=nyc3-server-2

Finally, Consul 0.6 exposes a new HTTP API for raw network coordinates for use in any external application, allowing for powerful new placement, failover, and other types of RTT-aware algorithms. Here's an example showing what a network coordinate looks like:

$ curl localhost:8500/v1/coordinate/nodes?pretty
[
  {
    "Node": "nyc3-server-1",
    "Coord": {
      "Vec": [
        0.01566849390443588,
        -0.041767933427767884,
        -0.039999165076651244,
        -0.03846615252705054,
        -0.028208175963693814,
        0.021521109674738765,
        -0.03259531251031965,
        0.0392771535734132
      ],
      "Error": 0.2093820069437395,
      "Adjustment": 0.00038434248982906666,
      "Height": 1.8735554114160737e-05
    }
  },
...

Even though it looks like there are a lot of components in a network coordinate, calculating an RTT estimate is very simple. You can find out how these calculations are performed, and learn more about the techniques Consul uses to improve the accuracy of its RTT estimates, by reading the Consul and Serf internals guides on network coordinates.

New In-Memory Database

Previous versions of Consul used LMDB for the in-memory database that backs all of a Consul server's data such as the catalog, key/value store, and sessions. LMDB performed well and was very stable, but by moving to a new in-memory database based on radix trees, we were able to tailor this core component exactly to Consul's needs, bringing several important benefits in Consul 0.6.

The design of the database allows data to be directly referenced in the radix tree, eliminating the need to deserialize structures during read queries. This dramatically improves the read throughput of Consul's servers, and lowers garbage collection pressure. Direct references, combined with the use of Go 1.5.1 with its improved garbage collection performance, helps make Consul servers more stable under heavy read loads that are common in large, busy clusters.
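Consul's database is built on immutable radix trees, which share structure between versions; the full copy per write below is a deliberate simplification. This minimal copy-on-write sketch (the Store type is hypothetical, not Consul code) shows why direct references are safe on the read path: readers hold a snapshot that is never mutated, so reads need no locks and no deserialization.

```go
package main

import "fmt"

// Store is a simplified copy-on-write snapshot. A real radix tree
// shares unchanged subtrees between versions instead of copying.
type Store struct {
	data map[string]string
}

// Insert returns a new snapshot containing the write. The receiver is
// never mutated, so any reader holding it keeps a consistent view.
func (s *Store) Insert(k, v string) *Store {
	next := make(map[string]string, len(s.data)+1)
	for key, val := range s.data {
		next[key] = val
	}
	next[k] = v
	return &Store{data: next}
}

// Get reads directly from the snapshot with no locking.
func (s *Store) Get(k string) (string, bool) {
	v, ok := s.data[k]
	return v, ok
}

func main() {
	v1 := &Store{data: map[string]string{}}
	v2 := v1.Insert("service/redis", "10.0.0.1")
	_, okOld := v1.Get("service/redis")
	v, okNew := v2.Get("service/redis")
	fmt.Println(okOld, okNew, v) // the old snapshot is unchanged; only the new one sees the write
}
```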

Another benefit is that LMDB was the last cgo dependency for Consul, so removing it makes Consul 100% pure Go code, which simplifies the build process and makes it easier to build Consul for new platforms.

Finally, this new database provides a good foundation for improving the granularity of blocking queries in a future version of Consul. That work will build on support from the database itself and will also exploit the radix tree structure.

Prepared Queries

Consul 0.6 introduces prepared queries, which let complex service discovery queries be registered with Consul and then executed by name or ID, including via DNS. Registering a prepared query returns its ID:

{"ID":"5e1e24e5-1329-f86f-18c6-3d3734edb2cd"}

Once this prepared query is registered with Consul, a DNS lookup for redis-with-failover.query.consul will execute the following steps:

  1. Look for healthy instances of the "redis" service in the local datacenter

  2. Filter those to instances that have the required tag "master" but not the disallowed tag "experimental"

  3. If no instances are available in the local datacenter, then query up 3 of the nearest remote datacenters, based on RTT data from network tomography

  4. Return results with a DNS TTL of 10 seconds
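A definition implementing the steps above might look roughly like the following. The field names follow the prepared query HTTP API, but treat the exact shape as an illustrative sketch rather than a verbatim registration payload:

```json
{
  "Name": "redis-with-failover",
  "Service": {
    "Service": "redis",
    "Tags": ["master", "!experimental"],
    "OnlyPassing": true,
    "Failover": {
      "NearestN": 3
    }
  },
  "DNS": {
    "TTL": "10s"
  }
}
```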

As shown in the example above, this simple redis-with-failover.query.consul prepared query exposes a great deal of functionality in a way that would be impractical to encode into a DNS query. Other properties of prepared queries include service discovery ACL support, session integration, and a user-defined list of datacenters for failovers. Existing prepared queries can also be updated on the fly to change their behavior without having to change any Consul agent configurations.

Prepared queries were designed for future extensibility so that DNS can continue to support new features offered by Consul as it evolves. Query results are also available via a new HTTP endpoint for integration with other applications.

More details are available in the prepared query HTTP endpoint documentation.

New ACL Policies

Consul 0.6 extends ACL policies to cover more of Consul's features, including user events and the encryption keyring. For example, this policy allows firing any user event and managing the keyring:

event "" {
  policy = "write"
}

keyring = "write"

More details are available in the ACL documentation and the upgrade process documentation.

TCP and Docker Container Health Checks

Consul 0.6 introduces two new types of health checks. A TCP check performs a simple connect at a periodic interval. Here's an example definition:

{
  "check": {
    "id": "ssh",
    "name": "SSH TCP on port 22",
    "tcp": "localhost:22",
    "interval": "10s",
    "timeout": "1s"
  }
}

A Docker check runs a script inside a container at a periodic interval. An optional shell can be specified, or else /bin/sh will be used by default. Here's an example configuration:

{
  "check": {
    "id": "mem-util",
    "name": "Memory utilization",
    "docker_container_id": "f972c95ebf0e",
    "shell": "/bin/bash",
    "script": "/usr/local/bin/check_mem.py",
    "interval": "10s"
  }
}

More details can be found in the health check documentation.

Better Low-Level Network Robustness

Consul 0.6 introduces many low-level changes to help make Consul more stable in the face of misconfigured or misbehaving networks.

A TCP fallback probe was added to Serf's node failure detector in order to help operators diagnose a common misconfiguration where TCP traffic is allowed between nodes but not UDP. A log message will alert the operator to the problem, but the node will still be probed successfully, preventing flapping in failure detection. This also helps ride out brief periods of high packet loss by providing a more reliable alternate path to probe another node.

Consul's stream multiplexer and connection pool logic was also improved to provide much lower overhead when network reconnects occur, as well as reducing memory usage when connections are idle. The internal RPC mechanism was also updated to reduce overhead for the Consul servers. These changes all help make Consul more stable under heavy loads, even during periods with some network instability.

Upgrade Details

Consul 0.6 has been designed to "upshift" automatically as servers and clients are upgraded to start taking advantage of network tomography features and new protocol features such as the TCP fallback probe. For most configurations, upgrading will just require an agent restart with the new binary.

Prepared queries introduce new internal commands and must not be used until all servers have been upgraded to version 0.6.0.
