We are excited to release Consul 0.7.1. Consul is a critical infrastructure service for organizations that rely on it for service discovery, key/value storage, and health checks.
The focus for Consul 0.7.1 is the ability to take a complete snapshot of Consul's state and restore it for disaster recovery. With three or more servers, Consul is highly available, but if a cluster is lost completely, it's essential that organizations have a workflow to quickly restore Consul as the source of truth for the status of applications and resources in their infrastructure.
New snapshot CLI and APIs provide an easy mechanism for operators to capture and restore the complete state of a Consul cluster. Consul Enterprise adds a new Snapshot service to automatically schedule taking snapshots, sending them off site, and rotating them.
Read on to learn more about snapshot and restore features in this release. You can also read the 0.7.1 change log for details on features like the new key/value store CLI, AWS auto discovery, and more.
Consul Snapshot CLI and API
Consul now has a snapshot/restore interface as part of a new consul snapshot CLI and /v1/snapshot API namespace. Consul already performs snapshots internally as part of its Raft integration, so we leveraged that capability to make a simple API for snapshot and restore. Because it uses the same mechanism Consul itself uses to manage its Raft state, a snapshot automatically picks up every object type that is added to the state store, so backup clients won't need to be updated as new types are introduced in the future.
To save a snapshot, the command is consul snapshot save <file>, which saves the snapshot to the supplied file. To restore a snapshot, the command is just consul snapshot restore <file>, which restores the snapshot from the supplied file. Restores can be applied directly to a running cluster, so no special startup orchestration is needed to perform a restore. It's also possible to restore into a single-process cluster running in -dev mode, which makes it easy to work with snapshot data during development.
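The same save and restore operations are available over HTTP for scripted backups. The following is a minimal Python sketch of a client for the /v1/snapshot endpoint. It assumes a local agent at http://127.0.0.1:8500 with ACLs disabled, so no token is sent. A GET returns the snapshot archive, and a PUT with the archive as the request body restores it. The helper names (snapshot_url, save_snapshot, restore_snapshot) are hypothetical. The dc and stale query parameters and the X-Consul-Index response header are assumptions, so check them against the API documentation for your Consul version.

```python
import urllib.parse
import urllib.request

# Assumption: a local Consul agent serving its HTTP API on the default port
# with ACLs disabled (no token is sent with these requests).
DEFAULT_ADDR = "http://127.0.0.1:8500"


def snapshot_url(addr=DEFAULT_ADDR, dc=None, stale=False):
    """Build the /v1/snapshot URL, optionally targeting a datacenter or allowing stale reads."""
    params = {}
    if dc:
        params["dc"] = dc
    if stale:
        # Assumed flag: lets a non-leader server answer the snapshot request.
        params["stale"] = ""
    query = urllib.parse.urlencode(params)
    return addr.rstrip("/") + "/v1/snapshot" + ("?" + query if query else "")


def save_snapshot(path, addr=DEFAULT_ADDR, dc=None, stale=False):
    """GET a snapshot archive, stream it to `path`, and return the X-Consul-Index header.

    urlopen raises urllib.error.HTTPError for non-2xx responses.
    """
    with urllib.request.urlopen(snapshot_url(addr, dc, stale)) as resp, open(path, "wb") as out:
        # Copy in chunks so a large snapshot is never held fully in memory.
        while True:
            chunk = resp.read(64 * 1024)
            if not chunk:
                break
            out.write(chunk)
        return resp.headers.get("X-Consul-Index")


def restore_snapshot(path, addr=DEFAULT_ADDR, dc=None):
    """PUT a previously saved archive back into a running cluster; return the HTTP status."""
    with open(path, "rb") as f:
        data = f.read()  # whole archive in memory: fine for a sketch, stream it for huge snapshots
    req = urllib.request.Request(snapshot_url(addr, dc), data=data, method="PUT")
    with urllib.request.urlopen(req) as resp:
        return resp.status
```

With an agent running, calling save_snapshot("consul.snap") and then restore_snapshot("consul.snap") mirrors consul snapshot save and consul snapshot restore. The index returned by save_snapshot should correspond to the index the CLI reports, assuming the endpoint sets that header.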
Here is example output from running consul snapshot to save and restore a snapshot:
$ consul snapshot save consul.snap
Saved and verified snapshot to index 539

$ consul snapshot restore consul.snap
Restored snapshot
A number of existing open source backup tools perform a subset of this function. Most of them are limited to the KV store, or to the KV store plus some subset of Consul's other features. The new snapshot API provides an atomic, point-in-time snapshot of all state on the Consul servers, including key/value entries, the service catalog, prepared queries, sessions, and ACLs.
Consul Enterprise provides a new service that integrates with the snapshot API to automatically manage taking snapshots, sending them off site, and rotating them.
The service is a new consul snapshot agent subcommand that uses the new /v1/snapshot API to automatically back up Consul. Here's a summary of its features:
- Automatically registers with the Consul agent as the "consul-snapshot" service, and registers health checks to show that it's still alive and able to perform backups.
- Uses Consul's key/value store to coordinate electing a leader and handling failovers automatically. It's simple to achieve a highly available snapshot service by running multiple agents.
- Snapshots can run at a configured interval with the agent as a long-running daemon, or in a one-shot mode which is useful for snapshotting from batch jobs.
- A simple retain setting allows for a configurable number of snapshots to be saved and automatically rotated out. Rotation works in daemon or one-shot mode, and can also be disabled to allow snapshots to accumulate indefinitely.
- Snapshots can be stored locally or pushed to Amazon S3. The architecture is set up to easily add other storage back ends in the future.
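The retain setting amounts to a keep-newest-N policy. The following Python sketch is an illustration only, not the Enterprise agent's actual implementation. It rotates locally stored snapshots using a hypothetical consul-<timestamp>.snap naming scheme and treats a retain value of 0 or less as "rotation disabled". Both of these are assumptions made for the sketch.

```python
import os
import re

# Hypothetical naming scheme for this sketch: "consul-<unix-nanos>.snap".
SNAP_RE = re.compile(r"^consul-(\d+)\.snap$")


def snapshots_to_remove(names, retain):
    """Return the snapshot file names to delete so only the newest `retain` remain.

    Assumption for this sketch: retain <= 0 disables rotation entirely.
    """
    if retain <= 0:
        return []
    stamped = []
    for name in names:
        match = SNAP_RE.match(name)
        if match:  # ignore files that are not snapshots
            stamped.append((int(match.group(1)), name))
    stamped.sort(reverse=True)  # newest (largest timestamp) first
    return [name for _, name in stamped[retain:]]


def rotate(directory, retain):
    """Delete the oldest snapshots in `directory` beyond `retain`; return what was removed."""
    doomed = snapshots_to_remove(os.listdir(directory), retain)
    for name in doomed:
        os.remove(os.path.join(directory, name))
    return doomed
```

The sketch sorts on the parsed numeric timestamp rather than the file name, which avoids misordering when timestamps differ in digit count. Files that don't match the naming scheme are never deleted.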
Here is example output from running the snapshot agent:
$ consul snapshot agent -aws-s3-bucket=hc-test-slackpad -aws-s3-region=us-east-1
==> Consul snapshot agent running!
    Version: v0.7.1_ent
    Datacenter: (default)
    Interval: "1h0m0s"
    Retain: 30
    Stale: false
    Mode: Daemon
    Service: "consul-snapshot"
    Deregister After: "72h0m0s"
    Lock Key: "consul-snapshot/lock"
    Max Failures: 3
    Snapshot Storage: Amazon S3
        -> Region: "us-east-1"
        -> Bucket: "hc-test-slackpad"
        -> Key Prefix: "consul-snapshot"
==> Log data will now stream in as it occurs:
2016/11/18 13:16:12 [INFO] Waiting to obtain leadership...
2016/11/18 13:16:12 [INFO] Obtained leadership
2016/11/18 13:16:13 [INFO] Saved snapshot 1479503772206993731
To learn more about Consul snapshot and restore, read the documentation, which goes into further technical detail on the open source API and the Enterprise snapshot service.