
Configuring Disaster Recovery Replication

Published 8:00 AM UTC Feb 23, 2018

This guide walks through configuring disaster recovery replication so that a standby Vault cluster is ready to take over if the active cluster fails.

Vault Enterprise's disaster recovery replication keeps a standby Vault cluster synchronized with an active Vault cluster. This mode of replication includes data such as ephemeral authentication tokens, time-based token information, and token usage data. This makes an aggressive recovery point objective possible in environments where high availability is of the utmost concern.

Note: If you have not yet set up Vault Enterprise's replication capabilities, see Setting up and configuring Performance Replication.

Use cases

The primary use case for disaster recovery (DR) replication is to meet business continuity requirements in the face of a catastrophic failure of an entire cluster, that is, to allow failover to a standby Vault cluster.

In DR replication, secondaries share the same underlying configuration, policy, and supporting secrets infrastructure (K/V values, encryption keys for transit, etc.) as the primary. They also share the same token and lease infrastructure as the primary, so that applications connected to the original primary can continue operating once the DR secondary is promoted.

Note that DR secondaries do not service read or write requests until they are promoted to become the new primary.

Configuration

Disaster recovery provides new options for how Vault users can architect multi-cluster and multi-data center environments.

Like performance secondary clusters, secondary clusters linked to a primary in disaster recovery replication mirror the security and configuration infrastructure of the primary cluster. Unlike performance secondaries, however, DR secondary clusters do not service reads or forward writes. They simply mirror data from their primary and wait until they are promoted to serve as the primary after the primary’s failure.
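As a rough illustration of how a DR secondary gets linked to its primary, the sketch below builds (but does not send) the three Vault Enterprise HTTP API calls involved: enabling DR replication on the primary, generating a secondary activation token, and enabling DR replication on the secondary with that token. The endpoint paths follow Vault's `sys/replication/dr/...` API; the cluster addresses, tokens, and the `cluster-c` identifier are hypothetical placeholders, and parameters can differ between Vault releases, so check the API documentation for your version.

```python
import json
import urllib.request

# Hypothetical addresses for a primary (Cluster A) and a DR secondary (Cluster C).
PRIMARY_ADDR = "https://vault-a.example.com:8200"
SECONDARY_ADDR = "https://vault-c.example.com:8200"


def vault_post(addr, path, token, payload=None):
    """Build, but do not send, an authenticated POST to the Vault HTTP API."""
    return urllib.request.Request(
        url=f"{addr}/v1/{path}",
        data=json.dumps(payload or {}).encode("utf-8"),
        headers={"X-Vault-Token": token, "Content-Type": "application/json"},
        method="POST",
    )


def enable_dr_primary(primary_token):
    # Step 1: turn on DR replication on the primary cluster.
    return vault_post(PRIMARY_ADDR, "sys/replication/dr/primary/enable", primary_token)


def dr_secondary_token(primary_token, secondary_id):
    # Step 2: ask the primary for an activation token for the secondary
    # identified by `secondary_id` (the id must be unique per secondary).
    return vault_post(PRIMARY_ADDR, "sys/replication/dr/primary/secondary-token",
                      primary_token, {"id": secondary_id})


def enable_dr_secondary(secondary_token, activation_token):
    # Step 3: attach the standby cluster as a DR secondary using the activation
    # token from step 2. Vault's documentation warns that this clears the
    # secondary's existing data.
    return vault_post(SECONDARY_ADDR, "sys/replication/dr/secondary/enable",
                      secondary_token, {"token": activation_token})


req = dr_secondary_token("hypothetical-primary-token", "cluster-c")
print(req.get_method(), req.full_url)
```

Passing any of these requests to `urllib.request.urlopen` would actually send it. In current Vault releases the activation token from step 2 comes back response-wrapped (under `wrap_info.token`), and that value is what gets passed to `enable_dr_secondary`.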

It is also possible to create sophisticated architectures of primaries and secondaries that both scale read/write performance and satisfy disaster recovery needs, by combining performance replication and DR replication.

Figure: Sample DR cluster architecture

For example, in the above architecture, Cluster A replicates data to Clusters B and C, each in a separate, geographically distributed data center or availability zone.

To scale performance for users and applications in another part of the world, the Vault user has placed Cluster B in a region physically closer to a large, distant hub of applications that use Vault. B is linked to A with performance replication: applications read secrets locally from B, and B forwards write requests to the primary cluster, A.

To protect against a catastrophic failure of A, the user also establishes Cluster C and links it to A with disaster recovery replication. If A fails, C can be promoted to resume A's operations, including serving read/write requests from B, without requiring applications to generate and authenticate new tokens to access secrets.
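Promoting a DR secondary such as Cluster C can likewise be sketched as a single HTTP API request. This assumes the `sys/replication/dr/secondary/promote` endpoint and its `dr_operation_token` parameter as named in current Vault Enterprise API documentation (older releases may name things differently). Producing the DR operation token itself, which typically involves the unseal or recovery key holders, is not shown, and the addresses and token values are hypothetical.

```python
import json
import urllib.request
from typing import Optional


def dr_promote_request(secondary_addr: str, dr_operation_token: str,
                       primary_cluster_addr: Optional[str] = None) -> urllib.request.Request:
    """Build, but do not send, a request promoting a DR secondary to primary."""
    # A DR secondary does not accept ordinary client tokens, so this request is
    # authorized by the DR operation token in the body, not an X-Vault-Token header.
    payload = {"dr_operation_token": dr_operation_token}
    if primary_cluster_addr is not None:
        # Optional: the cluster address the new primary should advertise.
        payload["primary_cluster_addr"] = primary_cluster_addr
    return urllib.request.Request(
        url=f"{secondary_addr}/v1/sys/replication/dr/secondary/promote",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical: promote Cluster C after Cluster A has failed.
req = dr_promote_request("https://vault-c.example.com:8200", "hypothetical-dr-op-token")
print(req.get_method(), req.full_url)
```

Promotion is an explicit operator action in this sketch rather than something triggered automatically, which matches the "elected to resume operations" framing above.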

It is possible to chain clusters together to satisfy a variety of performance and disaster recovery/compliance needs in a single infrastructure.

However, because of limits on what data some secondary clusters can and cannot forward, check a cluster’s existing replication relationships before attempting to have it serve as a primary for DR or performance replication:

Current replication mode	Can be DR Primary	Can be Performance Primary
No Replication	Yes	Yes
Performance Primary	Yes	Yes
Performance Secondary	Yes	Yes, via promotion
Disaster Recovery Primary	N/A	Yes
Disaster Recovery Secondary	Yes, via promotion	No
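The compatibility table can be encoded as a small lookup, which may be handy when scripting checks against a planned topology. This is a hypothetical helper, not part of Vault: the `Mode` names and role strings are illustrative, and the values are transcribed directly from the table.

```python
from enum import Enum


class Mode(Enum):
    """A cluster's existing replication relationship (the table's rows)."""
    NONE = "No Replication"
    PERFORMANCE_PRIMARY = "Performance Primary"
    PERFORMANCE_SECONDARY = "Performance Secondary"
    DR_PRIMARY = "Disaster Recovery Primary"
    DR_SECONDARY = "Disaster Recovery Secondary"


# Transcribed from the table: for each existing mode, whether the cluster can
# become a DR primary or a performance primary.
CAN_BE = {
    Mode.NONE: {"dr_primary": "yes", "performance_primary": "yes"},
    Mode.PERFORMANCE_PRIMARY: {"dr_primary": "yes", "performance_primary": "yes"},
    Mode.PERFORMANCE_SECONDARY: {"dr_primary": "yes", "performance_primary": "via promotion"},
    Mode.DR_PRIMARY: {"dr_primary": "n/a", "performance_primary": "yes"},
    Mode.DR_SECONDARY: {"dr_primary": "via promotion", "performance_primary": "no"},
}


def can_serve_as(mode: Mode, role: str) -> str:
    """Return the table entry for a cluster in `mode` taking on `role`."""
    if role not in ("dr_primary", "performance_primary"):
        raise ValueError(f"unknown role: {role}")
    return CAN_BE[mode][role]


# Cluster A is already a performance primary (for B); can it also be C's DR primary?
print(can_serve_as(Mode.PERFORMANCE_PRIMARY, "dr_primary"))  # prints "yes"
```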

Topology Examples

Below is an example of a simple topology in which all secrets and access tokens are replicated.

Below is an example of a multi-site replication topology.

Comparing disaster recovery and performance replication

Capability	Disaster Recovery	Performance
Mirrors the configuration of a primary cluster	Yes	Yes
Mirrors the configuration of a primary cluster’s backends (e.g., auth methods, secrets engines, audit devices)	Yes	Yes
Mirrors the tokens and leases for applications and users interacting with the primary cluster	Yes	No. Applications must re-authenticate and obtain new tokens and leases from the new primary.
Allows the secondary cluster to handle client requests	No	Yes
Contains a local replica of secrets on the secondary and allows the secondary to forward writes	No	Yes

As you build out your configuration, use the comparison above to decide whether each cluster needs disaster recovery replication, performance replication, or both.
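The token and lease row has a direct consequence for client code during failover. The sketch below illustrates only that rule from the table; `reauthenticate` is a hypothetical callback supplied by the application, not a Vault client library API.

```python
from typing import Callable


def token_after_promotion(replication: str, current_token: str,
                          reauthenticate: Callable[[], str]) -> str:
    """Pick the token a client uses once a secondary becomes the primary.

    replication: "dr" or "performance", the kind of secondary that was promoted.
    reauthenticate: hypothetical callback that logs in to the new primary.
    """
    if replication == "dr":
        # DR replication mirrors tokens and leases, so the existing token is kept.
        return current_token
    if replication == "performance":
        # Tokens and leases are not mirrored: obtain new ones from the new primary.
        return reauthenticate()
    raise ValueError(f"unknown replication mode: {replication}")
```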
