
Configuring Disaster Recovery Replication

This guide walks through configuring disaster recovery replication to reduce downtime when failing over to a standby cluster.

Vault Enterprise's disaster recovery replication keeps a standby Vault cluster synchronized with an active Vault cluster. This mode of replication includes data such as ephemeral authentication tokens, time-based token information, and token usage data. It provides an aggressive recovery point objective in environments where high availability is of the utmost concern.

Note: If you have not set up Vault Enterprise's replication capabilities, see Setting up and configuring Performance Replication.

Use cases

The primary use case for disaster recovery (DR) replication is to meet business continuity requirements in the face of catastrophic failure of an entire cluster; in other words, to fail over to a standby Vault cluster.

In DR replication, secondaries share the same underlying configuration, policy, and supporting secrets infrastructure (K/V values, encryption keys for transit, etc.) as the primary. They also share the same token and lease infrastructure as the primary, because they are designed to let applications that were connecting to the original primary continue operating once the DR secondary is elected.

Note that DR secondaries do not service read or write requests until they are elected and become a new primary.

Configuration

Disaster recovery provides new options for how Vault users can architect multi-cluster and multi-data center environments.

Like performance secondary clusters, secondary clusters linked to a primary in disaster recovery replication mirror the security infrastructure and configuration infrastructure of the primary cluster. Unlike their performance counterparts, DR secondary clusters do not forward read/write requests. They simply wait and mirror data from their primary until they are elected to serve as the primary because the primary has failed.
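As a rough sketch of how such a link might be established, the Python below builds (but does not send) the three HTTP requests involved: enabling DR replication on the primary, requesting an activation token for a secondary, and enabling the secondary with that token. The endpoint paths follow Vault Enterprise's `sys/replication/dr` API; the cluster addresses, the token strings, the secondary id `cluster-c`, and the helper name `build_request` are placeholders assumed for illustration.

```python
import json
import urllib.request


def build_request(addr, path, vault_token, payload=None):
    """Build (without sending) a POST request to a Vault HTTP API path."""
    body = json.dumps(payload or {}).encode("utf-8")
    return urllib.request.Request(
        url=f"{addr}/v1/{path}",
        data=body,
        method="POST",
        headers={"X-Vault-Token": vault_token, "Content-Type": "application/json"},
    )


# Placeholder addresses and tokens, assumed for illustration only.
PRIMARY = "https://vault-a.example.com:8200"
SECONDARY = "https://vault-c.example.com:8200"

# 1. Enable DR replication on the primary cluster.
enable_primary = build_request(
    PRIMARY, "sys/replication/dr/primary/enable", "PRIMARY_TOKEN"
)

# 2. Ask the primary for an activation token for a secondary with a chosen id.
secondary_token = build_request(
    PRIMARY,
    "sys/replication/dr/primary/secondary-token",
    "PRIMARY_TOKEN",
    {"id": "cluster-c"},
)

# 3. Enable DR replication on the secondary, passing the activation token
#    returned by step 2 (a placeholder string here).
enable_secondary = build_request(
    SECONDARY,
    "sys/replication/dr/secondary/enable",
    "SECONDARY_TOKEN",
    {"token": "ACTIVATION_TOKEN"},
)

for req in (enable_primary, secondary_token, enable_secondary):
    print(req.get_method(), req.full_url)
```

Nothing is sent here; passing each object to `urllib.request.urlopen` would issue the call. Since enabling a cluster as a secondary discards its existing data, a sketch like this is best tried against disposable clusters first.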

It is possible, however, to create sophisticated architectures of primaries and secondaries that both scale read/write performance and satisfy disaster recovery needs by combining performance replication and DR replication.

[Figure: Sample DR cluster architecture]

For example, in the architecture above, Cluster A replicates data to Clusters B and C, each in a separate, geographically distributed data center or availability zone.

To scale performance for users and applications in another part of the world, the Vault user has chosen to locate Cluster B in a region physically closer to a distant, large hub of applications using Vault. The user does this with performance replication, and applications accessing secrets in Vault forward read and write requests through B to the primary cluster, A.

To protect against catastrophic failure of A, the user also establishes Cluster C and links it to A with disaster recovery replication. This allows C to be elected to resume operations for A and serve read/write requests to B without requiring applications to generate and authenticate new tokens for access to secrets.
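The failover described above can be illustrated with a toy in-memory model. This is not Vault code: the `Cluster` class, the role strings, and the `fail_over` function are hypothetical names chosen for this sketch, and repointing B is shown as a single step even though in a real deployment it would be a separate operator action.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Cluster:
    name: str
    role: str                       # "primary", "performance-secondary" or "dr-secondary"
    upstream: Optional[str] = None  # name of the cluster this one replicates from
    serves_clients: bool = False


def fail_over(clusters: Dict[str, Cluster], failed: str, promoted: str) -> None:
    """Promote a DR secondary and repoint the failed primary's other secondaries."""
    new_primary = clusters[promoted]
    if new_primary.role != "dr-secondary" or new_primary.upstream != failed:
        raise ValueError(f"{promoted} is not a DR secondary of {failed}")

    # The promoted DR secondary takes over as primary and starts serving clients.
    new_primary.role = "primary"
    new_primary.upstream = None
    new_primary.serves_clients = True

    # The failed primary no longer serves anything.
    clusters[failed].serves_clients = False

    # Remaining secondaries of the failed primary now replicate from the new primary.
    for cluster in clusters.values():
        if cluster.upstream == failed:
            cluster.upstream = promoted


# Topology from the example: A is primary, B a performance secondary, C a DR secondary.
clusters = {
    "A": Cluster("A", "primary", serves_clients=True),
    "B": Cluster("B", "performance-secondary", upstream="A", serves_clients=True),
    "C": Cluster("C", "dr-secondary", upstream="A", serves_clients=False),
}

fail_over(clusters, failed="A", promoted="C")
for cluster in clusters.values():
    print(cluster)
```

The model captures only the roles and replication links; it says nothing about tokens or leases, which in DR replication are already mirrored on C and therefore need no extra step.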

It is possible to chain clusters together to satisfy a variety of performance and disaster recovery/compliance needs in a single infrastructure.

However, because some secondary clusters cannot forward certain data, it is important to note a cluster's existing replication relationships before attempting to have it serve as a primary for DR or performance relationships:

| Existing relationship | Can be DR Primary | Can be Performance Primary |
| --- | --- | --- |
| No Replication | Yes | Yes |
| Performance Primary | Yes | Yes |
| Performance Secondary | Yes | Yes, via promotion |
| Disaster Recovery Primary | N/A | Yes |
| Disaster Recovery Secondary | Yes, via promotion | No |
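For scripts that plan topology changes, the table above can be encoded as a simple lookup. The sketch below is only a transcription of the table; the names `ELIGIBILITY`, `eligibility`, and `needs_promotion` are hypothetical and not part of any Vault API.

```python
# Existing relationship -> (can be DR primary, can be performance primary),
# transcribed from the table above.
ELIGIBILITY = {
    "No Replication": ("Yes", "Yes"),
    "Performance Primary": ("Yes", "Yes"),
    "Performance Secondary": ("Yes", "Yes, via promotion"),
    "Disaster Recovery Primary": ("N/A", "Yes"),
    "Disaster Recovery Secondary": ("Yes, via promotion", "No"),
}

TARGETS = {"dr": 0, "performance": 1}


def eligibility(existing: str, target: str) -> str:
    """Return the table's answer for a cluster becoming a primary of the given kind."""
    return ELIGIBILITY[existing][TARGETS[target]]


def needs_promotion(existing: str, target: str) -> bool:
    """True when the table says the cluster must be promoted first."""
    return eligibility(existing, target).endswith("via promotion")


print(eligibility("Disaster Recovery Secondary", "dr"))
print(needs_promotion("Performance Secondary", "performance"))
print(eligibility("Disaster Recovery Secondary", "performance"))
```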

Topology Examples

Below is an example of a simple topology with all secrets and access tokens replicated.

Below is an example of a multi-site replication topology.

Comparing disaster recovery and performance replication

As you build out your configuration, the table below differentiates between disaster recovery and performance replication capabilities.

| Capability | Disaster Recovery | Performance |
| --- | --- | --- |
| Mirrors the configuration of a primary cluster | Yes | Yes |
| Mirrors the configuration of a primary cluster's backends (e.g., auth methods, secrets engines, audit devices) | Yes | Yes |
| Mirrors the tokens and leases for applications and users interacting with the primary cluster | Yes | No. Applications must re-authenticate and obtain new leases from the new primary. |
| Allows the secondary cluster to handle client requests | No | Yes |
| Contains a local replica of secrets on the secondary and allows the secondary to forward writes | No | Yes |
