Autopilot for Integrated Storage is taking the operator experience to the next level.
We introduced official support for Integrated Storage in HashiCorp Vault 1.4. Built on the Raft consensus protocol, Integrated Storage lets Vault admins store Vault’s persistent data internally rather than in an external storage backend. With each subsequent Vault release, we have continued to improve the operational experience, and we are pleased to announce a highly requested feature in Vault 1.7 called Autopilot.
Integrated Storage eliminates much of the operational overhead of managing a separate storage backend and avoids the additional networking imposed by these separate systems. This, in turn, reduces the complexity of the Vault cluster deployment and also makes the diagnosis and troubleshooting of issues easier, reducing mean-time-to-detect for issues and mean-time-to-restore for customers.
The Vault team has added several enhancements over the last few releases to improve deployment of Integrated Storage clusters.
These features made Integrated Storage easier to use with cloud environments and snapshots. However, operators still had to use manual methods to monitor the health of a cluster, ensure cluster stability when nodes are added (or removed), and clean up failed nodes in a cluster.
Vault 1.7 was made publicly available on March 25, 2021. This release introduced support for the Autopilot features in Vault open source. Autopilot, as the name itself suggests, will help automate and simplify Vault operator and admin workflows for monitoring and operating Vault Integrated Storage clusters.
$ vault operator raft autopilot get-config
Key                                   Value
---                                   -----
Cleanup Dead Servers                  false
Last Contact Threshold                10s
Dead Server Last Contact Threshold    24h0m0s
Server Stabilization Time             10s
Min Quorum                            0
Max Trailing Logs                     1000
Autopilot is enabled by default on Integrated Storage clusters running Vault 1.7. Note, though, that dead server cleanup is not enabled by default; it must be explicitly enabled. The primary pain points that Autopilot alleviates for operators are described below.
$ vault operator raft autopilot state
Healthy:              true
Failure Tolerance:    1
Leader:               raft1
Voters:
   raft1
   raft2
   raft3
Servers:
   raft1
      Name:            raft1
      Address:         127.0.0.1:8201
      Status:          leader
      Node Status:     alive
      Healthy:         true
      Last Contact:    0s
      Last Term:       3
      Last Index:      38
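For scripted monitoring, the cluster-level Healthy field in the state output can be checked directly. The following is a minimal sketch, assuming the plain-text layout shown in the example output; the function name `cluster_is_healthy` is hypothetical and not part of Vault.

```shell
# Succeeds (exit 0) when the cluster-level "Healthy:" field is true.
# Assumption: the first "Healthy:" line in the output describes the
# cluster as a whole; later "Healthy:" lines describe individual servers.
cluster_is_healthy() {
  vault operator raft autopilot state | awk '
    $1 == "Healthy:" { healthy = ($2 == "true"); exit }  # stop at the first match
    END { exit(healthy ? 0 : 1) }                          # map result to exit status
  '
}
```

A health probe or cron job could then run something like `cluster_is_healthy || echo "Raft cluster unhealthy" >&2`.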
Autopilot ensures cluster stability when new nodes join a cluster. A node joining as a voter is first added to the cluster as a “non-voter,” and its state is monitored for the configured “server stabilization time.” If the node stays healthy for that period, it is promoted to “voter” status. This keeps an unstable new node from disrupting the entire cluster, without requiring operator intervention.
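One way to confirm a promotion from a script is to check whether the node appears in the Voters list of the `vault operator raft autopilot state` output. This is a minimal sketch, assuming the plain-text layout from the example output earlier in this post; the helper `is_voter` and the node name `raft4` are hypothetical.

```shell
# Succeeds (exit 0) when the given node name is listed under "Voters:".
# Assumption: the voter list starts at a line reading "Voters:" and ends
# at the next section header (any following line that contains a colon).
is_voter() {
  vault operator raft autopilot state | awk -v node="$1" '
    $1 == "Voters:" && NF == 1 { in_voters = 1; next }  # start of the voter list
    in_voters && /:/           { in_voters = 0 }         # next section header ends it
    in_voters && $1 == node    { found = 1 }             # node is a voter
    END { exit(found ? 0 : 1) }
  '
}
```

After joining a new node, a script could wait for promotion with a loop such as `until is_voter raft4; do sleep 5; done`.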
Autopilot relieves Vault operators of the burden of monitoring and cleaning up failed servers. Dead server cleanup, which must be explicitly enabled via the API, periodically scans the cluster and automatically removes failed servers. The “Dead Server Last Contact Threshold” setting controls how long Autopilot waits before declaring a lost node “failed” and removing it from the configuration. When dead server cleanup is enabled, a min-quorum value must also be provided; it sets the minimum number of servers to retain in the cluster while cleanup is active. This keeps cleanup from disrupting quorum and destabilizing the cluster.
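Dead server cleanup can also be turned on from the CLI with `vault operator raft autopilot set-config`. The following is a minimal sketch; the wrapper name `enable_dead_server_cleanup` is hypothetical, and the threshold and quorum values in the usage note are illustrative assumptions, not recommendations. Check `vault operator raft autopilot set-config -help` on your version before relying on these flags.

```shell
# Enables dead server cleanup with an explicit threshold and minimum quorum.
#   $1: how long a server may be out of contact before it is considered failed
#   $2: minimum number of servers to keep in the cluster
enable_dead_server_cleanup() {
  vault operator raft autopilot set-config \
    -cleanup-dead-servers=true \
    -dead-server-last-contact-threshold="$1" \
    -min-quorum="$2"
}
```

For a three-node cluster, a call might look like `enable_dead_server_cleanup 1h 3`, followed by `vault operator raft autopilot get-config` to confirm the new values.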
With the various features now supported for managing Integrated Storage, operators have access to simple, automated workflows to manage and operate their Vault Integrated Storage clusters. HCP Vault, which is HashiCorp’s managed cloud service for Vault, also uses Integrated Storage for these reasons. Users of Integrated Storage will be able to benefit from the wide variety of workflows that have been tested for the customer-managed and HashiCorp-managed Vault products.
To get started exploring and using Integrated Storage, please refer to the HashiCorp Learn guides and the reference architecture documents as well as the documentation. For more information on Vault, please visit the Vault project website. As always, we are very interested in hearing about your experiences with the product, so please share your feedback so that we can continue to improve our products.