HashiCorp’s Rift proof-of-concept app automatically grants and revokes infrastructure access to on-call engineers when an incident is triggered and resolved.
When applying security best practices to our applications and infrastructure, the recommended approach for secure remote access is the principle of least privilege. This means that users and applications are granted only the minimum permissions required to perform their function. As a developer advocate, I know that generally speaking, engineers do not need access to production environments for our day-to-day roles. But that’s not always the case.
This post introduces an event-driven workflow that enables dynamic, on-demand access controls using HashiCorp Boundary and Vault. This workflow has been prototyped into a tool we call Rift. Note that Rift is a proof of concept demonstrating event-driven access control. It is not currently intended for production use.
There are a few exceptions to the principle of least privilege access where wider temporary access may be required. For example, when engineers are on call and troubleshooting an incident, they may need access to underlying infrastructure to identify and remediate an issue. Traditionally in these scenarios, the software engineer requests access from a security engineer, who grants access to the required environment components and is responsible for revoking access upon resolution of the incident.
This scenario presents a few challenges:
This is a common problem faced by many organizations, regardless of scale. As a former consultant, I have personally witnessed security incidents caused by not revoking access permissions.
What would a good solution to this problem look like? Typically, these issues are set off by an event that requires engineers to gain access to a system. Is there a way to use this event to trigger a workflow that grants access to the target infrastructure and then automatically revokes access when the event has ended?
One solution would be to assign a time-to-live (TTL) to group membership. While this approach could be useful, an event-driven approach provides additional advantages. Specifically, a TTL on group membership assumes that access permissions are granted via group principals, but the event-driven approach can be applied to any type of principal.
Based on this idea, I worked with HashiCorp Group Manager of Developer Relations Erik Veld to develop an application internally called Rift, which facilitates this event-driven workflow, as demonstrated by HashiCorp Senior Developer Advocate Kerim Satirli at HashiConf Europe 2022.
Rift acts as the glue between three different systems: the alerting platform, Boundary, and Vault.
The end-to-end workflow relies on the alerting platform sending a webhook notification of the incident to Rift. Once Rift receives this notification, it uses the payload to ascertain who is on call. Rift then makes a call to Boundary to grant the on-call engineers access to the underlying infrastructure. Once the incident is resolved, the alerting platform sends another webhook notification of the incident resolution to Rift. Rift then processes this notification and makes another call to Boundary to revoke access for the on-call engineers.
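The grant-and-revoke flow described above can be sketched in a few lines of Python. This is an illustration only, not Rift's actual code: the event shape, the event type names `incident.triggered` and `incident.resolved`, the `on_call_lookup` callback, and the `InMemoryAccess` class (a stand-in for the calls made to Boundary) are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class InMemoryAccess:
    """Hypothetical stand-in for the calls Rift makes to Boundary."""

    granted: dict = field(default_factory=dict)  # incident id -> set of users

    def grant(self, incident_id, users):
        self.granted[incident_id] = set(users)

    def revoke(self, incident_id):
        # Revoking an unknown incident is a no-op, so duplicate webhooks are safe.
        return self.granted.pop(incident_id, set())


def handle_event(event, on_call_lookup, access):
    """Route one webhook event to a grant or a revoke.

    `event` uses an assumed shape:
    {"type": ..., "incident_id": ..., "service": ...}.
    `on_call_lookup` maps a service name to the engineers currently on call.
    """
    kind = event.get("type")
    incident_id = event["incident_id"]
    if kind == "incident.triggered":
        access.grant(incident_id, on_call_lookup(event["service"]))
        return "granted"
    if kind == "incident.resolved":
        access.revoke(incident_id)
        return "revoked"
    # Any other event type leaves access untouched.
    return "ignored"
```

Keying the granted access by incident ID means the resolution event needs no knowledge of who was on call when the incident started; it only needs the ID to undo the grant.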
This workflow requires a few things to be set up in Boundary:
The users must already exist in Boundary.
The host catalogs, hosts, host sets, and targets must already exist.
The relevant credential libraries must be added to their respective Boundary targets. For more information on how to configure credential libraries, see the Vault Credential Brokering Quickstart guide.
The alerting platform needs to be configured to send webhook notifications to Rift. An example of this is the webhooks implementation that PagerDuty has built into its platform, which would need to be configured with Rift’s callback URL.
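To make the idea of a callback URL concrete, here is a minimal sketch of an endpoint that could receive such webhook notifications, written as a plain WSGI application using only the Python standard library. It is not how Rift is implemented; the `make_callback_app` name and the `on_event` callback are hypothetical, and a real deployment would also need to authenticate the sender.

```python
import json


def make_callback_app(on_event):
    """Build a WSGI app that accepts POSTed JSON and hands it to `on_event`."""

    def app(environ, start_response):
        if environ.get("REQUEST_METHOD") != "POST":
            start_response("405 Method Not Allowed", [("Allow", "POST")])
            return [b""]
        try:
            length = int(environ.get("CONTENT_LENGTH") or 0)
            payload = json.loads(environ["wsgi.input"].read(length))
        except (ValueError, KeyError):
            # Covers a bad Content-Length, a missing body, and malformed JSON.
            start_response("400 Bad Request", [("Content-Type", "text/plain")])
            return [b"invalid JSON body"]
        on_event(payload)
        # Reply with a 2xx status to acknowledge receipt.
        start_response("204 No Content", [])
        return [b""]

    return app
```

For local experiments, the app can be served with `wsgiref.simple_server.make_server("", 8080, make_callback_app(print)).serve_forever()`, and the resulting address registered as the callback URL in the alerting platform.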
Rift also needs to be configured with Boundary credentials that carry enough permissions to perform the following actions:
Create/update/delete roles and associated configurations (grant_strings, add/remove principals)
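As a rough illustration of what a per-incident role might contain, the sketch below assembles grant strings and principals into a plain dictionary. It makes several assumptions: the semicolon-separated `id=...;type=...;actions=...` grant string form and the `authorize-session` action should be checked against the documentation for your Boundary version, and the helper names, the `rift-` role name prefix, and the example IDs are hypothetical. Nothing here calls Boundary; it only shows the data that would be sent.

```python
def grant_string(resource_id, actions, resource_type=None):
    """Compose a grant string in the assumed `id=...;type=...;actions=...` form."""
    parts = [f"id={resource_id}"]
    if resource_type:
        parts.append(f"type={resource_type}")
    parts.append("actions=" + ",".join(actions))
    return ";".join(parts)


def incident_role(incident_id, user_ids, target_ids):
    """Describe a role that lets the on-call users open sessions to the targets.

    The returned dictionary is illustrative; its keys are not a Boundary API.
    """
    return {
        "name": f"rift-{incident_id}",  # hypothetical naming scheme
        "principal_ids": sorted(user_ids),
        "grant_strings": [
            grant_string(target_id, ["authorize-session"])
            for target_id in target_ids
        ],
    }
```

Creating one role per incident keeps revocation simple: deleting the role on resolution removes both the principals and the grants in a single step.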
Adding the credential library to the Boundary target enables the on-call engineers to connect and authenticate to that target. This is all facilitated by Vault brokering short-lived credentials.
Rift must also be accessible to receive the webhook notifications, and it needs to communicate with the Boundary controller. Once all of these building blocks are in place, you should be ready to experiment with Rift.
This blog post examined the challenges around effective access controls in a zero trust environment and proposed an automated solution to grant temporary access during a production incident. The proof of concept, called Rift, uses HashiCorp Vault and Boundary to broker credentials to engineers who need production access. Upon incident resolution, Rift automatically revokes the credentials and limits access to the production environment.
We hope you will experiment with Rift and let us know your thoughts, ideas, and how this relates to your operational challenges. Since Rift is a proof of concept, we’d like to learn from your feedback and suggestions on the best and most useful solutions for your day-to-day operations. Please provide your feedback on this GitHub issue.
We recently launched Boundary on the HashiCorp Cloud Platform (HCP), now free in public beta. HCP Boundary provides a single, fully managed workflow to securely connect to hosts and critical systems across Kubernetes clusters, cloud service catalogs, and on-premises infrastructure. Try HCP Boundary today!