Scaling Vault to Your Whole Organization
Nov 20, 2018
Vault is easy to set up on one node. But as you scale it up to a large organization, it can become a bottleneck if you don't take advantage of Vault Enterprise features.
As you scale up your Vault clusters from one node, to handling thousands of requests per second from your entire organization, there are features built into Vault and Vault Enterprise that help you at each stage of scaling. These include:
- ACL Templating
- Performance standbys
- Service tokens
- Batch tokens
Join Calvin Leung Huang, a software engineer at HashiCorp, and Brian Kassouf, the Vault engineering lead at HashiCorp, for a technical overview of Vault at scale.
Brian Kassouf: Hey everyone, thanks for coming to watch us talk. We're gonna be talking about scaling Vault to your organization today. I'm Brian Kassouf, I'm the Vault engineering lead for the Vault team.
Calvin Leung Huang: My name is Calvin, I'm a software engineer, also on the Vault team. Today, we're going to go through the journey of how your organization can go from having its first node and user to having a fleet of clusters with many users and services, using Vault as their secrets management tool.
Our journey begins with adoption and how we can get from 0 to 1. We break ground by setting up our physical storage, in this case a simple 3-node Consul cluster. Separately, we spin up our first Vault server. Alongside that, we have a Consul client so that it's able to talk to our storage backend.
We have our first developers and operators, essentially our early adopters, start experimenting with Vault, either via the CLI or the web UI. As our adoption gets traction, we can scale up by turning our single Vault instance into a 3-node HA cluster. We also put a proxy in front of it. In this case, it's gonna be an ELB. And at this point, not only are humans using Vault, but also instances and services.
» The expansion phase
That's about when we get into the expansion phase. With increased usage, we see issues such as permission management and performance degradation. We experience an increased number of leases, tokens and policies, which means that permission isolation and delegation become hard. The need for self-service arises. In order to mitigate this issue, teams naturally gravitate toward spinning up their own clusters. This makes sense because it lets them retain control and decrease the number of leases, tokens and policies, thus decreasing the overall server load.
However, individual clusters do come with their own set of challenges. For one, they put a burden on developers: teams have to find dedicated operators to watch the state of their Vault servers. They don't support replicated or shared data, which means that services that need to talk to different parts of the org now need to authenticate to these clusters separately. And they make it really complex for our SecOps team to oversee, since there's no centralized point to manage the state of the leases, the tokens and the policies. It's clear that there are two things that we need to tackle: management and performance. Today, we're gonna first talk about two relatively new features that will help us ease management. Later down the road, Brian will go into how we can scale up and improve our performance.
Let's go back to our organization for a moment and pretend that this is how it's laid out. We have ACME, which is our organization. Below it we have groups such as engineering and marketing, then below those groups, we might have teams such as DevOps or Frontend and so on and so forth.
» Namespaces: Vaults within Vault
What we want to do is let teams retain control of the data that they're allowed to access. But at the same time, give them the ability to grant other individuals, teams, or services access to that data if they desire. How can we do that? The answer is namespaces, a feature that was implemented in Vault 0.11 Enterprise. It lets us isolate and scope access to data. You can think of it as vaults within Vault. Each namespace has its own set of secrets engines and auth methods, its own identity store, and its own set of policies and tokens. This lets the team essentially have control over the components involved.
If we apply our org chart to Vault, this is what it would look like. We have the root namespace, which is managed by the security team and that's on by default. That's how we started using Vault. Inside or below that, depending on how you look at it, we can have a namespace for our engineering org. Then inside or below that, we have another namespace, nested under engineering, for our DevOps team.
Let's go ahead and create this in our Vault cluster. We can do so easily by issuing the namespace create command to first create our engineering namespace. Underneath that, we create our DevOps namespace. With these namespaces all set up, we then write our first policy in engineering, called admin-eng, using the engineering.hcl file. We'll go into what that file looks like in a minute. For those of you who are less familiar with Vault, an ACL policy lets us allow or deny actions (or in this case, we call them "capabilities") on certain paths within Vault.
If we go ahead and list our policies in the root namespace, we get back our default and root policies, which are available from the get-go. If we do the same in the engineering namespace, we get back the admin-eng policy that was just written and a default policy, which is available right after namespace creation.
Let's take a look at that engineering.hcl file. In that file we have two policies. One powerful aspect of namespaces is constrained policies. This means the policies written to a namespace will only apply starting from that particular namespace. To put it simply, the paths of those policies are relative to the namespace they're written into. As a concrete example, we have the first ACL, which is relative starting from sys/mounts but it applies to
This policy allows list and CRUD operations on those paths and subpaths. Since namespaces can be nested, we can also do delegation. We can say that this policy, if attached to a token, can also reach into its child namespace. This is what we're doing with our second policy by allowing tokens that have the admin-eng policy attached to do listing and reading under
Let's go back to our terminal and issue our first token in the engineering namespace, with the admin-eng policy attached. If we do a quick token lookup, we're able to see that this token has been created under the right namespace with the proper policies attached. If we use that token to log in, we get a successful response. But if we run a vault secrets list command, we get back a 403. That is because the CLI by default operates in the root namespace, and since we don't have permission to perform this action there, the request is denied.
However, if we do a secrets list in the engineering namespace, we're able to get back our list of enabled secrets engines. And that is because of our first ACL policy inside admin-eng. If we do the second command, which is the same but under engineering/devops, we are also able to get back the list of enabled secrets engines for that particular namespace.
One last thing that we want to do is a secrets enable, which enables a secrets engine. If we do that in the engineering namespace, we're able to get a successful response. But if we do the same thing now in engineering/devops, we get back a permission denied. That is because we only have list and read capabilities for this particular namespace path. So if we go back to our org chart, we're able to mimic this particular layout in Vault with namespaces. We can do so without breaking this tree structure into separate clusters. More important, however, is that we don't lose the ability to oversee access and perform delegation.
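The terminal walkthrough above can be modeled in a few lines of Python. This is only a sketch of the idea that policy paths are relative to the namespace they are written into; it is not Vault's actual policy engine. The two rules below are assumed contents for admin-eng (the real engineering.hcl is not reproduced in this transcript), and the names Rule, matches, and allowed are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    path: str                # relative to the namespace the policy lives in
    capabilities: frozenset


def matches(pattern: str, path: str) -> bool:
    """Simplified matching: a trailing '*' is a prefix match, otherwise exact."""
    if pattern.endswith("*"):
        return path.startswith(pattern[:-1])
    return path == pattern


def allowed(policy_ns, rules, request_ns, request_path, capability) -> bool:
    """Resolve both sides to absolute paths, then check the capability."""
    full_request = f"{request_ns}/{request_path}".strip("/")
    for rule in rules:
        full_rule = f"{policy_ns}/{rule.path}".strip("/")
        if matches(full_rule, full_request) and capability in rule.capabilities:
            return True
    return False


# Assumed stand-in for the admin-eng policy written into "engineering".
CRUD_LIST = frozenset({"create", "read", "update", "delete", "list"})
ADMIN_ENG = [
    Rule("sys/mounts*", CRUD_LIST),                           # own namespace
    Rule("devops/sys/mounts*", frozenset({"read", "list"})),  # delegated into child
]

# Listing secrets engines: denied in root, allowed in engineering and its child.
root_list = allowed("engineering", ADMIN_ENG, "", "sys/mounts", "read")
eng_list = allowed("engineering", ADMIN_ENG, "engineering", "sys/mounts", "read")
devops_list = allowed("engineering", ADMIN_ENG, "engineering/devops", "sys/mounts", "read")

# Enabling a secrets engine: allowed in engineering, denied in the child.
eng_enable = allowed("engineering", ADMIN_ENG, "engineering", "sys/mounts/kv", "create")
devops_enable = allowed("engineering", ADMIN_ENG, "engineering/devops", "sys/mounts/kv", "create")
```

The same token reaches into the child namespace only as far as the second rule allows, which is the delegation behavior described in the demo.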
» ACL templating, starting with the concept of identity
The second thing that I want to talk about today is ACL templating. But not so fast. To fully grasp the power of ACL templating, we need to first understand the concept of identity. There are two terms that come to mind when we're talking about identity. One is "entities," which allow us to map a single user to zero or more aliases. An alias can come from GitHub or LDAP or any of the auth methods.
You can think of it as a link from third-party services or authentication methods to a blob of info inside Vault. Similarly, we have an entity group, which can contain zero or more of these entities as its members, as well as any subgroups. How is this useful? Let's first take a look at the current state of things. Teams are using ACL policies through issued tokens to scope permissions. Some of the use cases that might pop up might be: users who want to have their own workspace within the KV secrets engine, or users who are logging in with the userpass auth method who want to update their own password. For each of these cases, if we didn't have ACL templating, we would have to create a very specific policy for each of our users or groups or services.
As the number of entities increases, and as the number of use cases that come up also increases, the number of policies that we mix and match increases exponentially. With an increased number of policies, as you can imagine, it can bring us all sorts of nightmares. How can we improve the efficiency in managing these policies? The answer to that is ACL templating, which is a Vault 0.11 OSS feature.
We can address the first case, for instance, with the following ACL policy, which lets us give entities access to their own KV subpath. The beauty in ACL templating is that we can use the same policy and apply it to all the entities within the identity system. We don't have to craft a specific policy for each user and manage the life cycle of it like the creation or deletion of these policies when the users are added or removed. If our org has 10,000 users, for example, we can reduce 10,000 different policies down to just 1.
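To illustrate how one templated policy can stand in for thousands of per-user policies, here is a minimal Python sketch. The `{{identity.entity.id}}` parameter is part of Vault's templating syntax, but the `secret/data/` mount path, the entity IDs, and the render and can_access helpers are assumptions made for this example, not the policy shown in the talk.

```python
# One policy path shared by every entity; the placeholder is filled in
# per request with the ID of the entity that owns the token.
TEMPLATE = "secret/data/{{identity.entity.id}}/*"


def render(template: str, entity_id: str) -> str:
    """Substitute the requesting entity's ID into the templated path."""
    return template.replace("{{identity.entity.id}}", entity_id)


def can_access(template: str, entity_id: str, path: str) -> bool:
    """Simplified check: a trailing '*' is a prefix match, otherwise exact."""
    pattern = render(template, entity_id)
    if pattern.endswith("*"):
        return path.startswith(pattern[:-1])
    return path == pattern


# Hypothetical entity IDs; real ones are generated by Vault.
alice_own = can_access(TEMPLATE, "entity-alice", "secret/data/entity-alice/db-creds")
alice_other = can_access(TEMPLATE, "entity-alice", "secret/data/entity-bob/db-creds")
```

Adding or removing a user requires no policy change in this model, because the path is resolved from the token's identity at request time.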
The second policy example that we have is quite long and barely readable, but trust me, if we didn't have the ACL templating system, this would be next to impossible to replicate. It does something that's very simple, which is to let entities from the userpass auth method update their own password.
Without ACL templating, we would be either crafting a very broad ACL policy that would essentially allow a user to update the password of any other user, which is not ideal to say the least, or we would have to craft a very specific policy for each of our users in this particular auth method, which is not scalable.
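The userpass case relies on looking up an entity's alias by the mount accessor of the auth method. The sketch below shows that lookup with a small dotted-path resolver in Python. The accessor `auth_userpass_6671d643` and the user name are made-up values, and the render helper is a simplification for illustration rather than Vault's implementation.

```python
import re

# Templated path: each user may only touch their own userpass entry.
TEMPLATE = "auth/userpass/users/{{identity.entity.aliases.auth_userpass_6671d643.name}}"


def render(template: str, context: dict) -> str:
    """Replace each {{dotted.path}} with the value found by walking the context."""
    def lookup(match):
        value = context
        for key in match.group(1).strip().split("."):
            value = value[key]
        return str(value)
    return re.sub(r"\{\{(.+?)\}\}", lookup, template)


# Hypothetical identity data for one entity with a single userpass alias.
alice = {
    "identity": {
        "entity": {
            "aliases": {"auth_userpass_6671d643": {"name": "alice"}},
        }
    }
}

alice_path = render(TEMPLATE, alice)
```

Because the rendered path contains only the caller's own alias name, the policy cannot be used to update another user's password.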
These are some of the other parameters the ACL templating system provides. We can match entities by name, by ID, by mount accessor or by metadata. Pretty much the same applies for groups.
With these two new features, we've been successfully able to ease management. We're able to give control back to teams through namespaces and we've got the ability to create flexible ACL policies through the ACL templating system, but as our org grows, so does our usage. We eventually hit an inflection point that requires us to scale up our existing architecture and bump up our performance. With that, I'm going to hand it off to Brian to tell us how we can do so.
» How to scale Vault
Brian: Like Calvin was saying, as we onboard more teams and applications to Vault, their usage is going to go up, and we're going to have to figure out how to scale Vault so that we can meet the needs of those applications and users. Today we're going to discuss scaling three distinct operations within Vault, and it's important to break it out this way because they all use different types of features within Vault to scale. First we're going to talk about scaling write operations, then scaling read operations, and then finally scaling authentications.
First off, before we jump into those three things, I just wanted to cover some general scaling advice. Here we have a typical-looking Vault cluster. It's got three Vault nodes, they're each talking to a Consul backend, and a single Vault cluster can scale pretty far. You just need to think through how to scale up your storage backend, because every Vault request, for the most part, will end up hitting the storage backend. Additionally, think about scaling up your Vault nodes so they can handle more requests, more open files, and more memory.
Let's talk through scaling write operations to Vault. We define a write operation as anything that results in a storage write to your storage backend. If a request comes in to Vault and it does an update or delete into Consul, that's what we consider a write operation. An important distinction between write operations is a shared versus a local write. We'll touch more on this when we start talking about the next feature, but essentially shared writes are things that are global to your whole Vault installation: things like Vault configuration, such as what secrets engines are enabled, what auth backends are enabled, and then the configuration for each of those. Local writes are things that are specific to a single Vault cluster, things like leases and tokens, and data around dynamic secrets.
Some common actions that applications might do that will result in one of these local writes are generating dynamic database credentials (so if you create a MySQL credential, that's a local write), generating dynamic cloud credentials, generating PKI certificates and so on.
The feature that enables this type of scaling is called secondary clusters. It's a Vault Enterprise feature that has been around for some time. I think it came out in Vault 0.7, and it allows us to provide horizontal scaling of read requests and local writes. It works by using inter-cluster replication, so it uses Vault's replication components to replicate data in a primary-secondary relationship.
If we add a couple of secondary clusters, our architecture starts to look something like this. We have a primary cluster and two secondary clusters. Each cluster has its own HA Vault setup and its own Consul cluster, and the primary will replicate shared data to the secondary clusters. Shared writes are always handled on the primary cluster, so things like Vault configuration, configuration for your storage backends, and KV puts are handled by the primary cluster and then replicated down to the secondaries. That's just so that the secondaries can have the same configuration as the primary and be able to operate in the same way.
Secondary clusters, as well as primary clusters, each have their own local storage, so things like leases and tokens, and data for dynamic secrets, are able to be handled on each local secondary. Each cluster is in charge of keeping track of its own credentials and revoking them when needed. This allows us to scale both read operations and those local writes.
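The split between shared and local writes can be pictured with a toy model in Python. This is a sketch of the data flow only: replication is shown as an immediate copy, and the Cluster and ReplicationSet classes, cluster names, and keys are all hypothetical.

```python
class Cluster:
    def __init__(self, name: str):
        self.name = name
        self.shared = {}  # configuration and KV data, replicated from the primary
        self.local = {}   # leases and tokens, never leave this cluster


class ReplicationSet:
    def __init__(self, primary: Cluster, secondaries: list):
        self.primary = primary
        self.secondaries = secondaries

    def shared_write(self, key: str, value) -> None:
        """Shared writes are handled by the primary, then replicated down."""
        self.primary.shared[key] = value
        for secondary in self.secondaries:
            secondary.shared[key] = value

    def local_write(self, cluster: Cluster, key: str, value) -> None:
        """Local writes stay on the cluster that received the request."""
        cluster.local[key] = value


primary, east, west = Cluster("primary"), Cluster("east"), Cluster("west")
replication = ReplicationSet(primary, [east, west])

replication.shared_write("sys/mounts/database", {"type": "database"})
replication.local_write(east, "lease/database/creds/1", "mysql-credential")
```

Every cluster ends up with the same mount configuration, while the lease for the MySQL credential exists only on the cluster that issued it, which is why adding secondaries adds capacity for local writes.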
Next we're going to talk about scaling read operations within Vault. Inverse to writes, a read operation is any type of operation that does not result in a write to the storage backend. If a request comes into Vault and it only reads data, and it doesn't change or update or delete data in Consul, that's what we consider a read operation. The interesting thing about these is that it's not necessarily an HTTP GET method or a Vault CLI read; it can very well be an HTTP POST or a Vault write. Some common actions that applications might do that result in a read operation are getting data from the KV storage, doing a transit encrypt/decrypt on some data, or signing and verifying it, or signing SSH client keys.
We mentioned previously that secondary clusters enable scaling read operations, but it is a very heavy action to keep adding secondary clusters just so you can scale read operations, because each secondary cluster needs its own Consul backend. It's generally designed to scale across data centers or across geos. We wanted to introduce a way that would allow us to scale these types of read operations within a single cluster.
That brings us to performance standbys, which were new in Vault 0.11, our previous release. It's an Enterprise feature. Performance standbys provide horizontal scaling of reads within a single cluster. With performance standbys, write operations are still forwarded on to the active node, so the active node is still handling the write operations while performance standbys handle the reads. It works by using intra-cluster replication, so it uses the same replication components as secondary clusters, but it does not go to a separate Consul backend. All the nodes talk to the same Consul backend.
Performance standbys are elected similarly to HA standbys. Here we have a four-node Vault cluster, and each of these nodes is talking to the same Consul backend. In typical HA, all these nodes will come up, they'll talk to Consul, and they'll all try to grab the leader lock. One of them will win and become active, and now the active node is gonna handle all the requests. If the active node is lost, one of the standbys can fail over and become active, and so this is generally a method of adding high availability to Vault.
When the active node comes up, it'll write some data into Consul's encrypted storage that the standbys can read. Inside that data is a TLS certificate that allows them to connect to the active node over a mutually authenticated TLS connection. Now the standbys will forward all read and write requests to the active node. In this setup, the active node is the only node handling any requests within the Vault cluster. Performance standbys take this leader election to the next step: after all the standbys connect to the active node, the active node will select a certain number of them to elect as performance standbys.
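The basic HA behavior described here, where every node tries to grab a lock and the winner becomes active, can be sketched as follows. The LeaderLock class and node names are hypothetical, and iterating in list order is only a deterministic stand-in for the real race against the storage backend's lock.

```python
class LeaderLock:
    """Toy stand-in for the leader lock held in the storage backend."""

    def __init__(self):
        self.holder = None

    def try_acquire(self, node: str) -> bool:
        if self.holder is None:
            self.holder = node
            return True
        return False

    def release(self, node: str) -> None:
        if self.holder == node:
            self.holder = None


def elect(lock: LeaderLock, nodes: list) -> str:
    """Every node attempts the lock; whoever holds it afterwards is active."""
    for node in nodes:
        lock.try_acquire(node)
    return lock.holder


lock = LeaderLock()
nodes = ["vault-1", "vault-2", "vault-3", "vault-4"]
first_active = elect(lock, nodes)

# Failover: the active node is lost, so its lock is released and the
# remaining standbys compete again.
lock.release(first_active)
survivors = [n for n in nodes if n != first_active]
second_active = elect(lock, survivors)
```

In this plain HA model only the lock holder serves requests; every other node forwards to it.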
We've selected these bottom two, and now we have one active node, two performance standby nodes and one standby node. The performance standby nodes are now able to handle read operations locally on that node, and they don't need to forward them on to the active node. Writes are still handled by the active node, and reads are handled on the performance standbys. If we lose the active node, any of these three nodes is able to come up and become active, and when that happens a new performance standby election will happen and assign new performance standbys. Now we have write operations, local write operations and read operations scaling linearly, and the nice thing about performance standbys is that as you add more of them, you're able to linearly scale the amount of read operations that you can do.
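A small Python sketch of the resulting request routing: the active node serves everything, performance standbys serve requests that cause no storage write, and everything else is forwarded. The function names are hypothetical, and choosing the first N standbys is an arbitrary assumption; the talk does not say how the active node picks them.

```python
def assign_roles(nodes: list, active: str, max_performance_standbys: int) -> dict:
    """The active node promotes up to N standbys; the rest stay plain standbys."""
    roles = {active: "active"}
    standbys = [n for n in nodes if n != active]
    for index, node in enumerate(standbys):
        promoted = index < max_performance_standbys
        roles[node] = "performance_standby" if promoted else "standby"
    return roles


def serving_node(roles: dict, receiving_node: str, causes_storage_write: bool) -> str:
    """Return the node that actually handles a request sent to receiving_node."""
    role = roles[receiving_node]
    if role == "active":
        return receiving_node
    if role == "performance_standby" and not causes_storage_write:
        return receiving_node  # read handled locally
    # Writes, and anything sent to a plain standby, are forwarded.
    return next(node for node, r in roles.items() if r == "active")


roles = assign_roles(["vault-1", "vault-2", "vault-3", "vault-4"], "vault-1", 2)

read_on_perf = serving_node(roles, "vault-2", causes_storage_write=False)
write_on_perf = serving_node(roles, "vault-2", causes_storage_write=True)
read_on_standby = serving_node(roles, "vault-4", causes_storage_write=False)
```

Note that the routing decision depends on whether the request writes to storage, not on the HTTP verb, which matches the definition of a read operation given earlier.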
The next thing we wanna talk about is how to scale authentications. As you add more users and machines and applications to Vault, all of them will end up needing a Vault token before they can retrieve any secrets from Vault. Just some general advice around scaling your authentications is to reuse tokens where possible and that just means that if a single user or a single machine is going to be accessing Vault over some period of time, it probably makes sense to keep that token around and reuse it versus re-authenticating every time you need to talk to Vault. Along the same lines, determining the appropriate TTLs for tokens and leases will take a lot of pressure off the storage backend because if you are only using a token for about a minute or 30 minutes, it probably doesn't make sense to keep that token around inside Vault for 30 days.
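To make the TTL advice concrete, here is a back-of-the-envelope calculation in Python. It assumes a steady creation rate and that every token lives for exactly its TTL (no renewals or early revocations), so the number of tokens held in storage is roughly rate times TTL. The rate of 10 tokens per second is an invented figure for illustration.

```python
def steady_state_tokens(creations_per_second: float, ttl_seconds: float) -> float:
    """Approximate tokens held in storage at once: arrival rate times lifetime."""
    return creations_per_second * ttl_seconds


THIRTY_DAYS = 30 * 24 * 60 * 60   # 2,592,000 seconds
THIRTY_MINUTES = 30 * 60          # 1,800 seconds

rate = 10  # hypothetical logins per second across the organization

long_lived = steady_state_tokens(rate, THIRTY_DAYS)      # 25,920,000 tokens
short_lived = steady_state_tokens(rate, THIRTY_MINUTES)  # 18,000 tokens
reduction = long_lived / short_lived                     # 1440x fewer tokens
```

Under these assumptions, matching the TTL to the 30 minutes of actual use keeps about 18 thousand tokens in storage instead of about 26 million.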
If we tune that TTL to be a little bit closer to the actual usage time of the token, it'll be cleaned up faster out of the storage backend and out of Vault's memory and allow a new token to be able to take its place. The last bit of advice is to just use the appropriate token type for your use case. What are token types? This is kind of a new concept because up until today there was only one token type. Service tokens are what we now call normal Vault tokens and a normal Vault token is just, if you're familiar with Vault, what you know as the Vault token and service tokens have a number of features that make them nice and flexible and meet a lot of the needs of applications. Service tokens can be renewed so they have a default TTL and if you extend that TTL, it'll extend the lifetime of the token. They can be revoked so when you're done with a token you can revoke it and it's no longer able to be used.
Additionally, you can add limited use counts to service tokens. You can say this service token is only allowed to be used once or this service token's only allowed to be used five times and when those uses are up, the token will automatically be revoked. Service tokens are also tree based so they have a parent and parents can have children and those children can have children and so when the parent is revoked the entire tree is also revoked. All the children and grandchildren and it's a really easy way to manage a lot of tokens and token sprawl and be able to revoke them just with one Vault command. Service tokens also have accessors and an accessor is just an extra piece of information that's returned with the Vault token and this accessor can look up the Vault token and it can also revoke the Vault token but it can't authenticate you to Vault so it's inherently slightly less sensitive than the actual Vault token.
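The tree behavior of service tokens, where revoking a parent revokes all of its descendants, can be sketched like this. The TokenStore class and token names are hypothetical and keep everything in memory; Vault itself persists this parent index in storage.

```python
class TokenStore:
    def __init__(self):
        self.parent = {}    # token id -> parent id (None for a top-level token)
        self.children = {}  # token id -> set of child token ids

    def create(self, token_id: str, parent: str = None) -> None:
        if parent is not None and parent not in self.children:
            raise KeyError(f"unknown or revoked parent: {parent}")
        self.parent[token_id] = parent
        self.children[token_id] = set()
        if parent is not None:
            self.children[parent].add(token_id)

    def is_valid(self, token_id: str) -> bool:
        return token_id in self.children

    def revoke(self, token_id: str) -> list:
        """Revoke a token and its whole subtree; return the revoked IDs."""
        revoked = []
        stack = [token_id]
        while stack:
            current = stack.pop()
            if current not in self.children:
                continue  # unknown or already revoked
            stack.extend(self.children.pop(current))
            parent = self.parent.pop(current)
            if parent in self.children:
                self.children[parent].discard(current)
            revoked.append(current)
        return revoked


store = TokenStore()
store.create("app")
store.create("worker", parent="app")
store.create("job", parent="worker")
store.create("sidecar", parent="app")

revoked_subtree = store.revoke("worker")  # takes "job" down with it
```

Revoking "app" afterwards would also remove "sidecar", which is how one command can clean up an entire family of tokens.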
Service tokens also have a nice feature called cubbyholes, and a cubbyhole is just an area of Vault storage that's set aside for this particular token. Only the token can write to or read from it, and no other token can read another token's cubbyhole. It's just an area of Vault that this token owns. Because of all these features, creating a service token is a very heavy action. If you think about it, we have to write the token value, we have to write the accessor value, we have to write the parent index and potentially some cubbyhole data. Right there, that's just four writes to the storage backend, and then we also have to clean up these tokens, so that's four more deletes to clean up all those things. If you're on Enterprise, we're also writing WAL entries for those, and so creating a service token, creating a lot of service tokens, really explodes the number of writes that are happening inside of your storage backend.
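The write amplification described above is easy to tally. The sketch below counts only the storage operations named in the talk (token value, accessor, parent index, optional cubbyhole data, and the matching deletes) and ignores Enterprise WAL entries; the function name and the workload of 1,000 tokens are made up for illustration.

```python
def service_token_storage_ops(token_count: int, uses_cubbyhole: bool = True) -> int:
    """Storage writes plus cleanup deletes over the lifetime of the tokens."""
    writes_per_token = 3 + (1 if uses_cubbyhole else 0)  # value, accessor, parent index, cubbyhole
    deletes_per_token = writes_per_token                 # each entry is cleaned up later
    return token_count * (writes_per_token + deletes_per_token)


with_cubbyhole = service_token_storage_ops(1000)                           # 8,000 operations
without_cubbyhole = service_token_storage_ops(1000, uses_cubbyhole=False)  # 6,000 operations
```

Even in this simplified count, every service token costs several storage operations, so a burst of short-lived logins turns into many times that number of backend writes.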
But the nice thing about service tokens is, because they're stored in the local type of storage, they are able to be scaled as we add more secondary clusters. If you have two secondary clusters and you add a third, you now have one-third more capacity to create these service tokens. Additionally, because they're in local storage, service tokens do not work across clusters. A token created on one cluster, whether a secondary or the primary, won't be available to use on the other clusters. Batch tokens are a new feature in Vault 1.0, which came out today. They are an open-source feature, and batch tokens are designed to be extremely lightweight. They cause no persistence to storage at all, and they do that by taking the data we would have written into storage and instead encrypting it and returning it to the client. It's a client-side-only token. Vault has no knowledge of the token until it's handed to it and it can decrypt and verify the data inside of it.
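The idea of a token that carries its own data, so the server stores nothing, can be sketched with Python's standard library. One important substitution: Vault encrypts the token contents, while this sketch only signs them with HMAC (the standard library has no authenticated encryption), so the payload here is readable by the client. The key, field names, and token format are all assumptions for illustration and do not reflect Vault's real batch token format.

```python
import base64
import hashlib
import hmac
import json
import time

SERVER_KEY = b"demo-key-not-for-production"  # hypothetical server-side key


def issue(policies: list, ttl_seconds: float, now: float = None) -> str:
    """Build a self-contained token; nothing is written to storage."""
    now = time.time() if now is None else now
    payload = json.dumps(
        {"policies": policies, "expires_at": now + ttl_seconds}, sort_keys=True
    ).encode()
    tag = hmac.new(SERVER_KEY, payload, hashlib.sha256).digest()
    encode = base64.urlsafe_b64encode
    return encode(payload).decode() + "." + encode(tag).decode()


def verify(token: str, now: float = None):
    """Return the claims if the token is authentic and unexpired, else None."""
    now = time.time() if now is None else now
    try:
        payload_b64, tag_b64 = token.split(".")
        payload = base64.urlsafe_b64decode(payload_b64)
        tag = base64.urlsafe_b64decode(tag_b64)
    except ValueError:
        return None  # malformed token
    expected = hmac.new(SERVER_KEY, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        return None  # tampered with, or issued under a different key
    claims = json.loads(payload)
    if now >= claims["expires_at"]:
        return None  # expired; no revocation step is needed
    return claims


token = issue(["default"], ttl_seconds=60, now=1000.0)
claims = verify(token, now=1010.0)
```

Because validation needs only the key and the token itself, any node holding the key can check it without a storage lookup. The flip side, as with batch tokens, is that there is nothing stored to renew or revoke.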
These types of tokens are not able to be renewed or revoked because they don't persist any data. They don't have a lot of the features of service tokens, so they aren't able to have child tokens. Batch tokens do have parents, and if the parent is revoked then the batch token becomes invalid. They also are not able to have use counts, and they don't have accessors or cubbyholes. When you create a batch token and you use it to create a dynamic secret that results in a lease, the lease lifetime for that secret will be constrained to the lifetime of the batch token. If the batch token is only good for 5 more minutes, your secret will also be cleaned up in 5 minutes. You're able to configure roles or mounts within Vault to return this type of token. You can decide very granularly which user or application should be given a batch token and which should be given a service token.
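The lease constraint amounts to taking the smaller of two durations. A minimal sketch, with a hypothetical function name and a one-hour requested TTL chosen for illustration:

```python
def effective_lease_ttl(requested_ttl_seconds: int, token_remaining_seconds: int) -> int:
    """A lease created with a batch token cannot outlive that token."""
    return min(requested_ttl_seconds, token_remaining_seconds)


# The secret asks for one hour, but the batch token has 5 minutes left.
capped = effective_lease_ttl(requested_ttl_seconds=3600, token_remaining_seconds=300)
# A lease shorter than the token's remaining life is left unchanged.
uncapped = effective_lease_ttl(requested_ttl_seconds=120, token_remaining_seconds=300)
```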
Batch tokens are particularly useful for applications that have a short running time and will need to authenticate to Vault a lot. Armon mentioned functions-as-a-service in the keynote, and that's a great use case for this. If you have a Lambda job that needs a Vault token, it's only gonna run for about a minute, and you're gonna create a lot of these functions. It probably doesn't make sense to use a service token and create a lot of writes for just this short amount of time. The nice thing about batch tokens is, because they are not persisting any data, they're technically considered read operations, and so as we add more performance standbys we can also linearly scale our ability to create batch tokens. If we add two more standbys, we can create even more batch tokens. Additionally, because the encryption keys that are used to create the batch tokens are replicated across secondary clusters, you're able to use batch tokens across clusters, as long as they don't have a parent that is a service token. If you create a batch token on one of the secondaries, you can use it on other secondaries or on the primary, and that's another really nice feature that enables new workflows with batch tokens.
Just to recap. Now that we have changed our infrastructure a little bit and started using namespaces and ACL templates, we are better able to manage our teams. Our shared writes, such as KV and configuration, are still handled by the active node on the primary cluster due to the way replication works, but local writes are scaled per secondary cluster. Service tokens are also scaled per secondary cluster. Our reads are scaled as we add more performance standbys, and so are batch tokens. That's all we had for you. Thank you so much. We'll be around after if you have questions. That's it. Thanks.