A Vault Policy Masterclass
Nov 16, 2020
This session dives into how to use Vault and Sentinel to define ACLs using concrete policy examples, so you can learn to define accurate and flexible policies for your apps.
- Nicolas Corrarello, Regional Director, Solutions Engineering, HashiCorp
Nicolas Corrarello: Hello everyone. This is a Vault policy masterclass. If you have seen me talk about Vault in the past, you may have seen me use this slide. Vault has extremely cool functionality at both the top and the bottom: it is great at establishing the identity of workloads, and it also has interesting workflows when it comes to managing secrets. Nine out of ten times, I have concentrated purely on that: how you authenticate your workload, or how Vault does this cool trick to get you a secret.
I've noticed I don't spend enough time talking about the middle, which is the most important part: how Vault brokers secrets, and how policy connects that identity with that secret. How do you define that policy? How do you establish who can do what in Vault, and under what conditions?
Hopefully, over the next 25 minutes, I'm going to be focusing mostly on that and giving you a good overview of what Vault policy brings to the table.
Personas and Producers
During this presentation, you may hear me talk about different personas and their relationship with Vault when it comes to policy. You may hear me talk about consumers: the people who need to access secrets. They may put a lot of effort into which secret they get, or under what conditions.
On the other hand, you also have producers, whom we tend to talk about when talking about Vault. These are the people mostly running and operating the Vault service, providing you with the capability that ultimately allows you to do secrets management.
Who Am I?
In case you haven't heard about me in the past, my name is Nico. As you can see in the picture, I lost quite a bit of weight going through our latest lockdown. I'm the Regional Director for Solutions Engineering here in EMEA for HashiCorp. That means I have a team of about 25 people, and I should mostly be approving expense reports. But they do let me play with engineering every once in a while.
I had the chance to merge a couple of things into Vault open source for you. As I always say, if you're using the Consul Secrets Engine or the Nomad Secrets Engine, you're very welcome. If I broke something for you, well, I'm truly sorry. Believe me, it was for the right reasons.
Over about three and a half years at HashiCorp, I've had the chance to help operationalize a large number of Vault deployments. That has given me wide-ranging experience of how to use the product, working with some of your colleagues in the industry. You can reach me through the usual means: I'm available on GitHub, and you can send me an email. I'm also on Freenode; I'm old-fashioned, so mostly I use IRC.
What Are We Talking About Today?
I'm going to give you an overview of the Vault policy system. This is not going to be in-depth because I am assuming you have some experience working with the Vault policy system.
But I am going to give you the essentials because, and this happens to me too, once you have been using Vault for a while you don't always go back to the documentation, and you may not realize we released something cool. These are things that often don't show up in the changelog but do show up in the documentation.
We're going to go into some of the more consumer-based use cases. How policy affects storage or retrieval of secrets. We're going to talk about the scope of how to define policies. We're going to talk about how you set the organizational policy, which again, might not be relevant to individual secrets — but might be relevant to the overall usage of Vault within your organization.
As usual when I talk, I'm going to try to apply scale without friction. We're going to assume that everyone involved in the organization is an adult, and we're just going to provide guardrails so they can use Vault appropriately. We're going to go through some of the cases where you avoid setting yourself up for failure as you scale. And as usual, 9 out of 10 things in this presentation are in the documentation, and they're all HashiCorp-sanctioned.
Let's go quickly into some of the basic concepts of managing policy in Vault. By now, you should be aware that there are a number of ways to enforce policy in Vault.
The most common one — the one that 9 out of 10 people are used to — is working with what we call ACL policies or Access Control Lists. These are written in HCL, which is HashiCorp Configuration Language. The same language — same construct — you use, for example, if you're writing Terraform code.
These are very much related to a specific API path, and they are linked to a particular token by name. This means if you are logging in to Vault using a particular identity, certain aspects of your identity — might be your LDAP group membership, might be your IAM role — will be tied to your set of policies. So, when you get a token, you're going to get a token assigned to that set of policies by name.
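To make the "by name" point concrete, here is a toy Python model, not Vault's code: an auth method maps identity attributes (here, hypothetical LDAP groups) to policy names, and the issued token carries only the union of those names. The group names and policy names are made up for illustration.

```python
from dataclasses import dataclass

# Hypothetical auth-method configuration: each LDAP group maps to
# a list of policy *names*. Groups and names are illustrative only.
GROUP_POLICIES: dict[str, list[str]] = {
    "developers": ["default", "app-read"],
    "dba": ["default", "db-admin"],
}


@dataclass(frozen=True)
class Token:
    accessor: str
    policies: tuple[str, ...]  # policy names only; bodies are looked up later


def login(accessor: str, groups: list[str]) -> Token:
    """Issue a token carrying the union of policy names for the caller's groups."""
    names: set[str] = set()
    for group in groups:
        names.update(GROUP_POLICIES.get(group, []))
    return Token(accessor=accessor, policies=tuple(sorted(names)))


token = login("hypothetical-accessor", ["developers", "dba"])
print(token.policies)  # ('app-read', 'db-admin', 'default')
```

Because the token only holds names in this model, two different policies sharing a name would be indistinguishable, which is one way to see why names must be unique.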
Endpoint Governing Policies
But on the other end of the spectrum, we have something very interesting, which a lot of people are not massively aware of — which we call Endpoint Governing Policies. These policies are written in Sentinel, which again, you might be aware of due to your use of Terraform. We're going to talk a little more in depth about Sentinel. But the important thing about Endpoint Governing Policies is that — regardless of the identity — they are assigned to a specific endpoint. We're going to talk a little bit about evaluation in a second.
Role Governing Policies
Then we have Role Governing Policies, which are also written in Sentinel. But much like ACL policies, they're assigned to tokens by name. As you can imagine, you cannot have collisions in the names of your policies.
And ultimately, you have the root policy. A lot of people have seen this, and the truth is that the root policy does not exist in written form. Rather, it's a reference that allows you to bypass the policy system. That's why you never use the root policy. The root policy is only assigned to root tokens while they exist. As soon as you have written any other policy, the first thing you do is revoke that root token.
The root policy is for emergencies, and it is only attached to the root token. You would never issue a token or set up a role that is assigned the root policy; again, it overrides everything. You need to know that the root policy exists, but you also need to know that it is the thing you never use, unless you're using a root token in an emergency.
We have a couple of policy systems, and they act at different times. It's always good to know when and how a certain policy is evaluated and enforced. When you make an authenticated call to Vault (which means you have a token), the first thing that happens is that Vault evaluates the policies attached to that token: first the ACLs, then the Role Governing Policies assigned to that token.
Once both of those pass, there is a further layer of oversight: the Endpoint Governing Policy. Regardless of your identity, if the endpoint has a specific policy (and we're going to look at some examples), Vault evaluates it. If all three pass, the operation is carried out. The interesting thing about Endpoint Governing Policies is that they also affect unauthenticated calls. If you are not authenticated, you have no token and no role in Vault, so there is no level of policy enforcement except Endpoint Governing Policies.
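The evaluation order described above can be sketched as a toy Python function. It is a model, not Vault's implementation: it assumes the request dict has a "token" key that is None for unauthenticated calls, and that the endpoint is one Vault serves without authentication (such as sys/health). How Vault treats unauthenticated calls to protected paths is not modelled.

```python
from typing import Callable, Optional

# A check takes a request dict and returns True (pass) or False (fail).
Check = Callable[[dict], bool]


def authorize(request: dict, acl: Check,
              rgp: Optional[Check] = None,
              egp: Optional[Check] = None) -> bool:
    """Toy model of the order in which Vault's policy layers are consulted."""
    if request.get("token") is not None:
        # 1. ACL policies attached to the token. Vault denies by default,
        #    so a failing ACL stops the request here.
        if not acl(request):
            return False
        # 2. Role Governing Policies attached to the token by name.
        if rgp is not None and not rgp(request):
            return False
    # 3. Endpoint Governing Policy on the path, applied regardless of
    #    identity; for an unauthenticated call this is the only layer.
    if egp is not None and not egp(request):
        return False
    return True


def allow(request: dict) -> bool:
    return True


def deny(request: dict) -> bool:
    return False


print(authorize({"token": "t", "path": "secret/data/app"}, allow, deny))  # False
print(authorize({"token": None, "path": "sys/health"}, deny, egp=allow))  # True
```

Note the second call: the failing ACL is never consulted because there is no token, which mirrors the point that only the EGP applies to unauthenticated calls.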
Endpoint Governing Policies, very quickly: if you have certain status endpoints, or things that would traditionally be open in Vault, but you want to restrict them to, let's say, a specific IP or CIDR range, guess what? That's a great use case for an Endpoint Governing Policy.
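The CIDR restriction boils down to a containment check on the caller's address. In Sentinel you would typically reach for the sockaddr import; the Python below only models the logic, and the allowed ranges are hypothetical.

```python
import ipaddress

# Hypothetical ranges allowed to reach a restricted status endpoint.
ALLOWED_CIDRS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("192.168.1.0/24"),
]


def egp_remote_addr_allowed(remote_addr: str) -> bool:
    """EGP-style check: pass only if the caller's IP is inside an allowed CIDR."""
    try:
        addr = ipaddress.ip_address(remote_addr)
    except ValueError:
        return False  # fail closed on anything that is not an IP address
    # An IPv6 address is never "in" an IPv4 network, so it simply fails.
    return any(addr in network for network in ALLOWED_CIDRS)


print(egp_remote_addr_allowed("10.20.30.40"))  # True
print(egp_remote_addr_allowed("172.16.0.1"))   # False
```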
Now, this is a lot of policy. You may be thinking this is a lot of work to maintain. And honestly, it's the other way around because one of the first questions I get asked when I start talking in depth about this is who is supposed to write this? Well, the answer is a combination. And as we said, different people interact with Vault in different ways. Even if your team is small, you will notice that you play that role of producer or play that role of consumer — depending on what you're doing. In most large deployments, you will have a team that is running Vault for you. Then your scope might be just consuming secrets.
Who Manages Policies?
It's always important to understand the delineation of scope and what you should care about. If I'm running a Vault service that is going to be consumed by a large number of teams (that is, I'm a producer), my scope in terms of writing policy is mostly these three things. I need to enforce organizational rules. For example, I was working with a large bank, and they told me Vault is super powerful, but they didn't want to allow people to use everything from the get-go. Maybe we start with the AWS secrets engine and allow only that; I'm going to show you a couple of examples of how we did that back then.
We're going to talk about policies that are specifically related to the service, which is what we care about. But ultimately, and very importantly, if you want your Vault deployment to be successful, you have to assume that the people using it are adults. You have to promote a certain level of autonomy and responsibility. I'm not going to tell you how to configure AWS EventBridge, but I am going to tell you that you can only use AWS. I enforce organizational policy, and I give people as much freedom as they need to be successful. Because guess what? In this world, if you provide a central service that is not useful to people, they are going to stand up their own Vault cluster, and that's exactly what you want to avoid.
Then we have a layer of consumers, who are more concerned about the workflow. They're likely to be looking more at whether this application is able to write, read, encrypt, or decrypt. Or they may coordinate across a number of smaller teams, such as app owners potentially running microservices, that have a narrower scope of policy.
But the idea is that they should still be mostly self-service. You don't want a system where you have to raise a ticket to get anything done. Remember, this is ultimately replacing a system of coordination when it comes to secrets.
While it's not directly related to policy, I would be negligent if I didn't mention Namespaces, which allow that level of delegated oversight. Remember that you're going to have a set of consumers who want their own Vault, and they're going to get their own Vault: they get access to their own sys/policy endpoint, and they manage their own policies, if you will.
That's why it's important for producers who want to enforce organizational policy to keep those policies, as much as possible, outside the consumer namespaces. That's a great example of where EGPs come into play.
ACLs with HCL
Let's talk about ACLs for a second. With that, I'm going to review basics, so everyone has a certain baseline in terms of where we are.
As we mentioned before, ACLs are tied to a path, and they are linked by name to a token or a particular role in Vault. These paths may include glob or wildcard characters. The + character matches a single segment of the path, a single word between slashes. The * character cannot go just anywhere in the path: you can only put it at the end, where it matches anything that comes after it.
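The two glob characters can be modelled by translating a policy path into a regular expression: + becomes "one segment without a slash" and a trailing * becomes "anything". This Python sketch is an approximation for building intuition, not Vault's matcher; in particular, it ignores how Vault chooses between several policy paths that match the same request.

```python
import re


def path_matches(pattern: str, path: str) -> bool:
    """Approximate Vault ACL globbing on a single policy path.

    '+' as a whole segment matches exactly one path segment.
    A trailing '*' matches any remaining characters, including '/'.
    """
    if "*" in pattern[:-1]:
        raise ValueError("'*' may only appear as the last character of a path")
    glob_tail = pattern.endswith("*")
    body = pattern[:-1] if glob_tail else pattern
    parts = ["[^/]+" if seg == "+" else re.escape(seg) for seg in body.split("/")]
    regex = "/".join(parts) + (".*" if glob_tail else "")
    return re.fullmatch(regex, path) is not None


print(path_matches("transit/keys/+", "transit/keys/app-a"))         # True
print(path_matches("transit/keys/+", "transit/keys/app-a/rotate"))  # False
print(path_matches("sys/*", "sys/policies/acl/admin"))              # True
```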
Over that path, you express a set of capabilities. These map directly to HTTP verbs: as you can imagine, create and update map to PUT and POST, read maps to GET, list maps to LIST, and delete maps to DELETE. But we have two more that are special. One is deny: for example, if you have a * path above, you can put a specific deny below it, and the deny takes precedence, because deny always takes precedence in Vault. The other is the sudo capability. This is very interesting because, for example, you can grant sys/* and give access to all the system API endpoints, but some of them require sudo. This is where you can separate higher privilege from lower privilege.
If you go to the API documentation, you will find the endpoints that need sudo marked very clearly. It's not sudo as we're used to in the Unix/Linux world; it just means this particular identity has sudo access to this endpoint, so it can carry out elevated operations.
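Here is a rough Python model of how a request's HTTP method becomes a required capability, and how deny and sudo interact with it. It assumes the capabilities granted on the matching path have already been collected from the token's policies, and that the backend distinguishes create from update by checking whether the target exists (passed in here as a flag).

```python
def required_capability(method: str, exists: bool = False) -> str:
    """Map an HTTP method to the ACL capability Vault checks (simplified)."""
    method = method.upper()
    if method in ("POST", "PUT"):
        # Assumes the backend distinguishes create from update via an
        # existence check; writing to an existing object needs 'update'.
        return "update" if exists else "create"
    return {"GET": "read", "LIST": "list", "DELETE": "delete"}[method]


def merge_capabilities(*granted: set[str]) -> set[str]:
    """Union capabilities from several policies on the same path; deny wins."""
    merged: set[str] = set().union(*granted)
    return {"deny"} if "deny" in merged else merged


def is_allowed(granted: set[str], method: str, exists: bool = False,
               needs_sudo: bool = False) -> bool:
    if "deny" in granted:
        return False  # deny takes precedence over everything, sudo included
    if needs_sudo and "sudo" not in granted:
        return False  # sudo-protected endpoints need the extra capability
    return required_capability(method, exists) in granted


caps = merge_capabilities({"read", "list"}, {"update", "sudo"})
print(is_allowed(caps, "PUT", exists=True, needs_sudo=True))  # True
print(is_allowed(merge_capabilities(caps, {"deny"}), "GET"))  # False
```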
Always remember that ACL policies are modeled around API calls. As we said, we have the path with globs and wildcards, and we have the capabilities mapped to HTTP verbs. But then we also have the actual parameters that come in the API call.
Vault can restrict which parameters are allowed, which are required (you must provide this parameter), and which are denied (you cannot write a value that looks like this).
Then there is another aspect, which is response wrapping. As you all know, you can get a wrapped response from Vault that can only be unwrapped once, which also lets you detect whether it was tampered with along the way. This is very useful when someone passes a wrapped token from one person to another, or when you have a configuration management tool that you don't want to know the secret, but you ultimately want the secret to get there.
You can establish minimum and maximum response-wrapping TTLs. So when a consumer asks for something, maybe they can wrap the secret for up to 15 minutes, because your organizational policy might be that if the secret is not consumed within 15 minutes, it cannot be consumed anymore.
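Vault's ACL syntax has min_wrapping_ttl and max_wrapping_ttl parameters for this. The Python below is a sketch of the check, not Vault's code. It assumes that once either bound is set on a path, an unwrapped request to that path is refused; confirm that behaviour against the docs for your Vault version.

```python
from datetime import timedelta
from typing import Optional


def wrap_ttl_allowed(requested: Optional[timedelta],
                     min_ttl: Optional[timedelta] = None,
                     max_ttl: Optional[timedelta] = None) -> bool:
    """Check a requested wrapping TTL against policy bounds.

    requested is None when the client did not ask for a wrapped response.
    Assumption: any configured bound makes wrapping mandatory.
    """
    if min_ttl is None and max_ttl is None:
        return True  # the policy says nothing about wrapping
    if requested is None:
        return False  # a bound is set but the response would not be wrapped
    if min_ttl is not None and requested < min_ttl:
        return False
    if max_ttl is not None and requested > max_ttl:
        return False
    return True


fifteen = timedelta(minutes=15)
print(wrap_ttl_allowed(timedelta(minutes=10), max_ttl=fifteen))  # True
print(wrap_ttl_allowed(timedelta(minutes=50), max_ttl=fifteen))  # False
```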
ACL in Practice
Let's look a little at what an ACL looks like, which you might already be used to. I have an example from my own Vault. I have a Puppet server (sorry, I like using Puppet), and I'm using Vault as a CA for that Puppet server. I have an intermediate CA at my house, and the Puppet server is allowed to issue certificates with its particular role: it can create, read, and update. It can also read PEMs.
That's just a simple example of a policy: I have an API path, and I'm allowing something to do certain operations on that path. Now let's get a little more specific about what you can do on a path.
In this case, I'm giving someone access to create any kind of key. If you look at the top example, I have transit, which is the traditional mount point, then keys, which is the API path for keys, and then a +, which means any name. I'm allowed to create any kind of key, but with certain constraints. When I'm creating a key, I have to ensure, in this case, that I'm not allowing plaintext backups. If I send true for that parameter, guess what? The request is going to fail.
Even if I have access to create keys, I cannot create keys that allow plaintext backup. Also, on the type, I can specify multiple values and say, "Sure, you absolutely can create keys, as long as they are AES256 or RSA-4096." Then I have a * entry, which says: look, there are other parameters, and you're free to do whatever you want with them. But the ones I care about are these, and those are the ones I'm going to enforce.
I have another example using the encrypt function on any of those keys: I can encrypt things as long as I'm not specifying the key version. In Vault terms, that means you are always using the latest key version to encrypt information.
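The transit examples above rely on Vault's allowed_parameters and denied_parameters, plus the required parameters mentioned earlier. Below is a simplified Python model of those rules applied to the key-creation and encrypt cases. It compares plain Python values, so how real Vault matches HCL policy values against JSON request values is outside its scope, and the constants mirror the talk's description rather than the exact slide.

```python
from typing import Any, Optional

# Key creation on transit/keys/+: only two key types, no plaintext backups,
# and "*": [] leaves every other parameter unrestricted.
CREATE_KEY_ALLOWED = {"type": ["aes256-gcm96", "rsa-4096"], "*": []}
CREATE_KEY_DENIED = {"allow_plaintext_backup": [True]}
# Encrypt on transit/encrypt/+: an empty list denies the parameter outright.
ENCRYPT_DENIED = {"key_version": []}


def params_allowed(data: dict[str, Any],
                   allowed: Optional[dict[str, list]] = None,
                   denied: Optional[dict[str, list]] = None,
                   required: Optional[list[str]] = None) -> bool:
    """Simplified model of allowed/denied/required parameter checks."""
    allowed = allowed or {}
    denied = denied or {}
    if any(name not in data for name in required or []):
        return False
    for name, value in data.items():
        # denied_parameters wins over allowed_parameters.
        if name in denied and (not denied[name] or value in denied[name]):
            return False
        if not allowed:
            continue  # no allow-list configured: anything not denied passes
        if name in allowed:
            if allowed[name] and value not in allowed[name]:
                return False  # populated list: only these values
        elif "*" not in allowed:
            return False  # unlisted parameters are rejected without "*"
    return True


print(params_allowed({"type": "rsa-4096", "exportable": False},
                     CREATE_KEY_ALLOWED, CREATE_KEY_DENIED))  # True
print(params_allowed({"type": "aes256-gcm96", "allow_plaintext_backup": True},
                     CREATE_KEY_ALLOWED, CREATE_KEY_DENIED))  # False
```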
About a year and a half ago, we introduced the ability to template policies, so you don't have to keep writing new ones. An interesting example: when I started preparing this presentation, I went back through four years of my inbox looking for questions like, "How do I do this in policy?" and I started pulling out some examples.
This was a very interesting one: someone told me, "I want to tag people to give them access to a secret. I don't want to have specific policies for groups. I just want to use some pattern so I can write fewer policies." And I went, "Oh, that's a great challenge." If you're familiar with the Vault identity system, you can add metadata to your entities.
This metadata can apply to a group or a user, and you can bring it into a Vault policy through templating. For example, I created a metadata key-value pair, application, with the value APM-A, and I'm putting that information in the path. Every time someone tries to write to APM-B without being tagged properly on their entity, this policy will fail.
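Vault policy templates use placeholders such as {{identity.entity.metadata.<key>}} in the path. The Python below is a toy renderer for just that one placeholder, applied to a hypothetical KV path (secret/data/<application>/*). Treating an unresolvable placeholder as granting nothing is an assumption of this model.

```python
import re
from typing import Optional

# Hypothetical templated policy path for a KV v2 mount at "secret/".
PATH_TEMPLATE = "secret/data/{{identity.entity.metadata.application}}/*"
PLACEHOLDER = re.compile(r"\{\{identity\.entity\.metadata\.([^}]+)\}\}")


def render_path(template: str, metadata: dict[str, str]) -> Optional[str]:
    """Fill entity-metadata placeholders; None if any key is missing."""
    missing = False

    def substitute(match: re.Match) -> str:
        nonlocal missing
        key = match.group(1)
        if key not in metadata:
            missing = True
            return ""
        return metadata[key]

    rendered = PLACEHOLDER.sub(substitute, template)
    return None if missing else rendered


def can_write(metadata: dict[str, str], path: str) -> bool:
    rendered = render_path(PATH_TEMPLATE, metadata)
    if rendered is None:
        return False  # assumption: an unresolved template grants nothing
    # The rendered path ends in '*', so treat it as a prefix match.
    return path.startswith(rendered[:-1])


entity = {"application": "APM-A"}
print(render_path(PATH_TEMPLATE, entity))  # secret/data/APM-A/*
print(can_write(entity, "secret/data/APM-B/db-password"))  # False
```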
This is a very interesting technique to avoid continuously writing policies that you may not need. You can template them and use predictable strings. A number of attributes are available through templating; this is literally a copy and paste from our site, so I'll let you read through it. They're mostly identity-related, which is probably what you can predict when you're templating.
With that, let's go a little into Sentinel. This is something a lot of people may not be used to, because it's only in the commercial version of Vault. The key point is that our ACLs are pretty static: based on a path and an identity, you can do this, or you can't do that.
Logic within Policy
Sentinel allows a greater level of granularity because you introduce logic within the policy: if this condition or that condition holds, and so on, let the request in or keep it out. And it works on top of ACLs: you may have access to something through an ACL, but Sentinel can still enforce something else.
Sentinel is ultimately designed as a rule system. When you're writing a Sentinel rule, think of it as something that is going to evaluate to true or false. If the main rule (and we're going to see a main rule in a second) is not true, we're not going to go forward with the operation.
Imports for Parameter Parsing
It has a number of imports that allow you more complex operations, like parsing an IP range or CIDR, doing an HTTP call — I'm going to show you one in a second — manipulating strings, and so on.
RGP or EGP
As we mentioned, it can be applied in two different places: directly on the role linked to the token (RGP), or on the endpoint (EGP). I'm going to say RGP and EGP a lot going forward.
Under What Conditions?
Let's look at a Sentinel rule, and at the "under what conditions" part. You can see I'm again using the entity's identity, much like the template. This rule is specifically for SSH: when a request comes in to sign an SSH key, it checks whether the requested valid principals match the caller's entity name. If that's true, let it through; if it's false, don't.
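In Python terms, the rule reduces to comparing one request field with one identity attribute. This is a model of the logic, not Sentinel syntax; the request shape (a path under a hypothetical ssh/ mount, and a data dict carrying valid_principals) is an assumption for illustration.

```python
def ssh_principal_rule(request: dict, entity_name: str) -> bool:
    """Pass signing requests only when valid_principals equals the entity name."""
    path = request.get("path", "")
    if not path.startswith("ssh/sign/"):  # hypothetical mount at "ssh/"
        return True  # the rule only constrains SSH signing requests
    principals = request.get("data", {}).get("valid_principals")
    return principals == entity_name


signing = {"path": "ssh/sign/dev", "data": {"valid_principals": "nico"}}
print(ssh_principal_rule(signing, "nico"))   # True
print(ssh_principal_rule(signing, "alice"))  # False
```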
With an ACL, you may have access to sign keys, but with Sentinel you're introducing an extra layer of granularity: only if this looks like that. Now, the thing about Sentinel is that you can start combining these rules. I was working with an organization that wanted that rule, but also wanted to check that people were logging in during business hours.
As you can see, I created more rules. I have a workday rule, which checks that the time is between Monday and Friday, and a work hours rule; you may have seen this example a lot while talking to us. Work hours are nine to five. Now I'm adding a main rule that says: if my previous rule, the workday rule, and the work hours rule are all true, then let the request go forward.
This is how you can use Sentinel to combine a number of rules, or a bit of logic, into a true or a false, and grant access or not. Now, part of my rule was superseded by something we shipped later: we can now put templates into the allowed_users parameter of an SSH role in Vault. Someone did exactly that and superseded my rule. I like that better; it's super simple. I still wanted to use this rule to show you an example of how you can author a Sentinel rule.
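Combining rules is just boolean logic. The sketch below models the workday and work-hours rules in Python and combines them with the result of the previous principal check, passed in as a boolean. The nine-to-five window follows the talk; time-zone handling is deliberately left out.

```python
from datetime import datetime


def is_workday(now: datetime) -> bool:
    return now.weekday() < 5  # Monday is 0, Friday is 4


def is_work_hours(now: datetime) -> bool:
    return 9 <= now.hour < 17  # nine to five


def main_rule(principal_ok: bool, now: datetime) -> bool:
    """All three conditions must hold for the request to go forward."""
    return principal_ok and is_workday(now) and is_work_hours(now)


print(main_rule(True, datetime(2020, 11, 16, 10, 30)))  # True: a Monday morning
print(main_rule(True, datetime(2020, 11, 14, 10, 30)))  # False: a Saturday
```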
In my experience, Sentinel is used more and more at that organizational level. If we go back to namespaces, you will see there's a group that is ultimately managing Vault for everyone else. As I mentioned, this group doesn't want to write policy that can be modified in the child namespaces.
We had a very interesting case back then. They wanted to delegate namespaces, but they immediately worried that, as this started scaling, people would create child namespaces for everything. So they created a rule that prevented creating child namespaces under the namespaces that had already been delegated. Just another interesting example.
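A rule like that can be sketched as an EGP-style check on namespace-creation requests. The Python model below assumes the request exposes the namespace it was made in (with the root namespace as an empty string) and that namespaces are created by writing to sys/namespaces/<name>; it only allows creation from the root namespace, so delegated namespaces cannot create children of their own.

```python
def namespace_creation_allowed(namespace: str, path: str, operation: str) -> bool:
    """Hypothetical EGP: only the root namespace may create namespaces."""
    creating = (operation in ("create", "update")
                and path.startswith("sys/namespaces/"))
    if not creating:
        return True  # the rule ignores everything except namespace creation
    return namespace == ""  # "" stands for the root namespace in this model


print(namespace_creation_allowed("", "sys/namespaces/team-a", "create"))          # True
print(namespace_creation_allowed("team-a/", "sys/namespaces/scratch", "create"))  # False
```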
Do You Have a Ticket for That?
I'd like to bring up another one that we used with SSH, and this is fairly recent. As I mentioned, Sentinel has an HTTP import. If you have an SSH signing request and you want to check whether it is tracked or legitimate, you can grab the IP from the request and post it over HTTP to an external system, like a ticketing system. I stubbed this out with Sinatra and Ruby to test it. Ultimately, if a ticket exists for that server, well, guess what? You get access. And if you don't, see what I did there? It was a very interesting case to try out, and something cool you can do with Sentinel.
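The ticket lookup can be sketched as an HTTP call whose status code becomes the rule's result. This Python version is a stand-in for Sentinel's http import; the ticketing URL, the JSON payload, and the "200 means a ticket exists" convention are all hypothetical, and the poster function is injectable so the rule can be exercised without a real service.

```python
import json
import urllib.error
import urllib.request
from typing import Callable

TICKET_API = "http://tickets.example.internal/api/check"  # hypothetical endpoint


def post_json(url: str, payload: dict) -> int:
    """POST a JSON body and return the HTTP status code."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    try:
        with urllib.request.urlopen(req, timeout=5) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # 4xx/5xx responses still carry a status code


def ticket_exists(remote_addr: str,
                  post: Callable[[str, dict], int] = post_json) -> bool:
    """Allow the SSH signing request only if the ticketing system says yes."""
    try:
        return post(TICKET_API, {"ip": remote_addr}) == 200
    except OSError:
        return False  # fail closed when the ticketing system is unreachable


# Exercise the rule with stub posters instead of a live service.
print(ticket_exists("10.0.0.5", post=lambda url, body: 200))  # True
print(ticket_exists("10.0.0.5", post=lambda url, body: 404))  # False
```

Failing closed is a design choice here: if the ticketing system is down, nobody gets a signed key, which trades availability for safety.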
You Shall Not Mount
Going back to an example I mentioned earlier: what if I only want to allow certain secrets engines? And on top of that, maybe I only want to allow seal-wrapped data. Seal Wrap, for those of you who aren't aware, is a Vault feature that lets you encrypt data in a manner compliant with FIPS.
Maybe I'm giving this team access to very sensitive data, so the only way I want them to store data in Vault is seal-wrapped. They should absolutely be able to create mounts, because you want to give that level of delegation, but only as long as the mounts are seal-wrapped, or as long as they are AWS or SSH. Those are some examples of how you can use Sentinel here.
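The mount restriction can be modelled as a check on requests that enable a secrets engine via sys/mounts/<path>, whose body carries the engine type and a seal_wrap flag. The Python sketch below follows the rule as described: seal-wrapped mounts of any type pass, and otherwise only the aws and ssh engine types pass. It is a model of the logic, not Sentinel, and it deliberately skips tuning requests.

```python
ALLOWED_TYPES = {"aws", "ssh"}  # engines allowed without seal wrapping


def mount_allowed(path: str, data: dict) -> bool:
    """EGP-style rule for enabling secrets engines (simplified)."""
    enabling = path.startswith("sys/mounts/") and not path.endswith("/tune")
    if not enabling:
        return True  # the rule only looks at new mounts
    if data.get("seal_wrap") is True:
        return True  # seal-wrapped mounts of any type are fine
    return data.get("type") in ALLOWED_TYPES


print(mount_allowed("sys/mounts/team-aws", {"type": "aws"}))                   # True
print(mount_allowed("sys/mounts/team-kv", {"type": "kv"}))                     # False
print(mount_allowed("sys/mounts/team-kv", {"type": "kv", "seal_wrap": True}))  # True
```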
There are a couple of things in Vault that are not directly part of the policy system, or are not written policy, but are policy functionality you should be aware of. One of these is control groups. Control groups are extremely interesting because they involve more of the human workflow.
Nine out of ten times, Vault will say yay or nay depending on what you're doing, how you're doing it, which path, which CIDR, or whatever. But sometimes you're going to need a human to accept. Control groups provide an extra layer of authorization with a multi-person rule: if you want to access a secret, Vault gives you a wrapped token, and 3 out of 5 people in a particular group have to approve your request before you can get that secret. Then you can unwrap it.
It's a very interesting feature to implement. It is very widely used in Europe because of GDPR: multi-person access is one of the aspects to take into consideration there, and it's very nice to have.
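The 3-out-of-5 approval flow can be sketched as a small quorum object. This Python model is illustrative only: in Vault, control groups are configured in policy and approvals go through Vault's own API, none of which is modelled here, and the approver names are made up.

```python
from dataclasses import dataclass, field


@dataclass
class ControlGroupRequest:
    """A wrapped secret that can only be unwrapped after enough approvals."""
    required_approvals: int
    approvers: frozenset[str]
    approvals: set[str] = field(default_factory=set)

    def approve(self, user: str) -> None:
        if user not in self.approvers:
            raise PermissionError(f"{user} is not in the approver group")
        self.approvals.add(user)  # a set, so repeat approvals count once

    def can_unwrap(self) -> bool:
        return len(self.approvals) >= self.required_approvals


request = ControlGroupRequest(3, frozenset({"ana", "ben", "cai", "dee", "eli"}))
for approver in ("ana", "ben", "ben"):
    request.approve(approver)
print(request.can_unwrap())  # False: only two distinct approvals
request.approve("cai")
print(request.can_unwrap())  # True
```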
Audit and Extra Headers
Another interesting thing that not a lot of people are aware of is that, on the audit side, you can capture information straight from the HTTP call: you can configure custom headers that are passed directly to the audit backend.
When people are going through the audit trail, those headers provide contextual information about the API call. That allows the person reading the audit log (and remember, these are generally two different people) to understand why something happened. It makes audits much easier, so it's something to take into consideration.
Conclusion & Wrap Up
With great power comes great responsibility. The beauty of Sentinel is, again, you want to give that freedom. You want to allow people to move as fast as they can. The cool thing about Sentinel is we can provide those guardrails to allow people to do that.
Even if you're only using ACLs, think of some of the examples I showed you. If you want to create keys, that's fine, but you're going to do it with an algorithm and a cipher that the organization is comfortable with.
As usual, Vault is a security product. We try to accelerate people as much as possible, but there are always security procedures that need to be taken into consideration. Policy doesn't need to be that complex. We showed it with the way you write Sentinel, which is pretty readable. We showed it with templating policy. It doesn't need to be that complex. Just ensure it enforces what you need to enforce.
And ultimately, as I always say, read the docs. This presentation is current as of October 2020. If you're not in October 2020, go back and read the docs: a lot of things change in Vault, and we release a lot of software every month, so please go and check whether anything has changed. With that, I would like to thank you very much. I hope you're having a great time at the conference. I think I'm going to be around for questions, so hit me up.