Beyond Secrets, Automating PKI with Vault

By switching from a manually approved PKI process to Vault PKI, one company went from taking a week to rotate all of its certificates to doing it in minutes. See Vault PKI's ACME server and CIEPS features in action.


Welcome to Beyond Secrets, Automating PKI with Vault. Before I get started, here's a brief agenda. I'll start with an explanation of PKI, not too deep, just what it is and how it works, because I know some people are less familiar with it and it's important context for this session. After that, I'll talk a little bit about how PKI is typically managed today.

Then, based on the pain points of doing things the current, traditional way, I'll show how Vault can help with that process and make it more secure and more efficient. With that, let's get started.

»What is public key infrastructure?

Most of you are familiar with the little lock icon in your browser's address bar, and there's a ton going on behind it to give you a secure connection between a client and a server. By definition, PKI is the issuance of digital certificates that protect sensitive data, provide unique digital signatures for users and applications, and secure end-to-end communication. It's a critical part of zero trust security, and you use it every day: on websites, load balancers, servers, and all kinds of other things.

It's made up of three core components. The first is the key pair: the public and private key used to securely transmit data between points and to authenticate whoever is sending the data. The second is certificate authorities (CAs). Certificate authorities are considered trusted third parties, and they're governed by a strict set of rules determined by an independent body. One example is the CA/Browser Forum, whose members include the certificate authorities and the major browser vendors. That's where the whole thing starts: the trust, the certificates, and everything else.

Last but not least are the end certificates, which we call leaf certificates, installed on the endpoints wherever they're needed. They let you validate the identity of that endpoint. Together, these components do three things. They authenticate unknown users with digital signatures, so you know who you're communicating with, and they encrypt data in transit using the public and private key, so everything stays secure while it's being transmitted.

The third is data integrity. You want to make sure bad actors can't change data en route from point A to point B, and that's accomplished with cryptographic hashes. The sender hashes the data to create a digest and signs that digest with its private key. The receiver uses the sender's public key to verify the signature, hashes the data it actually received, and compares the two digests. If the digests don't match, you know your data has been tampered with.
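A toy sketch of that sign-then-verify step, using only Python's standard library. The key values are the classic textbook RSA example (p = 61, q = 53) and are far too small for real use, and reducing the SHA-256 digest modulo n is a shortcut that only makes sense for this toy; real signatures use 2048-bit or larger keys with a padding scheme such as RSASSA-PSS.

```python
import hashlib

# Toy RSA key pair (textbook values). Illustration only: real keys are
# 2048+ bits and signatures use a padding scheme such as RSASSA-PSS.
P, Q = 61, 53
N = P * Q      # public modulus (3233)
E = 17         # public exponent
D = 2753       # private exponent: (E * D) % ((P - 1) * (Q - 1)) == 1


def digest(message: bytes) -> int:
    """SHA-256 digest of the message, reduced mod N so the toy key can sign it."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % N


def sign(message: bytes) -> int:
    """Sender: apply the private key to the digest."""
    return pow(digest(message), D, N)


def verify(message: bytes, signature: int) -> bool:
    """Receiver: recover the digest with the public key and compare it
    with a digest computed over the data that actually arrived."""
    return pow(signature, E, N) == digest(message)


if __name__ == "__main__":
    msg = b"transfer 100 to account 42"
    sig = sign(msg)
    print(verify(msg, sig))            # True
    print(verify(msg, (sig + 1) % N))  # False: an altered signature fails
```

Because this toy digest only has 3,233 possible values, a tampered message could collide by chance here; with full-size keys and digests that is not a practical concern.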

All of this put together creates what we call a chain of trust. The chain starts with the certificate authority at the top, the root CA, whose certificate is self-signed and contains its public key. In the middle there's an intermediate certificate authority, or often several of them, which serves as a buffer between the root CA and your endpoint.

The endpoint holds the leaf certificate we talked about, installed on a website, an application, or wherever it needs to be so it can prove its identity. Having that chain of trust is really important. Here's what it looks like in practice: you open your browser and go to a website, and you get a certificate back with public key information and other details. Your browser then checks its local trust store, following the chain of trust, to see whether the certificate chains up to a trusted root CA.

Then your browser takes a piece of data, encrypts it with that public key, and sends it back. If the secure connection is established and you get the lock, you know that whoever sent you the certificate holds the private key, because they were able to decrypt it. There are lots of gory details around the public and private keys, and lots of different scenarios where this is used, but the browser is probably the easiest way to explain it.
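The trust-store check can be sketched as a walk up the issuer links, from the leaf through any intermediates to a root the client already trusts. This is a simplified model: the `Cert` class and `build_chain` function are hypothetical, and a real validator also verifies each signature, the validity dates, and name and key-usage constraints.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Cert:
    subject: str
    issuer: str  # subject name of the CA that signed this certificate


def build_chain(leaf: Cert, intermediates: list[Cert],
                trust_store: dict[str, Cert]) -> Optional[list[Cert]]:
    """Follow issuer links from the leaf to a root in the local trust store.

    Returns [leaf, intermediates..., root], or None if the chain does not
    end at a trusted root. Signature and date checks are omitted.
    """
    by_subject = {c.subject: c for c in intermediates}
    chain, current = [leaf], leaf
    for _ in range(len(intermediates) + 1):  # bounded walk, so loops can't spin forever
        if current.issuer in trust_store:
            chain.append(trust_store[current.issuer])
            return chain
        parent = by_subject.get(current.issuer)
        if parent is None:
            return None  # issuer unknown: untrusted
        chain.append(parent)
        current = parent
    return None


if __name__ == "__main__":
    root = Cert("Example Root CA", "Example Root CA")  # self-signed
    inter = Cert("Example Intermediate CA", "Example Root CA")
    leaf = Cert("www.example.com", "Example Intermediate CA")
    chain = build_chain(leaf, [inter], {root.subject: root})
    print([c.subject for c in chain])
```

Removing the root from the trust store makes the same walk return `None`, which is the point where a browser would warn that the certificate isn't trusted.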

»PKI pain points - A case study

Now that we're on the same page as to what PKI is, let's talk a little bit about how this process is managed today and, I would say, has been for years. I'm using a customer example. It's a customer that was managing things the traditional way and then at some point they decided there's too much pain here. This is too challenging. And they incorporated Vault into the whole process.

I'll talk about the benefits when we get to that point. But to set this up: this customer has about a thousand hypervisors running roughly 20,000 VMs across cloud and on-premises estates, including AWS. They have something like 4,000 certificates supporting Java-based applications in these various locations. That gives you an idea of the scale: 4,000 certificates is a lot to manage, relatively speaking.

It's also a lot to manage when you have to rotate them, and you have to keep them up to date, and keep from having downtime if you have certificates that expire before you rotate them.

Here's what their process looks like. Typically, a user, usually an app developer, submits a request for a new certificate because it's been a year. I'm going to keep saying a year, which may sound weird, because that's a long time to leave a certificate out there. But it happens because rotating these things is painful, especially in large environments.

So they're going to go to a portal that they use to make that request, they submit the request ticket, and somebody from DevOps is going to take care of creating that new certificate. It might be a day, it might take a couple of days. It just depends on what the backlog is. But the point is that there's human interaction there and somebody has to actually complete that task.

Once that's done, they manually update the key stores, the trust stores, and everything else you update when you create a new certificate. Then it's the networking team's turn to submit requests for their leaf certificates, the ones that go on their load balancers and other devices. They have to do it themselves because, for security reasons, they're the only ones with access.

Once all the certificates are created, multiple teams become involved in testing and validating to make sure that with their new certificates, everything is working from end to end. Until that happens, nothing's going to roll into production or get updated and made live. So that whole process currently with a lot of companies is taking about a week. And with this particular customer, that's how long it was taking for them when they needed to rotate things.

On top of this, look at a whole year: how much time, in person-hours, do you spend managing and rotating certificates across your entire estate? It adds up to about two months a year. That's a lot of time that could be spent on more strategic work instead of simply rotating certificates.

»Modern PKI benefits

So what's the problem? Think about today's environments and infrastructure. We're always talking about dynamic applications, multi-cloud environments, and microservices that you connect to build services and applications. We expect to work across environments and scale up and down on demand, with performance that manual processes, human hand-offs, and multiple tools don't support very well.

Our environments, and the way we build and run applications, services, and infrastructure, have evolved. But I don't think the process for managing certificates has evolved very much. And a bit of trivia, if you ever go to a tech trivia night: Diffie and Hellman weren't the first to come up with the public key, private key concept.

Researchers at GCHQ, the British intelligence agency, came up with it around 1969, but their work stayed classified until the nineties. PKI really took off in the nineties too, after CERN released the World Wide Web into the public domain in 1993. And if you look at the way most companies manage PKI right now, it hasn't changed a whole lot since then. So it's just something to keep in mind.

When we look at this process again, that's really what it represents: "Hello, nineties. This is the way we've been doing things for a while." To keep up with the advances in applications, cloud infrastructure, and platforms, we need a new approach, one that is more agile and more secure.

When you put Vault in the equation, you get the same value you get when managing any other kind of secret. PKI certificates are just another type of secret to manage, and actually a more complicated one than the static secrets many folks use Vault for right now.

You get increased agility because you're taking human error and roadblocks out of the process. You reduce risk because you're governing things with policies, which you can associate with roles when managing your secrets. You reduce cost by reducing downtime from expired certificates that weren't automatically renewed and deployed to the endpoints where they needed to be.

You reduce outages for the same reasons. A lot of these overlap, but you can start to see the pattern. And you increase your overall security, because you're creating certificates with short times to live (TTLs) and rotating them more frequently. When rotation isn't a pain in the butt and doesn't take two months out of your year, you won't have certificates sitting out there for an entire year.
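Some quick arithmetic shows why short TTLs and manual rotation don't mix. The 4,000-certificate fleet comes from the case study; the TTL values are illustrative assumptions, and the sketch assumes every certificate is replaced exactly once per TTL.

```python
FLEET_SIZE = 4_000  # certificates in the case-study estate


def issuances_per_year(fleet_size: int, ttl_days: int) -> int:
    """Certificates issued per year if each one is replaced once per TTL (rounded up)."""
    return -(-fleet_size * 365 // ttl_days)  # ceiling division


if __name__ == "__main__":
    for ttl in (365, 90, 30):
        per_year = issuances_per_year(FLEET_SIZE, ttl)
        print(f"TTL {ttl:>3} days: {per_year:>6} issuances/year, ~{per_year / 365:.0f}/day")
```

At a 30-day TTL that is 48,667 issuances a year, roughly 133 a day: a load that only makes sense when issuance and deployment are automated.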

»Vault PKI

What does managing things with Vault look like? With Vault PKI, things happen in a matter of seconds, not the week it takes with traditional processes. You can automate requests from the application to Vault, and because you're using roles and policies, the application is authenticated before Vault lets it take any kind of action.
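A sketch of that flow against Vault's HTTP API: the application first logs in (AppRole is used here as one example auth method), then calls a PKI role's `issue` endpoint with the token it received. The Vault address, the `pki` mount path, the `web-servers` role, and the TTL are assumptions; the helpers only build the requests, so sending them with `urllib.request.urlopen` is left to the caller.

```python
import json
import urllib.request

VAULT_ADDR = "https://vault.example.com:8200"  # hypothetical address


def approle_login_request(role_id: str, secret_id: str) -> urllib.request.Request:
    """Step 1: the application authenticates to Vault with the AppRole auth method."""
    body = json.dumps({"role_id": role_id, "secret_id": secret_id}).encode()
    return urllib.request.Request(
        f"{VAULT_ADDR}/v1/auth/approle/login",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def issue_cert_request(token: str, role: str, common_name: str, ttl: str) -> urllib.request.Request:
    """Step 2: with the token from step 1, ask a PKI role to issue a certificate.
    The token's policies and the role's constraints decide whether this is allowed."""
    body = json.dumps({"common_name": common_name, "ttl": ttl}).encode()
    return urllib.request.Request(
        f"{VAULT_ADDR}/v1/pki/issue/{role}",
        data=body,
        headers={"X-Vault-Token": token, "Content-Type": "application/json"},
        method="POST",
    )
```

The login response carries the token in `auth.client_token`, and the issue response returns `certificate`, `private_key`, and `issuing_ca` under `data`.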

And once again, you can securely deploy certificates to the endpoints because of the centralized access management controls Vault provides.

This is a quick look at some capabilities that have been in Vault for a while. I don't know how many of you have looked at Vault for managing PKI, or know someone who has, but there's a ton there already: managing certificates, managing private key issuance, creating certificate signing requests, functioning as an intermediate CA or as a root CA, and supporting all sorts of key types.

But to position against that: over the last several years, HashiCorp has continued to enhance the PKI secrets engine's capabilities for PKI management.

Some recent enhancements:

  • ACME support: Vault can now function as an ACME server, released in 1.14 in July.

  • There is a FIPS 140-2 compliant version of Vault for anyone who needs it.

  • Vault now supports OCSP in addition to CRLs for certificate revocation.

  • Managed keys let you offload key operations to HSMs and KMSs.

  • CA rotation using multiple issuers is also a fairly new capability.

  • Cross-cluster unified CRLs let you revoke a certificate even from a cluster other than the one that issued it, which makes it easier to manage those actions horizontally.

And in the 1.15 release that just came out, we're empowering customers to create their own custom policies and, if they need to, issue custom certificates. If they need specific information or extensions on their certificates for business or compliance reasons, they can do that with CIEPS, the Certificate Issuance External Policy Service, which sits outside Vault but connects to it securely. So there's a lot going on in terms of Vault's PKI management capabilities.

From a configuration standpoint, it's very flexible. As we touched on earlier, you'll probably want multiple CAs depending on the situation and on the applications and services your PKI supports. You can have multiple intermediates, arranged laterally or vertically depending on what you want to do with them. And, as I mentioned, you apply roles and policies to those intermediates. So if you want different policies for different services, maybe with different TTLs on their certificates, you can have separate intermediate CAs for them.

As I build this out, it gives you an idea of what's possible. This one is fairly simple, but I've seen some pretty elaborate configurations based on the PKI hierarchies customers have needed in Vault, and it supports them really well.

In the case of our example customer, when they moved to Vault and automated their old PKI management process, they went from taking a week to rotate all their certificates as a one-time task to being able to do it in minutes.

Here's how they configured their hierarchy: one intermediate CA served their web servers and some of their services, and another intermediate CA served nothing but networking components. They defined different roles for those intermediate CAs, and of course different policies apply to them.
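The case study's two-intermediate layout might translate into role and policy definitions like these. Everything named here is hypothetical: the mounts `pki_int_web` and `pki_int_net`, the roles, the domains, and the TTLs. The sketch returns Vault API paths and bodies for `POST /v1/<mount>/roles/<role>`, plus a Vault policy (HCL) that lets a token issue certificates only from its own role.

```python
from __future__ import annotations

# Hypothetical layout mirroring the case study: one intermediate for web
# servers and services, another only for networking components.
HIERARCHY = {
    "pki_int_web": {"role": "web-servers", "domains": ["apps.example.com"], "max_ttl": "720h"},
    "pki_int_net": {"role": "network-devices", "domains": ["net.example.com"], "max_ttl": "2160h"},
}


def role_requests() -> list[tuple[str, dict]]:
    """Vault API path and body for each role, one per intermediate mount."""
    return [
        (f"/v1/{mount}/roles/{cfg['role']}",
         {"allowed_domains": cfg["domains"], "allow_subdomains": True, "max_ttl": cfg["max_ttl"]})
        for mount, cfg in HIERARCHY.items()
    ]


def issue_policy(mount: str, role: str) -> str:
    """Vault policy (HCL) that lets a token issue certificates from one role only."""
    return f'path "{mount}/issue/{role}" {{\n  capabilities = ["create", "update"]\n}}\n'


if __name__ == "__main__":
    for path, body in role_requests():
        print(path, body)
    print(issue_policy("pki_int_web", "web-servers"))
```

Attaching the web-servers policy to an application's auth role means it can request certificates from the web intermediate but not from the networking one.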

Vault also integrates very well with external or offline root CAs. So that's another piece of the story as well.

»How do you scale with PKI using Vault?

First of all, Vault is extremely scalable. A single Vault PKI node can process thousands of requests per second, so performance usually isn't the problem. But some folks get into hyperscaling, so I wanted to share some ways you can configure Vault to get more scalability or performance out of it.

A lot of it is horizontal scaling: you can add more standby nodes. You can also scale vertically, just as you would anywhere else, by beefing up the hardware with more CPU, more RAM, and so forth. You can implement API rate limiting, and you can offload other functions to a completely separate cluster, for example dedicating one cluster to PKI and a separate one to Transit requests. And you can take geographic locations into account for performance when you're scaling out PKI.
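For the rate-limiting piece, Vault has resource quotas configured under `sys/quotas/rate-limit/<name>`. A minimal sketch that builds such a request; the quota name, the `pki/` path, and the rate of 500 requests per second are placeholder assumptions, so check the quota API documentation for your Vault version before relying on the parameters.

```python
from __future__ import annotations


def rate_limit_quota(name: str, path: str, rate: float, interval: str = "1s") -> tuple[str, dict]:
    """Vault API path and body for a rate-limit quota on a mount or path."""
    if rate <= 0:
        raise ValueError("rate must be positive")
    return (
        f"/v1/sys/quotas/rate-limit/{name}",
        {"path": path, "rate": rate, "interval": interval},
    )


if __name__ == "__main__":
    # Hypothetical: cap the PKI mount at 500 requests per second.
    print(rate_limit_quota("pki-issue", "pki/", 500))
```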

The other message I want to get across is that Vault integrates well with existing infrastructure. You can see the supported ecosystem here: whether someone has manual or automated processes, Vault sits in the middle as the hub extremely well, because identity and access management is on the front end. It also integrates, as I said, with external and internal CAs, however you need to configure your hierarchy.

I also want to talk about automation, because we looked at that diagram of managing PKI with Vault, and there are a lot of options for automating things. You can use Consul Template, Vault Agent templates, and of course ACME, which I'll cover in more detail in a moment.

The idea is that the agent automatically authenticates with Vault and automatically deploys certificates to the endpoints where they need to go. It can also tell when those certificates are about to expire, rotate them, and then start the cycle again. And the agent keeps its session with Vault alive by renewing its token.
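The renewal decision an agent makes can be sketched as a point in the certificate's lifetime after which it fetches a replacement. The two-thirds threshold is an assumption for illustration, not Vault Agent's documented behavior; the idea is simply to renew well before `not_after` so there's time to deploy the new certificate.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

RENEW_FRACTION = 2 / 3  # assumption: renew after two-thirds of the lifetime


def renewal_time(not_before: datetime, not_after: datetime,
                 fraction: float = RENEW_FRACTION) -> datetime:
    """Moment at which a replacement certificate should be requested."""
    return not_before + (not_after - not_before) * fraction


def needs_renewal(not_before: datetime, not_after: datetime,
                  now: Optional[datetime] = None) -> bool:
    """True once the renewal point has passed (including after expiry)."""
    if now is None:
        now = datetime.now(timezone.utc)
    return now >= renewal_time(not_before, not_after)


if __name__ == "__main__":
    issued = datetime(2024, 1, 1, tzinfo=timezone.utc)
    expires = issued + timedelta(days=30)
    print(renewal_time(issued, expires))
```

With these values, a 30-day certificate issued on January 1 is renewed on January 21, leaving ten days to roll it out.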

A lot of this happens through agents, and there are many options in this ecosystem for automating PKI management end to end, just as there are for the dynamic or automated management Vault performs for other types of secrets. And at the bottom you can see the integration with HSMs: you can offload key management by integrating Vault with other key management systems.

»Vault as an ACME server

So with that, let's talk more about ACME. ACME (Automatic Certificate Management Environment) is an extremely popular protocol for automating certificate management. Vault couldn't function as an ACME server, so we didn't have a complete end-to-end certificate management story. Customers had to build their own custom scripts, and a lot of manual work was involved in setting up and configuring ACME. There are a ton of ACME clients out there, such as Certbot, which I use in the demo I'll share in a minute. We got a lot of requests from customers saying, "It's such a common protocol, we would really like Vault to support it."

So with 1.14 back in July, you can now configure Vault as an ACME server and set up a secure connection with an ACME client, and it's really easy to do. I'll walk you through it in just a minute. The PKI management cycle works exactly the same way; the only difference is that Vault is now functioning as the ACME server, with Certbot as the client.

»Demo: Vault PKI automation with ACME

With that, I'll flip over to my laptop for a demo. To set this up, I mounted the PKI secrets engine in HCP Vault, on a cluster running on AWS. My Vault client and the Certbot client are running on an EC2 instance in AWS as well, so it's all cloud.

We can see the PKI secrets engine is enabled, and I've created a couple of issuers: an intermediate CA and a root CA. Normally you'd put those in different mounts, but I kept them together for efficiency's sake. Notice the common name, Long-Lived Root X1.

Remember that name for later in the demo. I've also created a role. As I said, a role can have policies assigned and whatever certificate settings you want. The configuration is really simple.

This all-new user interface, which came out in 1.14, makes setup easy. First, I grab the public URL of my Vault cluster and enter it as the mount API path. You can see the admin namespace is in there. I'll also put it in the AIA path.

Next, I click to enable ACME on the Vault server. If we scroll down, you'll see something called EAB policy, the external account binding policy. EAB ties new ACME accounts to a client that has already authenticated to Vault, so you want to make sure you select that. The other settings in the UI are reasonable defaults, so there's nothing else to do. I'll click save, and all of our configurations were successful.
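The two UI steps in this part of the demo map onto two PKI configuration endpoints: `config/cluster` (the mount API path and AIA path) and `config/acme` (enable ACME and set the EAB policy). A sketch that builds both requests; the cluster URL is hypothetical, the `admin` namespace matches the demo, and `always-required` is an assumption, since the talk doesn't say which EAB policy value was selected.

```python
from __future__ import annotations

EAB_POLICIES = {"not-required", "new-account-required", "always-required"}


def acme_setup_requests(public_addr: str, namespace: str = "admin", mount: str = "pki",
                        eab_policy: str = "always-required") -> list[tuple[str, dict]]:
    """API paths and bodies that mirror the UI steps in the demo."""
    if eab_policy not in EAB_POLICIES:
        raise ValueError(f"unknown EAB policy: {eab_policy}")
    # The mount's public URL, including the namespace, used for both settings.
    cluster_path = f"{public_addr.rstrip('/')}/v1/{namespace}/{mount}"
    return [
        (f"/v1/{namespace}/{mount}/config/cluster",
         {"path": cluster_path, "aia_path": cluster_path}),
        (f"/v1/{namespace}/{mount}/config/acme",
         {"enabled": True, "eab_policy": eab_policy}),
    ]


if __name__ == "__main__":
    for path, body in acme_setup_requests("https://vault.example.com:8200"):
        print(path, body)
```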

It's really straightforward with the UI. Then I jump over to my terminal, where I have the Vault client and Certbot. The first thing I need to do is tune the mount's headers for ACME; these are the headers Vault needs when it functions as an ACME server, such as Replay-Nonce. With those headers configured, I request a new EAB token so I can set up a secure connection between the ACME server and my client. I can see it came from the ACME directory, which looks like the right directory, and I've got my ID and my key.
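Those two terminal steps, expressed as HTTP requests. The header lists follow the ACME setup described in Vault's PKI documentation; the namespace and mount path are carried over from the demo as assumptions. The `new-eab` call returns an `id`, a `key`, and an `acme_directory` path, which the ACME client needs next.

```python
from __future__ import annotations

# Headers the mount must pass through or return for ACME clients.
ACME_TUNE = {
    "passthrough_request_headers": ["If-Modified-Since"],
    "allowed_response_headers": ["Last-Modified", "Location", "Replay-Nonce", "Link"],
}


def acme_client_prep(namespace: str = "admin", mount: str = "pki") -> list[tuple[str, str, dict]]:
    """(method, API path, body) for tuning the mount and requesting an EAB token."""
    return [
        ("POST", f"/v1/{namespace}/sys/mounts/{mount}/tune", ACME_TUNE),
        ("POST", f"/v1/{namespace}/{mount}/acme/new-eab", {}),
    ]


if __name__ == "__main__":
    for method, path, body in acme_client_prep():
        print(method, path, body)
```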

I export those and configure them on the client side. This binds the ACME account that Certbot creates to the Vault identity that generated the EAB token. So we're creating the connection a little differently, but that's how you do it with an EAB token when Vault is the ACME server. With that configured, I can send my certificate request from Certbot and see what happens.
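On the client side, the EAB ID and key become Certbot's `--eab-kid` and `--eab-hmac-key` options, and the directory URL is the cluster path plus the `acme_directory` value returned by `new-eab`. A sketch that assembles the command; the cluster URL, domain, and EAB values are placeholders, and the HTTP-01 `--standalone` mode is an assumption, as is any extra registration flag (such as an account email) your setup may need.

```python
from __future__ import annotations

import shlex


def certbot_command(cluster_path: str, eab: dict, domain: str) -> list[str]:
    """Build a Certbot invocation that registers with Vault's ACME server using EAB."""
    directory = f"{cluster_path.rstrip('/')}/{eab['acme_directory'].lstrip('/')}"
    return [
        "certbot", "certonly", "--standalone",
        "--server", directory,
        "--eab-kid", eab["id"],
        "--eab-hmac-key", eab["key"],
        "-d", domain,
    ]


if __name__ == "__main__":
    # Placeholder values shaped like a new-eab response.
    eab = {"id": "example-eab-id", "key": "example-hmac-key", "acme_directory": "acme/directory"}
    cmd = certbot_command("https://vault.example.com:8200/v1/admin/pki", eab, "app.example.com")
    print(shlex.join(cmd))  # pass cmd to subprocess.run(cmd, check=True) to execute
```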

All right, I've got my certificate back; everything worked as it was supposed to. Let's parse it to see what's in the certificate. Now that I've parsed it, I'll scroll up, and we'll see something we saw before.

There it is: our root CA, Long-Lived Root X1. Now we know for sure that the certificate came from our ACME server running on Vault. All right, it's okay, you can clap. It worked, and it was so easy. So that's ACME.

»Vault PKI external policy service (CIEPS)

The next thing I want to talk about is a new capability that came out in Vault 1.15. I know it's a mouthful, and it's funny to say, CIEPS: the Certificate Issuance External Policy Service. As we talked about, it's for customers who, for business or compliance reasons, want different data on their certificates. They might want to customize them. For example, a large cellular phone company might want to put specific data on the certificates used in its 5G network.

That's really hard to do, and in some cases we were having to process requests manually. It's not common, but when it happens it's a pain for the customer, for us, and for others, and customers need to have control over this. The other good thing about the service we created is that it leverages Vault's security: you have to go through Vault to reach it. Prior to this, sign-verbatim was used, and requests were just passed through without much security around them.

With CIEPS, the application's certificate request has to go through Vault, and Vault is configured to point to the service. The service listens, receives the certificate request, and can do whatever it wants with it. We're not controlling it; the customer is. If they want to reject the request for any reason, that's fine. Otherwise they apply their template, the request comes back through Vault, Vault signs the certificate, and it goes back to the application.
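The decision step in that flow (reject, or apply the organization's template) can be sketched as a plain function. This is hypothetical: the `SigningRequest` and `Decision` shapes, the domain rule, and the template fields are made up for illustration and are not the CIEPS wire format; for the real request and response format, see the CIEPS reference service.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SigningRequest:
    """Hypothetical, simplified view of a request Vault forwards to the service."""
    common_name: str
    requester: str


@dataclass
class Decision:
    approved: bool
    reason: str = ""
    # Extra subject data the service wants on the certificate Vault signs.
    organizational_units: list[str] = field(default_factory=list)


ALLOWED_SUFFIX = ".5g.example.com"  # hypothetical business rule


def evaluate(req: SigningRequest) -> Decision:
    """Reject requests outside the allowed domain; otherwise apply the template."""
    if not req.common_name.endswith(ALLOWED_SUFFIX):
        return Decision(False, reason=f"{req.common_name} is not under {ALLOWED_SUFFIX}")
    return Decision(True, organizational_units=["Radio Access Network",
                                                f"requested-by:{req.requester}"])


if __name__ == "__main__":
    print(evaluate(SigningRequest("cell-17.5g.example.com", "app-team")))
    print(evaluate(SigningRequest("intranet.example.org", "app-team")))
```

Keeping the rule in a pure function like this makes the customer-owned policy easy to unit test independently of Vault and the transport.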

That's the basic flow and what the use case is that this is solving for. So with that, I want to quickly jump over to a demo.

»Demo: CIEPS in Vault

Here's CIEPS. For this, I'll enable the PKI secrets engine, because it's not enabled by default, and then generate a root certificate. Once that's done on the Vault side, I jump over to another terminal, where I have the CIEPS service running and listening on localhost port 8443. You can see it's just sitting there waiting for something to happen, which is the way it's supposed to be.

Once you set up the service, it just waits for Vault to send over a signing request. So let's jump back to the Vault terminal, where I configure Vault to point at the service. You can see I've enabled external policy, which means Vault sends requests to that URL, where the CIEPS service is listening.

We've also pinned the connection to a trusted leaf certificate, and that's the connection we'll use to send requests over to the service. Let's send a CSR over to CIEPS and see what happens. Okay, here we are: CIEPS received our certificate signing request, did what it's configured to do, and sent the result back to Vault. Jumping back over to Vault, we can see we've got a certificate back. Now I'll parse it and see what we find.

If I scroll up, I see 'CIEPS demo server certificate.' That's custom information the CIEPS service put on the certificate when I sent over the signing request. It may seem simple, but it's really powerful if someone needs to customize certificates. There's a reference service on GitHub that customers can look at to see how to build their own service, with models and examples they can use.
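The "listening and waiting" side of the demo can be sketched with Python's built-in HTTP server. This stand-in is hypothetical and deliberately minimal: it speaks plain HTTP on localhost (a real deployment would use TLS), accepts a made-up JSON shape, and stamps a made-up `custom_note` field. The real CIEPS request and response formats are defined by the reference service on GitHub.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class PolicyHandler(BaseHTTPRequestHandler):
    """Hypothetical policy service: idle until a POST arrives, then answers with JSON."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        request = json.loads(self.rfile.read(length) or b"{}")
        cn = request.get("common_name", "")
        # Toy template: approve any request with a common name and stamp a custom note.
        reply = {"approved": bool(cn), "common_name": cn,
                 "custom_note": "CIEPS demo server certificate"}
        body = json.dumps(reply).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):  # silence per-request logging
        pass


def start_service(port: int = 8443) -> ThreadingHTTPServer:
    """Listen on localhost in a background thread (port 0 picks a free port)."""
    server = ThreadingHTTPServer(("127.0.0.1", port), PolicyHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def send_request(port: int, payload: dict) -> dict:
    """Stand-in for Vault forwarding a request to the service."""
    req = urllib.request.Request(
        f"http://127.0.0.1:{port}/", data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"}, method="POST")
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))  # bypass proxies
    with opener.open(req) as resp:
        return json.loads(resp.read())


if __name__ == "__main__":
    server = start_service(0)
    print(send_request(server.server_address[1], {"common_name": "demo.example.com"}))
    server.shutdown()
```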

Anyway, thanks so much for watching the demos. I think we're out of time. There's so much to talk about, and it's hard to fit it into 30 minutes, but hopefully this gives you an idea of how to think about changing some of these traditional methods of managing PKI, and how Vault helps you do it. Thank you so much.
