Get a retrospective on HashiCorp Vault's last four years as well as a look at what's coming in Vault 1.2 and beyond.
As HashiCorp Vault turns 4 years old, Vault's principal engineer, Jeff Mitchell, takes a look back at how Vault evolved over those 4 years and gives us a view behind the curtain into how engineering was thinking about the product's design throughout its earlier stages.
Then Jeff gives a sneak peek at what's coming in Vault 1.2 (currently in beta).
The first talk I gave while a HashiCorp employee was on this stage three years ago, it's really nice. While I'm being a bit nostalgic, you know, I decided to title this talk, "Vault is 4!" It's called Vault Update because I didn't get a better title in time. The subtitle for this is "Vault Update, mid-2019 edition." I'm a Principal Engineer at HashiCorp on the Vault side of things.
To paraphrase from Led Zeppelin, "Four Years Gone."
Just to set the stage here, I'm going to talk a little bit first about where Vault was, where it's been going, and how we think about it. Because, as you might imagine over the last four years, how we think about Vault, how we develop Vault, what we do with Vault has changed internally. We've gotten a better understanding of what people need, what people want, etc. I'm going to start off with a retrospective before I talk about what's coming in the near term.
In year one we were focused on basics. This is when Vault first came out. The first thing that we had was encrypted K/V (key-value) pass-through. Those of you that have been around HashiCorp a long time might remember Atlas. Atlas was this thing that sort of ran a lot of our tools together and provided this unified UI. There are a lot of reasons that we got rid of Atlas over time and it sort of turned into what's now Terraform Enterprise. Customers were saying, "What are you doing with our secrets? We're giving you our API tokens, like what are you doing with them?"
We had to say, "Something secure." The first use case for Vault was to protect those secrets. It was backed by Consul and Consul had K/V. We said, "Look, K/V actually works really well for a lot of things, but we need to be able to encrypt it on the way in, so at rest it's encrypted. Decrypt it on the way out. We need to control who has access to these secrets."
The next thing was leasing. Once we had access control, we had tokens, and they had a validity period and so on. Then it was, okay, what else can we do with the fact that we are now tracking time? For leasing we started out with MySQL and PostgreSQL, and you could say, "I want to generate database credentials that are unique to me. I want Vault to revoke them after some period of time and that will help me get to this sort of zero-trust model."
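As a rough illustration of the leasing idea described above, here is a minimal Python sketch of a tracker that issues unique credentials with a TTL and revokes them once the TTL has passed. The class names, the username format, and the explicit `now` argument are all assumptions made for the example; this is not how Vault's lease manager is implemented.

```python
from dataclasses import dataclass, field


@dataclass
class Lease:
    lease_id: str
    username: str
    expires_at: float


@dataclass
class LeaseManager:
    """Toy lease tracker: unique credentials per request, revoked after a TTL."""
    leases: dict = field(default_factory=dict)
    revoked: list = field(default_factory=list)
    _counter: int = 0

    def issue(self, role: str, ttl_seconds: float, now: float) -> Lease:
        # Every caller gets its own username, so revocation affects only them.
        self._counter += 1
        lease = Lease(
            lease_id=f"{role}/{self._counter}",
            username=f"v-{role}-{self._counter}",
            expires_at=now + ttl_seconds,
        )
        self.leases[lease.lease_id] = lease
        return lease

    def revoke_expired(self, now: float) -> list:
        # A real system would also drop the database user here.
        expired = [l for l in self.leases.values() if l.expires_at <= now]
        for lease in expired:
            del self.leases[lease.lease_id]
            self.revoked.append(lease.username)
        return [l.lease_id for l in expired]
```

Passing the clock in as `now` keeps the sketch deterministic; a real implementation would run revocation on a timer.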
The final basic thing that was started was encryption as a service, so that's our transit backend. One thing that I found really amusing about a month ago is MongoDB. They made a lot of headlines and said, "We have a way to stop data breaches," and their way to stop data breaches is to encrypt fields instead of encrypting the whole database. We've been telling people this for four years.
Transit is very, very fast and people use it at scale all over, regardless of whatever database they're using, to encrypt the fields instead of encrypting the overall structure at rest. Because if someone gets access to your database and they can query what's in there and it's all unencrypted, they can still get everything. This is something that we saw early on and we talked to people about. That was year one: getting all the basics in. Tracking things and storing things and so on.
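To make the field-level approach concrete, here is a small Python sketch. It only builds the request for Transit's encrypt endpoint (which expects base64-encoded plaintext) and shows how individual fields of a row could be swapped for ciphertext before the row is stored. The key name, the row layout, and the `encrypt` callable are assumptions for illustration; no network call is made.

```python
import base64
import json


def transit_encrypt_request(key_name: str, plaintext: str) -> tuple:
    """Return the API path and JSON body for a Transit encrypt call."""
    encoded = base64.b64encode(plaintext.encode("utf-8")).decode("ascii")
    return f"/v1/transit/encrypt/{key_name}", json.dumps({"plaintext": encoded})


def protect_row(row: dict, sensitive_fields: set, encrypt) -> dict:
    """Encrypt only the sensitive fields; leave the rest queryable.

    `encrypt` is any callable mapping plaintext to ciphertext, for example
    a function that sends the request built above to Vault.
    """
    return {
        name: encrypt(value) if name in sensitive_fields else value
        for name, value in row.items()
    }
```

With this shape, someone who can query the database sees only ciphertext in the protected columns, which is the point made above.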
Year two was about scaling. This was when we came up with Auto-Unseal. A lot of our customers were saying, "Hey, we need to be able to have Vault automatically come up if there's like a power outage or some kind of data center interruption. We're a big enterprise and we can't call people up in the middle of the night that are all over the world to provide their unseal keys."
We said, "Okay, how do we protect this?" We came up with this mechanism of being able to have the root of trust, the key that decrypts the key ring, be something that can be protected by an HSM. That was also our first enterprise feature. Now, of course, we have many, many Auto-Unseal methods and most of that's in open source.
Then there was replication, so we came up with the first version of replication in year two. That was performance replication, which allows you to essentially have globally distributed clusters with the secrets replicated around. Then you can use your local Vault for most actions. Later on we would add disaster recovery replication and some other things.
Finally identity. Everyone has identity already. They're using Okta or OneLogin, or they're using LDAP, right? People have notions of identity and they said, "We need Vault to reflect this." We started adding our identity system and that's been a very important part of things that came later.
Year three was about governance. We started out with control groups. If you are familiar with GDPR, we took the terminology for this feature directly from GDPR. It's about multi-authorization. Being able to have one person attempt to perform an action, get approvals from other people, and then have them authorize that action via multiple steps.
That was a big part of the governance story and identity became a big part of that. That's why it was something we had to do early to serve as the foundation to these things. Sentinel—a lot of our products have full support for Sentinel policies and replication mount filtering. This is used all over so people will say, "Okay, all of our primary data lives here, but because of various laws in various districts, we cannot have this specific data replicated over there. We cannot have this other data replicate over there." They use mount filtering to ensure that only the right data goes to the right places. Year three was in large part about governance.
In year four, we basically said, "Okay, we're going to look at the basics, we're going to look at scaling, we're going to look at governance, we're going to do more." Things like batch tokens, which are ephemeral tokens that don't require any storage and have a fixed lifetime. That's used for things like AWS Lambda. You can do that at scale.
Agent auto auth. That's the ability to bake Vault agent into your AMI, or bake it into a container or something and have it automatically get a token for you and provide it to your application. Keep it renewed. Re-authenticate as necessary.
Performance standbys and agent caching are in the scaling part. With performance standbys, your HA nodes can suddenly act as read-only nodes locally for certain operations, things like reading K/V values or using Transit. Then you can basically just add nodes in the same data center and it just scales linearly.
Agent caching—you can actually use the agent to prevent you from having to get more tokens or more database leases. You just query the agent whenever you need it and it gives you one that's already valid or gets a new one if it's not.
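The caching behaviour described above (hand back a lease that is still valid, otherwise fetch a new one) can be sketched in a few lines of Python. The `LeaseCache` class and the shape of the `fetch` callable are assumptions for the example and do not reflect Vault Agent's actual internals.

```python
class LeaseCache:
    """Return a cached credential while it is valid; otherwise fetch a new one."""

    def __init__(self, fetch):
        # fetch(now) -> (credential, ttl_seconds); stands in for a call to Vault.
        self._fetch = fetch
        self._credential = None
        self._expires_at = 0.0

    def get(self, now: float):
        if self._credential is None or now >= self._expires_at:
            self._credential, ttl = self._fetch(now)
            self._expires_at = now + ttl
        return self._credential
```

The application keeps calling `get` and never needs to know whether the answer came from the cache or from a fresh request.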
On the governance side, things like namespaces. The ability to essentially section off parts of Vault and parts of Vault's API, where each namespace can have its own policies and its own mounts and so on. We had a lot of requests for SSO and so we built an OIDC auth with full redirect flow capabilities. The question is what now?
To better understand where we're going, I want to talk about that journey, based on what's happened here, in terms of not just features but how we think about Vault, and how that's changed. Vault started off as a collection of functions. Things like: "store this, track the lifetime of that." Someone would say, "Oh, we need to sign now, not just encrypt as a service. We need signing as a service." We would say, "Okay, we can build that in, right? That's a function that we can build in."
But where we've been going is a collection of services. Vault isn't just, "I can add signing as a service to Transit." It is the fact that I can run Transit or have a whole plugin system that can run whatever you want. It's things like identity. Identity is a system that then powers a bunch of things in Vault. It's things like the replication system that powers lots and lots of things in Vault.
It's moved to a collection of services that are internal and provided to all of these different plugins, and all of these different backends. As a result, we've gone from thinking of Vault as a tool. A tool is something where you can just say, "All right, now this tool does this other thing. Here's an extra flag." If you're running a Unix utility, I've added another flag and it's a tool and it still does what that tool does, but now this flag allows you to tweak things slightly. It's gone more towards a platform.
The way that I like to think of it is: tools perform functions, platforms provide services. Vault, at this point, isn't just providing functions, it's providing services. A really good way to think about this is Transit. That had signing as a service and encryption as a service as I mentioned. Transit is basically a set of functions that lives in a Vault plugin, but the thing that powers a plugin and gives it the ability to scale and gives you the ability to have authorization, authentication, all those kinds of things is Vault's core and the services that we've built into Vault's core. The way that we really think about Vault internally right now is:
It's authenticating (You have to authenticate to Vault to use it, you get Vault tokens)
It's authorizing (you get policies, we have RBAC, we have Sentinel)
It's auditing (we have an auditing system that can output to Syslog or output to file)
It's distributed (we can distribute it out across geographic regions with replication)
It's scaling (it scales quite well at this point)
Most of the functions that are there can scale to tens of thousands of requests per second within a data center. You can just keep adding more nodes for a lot of the functions. And it's an extensible platform. We have a plugin system and that plugin system uses the same API that we use for all of our internal backends that are shipped with Vault. Same exact thing, it's just over a gRPC connection. We think of it as a platform, and as a platform, it provides security services that are multi-cloud.
You can run this in AWS, you can run in GCP, Azure, wherever you want to run it. We support things like auto-authentication for all of these clouds. We support authenticating to Vault with all of these clouds. Multi-platform, right? You can use Vault to power secrets and our security services for your VMs. You can use it for your containers, you can use it for your Lambda functions. Any of these things are possible in Vault.
It's multi-service. We have different permission sets that can be given to different services. You want to give the ability to access this encryption key to that database? Great. You want to give the ability to access a different decryption key to some other database or some other service? Great. You have full control and it does this with the same functionality and same API.
The reason I call that out is I get asked a lot these days about AWS Secrets Manager and people say, "What about AWS Secrets Manager? What does Vault do that's different?" There are some similarities between what they do in terms of minting credentials, rotating credentials, things like that. Vault does this without relying on AWS. You don't have to be in AWS. If you are in a multi-cloud environment, which most of our customers are, then you have the same functionality, same API with Vault. You're just going across different clouds. When you deploy your applications across those clouds, they act the same, they behave the same, they call the same functions.
All right, so now I'm going to talk about 1.2. We see the 1.2 release as a microcosm of everything that I was saying here. I'm going to kind of walk through that a bit and talk about the new features, but what's coming in the 1.2 release? We think it really shows how Vault has become a platform that interoperates with a very wide world of security technologies.
We have different areas and approaches that are represented in this release, both legacy and modern. We're interoperating with things from 25 years ago, things that are as recent as 5 years ago, and stuff that's coming out in the future. These are functions that are used by enterprises, they're used by education, they're used by everybody else, and they're across clouds and on-prem. With the subset of things that are coming out in this next release we're spanning all of these gaps.
This, I think, really shows how we think of it as a platform and why we think that this platform is something that we can build whatever other interop we want on top. Along that way, you get the same authN, the same authZ, the same scaling, same replication, etc. You just mount your plugins and you go.
In 1.2, we have some new authentication methods. One of those is Kerberos, which is being upstreamed to us by a company named Winton. They're a company in the UK, I believe, that does investment management. Thanks Winton. I need to put one little asterisk on here because of the way this is getting upstreamed, where this was developed by a company. That company's lawyers have been wanting to talk to our lawyers, and the lawyers are still talking.
We are pretty confident that this can get rolled out by the time the 1.2 final comes out. I feel pretty good about that but if it doesn't make it into 1.2, it should be sometime very soon after. They're just sort of, it comes down to indemnification, we're giving our code. What are you warrantying against? All that kind of stuff. That's in all the click through stuff that nobody reads except lawyers. The lawyers really care about it. It's being worked on, so it may not end up in 1.2, although I'm optimistic it'll come pretty soon after. It's basically feature complete.
Kerberos auth allows you to use SPNEGO, which is HTTP-header-based authentication, to use Kerberos to authenticate to Vault. It does an LDAP lookup for mapping to user and group policies. This is super cool. I mean, this is something that Kerberos has had for a very long time. It's still in use by lots of educational institutions and by anyone using Active Directory, which is built on Kerberos. My former employer used Kerberos all over the place. It's used everywhere. It's really cool that we are now able to interoperate with that, even though it's this very old technology.
Pivotal Cloud Foundry: spanning the era from old to new, PCF is much newer, but we are also now able to authenticate with it. We're using a new feature that they've baked in called App and Container Identity Assurance, which is a fancy way of saying they create TLS certificates for you and then put them in your app so you can access it.
We have fairly strict tolerances for timing. We make sure that this was an application that was just started up. We allow binding policies by application IDs, organization IDs, space IDs. You can say, "I want to make sure if it's this application ID, give it this set of permissions in Vault. If it's this space ID give it this other set." It's pretty flexible. We have a new PCF auth method.
Obviously we've been supporting various things in databases for a long time, but in 1.2, we have two new features that bridge that gap between legacy and modern. The first is credential rotation for existing accounts. This isn't yet supported for all of our databases, but we're working on it. It's supported for, I think, two of them at the moment.
A lot of users have come to us and said, "Hey, so here's the problem. We want to get to the zero-trust model where everybody has their own username and password for our databases, but we aren't there yet. We have a lot of applications that rely on a very specific user but they're okay being passed in the password." What we can do is we can say, "All right, we'll take on that user, we'll rotate the password for you on an interval, and authorized applications can query for the current password whenever they want." For existing account credential rotation Vault becomes the source of truth.
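The rotation flow described above (fixed username, password rotated on an interval, applications asking for the current value) can be sketched as follows in Python. `StaticAccount`, its parameters, and the lazy rotate-on-read behaviour are simplifying assumptions for the example; a real system would rotate on a timer rather than on read.

```python
import secrets


class StaticAccount:
    """Own the password of an existing account and rotate it on an interval."""

    def __init__(self, username: str, rotation_period: float, set_password):
        self.username = username
        self.rotation_period = rotation_period
        # set_password(username, password) stands in for the database call.
        self._set_password = set_password
        self._password = None
        self._rotated_at = 0.0

    def current_password(self, now: float) -> str:
        due = self._password is None or now - self._rotated_at >= self.rotation_period
        if due:
            self._password = secrets.token_urlsafe(24)
            self._set_password(self.username, self._password)
            self._rotated_at = now
        return self._password
```

Because the password is only ever changed through this object, it stays the single source of truth for the account's current credential.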
By the way, one thing I forgot to mention that I do want to mention here is all these things I'm talking about in 1.2, these were all derived directly from user requests and requests from enterprises. Everything that you're seeing here, these are all because the community came to us and said, "We really need these things." We say, "Great, let's do it." I'm pretty happy about that.
We're also generating credentials for Elasticsearch in 1.2 which is great. Widely used database, many of you probably know it and so you know we just added support for this in our database backend. Again, sort of bridging old and new kinds of approaches.
Key Management Interoperability Protocol or, as it's commonly known, KMIP. Could I just get a show of hands: people that have heard of this? Four or five, maybe six. Cool. This is one of those things where if it means nothing to you, then it means nothing to you. If you know what this means, it means everything because this is one of those things that is a standard. It's basically a PKCS #11 successor. How many people for PKCS #11? Okay, way, way, way more. Okay. All right.
Take PKCS #11, remove the requirement to bind against a C library and instead say we're going to make this network-capable. That's basically KMIP. It's basically networked PKCS #11 where instead of defining the C interface it defines encoding and decoding. You see a lot of support for it with certain types of databases and a bunch of products can interoperate with it, and it's growing in adoption.
I will say the PKCS #11 people would not like that first bullet point because they are still working on a PKCS #11 version 3.0 even though KMIP is now working on version 2.0 of it. Now you have basically competing standards that are both owned by OASIS and that's fun. Anyways, we want KMIP because we don't really want to write a C library. It's a defined standard encoding and protocol for carrying out security and cryptography-related operations. It has about 40 different functions and it's centered around this concept called managed objects, which is basically like a key. I want to create a key, I want to fetch the key, I want to add attributes to the key, delete the key, revoke the key, et cetera.
If you look elsewhere, then you'll see a standalone product that often retails for millions of dollars USD and then has licensing in the thousands per client. We've been talking to some of our customers that are really into this idea of us supporting KMIP. They said, "Yeah, we've been wanting to go to KMIP because we have a lot of things that can support it, but the servers are literally out of our price range. We're already paying for Vault, and for just this one piece of functionality they want to charge us millions of dollars for a high availability setup. Then, on top of that, we have to pay thousands per client and we just cannot afford that."
In Vault we implemented the protocol and we added a secret engine. I think that this here is really one of those 'power of the platform' kind of things that I was talking about before, where we built this platform that replicates and authenticates. The KMIP authentication is a little bit different, but you control it through Vault's API, through the authenticated, authorized API. When you do this, the secrets and the objects replicate to all of your Vault clusters if you're using replication.
In KMIP, we support scopes, which are basically like enclaves of keys. You can within the same mount say, "I want to create a scope called this and a scope called that." Then in those scopes, you have roles that can be permission sets. You can say, "This client is allowed to create keys but not delete them. That client's allowed to find keys with list attributes but not fetch them." It's pretty flexible.
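The scope and role model described above can be pictured as a two-level lookup: scopes partition keys within a mount, and each role in a scope carries a set of permitted operations. The Python sketch below is an illustrative data structure only; the class, the method names, and the lowercase operation strings are assumptions, not the KMIP secrets engine's API.

```python
class KmipMount:
    """Toy model: scope name -> role name -> set of allowed operations."""

    def __init__(self):
        self._scopes = {}

    def create_scope(self, scope: str) -> None:
        self._scopes.setdefault(scope, {})

    def create_role(self, scope: str, role: str, operations) -> None:
        self._scopes[scope][role] = set(operations)

    def is_allowed(self, scope: str, role: str, operation: str) -> bool:
        # Unknown scopes or roles are denied by default.
        return operation in self._scopes.get(scope, {}).get(role, set())
```

For example, a role created with `{"create", "locate"}` can make and find keys in its scope but is refused `"destroy"` and `"get"`.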
It's early days, and we'll enhance it over time. In talking to people about this, people said, "Oh, so you do encryption with KMIP." I said, "No, we don't do encryption with KMIP." They're like, "Oh, I thought that you were implementing this." Here's the thing. Most of the users of KMIP that we've been interoperating with and that we've been talking to are using MySQL Enterprise, MongoDB Enterprise, VMware vCenter. Those are the three initial use cases we're targeting. Those just fetch keys. They fetch keys and they do the encryption locally, so they don't need us to encrypt.
Instead of going through the encryption stuff, we said, "Okay, we're going to make it so you can query keys, list them, fetch them etc." It's very much driven by requirements coming from our users, our customers, and we're going to enhance this over time as more requirements come in. The other thing I'll say about this is, this is the one piece of functionality out of everything I'm talking about here that is enterprise only. Everything else that I'm talking about here is open source.
Show of hands for OIDC? Should be probably everyone. Yeah, pretty much. All right, so we've had a long history of doing things with JWT/OIDC and Vault. In 0.8.3 we added Kubernetes service account auth. You could be on a Kubernetes machine and you could take your service account token and send it to Vault and we can authenticate you.
In 0.10.4, we added support for kind of arbitrary JWTs. You can say, "Here's the signing key and I want to authenticate this," and you check on claims for roles and all that. We then added support for that and Kubernetes in Agent Auto-Auth. In 1.0, Kubernetes projected service account auth, which was kind of a newer way of doing it. Then finally, in 1.1.0, OIDC redirect flow/SSO support. You could actually go to the Vault UI, redirect to Okta or whatever else you're using for your identity provider, and then come back and be authenticated to Vault.
In 1.2, we're adding something called OIDC identity tokens. This is the ability to mint OIDC tokens within Vault. What you can do is you can say, "Here I have this identity information that's coming from my metadata or from the authentication provider's metadata, and I want to create OIDC tokens that include this information. I want to be able to use it for whatever." You could use it for downstream models.
One of the big drivers for why we did this is we would get a lot of people coming to us and saying, "We want to use Vault. We trust Vault. We want to use it as the thing for one app to authenticate to another app, but we can't figure out a way to do it decently with Vault tokens." I've seen a whole bunch of different ways that people have tried to do this, where they would say, "I'm going to take a Vault token and attach a blank policy to that token, and then I give it to another application and they say, 'Okay, does it have this particular policy attached?' If so, then it's authenticated." But you have to make it a blank policy because you don't want them to actually be able to use that token for anything. It becomes this big mess.
For a long time we thought 'how can we fix this?' We decided to leverage what we've been doing with authenticating and identity and build in a way to pull out OIDC tokens. We support multiple keys and if you're using namespaces, we support multiple keys per namespace. We eventually want to enable some global keys that you can actually say, "For all of my Vault installs across all my namespaces, I want to be able to reference these global keys." That's not in 1.2. We have roles that use keys and define scopes, really I should probably have said claims here if you're familiar with it.
What I mean by static and templated, here's an example. You can see the country claim up there is static. It is always going to be NL if you're pulling things out of this. But for user info we're using identity information. The username is coming indirectly from the authentication provider. It's coming from an entity alias, and then the groups are whatever groups I've been assigned to within Vault. Then there are also some other extra things you can do, like time.now, which here fills in the not-before value.
Then that turns into, when we actually go through the template, you know, the country is still the same. The time has been filled in and the user info has been filled in with the name and the associated groups. We support two types of verification. The first is the standard .well-known endpoints and the other is an introspection endpoint, kind of like what Kubernetes provides, which also validates that the entity that's associated with it hasn't been disabled.
So, there are some claims that we always put in such as what namespace it's associated with. I believe we always put in what entity it's associated with and so that way you can say, "Has someone disabled this entity in the interim time?"
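Since the slide with the template is not part of this transcript, here is a small Python sketch of the static-versus-templated idea: a literal claim ("country") passes through unchanged, while placeholders are filled from identity data at render time. The placeholder names `alias.username` and `entity.groups` and the `render_claims` function are hypothetical and are not Vault's template syntax; only `time.now` is a name mentioned in the talk.

```python
import json
import re

TEMPLATE = """
{
  "country": "NL",
  "userinfo": {
    "username": {{alias.username}},
    "groups": {{entity.groups}}
  },
  "nbf": {{time.now}}
}
"""


def render_claims(template: str, context: dict) -> dict:
    """Replace each {{name}} with the JSON encoding of context[name]."""
    def substitute(match):
        return json.dumps(context[match.group(1)])

    rendered = re.sub(r"\{\{\s*([\w.]+)\s*\}\}", substitute, template)
    return json.loads(rendered)
```

Rendering with a username, a list of groups, and a timestamp yields a plain claims dictionary in which "country" is still "NL" and the other values come from the supplied context.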
This is really, really cool. In 1.0, we added the ability to generate OpenAPI schemas on-the-fly. In 1.2, you can actually pull this out: you can do a path help command and put the format in JSON and what it returns is actually OpenAPI. We have ways of basically pulling out OpenAPI from your entire collection of Vault, all of the mounts that are there, all the engines and plugins that are mounted, and basically giving you a full schema.
What we're doing in 1.2 is starting to use this to create the forms in the UI. Why is this cool? This is cool because it means that as we change parameters in Vault's code, the UI picks up those changes. If we change a parameter description, for instance, it just automatically updates within the UI. It also means that when you're running your own plugins, eventually your plugins will be supported in the UI without us having to code anything. So your custom plugins will just be usable within Vault's UI.
This is super cool. The only thing that is currently fully dynamic is LDAP auth. That was sort of the canary in the coal mine and the rest are going to be coming over the next few releases.
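To show why a schema-driven UI needs no per-plugin code, here is a minimal Python sketch that turns one OpenAPI 3 operation into a list of form-field descriptions. The `form_fields` function and the output shape are assumptions for illustration; Vault's UI is not written this way, and the sample parameter names in the usage note are only examples.

```python
def form_fields(openapi_operation: dict) -> list:
    """Derive form-field descriptions from an OpenAPI 3 operation object."""
    schema = openapi_operation["requestBody"]["content"]["application/json"]["schema"]
    required = set(schema.get("required", []))
    return [
        {
            "name": name,
            "type": prop.get("type", "string"),
            "help": prop.get("description", ""),
            "required": name in required,
        }
        for name, prop in schema.get("properties", {}).items()
    ]
```

Given an operation whose schema declares a required string `url` with a description and an optional boolean `starttls`, the function returns two field entries; changing the description in the schema changes the help text with no UI code touched.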
I didn't have a better name for this feature because we've just been calling it TokenUtil, because it's a helper. It's a helper library that we wrote called tokenutil. What this means is that in 1.2, all token parameters are the same across all auth methods, with full backward compatibility. Prior to 1.2, if you were configuring an auth method to get a Vault token, then some of them would say: you have maybe TTL and periodic and CIDR-bound that you can bind IPs to. You would maybe have that for some and then this other method wouldn't support periodic tokens and this other method wouldn't support batch tokens or whatever you have. It was sort of wherever people said, "We need this support in this auth method." We just would add it to that auth method and it was a one-off thing.
In 1.2, we've actually taken all of these fields and pulled all this functionality into a helper, and then we've seeded that helper through every single one of the auth methods. There are consistent names for the parameters, consistent descriptions, consistent behavior for how they act. There are a few exceptions where there have to be, so things like if I'm using Centrify authentication, it doesn't support renewability because the way that the Centrify stuff works isn't renewable. There are certain things that, out of necessity, are different for particular auth methods. But where there isn't a particular reason, they are all now working exactly the same way. As I said, it brings more functionality to many auth methods.
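The idea of one shared helper supplying the same token fields to every auth method can be sketched as a common parameter object merged into each method's own settings. The Python below is an illustration only: `TokenParams` and `role_config` are hypothetical, and while the `token_`-prefixed field names follow the style of Vault's role parameters, they should be checked against the documentation before being relied on.

```python
from dataclasses import dataclass, field, asdict


@dataclass
class TokenParams:
    """Token settings shared by every auth method in this sketch."""
    token_ttl: int = 0
    token_max_ttl: int = 0
    token_period: int = 0
    token_policies: list = field(default_factory=list)
    token_bound_cidrs: list = field(default_factory=list)
    token_type: str = "default"


def role_config(method_specific: dict, token: TokenParams) -> dict:
    """Combine a method's own settings with the common token settings."""
    return {**method_specific, **asdict(token)}
```

Two different methods built with the same `TokenParams` end up with identical token fields, names, and defaults, which is the consistency described above.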
I'll talk about the tech preview status. This is using our Raft Library. It's the same Raft library that powers Consul and Nomad. We're using Raft directly within Vault. It's not memory-bound. One of the things that people have had troubles with is as they add more and more and more to Vault, they often don't realize that Consul's working set is stored in memory. The more data you add to Vault, if you're using Consul as a backend—which you know is what we support in Vault Enterprise—then your Consul memory usage grows and sometimes people don't realize that and they put a lot of data in and then Consul goes down, then Vault goes down.
We wanted to solve this for a while, but it's been a tough thing to do. Now in 1.2 Tech Preview, we have the ability to have this transactional and HA storage built directly in. It works the same way that you're accustomed to with Consul and Nomad, but it doesn't require them. It also supports HA and replication.
It's a Tech Preview for two reasons. It's not because it doesn't work; it's because there are a couple of things missing. We want to add recovery modes so that if something goes wrong, you can still unseal enough of Vault to be able to perform operations on the Raft library or on the quorum stuff. We want to support the ability to have some kind of recovery stuff in case things go south and we just didn't have time to put that in.
The other is that there's no production testing of this yet. It has not been out there. It has not been getting tested by the broader community. It's only being tested by us and by our unit tests and our integration tests. We want to put it out there, let people put it in their Dev clusters. Let people play around with it, report bugs to us. We just didn't really feel ready saying, "Hey, go use this. Go migrate all of your storage to this." We expect it to be ready by 1.3. There we might go from Tech Preview to Beta just depending on how much feedback we've gotten. 1.3 or 1.4 is when we'll say, "We think this is ready to go."
All right, so we're excited right? We're excited about this release. We're excited about where Vault has come. I think that the broad capability outside of the dynamic UI and Raft stuff, like all of these capabilities we've added are being built on this same interface. You get all these different systems, services that we built into the platform. Hopefully I've sort of made it clearer why internally we think of it as a platform rather than a tool. It's just come a long way in the last four years and it sort of stopped being the thing that just does some encryption or the thing that just does some storage of K/V.
What's next? Alright, I can't talk about this. I know probably a lot of you were hoping that I could. There are a couple of reasons for this. We have major new features planned. We have really cool stuff we have planned. We also have a lot of major upgrades to some of the Vault subsystems. The problem that we have right now is that the timing is a bit in flux. The reason timing's in flux is that there are some features that just require some lawyers to do some due diligence on things, which is nothing scary. It's just something we have to do as a company. We're not sure when that will be done. That might delay a feature from one release to another.
There are some other things that just based on where things land, we might have to shift stuff around. We can't really say what's coming out yet and we can't say when it's coming out. I'm very sorry about that, but what I can tell you is we continue to listen to the community. We continue to listen to our users, both open source and enterprise. We've heard you loud and clear on a number of big asks that you've all been making. We have some very big plans in store. In kind of a less broad way, I would expect that 1.3 is going to be, the way that we're thinking about this internally, somewhat of a stabilization release while we sort of build out some bigger features for 1.4 and 1.5.
We also want to make sure that after all the big pushes of these features that are coming in 1.2, we have plenty of time on our roadmap to dedicate to improving Raft, adding more features to KMIP, fixing bugs that have been reported, all that kind of stuff. I think that 1.3 is likely to just make everything that is there better; more fancy new features will be coming in 1.4, 1.5. I think a lot of you will appreciate that. Some of you were maybe hoping for something big and splashy coming in 1.3, but at this point in time I'm thinking it's going to be more stabilization than anything else. That's it.
Thanks for letting me ramble at you for a while about Vault and about what's coming in the near term, where we're going, and how we got here. Like I said, Vault is just over four years old. I've been working on Vault for just under four years and it's been a great ride and I do want to say that we really, really appreciate the community. The community has helped really build Vault into what it is, both by telling us what we need to do. What do we need to build in Vault? What are the systems? What are the services? What are the functions?
The community is great. Honestly. Like good feedback, good people. They help each other out and we're super thrilled about it. I'm super happy that everyone is a part of this. Thank you very much. Have a good day.