Building Vault Plug-ins: From No-Go to Dynamic Secrets
Oct 07, 2019
Learn how to build your own Vault plug-ins when your system doesn't quite fit the ones included in Vault.
There is always a case where an organization has a custom or third-party system that it could generate secrets for, or an IAM system that doesn't quite fit the ones included in Vault.
In this talk, HashiCorp solutions engineer Nicolas Corrarello shows how he went from zero to dynamic secrets by building his own Vault plug-in. This talk doesn't require you to know Go (although there will be code shown), but it does require a minimal level of understanding of object-oriented programming.
Regional Director, Solutions Engineering, HashiCorp
In case you didn't notice, I don't have a very Anglo-Saxon last name. I come from an Italian family. If you don't get my English, I'm very sorry about that. This is the best English I've got. I have an Italian grandmother, and we are like nine grandkids. She says she loves everyone equally. I'm the favorite.
In any case, we ship a lot of software, and I love them all equally. But I do have a favorite. That is Nomad. I absolutely love it—many reasons. It's incredibly simple to deploy and get a workload running. It handles all the heavy lifting, registers services with Consul.
I'm not trying to sell you Nomad, but when I joined HashiCorp, it was a very new product. So every time we shipped something new, it was cool. At HashiConf two years ago, we announced that Nomad had an RBAC system. It had ACLs, so you could manage who can do what with Nomad.
I'm an organic speaker. I'll get to the topic in a second. Let's go with this. At that moment, it hit me: this is cool because we also have Vault, which handles access management. I can put Vault in front of Nomad and use the workflow in Vault to get tokens. If I'm a human, I can go authenticate using LDAP and get a token. If I'm a CI pipeline, I can use AppRole or something like that and get a short-lived token.
Coming up in the next 30 minutes or so, I'm going to explain to you how I did that. I went from never writing a line of code to having an MVP before HashiConf ended in 2017. You can all read the agenda. I don't need to go through it.
In terms of who I am, I love putting my picture there. I started doing this thing where someone took a picture of me with the picture slide, and then I kept getting tunnel vision with my picture slides. But this is a serious conference.
My job is approving expense reports. To be honest, I have a team of about 16 engineers. But somehow, the Vault team—which is super nice to me—I keep annoying them. They allow me to send PRs. I was employee number 70 or something. This is my third HashiConf US, and I'm based just outside London in the UK.
» Why build Vault Plug-ins for Nomad?
Why would you do something like this? I'll tell you a couple of cases I've seen in the past two or three years. There was this company in the Nordics. I cannot go into too much detail; let's just say they needed to cryptographically sign something very specific. They were importing the keys for that chain of trust into Vault, and they didn't want them to leave it. They didn't want the software to get them. They just wanted to use those keys to sign these documents.
Vault out of the box doesn't do that, but as Jeff said at HashiConf Europe, Vault is turning into a platform, as opposed to what it was when I started at HashiCorp three years ago: just a tool to deliver secrets.
I know a company that wrote a plug-in to broker access to an internal system that was like our software—just API keys. They wanted to have the lifetime of all these API keys shortened. I had one that linked with a third-party API that was specific to an industry.
Now, the whole story is, as you know, if you're using Vault... Are many of you using Vault? I'm not going to ask for a hands-up. But for the people who are watching this on YouTube, there are a couple of nods.
Someone wanted to broker access to an Oracle database like we do with MySQL or SQL Server. This was about two years ago. And guess what? That ended up in the master branch of Vault. That's something that we now support. I know a few companies that write plug-ins that could be useful for other people and don't open source them. I'm not going to go into specifics on that.
Now, on to why I did it. I'm in sales, so when I tell a story, I like it to fit. Well, I'm not really in sales; I'm a solutions engineer. So if I go and tell you, "Here's where Vault fits, and it does access management, and here's where Nomad fits and handles your scheduling," guess what? If I cannot show you that, that breaks my story.
Honestly, the only reason is Vault just rocks. I’ve had an interesting three years with this particular software. You know, I come from the ops side of the house. I'm a super-pragmatic person, so the whole approach of Vault always made a lot of sense to me.
Remember I'm from the UK; if this joke doesn't make sense, it's because we drive on the other side. I wasn't driving when I took that picture. My insurer is a HashiCorp customer, so they might see this. Anyway, keep left unless overtaking. I think this sign should be in every ops team. If you're not making things go faster, just keep left. Now that sign is ironic, as this motorway, the M25, looks like this most of the time. You can keep left or right; you're not going anywhere.
» What do I want to accomplish?
If I needed to do something in Nomad back in the day, I could have a long-lived token provisioned by Nomad as it's shipped. But we do things better now, and we have a product that does that. I want to plug into that workflow and say, “I'm going to establish my identity and, based on a preexisting policy, I'm going to get a Nomad token that allows me to do what I need to do.”
This is funny because I wrote this code two years ago, so I can actually show you how it works. And I'm going to try it. I'm going to establish my identity with Vault. In this case, I'm using the only identity method I have, which is OIDC, so I'm going against Google. This popped up a browser screen so I can do the Google login. You don't need to see it. I'm just clicking my name. That sends a call back to Vault. Boom. I got a token.
Now that I got a token and Vault knows who I am, Vault knows what I can do. And based on that, I can go and read credentials. I can do something like vault read nomad/creds/admin. I wrote this API two years ago. If I got it right, I get a token. You can go ahead and write it pretty fast.
The truth is it will expire in an hour, and that's the default. Even if you have it, I'm watching you for the next 35 minutes. You can't do anything with that token. I can also potentially revoke it, which is another cool thing about Vault. This is if I'm an operator, and I want to deploy a job.
Another interesting angle to this is a while ago I wrote this little app that tracks my library. When someone goes and adds a book, I can request a token, which in this case is for a dispatch job in Nomad.
It's a parameterized job. It's something that I pre-programmed, a function that I tell to run. This is Ruby code, by the way. It's doing the same thing I just said: it's reading a credential. The application gets a secret and tells its own scheduler, “Go run something else.”
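The talk's application is written in Ruby; as a rough Go equivalent, the sketch below builds the HTTP request that corresponds to reading nomad/creds/dispatch. It relies only on two general facts about Vault's HTTP API: mounted paths are served under /v1/, and the caller's token travels in the X-Vault-Token header. The function name newCredsRequest, the address, and the token value are placeholders for illustration, and the mount name nomad is taken from the paths shown in the demo.

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
)

// newCredsRequest builds the HTTP request equivalent to
// `vault read nomad/creds/<role>`. The helper name is hypothetical.
func newCredsRequest(vaultAddr, vaultToken, role string) (*http.Request, error) {
	// Vault serves every mounted path under the /v1/ prefix.
	url := strings.TrimRight(vaultAddr, "/") + "/v1/nomad/creds/" + role
	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		return nil, err
	}
	// Vault reads the caller's token from the X-Vault-Token header.
	req.Header.Set("X-Vault-Token", vaultToken)
	return req, nil
}

func main() {
	// Address and token are placeholders, not values from the talk.
	req, err := newCredsRequest("http://127.0.0.1:8200", "example-token", "dispatch")
	if err != nil {
		fmt.Println("building request:", err)
		return
	}
	fmt.Println(req.Method, req.URL.String())
}
```

Sending the request with an http.Client and decoding the JSON response is left out so the sketch does not need a running Vault server.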
It's not only that you are doing the tail end of the deployment, and you want to push a new Nomad job and get something done. You can break up your interactive workload in your application and say, “I'm going to leverage this scheduler and do what an operating system scheduler does, but at the network level. Go and schedule this task.” Again, I'm not selling you Nomad; you can do all this with Kubernetes, which I know might be in vogue.
This is what I wanted to accomplish. I can't believe that this demo runs. This is running against my home cluster, which is on the other side of the Atlantic. See if you can find the cluster that runs that. That's three Raspberry Pis plus one extra Raspberry Pi for the load balancer, running at home.
» A good idea is never enough—you need to execute
To do this, you must know that Vault is a heavily complex system. It has a lot of logic. On the other hand, two years ago, we created an interface. You can leverage all this complexity through a plug-in system. You don't need to know all this. You need to know some of this, I'll be honest with you, but you don't need to know all this to get things done.
When I went in my usual excitement to try to convince people to write this plug-in, they were like, “It's a great idea, we all like that—we will get there.” But it wasn't enough for me. So I said, “I'll just go and write it.” Maybe to prove a point. Maybe because I'm just annoying that way.
I did have a few problems. The first one was I didn't know Go back then. I would argue I still have no idea about most of it, but I know enough to be dangerous, which is good. I was pretty jet-lagged back then, as I am right now. If I don't make sense, again, I'm very sorry. But I do understand a fair bit of how Vault works, and that gave me an advantage.
I understand how both Nomad and Vault work quite extensively. I was also in the US, at a conference, surrounded by Vault engineers that I could annoy. Because I was there, they couldn't ignore me. They couldn't ignore me like on Slack; I was literally next to them. It was like, “Chris, Chris, Chris...” He still loves me for some reason.
I knew that we already had a plug-in that was doing the same thing that I wanted, or a very similar thing, which was the Consul secrets engine we have in Vault.
As a side note, the Consul and Nomad APIs for ACLs are, or were, pretty similar back in the day. It's funny because Consul's was pretty basic. Nomad's looked like a 1.5, then Consul redid the API, and it looked like a 2.0. Imagine if they were actually the same software.
The fun part is that when we shipped the second Consul ACL API, I was literally the last one to touch that code. Guess who, with two fingers, broke the new integration from Vault to Consul? If you have complaints about the documentation of that one, I'm very sorry, but I’m happy to explain the reasons why I documented things that way.
» What do I need to accomplish?
We said Vault gives you all the heavy lifting. So the system that maintains leases and TTLs, you already got that. Policies, you already got that. Authentication, you already got that. That's fine. It's part of the platform. It's another very good reason to choose Vault instead of writing a script to do the same thing.
It gives me a lot of heavy lifting that we already did and that we are potentially already leveraging with several products. I need to teach Vault how to create and renew a token in Nomad—and how to store the roles. These are the mappings between a path in that Vault API and the policy token in Nomad. And I need to have a Nomad client to interact with Nomad.
I had this idea on my mind. I started spelunking. To start, I instantiate a backend. This is the heavy lifting we have already done. You instantiate a structure that you will inherit things from. You're going to declare a number of paths that you're going to expose in the frontend.
It's always good to remember what you're working with. The only way to expose things outside of Vault is through a single interface, and that's the HTTP interface. That's a lie because there is another way, which is KMIP. But that is quite new and not particularly relevant to this. So let's go with that: The only way to expose things out of Vault is through an HTTP API.
So, here, I'm declaring the frontend paths that are going to be present in that API when I mount this secrets engine, that is, when I add it to the API path. I'm going to declare what type of backend it is. It can be TypeLogical, which is for secrets, or TypeCredential, which is an authentication backend.
I think there are a couple more, but you will have to go to GoDoc, which is what I generally do. I'm storing a standard secret here. That is the basic scaffolding for creating your secrets engine. Before I start writing the logic in the different API paths, I have to create a Nomad client that lives inside the structure of the secrets engine. That Nomad client is going to have a configuration. Vault needs some sort of gold credential, stored somewhere it can read it back, to use when it talks to Nomad.
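To make the idea of "a backend that declares its paths and its type" concrete, here is a minimal model written with only the Go standard library. It is not the Vault SDK: BackendType, Path, Backend, newBackend, and Route are hypothetical stand-ins, and the three path patterns are illustrative guesses based on the endpoints the talk mentions (configuration, roles, credentials). The real plug-in framework has its own types, which GoDoc describes.

```go
package main

import (
	"fmt"
	"regexp"
)

// BackendType is a stand-in for the two backend kinds named in the talk.
type BackendType string

const (
	TypeLogical    BackendType = "logical"    // a secrets engine
	TypeCredential BackendType = "credential" // an authentication backend
)

// Path pairs a URL pattern with a short description of what it serves.
type Path struct {
	Pattern *regexp.Regexp
	Help    string
}

// Backend is a simplified model of a secrets engine: a type plus its paths.
type Backend struct {
	Type  BackendType
	Paths []Path
}

// newBackend declares the paths the engine exposes once it is mounted.
// The patterns are illustrative, not copied from the real plug-in.
func newBackend() *Backend {
	return &Backend{
		Type: TypeLogical,
		Paths: []Path{
			{regexp.MustCompile(`^config/access$`), "Nomad address and management token"},
			{regexp.MustCompile(`^role/(?P<name>\w+)$`), "maps a role name to Nomad policies"},
			{regexp.MustCompile(`^creds/(?P<name>\w+)$`), "issues a Nomad token for a role"},
		},
	}
}

// Route returns the help text of the first path that matches the request.
func (b *Backend) Route(reqPath string) (string, bool) {
	for _, p := range b.Paths {
		if p.Pattern.MatchString(reqPath) {
			return p.Help, true
		}
	}
	return "", false
}

func main() {
	b := newBackend()
	if help, ok := b.Route("creds/admin"); ok {
		fmt.Println(b.Type, "backend handles creds/admin:", help)
	}
}
```

The point of the model is only the shape: one structure to instantiate, with every endpoint hanging off it as a declared path.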
There is a very important point around that. You can see there is a read config access function that handles the configuration. This is the description of how that config access endpoint is created. I'm storing three things: a URL for accessing Nomad, a token that I need to access Nomad, and an extra parameter, which I call scheme, the same as in the Consul engine, which says whether I'm using HTTPS or HTTP.
There is one extra thing that you should know, which is particularly relevant for Vault Enterprise and I'm trying to find it in my own code. I'm hoping this wasn't an old version of the code, but it might be.
Are a lot of people here using Vault Enterprise? In Vault Enterprise, you have this thing called Seal Wrap that allows you to add an extra layer of encryption to whatever you are storing in Vault through an external cryptographic source, like an HSM. If you want to protect particular paths in that way, you have to instantiate a special path in the storage backend.
I'm not sure if I have a code example. If anyone has particular doubts on how to do it, they can go to GitHub and read this code, or they can ask me. I can show you exactly where to put it. Just be assured that whatever you store in Vault, this is quite sensitive. This is a gold credential that will allow you to create further tokens. Make sure it is protected to the maximum level of security.
So I exposed that function. Then I wrote a function that connects that API with the storage entry where I'm going to store the configuration. You can see here, and this is one of the most beautiful things about it: do you see any cryptographic function being called or anything? It's that storage object there. That storage object is doing all the heavy lifting for you. I'm literally reading the JSON document from the storage backend. It could be any of many, although, as you know, it should be Consul. But all that heavy lifting was just handled by the internal structures in Vault, which was fantastic to me.
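The pattern described above, writing the configuration as a JSON document and reading it back without touching any cryptography, can be sketched as follows. Everything here is an assumption made for illustration: Storage is a hypothetical two-method interface rather than Vault's real storage type, memStorage is an in-memory stand-in, and the field and key names are guesses based on the three values the talk lists.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// Storage is a hypothetical stand-in for the storage handle a plug-in receives.
// In Vault, encryption happens behind that handle, not in plug-in code.
type Storage interface {
	Put(key string, value []byte) error
	Get(key string) ([]byte, error)
}

// memStorage is an in-memory Storage used only for this sketch.
type memStorage map[string][]byte

func (m memStorage) Put(key string, value []byte) error { m[key] = value; return nil }
func (m memStorage) Get(key string) ([]byte, error)     { return m[key], nil }

// accessConfig holds the three values mentioned in the talk.
type accessConfig struct {
	Address string `json:"address"`
	Token   string `json:"token"`
	Scheme  string `json:"scheme"`
}

const configKey = "config/access" // illustrative storage key

var errNotConfigured = errors.New("nomad access is not configured")

// writeConfig serializes the configuration to JSON and stores it.
func writeConfig(s Storage, c accessConfig) error {
	raw, err := json.Marshal(c)
	if err != nil {
		return err
	}
	return s.Put(configKey, raw)
}

// readConfig loads and decodes the stored JSON document.
func readConfig(s Storage) (*accessConfig, error) {
	raw, err := s.Get(configKey)
	if err != nil {
		return nil, err
	}
	if raw == nil {
		return nil, errNotConfigured
	}
	var c accessConfig
	if err := json.Unmarshal(raw, &c); err != nil {
		return nil, err
	}
	return &c, nil
}

func main() {
	store := memStorage{}
	cfg := accessConfig{Address: "nomad.example.internal:4646", Token: "management-token", Scheme: "https"}
	if err := writeConfig(store, cfg); err != nil {
		fmt.Println("write config:", err)
		return
	}
	got, err := readConfig(store)
	if err != nil {
		fmt.Println("read config:", err)
		return
	}
	fmt.Println(got.Scheme + "://" + got.Address)
}
```

Note that the plug-in side contains no encryption calls at all, which is the property the talk highlights.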
So let's recap. I have a path to store a configuration. I have a client. That client knows how to read the configuration. Now I need to do some heavy lifting.
» A look at the logic
I have a secret token function that gets called when someone asks for a token and chooses specific parameters. What to do with renew? In my case, it's create new, because Nomad doesn't have TTLs on tokens, so it doesn't have the concept of renew in a token. Every time someone hits renew, I'm creating a new one. Or rather, I'm just renewing the lease in Vault. That operation is transparent to Nomad. And I have a revoke: literally, “Delete the secret.”
As you can see, it's mostly logic here. When I hit renew, I'm updating the lease in the storage, saying, “Give the secret more time.” When I do a revoke, I'm calling the Nomad ACL API. That C object there is the Nomad client, and I use it to delete a particular token based on a value that I have stored on the lease.
We are also creating different paths in the API. Remember the example I showed you? I can show it to you again. I did a read to nomad/creds/admin. Admin is that path in the API; it’s linked to a specific policy in Nomad. But this application I just showed you, which is running on the same cluster, is looking for a different path, which is nomad/creds/dispatch. These different paths generate tokens that have different TTLs, and they are aligned to different policies.
For example, the admin tokens are the ones I potentially use to deploy a job manually. That dispatch token only allows this application to run an instance of a particular job with a certain set of parameters. To enable that logic, I had to store these roles, this mapping, in the API. I had to teach Vault how to create logic around it: where to store it, the value, the parameters, and the types of token. Literally, store a JSON entry with that information.
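The role mapping described above, a name in the path that resolves to policies, a token type, and a TTL stored as a JSON entry, might look something like this. The role struct, the roleStore type, the policy names, and the dispatch TTL are all made up for illustration; only the role names admin and dispatch and the one-hour default come from the talk.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// role is a hypothetical shape for the JSON entry stored per role.
type role struct {
	Policies []string      `json:"policies"`
	Type     string        `json:"type"` // Nomad token type
	TTL      time.Duration `json:"ttl"`
}

// roleStore stands in for Vault's storage: key -> JSON document.
type roleStore map[string][]byte

// write stores the role as a JSON entry under role/<name>.
func (s roleStore) write(name string, r role) error {
	raw, err := json.Marshal(r)
	if err != nil {
		return err
	}
	s["role/"+name] = raw
	return nil
}

// read returns nil with no error when the role does not exist.
func (s roleStore) read(name string) (*role, error) {
	raw, ok := s["role/"+name]
	if !ok {
		return nil, nil
	}
	var r role
	if err := json.Unmarshal(raw, &r); err != nil {
		return nil, err
	}
	return &r, nil
}

func main() {
	store := roleStore{}
	// Policy names and the dispatch TTL are invented for this sketch.
	roles := map[string]role{
		"admin":    {Policies: []string{"admin"}, Type: "client", TTL: time.Hour},
		"dispatch": {Policies: []string{"dispatch-only"}, Type: "client", TTL: 5 * time.Minute},
	}
	for name, r := range roles {
		if err := store.write(name, r); err != nil {
			fmt.Println("write role:", err)
			return
		}
	}
	// A read of creds/dispatch would first resolve the role named in the path.
	r, err := store.read("dispatch")
	if err != nil || r == nil {
		fmt.Println("role not found")
		return
	}
	fmt.Println(r.Policies, r.TTL)
}
```

Keeping the mapping in storage rather than in code is what lets an operator add a new role without rebuilding the plug-in.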
This is probably the meat of it. Creating a secret is tied to a particular operation in the API, which is the path/read. When I do that path/read, I'm going to do the actual Nomad logic of creating a secret. I'm generating a random name for the token. You see that C object there? Remember I told you C is the Nomad client? I'm just calling that Nomad client. Simple.
As you see, I haven't put any of the lease logic there, and that is what makes Vault great. Vault is doing that on its own. When I call path/token/read, it's going to do the logic, create a lease, and, in this case, return the token but only store the accessor to the token. In Nomad, you have tokens, and you have accessors to those tokens that only handle the administrative operations on a token.
» The best secret is the one you don't know about
This is important. Do you know what the best secret is? It's written there. It's the one you don't know about. When I did the first iteration of my code, I was storing everything in the backend. I was storing the secret and the accessor. Then I had this comment from Chris, who asked me, “Why are you keeping this secret ID? You don't need it. You can revoke a token without having the token, just with the accessor.” So it was pretty much like, “Now I get it!”
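The review comment above boils down to a simple rule: hand the secret to the caller, keep only the accessor, and revoke by accessor. A minimal sketch of that rule follows. The nomadTokens interface, the fakeNomad type, and the lease struct are hypothetical stand-ins written for this example; they are not the Nomad client's or Vault's real types.

```go
package main

import (
	"errors"
	"fmt"
)

// nomadTokens is a hypothetical slice of what a Nomad client offers:
// create a token, and delete one by its accessor.
type nomadTokens interface {
	Create(name string, policies []string) (accessorID, secretID string, err error)
	Delete(accessorID string) error
}

// fakeNomad keeps tokens in memory so the sketch runs without a cluster.
type fakeNomad struct {
	next   int
	tokens map[string]string // accessor -> secret
}

func (f *fakeNomad) Create(name string, policies []string) (string, string, error) {
	f.next++
	accessor := fmt.Sprintf("accessor-%d", f.next)
	secret := fmt.Sprintf("secret-%d", f.next)
	f.tokens[accessor] = secret
	return accessor, secret, nil
}

func (f *fakeNomad) Delete(accessorID string) error {
	if _, ok := f.tokens[accessorID]; !ok {
		return errors.New("unknown accessor")
	}
	delete(f.tokens, accessorID)
	return nil
}

// lease is what the engine persists: the accessor only, never the secret.
type lease struct {
	AccessorID string
}

// issue creates a token, returns the secret to the caller, and keeps the accessor.
func issue(c nomadTokens, name string, policies []string) (string, lease, error) {
	accessor, secret, err := c.Create(name, policies)
	if err != nil {
		return "", lease{}, err
	}
	return secret, lease{AccessorID: accessor}, nil
}

// revoke deletes the token using nothing but the stored accessor.
func revoke(c nomadTokens, l lease) error {
	return c.Delete(l.AccessorID)
}

func main() {
	nomad := &fakeNomad{tokens: map[string]string{}}
	secret, l, err := issue(nomad, "vault-admin", []string{"admin"})
	if err != nil {
		fmt.Println("issue:", err)
		return
	}
	fmt.Println("returned to caller:", secret)
	fmt.Println("stored with lease:", l.AccessorID)
	if err := revoke(nomad, l); err != nil {
		fmt.Println("revoke:", err)
		return
	}
	fmt.Println("tokens left in Nomad:", len(nomad.tokens))
}
```

Because the lease never holds the secret, a leak of the engine's stored data would expose accessors that cannot be used to authenticate.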
Consul back then didn't have that, and I was basing the logic on Consul. Consul just had tokens. It didn't have accessors, so you had to store the token. Here's the beauty of it. In most of the database plug-ins, most of the secret engines, Vault is not even storing your secrets. This is probably one of the coolest things, because when your InfoSec department starts throwing wheelbarrows of theory at you, in terms of where you can or cannot store the secrets, you can answer them, “We are not storing any secrets in Vault.” Hopefully, they'll go like that.
» Let's talk about the plug-in
I said plug-in when I started this. So far, I showed you the logic. But if you're running your own plug-in for whatever reason, if you have built your own logic into Vault, you probably don't want to merge it into Vault or maintain a fork. That's why we created a plug-in interface. You can go and grab that logic and put it in a separate deliverable, something that you maintain, something that is at no point tied to the Vault mainline. You just interface with it. That's what we need.
We created a plug-in system for these plug-ins to be loaded. I want to say at runtime; I’m 99% sure that this is at runtime. When Vault comes up, it's going to read the mount table, and it's going to load the extra binaries into memory and extend that functionality.
But if you go and operate Vault, lights keep blinking. You don't need to maintain your fork if you have your own logic. You maintain it completely separately. You have to add a little bit of wrapping to your logic code. I can’t remember exactly from which secret engine I took the example, but you can pretty much copy and paste this on the outside.
Just take this code, copy and paste it, and then create your function structure in Vault.
However you paste it, this will tie up with your backend. This is going to do the heavy lifting in terms of the mutual TLS with Vault and establishing the RPC connection. From there, it's going to instantiate your backend, which will bring up your logic.
» Lessons learned
Go is awesome
I know this is controversial, but error handling in Go is the greatest thing that ever happened to me, seriously. I'm used to Python or Ruby; remember, I'm no developer. Having to manage exceptions is very painful. It was quite refreshing to go if err == nil in the code, that's cool, keep going.
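For readers who have not seen Go's error handling, here is a small self-contained example of the style being praised. It usually appears in the inverse form, if err != nil, handling the failure first and then carrying on. The parseTTL helper is invented for this illustration; time.ParseDuration and fmt.Errorf are standard library functions.

```go
package main

import (
	"fmt"
	"time"
)

// parseTTL is a hypothetical helper that turns user input into a duration.
// Errors are ordinary return values: check them, wrap them, and return.
func parseTTL(raw string) (time.Duration, error) {
	ttl, err := time.ParseDuration(raw)
	if err != nil {
		// Wrap the original error so the caller keeps the full context.
		return 0, fmt.Errorf("invalid ttl %q: %w", raw, err)
	}
	if ttl <= 0 {
		return 0, fmt.Errorf("ttl must be positive, got %s", ttl)
	}
	// No error: that's cool, keep going.
	return ttl, nil
}

func main() {
	for _, input := range []string{"1h", "soon"} {
		ttl, err := parseTTL(input)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println("ttl:", ttl)
	}
}
```

There is no hidden control flow here: every failure path is visible at the call site, which is what makes the code easy to follow for someone coming from exception-based languages.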
Most of the stuff while I was writing this, it looked magical, it felt magical. For someone that hadn't written a single line of Go, it was like, “I'm going to get a compile failure any second now, any second, any second,” and the thing built. I guess it works. But it worked. It was interesting. If you're running Vault—and I'm generally more on the run side of Vault—always remember that the best-kept secret is the one you don't know about.
The best-kept secret is the one you don't know about
I know a lot of people will put Vault here, and then they don't want to break their workload. Maybe they'll put Ansible here, and Ansible will pull secrets from Vault and throw them there. Then Ansible knows a secret as well.
Same thing with Terraform. A lot of people will use Terraform in a sort of runtime-y way, where Terraform is not only my infrastructure as code, it's also my deployment tool. It pipes the process or whatever.
Terraform can and absolutely should consume secrets from Vault. It should consume secrets from Vault that die within that Terraform run, in case they are disclosed in logs or in the state. Guess what? When Terraform starts up, it's going to create a child token of whatever token it got, and then at the end of a run, it's going to revoke that token.
So every dynamic secret created by Terraform is going to be revoked as well. Think on that. Don't pull secrets from Vault using Terraform that shouldn't be on Terraform—that Terraform doesn't specifically need.
There are a ton of alternatives. The Vault agent now has a templating engine, so use Terraform to deploy the Vault agent—and the Vault agent will write your templates. The neatest way to do this is to have your application consume from the API. I know it's generally not possible.
Attach all your functions to the backend of your plug-in
Don't leave them hanging in your Go package. Always call back to the backend—you have only one object to instantiate.
30% error handling, 60% mapping interfaces and 10% writing the logic
If I had a pound, a British pound, for each if err != nil for error handling... maybe 30% of my code! The beauty of it is 60% is just mapping interfaces that already exist in Vault. How much effort? It's probably 10%. It's fantastic.
GoDoc is your friend
I keep saying this to the team that writes Vault: the documentation in GoDoc is absolutely fantastic. You can write plug-ins in different languages, or so I've been told. I haven't seen it, but it is a gRPC interface. Technically you can. I've never seen it happen.
Document what you do
At HashiCorp, we also maintain the website. Every time we merge something into Vault, we merge documentation for it: both the API documentation and the basic product documentation that you see. On top of that, we try to write a guide that explains how it works. The PKI backend, for instance, has 20-30 parameters. You probably need examples. Those are on learn.hashicorp.com.
If you're writing a plug-in and you're publishing it, please write documentation. Try to mimic our structure. Just show how it's configured on a very basic level, and then put an API reference so people can understand the power of your plug-in without having to go to the code and read it.
If you need more reference, the full code is there (repo). At the time of merge, it was written up to best practices. If you can track the PR historically, you will see that there are at least 80 comments, and that was my first code review.
We are hard when it comes to code reviews. But on my second one, it was easier. On my third one, I only had one. The full code is in the repository—it’s up to best practices. Either way, just a secret. It took me a day to write a plug-in. It took me three to write the tests—whole different animal.
If you want to see the flip side of that, there is a deep dive that Joel did on the Vault AWS Auth backend because when we shipped it, he had a better idea on how to do the workflow, and he contributed, so it was good. There is a talk about how the GCP authentication plug-in was made, also highly recommended. And I'm right on time. So with that, I'm going to be around. If you want to ask questions, just find me, or you have my GitHub, you can find me on freenode. Thank you very much for listening.