Vault is a cornerstone of Hootsuite's microservices, CI/CD, virtual instances, and processes. Learn how they adopted it and see if the tools and best practices they created would work for you.
At Hootsuite, Vault is the basis of identity-based security for people and systems alike. It's used as the gatekeeper for access to the SSH bastion, production-readiness certification, AWS resources, database credentials, role-specific TLS certificates, and old-fashioned static secrets.
It was a journey to reach their current proficiency with Vault, requiring new tooling, tradeoffs, and a hard look at user experience. Hootsuite engineer James Atwill shares this journey along with the tips and tooling Hootsuite acquired along the way. He'll share their successes as well as their mistakes, along with a look forward to where they are headed in the next year.
Like Rosemary said, my name's James. I am a staff developer in the operations department at Hootsuite. For those of you that have never heard about Hootsuite, our mission statement is to champion the power of human connection. What that means right now is we are a social media management platform—the most widely used one currently, with over 16 million customers, I'm told. We've got a thousand employees across the planet.
I'm going to talk about four things. The first thing I'm going to talk about is nothing about the deployment of Vault. Then we're going to time travel to Hootsuite of the past, which was a scary place. Then we'll come back to the present, and we'll talk about how we use Vault for people, and what we do at Hootsuite for Vault with our services.
A lot of this stems from questions I've seen on the Google group—people chatting in Slack, questions I've had from people individually. I'll post these slides up in the Slack group as PDFs when I'm done.
I'm not going to say anything about deployment. If you have not looked at Ansible Playbooks, Terraform resources, or anything else like that in the past eight months, I encourage you to do so.
The documentation for deploying Vault is much better than it was a year ago—two years ago—and there are lots of good examples for Ansible Playbooks—probably for Chef as well—that you can copy and paste, and do your deployments. There's good information out there for hardening Vault and everything else like that.
We are going to travel back to a scary time for Hootsuite—a time before Vault. I was only a senior developer at the time—we had a different logo. This time is a mishmash of times. It's anywhere from a year to two years ago. It was a scary time for security. There are lots of scary things that I could show you. Fortunately, I can't show you any of these scary things with time travel because it would break the timeline—so I have a bulleted list.
One thing we had—we're based on AWS—is we had separate accounts for all our developers. They had static AWS keys. Yeah, that's scary—I know. We had unreliable coffee machines—yeah, also scary. We had our Ansible Playbooks in Subversion—inside of it, we had SSH keys that were hardcoded for doing various forms of deployment.
We had AWS keys that you could easily search for and use, if that was your thing. We had database secrets that were also encoded in various playbooks. All the deployment servers had public SSH keys and, as a developer, you'd generate a key pair and hand your public key to your coworker; they'd put in your key, and that's how you would access more and more servers. That's how things worked. It was really scary.
We used various forms of encryption for all of these things. Most of it boiled down to Ansible Vault for Ansible—and there was one password for Ansible Vault. Once you knew that password, you had access to everything.
Also at the time, apparently we had no tea—there was just coffee.
This is not good for a number of reasons. Number one; it's not good for caffeine. Number two; it's a scary place. Because it's a scary place, I don't want to spend too much more time here. We're going to travel back to the present. That's better—we're back to the present, where everything is nicer.
When we decided to look at Vault—we were like, "We need to have a plan. We need to have a goal." Vault—if you've never experienced it before—is a bit mind-bendy—it's a bit mind-twisty; it can be hard for people to understand how the dynamic has changed when it comes to Vault.
We decided the first thing we're going to do for people who are using Vault—we're going to put it right in their face, so they have to use it every day, and get more comfortable with it.
We came up with a goal. That goal was people come in in the morning, they log in to Vault, and they get a token that they can use all day long, to do whatever they want to do. It's an audacious goal from where we were two years ago—but let's talk about how we made this happen.
The first thing is our technical staff are running Macs. The onboarding documentation that we had told them, "Welcome to Hootsuite, please install Homebrew, and follow this other guide." This built on our onboarding process for developers, it built on having a common platform for imaging machines, and it made getting things on to laptops much easier.
One of the tools that we asked them to install was our internal Homebrew repository, and from there, we said, "Hey, there's a package called `hs-opskit`," because it's Hootsuite and everything starts with HS. OpsKit brought in a bunch of tools; it set up some configuration, it brought in some files, and the onboarding documentation would say, "Run these things. Don't worry if you don't understand why you're running them. Just run them."
One problem we ran into with this is developers would look at these instructions and be like, "I don't think I need to run this. I'm going to write my own getting started guide in the Wiki." So, for a while, developers would come on and be like, "Well, I know better than everyone else. I'm going to have my own getting-started guide."
One of the problems we had to eventually deal with was calling out people's getting started guides that weren't quite right. You may run into this as well if you do something similar.
When we initially did this, we didn't have the shell setup, either. It adds some things into your `.bashrc`, `.bash_profile`, or `.zshrc` file: functions that are then available in your shell.
When you do an installation of what we have as OpsKit, you get all of these tools. You get Vault, you get jq, you get a bunch of Kubernetes things. You have the AWS command line, you've got a reasonable version of `sed` that supports `-i`, and a bunch of other tools that are handy to have when you're writing shell scripts, or you're deploying bits of code. You know every developer has these things installed for them.
I'm going to talk to you about two other little subcommands that were handy. One was help (`$ opskit help`). It opened up a Wiki page, so as a person that has no idea what to do, you run `opskit help`. It brings you to a page that is always being updated and always has the latest information. People struggling with whatever crazy problems they have can go there and find out what's going on.
The other thing I mentioned earlier was the setup shell. This was a great way for us to wiggle into people's shells, and get things added that we happened to want. I've got a little code snippet here of what we did, but it's an idempotent way to add an extra shell script in to your shell startup.
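The snippet shown on the slide is not part of this transcript. As a rough illustration of the idea only, an idempotent "add a line to the shell startup file" step could look like the following Python sketch. The function name `ensure_sourced` and the example paths are hypothetical and are not taken from OpsKit.

```python
from pathlib import Path


def ensure_sourced(rc_file: Path, script: Path) -> bool:
    """Append a `source` line for `script` to `rc_file` unless it is already there.

    Returns True if the file was changed, False if it was already set up.
    """
    line = f'[ -f "{script}" ] && source "{script}"'
    existing = rc_file.read_text() if rc_file.exists() else ""
    if line in existing.splitlines():
        return False  # already installed; running again changes nothing
    # Start on a fresh line even if the file lacks a trailing newline.
    prefix = "" if existing == "" or existing.endswith("\n") else "\n"
    with rc_file.open("a") as fh:
        fh.write(f"{prefix}{line}\n")
    return True
```

Because the check is an exact line match, running the setup step on every install or upgrade leaves the startup file with a single copy of the line.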
One of the things that is in people's paths is the `vaultlogin` function. This function wipes out your current Vault token and runs a setup script, which takes care of four things. The first thing is it makes sure that you have the OSX keychain helper installed.
It's a special version that we use that supports multiple environments, but instead of having a `.tokens` file sitting in your home directory, it puts everything in the macOS Keychain. It also makes sure that you are pointing to the right Vault address. At Hootsuite, we have three separate Vault environments, and I'll talk about that later.
It makes sure that your current token is either valid or not valid. If it's valid, it says, "Hey, you're already good to go. You have nothing to do." If it's not valid, then it bootstraps in a `vaultlogin` for you, using LDAP.
Some things here that are missing—that are important to realize. Number one is that your user ID—that we have for people's Macs—happens to match the user ID that's in Active Directory. It's also the same user ID that we have for people's emails. It makes automation so much simpler because you know that there's going to be matching all the way through.
When you run `vaultlogin`, there are a few things that happen behind the scenes. The first thing is that all the mapping for LDAP is done to policies with Active Directory groups. The process for getting added to different Active Directory groups is documented in our Wiki.
Our IT department takes care of approval processes—making sure everything is fine—and they also make sure that other people who are in that group are still supposed to be in that group over time. Whenever there's a group membership change, it goes to the audit log; the audit log gets reviewed by the security department. There's monitoring; there's auditing, there's the recording of all of these changes.
As far as what membership you get inside Vault, it's a standard Vault policy mapping that you can find on the main Wiki page. You type `vaultlogin`; it reminds you that you need to MFA. You type in your password. We use Duo: you say yes for Duo, and bang, you're in.
MFA was important because we have a requirement that any authentication that happens is going to have multiple factors, and this makes sure it happens. That said, this predates a lot of the OIDC work that's been going on. We're looking, in the first half of next year, to switch to OIDC (new methods for minting OIDC tokens coming in Vault 1.2) for doing some of this.
There are two policies. One is called `vault-operators`, the other one's `vault-operators-latch`. These are actually two policies that I have on my Vault token when I log in. The first one gives me access to special things. The second one takes that access away. I am enhanced with superpowers, but then I have kryptonite, and I can't use any of those superpowers.
We have another function that's embedded in your shell, and that's `vaultsudo`. `vaultsudo` makes another token, but it strips away all the policies except for, in this case, `vault-operators`. This gives me access to do the things that I need to do: it gives me a one-hour token with these superpowers in one specific shell, where I'm doing the things I need. Once I've done whatever it is I need to do, I can do `vaultshed`, and I shed my token away.
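The talk does not show the policy bodies. One plausible reading, sketched here in Python, is that the latch policy uses Vault's `deny` capability, which takes precedence over anything another policy grants on the same path. The path `sys/mounts/example` and the capability sets are made up for illustration, and the model only handles exact path matches (real Vault also does glob matching and prefers the most specific path).

```python
# Hypothetical policy definitions: policy name -> {path: capabilities}.
POLICIES = {
    "vault-operators": {"sys/mounts/example": {"read", "update"}},
    "vault-operators-latch": {"sys/mounts/example": {"deny"}},
}


def is_allowed(token_policies, path, capability, policies=POLICIES):
    """Return True if a token holding `token_policies` may use `capability` on `path`."""
    granted = set()
    for name in token_policies:
        # Capabilities from every policy attached to the token are merged.
        granted |= policies.get(name, {}).get(path, set())
    if "deny" in granted:
        return False  # deny always wins over any grant
    return capability in granted


# Everyday login token: operator powers plus the latch, so nothing is usable.
login_token = ["default", "vault-operators", "vault-operators-latch"]
# Token after a vaultsudo-style step: only the operator policy remains.
sudo_token = ["vault-operators"]
```

Under this model `is_allowed(login_token, "sys/mounts/example", "update")` is false, while the same call with `sudo_token` is true, which matches the superpowers-plus-kryptonite description.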
I'm now logged in, and I have a way of accessing things that I may not have been able to access before. Let's talk about things I can do with this token. The first thing is SSH. We have a tool called `hs-ssh-client`, and it does some pretty simple stuff.
It creates a public-private key pair. It creates a unique one for each person. Initially, we were using people's existing key pairs, but we found that some people had them encrypted, some people had them pass phrase protected. Some people had them hardware passphrase protected. Some people had GPG going on. It was tedious.
Now we dynamically create a key pair, and we ship the public key to a Lambda that we have. This Lambda's job is singular in that it takes a look at who you are, and will then sign your SSH key with whatever access you're supposed to have.
The SSH key signing is textbook SSH key signing. It's what you see in the Vault documentation for how to do this. The access request system is something we've built ourselves. It's built into JIRA. It integrates with the rest of our tooling. From there, you get a set of things that you are allowed to access.
One of the things that I mentioned is that, somehow, this Lambda knows who I am. This Lambda has access to doing three specific things inside of Vault. The first is signing SSH keys. The second is looking up Vault tokens, and then looking up entity information. We've crafted this hacky, cool way of managing identity.
As a client, I create this Vault token, and I strip away all of the policies. I strip away everything. I don't even have the default policy, and it winds up becoming this identity token. We wrap it and ship that token to the server.
The server's job is to unwrap it, look up who I am, and pull out my identity information from there. I'm able to leverage my Vault token as my identity for accessing SSH. What happens now is if you want to access SSH somewhere, you log in to Vault, and from there you can use `hs-ssh-client`, and away you go.
That was SSH taken care of. Now, before I move on, I want to say that in reality, we have a Bastion server sitting in the front there. You need to go through Bastion—everything that's there is logged, and audited; it's pretty cool. That's SSH.
Similar to SSH, we have `hs-iam-tool`. It is a goal-based tool that we've written. It's a simple tool, but it leverages conventions that we've already put in place. It knows the host address for our Vault servers. It knows where to find various Vault credentials. It knows the paths inside Vault for accessing things.
It knows what to do because everything has been set up already. It takes care of writing credentials that are fetched from Vault, and writing them into your AWS credentials file. That's its main purpose in life.
We use STS AssumeRole for fetching AWS credentials, which means that the creds you get back are good for about an hour—then they expire, and you need to get new credentials. It has four main subcommands—it has five, but we'll pretend it has four.
The docs one works similarly to the help I was mentioning earlier, so if I run `hs-iam-tool docs`, it launches me into the Wiki at a specific page. One thing that's worth noting here is that if you're going to do AssumeRole like we did, there's no good way to revoke AssumeRole tokens.
They're good for the whole hour, and either you need to revoke everybody's tokens or you have to wait for the hour to expire. If you find that's not true, come and tell me, but I've talked to our AWS TAM, and they were like, "Yeah, this is fine. You should be able to revoke it." Then they were like, "Oh no, you can't revoke it."
When you run this tool that we wrote, it's pretty simple. It hits up Vault, and fetches the credentials, and writes them in your AWS credentials. Now, there are a few extra flags there. For example, I can say, "Hey, I want to access the read-only credentials, but I want that to be my default AWS profile." It will fetch these keys, write them into your default credentials file, and you're able to do AWS from the command line.
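As an illustration of the "write them into your AWS credentials file" step, here is a minimal Python sketch using the standard library's `configparser`. It is not the actual tool's code; the function name `write_profile` is hypothetical, and fetching the values from Vault is left out.

```python
import configparser
from pathlib import Path


def write_profile(credentials_file: Path, profile: str,
                  access_key: str, secret_key: str, session_token: str) -> None:
    """Create or replace one profile in an AWS shared credentials file."""
    # interpolation=None keeps any '%' characters in values literal.
    config = configparser.ConfigParser(interpolation=None)
    config.read(str(credentials_file))  # a missing file is silently skipped
    config[profile] = {
        "aws_access_key_id": access_key,
        "aws_secret_access_key": secret_key,
        "aws_session_token": session_token,
    }
    credentials_file.parent.mkdir(parents=True, exist_ok=True)
    with credentials_file.open("w") as fh:
        config.write(fh)
    credentials_file.chmod(0o600)  # readable and writable by the owner only
```

Passing `"default"` as the profile name corresponds to the case described above, where the fetched role becomes the default AWS profile; other profiles already in the file are preserved.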
The sync is pretty handy because it keeps track of when that lease is going to expire. You can run that as much as you want, and before it expires, it's a no-op. Then, when it's about five minutes from expiring, it'll go and fetch new credentials for you.
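The sync behaviour described here (a no-op while the lease is fresh, a refetch once it is within about five minutes of expiring) can be sketched as follows. This is an assumed shape, not the tool's real implementation: `needs_refresh`, `sync`, and the `fetch` callback are hypothetical names, and the cache is just a `(credentials, expires_at)` tuple.

```python
from datetime import datetime, timedelta, timezone

REFRESH_MARGIN = timedelta(minutes=5)  # "about five minutes from expiring"


def needs_refresh(expires_at, now=None, margin=REFRESH_MARGIN):
    """True once `now` is inside the refresh margin before `expires_at`."""
    if now is None:
        now = datetime.now(timezone.utc)
    return now >= expires_at - margin


def sync(cached, fetch, now=None):
    """Return usable credentials, calling `fetch` only when necessary.

    `cached` is a (credentials, expires_at) tuple or None.
    `fetch` is a callable returning a fresh (credentials, expires_at) tuple.
    """
    if cached is not None and not needs_refresh(cached[1], now):
        return cached  # still fresh: no-op
    return fetch()
```

A script can therefore call `sync` before every AWS operation without worrying about how often it runs.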
It came to a point where you can now pass in a Vault role, and say, "Hey, I'm a shell script that does AWS. I need to access things. I'm going to call sync on whatever role I'm supposed to be using." Now we have a standard way for people to access AWS with these short-lived credentials. It's been super handy.
We built in autocompletion—one of the things, when it comes to security tools, is usability—it’s super important. Often usability and security fight each other—you want things to be secure, but you want them to be usable; you don't want to tell people too much, but you want to tell them enough; you want to be able to debug it, but debugging it would give away information.
If you've ever tried to read a Vault log file and figure out what's going on from all the MD5sums, you know exactly what I'm talking about.
Once I have those AWS credentials, I can call `aws sts get-caller-identity`. It can see that I have an LDAP login, and it will have my name in it. The user ID is, unfortunately, a long string. It's humorous when you're using the web console, because that user ID shows up in the web console, especially if you're doing support tickets back and forth.
That becomes your full name, which it tries to render but doesn't do a great job. It means that ops people can fetch temporary credentials, and like SSH, it's managed through AD. Like SSH, it has the same policies, the same procedures for getting added to new AD groups.
Starting in Vault 1.1.something, you can no longer take an existing token and enhance it with more policies. You have to log out and log back in again. It means that if you do get added or removed from different AD groups, you'll need to log out and log in again—word of warning.
The IAMs that are used here—they're all Terraformed, they're all managed. There are instructions for our ops help department. If people want to create new groups or new roles to access, they can do it all without me—which is good because then I can come here.
The last thing I want to talk about is how we bootstrap into the web console. Using our local credentials, we can use federation, and bootstrap you into a web console. I'm not going to go into details. I have a link here, and I'll push these slides out as PDFs into Slack later—but this shows you how to do it with Python. If you can read Python, you can translate it to any other language.
You're able to use your local credentials and open up a web console from there. Again, your web console is good for the hour that your original token is good for— if you need to refresh, you can refresh.
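The linked example is not reproduced in this transcript. As a hedged sketch of the general AWS console federation flow, the two URLs involved can be built like this in Python. The function names are hypothetical, the issuer value is whatever label you choose, and the HTTP request that exchanges the first URL for a sign-in token is deliberately left out.

```python
import json
from urllib.parse import urlencode

FEDERATION_ENDPOINT = "https://signin.aws.amazon.com/federation"


def signin_token_url(access_key, secret_key, session_token):
    """URL that exchanges temporary credentials for a console sign-in token.

    A GET on this URL returns JSON containing a "SigninToken" field.
    """
    session = json.dumps({
        "sessionId": access_key,
        "sessionKey": secret_key,
        "sessionToken": session_token,
    })
    query = urlencode({"Action": "getSigninToken", "Session": session})
    return f"{FEDERATION_ENDPOINT}?{query}"


def console_login_url(signin_token, issuer,
                      destination="https://console.aws.amazon.com/"):
    """URL that logs a browser into the console using the sign-in token."""
    query = urlencode({
        "Action": "login",
        "Issuer": issuer,
        "Destination": destination,
        "SigninToken": signin_token,
    })
    return f"{FEDERATION_ENDPOINT}?{query}"
```

The login URL can then be handed to `webbrowser.open` or simply printed so that people can paste it into whichever browser they prefer.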
We wind up adding a print flag to this instead of booting you into the console because we found some people like to use the console in Firefox, but they use Chrome for everything else, or some people use Incognito mode for accessing AWS web console. They have their special ways—this gives people the way of doing it themselves.
Web will also do a sync. We encourage developers and ops people to write little one-liners that wrap this. You can say something like "AWS read-only," hit enter, and up comes a browser where you need to be, and your credentials are going to expire.
### “List” for autocomplete
Last thing I want to mention is `list`. `list` is what we use for autocomplete. We do a list of AWS roles in Vault, get a huge list of them, and then send them back to Vault, and we're like, "Hey, which of these can I use?" Then Vault sends them back, and we show whatever list you're allowed to use. Like I said, it's for autocompletion; it's for debugging, it's for tracing how things are going. Makes it much more usable.
So, we use EKS for our Kubernetes clusters. EKS is Amazon's Kubernetes offering, and it's based on IAM. We leverage `hs-iam-tool` to get IAM credentials for accessing Kubernetes. We've written a small wrapper around the original AWS IAM authenticator; all it does is a sync.
It syncs, it fetches some credentials that you're supposed to have and then calls `kubectl` for you. It says, "I'm supposed to be the production admin. I am connecting to the production cluster." This is what you see in people's kube/configs. We have another tool called `hs-kubeconfig`, which will spew all this out for you. You don't have to worry about it.
As an end user, once I've logged into Vault in the morning, I have my Vault token. When I run `kubectl get pods`, the first thing it will do is a sync of my AWS credentials, writing them out to the correct place. Then all of that gets wrapped with the AWS IAM authenticator.
SSH, AWS, and Kubernetes—all now with short-lived tokens, all relatively easy, all with tooling. It makes our life a lot better—but not perfect. One of the issues that we've run into is—we have multiple Vault environments—and that's not something that a lot of people have, I'm learning. It's really confusing for developers.
In Hootsuite we have a dev, a staging, and a production environment, so it makes sense that when we're first setting up Vault, you want to have a dev, staging, and production Vault, right? Except that doesn't make sense in real life, because you don't need all those things.
As a developer, it gets super confusing. I want to interact with staging. I would log in to staging Vault. But you don't log in to staging Vault; you log into production Vault. Oh, but if I need dev access to the dev Kubernetes cluster, then I should log in to dev Vault. No, you log in to production Vault.
A lot of the tooling ignores things like `VAULT_ADDR` and `VAULT_TOKEN`, and says, "Hey, I know best. Trust me on this," and there are flags to override it, to use environment variables. It's been a little bit of a learning process for people.
Another problem we ran into, one that I was surprised about, was getting people to update the software that's on their laptops with Brew, because we push out fixes for `hs-iam-tool`, for the SSH client, for other tooling, and we tell everyone, "Hey, you should all update. It's got new features. It's got bug fixes."
Of course, no one updates. There's no good way to enforce people to upgrade. We've been talking about different ways to let people know—to have the tooling check new versions, and be like, "Hey, using old version. I'm not going to run pretty soon," stuff like that.
One thing that we found is even though we've asked people to upgrade, the first thing we ask people to do when they hop into the ops help channel is we say, "Hey, what version did you install? Well, you should install a new version."
Interestingly enough, though, that doesn't always work, because there are people who've decided, "Well, you know what? I don't trust OpsKit. I'm going to install my own version of Vault. I'm going to install my own version of kubectl." Because no one should say “kube cuddle.”
What winds up happening is you say, "Hey, what's the Brew version of Kubernetes CLI?" "Oh, it's 1.12.something." Oh, that's great, but then why is it that when I run kubectl, it says 1.9?
We've learned that some people have curled that into their bin path, and been like, "Hey, I know what's going on." So you wind up with all these troubleshooting issues from people who are doing weird things with updating.
Another issue that we've run into is that people access our servers through VPNs. Sometimes our security groups are set to work great from the office, but not so great from the VPN. If you're doing something like this, make sure that you can hop on your VPN—make sure you can access everything from your VPN. Make sure as you're fiddling with security groups, that you test them out on your VPN—if you are like us.
Communication is huge. We had an old mechanism for accessing the web console that didn't work from the command line for AWS. We wrote `hs-iam-tool`, we patched it all up, and we're like, "Hey, everyone. We have a way of doing this now that works from the command line and works from the web. It's great! The tokens are short-lived. It's much more secure, and you get access to things that maybe you didn't have access to before. You should try it out. By the way, in two or three months, we're going to get rid of the old way." We had two or three people that tried it.
"Hey, by the way, everyone, a month from now, we're going to get rid of the old way. You should switch." Then, "Hey everyone, tomorrow we're getting rid of the old way." Yeah, everyone knows what happens. Tomorrow comes by, we get rid of it, and everyone's like, "What happened? Why did no one tell me?"
So, I don't know—to me personally—I would have shortened that window to two weeks, and been like, "Hey, by the way, next Friday we're getting rid of this, so you better get on this." I think then people at least take it seriously.
There was some pushback from people when we made the change to SSH and made the change to the IAM tool. One of the things we did during this cutover is we kept people's access the same. We didn't restrict access to things. It was a different way of accessing resources that they already were supposed to have access to.
The last thing I want to talk about is troubleshooting. We use a tool called Twinery, which is a choose-your-own-adventure, web-based tool. Anyone here ever used Twinery before? Yeah, I didn't think so. There's one person that we recently hired at Hootsuite that was like, "Dude, that's Twinery! I used to play a game that was like that. That's cool."
Twinery's a choose-your-own-adventure book. We leveraged it for doing troubleshooting. You launch to a certain page, and you follow the guide, which is a troubleshooting guide that says, "Is your VPN up? Yes/No. Are you able to ping the server? Yes/No. Have you tried this? Yes/No." Somewhere there's a grue that eats you if you go the wrong way. We worked through: have you installed the right versions, have you done all the right tooling, et cetera, et cetera.
It's helped us because we can point people at this troubleshooting guide, and if they don't come back, that's a good thing—unlike an adventure. The one that's here—we use for how you use Vault inside your services.
We initially had this huge Wiki page that was like, "So, you're on Kubernetes, and you want to get AWS tokens? Well, here's how you do this. Oh, you've got an EC2 server, and you want to access your database credentials. Oh." It was this huge wall of text, and I don't know about you, but whenever I see a huge wall of text, I scroll through it, and I'm like, "Stack Overflow, maybe?"
We turned this huge wall of text into, "What are you? Are you a Jenkins job? Are you trying to SSH somewhere? What is it you're trying to do? Here's what you need to do; here's how you copy and paste things and put things together."
So, with service accounts, we had some similar goals to people. For services, we want a really low barrier of entry. We decided if you can read a file, you can use Vault. We wanted it to be language agnostic. Didn't matter if you were writing in Go—if you were writing in Ruby, Python, Scala, whatever. We wanted to work across all languages.
We wanted it to work in our EC2 instances. We want it to work for our Kubernetes. We have CI jobs that need to access resources. We wanted them to be able to do it, as well. We wanted our devs to be able to access these things—and we wanted to leverage all the existing infrastructure that we put up, so everything flowed if it needed to.
Leveraging convention over configuration—we wrote a tool called the Vault control tool. Takes care of four steps. Number one, it takes care of authentication. If you're on Kubernetes, it uses your service account. If you're on EC2, it uses your AMI EC2 metadata to authenticate.
It'll also take care of authenticating you with a hardcoded token if that's your thing. It'll fetch secrets on your behalf, and write them out to files, and it'll manage any leases that you have ongoing. Your AWS leases, it'll keep them fresh; your SSH keys, it keeps them fresh.
If you're doing PKI, it'll rotate your certificates periodically. And when you kill off the Vault control tool, it does its best to try to revoke all of your leases, and revoke your token, and get rid of everything for you—to try to keep everything as secure as possible.
This predates the Vault Agent work that's been going on, and we're hoping to nudge Vault Agent in the direction that we want, so we can get rid of this, and just use Vault Agent.
It has three main modes for doing authentication. The hardcoded mode that we have—passing in a Vault token into this tool is good for developers on laptops. This is where you wind up using the dev environment for Vault. You use your token to get into dev Vault, and then you can use secrets there.
Our dev Vault environment, developers have a lot more access. They can do things that they probably should never do. Luckily for us, they don't do too much on there, other than write secrets. But they're able to read secrets that they weren't otherwise able to read.
Another thing about this is it's a Go binary. It takes care of all the heavy lifting. There's no other thing you need installed. You don't need Vault installed. You don't need jq installed. Because it's Go, we push the binary up into our Artifactory server, and we can pull it down through Brew for Macs, and pull it down through curl for other environments.
I'm going to say—when it comes to Kubernetes—we made one little mistake that we're currently running into. If you are doing service account authentication, and you're naming your roles inside Vault, one thing you're probably doing is you're like, "What do we name our role?" Well, my service is called foo, so I'm going to call the role itself, I'm going to call it foo, as well.
We did that, and if you're going to do that, I would say you should put the namespace in the role name. Don't call it foo, call it default-foo, or whatever it is. kube-system-foo. Whatever you've got going on. It'll make your life a lot better. People started to make other namespaces with similarly named services, because it's a testing namespace. Now they can't define different credentials without things becoming much messier.
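To make the naming advice concrete, here is a tiny Python sketch of deriving the Vault role name from both the namespace and the service name. The helper `vault_role_name` is hypothetical; the `default-foo` format is the one suggested above.

```python
def vault_role_name(namespace: str, service: str) -> str:
    """Build a Vault role name that is unique per (namespace, service) pair."""
    return f"{namespace}-{service}"


# The same service deployed in two namespaces now maps to two distinct roles,
# so each can be bound to its own policies and credentials.
production_role = vault_role_name("default", "foo")
testing_role = vault_role_name("testing", "foo")
```

One caveat with a plain hyphen separator is that both namespaces and service names may themselves contain hyphens, so a stricter scheme might use a separator that cannot appear in either.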
The Vault control tool reads in a YAML file, because it's YAML. From there, you can fetch AWS credentials and SSH certificates. It can also template through an existing config file, using Go templates.
If you were previously taking from an Ansible Playbook, and you had all your secrets in Ansible, you can pull them out, put in a Go template, and use the Vault control tool to sub in the secrets that you normally kept in Ansible. And the static secrets—you keep them inside of Vault instead.
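The Vault control tool uses Go templates; the sketch below swaps in Python's `string.Template` purely to show the idea of substituting secrets into an existing config file at render time. The placeholder names and the `render_config` helper are invented for illustration, and in practice the secrets dictionary would come from Vault rather than being passed in by hand.

```python
from string import Template


def render_config(template_text: str, secrets: dict) -> str:
    """Fill ${name} placeholders in a config template with secret values.

    substitute() raises KeyError when a placeholder has no matching secret,
    so a missing secret fails loudly instead of producing a broken config.
    """
    return Template(template_text).substitute(secrets)


CONFIG_TEMPLATE = "db_user = ${db_user}\ndb_password = ${db_password}\n"
```

Calling `render_config(CONFIG_TEMPLATE, {"db_user": "app", "db_password": "..."})` returns the config text with both placeholders filled in, which the tool would then write out to a file for the service to read.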
If you are the kind of tool that is Vault-aware—maybe you're using the Transit backend—maybe you are using things that are not readily available otherwise—you can pull out the token, and do Vault yourself.
One of the interesting things we've done is our Jenkins jobs all authenticate against Vault, as well. We have two Jenkins servers. We have one that's currently running on Mesos, and we have a new one that's running on Kubernetes.
Now, on Kubernetes, Jenkins uses its service account to authenticate to Vault, and the jobs themselves can bootstrap from using the token that's available, to authenticate themselves as an individual job. From there, you can have jobs that access SSH keys to go certain places for their deployment—or AWS keys to move things into S3 buckets. Whatever it is you need.
The identity system is hokey, by which I mean, as a job, I say, "Hey, you know what? I am this job, so trust me when I tell you I'm this job." There's no enforcement of identity right now, but by breaking this down, and having the individual using Vault stanzas there, we're able to leverage existing infrastructure for fetching secrets. Hopefully, in the future, we'll do better identity management, but for now, we have this in place.
The last part was leases and termination. In Kubernetes, you have a sidecar running, because who doesn't have a sidecar running? This takes care of keeping anything that needs to be refreshed, fresh. On EC2 servers, we have a cron job running, and when you shut down, we have Felix systemd—we have after, final target, one shot, exec—of tearing everything down.
Whenever you have a great tool that does everything, you're like, "Well, when don't I use this tool?" What we've told people is if you are an EC2 server, and all you need are AWS resources— and you can do IAM, and you can do this all by yourself, then don't use the Vault control tool. Go direct. Similarly, if you are a Kubernetes service, and all you need are Kubernetes secrets; go direct, as well.
We have a bit of a question mark when it comes to ACM because certificate management’s messy. ACM won't release private keys—Vault; you don't want to put your private key anywhere, so you have to choose where that's coming from. The rule of thumb we've come up with is if you can go native, go native. Otherwise, do
So, Hootsuite of today. Hootsuite of today, we have `vaultlogin` for people. Developers are using short-lived AWS credentials. They use the same credentials for the command line and the web.
We have multiple, redundant coffee machines, on multiple floors. I have never gone without coffee. We have dynamic, short-lived SSH keys, for people that are going places, and SSHing into different servers. Similarly, I didn't talk about this—because I only have so much time—we have database credentials that are automatically created for people that need to go into database servers.
For services, we did a huge sweep, and we had to get rid of SSH public keys—we had to get rid of AWS keys that were everywhere. We have a bot called chaos bot, that if a key isn't used in 90 days, we go and delete your key. We've cleaned up a lot of things, and we have eight flavors of tea.
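The 90-day rule can be sketched as a simple selection step. This is only an illustration of the rule as stated, not the bot's code: `stale_keys` is a hypothetical name, the input is assumed to be a mapping of key ID to last-used time, and treating never-used keys as stale is an assumption.

```python
from datetime import datetime, timedelta, timezone

MAX_IDLE = timedelta(days=90)


def stale_keys(last_used, now, max_idle=MAX_IDLE):
    """Return the IDs of keys not used within `max_idle`, sorted for stable output.

    `last_used` maps key ID -> datetime of last use, or None if never used.
    Never-used keys are treated as stale here, which is an assumption.
    """
    return sorted(
        key_id
        for key_id, used in last_used.items()
        if used is None or now - used > max_idle
    )
```

The deletion itself would be a separate step, so the selection can be reviewed or logged before anything is removed.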
In summary—like one of the previous talks I went to—incremental improvements are key. Slow, incremental improvements to get where you're going. Tooling is also important. Make it easier for people to do the things that they need to do.
Lastly, going back in time can be scary. So, I would be remiss not to mention that we're hiring, and if working at a place like this seems interesting to you—that can make these kinds of changes—in probably about 18 months, then talk to me.