HashiConf 2018 Keynote: Vault Advisor
Nov 14, 2018
HashiCorp co-founder and CTO Armon Dadgar previews a new utility for Vault currently in development called Vault Advisor.
Armon Dadgar, Founder & Co-CTO, HashiCorp
A group internally at HashiCorp that we’ve talked about a few times before is the HashiCorp Research group. Internally, it’s an industrial research lab, looking 12-plus months out ahead of our main product group. It’s the frontier of problems.
One of the problems that’s interesting for us is security configuration. I think a perfect example of this is: Imagine a perfect, bug-free firewall. How good is this firewall if it’s been misconfigured to allow all traffic? This highlights an interesting challenge with security software: there’s the challenge of implementing it correctly.
Then there’s this separate challenge of how do we make sure they’re configured correctly? They’re no better than their configuration. This problem applies not just to firewalls; it applies to all security software, including Vault.
As we talk about Vault, it necessarily has to be configured with: How do clients authenticate? What are clients authorized to be able to do? This must be configured, by definition, in advance of the client. The client can’t do anything until the system’s been configured to allow the client in. Typically, you have a security administrator or a Vault operator who’s doing this configuration. When we talk about what we have to do, as a security administrator, I have to think about: What is this application going to need or what does this set of applications need? How do I authenticate it? How do I authorize it in advance of it showing up? And so I’m going to make some best guesses. I might say I have a web app, an API, and a user service and all of them need access to the database credentials and email credentials and API keys.
I’m just going to write the policies that authenticate and authorize those servers to do that. So there’s some guesswork that’s going into this. In practice, though, it might be that we’ve guessed wrong. This is sort of inevitable, because if we guess in too restrictive a way, if we give too few capabilities, by definition, we break the app. If the app needed access to a credential and we didn’t give it access, it’s not going to work. If we gave it access to a credential that it didn’t need, it’s going to be fine. We erred on the side of giving it too much privilege and it’s over-provisioned, but that’s OK. This is what you tend to see in practice. Most of these systems are over-provisioned, and over time we never really remove permissions. We keep adding things, and so we get more and more over-provisioned. But “If it ain’t broke, don’t fix it” tends to be the ruling mentality. So we live with this.
The challenge of living with this is that it represents unnecessary risk. All of these permissions that are granted but not actually used are adding risk into our configuration. What makes it unnecessary is that the application never needed them. If our web server was compromised and it leaked API keys, well, that was a risk we never had to take. It was an unnecessary one because the app didn’t need them. Leaking the database credential was a necessary risk; the application needed that credential. There’s no way around having that risk. What we’d like to do, in the ideal world, is to never authorize it to begin with, to have it as restricted as possible, following the principle of least privilege. This would be the least privileged configuration of the system. It gets into this interesting question of: How do you configure these systems? What are the policies that would give you that? What’s the perfect policy?
What you find is there actually is no such thing. There is no perfect policy. Like everything in life, there are tradeoffs. On one side, we have our complex but low-risk policy. This might be: I write a policy unique for every single client of the system that perfectly and explicitly says what you have access to. This is a very low-risk policy, but it would be a nightmare to administer. On the other hand, I have a very simple policy. I can say everyone in my system is root. It’s extremely simple, extremely high-risk. It’s a terrible idea. What you find in real life is something in the middle. The question is: “What’s the practical policy that has an acceptable level of risk and an acceptable level of complexity?” It sits somewhere in the middle of this spectrum. So how do we find these middle policies? That is exactly what we’ve been looking at solving with a new project that we’re calling Vault Advisor.
The idea behind Vault Advisor is: How do we start by just consuming what is happening? We’ll tail Vault’s audit log and observe in the real world what clients, what users, what applications are consuming what set of credentials. We can observe real-world behavioral patterns. Then what we want to be able to do is create a diff. How do we generate a policy diff between how the system has been configured and the way it’s being used in real life? What we’re trying to do is form a closed feedback loop.
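As a rough illustration of the diff described here, the following is a minimal sketch, not how Vault Advisor itself works. It assumes audit entries arrive as JSON lines with `type`, `auth.display_name`, and `request.path` fields (modeled loosely on Vault's JSON audit entries; treat the exact field names as assumptions). The `GRANTED` table, the secret paths, and the sample log lines are hypothetical.

```python
import json
from collections import defaultdict

# Hypothetical configuration: every client was granted every secret up front.
ALL_SECRETS = {"secret/database", "secret/email", "secret/api-keys"}
GRANTED = {client: set(ALL_SECRETS) for client in ("web", "api", "user-service")}


def audit_line(client, path, entry_type="request"):
    """Build one sample audit entry (assumed shape, for illustration only)."""
    return json.dumps({
        "type": entry_type,
        "auth": {"display_name": client},
        "request": {"operation": "read", "path": path},
    })


# Hypothetical observed traffic.
AUDIT_LINES = [
    audit_line("web", "secret/database"),
    audit_line("web", "secret/email"),
    audit_line("api", "secret/database"),
    audit_line("api", "secret/api-keys"),
    audit_line("user-service", "secret/database"),
    audit_line("user-service", "secret/email"),
    audit_line("user-service", "secret/api-keys"),
    audit_line("web", "secret/database", entry_type="response"),  # skipped below
]


def observed_usage(lines):
    """Map each client to the set of secret paths it actually requested."""
    used = defaultdict(set)
    for line in lines:
        entry = json.loads(line)
        if entry.get("type") != "request":
            continue  # count each access once, on the request entry
        client = entry.get("auth", {}).get("display_name")
        path = entry.get("request", {}).get("path")
        if client and path:
            used[client].add(path)
    return used


def policy_diff(granted, used):
    """Return, per client, the paths that are granted but never used."""
    diff = {}
    for client, paths in granted.items():
        unused = paths - used.get(client, set())
        if unused:
            diff[client] = sorted(unused)
    return diff


if __name__ == "__main__":
    for client, unused in policy_diff(GRANTED, observed_usage(AUDIT_LINES)).items():
        print(f"{client}: consider revoking {unused}")
```

The output of `policy_diff` is only a recommendation: a path that looks unused over a short observation window may still be needed, which is one reason an operator would review the result rather than apply it automatically.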
We want to connect back to our operator who’s configuring the system and say, “Hey, here’s a recommended set of changes.” And this is a very explicit design decision. We want this to be a human-in-the-loop system because these systems are mission-critical. Much against the marketing hype, AI is not quite there yet. We don’t trust the systems to go off and make these changes on their own. So how do we keep someone in the loop to be able to eyeball and say, “Does this change make sense? Is this a valid configuration for Vault?” And if so, great; we can adopt it and update the configuration of the system.
The goal is, as an operator, we can come in in the morning, log in to a dashboard and say, “What’s a recommended set of changes for me to make?” What we’d like the system to be able to do is explore the policy tradeoffs and find the Goldilocks policy. If we look at this configuration of the system, there are many ways we could configure it. One way would be: We’re going to write a policy per client, one each for the web server, the API, and the user service. We’ll have 3 different policies and configure it that way. A different way would be to go vertical and say, “There’s a policy for database, a policy for email, and a separate policy for API keys.” Or you could say, “I’m going to have one policy that covers all of these things.”
So there are many ways to start slicing and dicing the same problem. But what you might want to say is, “Maybe there’s the Goldilocks of 2 policies.” There’s one that manages the database credential, one that manages SMTP and API. What this policy does is let us only introduce a slight bit of unnecessary risk. We’ll grant the API access to SMTP, even though it doesn’t need it, but we only need to manage 2 policies. So we can find that middle policy that makes the tradeoff. What makes this problem hard, and what’s not obvious in this trivial example, is that there’s an explosion in the number of possible ways of solving this. This problem is NP-hard. In this trivial example, yes, you could almost visualize all the possibilities, but if you scale this up even to a matrix as small as 10 by 10 (10 entities and 10 secrets), all of a sudden the number of ways you could solve that problem becomes far too large to enumerate by hand.
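To make the tradeoff concrete, here is a toy brute-force sketch. It is an illustrative model only, not the algorithm Vault Advisor uses. Assumptions: each policy is a group of secrets, policies do not overlap, a client is attached to every policy containing at least one secret it uses, and "risk" is simply the count of secrets a client is granted but never uses. The `USAGE` table is hypothetical.

```python
# Hypothetical observed usage: which secrets each client actually reads.
SECRETS = ["database", "email", "api-keys"]
USAGE = {
    "web": {"database"},
    "api": {"database", "api-keys"},
    "user-service": {"database", "email", "api-keys"},
}


def partitions(items):
    """Yield every way to split items into non-empty, non-overlapping groups."""
    items = list(items)
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for smaller in partitions(rest):
        # Put `first` into each existing group in turn...
        for i, group in enumerate(smaller):
            yield smaller[:i] + [[first] + group] + smaller[i + 1:]
        # ...or give it a group of its own.
        yield [[first]] + smaller


def unnecessary_grants(policies, usage):
    """Count (client, secret) grants that the client never uses."""
    extra = 0
    for needs in usage.values():
        for group in policies:
            members = set(group)
            if members & needs:  # client must be attached to this policy
                extra += len(members - needs)  # secrets it gets but never uses
    return extra


def tradeoff_frontier(secrets, usage):
    """For each policy count, keep the grouping with the fewest unnecessary grants."""
    best = {}
    for policies in partitions(secrets):
        risk = unnecessary_grants(policies, usage)
        count = len(policies)
        if count not in best or risk < best[count][0]:
            best[count] = (risk, policies)
    return best


if __name__ == "__main__":
    for count, (risk, policies) in sorted(tradeoff_frontier(SECRETS, USAGE).items()):
        print(f"{count} policies -> {risk} unnecessary grants: {policies}")
```

Under these assumptions, one policy is simplest but carries the most unnecessary grants, three policies carry none, and two policies sit in between. Three secrets can be grouped in only 5 ways, but 10 secrets can be grouped in 115,975 ways, and allowing policies to overlap or differ per client makes the space larger still, so exhaustive search like this does not scale.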
There’s a meaty problem: How do you decompose this challenge and present these Goldilocks solutions? We don’t have time to go into the details of it. There’s going to be a great session tomorrow by our research team doing a deep dive into Vault Advisor and how it works. I highly recommend it for people who are interested. Go check that out. The goal of the project, though, is: How do we get this to general availability? Today, it’s still very much a research project. We’re actively recruiting beta-testers, so if you’re interested, please find Jon Curry. We’re going to publish this as a white paper, but beyond that, what we realize is that this is not a uniquely Vault problem. Any security software is only as good as its configuration. So how do we look at broadening this beyond just Vault and asking: What’s the discrepancy between how security software is configured and how it’s used?
See Jon Curry's talk on Vault Advisor: Vault Advisor: Preventing Security Incidents By Automating Policy Optimization