Learn about how service mesh, identity-based access management, and secrets management can help implement zero trust without increasing development friction.
Speaker: Rob Barnes
Hi, I'm Rob Barnes. I'm a senior developer advocate here at HashiCorp. I primarily focus on the secure product line. Today I'd like to talk to you about how to apply the zero trust mindset to common application architectures. But what is zero trust?
In zero trust, we do not automatically trust anything inside or outside our network perimeters. We also verify everything that tries to connect to any of our systems, whether it's inside or outside of the perimeters. The goal of this talk is to take a common application architecture and apply this zero trust mindset to it.
Here we have our demo application, which is called Hashicups. It's built in AWS. This is a very common application architecture across the globe. We have a VPC, which contains 2 subnets, a public subnet and a private subnet.
At the moment, all the magic is happening in the private subnet. The private subnet contains a PostgreSQL database, as well as an EKS cluster with 3 nodes on it. Each of these nodes has an API service running on them.
We're going to take this architecture and start to put the building blocks of a zero trust approach onto it.
In order to verify everything, the core building block for this is going to be identity. Everything we're going to do is centered around the machine's identity or the human identity.
And we can split this up into 4 different areas.
The first is machine authentication and authorization. How do we prove a machine's identity? How do we authorize what a machine is allowed to do?
The next area is machine-to-machine access. This is about controlling which machines are allowed to speak to one another.
Then we have the human element. How do we control which humans are allowed to speak to which machines?
Finally we have the human authentication and authorization pillar.
We're going to take these pillars and add them to our common application architecture.
In general, to perform their functions, machines need to authenticate with other machines, systems, or third-party services. And this is where Vault comes in.
Let's take a look at a more practical example in our application flow.
In our demo application, we have a frontend API, which needs to talk to the public API. The public API in turn needs to speak to the product API. And it's the product API that needs to communicate with the Postgres database.
But in order to do that, it's going to need some credentials to authenticate with Postgres.
This is where Vault comes in. Vault will administer these credentials on behalf of our application. And it does that in 2 steps. First, the application needs to authenticate with Vault.
We go about that with the concept of Auth Methods. Auth Methods in Vault are a modular concept that allows us to authenticate our applications or humans with Vault itself.
In this case, we use the Kubernetes Auth Method. The Kubernetes Auth Method works under the hood using the Kubernetes identity construct, which is a service account.
It will take the service account from Kubernetes and use that to authenticate the application to Vault. Once it's authenticated, Vault uses another concept, which we call secrets engines, to generate a short-lived credential for the Postgres database.
The secrets engines are another modular construct. In this case, it's a database secrets engine. And it's responsible for going to a database, creating a database username and password, and passing it back to the requesting API.
Because these credentials are short-lived, once a credential expires, Vault will go back to the database and clean it all up.
In some cases, the secrets engines can also be used for storing static secrets, but that's not the use case we're discussing today.
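As a rough sketch of that two-step flow, the Python below builds the two HTTP requests an application would send to Vault: a login against the Kubernetes Auth Method, then a read from the database secrets engine. It assumes both are mounted at their default paths (kubernetes and database). The Vault address and the role names in the usage note are hypothetical, and nothing is actually sent over the network.

```python
import json
import urllib.request

VAULT_ADDR = "https://vault.example.com:8200"  # hypothetical address


def build_login_request(role: str, service_account_jwt: str) -> urllib.request.Request:
    """Step 1: authenticate to Vault with the pod's Kubernetes service account token."""
    body = json.dumps({"role": role, "jwt": service_account_jwt}).encode()
    return urllib.request.Request(
        f"{VAULT_ADDR}/v1/auth/kubernetes/login",
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


def parse_login_response(payload: dict) -> str:
    """Pull the Vault token out of a successful login response."""
    return payload["auth"]["client_token"]


def build_creds_request(vault_token: str, db_role: str) -> urllib.request.Request:
    """Step 2: ask the database secrets engine for a short-lived credential."""
    return urllib.request.Request(
        f"{VAULT_ADDR}/v1/database/creds/{db_role}",
        method="GET",
        headers={"X-Vault-Token": vault_token},
    )


def parse_creds_response(payload: dict) -> dict:
    """Extract the generated username, password, and lease time in seconds."""
    return {
        "username": payload["data"]["username"],
        "password": payload["data"]["password"],
        "ttl_seconds": payload["lease_duration"],
    }
```

In use, the application would send build_login_request("product-api", jwt), pass the returned token to build_creds_request(token, "product-api-db"), and connect to Postgres with the parsed username and password until the lease runs out.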
You can think of Vault as an identity broker. If we go back to our application architecture, we can start to add these building blocks to see how this works. In addition to the AWS architecture we looked at before, we're now bringing in the HCP element.
HCP is the HashiCorp Cloud Platform, and this is going to give us a managed Vault cluster. Inside our HCP portion of the application architecture, we have this concept of an HVN, which is a HashiCorp Virtual Network. It's very similar to a VPC in AWS.
We have peering set up between our HVN and our VPC. This will enable all of the infrastructure in our private subnets to speak to our Vault cluster, which is in our HVN.
The first thing we're going to need to point out in this application architecture is that, because we're using Kubernetes, we're using the Vault Sidecar Injector, which is pretty much a Vault Agent running in a sidecar next to our application. We'll see why that's important as we look at the workflow.
First, we need to authenticate to Vault. The Vault Agent is going to take care of that for us by authenticating directly to Vault.
Once it's authenticated, Vault will go and generate a short-lived credential for the PostgreSQL database. Once that database credential is generated, it's returned to Vault Agent, and Vault Agent will then write it to a file, which is accessible by the actual API that needs the credential.
That's the general workflow. Depending on what your use cases are, you can have many different Auth Methods. We have some examples here. We have OIDC, which is a very common one. We have Okta, LDAP, you name it. We support the majority of identity providers for auth methods. And the same with secrets engines.
No matter what system you're using, if we do not have a secrets engine that we support for that, as long as it has an API, because of the way that our system is modular, you can go ahead and write your own plugins.
But it supports a lot of them out of the box, as you can see from the slide that's on screen.
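To make the Vault Agent step more concrete, here is a minimal sketch of what the API side might look like once the agent has written the credential to a file. The key=value file format, the function names, and the host and database names in the usage note are all assumptions for illustration. The real file layout depends entirely on the template you give the agent.

```python
from pathlib import Path
from urllib.parse import quote


def parse_credentials(text: str) -> dict:
    """Parse key=value lines (a hypothetical template format) into a dict."""
    creds = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, _, value = line.partition("=")
        creds[key.strip()] = value.strip()
    return creds


def load_credentials(path: Path) -> dict:
    """Re-read the file on each call so rotated credentials are picked up."""
    return parse_credentials(path.read_text())


def build_postgres_url(creds: dict, host: str, database: str, port: int = 5432) -> str:
    """Build a connection URL, percent-encoding the generated username and password."""
    user = quote(creds["username"], safe="")
    password = quote(creds["password"], safe="")
    return f"postgresql://{user}:{password}@{host}:{port}/{database}"
```

The API would call load_credentials on the rendered file and pass the result to build_postgres_url(creds, "db.internal", "products"). Because the credentials are short-lived, reading the file whenever a new connection is opened is safer than caching the values at startup.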
We've talked about how to get credentials into the application. Now we're talking about the communication between 2 different machines.
What we're going to want to do is leverage some kind of identity-driven controls for the networking of that.
Let's go back to our application and look at how that works. Again, we have the same flow here. We have the frontend API, the public API, and a product API.
The first thing we want to do is control which API is allowed to communicate with the database. In this case, it's the product API.
As we pointed out before, we also need to control which API can communicate with the product API. And we only want to allow the public API, at the /coffees endpoint, to communicate with the product API.
To bring this all together, I will explain how Consul is going to help us do this. We use Consul Connect for the machine-to-machine access. We have 2 nodes here, one on the left and one on the right, and each node has some kind of service or application.
We have the web service on the left and the database on the right. Each of them has some kind of proxy running as a sidecar. We use Envoy in this example, and Envoy is responsible for communicating with other proxies.
What we're doing here is adding in the Connect element, which is going to give us a couple of things. It's going to give us mutual TLS. And that's going to authenticate the Envoy proxy, which is acting on behalf of the database.
It's also going to authenticate the proxy for the web application.
That's how we can authenticate the identity of each of these machines.
The other thing it gives us is intentions. Intentions can be thought of as something like a firewall rulebase. We can state intentions which say that the web service can in fact communicate with the database, or we could explicitly say it cannot, depending on our application architecture.
This is the role of Consul Connect, and this is how we're going to control machine-to-machine access.
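The firewall-rulebase analogy can be illustrated with a toy model. This is not how Consul implements intentions. It is a simplified sketch, assuming that a more specific rule beats a wildcard rule and that traffic with no matching rule falls back to a default. The service names come from our demo flow.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Intention:
    source: str       # service name, or "*" for any service
    destination: str  # service name, or "*" for any service
    action: str       # "allow" or "deny"


def specificity(intention: Intention) -> int:
    """Rank exact destinations above exact sources above wildcards."""
    score = 0
    if intention.destination != "*":
        score += 2
    if intention.source != "*":
        score += 1
    return score


def is_allowed(intentions, source: str, destination: str, default: str = "deny") -> bool:
    """Apply the most specific matching intention, or the default if none match."""
    matches = [
        i for i in intentions
        if i.source in (source, "*") and i.destination in (destination, "*")
    ]
    if not matches:
        return default == "allow"
    return max(matches, key=specificity).action == "allow"


RULES = [
    Intention("*", "*", "deny"),                    # deny by default
    Intention("public-api", "product-api", "allow"),
    Intention("product-api", "postgres", "allow"),
]
```

With these rules, is_allowed(RULES, "product-api", "postgres") is true, while the frontend asking for the database directly falls through to the wildcard deny.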
Now that we understand the concept, let's add this into our application architecture. We introduced HCP earlier for Vault. The other tool that it supports is Consul.
We're going to have a Consul server, which is managed by HashiCorp inside our HCP HVN. We have Consul Agents as well on every single EKS node.
The first thing that happens is the service registers itself to the Consul Agent. Then the agent reports that service back to the Consul server, which is in HCP.
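For a sense of what that registration carries, here is a small sketch of a service definition in the shape the Consul Agent's HTTP API accepts at /v1/agent/service/register. The service name and port are hypothetical, and on Kubernetes this registration is typically done for you rather than written by hand.

```python
import json


def sidecar_registration(name: str, port: int) -> dict:
    """Build a service definition that also requests a default sidecar proxy."""
    return {
        "Name": name,
        "Port": port,
        # An empty SidecarService block asks Consul to register a sidecar
        # proxy for this service using default settings.
        "Connect": {"SidecarService": {}},
    }


payload = sidecar_registration("product-api", 9090)  # hypothetical name and port
print(json.dumps(payload, indent=2))
```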
That takes care of the first part of the machine-to-machine access implementation.
But we have a second area that we need to look at in order to effectively control this. We want to stretch our service mesh security. What I mean by that is we still have components like the database which are not part of our service mesh, but we can bring that into our service mesh with a Terminating Gateway.
This is going to allow infrastructure components that are outside of our mesh to be governed by our mesh controls.
As a side note, if we need to allow traffic from things outside of our mesh network to communicate with services inside the mesh network, we can use the Ingress Gateway.
And then, in cases where we have more distributed application architectures, we can use mesh gateways to connect 2 different meshes together.
Let's walk through this in our application.
We're going back to our application architecture here. The first thing we're going to want to do is register our database as an external service on the service mesh.
The next thing we're going to want to do is link it to a Terminating Gateway. By doing this, we can control which services are allowed to access that Postgres database.
This is what lets us extend the mesh security to things that are outside of our mesh.
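As a sketch of what linking the database to a Terminating Gateway and restricting access could look like, the Python below builds two Consul configuration entries as JSON: a terminating-gateway entry and a service-intentions entry. The gateway and service names are hypothetical. Entries like these are usually written as files and applied with consul config write.

```python
import json


def terminating_gateway(gateway_name: str, external_services: list) -> dict:
    """Link external services (such as the database) to a Terminating Gateway."""
    return {
        "Kind": "terminating-gateway",
        "Name": gateway_name,
        "Services": [{"Name": service} for service in external_services],
    }


def allow_only(destination: str, allowed_sources: list) -> dict:
    """Allow the listed sources to reach the destination and deny everyone else."""
    sources = [{"Name": source, "Action": "allow"} for source in allowed_sources]
    sources.append({"Name": "*", "Action": "deny"})  # explicit catch-all deny
    return {
        "Kind": "service-intentions",
        "Name": destination,
        "Sources": sources,
    }


gateway_entry = terminating_gateway("hashicups-terminating-gateway", ["postgres"])
intention_entry = allow_only("postgres", ["product-api"])
print(json.dumps([gateway_entry, intention_entry], indent=2))
```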
Next is the human element: controlling which humans are allowed to speak to which machines. To achieve this, we can use Boundary as a solution. Boundary will manage session access to machines in a controlled and secure way.
Let's go back to our application to look at this in more practical terms. In our applications, we have the human element.
In this case, it consists of 2 teams. We have a product development team and an operations team.
Let's look at some use cases. The product team needs to access the database to load data or products. We also have the frontend API that they need to access so that they can conduct tests and those types of things to make sure that things are working the way that they should be working.
In addition to the product team, the operations team are going to need access to the core infrastructure, as well as the database. This is for break-glass scenarios. If something goes wrong, we're going to need them to be able to access the database or access the EKS nodes to be able to troubleshoot any types of issues and resolve them.
Let's go back to our application architecture and add these components.
Boundary is sitting in our public subnet. We have a couple of components that I want to talk about, the first being the Boundary controller.
The Boundary controller is the thing that you speak to when you're making API calls to Boundary. You can think of it as the brain.
You would make an API call to it. You would authenticate to it. And you would request sessions from the Boundary controller.
In terms of managing the traffic and the connectivity to the targets (we'll talk about what targets are in a moment), that is the Boundary worker. That's going to do the heavy lifting.
We can scale that as much as we need to handle our loads. Obviously, a system like this will generate some data that it needs for its own operation.
To store that, we're using another Postgres database here, which is in a public subnet.
To see how this is going to work, let's take a look at the workflow for the operations team, for example. They're going to authenticate with Boundary. And once they authenticate with Boundary, they're going to request an SSH session with an EKS node.
It will give them a session using SSH on port 22. It's the Boundary worker that's going to manage that session for us. They've done all the authentication with the controllers. Now it's over to the worker to handle all the heavy lifting for that.
They can also do the same with Postgres, for example. If they need to connect to Postgres, they can request a session which opens a PostgreSQL connection on port 5432, and it will give them direct access to the database as long as they are authorized to access that machine.
We can look at another use case, which is going to follow the same kind of workflow, but this is going to be for the product team.
They've already authenticated, and they're going to need access to the database so they can load the data. We'll start up that session for them. In addition to that, they also need to create a TCP session to the frontend API so that they can do the testing.
Boundary is going to manage all of these things for us.
Let's look at how Boundary manages these things in terms of controlling who can do what. This is an image of the domain model of how Boundary operates. At the highest level of abstraction, we have an organization. Organizations contain groups.
We have the leadership group, the operations group, and the product group. And all of these groups contain users. A group is just a collection of users.
It's not a new concept for anyone that's worked in identity and access management or any type of role-based access control system.
Now what we're doing is assigning a set of permissions to each of these groups. And these permissions, we call them "roles."
We are saying that a specific group is allowed to access a specific host catalog. A host catalog is just a collection of targets. When we think about targets, we can think about things like our EKS nodes.
If an operations engineer needs to connect to an EKS node, it's essentially a target that they are connecting to, so the request goes to the host catalog to do that.
That's the identity and access management model for working with Boundary.
If we look to the right, we have also the product team. Their access permissions are a little bit different. They also need to access PostgreSQL, but like we've mentioned before, they need to access the frontend API on port 80 as well. So they will have the permissions to do that too.
And just as a side note, we also have a leadership group, which is read-only. They don't have any direct session access to any of these things. They just need to be aware of what's going on.
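The group, role, and host catalog relationships just described can be sketched as a toy model. This is only an illustration of the access model from the slide, not Boundary's actual data structures or API. The user names are made up, and the targets and ports are the ones from our use cases.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Target:
    name: str
    port: int


@dataclass(frozen=True)
class Role:
    group: str               # the group this role is assigned to
    target_names: frozenset  # targets the group may see
    can_connect: bool = True  # False models a read-only role


# A host catalog is just a collection of targets.
HOST_CATALOG = {
    "eks-node": Target("eks-node", 22),
    "postgres": Target("postgres", 5432),
    "frontend-api": Target("frontend-api", 80),
}

# A group is just a collection of users (hypothetical names).
GROUPS = {
    "operations": {"olivia"},
    "product": {"pete"},
    "leadership": {"lena"},
}

ROLES = [
    Role("operations", frozenset({"eks-node", "postgres"})),
    Role("product", frozenset({"postgres", "frontend-api"})),
    Role("leadership", frozenset(HOST_CATALOG), can_connect=False),
]


def session_targets(user: str) -> list:
    """Return the names of the targets this user may open a session to."""
    allowed = set()
    for role in ROLES:
        if role.can_connect and user in GROUPS.get(role.group, set()):
            allowed |= role.target_names
    return sorted(allowed)
```

Here the operations engineer can open sessions to the EKS node and Postgres, the product developer to Postgres and the frontend API, and the leadership user to nothing at all.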
Now we can start to bring all of these controls together into a much bigger architecture diagram.
We started off simple and, bit by bit, we've started adding building blocks to this diagram. We can see we've added Vault, which is being managed for us by HashiCorp using HCP. And we have the Vault sidecar injector running on all of our EKS nodes.
That's how we're controlling the machine authentication and authorization. Then we've added Consul into the mix. We're using HCP to manage that for us. We don't have to take care of the backend of Consul.
We've added that into our application architecture, along with Consul Agents on each of the EKS nodes as well. And what we just talked about is Boundary. So this is living in our public subnet.
The final bit of glue is single sign-on. SSO allows us to unify our identity providers, by providing a smooth authentication workflow. As an example, Vault, Boundary, and Consul all support OpenID Connect (OIDC) as an Auth Method.
That allows us to bring about many different workflows that work for our organizations. We can take whatever identity provider we have and we can use OIDC to provide that authentication workflow.
And because OIDC is using a JSON Web Token under the hood, it's a similar case for applications. So you can provide a unified way of doing things.
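To show what a JSON Web Token looks like under the hood, here is a small sketch that decodes the claims from a token. It deliberately does not verify the signature, so it is only useful for inspection. A real system must validate the signature, issuer, audience, and expiry before trusting any claim. The function names and the sample claims are made up for illustration.

```python
import base64
import json


def b64url_encode(data: bytes) -> str:
    """Base64url-encode without padding, as JWTs do."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()


def decode_jwt_claims(token: str) -> dict:
    """Return the claims from a JWT payload WITHOUT verifying the signature."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("expected header.payload.signature")
    payload = parts[1]
    padded = payload + "=" * (-len(payload) % 4)  # restore stripped padding
    return json.loads(base64.urlsafe_b64decode(padded))


# Build an unsigned sample token purely to demonstrate the structure.
sample_header = b64url_encode(json.dumps({"alg": "none", "typ": "JWT"}).encode())
sample_claims = b64url_encode(json.dumps({"sub": "rob", "groups": ["operations"]}).encode())
sample_token = f"{sample_header}.{sample_claims}.signature-placeholder"

print(decode_jwt_claims(sample_token))
```

The same three-part structure applies whether the token identifies a human coming through an identity provider or an application presenting a service account token.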
To summarize, we've taken a simple but common application architecture and removed the assumption of trust.
By removing the assumption of trust, we've implemented controls to verify everything. We can verify the identity of a machine. We can verify the identity of a human. We can verify which machines can talk to which machines and which humans can talk to which machines.
We've done this across the board. This gets us very close to what zero trust security could look like. And it significantly improves our security posture.
I'd like to thank you very much for listening. I hope you found this useful.