From Mess to Mesh: Fine-Tune Access Control, Visualize Traffic Flows, and Troubleshoot Connectivity
Jan 05, 2021
HashiCorp Consul 1.9 has many new features to connect and secure your service mesh. This talk walks through some of these features, including Layer 7 intentions.
- Hannah Hearth, Product Design Manager, HashiCorp
- Blake Covarrubias, Senior Product Manager, Consul, HashiCorp
Hannah Hearth: Hi there. My name is Hannah Hearth, and I'm responsible for the user experience here on the Consul team at HashiCorp. I've been here for a few years. You may have met me at previous HashiConfs. That pillow back there is proof that I was there in person and snagged a pillow on my way out.
I'm excited to be here today. By here — I mean taking up the pixels on your computer screen. I'm also excited to be one of the most pregnant people to talk at HashiConf. I've been working on this baby almost as long as we've been working on the features for Consul 1.9. Those new service mesh features are exactly what Blake and I are going to dive into today.
To kick things off, I'm going to briefly review how to set up a service mesh in Consul by registering proxies, configuring upstreams, and setting up access between services. Then I'll move into the new stuff where I'll show you our brand-new service mesh topological diagram that will help you visualize your service-to-service connections. Building on that visualization, we'll learn how to add a few key metrics like request rates, error rates, and latency to your UI diagrams. Finally, I'll pass it off to Blake, who will show us how to narrow down the scope of our service-to-service intentions, using our new application-aware intentions feature for Layer 7 information — like headers and methods.
Configure a Service Mesh
Let's get started. You'll set up your service mesh in Consul by first enabling Connect on each of the Consul servers. Throughout these explanations, I'm going to use a car metaphor. This step is a bit like putting gas into your car. It's a prerequisite for getting started.
Then in your service definitions, you'll add a Connect stanza which should give Consul an indication that you want these services in the mesh, and you want them to use sidecar proxies.
Naturally, after that, you'll start up those proxies. Now that we have the mesh set up, it's time to point your car in the right direction. To configure which services should connect with other services, you'll add a list of upstreams to the service definition files, indicating the destinations that each service can connect to.
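The setup steps so far can be sketched as configuration. This is a minimal sketch that builds the JSON form of the agent and service configuration in Python; the service names come from the demo later in the talk, while the port numbers are illustrative assumptions and not values from the presentation.

```python
import json

# Server agent config: Connect must be enabled on each Consul server.
server_config = {"connect": {"enabled": True}}


def service_definition(name, port, upstreams):
    """Build a service definition that asks Consul for a sidecar proxy.

    `upstreams` maps a destination service name to the local port the
    sidecar listens on (port numbers here are assumptions).
    """
    return {
        "service": {
            "name": name,
            "port": port,
            "connect": {
                "sidecar_service": {
                    "proxy": {
                        "upstreams": [
                            {"destination_name": dest, "local_bind_port": local_port}
                            for dest, local_port in upstreams.items()
                        ]
                    }
                }
            },
        }
    }


# web -> app -> api, matching the chain used in the demo.
web = service_definition("web", 8080, {"app": 9191})
app = service_definition("app", 9090, {"api": 9192})

print(json.dumps(server_config, indent=2))
print(json.dumps(web, indent=2))
```

Each printed document could be saved as a JSON file in the agent's configuration directory. Listing an upstream only points the proxy at a destination; it does not grant access.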
You've pointed your car in the right direction. But because a Consul service mesh should be secured as deny-by-default in production, that car will come to a roadblock. You'll need to add explicit intentions that allow those services to actually talk to their upstreams. When you do that, your car will have permission to move past the checkpoint.
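To make the roadblock concrete, here is a toy model of a deny-by-default check. It is only an illustration of the idea, not how Consul implements intentions: the `is_allowed` function and the dictionary layout are hypothetical, and wildcard sources and precedence rules are left out.

```python
def is_allowed(source, destination, intentions, default_allow=False):
    """Return True if `source` may connect to `destination`.

    `intentions` maps (source, destination) pairs to "allow" or "deny".
    When no intention matches, the default policy decides; a production
    mesh would typically leave `default_allow` as False.
    """
    action = intentions.get((source, destination))
    if action is None:
        return default_allow
    return action == "allow"


# Only web -> app has an explicit intention.
intentions = {("web", "app"): "allow"}

print(is_allowed("web", "app", intentions))  # True
print(is_allowed("app", "api", intentions))  # False: no intention, default deny
```

With this model, an upstream that was configured but never given an intention is exactly the "roadblock" case: the lookup finds nothing and the default policy denies the connection.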
Now you can take a big exhale because you've completed a bunch of steps for a number of different services in your new service mesh, but you have no clue whether you've set it up correctly or not.
You could check your connections one by one, double-check your intentions, try a few test requests. But at this point, I don't know about you, but when I try to picture my service mesh and the web of connections that I've created, it's a little bit like when somebody gives you directions by saying turn right at the big tree, turn left when you see the cows — it would be so much easier if you had a physical visual map. In Consul 1.9, we're giving you that map to help you verify and troubleshoot your service mesh connections.
This is where I'm going to launch into the demo. Here we are on the Consul web UI, where we have a list of services. In this example, I'd like to create a service mesh where web can talk to app, which can talk to API.
You can see those three services in the mesh. We have web, we have app, and we have API, but I have no idea how those three are connected. Let's go ahead and click into app. On the app page, we're on the topology view, which is our new screen. In the center of the diagram, we see the app service, and on the left-hand side, we see web with a line going to app. That means we set up our upstreams, and we've set up our intentions from web to app correctly.
But from app to API, I'm seeing a red arrow with a red X on it. I'm going to click that. The pop-over says connection denied — add an intention that allows these two services to connect.
It looks like we've set up our upstreams from app to API correctly, but we forgot to set up our intentions. The easy way to fix that from this diagram is to simply click the add button. Once we do that, we've fixed the intentions problem. The next bit of troubleshooting that I might take is to actually click into the app to investigate those 100% failing instances.
I click on API. I go to the topology tab, hit the instances tab, and then click into one of the instances where I can investigate a failing health check. But it would be really nice if — back on this app page where I started, before having to click multiple times — I could see some metrics from this view so I don't have to make those clicks.
Basic Service Mesh Metrics
To view those key metrics in Consul, there are a few short steps. First, we'll go into the UI config stanza on agents that serve up the Consul UI and add a series of properties for our metrics provider.
We can integrate with various metrics providers, but our built-in option right now is Prometheus. By adding just a few fields, we'll be able to see all of these metrics on the topology diagram. The goal of this feature is to give you enough information to inform your next troubleshooting step, but it's not to replace your current observability solutions.
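As a rough sketch of those few fields, the following builds the JSON form of a `ui_config` block for an agent that serves the UI. The Prometheus address is a placeholder assumption; substitute the address of your own Prometheus server.

```python
import json

# Agent configuration for an agent that serves the Consul UI.
ui_agent_config = {
    "ui_config": {
        "enabled": True,
        # Built-in provider mentioned in the talk.
        "metrics_provider": "prometheus",
        # The agent proxies metrics queries from the UI to this address
        # (hypothetical hostname, for illustration only).
        "metrics_proxy": {
            "base_url": "http://prometheus-server.example.internal:9090",
        },
    }
}

print(json.dumps(ui_agent_config, indent=2))
```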
Our second metrics feature — as part of this diagram — is helping you get to the dashboards that you already use more quickly. Here on each service detail page, whether or not you've configured the metrics provider in the previous step, you can add a link to your metrics dashboard for each service. To do so, you'll add a dashboard URL template link in the agent config.
In this Grafana example, you can pass in variables that allow you to configure links to each service's metrics dashboard in one location, with one line of code. That will populate this link section on each service detail page.
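The template idea can be sketched as follows. The Grafana URL is a made-up example, and the `render` helper is a hypothetical stand-in that only illustrates how the placeholders get filled in per service; the actual substitution is done by the Consul UI.

```python
import json
import re
from urllib.parse import quote

# One template, defined once in the agent config (hypothetical Grafana URL).
TEMPLATE = (
    "https://grafana.example.com/d/service-overview"
    "?var-service={{Service.Name}}&var-dc={{Datacenter}}"
)

agent_config = {"ui_config": {"dashboard_url_templates": {"service": TEMPLATE}}}


def render(template, values):
    """Replace each {{Placeholder}} with its URL-encoded value."""
    def substitute(match):
        return quote(str(values[match.group(1)]), safe="")
    return re.sub(r"\{\{\s*([\w.]+)\s*\}\}", substitute, template)


print(json.dumps(agent_config, indent=2))
print(render(TEMPLATE, {"Service.Name": "app", "Datacenter": "dc1"}))
# https://grafana.example.com/d/service-overview?var-service=app&var-dc=dc1
```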
The last thing we'd like to do is make intentions application layer aware. For example, with our old intentions model, we were able to say that web can talk to app. But what if we only want to allow certain requests with specific headers or methods? For this, I'm going to hand it over to Blake to let him dive into application-aware intentions.
Blake Covarrubias: Hi, my name is Blake Covarrubias, and I'm a senior product manager on the Consul team focused on Consul service mesh. Today, I'm going to talk to you about new application-aware intentions in Consul 1.9. But before I dive into that new feature, I'd like to take a moment to provide some background on intentions and how they differ from traditional application security models.
Traditional Application Security
In a traditional on-prem data center environment, you have a well-defined perimeter. Typically, you have IP-based firewalls deployed on the edge, which provide security for incoming application traffic. Then within the data center, east-west application traffic is normally assumed to be trusted.
The very dynamic and elastic nature of modern environments then results in security rule sprawl in your access control lists, which increases the risk of misconfiguration if they're being manually maintained.
Consul uses a different model known as service segmentation, where services authenticate with each other using identity encoded inside TLS certificates. When services establish connections to other services, mutual TLS secures the connectivity between those applications. Then those connections are authorized based on policies configured by an operator, called intentions.
An Example of an Intention
We're allowing a service named Web to talk to an application named DB. This is one rule that would apply to any number of application endpoints that may exist.
Here's how this works under the hood: when the web service wants to reach out and talk to DB, it sends that request through its local proxy. That proxy then does a lookup for the destination proxy address and then connects to the database proxy. They initiate a TLS session between them and mutually authenticate that their certificates are signed by a trusted certificate authority. Once that session is established, the database proxy sends an authorization request to Consul to verify whether the connection should be allowed. This then relies on these configured intentions to allow or deny that traffic.
There are a couple of challenges with this model. The first is that there's an expensive per-connection authorization call back to the Consul agent. The second is that because intentions are all based around simply service identity and operate at Layer 4, there's no way to express more complex application-level policies.
This typically results in some mix of policy that is configured and deployed in the service mesh and some application control policy that's implemented directly in your application. You have inconsistent policy enforcement in your environment.
Intentions in Consul 1.9
In Consul 1.9, we're changing the intention model to solve these challenges. The first change is we're pushing intentions into the data plane proxy so that they're enforced in Envoy, removing that per-connection callback to Consul.
Second is that intentions are now going to be managed as configuration entries, which is the same configuration you would use for configuring Layer 7 parameters in Consul service mesh. We'll touch on why that's important in a minute.
Thirdly, intentions are becoming application-aware so that you can utilize things like HTTP request information, evaluate that criteria, and use that to determine whether or not a connection should be allowed. This one permits a service named Web to talk to a service named API. This is operating at Layer 4.
Here's an example now of a service intention that's application-aware, operating off of HTTP, allowing the service Web to talk to API — only if it's requesting a path that starts with slash. To dive into this feature a little more, I'm now going to jump into a live demo.
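The two intentions described above can be sketched as `service-intentions` configuration entries. The slides themselves are not part of this transcript, so treat this as a reconstruction under that assumption; the entries are built here as Python dictionaries and printed as JSON.

```python
import json

# Layer 4: web may talk to api, based on service identity only.
l4_intention = {
    "Kind": "service-intentions",
    "Name": "api",  # destination service
    "Sources": [{"Name": "web", "Action": "allow"}],
}

# Layer 7: web may talk to api only for HTTP paths that start with "/".
l7_intention = {
    "Kind": "service-intentions",
    "Name": "api",
    "Sources": [
        {
            "Name": "web",
            "Permissions": [
                {"Action": "allow", "HTTP": {"PathPrefix": "/"}},
            ],
        }
    ],
}

for entry in (l4_intention, l7_intention):
    print(json.dumps(entry, indent=2))
```

Either document could be saved to a file and applied with `consul config write`. Note that a source uses either a top-level `Action` or a `Permissions` list, and the Layer 7 form assumes the destination service's protocol is configured as HTTP.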
I'm going to open a web browser and navigate to web.ingress.consul, which is the address of my application. We can see this application is responding. It's healthy and alive. Now, if I switch over to the command line, I'm going to do an HTTP GET, and we see the same information that we saw from the web UI: greetings HashiConf.
In addition to that root endpoint, this application exposes a number of other endpoints. Let's take a look at those. We can see a whole list of endpoints that it provides here. Let's dive in and look at that env endpoint. I'll do an HTTP GET for web.ingress.consul/env, and we get the output back.
This is a list of environment variables that are being injected by Kubernetes into the pod for that application. You see, there's a lot of information here, potentially sensitive information, which we don't want to be exposed to just anyone.
In addition to this env endpoint, the application also exposes a cache endpoint for saving data into a Redis key-value store. I'm going to create a cache entry called status with the value of "Attending HashiConf." We can see that that was accepted.
But what if we want to secure these endpoints and only make them available to certain users in the environment? We can do that by defining a service intention. This takes a second to apply. We can see that is denied — as configured by our policy. Similarly, if we now reissue the request to update the status, that will also be denied. Great, our intentions are working.
Now, what if we want to allow some users in the environment access to update the cache? We can do that by modifying our service intention policy to permit that access. I'm going to go back into the policy file and add a new rule, which says that users are allowed to access the cache endpoint and perform DELETE, PUT, or POST, but only if the header named authorization is present in the request.
I'll write that configuration and then save that to Consul. Now I'm going to reissue that request, this time adding the authorization header. Then we'll change the operation from a POST to a DELETE to remove that status key.
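The policy file from the demo is not shown in this transcript, so the following is a hypothetical reconstruction: the `/cache` and `/env` paths and the rule ordering are assumptions. The small evaluator illustrates first-match-wins evaluation of a permissions list; it is a simplified model and not Envoy's or Consul's actual matching code.

```python
def header_present(headers, name):
    """Header names are compared case-insensitively."""
    return name.lower() in {key.lower() for key in headers}


def matches(http, method, path, headers):
    """Check one permission's HTTP criteria; every given criterion must hold."""
    if "PathExact" in http and path != http["PathExact"]:
        return False
    if "PathPrefix" in http and not path.startswith(http["PathPrefix"]):
        return False
    if "Methods" in http and method.upper() not in http["Methods"]:
        return False
    for rule in http.get("Header", []):
        if rule.get("Present") and not header_present(headers, rule["Name"]):
            return False
    return True


def evaluate(permissions, method, path, headers=None, default="deny"):
    """Return the action of the first matching permission, else the default."""
    for permission in permissions:
        if matches(permission["HTTP"], method, path, headers or {}):
            return permission["Action"]
    return default


WRITE_METHODS = ["DELETE", "PUT", "POST"]
permissions = [
    # New rule: cache writes are allowed when an authorization header is present.
    {"Action": "allow", "HTTP": {"PathPrefix": "/cache", "Methods": WRITE_METHODS,
                                 "Header": [{"Name": "Authorization", "Present": True}]}},
    # Earlier rules: block other cache writes and the env endpoint.
    {"Action": "deny", "HTTP": {"PathPrefix": "/cache", "Methods": WRITE_METHODS}},
    {"Action": "deny", "HTTP": {"PathPrefix": "/env"}},
    # Everything else is reachable.
    {"Action": "allow", "HTTP": {"PathPrefix": "/"}},
]

print(evaluate(permissions, "GET", "/env"))                                    # deny
print(evaluate(permissions, "POST", "/cache/status"))                          # deny
print(evaluate(permissions, "DELETE", "/cache/status", {"Authorization": "x"}))  # allow
print(evaluate(permissions, "GET", "/cache/status"))                           # allow
```

Ordering matters in this model: the more specific allow rule has to come before the broader deny rule for cache writes, because evaluation stops at the first match.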
We can see that is successful. If we go back and query the cache for that status key, we can see and confirm that it's gone from the key-value store. That's it for the demo.
To summarize, the benefits of application-aware intentions are that they allow you to have a single workflow for defining policy across both TCP and HTTP applications within your service mesh.
They also allow for potentially higher request rates within the service mesh by removing the authorization callback to Consul, since intentions are now enforced inside the data plane proxy directly.
They allow for more granular access control, letting operators develop a least-privilege security model where access to endpoints can be controlled on a very granular basis.
That concludes our session. Thank you, Hannah, for your presentation. If you'd like to stick around and ask any questions, we will have a Q&A following this presentation. Otherwise, enjoy the rest of your time at HashiConf.