Up and Running with HCP Consul on AWS, Service Mesh, and Terraform
Nov 23, 2020
See how you can build consistent networking between services running in AWS with Consul Service Mesh for the HashiCorp Cloud Platform with the help of Terraform Cloud.
- Cody DeArkland, Technical Product Marketing Manager, Consul, HashiCorp
You've seen this slide through a couple of presentations we've done so far throughout HashiConf. It's something that we build a backbone of messaging around when we talk about Consul. This idea that customers are looking to have their applications live somewhat as first-class citizens on their networks.
We're used to this model where people are hyper-focused on individual addresses — the .11, the .12, and the .13. But ultimately, what matters is web01, web02, and web03 — the servers those addresses represent. Taking it up another level, what matters is the web service, and the three nodes behind it are an implementation detail.
We want to get to a place where the network understands — in a very clear way — how applications behave, and supports that behavior. We do that across three towers that we work through: progressive delivery, consistent security, and connectivity everywhere.
Progressive delivery is this idea of gradually rolling out services. How do we address application lifecycle, but from an application networking perspective? How do we roll out new services? How do we do something less dangerous than switching web01's DNS entry to a whole new system and hoping that nothing goes wrong? How do we gradually control the flow of traffic that hits these services? How do we set up intelligent failover between environments? This concept was largely coined by James Governor at RedMonk, and we've championed it in how we approach application networking.
We also want to chase consistent security. Consul is very strong at taking very different connectivity environments and creating a consistency layer between them. How do we have that consistency layer be a place where we can apply policy securely — or consistently — between all of those systems?
When we talk about that consistent security, that's things like the intention model. Blake and Hannah talked about Layer 7 intentions. How do we take those models and apply them consistently to workloads that live in EC2 and systems that live inside an EKS cluster or Docker workloads? Consistency of security is important to us. We do that through things like our automatic mTLS encryption, things like the intention model.
The big one for me that I'm passionate about — coming from the enterprise space — is connectivity anywhere. Ultimately, we want the ability to have these systems connect globally in a mesh. We see this pattern that develops in the service mesh space, where these individual islands of service mesh exist. You get a Kube cluster, you throw a mesh on it, and you do your job. But ultimately, a service mesh should be composed of all of the systems that support your workloads. You should be able to have that service mesh traverse all of these different environments and have that progressive delivery — that consistent security — applied globally. That's why we put so much attention into how we connect these environments, how we federate environments together, and how we enable you to take workloads and place them on whatever platform you need them to operate on.
In this talk, we're going to be talking specifically about HCP Consul. HCP Consul was announced at our last HashiConf — out at HashiConf EU. Ultimately this is a managed cloud platform where we run Consul natively inside the cloud — and have you consume it as a service. Instead of managing the underlying infrastructure, the upgrades, and how the systems are built, you get to consume Consul like you would consume any other cloud service.
We see that being the model that people most commonly want when it comes to the service mesh space, service discovery, and health space. This should be a dial tone that you're able to call into and take advantage of these capabilities that live inside the platform.
We talk a lot about the phrase service mesh. But I like to think of service mesh as being about the outcomes. That dial tone you're getting is things like progressive delivery, which decomposes down into things like traffic splitting, resolvers, being able to failover between environments, and ingress gateways. All of these concepts live within that progressive delivery tower.
I tend to talk a lot about outcomes when I talk about service mesh because, ultimately, that's what people care about. It's not the buzzword of service mesh. It's how do I make app A talk to app B in a very clean way? We do that through multi-runtime and HCP in a number of ways. It's an important thing for us.
It's running on dedicated infrastructure. When you get an HCP cluster, you're not sharing it. It's not a case of me and my team each provisioning our own Consul cluster on shared infrastructure — you're getting dedicated infrastructure to support this Consul cluster. It's not a colocated environment. You can still carve that space up inside your team using namespaces and ACLs, but it ultimately lives on dedicated hardware.
I've hinted at this quite a few times so far; the idea of consistent application networking. We want to give you a consistent plane to apply these application networking concepts between workloads inside of that domain. You've heard throughout HashiConf so far, the public beta is available now. We've seen great pickup already during HashiConf, with people going out and spinning up HCP Consul. Go take it for a spin. It's great. You're going to see a cool demo where I go through setting it up and consuming it in a few minutes.
Why HCP Consul?
I touched on it a bit in the previous slide, so we'll breeze through these slides smoothly.
Consul as a Service, Managed by HashiCorp
Ultimately I like to think of HCP Consul like I think of any other cloud-managed service. For example, when I pull down an EC2 instance, I'm getting a resource. I'm not worried about the underlying system that this EC2 instance is running on. I'm worried about the service that I'm getting. I think of Consul in the same way. I'm able to consume this as a service against the cloud that I'm working in. I think that's a very powerful concept when you have something like Consul that shines very well in a consumption-driven environment.
Deployed Secure by Default
From an enterprise perspective, this thing is secure by default. It comes set up with mTLS encryption — a baseline for service mesh — as well as gossip encryption and ACLs in place.
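To make that concrete, here's a minimal sketch of what those defaults look like in a Consul client's configuration file. Every value below is a placeholder for illustration — not a real key, address, or token:

```hcl
# Hypothetical Consul client config showing HCP's secure-by-default posture.
datacenter = "dc1"
retry_join = ["your-hcp-cluster-address.example"]  # placeholder

# Gossip traffic between agents is encrypted with a shared symmetric key.
encrypt = "PLACEHOLDER-BASE64-GOSSIP-KEY"

# RPC traffic is wrapped in TLS, verified against the cluster CA.
ca_file         = "ca.pem"
verify_outgoing = true

# ACLs are on, with a default-deny policy.
acl {
  enabled        = true
  default_policy = "deny"
  tokens {
    agent = "PLACEHOLDER-AGENT-TOKEN"
  }
}
```

With HCP, you don't author this by hand — the downloadable client configuration ships with these settings already in place.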
It's designed to be bulletproof from an enterprise environment for day one. There's not a lot of work that you have to do to go in and consume it in that way. It's that way out-of-the-box. When you bring systems into that, you're getting that same secure-by-default posture.
Consistent Policies Across Workload Types
Again, I've harped on this one. I'm going to harp on it some more; that consistent policy between workload types. Inside a public cloud, it's all still networking. Packets are still going from place to place. But ultimately, when you look at the details — the networking plane on an EKS cluster, the networking plane on a Docker host, the networking plane behind EC2 — there are little nuances that are treated a bit differently between them.
When you have Consul bolted on in front of them, you get that consistent networking plane. You can apply those policies the same way. You get that across things like service discovery as well. Being able to use just the raw DNS resolution capabilities inside of Consul; the health checking inside of Consul. Or if you up-level all the way into service mesh and network infrastructure automation.
Why Terraform Cloud?
The other half of this talk is about Terraform and Terraform Cloud. At first glance, it might seem like an odd pairing, but I promise there's a method to my madness. When we come in with this managed offering of Consul, we need a way to manage it. We could expose the UI if we wanted to, but that's not always the safest and most secure way to do things. We can make it better with ACLs, but we're still hopping into a UI. I think there's a better way, and we're going to show it in this demo as we use Terraform to manage that HCP Consul environment.
But why use Terraform Cloud in this case? Why not run this locally on my workstation, pass Terraform manifests around, and call it a day? It boils down to some of the same reasons why we use Consul in a managed way. This is a fully managed environment — you're getting that infrastructure as code managed for you.
Self Service Infrastructure
You're not worried about making sure you have a specific Terraform binary locally. You're not having to worry about configuring state locally and how you make sure that you and your teammates don't stomp all over each other's state. This is managed as a service. It's something available for you all the time. Because of that, you can start to use this in a self-service way.
This leads into the third point around GitOps. This idea that my Terraform Cloud instance is tied to that infrastructure as code definition of my environment. When I go into Git and open a PR to add an intention — or a PR to add another entry to my ingress gateway — I can use Terraform Cloud's governance capabilities as well as those of GitHub, GitLab, or whatever VCS provider you've tied in. I can use those providers directly with that infrastructure as code definition and apply it as I go.
When that PR is merged, that information becomes part of the infrastructure as code definition of my running application state. This ends up living right alongside your infrastructure code and your application code.
We've gotten used to this idea of Terraform defining out our infrastructure, and we include that alongside the applications that run it. We can now use Consul to define out that networking as code and have that live alongside it. We’ve hit two bullets there between the self-service infrastructure as well as the GitOps integration.
I think that when you pair these things together, you get a very clean experience for how you can manage Consul at scale between multiple teams. With things like the new Terraform Cloud Agents that you can run on private networks, we don't even have to expose the Consul UI anymore. We can keep things more secure by shutting down that public access and managing the entire thing end-to-end within Terraform Cloud.
With that being said, that's the slideware part of the presentation — that's the click-click part of the presentation. We're going to jump into a demo and have some fun showing you how this can be consumed.
Consul HCP and Terraform Cloud Demo
Let's start by taking a look at a few of the configurations of the HashiCorp Cloud Platform. I'll log in with the login that's associated with my GitHub account. When we get in, we can see the landing page, where we can hit some HashiCorp Learn topics around Consul as well as deploy a new Consul cluster if we need to. That lets us choose things like internal or external IP.
The HashiCorp Virtual Network
We'll take a look at the HashiCorp virtual network, which is our consistent control plane between HCP and the workloads that we deploy. I've got two peerings set up: one with my default VPC and one with the VPC that has my Elastic Kubernetes Service (EKS) cluster in it. That's the pattern we typically see — we definitely expect customers to peer many VPCs into HCP, which is why we have this peering construct. It makes it very easy to bring additional VPCs into the mix.
Inside Our Consul Environment
If we look at our Consul deployment, we can see consul-cluster-1. Then we see a few details about our existing deployment. I can pull down the client configuration as well as generate a new token if I needed to. I can see here that it's deployed onto that HVN. If I click this link, we're taken into that Consul environment.
We can see I have a couple of services in here already, and that's because I've already configured my EKS cluster within this environment. These services represent the ingress gateway — as well as the terminating gateway — that live inside of that EKS cluster. I've got no intentions configured, so this environment's empty.
To get started, we're going to need to bring some workloads into it. Let's switch over to the command line. We'll apply a couple of configuration files for our application. We brought in our frontend, and we've also brought in our API. If we switch back into the UI, we can see these starting to come up.
Our services are configured to talk to each other over the service mesh, which is why we see them connected with a proxy. However, we haven't exposed a load balancer for the frontend to communicate through, because we want to use our ingress gateway.
To use that, we need to push a configuration for our ingress gateway as well as set up intentions for the communication between each of these services. We also want to be able to set up a database that lives inside of Amazon RDS to talk with our terminating gateway — so that we can expose services that are not a part of the service mesh to service mesh functionality. Finally, we're going to want to release a version 2 of our API and a version 2 of our frontend. We want to push configuration changes for that too.
Using Terraform Cloud to Apply Configuration Changes as Code
We're going to do this in Terraform Cloud because Terraform Cloud allows us to establish a consistent way of applying these configuration changes as code. We're going to integrate Terraform Cloud with our Git repository so that we can pull these changes at the time that they're committed from GitHub.
I'm going to grab the repository that I want to use, and I'll create a workspace around it. I'm going to push a few variables in for the communication with our actual environment. For example, I'm going to push in an API token, which is used to interface with Consul's ACL system. I'm going to push in the Consul datacenter — which is dc1. And finally, I'm going to push in the cluster address, which is the address for this cluster inside HCP.
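A sketch of how those workspace variables might be consumed on the Terraform side — the variable names and provider wiring here are assumptions, not the exact ones from the demo:

```hcl
# Variables supplied through the Terraform Cloud workspace.
variable "consul_token" {
  description = "ACL token used to interface with Consul's ACL system"
  type        = string
}

variable "consul_datacenter" {
  description = "Target Consul datacenter"
  type        = string
  default     = "dc1"
}

variable "consul_address" {
  description = "Address of the HCP Consul cluster"
  type        = string
}

# The Consul provider authenticates every run against the HCP cluster.
provider "consul" {
  address    = var.consul_address
  datacenter = var.consul_datacenter
  token      = var.consul_token
}
```

Marking the token as a sensitive variable in the Terraform Cloud workspace keeps it out of logs and the UI.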
With these variables in place, I'm ready to run my first workflows. I switch over to runs. We can select queue plan, which will do an initial test run of our workflows. We can see it initialize, and there are no changes in place — so there's nothing it's going to do. Let's push a few of our configurations.
The Terraform Directory
Clear the screen, and we'll switch into the Terraform directory. We'll edit our main.tf file, and we're going to add some of our configuration changes for Consul. We've pushed in two intentions: one to allow our ingress gateway to talk to our frontend, and one to allow our frontend to talk to our API. We've also set the service default for our frontend service to HTTP and set our ingress gateway to route to our frontend. We'll save these changes, commit them to our repo, and push them up — "new intentions and ingress gateway." If we quickly switch back into Terraform Cloud and click on our runs, we can see this starting to run.
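With the Consul Terraform provider, those four resources might look roughly like this — the service and gateway names and the listener port are assumptions based on the demo, not the exact configuration:

```hcl
# Allow traffic from the ingress gateway to the frontend.
resource "consul_intention" "ingress_to_frontend" {
  source_name      = "ingress-gateway"
  destination_name = "frontend"
  action           = "allow"
}

# Allow traffic from the frontend to the API.
resource "consul_intention" "frontend_to_api" {
  source_name      = "frontend"
  destination_name = "api"
  action           = "allow"
}

# Mark the frontend as an HTTP service so Layer 7 features apply.
resource "consul_config_entry" "frontend_defaults" {
  kind = "service-defaults"
  name = "frontend"

  config_json = jsonencode({
    Protocol = "http"
  })
}

# Expose the frontend through the ingress gateway.
resource "consul_config_entry" "ingress" {
  kind = "ingress-gateway"
  name = "ingress-gateway"

  config_json = jsonencode({
    Listeners = [{
      Port     = 8080 # hypothetical listener port
      Protocol = "http"
      Services = [{ Name = "frontend" }]
    }]
  })
}
```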
We could have had this be a PR in GitHub that needed to be approved before it could run. These are some of the ways we can add governance in. In this case, we're sticking with the basic confirmation, which is required to apply this.
I'll click on confirm and apply and confirm our plan. This will tell Terraform to execute against that environment. We can see the real-time execution go — and four resources were added. If I switch in and go to intentions, we can see the intentions that were added. If I go in and look at our ingress gateway, we can see that it's referencing the frontend successfully.
Adding The Ingress Gateway Address
We'll switch back to the command line and grab the address for our ingress gateway. There's our ingress gateway address. If we drop our address in, we can see we're hitting our web frontend. Our connectivity to our API tier is successful, but we're not reaching our database.
We're then going to use our terminating gateway to connect to it and set up the intentions to allow communication. Let's jump back into the CLI, edit our main.tf file, and add in our configurations.
Adding Our Configurations
First, we'll add two blocks of code. One is a database service — a Consul service resource — named database. The second is a node resource mapped to the address of the RDS database. They're set as external nodes for the environment.
We're also going to drop in a configuration for our intentions to allow our API to talk to our database. We're also going to drop in a terminating gateway configuration that allows our database service to be reachable.
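A hedged sketch of what those blocks could look like with the Consul provider — the node name, database address, and port are all hypothetical:

```hcl
# Register the RDS database as an external node outside the mesh.
resource "consul_node" "database" {
  name    = "rds-database"                # hypothetical node name
  address = "db.example.rds.amazonaws.com" # hypothetical RDS address

  meta = {
    "external-node" = "true"
  }
}

# Register the database service on that external node.
resource "consul_service" "database" {
  name = "database"
  node = consul_node.database.name
  port = 5432 # hypothetical port
}

# Allow the API tier to reach the database.
resource "consul_intention" "api_to_database" {
  source_name      = "api"
  destination_name = "database"
  action           = "allow"
}

# Let the terminating gateway proxy mesh traffic out to the database.
resource "consul_config_entry" "terminating" {
  kind = "terminating-gateway"
  name = "terminating-gateway"

  config_json = jsonencode({
    Services = [{ Name = "database" }]
  })
}
```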
Finally — because our API service is also going to be a splitting target later on — we're going to set our API service to be an HTTP service as well. Let's write out our configurations and add them to Terraform. We'll name this commit "TGW config and external service," push the changes up, switch back into Terraform Cloud, and click on our runs view. We can see it starting to run.
We can see it's planning to add in the services that we want it to; five to add, two to change. We'll confirm and apply — and we'll confirm our plan. We can see that apply is running. We've added five resources — changed two. If we go back into Consul, we can see we have our database service here as well as our new intentions. If we look at our application, we can see that our database is connecting successfully.
Our Final Configuration
For our final configuration, we want to be able to lifecycle out our API tier and our application frontend. We can see our application frontend here. If we look at our APIs here, we can see we're running version 2.0 of that behind the scenes. Let's hop over to the command line and apply some of our new configurations.
We're going to switch back to the directory and apply our new API service — as well as our new frontend service. If we quickly switch back into our environment, go back into Consul, and hit our services view, we can see that our API and our frontend now have six instances between them — a couple of V2s in there as well as the regular versions.
If I go into these V2s, I can see they're tagged with the version 2 tag. In this example, we're going to use service resolvers and service splitters to control traffic flow to these parts of the environment.
We'll switch back into the command line, back into Terraform, and edit our file. First, we'll drop in our API resolver and our API splitter. The API resolver sets a default subset of version 1 and then configures the subsets. These subsets are built from the service metadata tag I showed previously: a v1 subset for version 1 and a v2 subset for version 2. We then add a configuration entry for the API splitter to send traffic 50/50 between those two subsets.
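The resolver and splitter pair might be expressed like this — the metadata key (`version`) and subset names are assumptions based on the demo:

```hcl
# Resolve the API service into v1/v2 subsets keyed off service metadata.
resource "consul_config_entry" "api_resolver" {
  kind = "service-resolver"
  name = "api"

  config_json = jsonencode({
    DefaultSubset = "v1"
    Subsets = {
      v1 = { Filter = "Service.Meta.version == v1" }
      v2 = { Filter = "Service.Meta.version == v2" }
    }
  })
}

# Split traffic 50/50 between the two subsets.
resource "consul_config_entry" "api_splitter" {
  kind = "service-splitter"
  name = "api"

  config_json = jsonencode({
    Splits = [
      { Weight = 50, ServiceSubset = "v1" },
      { Weight = 50, ServiceSubset = "v2" },
    ]
  })
}
```

The frontend gets the same resolver/splitter shape, just with 100% of the weight on its v2 subset.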
Next, because we're running short on time, we'll configure another service resolver and another service splitter for our frontends. The configuration looks the same with a V1 and V2 subset, but in this case, I'm going to switch all traffic to the V2 of our frontend.
Finally, with these configurations in place, we'll write out the configurations, add the configurations to Terraform. We'll change our message over here — new API and frontend — and we'll push our configurations.
Once again, if we switch back into Terraform Cloud and hit our runs view, we can see this being planned out for execution. It's going to reconfigure parts of our environment. We'll confirm and apply — confirm the plan — and it's going to apply these configuration changes.
Those configuration changes were all added, and if we go into Consul and look at our frontend and our routing tab, we can see we have two versions in here. If I go into our application and hit refresh, we're running on our application's dark mode. Likewise, if I come over and hit the API tier and go to health — and refresh this a few times — we can see that we're bouncing between the two versions of the environment.
We're quite happy with our configuration. Our application's resolving the way we want, and we like the dark mode of our application more. We're going to switch all the traffic over so that it's finally on the secondary environment. We already did that for the frontend by setting 100% of traffic to the secondary location. Let's make that change for the API tier as well.
The final time we come in, we'll edit the main.tf file and change the API splitter to send 0% to v1 and 100% to v2. We'll also change the default subset to v2. That way, next time we come in to roll out a v3, the right default is already in place.
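That change might be expressed as an update to the resolver and splitter config entries — names and the metadata key remain assumptions:

```hcl
# The default subset now points at v2, ready for a future v3 rollout.
resource "consul_config_entry" "api_resolver" {
  kind = "service-resolver"
  name = "api"

  config_json = jsonencode({
    DefaultSubset = "v2"
    Subsets = {
      v1 = { Filter = "Service.Meta.version == v1" }
      v2 = { Filter = "Service.Meta.version == v2" }
    }
  })
}

# All traffic now flows to the v2 subset.
resource "consul_config_entry" "api_splitter" {
  kind = "service-splitter"
  name = "api"

  config_json = jsonencode({
    Splits = [
      { Weight = 0,   ServiceSubset = "v1" },
      { Weight = 100, ServiceSubset = "v2" },
    ]
  })
}
```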
Our Final Terraform Cloud Run
For a final time, we'll add our configuration — "finalizing splitter configuration" — and push our changes into the environment. Then we'll go enjoy our last Terraform Cloud run: currently planning — confirm and apply — confirm the plan, which executes the run.
And if we go over to our health screen and continue to hit refresh, we're always hitting version 2. We haven't dropped any traffic, and we were able to migrate traffic between different subsets of the environment very easily.
If we take a look in Consul, under our API tier's routing tab we can see 100% of traffic going to the V2 service — and the same for the frontend.
Our Terraform Cloud run completed, we have applies across the board, and we were able to bring this entire environment onto a new set of running services — live, with traffic flowing through.
Demo Summary & Conclusion
So taking it from the top: we were able to fully configure a set of applications inside our Consul environment living inside HCP. We were able to configure intentions using Terraform and apply them via Terraform Cloud. Running both of these as services means I don't have to manage any infrastructure locally. I'm consuming these as cloud services and deploying my applications. I'm focused on keeping my applications running, which is ultimately what matters the most to customers in our environments.
I was able to show setting up all of the configuration details. These are all things I would normally have had to connect into Consul and do manually. But I'm applying them as code, which means they can live right alongside my applications as they're deployed and run inside of the environment.
Furthermore, using this with Terraform Cloud means I get to be integrated with Git, so I can use all of Git's capabilities around code management and versioning. I also get all of the governance capabilities of Terraform Cloud in combination with Git, while keeping my service mesh environment secure.
I hope you enjoyed this demo and enjoyed this walkthrough. I hope you enjoyed HashiConf, and you have a great day.