Presentation

Nomad Networking Demystified

Learn about Nomad's networking capabilities, such as sidecar-less service mesh when integrated natively with Consul Connect, and see how the new container networking interface (CNI) support works.

Speakers

After a brief crash course in HashiCorp Nomad and its history of networking-related features, Nomad engineer Nick Ethier will showcase the latest networking features and Consul service mesh capability integrations in Nomad. Finally, the talk will wrap up with a look ahead at what's coming for Nomad networking.

Transcript

Hey everyone. Welcome to my living room. We're going to talk about HashiCorp Nomad Networking Demystified. What I want to focus on here is the different ways that you can connect your applications together as they're running on Nomad and how that model has evolved over the years since Nomad's inception to now with our new beta release and even beyond that.

My name is Nick Ethier. Hello. My pronouns are he, him. I am @nickethier on GitHub, Twitter, everywhere. As Eric said, I'm an engineer on the Nomad team here at HashiCorp. I've been with HashiCorp for a little over two years now. When I'm not working on Nomad, I'm usually spending time with my family. We've got a little hobby farm here in Ohio raising some goats and cows and whatnot.

All right, before we jump into networking, I felt it pertinent to kind of give a short overview of Nomad. With as many of you viewing, I'm sure there's a few of you out there who may not be super familiar with Nomad.

A Brief Intro to Nomad

Nomad is a workload orchestrator available for use at any scale. What that means is that whether you're a small shop with just a few engineers, where Nomad's simplicity plays well with small teams deploying in production, or a large enterprise with hundreds, thousands, or tens of thousands of engineers, Nomad handles very large infrastructure and very large deployments very well. It scales to just about any size.

Its deployment model is very simple. There's a single binary that can run in 2 different modes:

  • Client mode.

  • Server mode.

Server is really the brains of the cluster. That's where all of the state is stored on disk. There's no external database that needs to be involved. Nomad servers take care of that.

The Nomad clients are sort of the brawn of the cluster. They run on all of your infrastructure, run the work that's assigned to them, and take care of maintaining the lifecycle of your workloads.

What's particularly unique about Nomad is that it takes a very pluggable and agnostic approach to the technology you want to use in your infrastructure. Containers are great but unfortunately, organizations can't just switch to running containers in production overnight. There's always a migration period. Maybe that's not the story that your organization wishes to tell with their infrastructure.

Nomad sort of meets you and your technology needs where you need to be. Whether you're orchestrating virtual machines, running plain binaries that need to be run somewhere, maybe a .jar file, or running container workloads, or any heterogeneous mix of those, Nomad can schedule them across fleets of machines using a very pluggable, multi-task-driver approach.

Of course, it's got very deep integrations with the rest of the HashiCorp suite through its Vault and Consul integrations. It's multi-platform as well. Whether you're running on Linux infrastructure, or Windows, or a mix of the two, Nomad's deployed on large and small clusters on both platforms, and there's also support for running on macOS, which is really great for developing and running things locally. Nomad has a dev mode, if you will, where you can run a single-node cluster right on your local machine, submit jobs to it, and get some great day-zero feedback in real time.

I mentioned that Nomad is pluggable by design. Those task drivers are built in, but there's also an external task driver interface where you can write a task driver plugin for your runtime if you have some bespoke runtime that you want to integrate with Nomad. We've seen several community drivers as well. Off the top of my head, I know there's an IIS community driver for Windows services. I've seen a Firecracker and a containerd driver and some more interesting ones, like people trying to do things with JavaScript and with WebAssembly (Wasm). There's a lot of power in that, and it's really fun to see where the community takes it.

Nomad Glossary

I just want to go over a few terms that I'm going to use throughout this talk, so that no one feels left behind here. We talked a little bit about the difference between servers and clients.

For servers, there's typically 3 or 5 in a cluster. They maintain quorum and leadership. Data about the cluster state is written to disk on them. They're the brains of the cluster and then the clients are the agents that run on all the machines in the cluster.

Job

A job is sort of the top-level unit that you deploy to a Nomad cluster. We call it a job file or a jobspec. It's a specification that describes your application and how it needs to be deployed, along with any constraints it may have, such as needing to run on a Windows machine or on a machine of a certain type. Those things are taken into consideration when scheduling.

Task groups

A job itself is made up of one or more task groups. A task group is a collection of tasks that are co-located on the same machine and typically share some resource on the machine, like volume mounts or a network namespace. Then a task is the individual unit of work. There can be one or more in a task group. This can be a Docker container, a .jar file, a VM, et cetera. These are those task drivers.

Allocation

Then, an allocation is an instantiation of a task group running somewhere in your infrastructure. If you have a job file with one task group, and in that task group you specify that there need to be 3 instances of it running, then when you submit that job, Nomad will create 3 allocations and place them accordingly.
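
To make those terms concrete, here's a minimal, illustrative jobspec sketch showing how a job, a task group, its count, and a task fit together (all names and values are made up):

```hcl
# Illustrative jobspec: one job, one task group, three allocations when run.
job "example" {
  datacenters = ["dc1"]

  group "app" {
    count = 3 # Nomad creates 3 allocations of this group and places them

    task "web" {
      driver = "docker" # the task driver; could be java, exec, etc.

      config {
        image = "nginx:1.19"
      }
    }
  }
}
```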

Nomad Case Studies

As I said, Nomad scales to organizations both large and small. I wanted to highlight just a couple of users of Nomad who have spoken about it publicly, really in just the last few months, so these are very current. Just a couple of weeks ago, Cloudflare published a really great blog post about their use case of Nomad and how it powers their 200+ datacenters all over the world. A quote from that blog post is, "Now Nomad has been deployed in every datacenter. We are able to improve the reliability of management services essential to operations."

Another user of Nomad is BetterHelp. BetterHelp is an online therapy company that matches up clients and counselors worldwide. They use Nomad to power their infrastructure, and they gave a really great overview of that at one of our most recent Nomad virtual days, where they talked about how they use Nomad to dispatch and handle asynchronous tasks in their infrastructure. They said they launch over 2,000 of these tasks every hour to handle their workload.

Then, finally, Roblox. Roblox uses Nomad to power their infrastructure, where over 100 million users are active monthly on their platform. They run it on over 10,000 nodes across 20 different clusters, spanning bare metal and cloud, Windows and Linux, and it's given them the power to have a really stable platform with four and a half nines of uptime.

The History of Nomad Networking

Let's jump into networking now that we have a primer on Nomad, and look back at the different networking improvements we've seen in Nomad over the years. When Nomad was originally released back in... it's been almost 5 years now, back in September of 2015, we had static and dynamic port allocations and port mapping.

The next release brought Consul integration with service registration, and there were continued improvements through the years. The really exciting improvements have been coming in the last year. Last year, we introduced shared network namespaces among tasks in a task group and integration with Consul Connect, HashiCorp's service mesh.

Earlier this year, we built on those features and added a bunch of really great improvements that we'll talk about. Then today, we're announcing support for CNI (Container Network Interface) integration, multi-interface networking, and Connect native.

Shared Network Namespaces

In Nomad 0.10 with the introduction of shared network namespaces, we moved to adopt this sidecar pattern that's been popularized. The idea here is that instead of baking everything into your application, you can have your application running and then one or more of what we're calling these sidecars that take care of different responsibilities for your application. This could be a log collecting sidecar or a metrics collector sidecar or a sidecar that takes care of network communication, which we'll jump into that a little bit more. That's the Consul Connect integration.

In this diagram here, we have your application running. If it wants to talk to a service in the cluster, it's talking through this proxy, and this proxy is doing all the load balancing necessary, the service discovery, maybe enriching those requests with tracing and whatnot. This shared private namespace, where the network stack is private to that task group, enables some really powerful patterns.

What's also really interesting is these shared network namespaces are agnostic to the drivers. If you're a Java shop, in this example, we have a Java application that's running. Then maybe there's another group that has built this new metrics infrastructure. Instead of going in and baking a metrics collection library into every application, they've just published a Dockerfile or a Docker image that can run as a sidecar. This allows organizations to adopt new technologies without having to go and specifically integrate them into every application. You can do this with Nomad with its driver agnostic network namespace support.
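
As a rough sketch of that pattern, assuming a hypothetical Java application and a Docker-based metrics sidecar, the group-level network block introduced in 0.10 gives both tasks one private namespace:

```hcl
group "app" {
  # One shared network namespace (bridge mode) for every task in this group.
  network {
    mode = "bridge"
  }

  # The Java application.
  task "app" {
    driver = "java"

    config {
      jar_path = "local/app.jar"
    }
  }

  # A Docker-based metrics sidecar published by another team. It reaches the
  # Java app over the shared loopback interface.
  task "metrics-collector" {
    driver = "docker"

    config {
      image = "example/metrics-collector:1.0"
    }
  }
}
```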

Setting Up Network Namespaces

I want to jump in and go one layer deeper with this integration to talk about how it works. If you ran that previous example and then ran docker ps, you would notice that there are actually 2 Docker containers associated with that allocation: the collector that we talked about, and also this pause container. I just want to talk about why that is and why you might see it if you start using this.

Let's talk about how this is set up. The first thing that happens when the client launches this allocation on a host is that it needs to create the network namespace.

Now, a network namespace in Linux is really just a file descriptor that is set when forking your process. For any normal executable, it's very easy for Nomad to create and manage the network namespace, but unfortunately, Docker doesn't let you give it a network namespace file descriptor directly. Our only option is to create a container that's kind of the parent of the network namespace and then set the network mode of other containers such that they inherit the network namespace of that container.

This is what happens: when we create the network namespace, Nomad detects that it's got a driver that must initialize the network namespace. I call this the Docker bit, because Docker is really the only driver that requires this. When it detects that it's running a workload that involves Docker, it needs to create the network namespace through the Docker driver. The Docker driver will launch a pause container with a container ID, and it will return that container ID as well as the network namespace path, since we can get that from Docker's API. It returns that back to Nomad. When Nomad goes to launch the actual Docker task, that collector, it will launch it with the container ID of that pause container as the network namespace parent.

Then, once it moves on to the Java task, since it has that network namespace file descriptor, it can just launch it as it normally would. Now what we have is a network namespace that contains both a Docker application and, in this case, a Java application.

Encrypted Service-to-Service Communication with Consul Connect

That covers how communication between tasks happens, but what I think more people are interested in is the actual service-to-service communication. Nomad has had support for registering services with Consul for a long time, since Nomad 0.2, and Consul gives you a lot of flexibility in terms of how you do service discovery. Consul exposes a DNS interface, and your infrastructure could integrate with that using well-known ports or SRV records. There's also the Consul Template tool that is built into Nomad. You can define a template stanza in your Nomad job file that uses Consul Template to render the instances of an upstream service.

Then, as those change, Nomad can either send a signal to your task or restart it. It can also repopulate environment variables and restart the task if you have a less dynamic task. There are a lot of different options available. You define a service in Nomad with a service block like the one in this example. You give it a name, you can define some health checks, and Consul will take care of registering the service and running those health checks for you.
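
The service block being described looks roughly like this (the service name, port label, and check values are illustrative):

```hcl
service {
  name = "web"
  port = "http" # a port label defined in the group's network block

  check {
    type     = "http"
    path     = "/health"
    interval = "10s"
    timeout  = "2s"
  }
}
```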

Unfortunately, all of those methods sort of require some sort of extra setup. You have to make sure DNS is configured correctly. You have to write a configuration file to get your application to talk to something.

What we've always really wanted in Nomad is a near-configurationless way of connecting applications together, and we feel like we've really achieved that by integrating with Consul Connect. The way Consul Connect works is that it follows this sidecar pattern where application network communication goes through this proxy. We use the Envoy proxy by default. Consul will configure this proxy to allow traffic based on Consul's intentions. Traffic moving through the service mesh between these 2 Envoy proxies happens over a TLS connection, and the services are authenticated. If an application is talking to an upstream database, you can be sure that, because of the mutual TLS, it is not only encrypted but authenticated as well.

Then, on top of that, Consul intentions, which are essentially service-based firewall rules if you're not familiar, allow authorization to happen. You get both authentication and authorization of your service without having to configure much of anything else in Nomad, just a few lines of configuration. Let's look at what that configuration looks like in Nomad.

Configuring Service-to-Service Comms in Nomad Via Connect

To get your Nomad service joining the service mesh, you only really need a few lines of configuration. In that service block that we talked about before, you just define a connect block and declare that you want a sidecar service. With these two lines of configuration, Nomad will launch an Envoy task for you. You don't have to configure that. It will make sure it can talk to Consul and get the configuration it needs to join the service mesh.
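
Those two lines are the connect block with an empty sidecar_service, something like:

```hcl
service {
  name = "api"
  port = "9001"

  connect {
    sidecar_service {} # Nomad launches and wires up the Envoy sidecar for you
  }
}
```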

Then, when you want to talk out to another service from your application, you just need a couple more lines of configuration. You define what are called your upstreams. These are the services that you are going to talk to. In this service's case, it defines a Redis service that it's going to want to talk to. It gives the name of that service as it's registered in Consul, as well as the port that you want it to bind to locally. What that's going to do is that sidecar proxy is going to bind to that port on the loopback interface. Since that loopback interface is private and shared among the tasks in that task group, your application can just talk over the loopback interface to that port in plaintext, and you're getting all those great security benefits of running on the service mesh.
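
Declaring an upstream adds a few lines inside that sidecar_service block; a sketch of the Redis example might be:

```hcl
connect {
  sidecar_service {
    proxy {
      upstreams {
        destination_name = "redis" # the service name as registered in Consul
        local_bind_port  = 6379    # the proxy listens here on the group's loopback
      }
    }
  }
}
```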

Nomad's Consul Connect Integration with ACL

That's great. We've had this integration since last year in 0.10 but what was missing initially was support for ACLs (Access Control Lists). Let me run through a use case here that kind of defines why we need that.

Let's say you have a few different groups building services in your organization. You have groups building a:

  • Users API.

  • Payments API.

  • Web portal.

Each group can deploy their application and define their intentions, and everything's only talking to what it's supposed to, but nothing's really stopping anyone from submitting a job to Nomad that masquerades as another service. Maybe this is accidental; someone copied some configuration and didn't change the service name or something, and now they're launching an application that registers as the payments API and you're getting errors thrown. Or maybe it's a malicious attempt to try and exfiltrate some data or something like that.

What we want to be able to do is provide some tools so that you can restrict this from happening in a way that still keeps the cluster secure. Nomad's always had ACL integrations, and it has supported integrating with Consul's ACL system as well, but Nomad's ACL system is not really built for these fine-grained controls. It's more about who can perform operator-level actions in a cluster versus who is a cluster user who can just submit jobs and read cluster state, whereas Consul's ACL system is more based on who can register which services, by name or prefix.

Now, like I said, Nomad has supported Consul's ACL system but Nomad servers sort of get a privileged management token and they create the service registrations on behalf of the user submitting the job to the cluster.

What we wanted was a way for Nomad users to submit a job to Nomad with their own Consul token. But there are some nuances to this, because what we don't want is for that token to then be used directly to register the services; the token could be a privileged token too, or it could have permissions that aren't necessary for that service.

Like I said, this works, it blocks this arbitrary payments job from being able to run and register with Nomad, but I want to jump in and talk about how we architected this to sort of increase the security.

Nomad's Consul Connect Integration with ACL: The Security Story

Like I said, that token that you're submitting at job registration time is not the token that gets used. That token is just used to check permissions and if that token has the proper permissions to register the services in that job file, the job will get accepted and that token is just thrown away. It's only used for checking permissions.

Then, what happens is the Nomad server does its thing. It schedules the job just as it normally would. When the Nomad client gets the job and starts to run it, it sees that, "Hey, I've got a Consul Connect sidecar task that I need to configure, and I need an ACL token for it." Well, we don't want to put that privileged token on all your clients across all of your infrastructure. We want that privileged token to stay as limited in scope as possible, on the Nomad servers.

So Nomad clients actually call back to the Nomad servers and request a service identity token. This is a token that the Nomad server creates, using its management token or a token with ACL write permissions, scoped specifically to just the permissions that are needed to run that sidecar. That token is only associated with that sidecar. You get this dynamic token that's created at runtime, lives only for the life of the service, and is then destroyed, removed from Consul, and thrown away on Nomad's side. You get a very secure story here using Nomad's Consul Connect integration with ACLs.
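
On the operator side, a minimal sketch of how this fits together, assuming Consul ACLs are enabled, is to give the Nomad servers a privileged Consul token and require submitters to present their own token for permission checks:

```hcl
# Nomad server agent configuration (token value is a placeholder).
consul {
  # Privileged token the servers use to mint narrowly scoped service
  # identity tokens for each Connect sidecar at runtime.
  token = "REPLACE_WITH_PRIVILEGED_CONSUL_TOKEN"

  # Require job submitters to present their own Consul token
  # (for example, `nomad job run -consul-token=...`) so Nomad can
  # check their service registration permissions.
  allow_unauthenticated = false
}
```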

Exposing Plaintext Endpoints and Health Checks

Changing gears a little bit, let's talk about some different features that we've added in the last few releases. One of those is exposing plaintext endpoints through your Consul Connect sidecar proxy. Since your application is binding to the loopback interface, that Consul Connect proxy is what's gatekeeping all of the communication in and out. You might have an endpoint that you want to expose in plaintext that's not protected by the service mesh or authorized by intentions. Consul supports adding expose configuration to expose specific paths or gRPC services through the sidecar proxy, and we added an integration in Nomad to do this as well.

You can see in this configuration that we're defining a health endpoint so you can do health checking. We're saying the application serves it on port 8080, and we want it exposed through Nomad's port mapping on the health port that we've defined in the network block there.
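
The configuration being described is the expose block inside the sidecar proxy definition, roughly along these lines (port labels and numbers are illustrative):

```hcl
network {
  mode = "bridge"

  port "health" {
    to = -1 # dynamically allocated host port, mapped through to the proxy's expose listener
  }
}

service {
  name = "api"
  port = "9001"

  connect {
    sidecar_service {
      proxy {
        expose {
          path {
            path            = "/health"
            protocol        = "http"
            local_path_port = 8080     # where the app serves /health on loopback
            listener_port   = "health" # the port label from the network block above
          }
        }
      }
    }
  }
}
```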

This allows your service to keep listening on the loopback interface but expose specific endpoints to the host. This is particularly important because of health checks and Consul's check definition that we have in Nomad. If you register an HTTP health check with Consul and your application is only listening on the loopback interface, the only way that Consul can reach into your network namespace to talk to your application is through that Consul Connect Envoy proxy. You need to expose that, in Nomad's case, to the host so that Consul can make that health check correctly, but this is a lot of configuration just for exposing a health check.

We added some syntactic sugar in Nomad to generate this configuration for you. If you define a check in Nomad, you can set the expose field to true. What that will do is Nomad will generate all of the expose configuration, dynamically allocate a port, set that correctly, and then use that configuration when registering the check so that the call from Consul is routed through that exposed endpoint. These are some of those improvements that happened in 0.11, I call them paper cuts, that make the tooling work really, really well together.
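
With that syntactic sugar, the same outcome is roughly just:

```hcl
service {
  name = "api"
  port = "9001"

  connect {
    sidecar_service {}
  }

  check {
    type     = "http"
    path     = "/health"
    interval = "10s"
    timeout  = "3s"
    expose   = true # Nomad generates the expose config and allocates the port for you
  }
}
```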

Consul Connect Gateways on Nomad

No talk about Consul Connect is complete without mentioning the word gateway. You can run your mesh gateways on Nomad. This is a very simple example that's just using the raw_exec driver, running a binary with no isolation, to use the Consul binary that's on the host to run a mesh gateway and register it.
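
A sketch of that kind of job, assuming the consul binary is available on the host and using placeholder addresses, might look like:

```hcl
job "mesh-gateway" {
  datacenters = ["dc1"]

  group "gateway" {
    task "mesh-gateway" {
      driver = "raw_exec" # no isolation; reuses the consul binary on the host

      config {
        command = "consul"
        args = [
          "connect", "envoy",
          "-mesh-gateway",
          "-register",
          "-address", "<lan-ip>:8443",     # placeholder LAN address
          "-wan-address", "<wan-ip>:8443", # placeholder WAN address
        ]
      }
    }
  }
}
```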

Nomad 0.12 Networking Features

That's sort of the past. Let's talk about today. Starting today, we have a beta available of Nomad 0.12 (now GA). I want to run through the different networking features that are available in it. Mind you, 0.12 includes many, many more features beyond just these networking features, but I'm going to focus on networking for today.

In 0.12, we've added support for Consul Connect native. This lets applications join the service mesh natively in the application instead of using a sidecar proxy. We've also added support for CNI, the Container Network Interface, which allows you to customize how that network namespace is configured, how interfaces are added to it, and how it joins other networking technologies that are available.

Then, multi-interface networking, which supports port mapping and binding ports to multiple host interfaces as opposed to just one default one. Let's jump through each of these a little bit.

Consul Connect native.

The one-liner for this is sidecar-less service mesh. The way this works is by integrating all of that TLS handling, fetching the TLS certificates, and doing those authorization checks into a library, so it's all done in application code. This is really something you don't necessarily need as you start adopting service mesh, but as your organization matures and you want to squeeze out as much performance as possible, or minimize complexity a little bit by removing this additional sidecar, using Connect native is a good option. HashiCorp, the Consul team, has shipped a full-featured, first-class Go client library for this. Let's look at what that looks like from the Nomad side.

In your connect service block, instead of defining that sidecar service, we're saying to use native mode, and then we select the task that is associated with this native service. In this case, it's going to be this HTTP task that's defined later on.
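
In the jobspec, that looks roughly like this (the service and task names are illustrative):

```hcl
service {
  name = "example-http"
  port = "9001"
  task = "http" # the task in this group that implements the Connect native service

  connect {
    native = true
  }
}
```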

That client library is really easy to use. Here's a very simple code walkthrough of it, and if you're not familiar with Go, don't worry, I'm going to walk through each of these lines. The first line here creates a Consul API client. This is the HTTP API client that talks to Consul. We do this with just the default configuration. The reason this works is that when you select native mode in Nomad, Nomad is going to generate the environment variables necessary to build this client. This default configuration is the default configuration for the Consul API client, overridden by any environment variables that are set.

Then, we're going to create an instance representing that service. We do that using the Connect SDK: connect.NewService, the name of the service, and then we pass in that API client we created. Then, we create the HTTP server that serves your application using the address and the TLS configuration from that service instance we created before. This is what creates those TLS certificates and whatnot. Then, we take the server and serve TLS. Now you're joining the service mesh, essentially. It's a really powerful feature for mature use cases of Consul Connect service mesh.

CNI (Container Network Interface).

CNI, the Container Network Interface, positions itself as a generic plugin-based networking solution for application containers on Linux. What does that mean? The idea here is that since we have a network namespace, there are many different networking solutions in the ecosystem, things like Calico, Weave, and Cilium. These all come to mind. The way that you join workloads to those networking technologies is through their individual CNI plugins.

Let's talk about how this works a little bit. When you want to use a CNI plugin, you have to define a CNI configuration. That's the configuration on this screen. You give it a name, the type of plugin it is, and then additional parameters for that plugin. This example is using the bridge plugin, which creates a virtual interface and joins it with a bridge on the host. That bridge plugin has some additional parameters, like the name of the bridge and whether we're going to use the default gateway, and then we have to use an IPAM plugin as well to allocate an address. In this example, we're using the well-known host-local IPAM plugin, which is just going to allocate an address from a block of addresses that's unique to the workloads running on that host.

The way it works from Nomad's side is that Nomad first creates the allocation and its network namespace. It's got that network namespace file descriptor, and it gives that to the CNI plugin along with the rest of this configuration. The CNI plugin then takes over: plumbing any interfaces it's configured with into that network namespace, setting routes, iptables rules, anything it needs to do to connect it with its technology.

There are many well-known CNI plugins that have been built by the community. As a matter of fact, Nomad's bridge networking mode is really just an opinionated CNI configuration that uses the bridge plugin, the host-local IPAM plugin, and then a few other meta-plugins to do port mapping and configure iptables. Beyond these reference plugins, there are many more that the community has built as well.

Let's look at what actually using CNI in your Nomad cluster would look like. On the left is the configuration for our client. We've defined the CNI path, the path to all of those CNI plugin binaries, as well as the configuration directory. When the Nomad client starts up, it goes through a process that we call fingerprinting, where it detects all of the different facets of the host, and one of those is CNI fingerprinting. It's going to go through this CNI configuration directory, go through each CNI configuration, and report that back to the server as a CNI network that is available on this host. We've just got an example CNI configuration in this case.
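
The client-side configuration is just a couple of attributes; for example (the paths shown are Nomad's defaults):

```hcl
client {
  cni_path       = "/opt/cni/bin"    # directory containing the CNI plugin binaries
  cni_config_dir = "/opt/cni/config" # CNI network configurations to fingerprint
}
```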

Then, if you want to actually use that networking mode, in the network block of your job description you just set the mode to "cni/" plus the name of that CNI network. Nomad will then find a machine in your cluster that's registered with that CNI network.

The other thing I want to point out here is the use of this portmap plugin. What's really interesting is that this portmap plugin takes a well-known set of arguments to enable mapping ports from inside the network namespace to the host. This is how the bridge networking mode in Nomad does port mapping, but it can work for any CNI network. You just have to include this portmap plugin as one of the plugins in your configuration chain. If you do that, you're automatically integrated with Nomad's port blocks, where it can allocate a dynamic port or a static port and then map it into the network namespace. It's a really neat and powerful feature of CNI that you can integrate with these features just by adding a plugin to your configuration.
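
On the job side, assuming a fingerprinted CNI network named mynet whose configuration chain includes the portmap plugin, the group's network block would look something like:

```hcl
group "app" {
  network {
    mode = "cni/mynet" # the fingerprinted CNI network name

    port "http" {
      to = 8080 # mapped into the network namespace by the portmap plugin
    }
  }
}
```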

Like I said, during that fingerprinting process, the Nomad client is going to detect what CNI networks are available on the host. What if you try to run a job with a network that doesn't exist? In this case, we're going to do a nomad plan, which is like a dry run of scheduling; we'll be able to see how a job would be scheduled without actually registering it.

In this case, I wanted to show an example where we defined a network in our jobspec that didn't exist in the cluster. If we plan this, we would see that it failed to place because there was no network available with that name. If you ran this job, Nomad would still accept it. It would just sit in a pending state until either it was updated with the correct network name or a client joined the cluster that fingerprinted with that network. It's always going to try and wait until it can schedule things.

Multi-Interface Networking.

A lot of times, and this is typical with bare metal hosts, you'll have hosts that have multiple network interfaces. This could be because there's a private interface and a public interface with a public IP address, or maybe the host is part of multiple VLANs. You have different network segments to draw network boundaries; maybe there's a PCI VLAN and an application VLAN, for example. Up until now, Nomad was really only able to choose one network to allocate ports on and do port mapping.

Let's look at a concrete example here. Let's say we have a network load balancer that's running on Nomad and wants to expose port 443 on the public interface of the host so that it can receive traffic and load balance it to applications in the private segment of your infrastructure's network. But it also wants to expose a port, in this case, for metrics collection; maybe there's a Prometheus scraper that comes along and scrapes the metrics periodically from this load balancer. We want to expose that metrics port, but we don't want it to be publicly available, and we don't want to have to go through and make sure that there are firewall rules set up on all the hosts to account for this.

With multi-interface networking, we can actually select which network interface a port is going to be exposed on. Let's look at how that's configured. If you're familiar with host volumes in Nomad, host networks take a similar approach. You define them in your client configuration, give them a name, and then there are a few different rules for selecting which interfaces match. In this case, we're saying, 'If this client has an interface with an IP address in this CIDR block, register it as a host network named public.' All clients get a default host network. This is the default network that's always existed in Nomad, and by default, it's the interface that has an IP address with the default route.
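
The client-side definition might look like this, with an illustrative CIDR:

```hcl
client {
  host_network "public" {
    # Any interface with an address in this block is registered
    # as the "public" host network on this client.
    cidr = "203.0.113.0/24"
  }
}
```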

In this case, for that load balancer, if we wanted to expose a port just on the public network, we would use a new field in the port block called host_network and give it the name of that network. In this configuration, that's public, and then the metrics port just uses the default network, which here is the private one. We don't have to set the host network in that case.
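
For the load balancer example, a sketch of that network block might be:

```hcl
network {
  mode = "bridge"

  port "https" {
    static       = 443
    host_network = "public" # bind this port only on the public host network
  }

  port "metrics" {
    to = 9102 # no host_network set, so it uses the default (private) host network
  }
}
```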

We populate environment variables a little bit differently now that we have multi-interface host networks. Nomad has this concept of NOMAD_IP_<label>, NOMAD_PORT_<label>, and NOMAD_ADDR_<label> environment variables to get you the IP and the port of the different ports that are allocated, but there's now a host side and an allocation side. We want that information to be available to your application at runtime, so we now delineate between the host and the allocation IP, port, and address.

Looking Beyond Nomad 0.12: Roadmap

All right, let's look a little bit beyond 0.12. What you can expect us to be working on over the coming year or whatnot.

The first thing I want to get out of the way is some deprecations we're making in the job file. Before 0.10, the only way you could reserve ports and define network configuration was on a per-task basis. Now that we have group-based network namespaces and we can allocate ports at the group level, we want that to be the way that users define things from now on. We're deprecating the use of a network block inside of a task's resources stanza, along with the megabits (mbits) field.

In Nomad, you could say that your application took 20 megabits, and Nomad would use that when doing resource scheduling, but we found that it's not a very reliable or useful quantity to schedule on. Oftentimes, HTTP applications are so bursty that they're not going to consume those 20 megabits consistently and constantly. If you had exhausted the megabits resource on a node, most of the time the real utilization was actually nowhere near that.

It's just a metric that's hard to track and especially as we introduce bridge and multi-interface networking where hosts could have multiple interfaces, it doesn't make sense to track megabits on a bridge because it's ultimately using a host interface, which also has its own megabits bucket. For a multitude of reasons, we're deprecating the use of that. It'll work through Nomad 0.12, but expect it to disappear in the future.
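
In other words, the deprecated per-task form and its group-level replacement look roughly like this:

```hcl
# Deprecated: per-task network resources (still works in 0.12).
task "web" {
  resources {
    network {
      mbits = 20
      port "http" {}
    }
  }
}

# Preferred: group-level networking.
group "web" {
  network {
    port "http" {}
  }
}
```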

All right. Looking into some deeper integrations with Consul Connect, I want to paint this picture about deployments in Nomad.

Consul Connect has these L7 features where you can define service splitting, where one subset of service instances has a higher weight than another subset as traffic is load balanced among them. In Nomad, there's the concept of a deployment: when you update your job file, Nomad will roll out that update in a phased fashion. There are some different knobs you can tweak to control how that happens. You can have a canary deployment, where it just deploys one or two new instances and you promote the deployment when you're happy with it, or you can use it to do blue-green deployments, where you may have four instances of an application running, you bring up four new instances, split the traffic among them, and then eventually bring down the four old instances.
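
The knobs being described live in the job's update stanza; a blue-green-style sketch might be:

```hcl
update {
  canary       = 4     # bring up 4 new instances alongside the 4 old ones
  max_parallel = 4
  auto_promote = false # an operator (or automation) promotes when the canaries look healthy
}
```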

Well, with these L7 features, we can actually do some really powerful phased deployments like this. Imagine in our blue-green scenario, we bring up the new set of instances and then gradually shift the service-splitting weight: we introduce 1% of traffic, then 5%, then 10%, ramping up traffic while Nomad monitors the health checks to make sure we're not getting errors from that standpoint. Then eventually, either automatically or by an operator, you can promote that deployment, move all of the traffic over, and take the old instances out of commission. With the latest and greatest features announced in Consul 1.8, ingress and terminating gateways, we want to be able to offer deeper integrations in Nomad with those as well. Expect to see work on this in the future.

All right, I'm going to take off my engineer hat here for a moment and try to put on my thought-leadership hat. Where are we going with this? What is this culminating in? CNI and multi-interface networking are interesting features, but where do we want to take this networking model ultimately?

Let's look at how the industry has progressed. We're moving from monolithic to microservice infrastructures, from single physical servers all the way to virtual machines carved up into container workloads. At the same time, we're seeing multi-platform adoption as well, for a variety of reasons. Sometimes this is because of migration patterns, migrating to new technologies. Sometimes it's running on a classic container orchestrator alongside serverless infrastructure as well. There's always this multi-platform, multi-runtime problem that's getting bigger and bigger.

Then, on top of that, there's multi-cloud. Organizations large and small are looking to take advantage of multiple cloud providers, looking at public and private, on-prem and in-cloud options. The breadth of options and configurations available is always growing, but at the end of the day, you, the developers, operators, security professionals, technical support, and DevOps folks, are the ones that have to actually manage everything. You're the ones that get stuck cleaning up the mess.

This is my dog. His name is Bo. He's a Great Pyrenees. He's usually all white, but not on this day. So where are we going? How do we clean up this mess? Well, in the beginning, with this monolithic architecture, we had our firewall, which sort of controlled who could come in and out of our infrastructure, in this castle-and-moat model, if you will.

Then, as we started going to microservices, that configuration grew in complexity. We got security groups, and those let us group things together, but we're still mainly looking at IP- and port-based management of security.

What do we need? What can solve this? Really the solution there is service-based identity and security with Consul Connect managed by Nomad. In this world, you don't have to worry about IPs and ports. You just have to define service A can talk to service B but cannot talk to service C. Everything else is sort of just handled for you under the covers.

All of that managed by Nomad is really where I'd like to see things move. For me, I've been working on CNI integration and multi-interface networking for the last couple of months now. CNI is a great technology, but it's not the be-all and end-all for how you connect your services together on Nomad.

Consul Connect is really where we can provide the most secure and feature-rich environment to deploy and manage your services at any scale. I appreciate you all tuning in to this. If you have the chance tomorrow, please, please, please go see my colleague James talk about the Nomad Autoscaler. It's going to be a really great talk. Please go and check it out. With that, thank you so much. Hope you're all safe and have a wonderful day.
