Over the past year, service mesh technologies have gained significant interest. Although the idea of a service mesh isn’t new, the implementation details are new to many people. HashiCorp Consul is an open source tool that provides service discovery, health checking, load balancing, and a globally distributed key-value store. These features make Consul ideal as a control plane for a service mesh. This post discusses a few first principles around adopting a service mesh and how Consul can be used as a control plane for projects like Istio, Linkerd, and Envoy.
Before diving into service meshes, it is helpful to understand the first principles behind them and the design options available.
“Dumb Pipe” design focuses on simplicity and assumes the network is "dumb". This design framework is best explained by the End-to-End Principle, which states that communication and networking features should be implemented as close to the application as possible. As a result, applications end up doing a lot of heavy lifting (retries, backoff, circuit breaking, request routing, etc.), and every application may carry redundant code that re-implements these features. Oftentimes load balancers are used to implement some of this functionality outside of applications. This approach makes the network simple to operate and easy to reason about.
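The application-side heavy lifting described above can be sketched as a small retry helper with exponential backoff, the kind of code each service would have to carry in the Dumb Pipe model. The function names and parameters below are illustrative, not from any particular library:

```python
import time

def backoff_delays(base=0.1, factor=2.0, retries=5, cap=5.0):
    """Exponential backoff schedule: base, base*factor, ..., capped at `cap`."""
    return [min(base * factor ** i, cap) for i in range(retries)]

def call_with_retries(fn, retries=5, base=0.1):
    """Call fn(), retrying on any exception with exponential backoff."""
    last_err = None
    for delay in backoff_delays(base=base, retries=retries):
        try:
            return fn()
        except Exception as err:  # real code would catch specific, retryable errors
            last_err = err
            time.sleep(delay)
    raise last_err
```

In a service mesh, this logic would instead live in the data plane and be configured once for all services.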
“Smart Network” design takes an out-of-band approach to retries, backoff, circuit breaking, load balancing, and so on. It provides a network-based implementation of these features so that multiple applications can benefit from them with little to no application-level work. This is essentially the approach taken by a service mesh. Features like telemetry, traffic shaping, service discovery, and network policy control can be provided out of the box as well. However, the service mesh becomes an implicit dependency of the application, and it becomes harder to reason about the system as a whole. For example, request retry behavior may be specified in both application source code and network configuration.
Much like networking, a service mesh usually consists of a control plane and a data plane. This separates the performance-sensitive data path from the rest of the system.
The control plane is responsible for making decisions about where to send traffic and for configuring the data plane. It is also responsible for features like network policy enforcement and for providing service discovery data to the data plane. Since the control plane is a critical component of the service mesh, it must be highly available and distributed.
The data plane forwards requests from the applications; it is the data path, hence the name. A data plane may also provide more sophisticated features like health checking, load balancing, circuit breaking, timeouts and retries, authentication, and authorization. Because the data plane sits in the critical path of data flow from one application to another, it needs high throughput and low latency.
Protocol awareness is another important factor to consider when designing the data plane. The data plane might be implemented at different layers of the OSI model and may or may not be protocol aware.
Let’s take Layer 4, the transport layer, for example. Two common Layer 4 protocols are TCP and UDP. TCP underlies common application protocols like HTTP, SSH, and SMTP, and most databases. UDP is used for latency-sensitive applications like VoIP, video conferencing, and peer-to-peer protocols. A data plane that operates at this layer can be considered “universally” compatible, since it uses these lower-level protocols to perform request forwarding. It can also provide high performance, since forwarding is done without considering the contents of the packets. One drawback of this approach is that it is difficult to provide sophisticated, request-aware features.
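As a sketch of what Layer 4 forwarding looks like, the toy proxy below copies raw bytes in both directions without ever parsing them. It is illustrative only; real data planes like Envoy or HAProxy handle many connections, timeouts, and error cases:

```python
import socket
import threading

def pump(src, dst):
    """Copy raw bytes from src to dst until src closes; no protocol parsing."""
    try:
        while True:
            data = src.recv(4096)
            if not data:
                break
            dst.sendall(data)
    except OSError:
        pass
    finally:
        try:
            dst.shutdown(socket.SHUT_WR)
        except OSError:
            pass

def proxy_one_connection(listener, upstream_addr):
    """Accept a single client and splice it to the upstream, full duplex."""
    client, _ = listener.accept()
    upstream = socket.create_connection(upstream_addr)
    threading.Thread(target=pump, args=(client, upstream), daemon=True).start()
    pump(upstream, client)
```

Note that the proxy never looks inside the stream, which is exactly why it works for any TCP-based protocol, and why it cannot route based on request contents.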
Now let’s take Layer 7, the application layer, for example. A data plane that operates at this layer is application aware and can use additional information to make complex routing decisions. For example, by parsing HTTP requests, contents such as the path or headers can be used to change routing and forwarding behavior. One drawback of this approach is lower performance, since the contents of packets must be inspected. Another challenge is the diversity of application-level protocols: while HTTP is very common, there are countless other protocols in use.
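A hypothetical Layer 7 routing rule might look like the sketch below, which picks an upstream service based on the request path and headers. The rule shape and service names here are invented for illustration, not any particular proxy's configuration format:

```python
def route_request(path, headers, rules, default):
    """Return the upstream of the first rule whose path prefix and
    required headers both match the request."""
    for rule in rules:
        if not path.startswith(rule.get("path_prefix", "/")):
            continue
        required = rule.get("headers", {})
        if all(headers.get(k) == v for k, v in required.items()):
            return rule["upstream"]
    return default

rules = [
    # Canary routing: only requests carrying the header hit the canary.
    {"path_prefix": "/api", "headers": {"x-canary": "true"}, "upstream": "api-canary"},
    {"path_prefix": "/api", "upstream": "api"},
    {"path_prefix": "/static", "upstream": "cdn"},
]
```

This kind of decision is only possible because the data plane parsed the HTTP request; a Layer 4 proxy never sees the path or headers as distinct fields.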
Consul provides many features that make it ideal for acting as a control plane. The architecture of Consul ensures it is highly available and supports multi-datacenter topologies. One of the primary goals of Consul is providing service discovery, which works with both the Dumb Pipe and Smart Network approaches. In the Dumb Pipe design, Consul provides a first-class DNS interface that applications can use to discover other applications and communicate directly, without an intermediate data plane or service mesh. Additionally, applications can use Consul’s key-value store to store retry, timeout, and circuit-breaking settings and retrieve them when needed.
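In the Dumb Pipe model, an application resolves a dependency through Consul's DNS interface using names of the form `[tag.]<service>.service[.<datacenter>].consul`. The helper below is an illustrative sketch for constructing such names; the name format itself is Consul's:

```python
def consul_dns_name(service, datacenter=None, tag=None, domain="consul"):
    """Build a Consul DNS name such as web.service.consul or
    v2.web.service.dc1.consul (tag and datacenter are optional)."""
    parts = []
    if tag:
        parts.append(tag)
    parts += [service, "service"]
    if datacenter:
        parts.append(datacenter)
    parts.append(domain)
    return ".".join(parts)
```

An application would hand the resulting name to its normal DNS resolver (pointed at a Consul agent) and connect directly to the address it gets back.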
In the Smart Network design, Consul can provide service discovery and health checking information via an API that configures the data plane. The Consul K/V store can also be used for persistent state such as network policies. A previous blog post explored common strategies for load balancing with Consul in a microservices architecture, which is a good companion read.
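A control-plane integration typically polls Consul's health API (for example `/v1/health/service/<name>?passing`) and translates the response into upstream endpoints for the data plane. The sketch below shows that translation step, operating on an already-fetched and parsed response rather than making a live API call:

```python
def healthy_endpoints(health_response):
    """Extract (address, port) endpoints from a Consul health API response.
    Falls back to the node address when the service has no address set,
    matching Consul's documented behavior."""
    endpoints = []
    for entry in health_response:
        addr = entry["Service"].get("Address") or entry["Node"]["Address"]
        endpoints.append((addr, entry["Service"]["Port"]))
    return endpoints
```

The resulting endpoint list is what a data plane like Envoy would load-balance across, and it changes automatically as instances pass or fail their health checks.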
The goal of a service mesh is to provide service-to-service communication along with higher-level features like observability, policy enforcement, retries, backoff, and security. It is composed of a control plane, a (typically Layer 7) data plane, and adapters that can provide things like policy enforcement. Consul can be used as the control plane, while there are many choices for the data plane, including projects like Envoy, Linkerd, NGINX, HAProxy, Traefik, and Fabio, among others.
Pilot aims to abstract platform-specific service discovery mechanisms and provide a standard data format that is consumable by the data plane (Envoy). Since Consul provides a rich service discovery API, Pilot can be configured to use that data to discover services running in a datacenter. Instructions for installing Istio alongside Consul are available at https://istio.io/docs/setup/consul/.
Mixer aims to be the policy enforcement component of Istio. It moves policy decisions out of applications and into configuration that operators can manage. Consul's key-value store can hold policies defined by operators, and Mixer can then interface with Consul to enforce those policies in the service mesh.
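As one hypothetical shape for such a policy, an operator might store a JSON document under a KV key and have the enforcement point evaluate it per request. The key layout and policy schema below are invented for illustration; they are not an Istio or Consul format:

```python
import json

# A policy document as it might be stored under a KV key such as
# policies/payments (the key layout and schema are hypothetical).
policy_json = json.dumps({
    "service": "payments",
    "allowed_callers": ["checkout", "billing"],
    "max_requests_per_second": 100,
})

def is_allowed(policy_doc, caller):
    """Allow the request only if the caller is on the policy's allow list."""
    policy = json.loads(policy_doc)
    return caller in policy["allowed_callers"]
```

Because the policy lives in the KV store rather than in application code, operators can change it without redeploying any service.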
Istio-Auth aims to provide service-to-service and end-user authentication using mutual TLS, and to provide an identity to each service running in the mesh. At HashiCorp, we also build Vault, which has a PKI secrets backend that can generate certificates on the fly. This feature could be used by Istio-Auth to provide certificates to the data plane, which would enforce mutual TLS for service-to-service communication.
Consul provides a number of features which make it ideal as a control plane for a service mesh. This post details a few first principles that should be taken into account when looking at service meshes and shows how Consul can be used to drive a service mesh.
Interested in more of Consul's features like service discovery and the key-value store? Get started by checking out https://www.consul.io/downloads.html.