Integrating Consul Connect with HAProxy
Jan 23, 2019
Build your own open source service mesh with this demo using Consul as the control plane and HAProxy as the data plane.
Monolithic networking topologies don't work as well in software's increasingly service-oriented architectures (the popular microservices architecture, for example). There's a greater demand now for tools that less network-specialized engineers can use to simplify networking in service-oriented architectures, providing strong and reliable authentication and authorization between individual services through a centralized interface. The service mesh pattern is currently a popular method for meeting these new requirements.
A service mesh is composed of two main components:
A control plane (this talk will use Consul and Consul Connect for the control plane): This component manages service registry and authorization / authentication via service segmentation.
A sidecar proxy/data plane (this talk will use HAProxy for the data plane): This component sits close to the application, managing incoming and outgoing traffic and applying the policies managed by the control plane.
In this session, Baptiste Assmann, a principal solutions architect at HAProxy, demos a service mesh setup with a deep dive into the Consul and HAProxy integration.
Principal solutions architect, HAProxy
Thank you for coming. The challenge today is to build a service mesh using HAProxy and Consul, and we’ll see how we made that happen.
Quick agenda: we’ll define “service mesh,” show how you can build one with Consul Connect, then how HAProxy fits into the mix, and finally what the next steps are.
» ‘Service mesh’ defined
So, what is a service mesh? “It’s a dedicated infrastructure for handling service-to-service communications.” We took that definition from Wikipedia, but I think the definition from Lakshmi [Sharma] was more interesting. She said that it’s separating the business logic from the network functions. Basically, you want your application to focus on business, and you don’t want to manage anything related to infrastructure within your application.
That said, your application may need to call other services, databases, etc., and because of that, it needs access to the network. We’ll see that the service mesh helps you with that. That’s the purpose of the service mesh.
The service mesh itself is split into 2 layers: a control plane and a data plane. As you may guess, Consul is the control plane and HAProxy is the data plane.
» Main features of a service mesh
The most important feature is service discovery, of course, because your application has to reach other services, so it has to find out where those services are located. The idea is to make this as transparent as possible.
Service discovery means that your service mesh will maintain a list of endpoints available for whichever service you want to reach out to, and it’s going to allow you to get connected to that service.
The purpose of load balancing is to achieve high availability and to spread the load over your multiple nodes delivering a specific service.
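To make service discovery and load balancing concrete, here is a minimal sketch, assuming a hypothetical in-memory registry that maps a service name to its endpoints and hands them out in round-robin order. In the real mesh the endpoint list comes from Consul and HAProxy does the balancing (for example with `balance roundrobin`); the `Registry` type and its methods are illustrative only.

```go
package main

import (
	"fmt"
	"sync"
)

// Registry is a hypothetical stand-in for the mesh's service catalog:
// it maps a service name to the endpoints ("ip:port") delivering it.
type Registry struct {
	mu        sync.Mutex
	endpoints map[string][]string
	next      map[string]int // round-robin cursor per service
}

func NewRegistry() *Registry {
	return &Registry{endpoints: map[string][]string{}, next: map[string]int{}}
}

// Register records endpoints for a service (service discovery side).
func (r *Registry) Register(service string, eps ...string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.endpoints[service] = append(r.endpoints[service], eps...)
}

// Pick returns the next endpoint for a service in round-robin order
// (load balancing side), or an error if nothing is registered.
func (r *Registry) Pick(service string) (string, error) {
	r.mu.Lock()
	defer r.mu.Unlock()
	eps := r.endpoints[service]
	if len(eps) == 0 {
		return "", fmt.Errorf("no endpoints for service %q", service)
	}
	i := r.next[service] % len(eps)
	r.next[service] = i + 1
	return eps[i], nil
}

func main() {
	reg := NewRegistry()
	reg.Register("redis", "10.0.0.5:6379", "10.0.0.6:6379")
	for i := 0; i < 3; i++ {
		ep, _ := reg.Pick("redis")
		fmt.Println(ep) // 10.0.0.5, then 10.0.0.6, then 10.0.0.5 again
	}
}
```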
Encryption is when you want to encrypt your traffic on the way out so that you are sure nobody can inspect it. It’s quite important nowadays, because more and more people are deploying their applications on premises and/or on multiple clouds. As soon as you cross the internet, as soon as you are on an untrusted network, you want to encrypt everything leaving your box.
For routing, we are speaking about HTTP routing, not Layer 3 routing, BGP, etc. The idea is that in service-to-service communication, when some of your services don’t really know where to find the other ones, they talk to HAProxy, and HAProxy says, “For this URL path, or whatever, I want to go to this service.” That’s what we call HTTP routing.
Authentication is when you want to verify who people are, so that you can trust them, and so that they can trust who you are.
Authorization would be firewalling: allowing or denying service-to-service communication.
Circuit breaking is the function that checks whether a service is available or not, so you can make routing decisions based on that, for example, based on the state of each node in the same pool.
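The HTTP routing idea ("for this URL path, go to this service") can be sketched as a longest-prefix match. This is a simplified model, not HAProxy's implementation: in HAProxy itself you would typically write ordered `use_backend <backend> if { path_beg <prefix> }` rules, where the first match wins. The `Route` type, `routeFor` function and service names are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// Route maps a URL path prefix to a destination service name.
type Route struct {
	Prefix  string
	Service string
}

// routeFor returns the service for the longest matching prefix, so
// "/api/users" wins over "/api" when both match; otherwise the fallback.
// Plain string prefixes mean "/api" also matches "/apix"; fine for a sketch.
func routeFor(routes []Route, path, fallback string) string {
	best, bestLen := fallback, -1
	for _, r := range routes {
		if strings.HasPrefix(path, r.Prefix) && len(r.Prefix) > bestLen {
			best, bestLen = r.Service, len(r.Prefix)
		}
	}
	return best
}

func main() {
	routes := []Route{{"/api", "api"}, {"/api/users", "users"}, {"/static", "cdn"}}
	fmt.Println(routeFor(routes, "/api/users/42", "www")) // users
	fmt.Println(routeFor(routes, "/about", "www"))         // www
}
```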
From an infrastructure point of view, you have 3 main components. The first is your central service registry, which houses your configuration and ensures the consistency of your global configuration. The other two run on each of your machines. By machine I mean a box where you have an application: that could be a VM, a bare-metal server, a container, a pod in Kubernetes, etc.
Then you have the registry agent that runs inside your box or machine. The purpose of the agent is to be a gateway between the sidecar proxy and the central server. It also manages the configuration of the local proxy, so you have to tell it which local service it’s supposed to expose and which upstream services that local service is supposed to connect to.
Then we have the sidecar proxy. The sidecar proxy is a software load balancer. It ensures that the ingress traffic is trusted and can reach the local service, and it also does service discovery, load balancing, etc., of the upstream services that this local service wants to get connected to.
» What happens with the service mesh
Now, what happens? I have a client who wants to use my Service A (that could be anything). This client connects to an external IP that I call “EIP,” which is hosted by the sidecar proxy. The sidecar proxy does what it has to do (mutual TLS authentication, etc.; we’ll define this a bit later). If the traffic is allowed to come in, it forwards it to the local application. When your application answers, the answer goes back through the sidecar proxy.
If your local Application A wants to talk to Service B, hosted on another pod, container, VM, whatever, it talks to the loopback interface, where your sidecar proxy listens for the traffic. Then your sidecar proxy connects to the other sidecar proxy, doing the mutual TLS authentication, and then the other sidecar proxy lets the traffic into its local service as well.
Service A could be a Node.js application. Service B could be Redis, a database, whatever you want.
» Advantages of the service mesh
First, since we want to split the business application from network functions, proxies are the natural place to manage the network. That’s why the sidecar proxy makes sense here.
The idea is that all your service-to-service communication becomes very simple. You just rely on the sidecar proxy, which does everything for you.
Proxies also come with built-in support for most of the requirements we saw on the first slide: HTTP routing, TLS handshaking, load balancing, etc.
Also, last point, which is very important, you can let your application focus on your business and not on your network.
» Disadvantages of the service mesh
There are a few cons. It can look complicated to set up. As soon as we mention TLS, client certificates, etc., people sometimes go crazy: “It’s not going to work. I’m going to do insecure stuff everywhere, allowing all that stuff.” It may be scary at the beginning.
Another point is that you have to inject a sidecar proxy plus a controller, etc., into each machine running your application. It could be too complicated to manage, to deploy, to monitor, etc. So that’s another problem.
Some people say that a service mesh allows you to do service-to-service communication without changing a single line of your code. I would like to see that. In my case, it worked because I could point the DNS name of the upstream service to the loopback address, where the sidecar proxy listens. But in some other cases, I’m pretty sure there will be some problems.
» Building the mesh with Consul
Now, we want to build a service mesh using Consul. The Consul feature that allows it to build a service mesh is named Connect. What I like in Consul is its Unix-like philosophy: do one thing, and do it well. Well, it does more than one thing, but it focuses on service discovery and service registry. It has a simple GUI to see everything. The same binary can be used as both a server and a client, and it can embed a sidecar proxy. And it’s API-driven, which we like, because we can integrate with it easily.
Now, how does authentication work within Consul? First, it’s a question of trust. Services must be able to ensure that the service they are trying to reach is really the service it claims to be, and the server-side service wants to ensure that the client really belongs to the same organization. If mutual trust (authentication) can’t be established, then of course the connection is denied, and you won’t be able to connect one service to another.
» How Consul helps
This could be complicated, but Consul fixes that for you. Consul builds a dedicated CA for you and generates the certificates. You can also plug Consul into Vault, so you can use your own CA; Consul will just use it and expose the certificates to the sidecar proxy. We’ll see that a bit later.
Authentication is mandatory between the sidecar proxies. You can also use authentication on the external IP, and we’ll see later that we can do customized configuration, so we can create multiple binds for the external traffic: some untrusted, with no authentication, and some trusted, with strong authentication, client certificates, etc.
Of course, if you open a service on the internet using that, you don’t want to force all internet users to have a certificate generated by your own CA. That’s not doable.
The authorization piece, once again, happens between the 2 sidecar proxies. Your Service A proxy connects to your Service B proxy, and mutual authentication happens. Everybody’s happy with that. Then the Service B proxy takes the SPIFFE string from the client certificate. The client certificate is a specially crafted X.509 certificate that carries a string with the service name, some kind of service ID, etc.
Your Service B proxy sends this string to your local Consul agent, and the agent returns 200 or 401, depending on whether Service A is allowed to talk to Service B. This happens for each incoming request, each incoming connection. That means that if you change your rule, the next connection will be allowed or denied based on the change you made.
In Consul, these rules are called “intentions.” We’ll see them in the GUI right after.
» Configuring the proxy
Then we have the proxy configuration. Since we must start a sidecar proxy, we must give it some information about how to configure itself to load balance the local service. So the sidecar proxy is given a proxy ID and a token; when the sidecar proxy contacts the local Consul agent with its proxy ID, the agent can deliver the configuration the proxy is supposed to apply.
We also have access to the list of upstream services. Remember the previous diagram: for Service A, the list says, “This local service is supposed to connect to Service B.” For Service B, there is no upstream, so the list is empty and there is no configuration for that part.
And then of course, the local proxy gets access to the root certificate of your CA and also to the TLS certificate used by the local service. This TLS certificate has 2 purposes: it’s the server-side certificate used to encrypt and decrypt traffic with clients, and the client-side certificate used to connect to remote services.
Because remember, in that client-side certificate, we have the SPIFFE string that will be used to do the authorization part.
Now a quick deep dive, in case you want to write your own software on top of Consul. In the Consul API, you have what we call the “target service name.” This is the name of the local service that we are going to load balance and give access to. In my previous example, that would be Service A. Then you have the service address: the external IP address you want to use to expose this service to the other nodes. This is the IP address that Consul uses in the node list for this particular service.
Then there is the certificate I mentioned before, which Consul calls a leaf certificate: the one used on both the client side and the server side. The local service address is where your proxy is supposed to route traffic to reach the local application, and the bind address and bind port are where the service is exposed to the other servers.
» Connecting upstream
Then we have the upstreams. The upstreams are a list (a slice, in Go) of the services that this local service is supposed to connect to. For each upstream, we have the service name, called “destination name” in the API, and the IP and port to bind locally. For example, in my previous example, my Node.js server would find my Redis server on loopback port 6379.
Here is a quick JSON piece describing my Service A. This is a Node.js server that binds on an IP address starting with 192, on port 8000. If I connect on port 8000 without the client certificate, HAProxy denies me. I need the client certificate to connect to it.
Then we can see a list of upstreams. There is only 1 for now: Redis. So Service B, in my previous diagram, would be that Redis service. We see the local address where we want to bind. There is no destination address, because Redis is simply registered as a service in Consul. To know where to connect, you go back to the Consul API and say, “Give me the endpoints of the nodes for the Redis service.” You get them, and you can route the traffic there.
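The point that the service definition names upstreams but never says where they live can be shown with a small sketch. It decodes a hypothetical service definition (the JSON keys are assumptions modeled on the talk, not Consul's exact schema) and resolves each upstream through a `lookup` function that stands in for a Consul catalog or health query; the endpoint addresses are made up.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Upstream is one entry in the upstream list (assumed field names).
type Upstream struct {
	DestinationName string `json:"destination_name"`
	LocalBindPort   int    `json:"local_bind_port"`
}

// ServiceDef is a hypothetical, simplified service definition.
type ServiceDef struct {
	Name      string     `json:"name"`
	Address   string     `json:"address"`
	Port      int        `json:"port"`
	Upstreams []Upstream `json:"upstreams"`
}

// resolve maps each local bind port to the endpoints of its upstream.
// The definition has no destination address, so every lookup goes back
// to the registry (here, the injected lookup function).
func resolve(def ServiceDef, lookup func(string) []string) map[int][]string {
	plan := map[int][]string{}
	for _, u := range def.Upstreams {
		plan[u.LocalBindPort] = lookup(u.DestinationName)
	}
	return plan
}

func main() {
	raw := `{"name":"www","address":"192.168.1.10","port":8000,
	         "upstreams":[{"destination_name":"redis","local_bind_port":6379}]}`
	var def ServiceDef
	if err := json.Unmarshal([]byte(raw), &def); err != nil {
		panic(err)
	}
	// Pretend catalog: the Redis sidecar's external (mTLS) address.
	catalog := map[string][]string{"redis": {"192.168.1.20:8001"}}
	lookup := func(name string) []string { return catalog[name] }
	fmt.Println(resolve(def, lookup)) // map[6379:[192.168.1.20:8001]]
}
```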
» Simple to use and built into Consul
So Consul Connect, as I said earlier, comes with a built-in proxy. The advantage is that it’s very simple to use: you don’t need to install anything, since it’s already built into your client. That said, its purpose is to be there and to be very simple to use, and it has no other features. So basically, if you want to go into production with it, you can. But you will see later that HAProxy has some advantages.
Now, I guess some people here already know HAProxy and may wonder how we can make this happen. Here is the answer. HAProxy, once again, follows the Unix philosophy by itself. It’s a simple load balancer, I would say, which is actually very, very powerful. And the most Unix-like approach is to take 2 dedicated pieces of software and build something great with them, which here is our service mesh.
As you may know, HAProxy is pretty static. We are making it more and more dynamic, but HAProxy itself can’t connect to Consul to configure itself. So what we do is integrate Consul and HAProxy using a small Go binary that I wrote. That Go binary is what I call “the controller.”
» The role of the controller
If you remember the JSON file, it was configured to call a daemon named “HAProxy controller.” Of course, Consul won’t manage HAProxy directly. Consul manages the controller, and the controller manages HAProxy.
» How it works
What is the traffic flow now? Something a bit more fancy. When a connection comes in on the external IP, if it is on the secured port, HAProxy does the TLS handshake with the client and checks that the client is trusted, etc. Then it takes the SPIFFE string and sends a request to your local Consul agent to ensure that this client is allowed to connect. If it’s allowed, the traffic is forwarded to your Node.js application. The Node.js application needs to talk to Redis. Where Redis is, it doesn’t know, but it has to talk to it.
Then the Redis connection is forwarded to your local HAProxy. In that case, the authorization happens on the remote side. Your local HAProxy connects to your remote HAProxy with mutual TLS authentication, and everything is fine. Then your remote HAProxy takes the SPIFFE string, with the service name of Service A, and checks it with its local Consul agent. The Consul agent returns 200 or 401. If it’s 200, HAProxy lets the traffic in.
What happens at startup? When everything is down, you start everything and then you have some kind of convergence. Your Consul server gets all the services registered, and your HAProxy controller connects to your local Consul agent and gets its configuration from it. Each time there is a change on the Consul agent, the controller replicates the change into the HAProxy configuration. And each time there is a change in your local service or in your upstreams, the agent replicates these changes too.
» Consul can update you on hash changes
In Go, this translates into an SDK: we use the Consul client SDK, which talks to the Consul agent. We use it to get information from Consul, and we also use it for what we call “blocking queries.” A blocking query means that you have a hash (or an index) corresponding to the piece you are monitoring, in other words to your current configuration. You can say to Consul, “Update me as soon as the configuration has changed from this hash.” Your request is then blocked by Consul until there is a change, and when a change occurs, you get the new configuration with the new hash. Then you apply the configuration, open a new connection to Consul, and say, “Hey, now I have a new hash; just update me when this hash has changed.”
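The blocking-query loop described above can be sketched in a few lines. The sketch assumes Consul's hash-based blocking convention as we understand it (a `hash` query parameter plus a `wait` timeout, with the new hash returned in an `X-Consul-ContentHash` header); check the docs for your Consul version, and note that the example endpoint path is illustrative. The HTTP call itself is abstracted behind a `fetch` function so the loop logic runs without a Consul agent.

```go
package main

import (
	"fmt"
	"net/url"
)

// blockingURL builds a hash-based blocking query URL (assumed convention).
func blockingURL(base, hash, wait string) string {
	if hash == "" {
		return base // first call: no hash yet, return immediately
	}
	q := url.Values{}
	q.Set("hash", hash)
	q.Set("wait", wait)
	return base + "?" + q.Encode()
}

// watch calls fetch with the last seen hash. fetch is expected to block
// until the content changes or the wait times out (same hash returned).
// apply runs only when the hash actually changed. A real controller would
// loop forever; rounds bounds it for the demo.
func watch(fetch func(hash string) (cfg, newHash string, err error),
	apply func(cfg string), rounds int) error {
	hash := ""
	for i := 0; i < rounds; i++ {
		cfg, newHash, err := fetch(hash)
		if err != nil {
			return err
		}
		if newHash != hash {
			apply(cfg)
			hash = newHash
		}
	}
	return nil
}

func main() {
	fmt.Println(blockingURL("http://127.0.0.1:8500/v1/agent/service/www-proxy", "abc", "5m"))
	// http://127.0.0.1:8500/v1/agent/service/www-proxy?hash=abc&wait=5m

	versions := []string{"v1", "v1", "v2"} // second call simulates a timeout
	i := 0
	fetch := func(hash string) (string, string, error) {
		v := versions[i]
		i++
		return "config " + v, v, nil
	}
	_ = watch(fetch, func(cfg string) { fmt.Println("apply:", cfg) }, 3)
	// apply: config v1
	// apply: config v2
}
```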
» The HAProxy client SDK
And then we have the HAProxy client SDK library. At this point, you may ask, “Where is it hosted on GitHub?” Basically, it’s sitting in GitHub for us, but it will also be ready for XXX. The idea is that our client SDK lets you manipulate HAProxy without even knowing HAProxy. For example, when you want to create an upstream, you create a backend. Then, within that backend, you create servers, which are the nodes of your services. You just call those functions from the SDK, and the magic happens for you.
It’s the same when you want to expose your local service on the external IP. You use the create-frontend function, create a listener with the settings you want, and the magic happens.
Part of this development will be released to the community by the end of the year. We are also building what we call the HAProxy controller, or the agent controller. That will be a smart binary in Go, and its code will also be released in Q1 next year. The idea is that our controller will know whether a change can be applied on the fly. If it can, we apply it on the fly. If a change requires reloading HAProxy, then we reload HAProxy.
All of this logic will be hidden from you and you don’t have to bother with it.
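One way a controller could decide between "apply on the fly" and "reload" is sketched below. This is our own assumption, not the controller's confirmed design: it supposes each backend is written with a fixed number of pre-allocated server slots, so server address changes within those slots can go through HAProxy's Runtime API (for example `set server <backend>/<server> addr <ip> port <port>`), while adding or removing backends, or outgrowing the slots, requires a reload. `Backend` and `plan` are hypothetical names.

```go
package main

import (
	"fmt"
	"sort"
)

// Backend is the controller's view of one HAProxy backend (hypothetical).
type Backend struct {
	Servers []string // "ip:port" of each active server
	Slots   int      // server lines pre-allocated in the config file
}

// plan returns the backends that can be updated live through the Runtime
// API, or reload=true if the config file structure itself must change.
func plan(cur, next map[string]Backend) (live []string, reload bool) {
	for name := range cur {
		if _, ok := next[name]; !ok {
			return nil, true // backend removed
		}
	}
	for name, nb := range next {
		cb, ok := cur[name]
		if !ok || len(nb.Servers) > cb.Slots {
			return nil, true // new backend, or not enough server slots
		}
		if !equal(cb.Servers, nb.Servers) {
			live = append(live, name)
		}
	}
	sort.Strings(live) // deterministic order for logging and tests
	return live, false
}

func equal(a, b []string) bool {
	if len(a) != len(b) {
		return false
	}
	for i := range a {
		if a[i] != b[i] {
			return false
		}
	}
	return true
}

func main() {
	cur := map[string]Backend{"redis": {Servers: []string{"10.0.0.5:21000"}, Slots: 4}}
	next := map[string]Backend{"redis": {Servers: []string{"10.0.0.5:21000", "10.0.0.6:21000"}, Slots: 4}}
	live, reload := plan(cur, next)
	fmt.Println(live, reload) // [redis] false
}
```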
Now we have to do the authorization. Authorization in HAProxy is done in Lua. Lua lets you execute your own code within HAProxy, and you can do a lot of things with it. If you are familiar with HAProxy, there are what we call http-request rules, which let you take actions based on whatever is happening.
» Allowed or denied
What I do here is use either an http-request or a tcp-request rule, depending on the protocol being load balanced: TCP for Redis, HTTP for Node.js, or whatever the application is. This Lua code asks HAProxy, “Hey, give me the TLS client certificate.” From that, it extracts the SPIFFE string, then it connects to Consul and gets the response corresponding to that SPIFFE string: “allowed” or “denied.” If it’s “allowed,” it lets the connection in. If it’s “denied,” the connection is denied. As simple as that.
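The talk's authorization check runs as Lua inside HAProxy; to keep one language here, the same request/decision logic is sketched in Go. It assumes the local agent exposes an authorize endpoint at `/v1/agent/connect/authorize` taking `Target`, `ClientCertURI` and `ClientCertSerial` and returning `Authorized` and `Reason`; that is our reading of Consul's agent API of that era and should be checked against your version. The decision fails closed: a non-200 status (such as the 401 mentioned in the talk) or an unreadable body denies the connection.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// Request and response bodies (assumed field names, see lead-in).
type authorizeRequest struct {
	Target           string
	ClientCertURI    string
	ClientCertSerial string
}

type authorizeResponse struct {
	Authorized bool
	Reason     string
}

// newAuthorizeRequest builds the POST the sidecar sends to its local agent.
func newAuthorizeRequest(agent, target, spiffeURI, serial string) (*http.Request, error) {
	body, err := json.Marshal(authorizeRequest{target, spiffeURI, serial})
	if err != nil {
		return nil, err
	}
	return http.NewRequest(http.MethodPost, agent+"/v1/agent/connect/authorize", bytes.NewReader(body))
}

// decide fails closed: anything but a 200 with Authorized=true denies.
func decide(status int, body []byte) (bool, string) {
	if status != http.StatusOK {
		return false, fmt.Sprintf("agent returned HTTP %d", status)
	}
	var r authorizeResponse
	if err := json.Unmarshal(body, &r); err != nil {
		return false, "unreadable agent response"
	}
	return r.Authorized, r.Reason
}

func main() {
	req, _ := newAuthorizeRequest("http://127.0.0.1:8500", "redis",
		"spiffe://td.consul/ns/default/dc/dc1/svc/www", "0a:1b")
	fmt.Println(req.Method, req.URL.Path) // POST /v1/agent/connect/authorize
	ok, reason := decide(200, []byte(`{"Authorized":false,"Reason":"example"}`))
	fmt.Println(ok, reason) // false example
}
```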
» On screen: The Consul GUI from the service mesh
Now, are you familiar with this? I guess so; otherwise you would not be here. This is a small extract of the Consul GUI from my wonderful service mesh on my laptop. We see that we have the Consul server itself, a Redis service, and a WWW service. Each time we create a sidecar proxy, you get a dedicated service called “<service name>-proxy” for that particular service. So we see the sidecar proxy for Redis and the sidecar proxy for WWW, which is my web application.
Now, in the Consul GUI, you also have what we call “Intentions.” Intentions, remember, are that firewall rule set, which lets you allow service-to-service communication. It’s highly recommended to set your Consul policy to “deny by default,” because if you don’t, every service is allowed to talk to every other service, and you have to manage the intentions yourself, saying, “This to this, no,” “This to this, no,” etc. Even with wildcards, it may be painful, because each time you create a service, you have to create the intention. In my case, I don’t care: I block everything by default and only open what I need. That said, opening everything by default is good for testing that everything works, and we’ll see that HAProxy helps you troubleshoot that.
So in my case, I have created an intention that says, “Service WWW is allowed to talk to service Redis.” Basically, when a request comes into HAProxy, my Lua script queries my Consul agent, and the Consul agent checks the rule set and says, “I have a rule that matches WWW to Redis. Allow.”
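A simplified model of intention evaluation with a deny-by-default policy is sketched below. It assumes the precedence rule that exact names beat wildcards, with the destination weighted above the source; Consul's real evaluation handles more cases, so treat `Intention`, `specificity` and `authorized` as an illustration of the idea, not Consul's code.

```go
package main

import "fmt"

// Intention is a simplified rule: service names, or "*" as a wildcard.
type Intention struct {
	Source, Destination string
	Allow               bool
}

// specificity ranks exact names above wildcards, destination first.
func specificity(in Intention) int {
	s := 0
	if in.Destination != "*" {
		s += 2
	}
	if in.Source != "*" {
		s++
	}
	return s
}

// authorized returns the action of the most specific matching intention,
// or defaultAllow when nothing matches (deny by default: defaultAllow=false).
func authorized(rules []Intention, src, dst string, defaultAllow bool) bool {
	best, result := -1, defaultAllow
	for _, r := range rules {
		srcOK := r.Source == src || r.Source == "*"
		dstOK := r.Destination == dst || r.Destination == "*"
		if srcOK && dstOK {
			if sp := specificity(r); sp > best {
				best, result = sp, r.Allow
			}
		}
	}
	return result
}

func main() {
	rules := []Intention{{Source: "www", Destination: "redis", Allow: true}}
	fmt.Println(authorized(rules, "www", "redis", false)) // true
	fmt.Println(authorized(nil, "www", "redis", false))   // false: deny by default
}
```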
» The service mesh demo
It’s time for the demo. So, that’s my service list. On the right, I have my wonderful curl command, which I use to send some HTTP GET requests. I connect to a specific port, port 8002. If you remember my JSON file, I said that 8000 would be the secured port. So if I go to port 8000 without my certificate, even over SSL, I will be denied by HAProxy because I don’t have the SSL client certificate. HAProxy will log something to say, “Hey, somebody came, TLS unchecked,” etc.
So let’s go back to port 8002. I still get a good answer. Here are my intentions. I have an intention that says, “WWW is allowed to talk to Redis.” Nice. I want to break that. Save. OK. And now, my Node.js application returns a 500. What happened is that my Node.js application connected to the local proxy. The local proxy connected to the other proxy, TLS authentication, etc. The other proxy said, “Hey, Consul, tell me if WWW is allowed to talk to Redis.” Obviously not. So my Node.js application just returned a 500 saying, “Well, I can’t connect to Redis.” That’s the point.
We can also delete it, just to show you why it’s important to deny by default. Of course, if I delete it, I’m still not allowed. To make it work again, we could add an intention for either “everything to Redis” or “WWW to Redis.”
I forgot to show you the HAProxy stats page, but that’s fine. The idea is that for each running sidecar, we can expose statistics, etc. On the stats page of the WWW service sidecar, you would see the WWW service frontend and backend, plus the Redis frontend and backend. And on the Redis HAProxy sidecar, you would see a single frontend and backend for the Redis service itself. I purposely chose a TCP-based service, because HAProxy is very famous for HTTP, but now you know it works on TCP as well, on raw TCP.
Now, I said earlier that HAProxy would be better than Consul’s built-in proxy. Why? First, we can do L7 routing, which is not available in the built-in proxy. Second, we have the best SSL/TLS stack (and that’s not us saying it; that’s the guy from Google who maintains this page), so you benefit from perfect forward secrecy and all that stuff out of the box with HAProxy, with almost nothing to configure.
From an observability point of view, HAProxy is very famous for exporting a lot of statistics that you can expose in Prometheus, etc. It’s also very famous for verbose logging, and we’ll see that in the next slide. Something else very important: we think that most of the services you’re using today have an HAProxy running somewhere.
» Versatile HAProxy
We can do rate limiting, we can enable a WAF, we can do a lot of things with HAProxy that you can’t do with the built-in proxy. Something else that’s interesting is that HAProxy can use what we call an “agent check,” which connects to the other proxy to get the load it is handling, so the other proxy can report back to the upstream proxy, or to the downstream one in that case.
Now, the advantages. I said earlier that we can be verbose at logging. Here we have 3 log lines; HAProxy just logs everything. We have the SPIFFE string, so if one of your customers complains, or if one of your services can’t connect to another one (maybe the client certificate is not right, etc.), you have the SPIFFE string that you can check within your infrastructure to ensure it’s the right one. You have a status showing whether the connection was “allowed” or “denied.” All of this comes from the HAProxy logs, out of the box. And last is the reason why the connection was denied.
Remember, I did 3 tests: one without any rule, one with a “deny” rule, and one with an “allow” rule. The first line corresponds to the default behavior, when my default ACL policy is deny. And this is what Consul reported to us.
The second one is when there was a deny rule purposely for WWW to Redis. You even have the rule number. Remember, in my GUI, I also have the rule number, so you can match it quite quickly. If there is something happening, you have an answer.
I did a talk last week in Paris about HAProxy observability, and 1 hour was not enough to explain all the things we can do, so I won’t be able to cover it here. But we have a lot of things that will be useful for you.
» Yet to come
What’s next? This is the last slide. I’ve seen a lot of traction around Consul Connect elsewhere, so I would like to see a Consul Connect HAProxy integration at some point. Of course, it’s up to us to release the Go SDK. This will happen later, hopefully, and we’re going to work together with HashiCorp to make that happen.
We want to configure telemetry as well. We did not talk about that, but telemetry is a Prometheus piece. It’s available in the API, and we can make this happen as well. A more turnkey solution would be a combined Consul Connect and HAProxy package.
Something I did not mention is what I liked in this solution. I had never tried Istio, and after 1 day of trying to make it work, following step by step what is written on the GitHub account, I just said, “Well, OK. Service mesh, we’ll see later.”
And then I saw the announcement of Consul, and since I know Consul is simple to deploy, I said, “Well, I will spend half a day as well on this one,” and it worked. I first did an implementation using bash. It was very simple to set up.
What I also liked is that you can put your own entries into the JSON file. I did not show you that, but, for example, for my Node.js application I said, “You can enable the HTTP filtering mechanism within HAProxy, because we know this protocol will be HTTP.” For the Redis piece I said, “I want HAProxy to run a Layer 7 Redis check,” and, “This application is pure TCP. Don’t enable HTTP filtering, etc.”
You can do whatever you want, and then, based on the keywords you put into your configuration, apply the relevant pieces to your HAProxy configuration. One keyword could be “enable WAF.” Then in your XXX file, you can just put the configuration that enables the WAF.
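Mapping custom entries from the service definition to HAProxy settings could look like the sketch below. The keys (`protocol`, `check`, `enable_waf`) are hypothetical; `mode http`, `mode tcp` and `option redis-check` are real HAProxy backend directives. The WAF case only emits a comment marker, because the talk leaves the actual WAF configuration unspecified.

```go
package main

import "fmt"

// directives turns free-form keys from the service definition's proxy
// config into HAProxy backend lines. Key names are hypothetical.
func directives(cfg map[string]string) []string {
	var out []string
	switch cfg["protocol"] {
	case "http":
		out = append(out, "mode http") // enables HTTP processing in HAProxy
	default:
		out = append(out, "mode tcp") // pure TCP: no HTTP filtering
	}
	if cfg["check"] == "redis" {
		out = append(out, "option redis-check") // Layer 7 Redis health check
	}
	if cfg["enable_waf"] == "true" {
		// WAF setup is product-specific; left as a marker for the controller.
		out = append(out, "# WAF configuration goes here")
	}
	return out
}

func main() {
	fmt.Println(directives(map[string]string{"protocol": "tcp", "check": "redis"}))
	// [mode tcp option redis-check]
}
```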