Presentation

Fabio: A Stateless Load Balancer

Fabio makes it trivial to deploy and refactor microservices, by providing a stateless and high-performance load balancer based on the services registered in HashiCorp Consul. Every service change automatically reconfigures Fabio without restart and without noticeable latency.

Fabio solves the problem of managing an ever-changing landscape of small services in a continuous deployment and delivery pipeline by automating the configuration of a critical component: the load balancer. Written in Go and deployed as a static binary which can be run without any configuration, Fabio applies HashiCorp's approach of simple but powerful tools.

Fabio is currently driving some of the Netherlands' largest websites, including marktplaats.nl.

Speaker

  • Frank Schröder
    Software Engineer

Transcript

Okay, good afternoon everybody. My name is Frank Schröder. I work for the eBay Classifieds Group here in Amsterdam, and today I want to talk to you about Fabio, a fast, modern HTTP router I've written in Go.

So I think the first question that you could ask yourself is: why would anybody in 2015 or 2016 write a new HTTP load balancer if we already have a ----load of them out there? I mean, we have nginx, we have Apache, we have Squid, Varnish. Those are the battle-tested tools that have been around for a long time, and they have ----loads of features. We have expensive appliances that we can use in our corporate environments, like the BIG-IP and the NetScaler, and there are also other tools out there, like Traefik and Vulcand, which are more in the modern HTTP load balancer space.

So the primary reason is that about a year ago we started looking into Consul for service discovery. I've been working with Go for the last four and a half years, and I noticed that within a couple of lines of code (the initial prototype was something like 350 lines of code) I was able to do something useful, and it was scaling quite well. So the short answer is: I wrote a load balancer because I could, and it turned out to be useful.

So why would we want to have another load balancer in the first place? Because, I mean, these tools that have been around for a long time are, as I said, proven and battle-tested. I think our application landscape has been changing quite significantly over the last couple of years. We're looking at microservices, agile development, constant refactoring, and continuous integration and delivery. If you're running in a cloud where you have to pay per minute or per hour, you're looking at auto scaling, you're looking at cost and, most of all, at maintenance time: how long does it take you to actually manage all these different infrastructures and environments? So if you start out with microservices with
static routing, where you have an HAProxy load balancer, in essence you look at something like this: you have individual services which each serve a different endpoint (/user, /product, /order), and you have a static configuration file that sits in your Apache or nginx, which then routes incoming traffic to your back-end services.

If you then take the step to service registration, because you are sitting in a more dynamic environment, you can simplify this a little bit, because your services are now being registered in Consul. You can also use tools like etcd or ZooKeeper or any other environment; this isn't specific to Consul. Then you can use a tool like Consul Template to extract that information and build a configuration file for a load balancer like HAProxy or Apache.

However, what you've done now is that you've just moved the vital configuration, which is essential to run your application, from the load balancer itself to Consul Template. So now it becomes important that when you upgrade the template, you actually do this in sync with a production deployment, because you still have one additional moving part, in addition to your application, which is required to run your application: this specific piece of configuration.

The interesting thing is that services already know which routes they accept, because you have to write a handler for them. Somewhere there is a servlet, a handler function, something that accepts a request for /user or /product. So your application already knows which routes it accepts, because you have to write code for that. So if we were able to push these routes into the service registry, along with the host and port information that the service is registering itself, then we would have all the
information that's necessary to build a routing table. And in short, that's all that Fabio does. I could stop the presentation right here, because this is the main feature of Fabio: the service tells the service registry, "I'm listening on this IP address and this port, and I'm serving /user and /product," and then Fabio just extracts this information from Consul and builds a routing table. That's all you have to do.

Consul unfortunately still doesn't support free-form metadata along with the service registration, so we have to use the somewhat, I think nowadays accepted, hack of putting stuff into the tags. This means that in order to use Fabio you just add a tag that says urlprefix- and then you put the host and the path that your application is serving in there. This is enough for Fabio to detect which routes your service actually expects.

The urlprefix tag hack has a couple of advantages. For one, it's an atomic update with the service registration. If you used the KV store, you would have to do two calls, and it would not be simple for third-party applications like Registrator to use the same thing, because they would have to simulate that behavior. There are also no lingering elements in the KV store that you would have to clean up. So this all makes it easier.

So what does that look like? You have your three services, let's say service A, service B, and service C, which are all running on different IP addresses and different ports, and each of them is registering one tag. Fabio then just extracts this information, in essence flips it around, and generates a simple domain-specific language which says: route add, the host and path, and then the IP address.

Okay, so let's demo this and see what it looks like. First, I have a Consul instance running, which I can
show you here. So Consul is running, and there is nothing here. So let's start a service. In this case I'm calling it service A, which is listening on localhost port 5000, and it announces the prefix /foo. The service is now registered in Consul, and we can see that it has a green health check, which is also something that Fabio needs: it will only include services that are considered working in order to build a routing table.

So now we can start up Fabio, and as you've seen, I didn't run it with any configuration. I just run the binary. It has sane defaults, which means it assumes that it runs on a machine which has a local Consul agent running as well. It has detected the configuration change in Consul and it has built a routing table based on that. So we should now be able to query this endpoint. Let's see. And we're getting a response from service A.

Now let's start another service, which also registers the prefix /foo and also the prefix /bar, and let's call this service B. Why this is something that could be useful I'm going to explain in a minute. This one is running on a different port. We can see that Fabio has detected a routing table change, and there wasn't really any noticeable latency here. When I query the service now, I sometimes get service A and sometimes service B. The reason it's not flip-flopping exactly one by one is that by default it's using a random distribution and not round-robin. You can also tell it to do exact round-robin between the different instances, but this way it's sometimes the one and sometimes the other.

If I want to look at the routing table that Fabio has built, there is also a user interface, which you can see here. We can see that we have service B listed twice and service A listed once, because service B and service A both announced the path /foo and
service B is the only one which announces /bar. The other thing that we can see is that both service A and service B are now getting 50% of the traffic for requests to the endpoint /foo.

Okay, so let's go back here. The config language that Fabio generates looks something like this. I created this in two colors; I hope nobody's red-green blind here. The first three lines are green and the bottom two lines are red. What I want to show with this is that the first three lines are the stuff that is auto-generated.

But what happens if service A is actually announcing a route or a path that it shouldn't have? If you have a load balancer with an automatically generated routing table, the only way to fix it is to deploy a new version of that service that announces the correct routes. Depending on how quickly your pipeline runs, this can take seconds, minutes, or hours. In most cases it is probably way too long for this to be usable. So you need a way to make manual changes to fix broken deployments, and this is what the two red lines are there for. You can amend the routing table with some manual commands in the same language. That's also the main reason why I opted to use something that is human-readable, because it is actually humans who would have to write these kinds of exceptions, so that you can fix a broken routing table until your next deployment is properly done.

So, as I said, the manual overrides, as I call them, are mostly there to fix broken deployments, to make small tweaks until you've properly fixed your services. But there's also another feature in there which allows you to do dynamic traffic shaping. This means that you can match against a certain set of services, let's say services that register an additional tag, say red or green, or a date, and then you can
say, "Please route a certain percentage of the traffic to this set of services." What Fabio does is route this percentage of traffic to these instances, independent of how many instances you actually have running. So if you say, "Please route 5% of traffic to the new version," it will route 5% of traffic to the new version independent of whether you have one or ten instances of it running. You don't have to do any weight calculation, working out how much weight you have to assign to this specific instance or this specific route. Fabio will do that all automatically for you.

So what do we do with Fabio? I started out writing a list of the things that we have done most recently, both in my team and in another team that's also using Fabio within eBay Classifieds. But for the most part, if I really think about what we're doing with Fabio, it is that we forget that it exists. Now really think about that: how many pieces of your infrastructure do you have that are not constantly reminding you that they're still alive? In our case we have an ActiveMQ which is acting up, we have MySQL databases, we have file systems, we have load balancers which need to be reconfigured. With Fabio, every once in a while when something is happening, we actually have to remind ourselves: oh, there is this single piece there that's also in the hot path, where every single request of our platform runs through. But it's just completely invisible. It just runs, it's there, it works. There isn't really anything that we have to do to maintain it.

With this approach, which means that there is a load balancer which you don't have to configure anymore at all, ever, adding new services becomes trivial, because the only thing that you have to do is start up your new service. So if you want to add a /product endpoint or a /support endpoint, the only thing
you have to do is start up the service. You start up the application, it registers itself in Consul, Fabio picks it up, and you can immediately access it. As soon as you shut the instance down, the route will no longer be accessible, Fabio will report a 404, and that's the end of it. And this works in every environment: it works on my laptop, it works in our virtualized environments, it works in the cloud environment, it works in the physical environment. It works in any environment because there isn't really anything special that happens there.

The other thing that we're doing with this is refactorings. Remember when, during the demo, I started two services, service A and service B, which were both announcing the same path. In a normal environment, why would you want to do this? You'd have a user service that's serving user requests and a product service that's serving product requests. But in our case we have a service which is our old search front end, and we want to migrate this slowly to a new search front end. We don't want to do this in some kind of big-bang migration. It has to be something like a drop-in, seamless replacement for all the requests that we're doing, and we're doing quite a number of requests there. So we just run the new implementation next to the old implementation, they're both announcing the same endpoint, and we can then use the dynamic traffic shaping to say, "Okay, let's put 5% of traffic there, let's put 10% of traffic there, let's see whether it works. Oh, it didn't work, so let's shut it down again." We don't have to do anything special in any of our four or five environments in order to make something like this happen.

Marktplaats, which everybody in the Netherlands knows, I've been told, recently went through a somewhat more elaborate refactoring of one of their services. They took one front-end service and registered every
single HTML page that this service was serving, so that they could migrate to a new implementation of it on a page-by-page basis. They could really just move one page at a time, slowly seeing whether it was working, and if it wasn't working they could just switch back. They could test this on their local machines, in every environment, and then they could also do it like this in production.

Okay, so this has been the story with Fabio since September 2015, when this more or less went live. What I've been working on for the last couple of weeks is something that is called dynamic certificate stores, and command-line arguments, which is a nice giveaway as well. This is something that I've pushed live today with version 1.2.

What I've added is dynamic certificate sources. One of the shortcomings of the current Fabio implementation is that if you want to do SSL termination on Fabio as well, I had implemented it so that for every listener, for every IP address that something is listening on, you can specify exactly one certificate. SSL actually has server name identification support, so you can run virtual hosts on an SSL connection, but for that you need to be able to specify multiple certificates. Since the main goal for Fabio is to be a zero-conf load balancer, something which you do not have to maintain, dynamic certificate stores will make this easier to implement, and it will also support server name identification.

So let's start with the command-line arguments, because that's simple. In addition to the environment variables, you can now also specify everything that's in the configuration file as a command-line argument, because for some people that is apparently simpler. And with the certificate sources, I took the approach that you need to tell Fabio where to
look for certificates. You can specify a single file, you can specify a directory, you can specify an HTTP server, you can use the Consul KV store, or you can use Vault. This works for TLS certificates, meaning the certificates that are actually being used for your SSL connection. Unfortunately it doesn't work for the client certificate authentication certificates, because the Go standard library just doesn't have a hook for reloading these things dynamically. But yesterday I filed a ticket, so let's see whether that leads to anything.

Okay, so let's see what this looks like, and let's hope that this works. Okay, different glasses. So we've stopped Fabio. Now let's run Fabio with Vault as the back end. Let's start a Vault server. Here I'm using the development server with a hard-coded root token, so obviously this isn't how you should run this in production, but for now I'm running it like this. The certificates are stored under secret, and in this case it's fabio/certs. That's configurable, but this is what I've used in the example configuration. So I'm checking whether there's a certificate for fabiolb.io, and I can also see whether there are any certificates under this path. Nothing is there.

So let's start Fabio with the Vault address of our Vault server, and let's give it a Vault token. Then we have to configure a certificate source, because the idea is that you configure a source and you can use it for multiple listeners. This is why they are disconnected. The certificate source has a name, which in this case I call cert; it has a type, which is vault; and then you have to give it a path where to look for certificates. The current implementation is doing polling. I had a talk with Jeff, the Vault guy, a couple of minutes ago on how we can improve this, so that in case you are actually running many Fabio instances, you're
not killing your Vault server instance. I can tweak this a bit. So for now, if you're using Vault it will use polling; if you're using Consul it will just do the regular long poll. Then we're starting a listener, again on 9999, and we're telling it to use this certificate source, which immediately makes it an HTTPS listener. For the purpose of this demonstration I've also used a round-robin distribution.

So Fabio starts up, and so far it registers that these service instances are there, but it doesn't have any certificate yet, because there was nothing in Vault. So let's see what happens when we actually try to access this as HTTPS. That should fail. Okay, I hope this fails because of this. With curl -i, that works, yeah. So Fabio says there are no certificates configured. This thing should actually fail [unclear].

So let's put some certificates into Vault. The way I'm doing this is that I'm saying: please create an entry fabiolb.io under secret/fabio/certs and add two key-value pairs. One is called cert, and this name is important, and it uses the contents of this file, which is just a plain text file, a PEM certificate. Then I do the same thing for key to put the key file there. If you have only one file which has the certificate and the key in PEM format concatenated, you can also just put it in cert and it will work as well.

So I've now stored the certificate in Vault, and if I want to read it I get the certificate and the private key. Fabio has found this certificate, so fabiolb.io is now actually accessible. Let's hope that this works. Still doesn't work. Still doesn't work. What am I doing wrong? Okay, so the server certificate says it's fabiolb.io, so I'm assuming that there's something with a DNS cache that's messed up on my
machine. But at least now it's working. So let's remove the entry again. Let's delete it. You can see nothing is there; the certificate store is empty. So if we do the request again, we get an SSL error, because Fabio will tell us there is no certificate.

So this now allows you to manage certificates on the fly. You can just load them into whatever you're using for certificates. If you're using the path certificate store, you could just use Puppet, Chef, Ansible, whatever distributes files across your network, and Fabio will just pick them up.

Okay, so what's next? There's one request where someone wants to run Fabio in front of a ton of long-running WebSocket connections, and for this Fabio actually runs out of ports when it's trying to make connections to the back end. So we'll add some IP address pooling, so that you can specify which IP addresses to use as source IP addresses, so that you can actually have 100,000 or 200,000 WebSocket connections on this. The other thing is that I want to refactor this urlprefix tag. While it has been quite useful, in its current form it's a little bit limiting, and why that is I'm going to explain in a minute. And then there was another long-standing request, one of the things that came right from the get-go: when are you going to support additional back ends like Mesos, Marathon, the Docker API, Swarm, Kubernetes, and so forth?

Okay, so let's start with the urlprefix tag. The urlprefix tag is very simple: it's urlprefix- and then host/path. That's all you have to do. But it's quite inflexible, because it's limited to URLs only. There is a ticket in for Consul to support generic metadata, and in this ticket someone mentions: why don't you just use this RFC, I think 1464, which is used for DNS TXT entries to put arbitrary metadata there? And that actually sounds like
a good idea in order to go to a generic key-value approach. So other than being different, what would this actually allow me to do? After I went through the list of open and closed tickets, I think these tickets would all be affected by this, because it would allow me to address certain features that people have been requesting on a route-by-route basis. Otherwise they would have to be global parameters which you would have to control via the config file, and this is something that I didn't really want to do. There were things like prefix stripping, which I think you shouldn't do, but lots of people still keep asking for it. Most recently someone asked for CORS support. Others want to be able to route based on a header that has been injected by some other mechanism. Then wildcard matching, case-insensitive path matching, the protocol, basically redirection, custom status codes. So a lot of stuff that I either didn't do, or that I had to do based on global config parameters, I should be able to address with this.

The issue with the multiple back ends, at least for me, is a little bit more mixed. If you look at the landscape that we have out there, we have these environments like Mesos and Marathon, we have Kubernetes, we have Docker Swarm, where you have service registration, you have the service registry, and you have a distributed key-value store, either ZooKeeper or etcd, which in essence contains the truth for the cluster. And if I look at what Fabio does, it's mostly centered around this consensus that Consul provides. The service discovery really is a nice feature, but to be honest that is the easy part. The key feature that Consul provides for Fabio is this replicated, consistent key-value store that serves as the truth for the entire cluster.
This is where all the Fabio instances can connect to and can agree on what the truth is. So if I wanted to support additional back ends, then why do I have to drag Consul along? Some people were complaining: "I already have this as a dependency, why can't you use whatever is already there?" So I will try to figure out whatever is possible with this, but to me, so far, it looked a bit difficult. What I want to maintain with Fabio is that it remains the single zero-conf binary which you just copy somewhere and run, and that it also supports the manual overrides, because in my opinion it's absolutely essential that you have this with an auto-generated routing table.

There is a pull request to support Google Compute Platform already there. A friend of mine has implemented this, but he wasn't able to support manual overrides, because whatever he was using on Google Compute Platform didn't have a place where he would be able to store them. This is why it has been sitting there for a while, because I've been trying to think about how to solve this. Fabio should solve the harder problems. It should allow you to just pick this thing and it will just work. And then maybe, if you want to write a plugin for some obscure platform that does service discovery as well, or for your homegrown solution, you can just generate these config commands.

So as a step towards supporting multiple back ends, I'm going to provide an API to push routes into Consul. You can already do this, because you can kind of abuse the manual override feature for it: if there is no service discovery, you can just generate the entire routing table yourself. I want to make this somewhat more generic, so that there is a specific place where you can put additional routes into Consul
via a standard HTTP API. Then I'm going to look into additional service discovery modules and maybe also add different KV stores, but that largely depends on the kind of requests. The first thing I have mostly done, but for the rest I need more input. So if this is something that matches a use case of yours, something where you think, "This would actually make sense in my environment," then please find me after the talk and let's have a discussion about what this would look like in your environment, so that I can learn more about the things that you are actually using.

Okay, at the end let's have a quick look at some stats. Fabio is written in Go 1.6. It's mainly written around the httputil ReverseProxy. Right now it's about 4,000 lines of code, but I'd say about 3,800 of this is config management and tests. The core proxy functionality itself is still so simple that you can easily look it up; it's not more than 100 or 200 lines of code. It has been on GitHub since September 2015. As of two days ago I've reached 2,000 stars, 120 forks, and 110 watchers. I'm doing roughly a release a month. There isn't really a hard release schedule; it's more that when I look at the release history, that seems to be the rate at which I'm pushing. But if there is a ticket open and I can fix it, it's already in master. A release just means that I've tagged it, built a binary, and uploaded it to GitHub and to Docker Hub.

This has been in production since September 2015, serving these three sites. Well, these are the ones that I know of, because they are the ones from our colleagues. Marktplaats is the site that everybody knows, and they went ahead and just picked this up even when my team wasn't really ready to do this, and at some point said, "This is going to production, we've tested it, it just works." So
that was nice. And then through some internal stuff I found out that people in Italy are also using this, and they're also still quite happy with it. My team, the Admarkt team, is also using this. Combined, we're pushing something like 15,000 requests per second through this thing, and so far we can't really observe any additional latency. We haven't really had any issues. Any issue that we had so far was more or less related to something not happening properly in the service. From our perspective, it just works. And as I said, at least for my part, most of the time I forget that it's there.

And that's it. If you want to look at the source code, this is where to find it. If you want to email me, this is how you can reach me. If you want to see one of my five tweets a year, then this is where you have to go. Thank you very much for listening, and I'm open for questions.
