This is a guest post by Grant Joy, Senior DevOps Engineer at Distil Networks.
Distil Networks blocks bots. We protect websites from attacks like site scraping, ticket sniping, and click fraud. Founded in 2011 by three good friends, our company atmosphere is one of friendship and hard work. Distil has over 150 employees and five offices worldwide. Our U.S. offices are in California, North Carolina, and Virginia. Our international offices include London, England and Stockholm, Sweden. We recently partnered with Verizon to rapidly expand our content delivery network.
We have a lot of secrets: database passwords, certificates, and private keys. We take the job of protecting them seriously. Shipping our code with speed and reliability in mind is essential, so we needed a secret storage system with high availability built in.
HashiCorp released Vault in 2015, around the time we were looking for a solution for secrets. Given the early days of the product, we relied on help from the Vault Google group. Later that year we, the operations team, were able to put in place a Vault cluster with HashiCorp Consul as the backend.
We run the cluster on OpenStack, with one internal API handling the majority of transactions. Our Ops team has direct access to the backend through the Vault CLI, while other teams have access only to specific Vault secrets. The internal API sits behind our firewall, and our public-facing API is allowed to connect to it. This separates our more sensitive procedures from the public-facing API, which is good for both security and organization. Vault itself is completely locked down to our private network, with the internal API holding the majority of the logical code, such as generating certificates.
HashiCorp Vault Basics and Cluster Setup
We created a Consul cluster with three machines, which have now been running in production for well over a year. We use three nodes following the recommendation to run an odd number, which avoids split votes during leader elections. Such elections can occur while recovering from a major outage, such as catastrophic hardware failure across multiple nodes. One resource we found helpful was the Consul setup instructions in DigitalOcean's guide for Ubuntu 14.04.
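To make the three-node setup concrete, a minimal Consul server configuration for one node might look like the following sketch. The addresses, datacenter name, data directory, and gossip key are illustrative placeholders, not values from our setup:

```json
{
  "server": true,
  "bootstrap_expect": 3,
  "datacenter": "dc1",
  "data_dir": "/var/lib/consul",
  "bind_addr": "10.0.0.11",
  "retry_join": ["10.0.0.12", "10.0.0.13"],
  "encrypt": "GOSSIP-ENCRYPTION-KEY"
}
```

The other two nodes would use the same file with their own `bind_addr`; `bootstrap_expect` tells each server to wait until three peers are present before electing a leader.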
Below are example bash commands we used to install the Vault client on macOS, authenticate with the Vault server, write a secret, and then read it back.
```shell
# install vault command line
brew install vault

# set authentication variables
export VAULT_TOKEN="VAULT-TOKEN-HERE"
export VAULT_ADDR="https://vault.distil.com:8200"
export VAULT_CACERT="ca_cert.crt"
export VAULT_CLIENT_CERT="vault_server.crt"
export VAULT_CLIENT_KEY="vault_server.key"

# write a secret
vault write secret/file @file.txt

# read a secret back
vault read secret/file
```
Vault Behind HAProxy
Write traffic directed at the Vault cluster must reach the leader node. One way to handle this is Consul's DNS interface, which lets Consul manage all routing to the leader: a node that is not the leader redirects traffic back through the Consul cluster's advertise address, and the connection retries until it reaches the leader. In our case, Distil wanted to use our existing DNS service rather than take on Consul DNS. To do this, we put the nodes behind HAProxy, a lightweight load balancer with built-in health checks.
These health checks query each Vault node's /sys/leader endpoint and route traffic to whichever node responds as the leader. The checks are also useful for notifying our operations team if something is wrong with a server. We show sample health checks below.
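A backend along these lines is one way to sketch the health-check routing. This is an illustrative configuration, not our production file: the node addresses are placeholders, and it uses Vault's standard `/v1/sys/health` endpoint, which returns HTTP 200 only on the active node (standbys return 429), so HAProxy marks only the leader as up:

```
frontend vault_front
    mode http
    bind *:8200
    default_backend vault_cluster

backend vault_cluster
    mode http
    option httpchk GET /v1/sys/health
    http-check expect status 200
    server vault1 10.0.0.11:8200 ssl verify none check
    server vault2 10.0.0.12:8200 ssl verify none check
    server vault3 10.0.0.13:8200 ssl verify none check
```

With this shape, a leader failover simply flips which server passes the check, and HAProxy shifts traffic without any DNS change.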
Distil has had this setup running in production for over a year without any issues. We often restart individual machines for system updates; the leader switches automatically, and every node rejoins the cluster once the restart is complete.
While this method worked best for our use case, you might look at Consul DNS before traveling down this path: the load balancer itself is a potential single point of failure.
Vault Behind an API
There are times when it is valuable to keep users from having direct access to Vault. One way to achieve this is to put Vault behind a remote API and give users a command line or web interface that talks to that API instead. This moves complex code off local machines and into the remote API, and it allows single sign-on control through a protocol such as LDAP or Google SSO.
In Distil’s case, LDAP handles authentication between a user's machine and the remote API, while token authentication handles validation between the API and Vault. Different API endpoints can use different tokens, so each endpoint has access only to the Vault data it needs. In some cases, that means creating tokens with a read-only Vault access control policy.
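A read-only policy for a single secret path might look like the following sketch; the path and policy name are illustrative, and Vault policies are written in HCL:

```hcl
# read_only.hcl -- grants read access to one secret path only
path "secret/important_data" {
  capabilities = ["read"]
}
```

The policy can then be loaded and attached to a token, for example with `vault policy-write read_only read_only.hcl` followed by `vault token-create -policy="read_only"` in the CLI of that era (newer Vault versions spell these `vault policy write` and `vault token create`).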
Ruby code run by the API could look something like:
```ruby
require "vault" # the vault gem

key = "-----BEGIN PRIVATE KEY-----......."
secret_path = "secret/important_data"

# store the key under a "key" field at the secret path
Vault.logical.write(secret_path, key: key)

# later on...
secret_returned = Vault.logical.read(secret_path)
puts secret_returned.data[:key] # prints key
```
In Distil’s setup, we ended up writing a Ruby command line application using the Commander gem. We bundled that and deployed it to our internal Gem in a Box server. This makes it easy for developers to make tool updates and deploy them to users.
When we started using Vault, it was as a tool to hold a very specific set of secrets. As time went on, we found that it was really easy to integrate it further into other areas we hadn’t expected. We now use it for storing environment variables for applications, storing Let’s Encrypt keys and certificates, and even for random passwords around the office.