This is a guest post by Jay Christopherson, principal engineer, DevOps, at Spaceflight Industries. Spaceflight is revolutionizing the business of space flight by delivering a new model for accessing space. A comprehensive launch service and mission management provider, the company provides a straightforward and cost-effective suite of products and services, including state-of-the-art satellite infrastructure, rideshare launch offerings, and global communications networks that enable commercial and government entities to achieve their mission goals on time and on budget. A service offering of Spaceflight Industries in Seattle, Washington, Spaceflight provides its services through a global network of partners, ground stations, and launch vehicle providers.
We faced two main challenges as we designed the computing infrastructure to support our business applications: distributed runtime changes and service discovery. We need distributed changes because we deploy remote satellite communications ground stations (spokes) around the world, all of which are managed from a central location (hub). Changes made at the hub need to be distributed to one or more remote ground stations in an automated fashion. As for service discovery, we build and deploy quite often, and we need to make sure that changes to services in our infrastructure are detected and propagated as quickly as possible without any manual updates. These are the reasons we looked at HashiCorp Consul.
Our applications all register with Consul on deployment, and from there we make heavy use of Consul-based health checks, tags, external services registration, and load balancing (for certain tools that can register an “active” component). Consul's DNS interface also allows all of our various applications and services to dynamically update and discover other required services. For runtime configuration, we have invested in tools like consul-template to build dynamic configuration files that update based on triggers, such as a value change in Consul. In the end, a change to our deployed services often becomes as simple as changing a parameter held in a file in our source repository: builds are triggered on change, and the updates are pushed to Consul. Now, changes to our deployed services “just happen.”
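To make the registration step concrete, here is a minimal Python sketch of the kind of service definition an application might send to its local Consul agent, along with the DNS name other services could use to find it. The service name, port, tag, and health check path are hypothetical and are not taken from our deployment; the field names follow Consul's agent service registration payload, and the name format follows Consul's `[tag.]<service>.service.<domain>` DNS convention.

```python
import json


def build_registration(name, port, tags, health_path="/health"):
    """Build a service definition for Consul's agent registration endpoint.

    All argument values are illustrative assumptions, not real services.
    """
    return {
        "Name": name,
        "ID": f"{name}-{port}",
        "Tags": list(tags),
        "Port": port,
        # HTTP health check that the local agent runs on an interval.
        "Check": {
            "HTTP": f"http://localhost:{port}{health_path}",
            "Interval": "10s",
            "Timeout": "2s",
        },
    }


def dns_name(name, tag=None, domain="consul"):
    """Return the DNS name used to look the service up, optionally by tag."""
    base = f"{name}.service.{domain}"
    return f"{tag}.{base}" if tag else base


if __name__ == "__main__":
    payload = build_registration("telemetry-api", 8080, ["active"])
    print(json.dumps(payload, indent=2))
    print(dns_name("telemetry-api", tag="active"))
```

In practice the payload would be sent to the local agent's HTTP API at deploy time. The sketch stops short of the network call so that it stays self-contained.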
Our Architecture with Consul and other HashiCorp products
The core of any deployment at Spaceflight Industries is composed of four pieces: HashiCorp Consul, HashiCorp Vault, HashiCorp Nomad, and HashiCorp Terraform. Consul is the base component of everything we do, which includes key/value management, service registration, dynamic DNS, and external services. We design every new service around how it will interact with, and be configured through, Consul. Consul also serves as the backing store for our Vault deployment. For any production-level services, we deploy Consul in a cluster.
We’ve designed our setup in a hub-and-spoke architecture. Changes that are common across all ground stations are made at a central hub and synchronized out to the remote sites. There are a few key things that we choose to keep only in the hub cluster. Previously, we made every remote ground station a read-only copy of the hub cluster. However, we have found that the ability to mark some items as local-only (i.e., not sync’d out), available in the more recent versions of Consul, has made certain things easier for us, and we now prefer a more hybrid approach to synchronization.
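The hybrid approach can be pictured as a filter over the hub's key/value data. The Python sketch below is only an illustration of that idea: the key names and the prefix-based rule are assumptions made for the example, and it does not show how Consul itself marks or replicates local-only items.

```python
def keys_to_sync(hub_kv, local_only_prefixes):
    """Return the hub entries that should be copied out to the spokes.

    Any key starting with one of local_only_prefixes stays in the hub.
    """
    return {
        key: value
        for key, value in hub_kv.items()
        if not any(key.startswith(prefix) for prefix in local_only_prefixes)
    }


if __name__ == "__main__":
    # Hypothetical hub contents; none of these keys come from a real cluster.
    hub = {
        "common/antenna/retry_count": "3",
        "common/logging/level": "info",
        "hub-only/billing/endpoint": "https://example.invalid/billing",
    }
    for key, value in sorted(keys_to_sync(hub, ["hub-only/"]).items()):
        print(key, "=", value)
```

Compared with a full read-only copy, a filter like this lets the hub hold data that a ground station never needs to see.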
Implementing Consul taught us several lessons along the way. We used to populate the K/V store using scripts, but found that this really wasn’t sustainable long term. What we really wanted was to have a developer check in a change to source control and have that “automagically” build/test/deploy so that all changes are tracked, without requiring special knowledge of Consul or our DevOps infrastructure. Using a Terraform resource to populate Consul with K/V or external services means that we can easily achieve that goal. Make a change in source, commit, trigger a build/test, trigger a Terraform plan/apply, and it all happens behind the scenes in very little time with zero interaction beyond the initial commit.
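Terraform handles the plan step for us. Purely to illustrate what that step amounts to for K/V data, the Python sketch below flattens a nested configuration (as it might appear in a file in source control) into slash-separated keys and compares it with the current state. This is not Terraform or its Consul provider; the function names and sample keys are hypothetical.

```python
def flatten(config, prefix=""):
    """Flatten a nested dict into slash-separated keys with string values."""
    flat = {}
    for name, value in config.items():
        key = f"{prefix}/{name}" if prefix else name
        if isinstance(value, dict):
            flat.update(flatten(value, key))
        else:
            flat[key] = str(value)
    return flat


def plan(desired, current):
    """List the keys to create, update, or delete to reach the desired state."""
    return {
        "create": sorted(k for k in desired if k not in current),
        "update": sorted(
            k for k in desired if k in current and desired[k] != current[k]
        ),
        "delete": sorted(k for k in current if k not in desired),
    }


if __name__ == "__main__":
    # Hypothetical values: what is committed versus what is stored today.
    committed = {"groundstation": {"retry_count": 5, "timeout_s": 30}}
    stored = {"groundstation/retry_count": "3", "groundstation/legacy_flag": "1"}
    print(plan(flatten(committed), stored))
```

Because the desired state lives in source control, every difference the plan reports maps back to a tracked commit.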
Our best-practice recommendation here (which is probably obvious to most) is to build your system so that changes can be easily tracked from a single source. Also, set up your system so that changes can only be made from that single source, and implement a break-glass procedure that allows someone to escalate their privileges and make changes directly in an emergency.
Less management overhead and fewer bottlenecks, with increased developer productivity
There have been several key benefits from our implementation: