How Terraform Will Impact the 2018 US Elections
This talk will cover the deep technical details of how ActBlue uses Terraform, as well as how they have promoted and evangelized Terraform across technical teams.
In mid-2017, ActBlue began using Terraform to revamp its donation platform, a system which has accepted and processed over $2 billion for political campaigns and nonprofits on the progressive left. The process began by leveraging Terraform to migrate a PCI-compliant credit card vault to AWS and quickly expanded to support orchestration of the majority of the infrastructure, including non-PCI environments and a Fastly configuration.
The agility, modularity, and transparency of Terraform has afforded the ActBlue DevOps team the ability to deliver more features and more responsiveness to our platform during a period of massive growth of Democratic donors, campaigns, and initiatives. This talk will cover the deep technical details of how we use Terraform, as well as how we have promoted and evangelized Terraform across technical teams.
Lead engineer, ActBlue
Senior infrastructure engineer, ActBlue
Nicholas Klick: Thanks, everyone, for coming. My name is Nicholas Klick, and my coworker and friend Dan Catlin and I are excited to chat with you today about Terraform and the US elections. We’re going to just give a quick overview. We’ll talk about how ActBlue impacts elections, talk about some problems we’ve had, and how Terraform has proved to be a solution for many of those problems. And we’ll go over some examples of where we use Terraform.
» What ActBlue does
I do want to talk a little bit about ActBlue. What exactly do we do, and how do we impact elections? One of our primary aims is to empower small-dollar donors. Whether someone is giving $5, $10, or $25, we want them to be able to have an impact on the campaigns or causes that they’re passionate about. Also, our platform counters big money and dark money in politics. We’re big on cultivating grass-roots engagement.
We found that when small-dollar donors contribute, they’re not just donating. They’re also becoming voters, they volunteer, and they get engaged with campaigns. We’ve also noticed that our platform is helping to encourage diverse folks to run for office. Essentially, we’re lowering the barrier and making it possible for non-traditional candidates to compete, because it’s so easy for them to get set up with fundraising with our online platform.
Also, especially this election, we’re helping Democrats to be competitive in unlikely locations. For example, Beto O’Rourke in Texas, places like that. You may not have heard of us, but you’ve probably heard of some of the candidates on our platform, folks like Bernie Sanders, Beto O’Rourke, Kamala Harris. They all rely on us to support their large-scale fundraising efforts.
And we’re growing fast. This is a graph of our lifetime fundraising. We were founded in 2004, and it took us over 12 years to get to $1 billion, and 18 months later, we hit $2 billion, and by the end of the month, we’ll hit $3 billion, so that will have been $1 billion in 10 months. Currently, we have 5.6 million donors, and 14,000 campaigns and causes.
» Problems with infrastructure
We process up to 12 contributions a second, and last month we moved $178 million. All told this election cycle, meaning from the 2016 elections to today, we processed over $1.5 billion. Now, scaling our platform has not been without problems. Essentially, there has been a lack of cohesion in our infrastructure, both in strategy and in implementation. Staff changes over time have exposed a number of issues for us.
We deal with a number of things. We have a divergent, sprawling infrastructure. We are based on a number of different providers: Rackspace, AWS, Heroku, Meteor Galaxy. Whenever a new developer would come on board, they would build their application on their favorite provider. There wasn’t a plan. There wasn’t a way to glue it all together. We’ve also been dealing with manual updates that are not in code.
We do have a lot of configuration and code in Chef, but at times we’ve been surprised by a configuration that was just done manually, and a lot of times this has led to major issues for us. We’ve also dealt with black boxes in the code and in our team knowledge sharing. Unfortunately, we sometimes have people holding siloed information; that information doesn’t get spread throughout the team.
Generally speaking, it was pretty difficult to work with the systems. In some places we’re using very unintuitive custom scripts and processes. In the last 18 months, we’ve tripled the engineering team from 8 folks to 24, and the kinds of issues that we’ve dealt with are issues of ownership and siloing of information, communication, things like that. Our team has required more collaboration, more communication, and one thing we’ve been aiming for is better self-service for our developers.
Dan Catlin: So we looked to Terraform. Nick and I both have had some experience in the past, and it seemed pretty natural to bring it in to solve the problems we’ve had. These are the main benefits—he’ll talk about these in a minute—but transparency and collaboration obviously massively improved with Terraform. The modularity has helped us adapt really quickly as well, and both of those have created the emergent property of improving our agility and agile processes.
» Transparency and collaboration
This is our big siloing, knowledge-sharing problem. Moving from legacy systems, very few devs working on it, not necessarily a lot of documentation written: the standard story on apps with a small team, intimately involved with the development of it. They know how it works. They know how to scale it. As those people leave, that knowledge goes away. But we’ve been able to do a lot with Terraform.
Getting infrastructure as code lowers the barrier of entry for developers: there’s something for them to look at and work with, and it cuts down on configuration drift. With the tooling, we’ve also been able to isolate and then open up a lot of our black boxes in a much safer way, without really affecting our scaling story or our uptime.
» Reviewing changes
This has been a big one for us. As developers have been coming and going, they were rolling up their favorite platforms. Not all of these platforms have a tool chain. Terraform’s provider infrastructure has generally been solving that; official ones, user ones, they’re all helping us. But getting that stuff into a regular development cycle, where people can look at pull requests and review those changes, has not only helped us show developers what we’re doing, but has helped them understand how to contribute.
It reduces the time to understand the change if everyone on our team understands Terraform. But the big benefit has been our ability to mesh our dev and ops teams together to address problems instead of providing reactive solutions, instead of just siloing: “We’re going to be working on scaling a database problem, and they’re going to work on a new feature.” It’s much easier for us to deliver tools that they can build off of.
» Terraform provides modularity
This is really just Terraform modules, but they have given us a lot of very specific benefits. We have AWS. We got Fastly. We got a few other things we’ll talk about in a few minutes. But the ability to do this stuff and share this stuff across environments has given us a much better environment promotion model, as far as how we could build our dev system to better address the needs of our staging environment for our devs and our production environment, while also testing some of this stuff instead of having a very isolated production environment that works entirely differently than our staging.
So using the code reuse and building our version of what needs to get done for a specific service and then varying between those environments has really helped. Code reuse has helped reduce the time for new versions of services to be rolled up. It’s really fast to onboard devs to say things like, “We’ve already done this over here. You get to just fork this lightly, change some variables, get it much closer to a service definition than what we could do with a ton of fractured API wrapper scripts.”
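As a minimal sketch of that reuse pattern (not ActBlue’s actual code), one shared module can be called from each environment’s directory with only the variables changed. The module path, variable names, and instance counts below are hypothetical.

```hcl
# modules/web_service/variables.tf (hypothetical shared module)
variable "environment" {
  type        = string
  description = "Environment name, for example staging or production"
}

variable "instance_count" {
  type        = number
  description = "How many application instances to run"
  default     = 1
}

# staging/main.tf (separate root directory with its own state)
module "web_service" {
  source         = "../modules/web_service"
  environment    = "staging"
  instance_count = 1
}

# production/main.tf (same module, different variable values)
module "web_service" {
  source         = "../modules/web_service"
  environment    = "production"
  instance_count = 6
}
```

Because both environments call the same module, a change can be tried in staging and then promoted by applying the same code in production.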
One of the big advantages is a single language for many providers, and sharing a lot of these configurations between these providers. In certain cases we don’t need to build a creative service discovery system; it’s this simple reduction of time in doing this stuff with Terraform that’s been super helpful.
DRYing the code. Everyone loves DRY code. And making slight variations on this has helped us very quickly expand our Terraform usage and respond to general scaling issues.
» Increased agility
A lot of this has led to much more agile practices across our team. Transparency and modularity have helped the collaboration. But we have a small team, with a lot of dollars coming in. We need to respond quickly, and a lot of this has really helped us. Furthermore, we had very siloed teams, and by moving into an infrastructure-as-code system, we were able to bring the devs into the operations process instead of treating these as 2 teams, oceans apart.
It also allows us to more quickly integrate the knowledge of our dev team and what they’re building with the systems we’ve built out. HCL is not without its problems—I think it works great—but it’s obviously better than us having a large set of custom Ruby scripts for AWS, a different set of cURL scripts for running our Rackspace, using API commands to hit some of our other providers. That common language makes it much quicker for us and our devs to move into new providers and address new challenges.
» Real-world examples
We’ve got some examples of what we’re doing here.
Nicholas Klick: The first example is our credit card vault. We have a custom-built system that tokenizes and stores millions of credit cards. We do this to minimize our PCI scope and also to have a more secure infrastructure where we’re dealing with a reference to the number, rather than the number itself, across our system.
In early 2017 our credit card vault required a move; our legacy provider had had issues; for example, 2FA was broken for a bit. There were network outages, just poor customer service generally. So we were inclined to move to somewhere more stable and secure, that we felt good about, that we could scale into, etc.
We ended up landing on AWS. The architecture is this: we have VPCs in multiple regions, with secure peering of the tokens across the regions. We use restricted security groups and NAT gateways, and we set up multiple EC2 instances for our Node.js applications, Postgres, and SSH bastions. The node-level configuration is all done with Chef.
On the Terraform side, it’s essentially one module covering an entire credit card vault peer. We have multiple peers that can share the tokens between each other, and the module centralizes the code that’s used across regions. We store all of our state remotely, in S3. We utilize multiple AWS accounts for our different environments: staging versus production versus testing. This allows us to segment the different environments and really control them.
If I’m in the production directory or in the staging directory, I know I’m only interacting with that AWS account. Some of the benefits that we realized using Terraform: it’s really easy to test and tear down, which helped as we were trying to figure out what VPC configuration we wanted to use. And terraform plan is really great as a no-op, like a dry run, to see what it’s actually going to do.
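A rough sketch of a per-environment directory along these lines, assuming one AWS CLI profile per account and a shared peer module instantiated once per region. The bucket name, profile name, regions, CIDR ranges, and module path are all placeholders, not details from the talk.

```hcl
# production/main.tf (hypothetical; a staging/ directory would mirror it
# with its own bucket and its own AWS account profile)
terraform {
  backend "s3" {
    bucket  = "example-terraform-state-production" # placeholder bucket
    key     = "vault/terraform.tfstate"
    region  = "us-east-1"
    profile = "example-production" # placeholder profile for this account
  }
}

# One aliased provider per region, all pinned to the same account profile.
provider "aws" {
  alias   = "east"
  region  = "us-east-1"
  profile = "example-production"
}

provider "aws" {
  alias   = "west"
  region  = "us-west-2"
  profile = "example-production"
}

# The same module is instantiated once per region, so the peers stay identical.
module "vault_peer_east" {
  source = "../modules/vault_peer" # hypothetical module path
  providers = {
    aws = aws.east
  }
  vpc_cidr = "10.10.0.0/16" # hypothetical module input
}

module "vault_peer_west" {
  source = "../modules/vault_peer"
  providers = {
    aws = aws.west
  }
  vpc_cidr = "10.20.0.0/16"
}
```

With this layout, running terraform plan inside production/ can only read and propose changes against the account behind that directory’s profile.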
We’ve realized the configuration as code prevents those manual changes over time that were leading to surprises for us, and we’ve realized the ease of code review.
But this is a credit card vault, so we’re going to be running into some compliance and security-related concerns. What’s great about Terraform is that you can clearly see the network. You can understand the network, and security has visibility into it. As part of PCI, you have to do your biannual firewall review. It’s really easy to do that. It’s right there in the code.
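To illustrate what “right there in the code” can mean for a firewall review, here is a small hypothetical security group. The group name, port, and variables are assumptions chosen for illustration; the talk does not show the real rules.

```hcl
variable "vpc_id" {
  type = string
}

variable "vpc_cidr" {
  type = string
}

variable "app_security_group_id" {
  type        = string
  description = "Security group of the application tier (hypothetical)"
}

resource "aws_security_group" "vault_db" {
  name        = "vault-db"
  description = "Postgres access for the vault application only"
  vpc_id      = var.vpc_id

  # The only inbound rule: Postgres, and only from the application tier.
  ingress {
    description     = "Postgres from the application tier"
    from_port       = 5432
    to_port         = 5432
    protocol        = "tcp"
    security_groups = [var.app_security_group_id]
  }

  # Outbound traffic is limited to addresses inside the VPC.
  egress {
    description = "Traffic within the VPC only"
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = [var.vpc_cidr]
  }
}
```

A reviewer can read every allowed path in a few lines, and any later change to a rule shows up as a diff in a pull request.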
It helps with the audit as well, because as part of the audit you have to show that you’re reviewing the code and provide examples showing that every change has PR review. The PCI controls are in the code, and what’s nice is that this helps ensure you maintain that ongoing compliance.
To get specific about compliance, Terraform makes it really easy to set up all the AWS resources that you need to meet the various requirements. I’m not going to go into all of it, but it definitely gets you quite a ways down the road toward being PCI-compliant.
» Getting visibility across the content delivery network infrastructure
Dan Catlin: So we’re taking in a lot of money right now. The amount of time we are down translates directly into money that doesn’t go to campaigns. We have used Fastly as our CDN provider and have moved an incredible amount of our processing off to the edge.
However, we were in no way managing what the setup was. It was almost entirely manually configured. The GUI works quite well, but as far as change management goes, we had almost no visibility into what developers were doing across our CDN infrastructure. When we are fundamentally relying on that to make sure that we can collect donations at any given time, this caused some problems.
So we’ve moved the whole thing into Terraform with some solutions engineers over at Fastly. We got a bunch out of this. This is probably one of the more successful things we’ve been doing as far as integrating DevOps ideals into the team. Multi-environment management, configuration promotion, very similar to what Nick just talked about in our previous example. It allowed us to test our CDN configuration for the first time.
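A minimal sketch of a CDN service kept in Terraform, written against the current Fastly provider’s fastly_service_vcl resource. The service name, domain, and origin are placeholders supplied through variables, and this is not ActBlue’s configuration.

```hcl
terraform {
  required_providers {
    fastly = {
      source = "fastly/fastly"
    }
  }
}

variable "environment" {
  type = string
}

variable "site_domain" {
  type        = string
  description = "Domain served by this environment (placeholder)"
}

variable "origin_address" {
  type        = string
  description = "Hostname of the origin behind the CDN (placeholder)"
}

resource "fastly_service_vcl" "site" {
  name = "example-site-${var.environment}"

  domain {
    name = var.site_domain
  }

  backend {
    name    = "origin"
    address = var.origin_address
    port    = 443
    use_ssl = true
  }
}
```

Applying the same resource with staging values first, then production values, is one way to get the configuration promotion and testability described here.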
This is also probably the biggest example of where we’ve had real developer benefits. There are a few developers that really needed to make some changes to our CDN configuration. Getting them involved with Terraform and building our Terraform configuration for CDN, we were able to train them, not only on Terraform, but we were also able to show them a very major part of our networking infrastructure. This has allowed them to go into some of our other Terraform projects, including a lot of our security and permissioning systems, and very quickly contribute to that.
It’s been great to not be working as an operations person isolated from our product cycle. We’ve gotten 4 or 5 devs of our relatively small team on board and treating this as a much more self-service option. They are able to go into our system, deal with our major CDN configuration, and they know how to make changes without us. Sure, they send the stuff for us to review, but it’s just been excellent to make sure that the devs are not just stuck over in their frontend feature development.
» Better response
The other big thing is, if we have a single source of truth for what our production configuration is, we can respond to production issues. Things happen. Data centers go down. This is a pretty regular scenario. Things that are out of your control will always happen. But having our entire configuration there as code allows us to very quickly not only move traffic between data centers, but also address very specific issues, in our routing as well as in our cache stability.
Before, we would have to debug our entire stack to understand that an issue was really happening on our CDN layer, because we have ceded so much of our application logic to the edge.
So where are we going from here? Terraform has been very successful for us, so over the next year, as we’re preparing for 2020, we’re going to be utilizing Terraform to head toward this setup: We’re going to be moving all of our hosting directly to AWS, and continue working with Fastly as our edge provider. We’re going to be using Terraform for our entire orchestration layer, as well as some level of configuration of providers. And then our host configuration we are outsourcing to Chef, which is now going to not be doing any of our orchestration mechanisms, but will just be focused on host configuration so we don’t have custom scripts running against end providers to just roll up a host.
We’re also using Packer right now to build our AMIs out on AWS, and get security testing on those. Great tool. We don’t have a lot to say besides, “It’s great, you should use it.” But with some of the stuff I was seeing them talk about earlier today, it’s going to be even easier for us to integrate our tested AMIs on Amazon into our rollout, into our future AWS stuff.
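One common way to feed Packer-built images into a Terraform rollout is an aws_ami data source that selects the newest image matching a naming and tagging convention. The name pattern and the SecurityTested tag below are hypothetical conventions, not something described in the talk.

```hcl
# Look up the most recent AMI owned by this account that matches the
# (hypothetical) naming pattern and carries a "tested" tag.
data "aws_ami" "base" {
  most_recent = true
  owners      = ["self"]

  filter {
    name   = "name"
    values = ["example-base-*"]
  }

  filter {
    name   = "tag:SecurityTested"
    values = ["true"]
  }
}

resource "aws_instance" "app" {
  ami           = data.aws_ami.base.id
  instance_type = "t3.small"
}
```

With a lookup like this, publishing a newly tested AMI is enough for the next plan to propose replacing hosts built from the older image.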
» Better orchestration with Terraform
Specifically, Terraform has become very important for us, and this is why we’re building the stuff on Terraform. We get isolated orchestration. Right now, we have our orchestration very tightly coupled with our host configuration in Chef—but not using any of the stuff like Chef Provisioning, just Rake tasks wrapped around Chef. This has caused obvious problems, but by isolating the orchestration, we can deal with our cross-provider dependencies much better.
This also allows us to focus Chef on host configuration and more simply dog-food that into Packer so we can build out good base configuration and review the security of it. Obviously, there have been some security issues with elections, and it’s become very important for us to make sure that we are doing regular security reviews, not just for PCI, but for something like democracy.
Making sure that we can isolate that host-level configuration into those tools that are better suited for it—it’s just been super helpful. It’s much easier for us to review our security posture on individual hosts: “Is SSH out of date?” Obviously, we also had a bunch of disparate tools. This meant that we had a bunch of disparate configurations at any given time.
So, with Terraform, we really can build out our service patchwork, with all of the dependencies prebuilt out, all of the integrations between them built out in a single configuration instead of us having to think about, “Well, we need to deploy this app, but first thing we need to go add these routes to Fastly. But we need to turn them off. But then we need to go over to S3 and make sure the IAM permissions are over there.”
With Terraform, we can make sure all of that stuff is just happening at one time, which makes it much easier to promote an individual feature or an individual environment change.
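A sketch of what “all in a single configuration” could look like for the Fastly-plus-S3 case: the bucket, the IAM policy, and the CDN backend reference one another, so Terraform derives the ordering from those references. All resource names and the domain are placeholders.

```hcl
terraform {
  required_providers {
    fastly = {
      source = "fastly/fastly"
    }
  }
}

resource "aws_s3_bucket" "assets" {
  bucket = "example-app-assets" # placeholder bucket name
}

# Read-only access to objects in the bucket; depends on the bucket's ARN.
data "aws_iam_policy_document" "assets_read" {
  statement {
    actions   = ["s3:GetObject"]
    resources = ["${aws_s3_bucket.assets.arn}/*"]
  }
}

resource "aws_iam_policy" "assets_read" {
  name   = "example-assets-read"
  policy = data.aws_iam_policy_document.assets_read.json
}

# The CDN backend points at the bucket, so the bucket is created first.
resource "fastly_service_vcl" "app" {
  name = "example-app"

  domain {
    name = "assets.example.com"
  }

  backend {
    name    = "assets_bucket"
    address = aws_s3_bucket.assets.bucket_regional_domain_name
    port    = 443
    use_ssl = true
  }
}
```

One apply then creates the bucket, the policy, and the route together, instead of requiring three manual steps in the right order.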
» Enabling growth
As we were building these examples out and pointing this stuff toward devs, it’s easier for us to isolate and manage any future development. We have a lot of projects coming up. No one on our side was ready for the scale of what’s been happening over the last 18 months. So we’re having to grow really quickly.
Terraform has really sped a lot of this up. We also are automating everything. So, with Terraform, storing plans, some of the remote stuff they were talking about earlier with remote execution stuff, it gives us a really clear path to automate this stuff so we aren’t spending as much time manually remediating what are often manual configuration changes in certain parts of our environment.
This makes our entire workflow this: We run Terraform to orchestrate providers, Terraform runs Chef on the host that it brings up, and we’re done.
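The talk does not say how Chef is invoked from Terraform. One possible sketch, assuming the AMI already has chef-client installed and configured and that a first-boot JSON file with the run list exists on the image, uses a remote-exec provisioner. The user name, file paths, and variables are assumptions.

```hcl
variable "ami_id" {
  type = string
}

variable "subnet_id" {
  type = string
}

variable "key_name" {
  type = string
}

variable "ssh_private_key_path" {
  type = string
}

resource "aws_instance" "app" {
  ami           = var.ami_id
  instance_type = "t3.small"
  subnet_id     = var.subnet_id
  key_name      = var.key_name

  # How Terraform reaches the new host once it is up.
  connection {
    type        = "ssh"
    host        = self.private_ip
    user        = "ubuntu" # assumed login user for the image
    private_key = file(var.ssh_private_key_path)
  }

  # Run Chef once on first boot; the JSON file path is hypothetical.
  provisioner "remote-exec" {
    inline = [
      "sudo chef-client -j /etc/chef/first-boot.json",
    ]
  }
}
```

In this arrangement Terraform owns orchestration (what exists and how it is wired together) and Chef owns only what happens inside the host.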
Nicholas Klick: So ActBlue’s small-dollar model has the potential to impact elections. There are a lot of articles coming out now about how the Democrats could have possibly amassed a fundraising edge on Republicans, and what they’re finding is a lot of the fundraising is coming from small-dollar donors. The vast majority of those small-dollar donors are donating on ActBlue.
We’re definitely impacting the election in a number of different ways, and Terraform has been a critical part of that impact. It’s helped us scale during this period of really massive growth, and, like we’ve been mentioning, it’s enabled a lot of transparency and collaboration between our dev and ops teams. It’s aided us with security and PCI compliance, ensuring that we’re maintaining those security controls, and it’s helped us reduce downtime during a critical period of fundraising.
Dan Catlin: We are hiring for DevOps. If you want to help ActBlue in the next phase for 2020, hit us up. Also, vote. We are collecting lots and lots of money. Sure, that may make an impact. The only real impact is you voting. Please, please vote. Thank you.