HashiConf 2018 Closing Keynote: New GCP Features for Terraform and Vault
Dec 10, 2018
Watch the introduction of two new features in the GCP portfolio for Terraform and Vault in this talk titled "Irresponsible Responsibility."
Seth Vargo, a developer advocate for Google Cloud Platform, discusses the state of HashiCorp and GCP's partnership while also revealing two new product announcements:
- Open in Google Cloud Shell for Terraform
- Google Cloud KMS secrets engine for Vault
Developer Advocate, Google Cloud
Hello, everyone. Welcome to the closing keynote, called “Irresponsible Responsibility, or How Seth Makes Up Titles.” My name is Seth, and I’m a developer advocate at Google. What does that actually mean? It means that my job is to listen to all of you and then go back to our product teams and tell them that what they’re building is wrong. And to advocate on your behalf at the same time.
Today I’m here to talk to you about Google and HashiCorp. Our partnership is not new. As you can see behind me, we’ve been working together for the past 4 or 5 years to build amazing things across the entire HashiCorp product suite. And in addition to highlighting some of the things that we’ve done in the past, you should pay close attention, because I have 2 new product announcements that are coming in this keynote today.
» Terraform from the Google side
Let’s start with Terraform. As you all know, Terraform had a ton of amazing announcements in Terraform 0.12. Let’s talk about Terraform’s growth from the Google side. We heard from Paul Hinze about how Terraform’s growing, and those numbers were exciting, but I think mine are better.
Between 2017 and 2018, the number of resources managed or provisioned by Terraform on Google Cloud increased by 410%. That's a 410% increase in resources provisioned on GCP. And that number's exciting, but I like this one better: the number of projects using Terraform on Google Cloud year over year, 2017 to 2018.
Now for those of you that aren’t familiar with Google Cloud, we have this concept of a project. It’s like a folder or like a logical grouping of a collection of resources like VMs or Kubernetes clusters. And between 2017 and 2018, the number of projects that were using Terraform increased by an astounding 1,006%.
This is truly astronomical growth. And this growth is supported by the fact that we have over 130 Google-specific resources in the Google Terraform provider. Those 130 resources don’t write themselves, though. And I think this is something that is super cool at Google. We find communities and we invest in those communities. We have a dedicated team of people at Google whose full-time or part-time job is to ensure that Google is well supported in tools like Terraform and Vault. Many of you might have gone to Dana and Paddy’s talk earlier; Dana’s our Terraform lead. And this team doesn’t just sit there and type modules or type Terraform providers all day. They’re engineers at heart.
» Making Terraform easy
So they build these really cool integrations for the community. I think the coolest integration is this project called Magic modules. The Magic modules project is open source on the Google Cloud Platform GitHub organization, and it provides code generation for the Terraform provider based on the GCP APIs. What that means is that as new APIs become available, we can immediately generate support for them in Terraform, so you don't have to wait months or weeks or even days to get access to them. And if you want to learn more about the Magic modules project, go back in time and watch Dana's talk, which was earlier today.
But there’s always been 1 missing piece. Despite Magic modules, despite a dedicated team of people, despite all of this investment in documentation and examples, there’s always been this 1 thing … 2 things … these 3 things … sorry, 4 … OK, these 5 things that you had to understand before you could use Terraform on GCP. You had to download this thing called a service account. And in order to download a service account, you had to understand IAM.
You had to understand all of Google’s internals and how we do authentication and authorization. You had to understand what to do with that service account, how to secure it, how to use it in a meaningful way. And you also had to understand billing, because most of the resources that Terraform provisions cost some money.
So even though we have an amazing free tier, you still had to add a credit card so that you could be billed for some of these resources. And it went a little bit further, because Google’s kind of rooted in academics—and I get to poke fun at them a little bit, because we didn’t just require that you understand these tools; we required a PhD in them. You didn’t just have to know about IAM; you had to be ready to be tested.
And just to prove how much specialized knowledge you needed to get started: That’s not a Google product. That is a picture of a mouse on a blue hexagon [05:08].
But that doesn’t undermine the fact that this was hard. And then on top of all of that, you had to correctly type out some Terraform configuration. You had to correctly authenticate to Google Cloud, set up your service account, and hope you had the right permissions, or fight with your organizational administrator to get them. You had to install Terraform: download the binary in a secure way and somehow trust it. That process was really hard, and we think it deterred people from adopting Terraform.
And it went a little bit further too, because if you’re an organization that is already using Terraform on Google Cloud and you have a new person on your team, it is unlikely that you have the bandwidth to walk them through that experience. So they suffered.
» Introducing Open in Cloud Shell
That’s why today we’re super excited to announce Open in Cloud Shell. Open in Cloud Shell is a new integration launching for Terraform, and it works like this: If you go to the Terraform docs right now, alongside most of the examples you will see an Open in Cloud Shell button. This button provisions an interactive in-browser development experience, including an in-browser editor, an in-browser command prompt, and an in-browser tutorial that walks you through the example.
When that editor launches, you’ll see something like this. [06:32] And it will load that exact example for you in the browser. You don’t have to download Terraform. You don’t have to install Terraform. We manage the version for you. There is no service account. There is no authentication that you have to do. We manage all of that for you. But instead of showing some screenshots, I think I’ll just show you what it looks like in real time.
So let’s take that same exact example we just looked at. This is live on Terraform.io, and this is the google_compute_address example. If you don’t believe me, you can do it on your phone, but please don’t, because Wi-Fi. I’m going to go ahead and click Open in Cloud Shell.
So this is the experience you get when you open the Cloud Shell workspace. In the middle here, we have an in-browser editor. And on the right-hand side we have this Terraform demo. And on the bottom, we have a terminal. And inside this terminal we have Terraform available to us automatically.
So what this is doing is, in the background, it’s provisioning a custom environment and cloning that example from the Terraform repo for you. Now you can see that Terraform is preinstalled. And I can select my project—which, remember, a project is kind of like that logical grouping of resources. I’ll pick my default project. And from here we want to make it as easy as possible to leverage this. And typing is really hard, especially when you’re live on stage and you’re nervous. So you can avoid all typing by just clicking this button.
When I click this button, it runs Terraform for me. Terraform is preinstalled; we didn’t install it. There were no service accounts. I don’t know anything about IAM or authentication. I didn’t have to learn anything. And look, you don’t even have to type yes. Look at this. Ready? Boom.
This is creating a real public IP address with Terraform without ever installing a tool, opening a terminal, or writing a single line of code. And these examples are as simple as a compute address or as complex as multiple VMs in a dedicated subnetwork. And we’re going to continue to expand support and bring Open in Cloud Shell to all of the Terraform examples.
» More good stuff in Cloud Shell
But there’s a hidden gem here. Actually, there are 2 hidden gems. The first is that these examples are tested, and they also live on the Terraform website. That means no more outdated examples, because the example you see in the in-browser documentation is the exact same example that runs in Cloud Shell. So you never have to worry about an outdated parameter or something that’s a little bit broken, because it’s all the same thing.
And the second thing, which I think is most interesting, is what this enables for organizations. Organizations can build their own tutorials with Terraform in this custom Cloud Shell image, in Markdown, and launch them with a URL. What that means is that if you’re any one of the organizations that’s rapidly adopting Terraform at scale, and you have a team of people and no clue how to train them, you can build these interactive in-browser tutorials that teach them Terraform your way, your organization’s way. And they can do that asynchronously, without hurting the productivity of your existing team.
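Based on Google's documented Cloud Shell URL parameters, embedding such a button in your own tutorial's README might look like this (the repository, working directory, and tutorial file here are hypothetical):

```markdown
[![Open in Cloud Shell](https://gstatic.com/cloudssh/images/open-btn.svg)](https://console.cloud.google.com/cloudshell/open?cloudshell_git_repo=https://github.com/example-org/terraform-tutorials&cloudshell_working_dir=compute-address&cloudshell_tutorial=tutorial.md)
```

The `cloudshell_git_repo` parameter tells Cloud Shell what to clone, `cloudshell_working_dir` where to start, and `cloudshell_tutorial` which Markdown file to render as the interactive walkthrough.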
That’s Open in Cloud Shell. We’re super excited to announce this integration, and you can play with it today on Terraform.io. So please give it up for Open in Cloud Shell and the Cloud Graphite team, which made this possible.
» Google’s Vault use
Let’s talk about Vault. At Google, we’ve invested really heavily in Vault. We have 2 storage backends, both of which are open source. We have Google Cloud Storage, or GCS. Google Cloud Storage is our object storage, similar to something like Amazon S3. It provides 3.5 nines of availability. It supports Vault in high-availability mode. It’s fully open source, it’s in Vault Core. You don’t have to download a plugin or do anything. And it’s just 2 lines of configuration to start using GCS as a storage backend.
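Those 2 lines, in a Vault server config file, look roughly like this (the bucket name is hypothetical):

```hcl
# Vault server configuration: use a GCS bucket as the storage backend
storage "gcs" {
  bucket = "my-vault-data"
}
```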
Then we also have Google Cloud Spanner. Google Cloud Spanner is our highly available, globally replicated relational database, with an industry-leading 5 nines of availability. To put that into context, that’s about 5 minutes of downtime per year. How many people in this room had less than 5 minutes of downtime per year?
Google Cloud Spanner also supports Vault high-availability mode. It’s also open source, and it supports transactions. For those of you that don’t know, this is important if you’re running Vault Enterprise: Vault Enterprise requires a transactional storage interface for some of its replication features, and that makes Spanner a prime target for running Vault, or Vault Enterprise, at scale.
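A sketch of a Spanner-backed server config, with a hypothetical project, instance, and database:

```hcl
# Vault server configuration: use Cloud Spanner for storage,
# with high-availability coordination enabled
storage "spanner" {
  database   = "projects/my-project/instances/my-instance/databases/vault-data"
  ha_enabled = "true"
}
```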
What’s great is that in Vault 1.0, it’s easy to migrate between these. So if you’re a smaller organization and you don’t think that Spanner is the right choice today, you’re not locked in forever. You can use GCS today and then, when you reach the scale where you need the high availability and high consistency of something like Cloud Spanner, you can easily migrate.
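As a sketch, that migration is driven by Vault's `operator migrate` command and a small config file describing the source and destination (names here are hypothetical, and the server should be offline during the migration):

```hcl
# migrate.hcl: copy Vault data from GCS to Cloud Spanner
storage_source "gcs" {
  bucket = "my-vault-data"
}

storage_destination "spanner" {
  database = "projects/my-project/instances/my-instance/databases/vault-data"
}
```

You would then run `vault operator migrate -config=migrate.hcl` and point the server config at the new backend.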
» Authenticating with Google Cloud Auth Method
In addition to these 2 storage backends, we also have the Google Cloud Auth Method. This Auth Method allows humans (via service accounts) and instances on Google Compute Engine to authenticate to Vault. Instead of trying to bake in API keys or tokens, or use Vault’s AppRole, you can delegate that authentication right to Google Cloud. And if you trust Google Cloud and you trust the Google Cloud metadata server, we can authenticate your instance for you.
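As a sketch (the role name, policy, and zone are hypothetical, and parameter names can vary by Vault version), enabling the auth method and logging in from a GCE instance looks roughly like this:

```shell
# Enable the Google Cloud auth method
vault auth enable gcp

# Create a role that GCE instances in a given zone can use
vault write auth/gcp/role/my-gce-role \
    type="gce" \
    policies="dev-policy" \
    bound_zones="us-central1-a"

# On the instance: authenticate using its identity from the metadata server
vault login -method=gcp role="my-gce-role"
```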
But it goes a little bit further than that because we can authenticate anything that runs on GCE. That includes things like Kubernetes.
And then on top of that, we also have the Google Cloud secrets engine. This secrets engine can dynamically generate service accounts and the permissions associated with them. It’s a great way to programmatically ensure people have access to just the parts of Google Cloud they need. And if you stopped by the Google booth at all during HashiConf, we had interactive code labs that let you get hands-on with all of these features of Vault and Terraform.
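A rough sketch of that secrets engine in action (the project name, roleset name, and IAM bindings are hypothetical):

```shell
# Enable the Google Cloud secrets engine
vault secrets enable gcp

# Define a roleset: Vault creates a dedicated service account
# with exactly these IAM bindings
vault write gcp/roleset/my-token-roleset \
    project="my-project" \
    secret_type="access_token" \
    token_scopes="https://www.googleapis.com/auth/cloud-platform" \
    bindings=-<<EOF
resource "//cloudresourcemanager.googleapis.com/projects/my-project" {
  roles = ["roles/viewer"]
}
EOF

# Request a short-lived OAuth2 access token backed by that service account
vault read gcp/token/my-token-roleset
```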
We’ve also been leading the conversation around how to run Vault as a service on Kubernetes and how to connect to Vault securely from Kubernetes, whether it’s Kelsey Hightower’s Vault on Kubernetes the Hard Way; my set of Terraform configurations that does it the easy way, a.k.a. terraform apply; or the Vault Kubernetes authenticator, which predated the Vault Agent and provides a single init container you can drop into a Kubernetes spec to get a Vault token via Kubernetes authentication, without hardcoding it into your application.
We even built our own auto-unsealer, because that was something customers wanted.
» Announcing the Google Cloud KMS secrets engine
But something new is coming. And you know something new is coming because I told you 6 days ago. I tweeted this out.
I’m really excited to announce this new secrets plugin. But first we have to talk about transit. Transit is arguably my favorite secrets engine in Vault. It provides encryption as a service, signing as a service, HMAC as a service, even random bytes as a service. It even provides service as a service. We have a lot of customers that use the transit backend. They use it for encryption. They use it for signing.
We kept hearing the same story: “Hey, Google, you have this really good KMS product. Why can’t I use that? Transit’s keys are in-memory, software-based keys, which means they’re super performant; you can rotate them, upgrade them, and revoke them at any time. But I want something that I can attest. I want to be able to prove, via attestation, that this key is valid.”
So today I’m excited to announce the Google Cloud KMS secrets engine. This new secrets engine for Vault is available on Vault 1.0, the beta. The docs are online, and you can learn more about it there. Or I could just show you.
This new secrets engine is embedded in Vault 1.0 beta 1. [17:05] I’m going to go ahead and start a new Vault server in development mode on my local machine.
Over here I have a new tab. I’m going to enable the GCP KMS secrets engine. It’s that easy. Now I’m able to communicate with the Google Cloud APIs through Vault’s IAM and authorization framework. I’m going to go ahead and create a key. What this just did under the hood is, in addition to registering a key in Vault, which is like a symlink pointing to Google Cloud KMS, it also created a key ring and a crypto key in Google Cloud KMS.
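Those first steps look roughly like this (the project, key ring, and key names are hypothetical):

```shell
# Enable the Google Cloud KMS secrets engine
vault secrets enable gcpkms

# Register a key in Vault; this also creates the key ring and
# crypto key in Google Cloud KMS if they don't already exist
vault write gcpkms/keys/my-key \
    key_ring="projects/my-project/locations/global/keyRings/vault-keyring" \
    crypto_key="my-crypto-key" \
    purpose="encrypt_decrypt"
```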
So under the hood, when we make API calls to Vault, Vault is in essence proxying those various API calls to Google Cloud KMS. So let’s see what that looks like. So I’m going to go ahead and write to GCP KMS, encrypt. I’m going to give it the name of the key that I’m using for encryption. And I’m going to give it some plaintext. And under the hood, that’s being encrypted with a Google Cloud KMS key. It’s just that easy. And we can decrypt this data. And if you’re familiar with the transit backend, this will seem very familiar. That’s because it’s modeled after the transit backend.
So if we come over here and we decrypt some data, we’ll get back the plaintext. One thing you might notice is that there’s no base64 encoding. If you’ve ever worked with the transit backend, you know that can sometimes be a hassle. We’ve solved that in the KMS plugin for you: if you give it binary data, it detects that and does the right thing under the hood.
Additionally, you can rotate these keys, which creates a new key version. What’s nice is that you can specify the key version you want to encrypt with. And just like Vault’s transit backend, we support re-encryption: I can give it the original ciphertext and tell it to use the second key version, and it will re-encrypt that ciphertext. This is great because I can hand this operation to a relatively untrusted process; that process submits ciphertext and gets ciphertext back. It never sees the plaintext data.
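As a sketch of those operations (the ciphertext values are placeholders):

```shell
# Encrypt plaintext with the Cloud KMS key
vault write gcpkms/encrypt/my-key plaintext="hello world"

# Decrypt the resulting ciphertext
vault write gcpkms/decrypt/my-key ciphertext="CiQA..."

# Rotate: creates a new crypto key version in Cloud KMS
vault write -f gcpkms/keys/rotate/my-key

# Re-encrypt existing ciphertext under a newer key version,
# without ever revealing the plaintext to the caller
vault write gcpkms/reencrypt/my-key key_version=2 ciphertext="CiQA..."
```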
All of this is being supported by a software-managed KMS key under the hood.
» HSM protection
It would be really great if that was an HSM, wouldn’t it? Wouldn’t it be great? Because then you’d get that same transit-style encryption as a service, but backed by an HSM that’s FIPS 140-2 compliant. Well, it turns out you can actually do that.
All I have to do is say, “protection level equals HSM.” Everything else is identical. Same exact thing, same exact API, but all of the data is now encrypted with an HSM. And I can prove that to you.
You can see here that the algorithm is symmetric encryption and the protection level is HSM. It’s just that easy to encrypt and decrypt data on an HSM. This plugin also supports signing, asymmetric decryption, and verification, again, with both software- and hardware-based keys.
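Creating an HSM-backed key is, as a sketch, just one extra argument (names are hypothetical; HSM-protected keys must live in a location that supports Cloud HSM, not the global location):

```shell
# Same API as before, but the key material now lives in a
# FIPS 140-2 validated hardware security module
vault write gcpkms/keys/my-hsm-key \
    key_ring="projects/my-project/locations/us-east1/keyRings/vault-keyring" \
    crypto_key="my-hsm-key" \
    protection_level="hsm"
```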
We’re super excited for this new integration, especially for our Enterprise customers, because we think it solves a real gap in the Vault ecosystem today.
And that is the Google Cloud KMS secrets engine.
» Summary of announcements
So to summarize: At Google, we’re seeing rapid adoption of Terraform. For the analysts in the room, you can quote me on this: “Up, and to the right.”
In addition to just writing Terraform providers, we’re also investing in the community and making sure that we support the rate at which Terraform is growing, through things like the Magic modules project. We have over 130 available Terraform resources, and that number’s only going to go up from here. And today we launched Open in Cloud Shell, which is the first step toward a Terraform experience in the browser with no pre-existing knowledge and zero barrier to entry. And you’re going to see more and more from Google in this area to reduce the friction of getting started.
On the Vault side, we have 2 open-source storage backends, Google Cloud Storage and Google Cloud Spanner. We have the Google Cloud IAM Auth Method and the Google Cloud IAM secrets engine. And today, we announce the Google Cloud KMS secrets engine. Everything you see here is open source.
» Why does Google care about all of this?
I think there’s a really important question here. And I don’t know if any of you are thinking it, but it was top of mind when I joined Google: Why are we doing this? Why are there engineers who are paid to work on this thing? Why am I on a stage giving a closing keynote? Why does Google care? Why does Google Cloud care?
We have this thing at Google Cloud called SRE. SRE is our implementation of DevOps. We think of DevOps like an abstract class or interface in programming, and SRE is a concrete implementation of that class. It’s a very prescriptive way to run and operate production systems; it’s how Google runs in production. And we’ve been talking about SRE publicly for a little while now. One of the key pillars of SRE is reducing MTTR, mean time to recovery, and increasing MTBF, mean time between failures. How do you reduce the amount of time it takes to recover in the event of an outage? And how do you increase the time between those outages, reducing their frequency? This is a hard problem to solve.
Well, it turns out that Terraform is an amazing tool for improving MTTR and MTBF.
We had a customer that was trying to do a disaster recovery exercise. And they said, “Hey, one of your data centers is in the path of a hurricane. We’d like to move all of our data away. We want to move all of our infrastructure away. How do we do that?”
Well, how long does it take to migrate a data center, even if you’re on a cloud provider? We did some numbers. It takes about 3 hours, by hand. Click buttons, run some commands, drag and drop some things. It takes about 3 hours to migrate the infrastructure. Not the data, just the infrastructure. This is a midsize company.
It takes about 30 minutes to do that with a script. We could do some automation, some bash, some PowerShell. It’s not really reproducible. And I can’t test it. There’s no testing framework.
That same process took 6 minutes with Terraform. You’re talking about a dramatic reduction in MTTR.
So when you’re looking at your service as a service owner or a project manager or a VP, and you’re trying to provide the SLA or SLO for your service, for your internal stakeholders, and it takes you 3 hours to restore, how many nines of availability can you provide? Hint: It’s less than 3. When it’s 6 minutes, you’re darn close to 5 nines. Five nines is only 5.26 minutes of downtime per year.
So leveraging tools like Terraform can help improve MTTR and MTBF: MTTR because it’s fast and it parallelizes the creation of resources, and MTBF because you can test it, collaborate on it, and treat it like code.
» Better security with Vault
And then on the security side of things, in the SRE discipline, we design secure systems. This is fundamentally ingrained in Google. We have projects like Project Zero, where people work full time trying to think of the most terrible things that other humans could do to them, and then publishing white papers about it so that those terrible people don’t do them.
But we also have to accept the risk of the unknown. We recognize that there are things that are out of our control, like vulnerabilities in a CPU or a chipset that are outside of our hardware pipeline, outside of our production pipeline. And once you accept the risk of the unknown, you need to maintain a record of all access. You need to know who accessed a secret, when they accessed it, and how they accessed it.
And then you need to have a very clear plan for revocation. In the event of a data breach, whether it’s a rogue employee or a black hat hacker, you need to have a solid plan for revoking access. And “shut it all down” isn’t really an answer when you’re at Google scale. You have to be able to revoke just the minimal amount of access. And you have to have a strong plan for that. And if this sounds familiar, it’s because Vault does that for you, among other things.
And while we don’t use Vault internally at Google, we think that it embodies a lot of the best practices that we have internally. And we think that with customers adopting it, they’ll be more likely to adopt SRE, which we think is the best way to run production systems at scale.
You can learn more about SRE at google.com/sre. And there’s a free book that you can all download to learn more.
So last year, Kelsey got on stage. And Kelsey introduced this thing called Hashinetes. And he ran Kubernetes alongside Nomad. And Vault and Consul were in there, there were networking and compute and storage, and people were inspired.
But people weren’t inspired because Kelsey did this crazy thing. People were inspired because Kelsey left them with a message. And that message was, “It’s OK to be irresponsible. It’s OK to push your own boundaries. It’s OK to go big and learn from your mistakes.”
Today, I’m extending that message: These tools can help you be irresponsible, if and when you’re ready to do so. They promote irresponsibility; they promote learning, loving what you do, exploration, and crazy ideas like Hashinetes. But on the other hand, these same tools are being used in production at large-scale enterprises and financial institutions, by people who cannot afford downtime, in things you use every day and probably don’t even know it.
So yes, you can be irresponsible. But these tools can also help you be more responsible.