AWS Terraform Landing Zone (TLZ) Accelerator
Oct 07, 2019
Watch Amazon announce and demo their Terraform Landing Zone (TLZ) AWS Accelerator preview version at HashiConf.
Terraform Landing Zone (TLZ) is an Amazon Web Services Accelerator that helps customers more quickly set up a secure, multi-account AWS environment based on AWS best practices with a strong isolation barrier between workloads. This solution saves time by automating the provisioning of core and application accounts and leverages the extensive providers Terraform has to extend provisioning to third party resources in addition to AWS. This enables a self-service and automated secure infrastructure within AWS.
In this talk, you'll get to see senior cloud delivery architect Brad Hankel demo an early preview release of TLZ.
- Brad Hankel, Senior Cloud Delivery Architect, AWS
Thank you, and thanks, everybody, for staying through the conference. I know it's getting toward the end, so hopefully you find a lot of value out of this.
Let's talk about AWS Terraform Landing Zone Accelerator Preview. This is a key word, "preview"; we do intend on releasing this. We'll talk a little bit more about it in Q4, but right now it's in an early limited release. We've had some pilot customers that we've been doing this with, but we wanted to come here because of HashiConf and introduce this one to you.
Freeing customers to move fast
What do our customers want to do on AWS? I'll give you a bit of background on why we did this. One of the things that we see is customers want to build; they just want to focus on what's differentiating. They don't want to do the things that are heavy lifting. If you think about things like RDS and things that we provide as services, they just want to go and build things.
That's a key thing that we want to focus on. They want to move fast. They want to go from idea to instantiation pretty quick. We want to support that. And they want to stay secure.
If you look at all these, these are common themes that we see across all the customers. And these are the themes we see customers coming to us with. But if you pick one out of here, "stay secure" tends to rise up to the top. If you think about a landing zone and everything about that, a lot of it is about the guardrails that we put up, everything around how we secure accounts and how we treat everything for it.
If you look at that, what we really want in here is to be able to create multiple accounts, which enables a strong isolation barrier between workloads.
If you think back to the other things—we want to move fast, we want to build, we want to stay secure—one of the ways that we help stay secure, we came up with a concept quite a few years ago about multiple accounts. We started seeing a lot of enterprises instantiating a lot of different accounts, sometimes in the thousands. When you get to that scale, how do you do it in a prescriptive, repeatable manner?
There are also some challenges with the multi-account environment. There are many design decisions. You have to decide a lot of things. We give you a nice platform, nice framework, but you have to decide a lot of things. That tends to slow you down when you want to go and do it.
You have to configure all these. If you're doing this, you have to have a repeatable process when you go above and you're trying to do enterprise scale. If you want to scale something out, you need some way to keep on repeating this process. And you have to establish the security baseline and governance. You're coming into the cloud, it's your license to operate. If you don't secure things in the cloud, you're going to have issues with it.
When we looked at this, a couple years ago, we came up with the AWS Landing Zone. This is a CloudFormation-based approach. It uses all the tools that are native to AWS, and it was our first iteration of trying to deploy an automated way to manage multiple accounts.
If any of you have been to re:Invent or watched anything in the blogs, we came up last year with Control Tower. That's another way to provision accounts and do some governance around the accounts; it was released a couple months ago. But we've also been seeing a lot of customers that are baselining on Terraform.
If you're deploying things on Terraform, we've gotten customers that have said, "Hey, you know what? We're hybrid, multi-cloud. We have different reasons for picking Terraform. We want a Terraform solution." So last year we started working on something called an AWS TLZ, a Terraform Landing Zone.
How do these break out? What's the separation between AWS Control Tower, the AWS Landing Zone, and the AWS Terraform Landing Zone?
AWS Control Tower is a service. It's the least effort of anything that you have to do. Go in, click a few buttons, start to provision accounts. Because it's a service, it's a little bit more restrictive than some of the other things. May not have all the features, may not have the functionality that you want.
AWS Landing Zone really put the infrastructure as a code model in there. A lot of CloudFormation, which is our baseline infrastructure as code for AWS.
But one of the things that we started seeing was, especially in enterprises, the ecosystem is not just AWS. It's partners, it's other tooling, it's everything as code. And we started seeing a lot more push and pressure on us to do everything as code. So we took a look at the TLZ and we saw, "Hey, there are a lot of providers; we can do everything as code: everything from documentation to third parties and everything on the provisioning side of it."
So that's what we were trying to tackle with this. If you look at it, though, they're all heading to the same destination. All these things here—AWS Control Tower, the AWS Landing Zone, and the AWS Terraform Landing Zone—whichever one you pick, it's an easy deploy solution to set up multi-account environments.
AWS Terraform Landing Zone
But we're going to focus on AWS TLZ.
This is a preview; I will reiterate that. Feedback, any input, greatly appreciated.
It is a Terraform-based solution that automates the provisioning of a secure and governed AWS multi-account environment and is extensible to third parties. And that's the key thing that we wanted to see on that. We already have some third parties, and we'll talk about a couple of them where we've implemented the first iteration.
What is baked into it? It's based on AWS and Terraform best practices and recommendations. We have a lot of workshops, projects, different ways that we can go through and work with customers, and we've come up with a lot of best practices. We have programs that go deep-dive into security; "security epics" is what we call them. We've based everything that we've baked into that on those best practices, and best practices that we've gotten from Terraform as well.
It gives you the initial security and governance controls. We don't want to pretend that we know everything about everybody's business. Depending on what industry you're in, you may have different regulatory compliance rules. You may be in financial, you may have HIPAA, you may have motion picture-type governance. So we give you the initial security and governance, but we expect that you probably have to add a little bit to it, depending on your industry.
The account vending machine
It's based on a concept of what we call "core accounts" and what we call the "account vending machine." The account vending machine is the heart of everything that kicks off the entire process, and we'll get a little bit deeper into that one in another slide.
The key thing for this slide, though: It is an automated and compliant account-provisioning system. Go ahead and enter in a few little details, kick it off; at the end of it you get an account that is fully vended, that's already attached into single sign-on if you're doing it, that already has your vulnerability tools scanning that account.
Everything's ready to go to give it to your partner. Because the whole idea about this from a platform perspective is you want to put workloads on it. Remember, this isn't differentiating for you. We just want to do it quickly.
How do we do the deployment on AWS TLZ? One thing that I'll note, just so everybody knows, we have, for a couple of different reasons, based this on Terraform Enterprise. We want to caveat that. Terraform Enterprise is what we have based this on, because of certain features and things that were in there like Sentinel. Now that Terraform Cloud has come around, we are looking at what we can do to support Terraform Cloud.
It's Terraform code in here. There's nothing in here that prevents you, if you're doing the open-source Terraform, from taking the code and modifying it. But we do have it baked in, where we expect something of the degree of Terraform Enterprise.
With Terraform Enterprise, though, you've always got this chicken-and-egg problem, because what we see is most people don't have Terraform Enterprise when we come in. The first thing we have to do, if it's a brand-new account, is bootstrap. The one manual thing that has to be done is you have to set up a master payer. That's your top-level account for AWS. It's usually where your bills come in. We do have to set up a few of what we call "core accounts": a shared services account, a logging account, a security account.
But the only reason we have to set those up (and we do those through Terraform as well) is because we have to deploy Terraform Enterprise, and we have to deploy your version-control system (VCS). We've worked with GitHub Enterprise, and we've worked with Bitbucket, and we expect to add GitLab. So, pick your VCS, whichever one that Terraform supports. We'd like to support all of those as well.
All the provisioning in bootstrapping is done through Terraform code, but the key is, once we get this stood up and we have an instance of Terraform Enterprise, we want to deploy the account vending machine. The account vending machine is a [module](https://learn.hashicorp.com/terraform/getting-started/modules.html "Introduction to Terraform Modules"); it's part of the PMR (Private Module Registry) in Terraform Enterprise.
A key thing is to do the provisioning of the core and organization accounts. We'll talk a little bit about that too. Configure the supporting services. If you're an Enterprise customer, you're probably on Enterprise support. You're going to create a lot of accounts, and you don't want to create support tickets manually. Remember, everything is as code. One of the things that we do is create a support ticket.
And we trigger what we call a "baseline." When an account gets stood up, we think there are 2 aspects of it. We want to support 2 different objectives. We want the baseline to support the key things that the platform does. It's around VPC deployments, it's about security, it's about networking that you're doing. These are the guardrails that we're doing.
The other thing that we want—and one of the reasons why we're big proponents of Terraform Enterprise—is to extend this out to the business or your organizational units. We want them to do infrastructure as code. We want them to follow good practices of everything as code.
So when we go through this process of provisioning a baseline, baselining an account, we do it through 2 workspaces. One workspace is where the baseline goes. The other workspace is what we give to the business.
We'll walk through the account vending machine process. One thing to note and make clear is that, while the account vending machine is a [module](https://learn.hashicorp.com/terraform/getting-started/modules.html "Introduction to Terraform Modules") in Terraform Enterprise, we found that there are cases where combining our native services with Terraform is a better fit.
A lot of what the account vending machine is based on is leveraging some of our standard tools: DynamoDB, SSM Automation, Lambdas, S3. But remember, the entire provisioning of this is done through Terraform once it gets stood up.
So that kicks off. It creates the account, and it kicks off another Terraform template that does all the baselining to that account, which adds all the security, adds all the permissions for that account, turns on audit logging, and does the networking for that account.
We then extend that to the supporting services. Remember, that's a key thing. We don't want to stop there. We want to provision your code repo, so now we've provisioned a place for the business to use. We want to go on to provision Terraform, so Terraform now has a workspace. We want to provision audit logging and hooking into whatever SIEM tool you have, being able to present those audit logs to them. If you have a monitoring tool, turn on monitoring as well.
And also single sign-on. What we've seen in the pilot customers is that Okta has been a key player in there. We do all the provisioning for Okta as well. The idea, remember, is it's doing the whole chain. We take automation as far as we can.
And at the end you get a vended account. And like I said before, account gets vended, whoever's done the account request gets notified. That account has already been scanned. Everything's been turned on in that account. So if they start looking at that account, vulnerability tools are already fired off and they're already looking at that account.
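The chain described above can be pictured as an ordered pipeline. The following is only a minimal Python sketch of that idea, not the TLZ implementation (which the talk describes as a Terraform module backed by DynamoDB, SSM Automation, Lambdas, and S3). Every name here (`VendedAccount`, `PIPELINE`, `vend_account`, the step names, and the workspace naming scheme) is hypothetical, and each step is a placeholder that does no real provisioning.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class VendedAccount:
    """Tracks what has been done for one requested account (hypothetical model)."""
    name: str
    requester: str
    workspaces: List[str] = field(default_factory=list)
    completed: List[str] = field(default_factory=list)


def placeholder(account: VendedAccount) -> None:
    """Stands in for real provisioning work; intentionally does nothing."""


def create_workspaces(account: VendedAccount) -> None:
    # Two workspaces per account, as in the talk: one for the platform
    # baseline and one handed to the business. The naming is an assumption.
    account.workspaces = [f"{account.name}-baseline", f"{account.name}-business"]


# Ordered steps, following the sequence described in the talk.
PIPELINE: List[Tuple[str, Callable[[VendedAccount], None]]] = [
    ("create_account", placeholder),
    ("apply_baseline", placeholder),
    ("provision_code_repo", placeholder),
    ("create_workspaces", create_workspaces),
    ("connect_audit_logging", placeholder),
    ("configure_single_sign_on", placeholder),
    ("start_vulnerability_scan", placeholder),
    ("notify_requester", placeholder),
]


def vend_account(name: str, requester: str) -> VendedAccount:
    """Run every step in order; an exception in any step stops the chain."""
    account = VendedAccount(name=name, requester=requester)
    for step_name, step in PIPELINE:
        step(account)
        account.completed.append(step_name)
    return account
```

Calling `vend_account("payments-dev", "dev@example.com")` returns an object whose `completed` list records each step in order, ending with the notification to the requester.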
AWS TLZ account structure
The master payer has to be set up manually. You've got to request it and set it up. Once you get that set up, then you can kick off some Terraform. In this example, we don't have Terraform Enterprise, so you kick off Terraform through the open source, and you're standing up 3 different accounts (shared services, logging, and security) under what we call a "core organizational unit."
The core organizational unit is a construct of the teams that are going to be using this type of service or looking at this; they tend to be the infrastructure teams, platform services teams, and enterprises.
We configure our shared services, because that's generally where people want to put their Terraform and their VCS, Terraform Enterprise, do logging, do security.
Once we have those accounts stood up, we have all the tools. Now we can start using the account vending machine and provision the other core accounts: a networking account and a break-glass account. It's flexible enough that if you have an additional account that you want—we've seen in one organization they wanted infrastructure shared services. You can add some.
Those are your core accounts. You protect them; you keep them ring-fenced. You don't let anybody get into those.
Then you want to start provisioning other accounts outside of the core accounts. Maybe you want to do developer accounts or sandbox accounts. And you can set up other organizations, like for testing.
In this organizational construct, we have a concept called "Service Control Policies," where we can put some governance and restrict people. If you're doing Service Control Policies (SCPs), you need a way to test them. You don't want to deploy them to an org unit without testing them. So you set up an organization that's just for testing.
Also set up an organization for archive. These accounts have a lifecycle to them. They come and go. Instead of deleting an account, you may want to move it to the archive, put an SCP against it that limits access or what you can do to that account.
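As an illustration of what an SCP attached to an archive organizational unit might look like, here is a small Python sketch that builds a deny-everything policy document. The talk does not say which policy TLZ applies to archived accounts, so the function name, the `Sid`, and the choice to deny all actions are assumptions made for the example; only the overall JSON shape follows the standard IAM policy document format that SCPs use.

```python
import json


def build_archive_scp() -> dict:
    """Return a hypothetical SCP that denies every action in archived accounts."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyAllInArchive",  # illustrative identifier
                "Effect": "Deny",
                "Action": "*",
                "Resource": "*",
            }
        ],
    }


def render_policy(policy: dict) -> str:
    """Serialize the policy to the JSON text an SCP is stored as."""
    return json.dumps(policy, indent=2, sort_keys=True)
```

A policy like this would normally be tried against the testing organization first, which is the reason the talk gives for having one.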
But the only reason we do this is for the LOB use. You can call this org what you want. It can be based on application, it can be based on division, line of business. We don't really discriminate or care which way you want to go with it. It's just how you want to break up your accounts.
There are 2 patterns that come in here. We're never going to be able to tell what your software development lifecycle is. Do you do dev, QA, stage, pre-prod, prod, load testing? We never want to anticipate that, but since we have an account vending machine, we only discriminate between what's production and non-production. It really comes down to: What are the guardrails that we're setting up for each environment?
With a non-production account, you can say, "For this particular application, I just want dev and stage." You can set up those 2 accounts underneath that, and then you can just put production in there. You get some flexibility. You get to choose how you want to provision these.
If you have a line of business that doesn't need a lot of segmentation, then set it up at the line-of-business level. But we do see that things tend to segment based on production and non-production.
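The production versus non-production split can be sketched in a few lines of Python. This is a hedged illustration only: the stage names, the account naming scheme, and both function names are made up for the example, and the talk does not describe how TLZ itself records this choice.

```python
PRODUCTION = "production"
NON_PRODUCTION = "non-production"


def classify_environment(stage: str) -> str:
    """Map any lifecycle stage name onto the two guardrail classes."""
    normalized = stage.strip().lower()
    return PRODUCTION if normalized in {"prod", "production"} else NON_PRODUCTION


def plan_accounts(application: str, stages: list) -> dict:
    """Return one account per stage, tagged with its guardrail class."""
    return {
        f"{application}-{stage}": classify_environment(stage)
        for stage in stages
    }
```

For an application that only wants dev, stage, and prod, `plan_accounts("billing", ["dev", "stage", "prod"])` yields two non-production accounts and one production account, whatever the team calls its stages.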
AWS TLZ security baseline
The big thing here is, What's the security baseline? And security is broken down into a couple different things: detective controls and preventative controls.
What do we configure with the detective controls right away? What have we already built into the system? VPC flow logs are turned on right away. CloudWatch is turned on. CloudTrail is turned on at the organization.
GuardDuty is turned on and deployed to every single region, because that's the best practice. You may not be going to all of our regions, but our best practice dictates that you want to have visibility into those regions to see if people are doing things in those regions, even if you're not using them. And some policies around AWS Config are turned on right away.
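One way to reason about the "every single region" rule is as a coverage check. The sketch below is plain Python with no AWS calls: it assumes you have already gathered, by whatever means, a mapping from region name to the GuardDuty detector IDs found there. The function name and the input shape are hypothetical.

```python
def regions_missing_guardduty(all_regions: list, detectors_by_region: dict) -> list:
    """Return the regions that have no GuardDuty detector, sorted by name.

    detectors_by_region maps a region name to a list of detector IDs;
    a missing key or an empty list both count as "not covered".
    """
    return sorted(
        region for region in all_regions
        if not detectors_by_region.get(region)
    )
```

A region the business never uses still shows up in the result if it has no detector, which is the visibility the best practice is after.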
RedLock is a third-party vulnerability tool. We bake that into it. So account gets provisioned, and RedLock starts scanning it. It has already the authorizations, everything that it needs to come in and do all the scanning for it.
These are good. They detect things. But it's even better if you can prevent things.
One of the reasons we like Terraform Enterprise and Terraform Cloud is we're a big proponent of Sentinel and having Sentinel policies in there. KMS (AWS Key Management Service) is treated as a module, so you can do that for encryption. AWS Secrets Manager. IAM. SCPs are a big thing, Service Control Policies. And then for single sign-on, we see Okta gets used quite a bit, so we do all the provisioning for Okta as well.
One of the key things was making sure from a governance perspective that we're doing all the right things around security according to best practices.
Why Terraform Enterprise?
If you go back to the key tenets that we talked about early on—people want to build and they want to move fast; businesses just want to go—we're introducing infrastructure as code to them or telling them to go, but they're going out and grabbing resources. They're grabbing an S3 resource, but they don't really know how to do it.
But, through our knowledge and experience, we've already built out what we consider best practices for that. We also think that in some cases you come in with some of your own opinions on that and that influences how you want to deploy an S3 bucket.
We have this concept for what we call the "curated module." We take the resource and we add all of our best practices. We think over time, companies will add their own best practices to this. And then with that we can deploy a curated module, and we expose this through the PMR.
Why did we standardize on Terraform Enterprise? The key things were we saw a lot of value in the PMR, we saw a lot of value in the workspaces, and managing things centrally. We think this is a solution for hundreds of accounts. A tool like Terraform Enterprise, and using Sentinel, makes a lot of sense.
But the curated module has been really powerful, as we've seen in a couple of pilot customers. They're able to build up a portfolio of modules that the business comes and grabs, and a lot of times we have Sentinel policies that make them conform to this.
If you remember, what we recommend is you set up the guardrails, but this is the cloud. Things are moving at an intense pace. New services are coming out all the time. We're not always going to have a curated module for these.
We may have a new feature, a new service, that somebody wants to try on S3. Maybe they can go and grab that—we don't want to put any restrictions on them—they can grab that resource, but if they're not doing encryption on it, we block it. So at least they have to come up to the baseline and the governance standards.
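To make the "no encryption, no deploy" guardrail concrete, here is a sketch of the check in Python. It is not Sentinel, which is what the talk actually uses, and it is not a TLZ policy. It assumes a simplified plan structure modeled on the JSON that `terraform show -json` emits (`resource_changes`, `type`, `address`, `change.actions`, `change.after`), and it assumes encryption is declared through a `server_side_encryption_configuration` attribute on `aws_s3_bucket`; both should be checked against the Terraform and provider versions in use.

```python
def unencrypted_buckets(plan: dict) -> list:
    """Return addresses of S3 buckets in the plan that declare no encryption."""
    violations = []
    for resource in plan.get("resource_changes", []):
        if resource.get("type") != "aws_s3_bucket":
            continue
        change = resource.get("change") or {}
        if change.get("actions") == ["delete"]:
            continue  # a bucket being destroyed has nothing left to encrypt
        after = change.get("after") or {}
        # Assumed attribute name; an absent or empty value counts as a violation.
        if not after.get("server_side_encryption_configuration"):
            violations.append(resource.get("address", "<unknown>"))
    return violations


def policy_passes(plan: dict) -> bool:
    """The guardrail: the plan may proceed only with no unencrypted buckets."""
    return not unencrypted_buckets(plan)
```

The point of the design is the same as in the talk: the business can reach for a raw resource instead of a curated module, but the plan is still held to the baseline.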
The best thing that happens is when you get into the business and you start seeing repeatable patterns. Think of how you can grow this. You've got business that's coming in, they're grabbing an S3, they're grabbing an RDS, they're grabbing an EC2, they're grabbing an ECS. They just start grabbing these modules and they can stitch them together pretty quickly. You don't have to reinvent the wheel every time. Then you start building out patterns on how people can repeat this process.
Benefits of the Terraform Landing Zone
What do we think are the benefits as we get into this?
Guardrails, not blockers
And this is what drove us really: Everything has to be automated. You're talking heavy volumes, you're talking a lot of accounts and management. So it's not only the automation of provisioning accounts, but you need to treat these accounts and things as a product.
New security requirements are going to come up. How can you automate reapplying the baseline across 200, 300, 500 accounts? How are you going to do that? Our focus was: How do we automate this? How can we take automation to everything on this?
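Treating the baseline as a versioned product suggests a simple question to automate: which accounts are behind? The Python sketch below assumes each account records the baseline version last applied to it as a dotted numeric string. That bookkeeping, and both function names, are hypothetical; the talk does not describe how TLZ tracks this.

```python
def parse_version(version: str) -> tuple:
    """Turn a dotted version such as "1.10.0" into a comparable tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def accounts_needing_rebaseline(applied: dict, current: str) -> list:
    """Return the accounts whose applied baseline is older than the current one.

    applied maps an account name to the baseline version last applied to it.
    """
    target = parse_version(current)
    return sorted(
        account for account, version in applied.items()
        if parse_version(version) < target
    )
```

Comparing integer tuples rather than raw strings matters once versions pass single digits, since "1.10.0" sorts before "1.9.0" as text.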
Scalable, like I mentioned. You could use it for 2 accounts, probably a lot better for 200 accounts.
Self-service. We want people to move fast. You start putting up too many blockers for people to come in and request accounts, it just slows them down. People may want to stand up an account, do some testing, do something new. Give them a developer account, give them a sandbox account, let them go and do what they're trying to do on that.
Guardrails, not blockers. This has a lot of power, because you do need to have confidence that things that get deployed into your environment are not going to have egress access that they're not supposed to. So set up the guardrails.
Auditable. We spent a lot of focus on the logging, making sure that all the logs are being collected, making sure that they're exposable to a SIEM tool. Big thing about this, since you're using Terraform Enterprise and you're using a VCS, too, you have a way to do auditing on who's doing what changes and when they're doing them.
And flexible. We think we're going to cover most of the scenarios, but we don't think we're necessarily going to cover all the scenarios. We're treating this like an accelerator. We think that you have unique requirements and probably some neat things of your own. If we can get you 80% to 90% there, great, but if we can get you 100%, even better.
We want to get you as close as we can, but we don't want to restrict you from adding in your own feature sets or doing things that you would see as a value to your business.
It's Terraform Enterprise-based, like I mentioned, and we are investigating Terraform Cloud. We don't see any impediments to doing it, but since it's so new, we need to do a little more investigation on it to make sure we can support it.
Release is planned for Q4 of 2019. And probably more importantly, its code will be open source and it'll be free. So take it, use it as a community. Like I mentioned earlier, if you are using the open source of Terraform and you can find value in just taking some of the things that we're doing, have at it. Go for it.
We'd like to see people give back to this. We're not going to necessarily anticipate all the different third-party items that are out there. If you're finding something, and you're working through it, and you're adding something to this, give it back to the community.
Remember, this is non-differentiating stuff. Say you're setting up Palo Alto egresses; probably somebody else is setting up Palo Alto egresses. How does that differentiate your business? It's just a way to do egress, but it's what you put in the platform that differentiates you.
We've had a couple of good pilot customers that have really been giving to the community, and they want us to bring this out. So we're able to make this available because of work that we've done, work that they've done. It's been a collaborative effort. So give back to the community. That's a key thing.
We're committed to doing the undifferentiated heavy lifting. And that's why we want to give this to you. We want to get you to start being innovative, building things, and doing things as fast as you can.
The key thing is, this talk has been a lot of marketing in a sense, but I want to show you what the PMR looks like. You'll notice that this is Terraform Cloud.
This is the account vending machine module; we put all the documentation in there. We have quite a number of inputs for this. We have outputs on here.
Key thing: There are 58 resources that get deployed when this goes up. Everything that we have to do around DynamoDB, everything that we have to do around Lambdas.
I want to clarify on some of these things to make sure people understand. We leveraged the Lambdas where we found issues or occurrences where it just made more sense to do it. One good example for that is we need access keys to get into the system. Terraform needs keys to get in. We don't want those keys to ever get out. We don't want them to ever be exposed. Terraform is really good about having evolved in having them there.
We have a Lambda that generates the keys, calls the API, puts them in Vault, into Terraform in the workspace, and nobody ever sees them. They're not stored in the database and not retrievable. We don't care. They're ephemeral. We don't care about them.
We had one partner where, every night, 150 to 200 workspaces rotate every one of their keys, all done through a Lambda that interacts with Terraform.
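The rotation flow described here (generate a key, place it in the workspace, never expose it) can be sketched as follows. This is an illustration under stated assumptions, not the Lambda from the talk: the three callables stand in for whatever actually creates a key, stores it as a sensitive value in the workspace, and deletes the old key, and all names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass(frozen=True)
class AccessKey:
    key_id: str   # safe to record
    secret: str   # must never be logged or returned to a caller


def rotate_workspace_key(
    workspace: str,
    old_key_id: Optional[str],
    create_key: Callable[[str], AccessKey],
    store_key: Callable[[str, AccessKey], None],
    delete_key: Callable[[str, str], None],
) -> str:
    """Rotate one workspace's key and return only the new key's ID."""
    new_key = create_key(workspace)
    store_key(workspace, new_key)        # the secret goes straight to the workspace
    if old_key_id is not None:
        delete_key(workspace, old_key_id)  # retire the old key only after the new one is stored
    return new_key.key_id


def rotate_all(
    current_key_ids: Dict[str, Optional[str]],
    create_key: Callable[[str], AccessKey],
    store_key: Callable[[str, AccessKey], None],
    delete_key: Callable[[str, str], None],
) -> Dict[str, str]:
    """Rotate every workspace's key; the result maps workspace to new key ID."""
    return {
        workspace: rotate_workspace_key(
            workspace, old_key_id, create_key, store_key, delete_key
        )
        for workspace, old_key_id in current_key_ids.items()
    }
```

Storing the new key before deleting the old one keeps a workspace from being left without credentials if a step fails partway through, and returning only key IDs keeps the secrets out of anything a person might read.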
We see those areas where doing that all through Terraform just didn't make as much sense. So we're able to do those things by leveraging some of the capabilities on it.
We have Systems Manager Automation and everything else. It's baked in.
If you've ever had fun with graphing on Terraform Enterprise, or any Terraform, this is a graph output of what we do when we baseline an account. And if we bring this one up, you can see it's just a lot of different calls for policies and stuff.
But the one thing is, if you go back to the original tenets, this is a lot of heavy lifting that doesn't differentiate you. So why don't we do it and give it to you and let you run with it? That's the intent.
There's a lot of effort in here. It varies depending on what the customer wants, but if you baseline an account, you're looking at about 120 to 130 resources that get provisioned for that. That's a lot of stuff that you don't have to manually do. You don't have to figure out how to do a lot of these things. We've tried to show you best practices for it.
Thank you. Hope this had some value.
We're early in this. We feel like we have it pretty good for the first release, but we're customer-focused, customer-obsessed. If you guys have ideas, reach out to me through email. If this sounds pretty good, but you'd like to see some other things, let us know. This is really baked on and built on the community.
Thank you, everybody, and go build.