HashiConf 2018 Keynote & Demo: Infrastructure as Code in the Real World
It's great to move towards DevOps and infrastructure as code, but too often the real world intervenes. In this session, we'll discuss how the combination of Azure and HashiCorp technology can facilitate a gradual transformation of an organization.
Kubernetes project co-founder Brendan Burns, now a DE for Microsoft Azure, shows how Azure and the HashiCorp stack can facilitate a gradual transformation in your organization.
Kubernetes co-founder, Microsoft Azure
Hi there. It's great to be here. Thanks for coming by to see me speak. I know that I'm between you and the reception at the end of the talk so I'll try and be a little entertaining and I will not be offended if you leave towards the reception.
All right. So I want to talk a little bit about infrastructure as code in the real world, and relate things both to my experience and to what we think about when we are discussing infrastructure as code with people who are coming to the Azure cloud, possibly not from a place where they've been doing this from the beginning.
I thought I'd start a little bit with my experience. So obviously I've talked to a bunch of people, but I did a lot of stuff before I started speaking. I spent a long time producing high-scale production infrastructure and being on call for it (24x7 on call, fortunately not every week), being on pager and having to react and fix things on the fly, deploy new software every week, that sort of thing. I'm currently the director of engineering at Azure. I run the Azure Resource Manager, so basically every API request that comes in to Azure, every single thing that Terraform sends to Azure, comes through infrastructure that we run. Obviously I run the Kubernetes service as well, and Azure Container Instances. So I have a bunch of teams that are actually doing this on a daily basis, doing this on a weekly basis. And in addition to that, a ton of experience with open source. Obviously Kubernetes is the most famous of that, but even before then, if people have used the tool, JMeter was one of the very first things I was a maintainer for in open source land, long, long ago.
And I think more relevant to the discussion here is I have worked on … I started counting the number of different configuration management projects that I had worked on and I counted at least six. And I don't think that is rare. I think if you talked to people who have worked in configuration management for a long time, we keep trying to take a crack at it and I think what's most interesting is that we haven't sorted it out yet and that's an interesting thing to discuss.
So when I think a lot about configuration management and when I think about my experience in production and when I think about my experience with infrastructure as code in general, the first thing that I have always been struck by is this observation: Most outages are self-inflicted. We always want to just like, probably if I'm just careful I can pull that coin out of there. Nothing bad is going to happen. You know, just ship it. I mean, but actually the most interesting thing about this is that I don't think anybody thinks, well, let's go break the system today. And nobody thinks like it's a great day to go cause a multi-hour or even multi-day outage. But accidents happen. And accidents cause the problems that we then have to react to.
The primary reason that these accidents happen is these snowflakes. Every single time you go and you make an imperative change to the world, you're creating a snowflake, you're creating something that is beautiful and unique but will never exist again. And yet, when you need to do a rollback, you are asked to recreate it. So I think if we have one lesson that I hope a lot of you have already learned, and that we are bringing to the organizations that we are bringing into a more cloud-native or a more declarative way of doing things, it's that we need to move from these beautiful snowflakes to something that looks a little bit more like this. And automation, in its own way, is beautiful. It's a fantastic situation where the exact same thing shows up every single time. The exact same product comes out on the other side. And so this is the place where we're trying to go. This is the goal, I think, in a lot of what I am doing throughout cloud, throughout conversations that I have with the various customers who are maybe beginning with snowflakes that are already in existence and are figuring out, how do I bring them into the cloud?
So to get there, of course, we add a little HCL, right. You write some Terraform and we're done. Well, probably not. We're going to write a little JSON and maybe, hopefully, we're done. And we throw a little more YAML on it and we're maybe done. But then there is a little bit of a throwback in us all, so we add some XML, you know. And now, mission accomplished. We're done. I'm going to walk off the stage 'cause I've shown you how to take that snowflake and turn it into that beautiful car factory.
It turns out, we're not. Even if we manage to stick with one configuration language, which honestly never happens, there are still all of these real-world concerns that make it such that we really can't just write our way and declare our way into that pure, perfect future. And it includes things like data locality and hard-to-move infrastructure and legacy deployment tooling that someone wrote in a Bash script 10 years ago and then left the company. Scary snowflakes, right? How many people out there have servers that they are scared of? I'm seeing some hands. You know, it's scary that you are scared of it, but it is scary. And the reality is there are always higher priorities. There's always that feature that some PM wants you to push out or that the field is telling you is going to land that next contract. Or your organization is like, you know, we've done it this way for 10 years. We run into this a lot in the cloud. In fact, I've been thinking a lot that maybe we should just ship a plastic dashboard of LED lights that every one of our customers can install so they can watch the server lights blink even though they are running in our data centers somewhere.
Furthermore, I think a lot of these technologies are new. Even something like Terraform, which has been around for a relatively long time at this point, still has a bit of a skills gap. Not everyone out there knows how to use these tools correctly. We have to worry about things like compliance, budgeting for these sorts of things, finding the people to maintain our infrastructure, and on and on and on. The folks who took my slides to put them up here asked me, are you sure, your slides got clipped off at the bottom, right? No, no, you don't understand. There is just this infinite list of things that stand between us and that perfectly running factory that is churning out our systems.
But nonetheless, just like the penguins that are trying to waddle their way up to the edge and fall off, we have to take the plunge. And so we have to move from this world where we have a bunch of snowflakes to this world where we are automating everything that we do, where we are securing and addressing the compliance and really achieving all of the goals that we set out to do in moving from the imperative world of scripting to a more declarative, more reproducible future.
And so for the rest of the time, I want to talk about the combination of some of the stuff that we are doing in Azure in terms of the cloud, because I think, of course, being able to move to the cloud is a big part of this. If imaging a machine involves landing a hard drive into a particular rack in a data center, in some ways, we have already lost. And what I really love about tools like Terraform is that they are some of the things that were really built in a cloud-first way, in a truly cloud-native way. They are built around the idea that we want to be able to manipulate even things like servers. And we want to manipulate them programmatically and we want to manipulate them through files. And I'll talk a little bit about how the intersection of cloud technology with tools like Terraform, Packer, and Consul can really produce a lot of steps that allow you to move into the beautiful declarative world while still addressing a lot of these real-world concerns.
So the first place I'm going to start is one of the very first projects that I pitched and took on when I started working on Azure, and that's the Azure Cloud Shell. And the idea behind it is actually really simple. It takes the cloud, obviously, it takes a little bit of storage placed into the cloud, and it adds in a shell. But what's so beautiful about this? Well, by default, when you start using this experience, it's an immutable experience. That shell is created exactly the same way every single time. You don't have to worry about upgrading software out in the field. You don't have to worry about things like two-factor compliance because, by the way, this came in through the website. This came in through someone going through the exact same login that they use to log in to their documents, to log in to their email, to log in to all of your infrastructure. All of the rules that you have around access, geographic restrictions, two-factor authentication, every sort of compliance-oriented thing that the IT and security teams may decide to throw at you, you get automatically by having it live inside of a browser.
So that's great. And we can even do things like lock down the IP range. We can say, hey, you know what, we guarantee you that users are only going to come out of this shell from a particular IP range, so now you can actually lock down your infrastructure as well, to prevent people from outside of a known range from accessing your infrastructure.
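A minimal sketch of what locking down to a known IP range can look like on the receiving side, using only Python's standard library. The function name and the CIDR block are illustrative assumptions (the block is a reserved documentation range); the talk does not say which ranges Cloud Shell actually uses.

```python
import ipaddress
from typing import List


def is_allowed(source_ip: str, allowed_ranges: List[str]) -> bool:
    """Return True if source_ip falls inside any of the allowed CIDR ranges."""
    addr = ipaddress.ip_address(source_ip)
    return any(addr in ipaddress.ip_network(cidr) for cidr in allowed_ranges)


# Hypothetical allow-list: 203.0.113.0/24 is a documentation-only range,
# standing in for whatever range the shell's traffic is known to come from.
ALLOWED = ["203.0.113.0/24"]

print(is_allowed("203.0.113.7", ALLOWED))    # inside the range
print(is_allowed("198.51.100.1", ALLOWED))   # outside the range
```

In practice the same check would usually live in a firewall or network security rule rather than in application code; the sketch only shows the membership test.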
But it's better than that, because, of course, inside of this cloud shell, we can actually provide you with tools. We can actually provide you with things like Terraform, right. We can say, hey, you know what, we understand that it's really fantastic that you are logged into this shell. When you use Terraform in the context of that shell, it would be great if you didn't have to do a login again, right. It would be fantastic if, simply because we've already gone through that two-factor and all that other stuff, I don't have to do it all over again just because I happen to be using a different tool. And so that kind of integration, that sort of ease of use, enables us to bridge some of those gaps of people not necessarily having the tool to download, maybe not even having permission to download the tool, and also having to worry about how do I get the tool up and running with my infrastructure. I just log in to the website just like I would anywhere else. It's already available, all locked and ready to go.
So let's take a look at what that looks like.
So here we are in the Cloud Shell again, with the two-factor auth and all that stuff. How many people out there use Visual Studio Code? We like Visual Studio Code. We've got Visual Studio Code in your web browser. Yeah, that's right. And so you can actually come in here. This is a shared workspace, right, this is a cloud workspace. I actually wanted to use my phone to show you I can access the exact same thing on my phone, right? I'm not sure Visual Studio Code is going to work out too well, but if I want to run Terraform in my Cloud Shell on my phone, it works. And I can actually open up my Terraform file here. I've got syntax highlighting, I've got all of the stuff you expect to see in an editor in this shared cloud workspace. I have a bunch of Azure Resource Manager definitions here. This is going to create a virtual machine. And I can actually integrate things like cloud-init, and local files for my cloud-init, in order to bootstrap the VM. Obviously if I come down here, I can do terraform plan and it's going to go through and figure out the whole thing, hopefully.
So it's refreshed all my state, it's ready to go, and then I can do terraform apply, and it's going to go and figure out what's up there and what it needs to recreate. At this point, I'm going to say yes, and it's going to go ahead and create a virtual machine. So I think, hopefully, this shows you the value. And again, because it's in the cloud, I can access this from anywhere. Literally from my phone, literally from the iPad, wherever else. I have access to Terraform, all the authentication already taken care of for me, the ability to go and deploy things.
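The plan-then-apply flow in the demo can be illustrated with a toy model: compare the declared state with what already exists, list the actions, then carry them out. This is only a sketch of the general declarative idea, not how Terraform is implemented; the resource names and the `plan` and `apply` helpers are hypothetical.

```python
from typing import Dict, List, Tuple

Resource = Dict[str, str]


def plan(desired: Dict[str, Resource], actual: Dict[str, Resource]) -> List[Tuple[str, str]]:
    """List the (action, resource name) pairs needed to make actual match desired."""
    actions: List[Tuple[str, str]] = []
    for name in sorted(desired):
        if name not in actual:
            actions.append(("create", name))
        elif desired[name] != actual[name]:
            actions.append(("update", name))
    for name in sorted(actual):
        if name not in desired:
            actions.append(("delete", name))
    return actions


def apply(desired: Dict[str, Resource], actual: Dict[str, Resource]) -> Dict[str, Resource]:
    """Execute the plan against the in-memory 'actual' state and return it."""
    for action, name in plan(desired, actual):
        if action == "delete":
            del actual[name]
        else:
            # create and update both copy the declared definition into place
            actual[name] = dict(desired[name])
    return actual


# Hypothetical declared state versus what is currently deployed.
desired = {"vm": {"size": "small"}, "nic": {"subnet": "a"}}
actual = {"vm": {"size": "large"}, "disk": {"gb": "10"}}

print(plan(desired, actual))
apply(desired, actual)
print(plan(desired, actual))  # nothing left to do after a successful apply
```

Running the same apply a second time changes nothing, which is the property that separates this style from a one-off imperative change.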
But it's not just about Terraform. It's not just about being able to deploy stuff, right. We actually want to be able to use cloud based experiences to do things like build images. So I want to create a VM, I want to be able to create an image to run for that VM. Most of the time we don't want to be running scripts if we are getting to that beautiful world of declarative infrastructure, we don't want to be running scripts in the middle of the VM boot. That's a terrible idea. How many people have learned that running a script in the middle of your VM boot is a bad idea? Again, I see some hands. Yeah, it's a bad idea.
And so we want to be able to build VM images, have immutable things that we can boot up every single time, but this is pretty unwieldy, it's pretty heavy. If I'm on a device, say, without a terminal, or my phone, I'm not going to be able to run Packer on my phone, so it seems like an obvious thing to build an image-build service so that I can do Packer in the cloud. And so that's actually exactly what we have done. So we have in private preview now, and headed towards public preview soon, the Azure Image Builder service, which enables you to take Packer definitions for your images and build them with a cloud-based service. And, of course, it's also integrated with that cloud shell, so you can have the same experience to build a VM image entirely in the cloud from whatever device you want to be working on. And I'm pretty excited about that.
In addition to things like Terraform, Azure actually has its own native templating language. People may or may not have played around with it, but the Azure Resource Manager team that I run has a templating language that is used for Azure. It's that JSON file that I had on the previous slide. But one of the problems with built-in, cloud-specific templating languages is they have historically been limited to cloud-specific resources. It's a great language if you wanted to deploy Azure resources, but if you were mostly Azure and you had a little bit of, say, Cloudflare to do DDoS or to do DNS, or maybe you are using a startup to do monitoring, you really couldn't write a complete template for that. And likewise, if you're a service provider, you might think, you know what, I would really maybe like to integrate myself better with Azure or with another cloud, but I don't want to have to be in the business of maintaining providers for every single cloud, 'cause it's too costly to be in the business of maintaining all those things.
So what we have done is actually turned this inside out. And we've actually said, you know what, people are building a ton of Terraform providers already anyway, can we gain access to that ecosystem? Can we leverage the work that people are already doing to integrate themselves inside of Terraform and enable them to also be accessible from this Azure-native language? So if you are an implementer of a Terraform provider, suddenly, with a resource provider, the Azure Terraform resource provider, you have the ability to have your service be accessible not just inside of Terraform's native language with HCL but also in Azure's native language. That means your objects will show up in the Azure portal as first-order objects. If people have come to your service from a more Azure-native background, they'll be able to access and build infrastructure as code for you without you really doing any work other than building for Terraform. So we're really excited to see this move forward and, in particular, what this has allowed us to do is build out providers that can give access to things like the Kubernetes resources that we're making available via the Azure Kubernetes Service.
So we'll take a look at what that looks like over here. This is the Azure Resource Manager schema, and I don't expect you to memorize it all, but what you can see here is I'm declaring a managed cluster, that's my Azure Kubernetes Service resource, with all of the properties, agent size, things like that. But now I have this new type, which is a Terraform provider registration, and what this is saying is I'm going to register the Terraform Kubernetes provider into this template. And so now, in addition to creating an Azure-native Kubernetes service object, I'm also creating a Terraform provider object which links up the Kubernetes provider that's already in Terraform with the Azure API infrastructure. And then I can go down here and create a Microsoft Terraform open source resource, and that's going to actually be my pod definition. So if you see here, this is the native podspec for a container that I wanted to plug into Kubernetes. So by using the Kubernetes resource provider, by leveraging all of the great work that people have done in the community to bring Kubernetes to Terraform, we can also give access to the same resource provider within the contours of Azure's native templating.
So I think this really speaks to where I think we should go in general. Configuration language is going to be a choice like a programming language, right. I said I'd worked on two, three, four, five, six different configuration languages. The reason I have done that is because they're all a little bit different. They all have a slightly different feel and I think that we have to sort of admit that, just like with programming languages, while people will be arguing about the best one for millennia, there's going to be a wild heterogeneity of different configuration languages for different use cases and the best thing that we can do is enable us to only have to build these kind of providers once. And have them be available to people no matter what language they choose to access them from.
So I'm really excited to have built that partnership with Terraform and excited to see that go forward.
But speaking of containers, you probably would have expected that I'm not going to spend the entire time talking about how I deploy infrastructure as code and creating VMs and things like that, so what about containers and the beautiful world? It's already all cloud native and declarative. Sadly, it's not. Sadly, it's all hybrid there too, with people doing weird things to route things to different places, and the reasons are all the same. It may be great and easy to move your web app or to build a new web app out in the cloud in a cloud-native way with containers and Kubernetes and all of this great stuff.
But the truth is that HR database is still on a server, underneath somebody's desk. And it's not going anywhere. In part because everybody's forgotten which desk it's under, but in general it's because data is hard to move, it's scary to move. Compliance makes it hard as well. So we need to be able to build a hybrid world where you can go from a data source that is on-premises, maybe on-premises for quite a long time, for either historic reasons or possibly even just data locality reasons. There may be regulations that say that data has to reside in a particular locality. While still wanting to have the agility, flexibility, and reliability of building containerized applications up in the cloud.
So you really want to have this connectivity between these on-premises worlds. I love this factory icon, by the way, for the on-premises thing, it appeals to me. You want to be able to connect these things together. Now even in the on-premises world, you want to do this. And in that on-premises world a tool like Consul is a really fantastic tool for being able to provide service discovery and a lot of other services in that world of on-prem VMs. Now also with Azure, we can build things like ExpressRoute. So with ExpressRoute with Azure, I can link up your Azure network with your on-premises network, so that it's one nice flat network space. We can route packets back and forth, we can do beautiful things. We can deploy the Azure Kubernetes cluster into that virtual network, so now we have your cloud-native technology running out in the cloud, connected to the same network that's working with your on-premises VMs.
But this important question remains, which is how the heck do I connect everything together? Right? Especially since Kubernetes has its own form of service discovery, it has its own form of load balancing and all sorts of things like that. I have to find ways that I can bridge the world of the on-premises server, where it really may be a pet, to the cloud-native world of Kubernetes. Fortunately, it turns out that you can use Consul to do the same thing. So what we're doing with the Azure Kubernetes Service is we're figuring out ways to make Consul a good citizen in the AKS world and, at the same time, it's been great to see the Consul team take tools that Microsoft has been building, like Helm, and use it as the way to package and run a Consul service that's running on top of a Kubernetes cluster.
When you integrate the Azure Kubernetes Service with Consul, you get automatic synchronization between the services that are declared for your on-premises world and the services that are running in your Kubernetes cluster. With Consul Connect you can actually have mutual TLS happening between the system that's running in the on-premises world and the servers that are running in your Azure Kubernetes cluster. And in particular, because of the way it's installed, using admission controllers and things like that, you have the ability to automatically inject that sort of service mesh to connect these things together in a much more automatic way than requiring each team to learn about the right ways to connect to any particular service.
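As a rough illustration of the synchronization idea, the sketch below works out which service names each side is missing so that both catalogs end up listing the same services. It is an assumption-laden toy (plain name lists, a hypothetical `plan_sync` helper), not the actual Consul and Kubernetes integration.

```python
from typing import Iterable, List, Tuple


def plan_sync(consul_services: Iterable[str],
              k8s_services: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Return (names to add to Kubernetes, names to add to Consul)."""
    consul, k8s = set(consul_services), set(k8s_services)
    return sorted(consul - k8s), sorted(k8s - consul)


# Hypothetical catalogs: mysql lives on-premises, frontend lives in the cluster,
# and billing is already known to both sides.
to_kubernetes, to_consul = plan_sync(["mysql", "billing"], ["frontend", "billing"])
print(to_kubernetes)  # ['mysql']
print(to_consul)      # ['frontend']
```

A real sync also has to handle addresses, ports, health, and removals; the sketch covers only the direction of the flow.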
So the combination of these things, the cloud-native Kubernetes service on Azure, with Consul to bridge between the world of Azure and the world of the on-premises server, really enables you to move into a cloud-native world where you can do very agile development of your front ends, while still connecting to a world where your database is a snowflake singleton sitting on a particular machine in a particular location.
And just as a pointer to how you can get started with this, there's the Helm Chart out there that you can easily download and use Helm to install.
Alright, so let's take a look at this. Here I'm gonna show you Helm. I already have the Helm Chart installed, so I'm just gonna run through effectively a no-op upgrade.
So you can see here when we deploy this Helm Chart, we're getting a whole bunch of deployments, cluster role bindings, some services, and some configuration maps. And here are all the services that are running. And if I try and look up this MySQL service, what's gonna happen is it says it can't find it. So it tried to look up this service, MySQL, that's my database that's running on-premises, and it can't seem to find it. Let's take a look at a simple service file for Consul. I can go ahead and create that, and now if I go into that same container (and to be clear, this is a container, because I'm using kubectl exec, I'm getting into a container running out in my Kubernetes cluster), in a native way I can actually use mysql -h mysql -u root -p, and now I'm accessing that MySQL database running on-prem in a virtual machine, through a native Kubernetes service. So from my perspective, as a cloud-native app developer, it looks and feels like it's Kubernetes. But I've bridged the world from on-premises to the cloud-native world. This is hopefully the beautiful future that we're all trying to achieve.
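The transcript does not show the service file used in the demo. As a hedged sketch, a Consul service definition for an external database generally carries a name, an address, and a port; the helper below builds one as JSON. The address is a made-up private IP and the helper name is hypothetical.

```python
import json


def external_service_definition(name: str, address: str, port: int) -> dict:
    """Build a Consul-style service definition for a service outside the cluster."""
    return {"service": {"name": name, "address": address, "port": port}}


# Assumed values: 10.0.0.5 stands in for the on-premises VM's address,
# and 3306 is the default MySQL port.
definition = external_service_definition("mysql", "10.0.0.5", 3306)
print(json.dumps(definition, indent=2))
```

Once a definition like this is registered and synced, the name mysql is what the client inside the cluster resolves, which is why the command in the demo can use it as the host.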
In summary, I think we've built a lot of technology, both in the cloud and with the HashiCorp technologies that we've integrated into Azure, in order to enable you to actually move to this world of declarative infrastructure and repeatable infrastructure, in a way that is still respectful of the real-world concerns around compliance and around bridging legacy systems to modern systems. And really giving you a chance of gradually iterating your world forward instead of having to clean-room re-implement the whole thing.
Thank you so much. Thank you for your patience. See you later.