Day Two Kubernetes: Tools for Operability
Looking for tools that can help take your Kubernetes cluster beyond just an experiment? Dev advocates from Microsoft recommend Terraform and several other open source tools.
Unless you want to build your own container hosting solutions from scratch, you need operational tools that give your teams a repeatable process for deploying things like schedulers and Kubernetes clusters.
In this talk, Bridget Kromhout and Zachary Deptawa, two developer advocates at Microsoft, share some of their favorite open source tools for taking that beginner Kubernetes experience to actual production usage. They include Terraform, Helm, Draft, Brigade, and Kashti.
The talk also covers containers and Kubernetes clusters at a high level, along with the practical, day-two application of these open source tools to deploy Kubernetes clusters reliably.
Bridget Kromhout, Principal Cloud Developer Advocate, Microsoft Azure
Zachary Deptawa, Cloud Developer Advocate, Microsoft Azure
I'm really excited to be here, and I know that we're trickling in from lunch, and it's okay. We're going to start at the very beginning; it's a very good place to start. Kubernetes is Greek for "helmsman," the one who steers. And you're like, excellent, we're starting at the beginning. I have to do the traditional second slide, because you have to have a second slide, right?
I live in Minneapolis, I work at Microsoft, more on that later, and I podcast Arrested DevOps. If I sound familiar, and you're like, "Oh, that's where I remember her voice," I'm also the chief cook and bottle washer for the global organization that runs devopsdays. I think we had 70 events on 6 continents this year. Talk to me if you want to run one in your city. Definitely talk to me if you happen to work at McMurdo Station [in Antarctica], 'cause it would be cool to be on 7 whole continents.
Sadly, if any of you came for Zach [Deptawa], he's not able to be here. He is sick, so I would love it if everyone started this talk off right by tweeting some get-well wishes to Zach. And if you're like, "Ugh, it's just you? I'm walking out," yeah, I understand, it's fine. Zach does a lot of the webinars with Hashi. He's really involved there, and I'm really sad he's not able to be here.
We have to have an outline. By the way, whenever somebody puts up an outline like this, I mentally translate it as "these are spoilers for this talk." We're going to start with a little level setting on what is going on in this ecosystem. Then we'll talk about some of the exciting open-source tools. I feel like if people give you a headset mic and a remote and let you stand on stage, you should definitely prognosticate about the future, so I'm going to do a little bit of that. We'll see how much we can get to.
Starting with: What actually is this container stuff? I've been doing this thing for a couple of years now where I stand on stages and look around the room, and I've got to tell you, you're all getting younger all the time. I'm kind of like, how much of this ancient history stuff are you actually here for? Then I realize there are wonderful young people in this ecosystem who have amazing insights. For example, my colleague Jessie Frazelle. She was speaking last week at GitHub Universe, and I could pretty much mic-drop a whole talk about containers right there with, "Hey, containers aren't actually real." And you're like, "What? But they are the foundation of all important computing now." Yeah, right.
It's important for us to remember that the way we're using containers today takes advantage of namespaces, which define what a process can see, and cgroups, which define what a process can use. These are features of the Linux kernel. Keep that in mind especially when you're having conversations at your organization where people are like, "We have a very important DevOps digital transformation that we must accomplish by the end of Q3. Containers are key to that." And you're like, "I never really thought people would be that excited about Linux kernel features, but here we are."
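On a Linux host you can see both mechanisms from user space: a process's namespace memberships appear as symlinks under `/proc/<pid>/ns`, and its cgroup membership is listed in `/proc/<pid>/cgroup` as `hierarchy-ID:controller-list:cgroup-path` lines. The Python below is a minimal, illustrative sketch; the function names are our own rather than any library's API, and it assumes a Linux `/proc` (elsewhere it simply returns nothing). Two processes in the same container report the same namespace identifiers.

```python
from __future__ import annotations

import os
from pathlib import Path


def parse_cgroup_file(text: str) -> dict[str, str]:
    """Parse /proc/<pid>/cgroup lines of the form 'hierarchy-ID:controllers:path'.

    cgroup v2 uses an empty controller list, reported here under "unified".
    """
    memberships: dict[str, str] = {}
    for line in text.strip().splitlines():
        _hierarchy_id, controllers, path = line.split(":", 2)
        memberships[controllers or "unified"] = path
    return memberships


def namespace_ids(pid: str = "self") -> dict[str, str]:
    """Return namespace identifiers such as 'pid:[4026531836]' (Linux only)."""
    ns_dir = Path("/proc") / pid / "ns"
    if not ns_dir.is_dir():
        return {}  # Not Linux, or /proc is unavailable.
    ids: dict[str, str] = {}
    for entry in ns_dir.iterdir():
        try:
            ids[entry.name] = os.readlink(entry)
        except OSError:
            continue  # Some entries may be unreadable in restricted sandboxes.
    return ids


if __name__ == "__main__":
    cgroup_path = Path("/proc/self/cgroup")
    if cgroup_path.exists():
        print(parse_cgroup_file(cgroup_path.read_text()))
    print(namespace_ids())
```

Comparing `namespace_ids("self")` against another PID's output is a quick way to check whether two processes share, say, a network namespace.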
People get excited about containers because they imagine all of the wonderful problems they can solve. It's true, they can. Containers make it trivial to produce consistent environments and reproducible, repeatable deployments. A couple of years ago I worked at an organization that started running Docker in production in October 2013, when I think the main thing on Docker's website was giant blinking letters that said, "Under no circumstances should you run this in production."
The company I worked at was a little streaming-video company, and we were like, "YOLO!" It got acquired by Warner Bros., so I guess sometimes YOLO works out. People get excited about those stories, and I feel like I also have to demystify them and tell people that containers are not going to magically solve all the woes in your code base, and they're not going to magically solve all of the problems in your organization. They might make some of them worse, because they give you something new to manage and new failure modes to worry about.
Containers are not going to magically make everything awesome. At the same time, people get excited about containers because they seem like this great new thing, and I'm kind of like, "Not so much." I got my CS degree, by the way, back in the '90s, when CS departments would give undergraduates root on their faculty members' machines, because it was the '90s and many things seemed like a good idea at the time. We were using containers then, because they're not actually new.
Containers have been, in one way or another, with us for quite some time. But they are a lot more usable today. Make some noise if you used FreeBSD jails, or if you used Solaris Zones. All right, there are a few of you. I'm not the only one who remembers that stuff. This was useful. We used it for a reason. It was kind of a pain, but we used it for a reason. It solved problems for us.
When containers started becoming more mainstream, the genius of Docker was that it took something that was only accessible to a small population and made it significantly easier for nearly everyone to use. At the same time, the future is still pretty unevenly distributed, because not everyone can even say they're using containers in their organization today. Certainly not everyone is using Kubernetes. I have a slide up here saying that Kubernetes is way too newfangled for your enterprise to be... Yeah. It's not actually new. It's been around for four years and some change, and at the same time, it's not the endgame. It's just an orchestrator. Nomad's an orchestrator, there's DC/OS. There are a lot of orchestrators in the space.
The conversation about containers should start with what problems you're trying to solve, because orchestrators, and the care and feeding thereof, are a means, not an end. I really appreciated what Armon was saying in the keynote this morning about some of the nuance in choosing orchestrators. Just because something gets a lot of airtime does not mean it's the right solution for the problems you have.
Honestly, the whole container-and-orchestration space is a place where I would watch out, in your organizations and maybe in your own heart, for résumé-driven development. People get really excited about something shiny and say, "I definitely need to start a Kubernetes initiative." I'm like, "What problems are you trying to solve?" It's amazing how often the problem they're trying to solve is that they want to have a Kubernetes initiative by Q3.
Even though I'm going to talk about some exciting tech in the space, I want to make sure that we're level setting in terms of "Should we?" And the answer is usually "What problem is it going to solve for you?" It's amazing the answers you'll get if you ask that in your organization.
That's the reality check portion. Let's talk about some of the tools in this ecosystem. It's a very confusing space, probably not intentionally; I shouldn't assume people do it on purpose. A former boss of mine, Tim Gross at DramaFever, likes to call it "conservation of complexity," which is to say, whatever tool you choose, it's not going to make anything magically simple. Say microservices become popular, so you definitely need microservices, and now you've replaced your IPC with transient network failures. You didn't make a problem go away. You moved it around.
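That moved complexity is concrete: a call that used to stay inside one process can now fail because the network blipped, and the caller has to deal with it. One common mitigation is retrying with exponential backoff and jitter. The Python below is a minimal sketch of that technique; the helper name, defaults, and the choice of which exceptions count as transient are assumptions for illustration, and retrying like this is only safe for idempotent operations.

```python
from __future__ import annotations

import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retries(
    operation: Callable[[], T],
    attempts: int = 5,
    base_delay: float = 0.1,
    retry_on: tuple[type[BaseException], ...] = (ConnectionError, TimeoutError),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Retry an operation that can fail transiently, with exponential backoff and jitter."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return operation()
        except retry_on:
            if attempt == attempts - 1:
                raise  # Out of retries: surface the last error to the caller.
            # Full jitter: wait a random time up to base_delay * 2**attempt seconds.
            sleep(random.uniform(0, base_delay * 2 ** attempt))
    raise AssertionError("unreachable")


if __name__ == "__main__":
    calls = {"count": 0}

    def flaky_lookup() -> str:
        calls["count"] += 1
        if calls["count"] < 3:
            raise ConnectionError("connection reset by peer")
        return "ok"

    print(call_with_retries(flaky_lookup, sleep=lambda _seconds: None), calls["count"])
```

The `sleep` parameter is injectable so tests (and the usage above) can skip the real waits.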
That's completely legit, and it makes a lot of sense, but conservation of complexity is really important to think about when we start working in this kind of complex space. When we use tools like Kubernetes to move complexity around, we should be cognizant of that. You can't ignore it.
There are plenty of orchestrators out there, and Kubernetes is not the only one, although it's the one I'm talking about. If you're using other orchestrators, you can still think about these elements, because they're pretty important. What is the orchestration doing for you? Hopefully, it's doing some kind of scheduling, moving, and monitoring of workloads. It's all of the things you built into your homegrown platform (and you did, 'cause we all did): all of the janky Bash you probably got really tired of maintaining. That's the stuff that's mostly moving into your orchestration systems now.
If you're saying, "Oh, we don't use Bash in production," I'm like, "All right, when you build your own platform, it's probably built out of Docker and Packer and Chef and your cloud of choice, with an underlying substrate of Bash where you hope somebody is checking exit codes." You know this is true, just admit it.
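By default, a Bash script keeps going after a command fails unless you turn on `set -e` (often written as `set -euo pipefail`). When that glue moves into a general-purpose language, the same discipline applies. A minimal Python sketch using the standard library's `subprocess.run(..., check=True)`, which raises `CalledProcessError` on a nonzero exit status; `run_step` is a hypothetical helper name, and the commands are stand-ins for real deploy steps.

```python
from __future__ import annotations

import subprocess
import sys


def run_step(argv: list[str]) -> str:
    """Run one deployment step and fail loudly on a nonzero exit code."""
    completed = subprocess.run(argv, capture_output=True, text=True, check=True)
    return completed.stdout


if __name__ == "__main__":
    print(run_step([sys.executable, "-c", "print('step ok')"]), end="")
    try:
        run_step([sys.executable, "-c", "import sys; sys.exit(3)"])
    except subprocess.CalledProcessError as err:
        print(f"step failed with exit code {err.returncode}; stopping the deploy")
```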
But you probably want a bunch of this stuff in your orchestration, which is why, when people look at Kubernetes as an example of orchestration, they want something extensible that also has self-healing built in. If you're cobbling together a lot of this stuff yourself right now, you probably want to be looking at some kind of orchestration layer. That doesn't mean this is the right one for you. In fact, if you're trying to make some decisions, talk to the fine folks at Hashi, 'cause Armon gave a really good description this morning of how to decide where the trade-off between complexity and features is for you.
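To make "scheduling" a little more concrete: the Kubernetes scheduler first filters out nodes that can't fit a pod's resource requests, then scores the remaining candidates. The Python below is a toy model of that filter-then-score idea, not kube-scheduler's code. The single "most CPU left over wins" scoring rule is an illustrative assumption; the real scheduler uses pluggable filter and scoring plugins and considers far more than CPU and memory.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Node:
    name: str
    free_cpu_millis: int
    free_memory_mib: int


@dataclass
class PodRequest:
    name: str
    cpu_millis: int
    memory_mib: int


def schedule(pod: PodRequest, nodes: list[Node]) -> str | None:
    """Pick a node for the pod, or return None if nothing fits (the pod stays Pending)."""
    # Filter: keep only nodes with enough free CPU and memory for the request.
    feasible = [
        n for n in nodes
        if n.free_cpu_millis >= pod.cpu_millis and n.free_memory_mib >= pod.memory_mib
    ]
    if not feasible:
        return None
    # Score: prefer the node with the most CPU left over, which spreads load.
    best = max(feasible, key=lambda n: n.free_cpu_millis - pod.cpu_millis)
    # Reserve the requested resources on the chosen node.
    best.free_cpu_millis -= pod.cpu_millis
    best.free_memory_mib -= pod.memory_mib
    return best.name
```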
But we also heard from Mitchell about Hashi's Kubernetes support. It is a choice. I'm not going to go into a huge amount of detail about architecture; you can look at architecture diagrams. I think it's useful to understand that there is a control plane, which in some managed services is not something you have to worry about. Then there are a bunch of worker nodes, which can be VMs or bare metal or whatever. Your master, or your control plane, may or may not run on VMs. ("Master" is kind of a useless term, because you very frequently have more than one; in fact, if you're going for HA, you probably have multiple nodes in your control plane.) It is definitely going to be using etcd.
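Why "multiple nodes" for HA usually means an odd number: etcd uses the Raft consensus algorithm, so it needs a majority (a quorum) of members to accept writes, and etcd's own guidance recommends odd cluster sizes. Going from three members to four adds a machine but no extra failure tolerance. A quick Python check of that arithmetic:

```python
def quorum(members: int) -> int:
    """Smallest majority of an etcd cluster with the given number of members."""
    return members // 2 + 1


def failures_tolerated(members: int) -> int:
    """How many members can fail while the cluster still has quorum."""
    return members - quorum(members)


if __name__ == "__main__":
    for size in range(1, 8):
        print(f"{size} members: quorum {quorum(size)}, tolerates {failures_tolerated(size)} failure(s)")
```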
You need your state to go somewhere. If you're groaning and thinking, "State? But I thought this magical container cloud stuff made that not a problem," oh, I have a bridge to sell you. Here's a pro tip when dealing with vendors: when they show you the super stateless, smooth demo, be like, "Cool, okay, I wanna see the demo with database failover," because otherwise I assume you have state where your customers and your money live. And if you don't, congratulations on your exciting new startup. I hope it goes really well and you get some state that matters.
Another component to look at in Kubernetes is the pod. If you're not used to Kubernetes, a pod is one or more containers sharing the same context and resources, and you're suddenly thinking: "sidecars." There are a lot of different ways to do this, and there are a bunch of other interesting components here. Kubernetes is a fast-moving, open-source project. Every time they release, there are exciting breaking changes, so it's not the sort of thing where you're gonna run the very latest at all times. They are also constantly renaming everything in here. With every slide, conference talk, or tutorial you read, you're probably going to have to do a lot of mental substitution for whatever has been renamed. Keep that in mind.
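Here is roughly what the sidecar pattern looks like as a Pod manifest. Kubernetes accepts JSON as well as YAML, so this Python sketch builds the manifest as a plain dictionary and prints JSON that could be piped to `kubectl apply -f -`. The pod name, image tags, and log paths are hypothetical placeholders. The two containers share the pod's network namespace and, in this example, an `emptyDir` volume, which is how the log-shipping sidecar can read the web server's logs.

```python
import json


def pod_with_sidecar(name: str = "web") -> dict:
    """Build a Pod manifest: an app container plus a log-shipping sidecar."""
    shared_volume = "logs"
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name, "labels": {"app": name}},
        "spec": {
            # An emptyDir volume lives as long as the pod and is visible to both containers.
            "volumes": [{"name": shared_volume, "emptyDir": {}}],
            "containers": [
                {
                    "name": "app",
                    "image": "nginx:1.25",  # placeholder image tag
                    "ports": [{"containerPort": 80}],
                    "volumeMounts": [{"name": shared_volume, "mountPath": "/var/log/nginx"}],
                },
                {
                    "name": "log-shipper",
                    "image": "busybox:1.36",  # placeholder image tag
                    # Stand-in for a real log shipper: follow the shared access log.
                    "command": ["sh", "-c", "tail -F /logs/access.log"],
                    "volumeMounts": [{"name": shared_volume, "mountPath": "/logs"}],
                },
            ],
        },
    }


if __name__ == "__main__":
    print(json.dumps(pod_with_sidecar(), indent=2))
```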
Without diving too much into the stuff in here, kube-proxy does network proxying. The kubelet, and we'll talk more about that in a little bit, is a process watcher. It makes sure that the specified containers that you're expecting are running.
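The kubelet's "make sure the containers you expect are running" job is a control loop: compare desired state with observed state and act on the difference, over and over. A toy Python model of one pass of that loop, using container names as stand-ins; the real kubelet works from PodSpecs and talks to a container runtime, none of which is modeled here.

```python
from __future__ import annotations


def reconcile(desired: set[str], running: set[str]) -> tuple[set[str], set[str]]:
    """One pass of a control loop: return (containers to start, containers to stop)."""
    to_start = desired - running  # Expected but missing, e.g. a crashed container.
    to_stop = running - desired   # Running but no longer wanted.
    return to_start, to_stop


def control_loop_step(desired: set[str], running: set[str]) -> set[str]:
    """Apply one reconcile pass and return the new observed state."""
    to_start, to_stop = reconcile(desired, running)
    return (running - to_stop) | to_start
```

Because each pass looks only at the current difference, the loop converges no matter how the drift happened, which is what "self-healing" amounts to in practice.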
At a higher level, what people are trying to get out of Kubernetes, and what might help you choose a tool like this, is quick, reliable, predictable deployments and predictable, pretty simple scaling, plus the ability to do very nuanced rollouts of new features in your application. You can set all of your tolerances: "Are we okay with overrunning our capacity a little bit? Maybe our cloud bill is higher." I work for a cloud, so that sounds okay, but you also don't want it to be too high, because otherwise everything is bad. You can set exactly how much you can burst, and how willing you are to have any requests dropped on the floor. If the answer is none, okay. There are a lot of toggles you can switch. Especially for people who are looking at on-prem or hybrid solutions, or just trying to control their cloud spend, it's valuable to be able to limit hardware usage. You can say, "We only want these resources; we're only gonna provision to these resources." That's pretty useful.
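Those rollout tolerances correspond to a Deployment's `maxSurge` (how far above the desired replica count a rolling update may go) and `maxUnavailable` (how many pods may be missing during it). Both default to 25%. Percentages are converted to pod counts by rounding `maxSurge` up and `maxUnavailable` down. This Python sketch reproduces that arithmetic, including the deployment controller's fallback of allowing one unavailable pod when both values resolve to zero; `rollout_bounds` is our own helper name, not a Kubernetes API.

```python
from __future__ import annotations


def _resolve(value: int | str, replicas: int, round_up: bool) -> int:
    """Turn an absolute count or a percentage string like '25%' into a pod count."""
    if isinstance(value, str) and value.endswith("%"):
        scaled = replicas * int(value[:-1])
        # Integer ceiling/floor division avoids floating-point rounding surprises.
        return -(-scaled // 100) if round_up else scaled // 100
    return int(value)


def rollout_bounds(replicas: int, max_surge: int | str = "25%",
                   max_unavailable: int | str = "25%") -> tuple[int, int]:
    """Return (max pods during the rollout, min pods that must stay available)."""
    surge = _resolve(max_surge, replicas, round_up=True)
    unavailable = _resolve(max_unavailable, replicas, round_up=False)
    if surge == 0 and unavailable == 0:
        unavailable = 1  # Mirrors the controller's fallback so the rollout can progress.
    return replicas + surge, replicas - unavailable
```

With 10 replicas and the defaults, a rollout may run up to 13 pods while keeping at least 8 available.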
This talk promised a little bit of detail about day 2. A lot of people are thinking, "Excellent, I am definitely going to 'Kuber some netes.' I want to have Kubernetes, what do I do next?" You have to look at the tools that will help you solve the problems you're trying to solve. We're getting beyond 101, but everyone has a different problem set. Let's talk about some of the tools that can help you make it more operable, which is to say you wanna move past, "Cool, I saw a demo" to "What could I do day to day?"
And it's a pretty big and complex space. The little corner of it I'm contributing to is trying to make this stuff more usable. I would break it into three sections. First, getting started with Terraform, which is super useful, and Azure, probably AKS. Second, managing your configs and your apps, because you can end up with so much YAML. When I run workshops for Kubernetes, I feel like I have to give a content warning for YAML: YAML may be upsetting to you. Many of us feel this way. There is lots and lots of YAML, but you can use tools like Helm to make that a little bit better. Third, event-driven scripting; we'll talk about some of that too.
Let's start with Terraform and Azure. I started getting some slides ready to talk about this, and then Mitchell covered it in a keynote this morning. A lot of people are trying to do hybrid cloud stuff, so you want your workflow to be the same across environments. It's pretty cool using the Kubernetes provider inside Terraform to deploy pods and services to your Kubernetes cluster. If you want to look into this later, these slides will be online, and there's a URL here. If you're using Azure, you can go check out all of the stuff they've been working on.
I started screen-capping it, and the slide's impossible to read, and there's another page or two. Go take a look, because that's only some of the results of that search. I was really excited this morning about the idea of being able to deploy managed Kubernetes on AKS through Terraform, then deploy your services and pods right into the AKS cluster you just created, using the exact same code base but a different provider. We're always talking about wanting to make workflows repeatable and things more usable. That's how we do that.
I added this slide an hour ago because I thought, this is perfect. AKS is your managed solution, and if you're thinking to yourself, "But I ran through Kelsey's Kubernetes the Hard Way and it was so exciting," yeah, it was. And are you planning on up-leveling everyone at your organization so they can all support every single nuance of running it yourself? Maybe, in which case you're probably Heptio.
If you don't work at a vendor whose main raison d'être is to make Kubernetes clusters work really well, you may find that you'd rather focus on what your organization profits from. Using the managed solution from your cloud provider of choice is probably a good idea. There are a lot of open-source APIs that let you scale and run your applications without worrying about the infrastructure, so that's worth looking at. All of the cloud providers have, hopefully, some very simple mechanisms to get started. This isn't one of those things where we need to sit and look at every single detail. You can just create a cluster, then use the CLI and look at the nodes. That's obviously not production ready; that's just spinning up a managed cluster. But once you're managing it, one of the things you do a lot is upgrade: there are new features, or zero-days, or whatever, and you absolutely need to do an upgrade.
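The "just create it, then look at the nodes" flow, plus the upgrade step, looks roughly like the Azure CLI and kubectl commands below. They're wrapped in a small Python script that only prints them by default (a dry run) so they can be reviewed first. The resource group, cluster name, node count, and target version are placeholders, and the helper names are hypothetical; check `az aks --help` for the options your CLI version supports.

```python
from __future__ import annotations

import shlex
import subprocess


def aks_day_two_commands(resource_group: str, cluster: str, version: str) -> list[list[str]]:
    """Create a cluster, fetch credentials, list nodes, then check for and apply an upgrade."""
    target = ["--resource-group", resource_group, "--name", cluster]
    return [
        ["az", "aks", "create", *target, "--node-count", "3", "--generate-ssh-keys"],
        ["az", "aks", "get-credentials", *target],  # Merges credentials into your kubeconfig.
        ["kubectl", "get", "nodes"],
        ["az", "aks", "get-upgrades", *target, "--output", "table"],
        ["az", "aks", "upgrade", *target, "--kubernetes-version", version],
    ]


def run_all(commands: list[list[str]], dry_run: bool = True) -> None:
    """Print each command; when dry_run is False, also execute it and stop on failure."""
    for argv in commands:
        print(("DRY RUN: " if dry_run else "RUN: ") + shlex.join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    run_all(aks_day_two_commands("demo-rg", "demo-aks", "<target-version>"))
```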
But it gets really annoying trying to hand things off and make sure the upgrade goes well, and there's in-place upgrading for these managed solutions. Managing, upgrading, scaling: it's probably worth looking at. We're all technologists, we're all smart. We can do all of these things; we have to decide which ones to spend the most time on. That's what some of these solutions are good for.
I wanna talk a little bit about Helm next. I had it on the list for a reason, and not just because a bunch of my colleagues at Microsoft now work on it: it's actually a CNCF incubating project now. It's under Cloud Native Computing Foundation governance, and it's really great. It's the solution for what ails you in terms of all of that YAML. It's your package manager. You have versioning and sharing and releases and release rollbacks. I have a problem with that word "rollback," because I always think, "Okay, Cher's wrong. You cannot turn back time."
You definitely cannot, you can't get back to the state that you were in. Wouldn't it be great? You can roll forward to something that you hope is like the state where you weren't being paged, but you can't actually turn back time. But having good versioning of, yes some YAML, does help a lot in that space. I jumped ahead because I was so excited about that, but I'll just click through these quickly.
Helm Charts are where you keep all of this information. Updates become way less annoying, and the sharing is really valuable: when you want to use Prometheus or something, you could go read the directions and install Prometheus, which will be 800 commands. Or you can just install the Helm Chart and you're running Prometheus. That's one of the ways to get things running on your Kubernetes cluster much faster, and you've also set your own release checkpoints, so you can get back to a previous version without too much pain.
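The "you can only roll forward" point shows up in how Helm records history: `helm rollback RELEASE REVISION` doesn't erase anything; it deploys the earlier revision's chart again as a new revision, which then appears in `helm history`. A toy Python model of that bookkeeping follows. The class is ours, not Helm's API, and it tracks only a chart version per revision, whereas Helm also stores the values used.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Release:
    """Toy model of a Helm release's revision history."""
    name: str
    history: list[tuple[int, str]] = field(default_factory=list)  # (revision, chart version)

    def deploy(self, chart_version: str) -> int:
        """Install or upgrade: every deploy appends a new, higher revision."""
        revision = len(self.history) + 1
        self.history.append((revision, chart_version))
        return revision

    def rollback(self, to_revision: int) -> int:
        """Re-deploy an earlier revision's chart as a brand-new revision."""
        _, chart_version = self.history[to_revision - 1]
        return self.deploy(chart_version)

    @property
    def current(self) -> str:
        return self.history[-1][1]
```

After deploying revisions 1 and 2 and rolling back to 1, the history reads 1, 2, 3, with revision 3 running revision 1's chart: time only moves forward.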
Okay, so you're thinking, "All right, that sounds great, but my cluster itself is not the only thing I'm managing. I'm also trying to operate my software, the entire reason I have a cluster." For that, I would recommend looking at Draft. It's another open-source project that we and other organizations contribute a bunch to, and it helps with both development and deployment. With a couple of commands you can deploy your applications and get the kind of help you're looking for, so you don't have to constantly write the Dockerfiles and the Helm Charts yourself.
Raise your hand if you absolutely adore writing YAML. I see two jokers raising their hands. This is a tool that can make it a lot easier for you to generate the configuration that you need in order to be able to manage your tooling, by which I mean your application. As in, "It was great installing other people's stuff with Helm. I would like to do that with my stuff." You can do that in any language. Another thing to take a look at is Brigade. If you were watching GitHub Universe, you were like, "But wait, GitHub Actions." Yes, they are now in the space. For now they're only for private repos.
This is event-driven scripting you can use right now. It looks like there's parallel evolution there, and there are already some issues in the Brigade repo. Of course, this is all developed in the open, so you can take a look, and it's worth looking at because it looks like GitHub is gonna be part of Microsoft soon, which is awesome. I think these evolutionary branches are probably gonna grow together. With Kubernetes, you define everything as desired state. It's desired state configuration, but if you want to do things based on events, that doesn't really work.
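The difference between the two models: a desired-state system keeps converging on a description, while an event-driven one runs handlers when something happens. Brigade scripts are JavaScript files that register handlers with `events.on(...)` from Brigade's `brigadier` library; the Python below is only an analogue of that pattern, with hypothetical event names and payloads, to show its shape.

```python
from collections import defaultdict
from typing import Any, Callable, DefaultDict, List

Handler = Callable[[dict], Any]


class EventRouter:
    """Minimal event dispatcher: run every handler registered for an event type."""

    def __init__(self) -> None:
        self._handlers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def on(self, event_type: str, handler: Handler) -> None:
        """Register a handler to run whenever event_type is emitted."""
        self._handlers[event_type].append(handler)

    def emit(self, event_type: str, payload: dict) -> List[Any]:
        """Run the handlers in registration order; unhandled events are ignored."""
        return [handler(payload) for handler in self._handlers[event_type]]


if __name__ == "__main__":
    router = EventRouter()
    router.on("push", lambda e: f"build {e['commit']}")
    router.on("push", lambda e: f"notify {e['branch']}")
    print(router.emit("push", {"commit": "abc123", "branch": "main"}))
```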
Then there's Kashti, a UI for displaying your events. It answers the question: How do you know if your event-based task failed? It's a dashboard that gives you a view of all of your event pipelines, failures, and logs. It's the kind of stuff you definitely could build, but do you want to?
We've now done a quick tour of a lot of building blocks you can use. None of them are the answer to everything forever. They're building blocks for creating deployments that you can repeat. You know how, when you proof-of-concept something, and you deploy it and it works, people say, "That's exciting," and you say, "That was really a pain to do, but I'm glad I got it done." Then somebody says, "Do it 18 more times with these subtle changes," and you're like, "Ugh, my life." These tools are an attempt to make that a little bit less painful and, of course, to make it easier to move things between all of our environments.
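The "do it 18 more times with subtle changes" problem is usually handled by keeping one base configuration and layering small per-environment overrides on top. That is also the idea behind passing several values files to Helm with repeated `-f` flags, where later files take precedence. A minimal Python sketch of that layering as a recursive merge; the keys, image name, and environment names are made up for illustration.

```python
from __future__ import annotations

import copy
from typing import Any


def deep_merge(base: dict[str, Any], override: dict[str, Any]) -> dict[str, Any]:
    """Return a new dict: nested dicts merge recursively, other override values replace."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Hypothetical base settings shared by every environment.
BASE = {"replicas": 2, "image": {"repository": "example/web", "tag": "1.4.0"}, "ingress": {"enabled": False}}

# Each environment states only what differs from the base.
OVERRIDES = {
    "staging": {"image": {"tag": "1.5.0-rc1"}},
    "production": {"replicas": 6, "ingress": {"enabled": True}},
}

if __name__ == "__main__":
    for env, override in OVERRIDES.items():
        print(env, deep_merge(BASE, override))
```

Keeping overrides small makes the "subtle changes" between environments easy to review, since each one is stated exactly once.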
If we're looking to what's coming next, first of all, I hope everyone who is an eligible voter in the United States is going to vote. Notice that I am not telling you what you should do with your choice, because this is your franchise, but it is really important. Statistics on the internet tell me that many people choose to not be heard and I'm like, "Really? Because I want people to listen to me all the time." You probably want people to listen to you, so I highly suggest making that happen.
Other than that, I think it's really important, and apologies to Ian Fleming because it's not actually diamonds that are forever. I mean, they are, probably. I don't know. I'm not really into diamonds, but day 2 operations need to keep working. It's kind of trivial to get something working on day 1 and I feel that the people I've been talking to here at HashiConf who are in the "Great, we need operability and repeatability" space understand this already. We have to get our organizations to understand that once some software works in production, you're going to want to keep changing it and you're going to want to have the ability to update it safely and you definitely need tooling like the stuff we've talked about.
I think that that's not the only dire portent. I think that it's pretty important for us to remember that all of these things that feel really new and exciting, they are new and exciting, but wow, this stuff is getting really real. Your airline and government are probably now deciding to "Kuber some netes." They want to start and continue using it and be successful. Maybe you work at some of those institutions. Oh, hey, how do you feel about breaking changes now?
It's important for us to be thinking about day 2 operations and, whenever we're YOLOing something out into production because it seemed like a good idea at the time, to be having those conversations inside our organizations. That means we have to learn a lot, and you're at a conference, which is a really good place to do that.
I've noticed one of the things that's challenging about going to a conference is you get back to the office and you're like, "I learned so much. It was great." People say, "Excellent. Teach it all to us now." You're like, "Uh. It was great. It will be videos." I will suggest that you take a look at container.training. That is an open-source container-training module that some colleagues and I have been working on with collaborators across the space. It was started by Jerome Petazzoni. You might know him from his work at a little company called dotCloud that later became Docker. That is worth looking at, so you can run yourself through some learning opportunities, just like the awesome stuff at HashiCorp Learn that has come out now.
Even though you are definitely smart enough to set up Kubernetes from scratch, I would strongly encourage everyone not to, at least to start with, because it's way easier to be on call for it if you have some people in your support chain who focus just on that and don't work at your organization.
I remember, it was 2012, and I was working at a startup that was using Hadoop and it was in our critical path. I set up MapReduce from scratch and then I went to our cloud provider and was like, "Hosted MapReduce solution. This is definitely something we're going to use because I'm a one-woman ops team and all these devs are on call with me and I might be able to fix the thing I stand up from GitHub and maybe they won't." You have to choose where to spend your complexity tokens.
I was very excited at lunch to see James Watters from Pivotal. If you don't know him, you should go check him out on Twitter. Hopefully he's tweeting on the conference hashtag so you can find him easily. He talks about the value line, as in, which stuff is going to provide differentiating value to the organization I work at, and which stuff is not? I can't tell you the answer for your organization, because it's always going to be different. But it's very probable that running a cloud provider or running a platform service is not where your organization gets most of its value. I think it's really valuable to think about where you want to spend those complexity tokens.
Other future-looking stuff: There's a very cool project going on, and no, not those Packers. We are at HashiConf, so the "packer" we're talking about is building images and Azure is starting to offer it as a service. There's a link there. It's very new, so it's worth checking out.
Helm 3 is coming out and this is so new that I can just point you to a blog post and some stuff on GitHub. If you'd like the opportunity, especially if you're using Helm and you have opinions about Tiller or opinions about Lua, if you have poll-based workflows and you're very excited about Helm controller stuff, it's worth taking a look and having your voice heard because right now is the time to have an outsized and significant effect on open-source projects.
If you haven't seen Virtual Kubelet, and especially if you have data centers of your own, it's probably worth looking at, because it lets you add anything you want to your Kubernetes cluster. You have a giant data center somewhere? Sure, now it's registered as a node, and you can have developers deploying pods to it. You're like, "Excellent, our bursting bill in the cloud is way lower now because we've been able to add our existing data center to our Kubernetes cluster." Even if it doesn't solve a problem you have now, knowing that those sorts of things are possible might help you come up with the right solutions for you.
I've taken you on a whirlwind tour of a lot of things happening in the open-source space, and I also want to share something from my colleague Erik St. Martin, who runs GopherCon. With apologies to "Halt and Catch Fire," he says, "Kubernetes is not the thing... it's the thing that gets us to the thing." It's not even the only thing, but containers and abstractions make for better portability. Great, especially if your organization is still thinking about public cloud. Cool. You're probably going to "Kuber some netes." Awesome. That's not the endgame.
When we look at the endgame, when we think about the fact that the world just keeps changing around us, which is exciting, I like to point out what Jeffrey Snover said just recently, that Azure is over 50% Linux. People don't realize that, but it is and I bring that up specifically because I think that it's valuable to not just think about the past. Also think about the future that we can create.
Microsoft used to have opinions on open source. Recently, Microsoft open-sourced all our patents. I think from that we can conclude that change is probably the only thing that's going to keep happening.
This is my pinned tweet, because it sounds ridiculous and sublime at the same time: I work at Microsoft, they gave me a Mac with a Microsoft asset tag on it, and my job is to get people to use Linux. From which we can definitely conclude that this timeline is not what any of us expected 20 years ago. But I joined because I think Microsoft is genuinely focused on using open source to create a better world, and I think that's a thing you can all do, too. You have the power to do that, not just by voting, though you should, but by creating the infrastructure of the world we all live in.
I have links here. I podcast too much, so I was going to say "in the show notes," but no: I will tweet the slides. There's a lot of stuff to check out here. Zach has a repo with a bunch of demos in it that you can check out if you want to play along at home. I'm going to be up at our booth the rest of today. I won't be here tomorrow, but I'll be at the booth the rest of today. If you'd like to chat about any of this stuff, we'd love to talk. Thank you so much.