Presentation

Modern Ops tooling: I get by with a little help from my friends

The container space can be overwhelming, even when your job is to keep up with it. Abby Fuller, an AWS technical evangelist, takes a distilled look at the tools and trends you might want to get familiar with if you're working in cloud-native environments.

In an environment running lots of containers, how do you manage them all with automated tools? What about monitoring and observability? And how do you debug them?

Abby Fuller, a senior technical evangelist at Amazon, follows this space closely and has a bunch of tool suggestions for container management—including CLIs, debugging tools, monitoring, and other development and operations tools. You'll probably come out of this talk with some new tooling ideas for managing your system.

Transcript

Good morning. Welcome back. This is day two of HashiDays. I'm really excited to be here. A little bit of housekeeping, by the way. I have to race to the airport probably literally after this, so I can stay to hang out and answer questions for just a little bit, and then I have to run. We'll be quick today.

Let's jump right into it. This is a different talk today. It's called "I get by with a little help from my friends." I think I mean that two ways, and I'm going to get into both of them in just a second.

But first things first. My name is Abby. I work in developer relations for Amazon Web Services. That's kind of a fancy way of saying that I used to be a DevOps engineer, or an infrastructure engineer, or whatever the cool kids are calling it now, and now I talk to people like all of you about how you can work better with your infrastructure, and your systems, and your software and things like that.

Some of you may have noticed that I talk and tweet about containers quite a lot. Containers are great, so that's why I started using them in production in the first place. There's a lot of benefits:

  • They're really portable
  • They're really flexible
  • They're pretty scalable
  • They're easier to work with

But with them, unfortunately, comes quite a bit of another kind of work: scheduling, and resource usage, and service discoverability, and deployments, and logging, and monitoring, and service meshes, and how do I log things, and how do I observe them, and all of that good stuff.

This is really hard. It's a lot of things to deal with, when really you're moving to containers in the first place because you want it to be easier. Even when it is literally your job to keep up with the container space, it can still be overwhelming. It seems like every day there's a new project, or a talk, or a blog, or a new tool, or an open-source project, or a video, or a tutorial.

It's a lot to keep up with. How are you supposed to keep up? How are you supposed to parse all of these things? How are you supposed to take care of all of this kind of work that goes into using your containers, without having to spend all of your time focusing on the housekeeping bits?

Because what you really want to be focusing on is how can I just run good applications. How can I do the thing that my business is trying to do? I had a customer put it to me in really nice terms once. He worked at Travelex, [inaudible] currency exchange. He said, "Anything that does not directly enable us to exchange foreign currency is not something that I want to think about." He had one job. So he had a core job, and anything that didn't support him doing that wasn't something that he wanted to worry about. And all this work that goes into containers is something that a lot of people don't want to have to worry about.

How do you do this? How do you keep up with that? How do you make all these things happen, without spending your whole day thinking, well, how do I monitor this? How do I deploy them? How do I scale them?

I think that the best way to do that is with a little help from your friends. I mean this in a couple of different ways:

  1. The first one is through your tooling, and through your processes. Your tools are your friends. How can you lean on other things that aren't just you, and yourself doing your work, and your coworkers? How can you lean on things like tools and processes to enable you to make this a little bit more effective.
  2. It's also through your community. So that's person, friends, or people friends. How can you get help from resources outside of just yourself?

Your tools, and your community. We'll talk tools first. In other words how can you work smarter, and not harder. (I'm pretty sure that that's thought leadership by the way. Please feel free to add me to your list on LinkedIn as a thought leader.)

Secrets management

We're going to start with safety first. I think that safety and security is the thing that we want to start with from the bottom up. How can I keep myself safe and secure as I'm working with all of these scalable systems?

If you're working with AWS, that starts with something like roles. In other words, how can you and your service perform actions securely? That's something like a service role: what can your service perform on your behalf, like accessing Kinesis or DynamoDB? Or a user role, which is for a person.

What can I specifically do? Maybe I can add new IAM users, or maybe I can create new tables. Or we have a group, and that's how I can group users with the same permissions. But it's not just about your roles and what you and your service can do on your behalf. You have to care about your secrets too. And there are different kinds of secrets, and it's probably worth a whole, entire talk to just talk about secrets management, and lots of people have done it. They've actually probably done it better than me. We're going to just do one little summary slide about it.

A couple of words of advice, and Bridget talked about this really well yesterday too. But you want the principle of least privilege for your secrets—and your keys as well. Don't share them between your services. Don't share them between your humans. Go for the principle of least privilege, or least access where possible. Store your keys somewhere else, and then access them securely, rather than just storing them as like environment variables right inside your container. But, because this is a talk on tools, I don't want you to have to manage all of those things yourself.
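To make the least-privilege advice concrete, here is a minimal sketch of the idea. It uses a small in-memory class as a stand-in for an external secrets manager. The `SecretStore` class, the service names, and the secret paths are all hypothetical and are not the API of Vault or any other real tool. The point is only that each service gets an explicit allow-list and fetches its secret at runtime instead of having it baked into the container.

```python
class AccessDenied(Exception):
    """Raised when a caller asks for a secret its policy does not cover."""


class SecretStore:
    """In-memory stand-in for an external secrets manager (hypothetical)."""

    def __init__(self):
        self._secrets = {}
        self._policies = {}

    def put(self, path, value):
        self._secrets[path] = value

    def grant(self, identity, path):
        # Least privilege: each identity gets an explicit allow-list of paths.
        self._policies.setdefault(identity, set()).add(path)

    def read(self, identity, path):
        if path not in self._policies.get(identity, set()):
            raise AccessDenied(f"{identity} may not read {path}")
        return self._secrets[path]


store = SecretStore()
store.put("billing/db-password", "example-only")
store.put("search/api-key", "example-only-2")
store.grant("billing-service", "billing/db-password")
store.grant("search-service", "search/api-key")

# The billing service fetches its own secret at runtime...
billing_password = store.read("billing-service", "billing/db-password")

# ...but cannot read a secret that belongs to another service.
try:
    store.read("billing-service", "search/api-key")
    cross_service_read_allowed = True
except AccessDenied:
    cross_service_read_allowed = False
```

In a real setup the store would be a managed service and the identity would come from something like a role, not from a string the caller supplies.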

Lean on tools, both to check your images for vulnerabilities, and help you store the secrets. I have a couple of options there. The goal of this talk ultimately, is that people will walk away, either having found out about a tool, or an open-source project, or just thought about something that they hadn't thought about before. Here are a couple of options that you can use to manage your secrets. Tons of different options out there. You could use something like Vault, or Aqua MicroScanner, or a security scan. There are so many options out there to help you be safe from the beginning, and then help you store all of your personal secrets so that other people can't get to them.

Everyone has secrets there right? This isn't something that you can get away with not thinking about. Everyone has to think about it.

Local environment ops tools

Now that we're thinking about safety, we'll go from the bottom up again.

Good development starts locally, because everyone—and if you say that it's not you, then I know that you're lying—everyone I think has been in that situation before where it worked on my machine, or I don't know why that doesn't work. Worked on staging, or it's working in production, but I can't get it to work locally. I don't really know what the problem is. That's because our environments tend to drift apart.

Your good development has to start locally, because that's how you get rid of things like production bugs, and how you eliminate things like well it worked on my machine. Some common approaches to this might be: Vagrant—this thing that I used. Or an EC2 box with local access. But what you're really wanting to do is keep a shared environment, so that it's standardized. So that I can run the same thing locally, as I can on staging, or on production, so that I can test things myself in an environment where it's as close to the production environment as possible, so that I can get things really close, so that I can actually test.

It's not just—I built this locally and now I'm hoping that it works in the cloud. It's—I built and tested this locally in an environment that was as close to the cloud as possible.

Some tools that might help you out. I put Vagrant, and Cloud9 there. If you're using Kubernetes you can run things locally with Minikube. I can never figure out how to pronounce anything in the Kubernetes ecosystem by the way. Kubectl, Kube "Cuddle", Kube "Control". I can't figure it out. I made a best guess at everything that I was supposed to be pronouncing on this slide. If you disagree with me, and you think it's Kube Cuddle, I'm pretty sure that I'm right.

CLIs

From your local environment to your production, you probably want to interact with your containers through CLIs. You don't want to be the person that's sitting there like, "Hang on, I have to deploy a new version." Then you type out every single command by yourself. You're using these CLIs, you're scripting around them, you're automating around them. Your CLI in a lot of cases—and this is a gross generalization—but it's 30 minutes—your CLI depends on probably the tool that you're using on the other end. That's your orchestration platform.

We're going to look at ECS from Amazon and Kubernetes. I can say Kubernetes by the way. I just can't say Kubectl.

A couple of CLIs—there's the AWS CLI, that's the official one, but there's also a bunch of unofficial ones. There's also a second official one. I'm not really sure why there's two, but the ECS CLI supports Docker compose files, if that's the format that currently floats your boat. Two unofficial ones that I listed here. The Fargate CLI and the Coldbrew CLI. Both are community projects. I'm here to tell you, by the way, that it doesn't matter what tool you use. The tool that you use is the one that works for you. That you can maintain, that you can work with, that you're happy with. You don't want to just use what I'm using because it sounded cool. You don't want to use what he's using because it sounded cool. You want to use what actually works for you.

That's whatever you think that you can work with, what you can maintain, what you're comfortable with using. If a community CLI works for you better, absolutely use a community CLI. If a tool that you wrote yourself works better, I mean, totally use it. It's what you can maintain, and what you can scale, and what you can work with.

A few of the CLIs for Kubernetes: Kubectl, and not Kube Cuddle. Kops, which is Kubectl for clusters, which you can use to manage your production infrastructure. Kubeadm, or I don't know how other people are saying this now. Adam probably, something crazy. It can help you bootstrap your production-ready clusters. Not production ready, but still pretty cool is Kubicorn, which will help you manage your production infrastructure. There are a lot of tools out there.

Ultimately use the one that works for you, but it's not something that you maybe want to be doing everything by hand. We're going to talk about when you want to do things by hand in just a second, but you don't want to be doing everything by hand. I'm interacting with my clusters via CLIs, now what? I have to automate.

Everybody say it with me: No artisanal containers.
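As one possible way to script around a CLI instead of typing each command by hand, here is a small sketch that wraps two `kubectl` invocations in a function. The deployment name, container name, and image are hypothetical, and the `run` parameter is an assumption added so the commands can be recorded in a dry run without a cluster.

```python
import subprocess


def deploy_commands(deployment, container, image):
    """Build the kubectl invocations for a simple image rollout."""
    return [
        ["kubectl", "set", "image", f"deployment/{deployment}", f"{container}={image}"],
        ["kubectl", "rollout", "status", f"deployment/{deployment}"],
    ]


def deploy(deployment, container, image, run=subprocess.run):
    """Run each command in order, stopping at the first failure."""
    for argv in deploy_commands(deployment, container, image):
        run(argv, check=True)


# Dry run: record the commands instead of executing them.
recorded = []
deploy(
    "web",                    # hypothetical deployment name
    "app",                    # hypothetical container name
    "example.com/web:1.2.3",  # hypothetical image
    run=lambda argv, check: recorded.append(argv),
)
```

Calling `deploy(...)` without the `run` argument would execute the commands for real through `subprocess.run`, so the same function can sit behind a CI/CD step.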

Infrastructure as code

You want to treat your infrastructure as code just like your actual code. So, automate, and template things out.

You want to make everything repeatable, and portable. That means not exec-ing into things, and installing packages, and then calling it good. You want to make things repeatable, reproducible. You can build them directly from repository, but make it part of your CI/CD process. After that, you want to automate your AWS.

CloudFormation, that's from Amazon, extremely JSON-y. Amazon loves JSON, and especially if you can nest the JSON, that's what makes it good. If you don't like running CloudFormation templates, there's some snippets there. It ends up looking something like this, with a lot of comments there. A less JSON-y option would be Terraform, created by HashiCorp, thank you all. Open source, either one works.
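The slide snippets are not part of this transcript, so as a rough stand-in, here is a sketch that builds a tiny CloudFormation-style template as a Python dictionary and prints it as JSON. The logical name `ExampleBucket` and the tag values are hypothetical. It is meant only to show how quickly the nesting grows, not to be a template you would deploy as-is.

```python
import json

template = {
    "AWSTemplateFormatVersion": "2010-09-09",
    "Description": "Illustrative only: one S3 bucket with a tag.",
    "Resources": {
        "ExampleBucket": {  # hypothetical logical name
            "Type": "AWS::S3::Bucket",
            "Properties": {
                "Tags": [
                    {"Key": "team", "Value": "platform"}  # hypothetical tag
                ]
            },
        }
    },
}

# Four levels of nesting for a single bucket with a single tag.
rendered = json.dumps(template, indent=2)
print(rendered)
```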

Ultimately they're both resource management, and probably the most common question that I get after things like this, is which one am I supposed to use? Which one is the right one for me to use?

It's whichever one you're happy with. I work at Amazon and I use Terraform in production, and that's okay. Part of the main difference: CloudFormation does planning, and execution in the same step. You just run your CloudFormation stuff. Terraform does it in two steps. You have to plan, and you have to execute. Whichever one works for your process is the right answer here, or how much you like JSON.
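The two-step idea can be illustrated with a toy model. This sketch is not how Terraform is implemented. It is a simplified picture, with hypothetical resource names, of computing a plan first and only then executing exactly that plan.

```python
def plan(current, desired):
    """Compute the changes needed to move from current to desired state."""
    changes = []
    for name, config in desired.items():
        if name not in current:
            changes.append(("create", name, config))
        elif current[name] != config:
            changes.append(("update", name, config))
    for name in current:
        if name not in desired:
            changes.append(("delete", name, None))
    return changes


def apply(current, changes):
    """Execute a previously computed plan and return the new state."""
    state = dict(current)
    for action, name, config in changes:
        if action == "delete":
            state.pop(name)
        else:
            state[name] = config
    return state


# Hypothetical resources, for illustration only.
current = {"web": {"size": "small"}, "old-queue": {"fifo": True}}
desired = {"web": {"size": "large"}, "cache": {"nodes": 2}}

changes = plan(current, desired)     # step 1: review this before doing anything
new_state = apply(current, changes)  # step 2: execute exactly what was planned
```

Separating the steps gives a person or a pipeline a chance to look at `changes` before anything is touched. A one-step tool would simply call both functions back to back.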

(I know some of you like JSON, but I have a hard time with nested JSON, by the way, because it's like, well I did this, and it's really awesome, but now I'm playing 'find the bracket.' Never been a big fan of a game of find the bracket.)

Here's a Terraform example. These are just snippets by the way. They are not actually things that you can apply, that's just what they look like.

Monitoring and observability

So you've automated. You're using CLIs, you've thought about security. How do you know that everything is working? Monitoring, and observability. This is what helps you debug, and then understand your system. That actually, it's been really cool over the last maybe year and a half, two years, because you've seen more, and more people not just do logs. How do I find out what's literally happening in my application right now? Console.log, whatever. "Hi," if you're me, or some sort of swear word probably, if you're getting really frustrated.

But we've seen this kind of upsurge in people that are trying to understand what is going on in your system. Not just an output, but what's actually happening. Why does this happen when this other thing happens? I think that's important, right? Because I think Charity Majors has been one of the bigger talkers on this recently. She runs something called Honeycomb.io. I think the way that she put it was: do you really want to figure out what's wrong when you get paged? Or do you want to understand what's happening in your system way before that? How can I understand what's happening, before it gets to the point where it's actually broken, and it's woken me up?

How can I understand that first? Beyond those logs, I want actual usable, helpful information about what's happening with my system. This will be, in fact, the title of my memoir, other than things that you eat in an airport don't count, so peanut M&Ms. They put everything inside M&Ms now by the way. Peanuts, peanut butter, pretzels, rice. Like crispy rice pieces, anyway it's great.

A couple of things to think about. You want to reduce the noise. If you're getting paged for something, and it's not an actual disaster, it probably shouldn't be waking a human up. Also you want to page only on things that require immediate, emergency attention. Everyone has gotten the page. Anyone that's been on call before has gotten the page and said, "This is fine. I can take care of it when I get to work." It should not have you woken up if you can take care of it when you get to work.
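One way to picture that rule is as a small routing function. The alert fields and the two example alerts below are hypothetical, and a real paging policy would have more inputs, but the sketch shows the basic split between "wake a human" and "handle it at work."

```python
from enum import Enum


class Route(Enum):
    PAGE = "page"      # wake a human up now
    TICKET = "ticket"  # handle during working hours


def route_alert(alert):
    """Page only when the alert is both customer-impacting and urgent."""
    if alert["customer_impact"] and alert["needs_action_now"]:
        return Route.PAGE
    return Route.TICKET


# Hypothetical alerts, for illustration only.
alerts = [
    {"name": "checkout returning 5xx", "customer_impact": True, "needs_action_now": True},
    {"name": "disk 70% full on batch host", "customer_impact": False, "needs_action_now": False},
]
routes = {a["name"]: route_alert(a) for a in alerts}
```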

You also need all of it. You don't just want monitoring, you want observability. You want to be able to see inside your actual system, not just get just the output. You want to be making sure that you're asking, and answering the right questions. I think Fog Creek started this—the five reasons why. Not just why did I get paged? It's like, "I got paged, because Nginx went down." It's like, "Okay, but why did Nginx go down?" We got something that we weren't able to handle. It's like, "Okay, why did you get input that Nginx wasn't able to handle?" You keep working back, until you found the actual root cause. Make sure that you're asking, and answering the right questions, and that you have the tools to find the right answers, but also to find the right questions.

Because this is still a talk on tools, even though I wandered in a different direction. You don't have to go at it alone. There are lots of things out there that will help you make sense of what's going on, but it has to be beyond just logs. It has to be beyond, and everyone also has had those logs, where sure I'm logging. I have 10 terabytes of unstructured log files in S3. Those aren't logs. That doesn't help you understand what's going on. It doesn't help you solve a problem. It's just a big dump of information.
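A small step away from an unstructured dump is to emit each log line as structured data that a tool can filter and aggregate. Here is a minimal sketch using only Python's standard `logging` module. The `JsonFormatter` class, the `context` field, and the example values are assumptions made for illustration.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    def format(self, record):
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Fields passed via `extra=` become attributes on the record.
        payload.update(getattr(record, "context", {}))
        return json.dumps(payload)


handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger = logging.getLogger("checkout")  # hypothetical service name
logger.addHandler(handler)
logger.setLevel(logging.INFO)

# Hypothetical event with queryable fields attached.
logger.info("payment failed", extra={"context": {"order_id": "o-123", "latency_ms": 840}})
```

With fields like `order_id` and `latency_ms` on every line, you can ask questions of the logs instead of grepping through them.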

Make sure that you have usable information, usable things that you can take action from. Things that you can actually understand. Here's the caveat. You have to use the right tool. So, not the tool that I'm using, not the tool that he's using, or she's using. You have to use the right tool. That means that you don't have to do everything yourself when you don't have to, but it's still okay to get fancy on your own, if there is no tool that works for you. There were people doing observability before there were observability tools. There have been people that wrote their own log driver. There have been people that wrote their own resource management systems.

It's okay to roll your own

That's probably why we have HashiCorp. It's okay to build something yourself, if there's not something out there that works for you. It's okay to get fancy on your own. It's okay to customize on your own. What I'm saying is don't reinvent the wheel when you don't have to. Not everyone has to write their own log driver. Not everyone has to write their own custom deployment infrastructure. It's okay to use tools that already exist to help you, but it's also okay to write your own if you don't find one that fills that gap for you. That might be something like tweaking your load balancer settings, and auto scaling group settings, to get your deployments to be faster.

My new thing is overlapping screenshots by the way, because I couldn't find a slide big enough. It might mean bringing a custom AMI to AWS. It might be that you resort to configuring everything with Bash and EC2 User Data. This girl. I have not yet met a problem by the way that Bash and user data couldn't solve. Do what I'm saying maybe, and not what I actually do in production. That's also thought leadership.

User data is good for everything. You can start services there. You can configure your environment there. Hypothetically you could support Docker flags there that AWS doesn't support in the UI. This is being recorded obviously, but I'm still going to show you. Hi AWS PR. I can just echo whatever I want straight into /etc/sysconfig/docker. This is extremely crafty by the way. You can totally do this, but that's what user data is for. I've done things in user data that I couldn't do in the UI, and that was okay, because I needed something, and the tool that I was currently using wasn't enabling me to do that, and that's okay for all of you too.
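The slide for this trick is not in the transcript, so here is a hedged sketch of what it might look like, written as a Python helper that renders the Bash user data script. It assumes an AMI where the Docker service reads an `OPTIONS` variable from `/etc/sysconfig/docker`. The helper name and the example flag value are illustrative choices, not something taken from the talk.

```python
def render_user_data(docker_options):
    """Render a Bash user data script that appends Docker daemon options."""
    # Assumes an AMI where the Docker service reads OPTIONS from
    # /etc/sysconfig/docker; adjust for your distribution.
    line = f'OPTIONS="${{OPTIONS}} {docker_options}"'
    return "\n".join(
        [
            "#!/bin/bash",
            # Single quotes keep ${OPTIONS} literal until the file is sourced.
            f"echo '{line}' >> /etc/sysconfig/docker",
            "",
        ]
    )


# The flag below is only an example value, not a recommendation.
user_data = render_user_data("--log-level=debug")
print(user_data)
```

Because the line is wrapped in single quotes, this sketch would break on an option string that itself contains a single quote. Treat it as a starting point only.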

It's okay to do things that aren't supported in the UI. I mean safety first. Well, I'm not sure that user data is safety first, but safety first. Do what you need to do to get the result that you want, but promise me you'll use your powers for good. This is my favorite show by the way. Okay. Remember when I said at the beginning that there were two ways that you could get some help?

Using the community as a resource

Here's number two. Your best resource is the community. It's all these people that are sitting next to you. It's those people that you're talking to on Twitter, or on literally 10,000 Slack channels. So many Slack channels. It's people like that. It's your community. It's your peers. It's the user groups. It's the meetups. It's the Slack channels. It's the Twitter accounts. Kubernetes, and AWS, and HashiCorp services, and all these open source projects. They all have extremely vibrant communities and resources. You have people writing these blog posts. You have people giving talks. You have people running user groups, and meetups. They're all out there, and they're here to help you.

The best people to learn from, it's not just talks like this, it's not just talks like the rest of the HashiDays. It's people working on the same kinds of problems that you are. Talk to your peers. Talk to your colleagues. Talk to your internet friends. Write a blog yourself. Give a talk yourself. Come to conferences like this one, but I guarantee you that, absolutely everyone out there—you're not working a problem that no one else is also working on right? Lean on your peers. Talk to people.

If you're working on something, and you're like, "I think I'm kind of stuck. I don't understand how to fix this." Chances are someone out there is also working on something really similar, and that you can learn from them, they can learn from you, that you can all share things. Also, by the way, don't be that person that goes to Stack Overflow and they ask a question, and then they just comment back seven weeks later and they say, "Never mind, fixed it." Post the solution. Don't be that person.

Everyone out there is working on things that you can learn from, and they can learn from you. Write about your experiences. I got this job by the way, because I wrote a blog post. I wrote a really annoying GitHub issue that's still there. Someday I'm going to fix it myself. I just got access to the repo, so it's on. But I got a job, because of a blog post, and a lot of people out here that speak at conferences, they speak at conferences because they worked on an open source tool, or they wrote a blog post, or they wrote a talk, or they went to a conference, and met someone at a user group.

That's how people get started sharing their experiences. If you don't know how to get started, or if you're struggling, ask for help. Ask me, ask your internet friends, ask your real life friends. Resources will get you started, because this is about tools. Find a meetup. Join a community call. Come to a conference, try user groups, again so many Slack channels.

This is just about it for me. I have had my half an hour. I will wrap up. Thank you all for coming. I hope you enjoy the rest of your HashiDays. I think we're going straight into the next talk after this, but if you want to ask any questions, I will be hanging out around the back for a little bit before I head out. Thank you, and I hope you enjoy the rest of your conference.
