Case Study

Scaling security at Comcast Xfinity Mobile with Vault and Spring Cloud Config

Watch Xfinity Mobile's case study and demo showcasing how they used HashiCorp Vault with Spring Boot and Spring Cloud Config.

As Xfinity Mobile scaled their application from a monolithic platform to a cloud-native platform of 80+ microservices running across five regions, they quickly realized that they had to scale not just their architecture but also their processes and practices.

From DevSecOps and code review to security and configuration management, they had to rethink how to adapt each step of their pipeline to this new scale. In the talk, Xfinity Platform Architect Pankesh Contractor shares challenges his team faced and the solutions they employed.

One of the main shifts was to externalize configuration and secrets, as the 12-factor principles prescribe. With a live demo, Pankesh shows how they used HashiCorp Vault with Spring Boot and Spring Cloud Config.

Speakers

  • Pankesh Contractor
    Platform Architect - Xfinity Mobile, Comcast

Transcript

Hi everyone. Welcome. I hope you guys are having a wonderful time here. I've been enjoying this, my first time at HashiConf. The conference has been amazing. The talks have been great, and thanks to you guys for stopping by.

The law of unintended consequences

I'd like to start my talk with this law—which I think I made up—the law of unintended consequences. It means that when you're dealing with architecture and with new things, stuff can go wrong, mostly around the stuff you made decisions on. It's tricky because, in your vision, you have a grand design that you want to take to fruition. What ends up happening is things look like this—and more often than not, you're one surge away from disaster. That's the reality.

How do you evolve your architecture and your business continuity practices—and new things you bring in too—without ending up in this situation? That's going to be a focus of my talk.

My name is Pankesh. I'm a platform architect at Xfinity Mobile. We are a relatively new business unit within Xfinity. We launched two and a half years ago. My team owns the APIs for the entire platform. We do all the APIs, ranging from the buy-flow experience to activating the phone if you get a new device. One of the services I love the most is a packaging service—it's almost a nanoservice, which tells the agent which box to ship the phone in.

I love to focus on how you make sure you are evolving the platform, not just from a functional point of view, but also all the nonfunctional key requirements. You need an advocate, and I'm the advocate on that team.

If you guys want to reach me, those are my contact details. If there's stuff that you want to talk about, just flag me or find me here. I'd love to talk about some of these things.

A standard disclaimer: I've used very standard APIs and open source libraries that are available to you. Take a look at what I'm showing and see how it fits into your work and your architecture—then adapt it, use it, and see how things work out for you.

An evolution to microservices

I'd like to start this story by going back in time. This is where I was maybe 12-15 years back. My first application, when it was running, looked somewhat like that. It was very simple. It had a UI and a bunch of application code put together—mostly on two servers, I think. We had a load balancer and a database server, with a couple of firewalls. Life was easy.

But then we started getting awareness of data, and we started saying, "We need to separate out this logic, and we don't want our UI logic to be where our services are. We want to keep this database separate so we can secure it more." And then things started looking more like that.

Then we started getting awareness of domains. You had business owners who were saying, "My domain needs these special things, and I have special requirements around it." Things started getting more fragmented. We were still doing the same MVC. This is when Spring started coming into being—and people had more tooling to enable this sort of architecture.

Then came Agile. Most of you guys already know or have experienced that stuff. Even as we were doing the Agile part, things were coming from the business with high velocity. People wanted things done in a given frame of time, and we wanted a predictable outcome out of these services.

We started breaking these things up, moving teams into feature teams. This is what a lot of us in the industry call a modular monolith. You're still deploying the application within a container. You still have this monolithic piece of application that gets deployed, but it gives you that velocity, the speed of deploying and releasing new features. What it doesn't give you is scalability.

This is not too long ago; I'm talking about 3-4 years back, when everybody started saying, "We need to scale these up." Some of this transition was a result of the enablers that we are here talking about: the Terraforms and the Consuls. All of these things enabled application teams to build an architecture like this.

It was all around the scaling piece. It encapsulated the data, and you could have different versions. This was a dream for an architect coming from the days when you were doing a monolith and you had a brittle application. You needed to be completely predictable about what you release; this gives you that predictability. You could have one version of an API running and a completely different version of the API running. You could move things in and out. Great.

Very soon, we could not scale our hardware infrastructure to keep up with the microservice layer. We said cloud is the answer, and it truly is. It is what you need to run your business in this day and age. The question that we have to ask ourselves is, "Was this complexity intended? Is it what we wanted? Was this a grand plan, or did we stumble upon this design, this architecture?"

Most of us will agree that, yes, this is intended. What is unintended is some of the complexity that comes with it. My argument here is that complexity is fine as long as you can manage that complexity.

That's the rest of my talk. I want to walk you through what it is to manage that complexity. What you want to do in your architecture so that you don't end up with some accidental complexity.

Managing complexity

My team started looking at this. Some of you guys might've heard of this or even used it. It's called the 12-factor principles for dealing with microservices. A lot of it is about, "What do you do with your codebase? What do you do with your dependencies? What do you do with services from a scalability point of view? And what do you do with your logs and all?"

Some of you guys might already be doing this. Even if you are not following it to a tee, you might have come across or started implementing some of these things as part of your work.

Points one and three became important to us—and I want to talk about them right now. They talk about your codebase. The idea is that you want a single codebase that can be run in multiple environments. Along with that comes a single set of configuration. You want to manage your config, you want to have it stored outside your code in the environment, and you want to move it along with your code—so that you can predictably release stuff. I know there were a couple of people who gave a talk around drift in your configuration and your code and how you manage that across your dev lifecycle.

Configuration as code

We have, I think, roughly 95-98 services right now. That number keeps changing—I think I have to put an asterisk there. Across six environments—deploying them, managing them, making sure it's consistent—and doing it in a six-week release cycle is tremendously hard if you do not follow a very disciplined approach.

With regards to config, there were a whole lot of configs that we wanted to manage. We are a Spring Boot microservice shop. Spring has a good library called Spring Cloud Config. It lets you manage configuration as code. Essentially, you move all your configuration into a Git repository, and as you migrate your application code, you version your configuration as well. You manage them together and move them both along together.

One of the things that naturally happens as part of configuration is you end up with a lot of secrets in your configuration—your DB passwords, your API keys, your service credentials, etc.

Unfortunately for us, that started showing up in Git as well. I don't know about you guys, but I don't want to be in a position where I have to defend that design pattern in a security audit. That's not a winning position at all. You're going to lose that argument all day long.

We started talking about, "What are we going to do with this situation that we are in now?" And somebody said, "Why don't you encrypt this?" Great, yes, we could encrypt it, but encryption is never a solution to the problem because you always end up with a chicken and egg problem.

Integrating Vault and Spring

So, where am I going to put my encryption key? Where is that key zero? What do I do with it, and how do I make sure it's secure—and that it moves along the environments as my code moves along? We needed a design pattern or something to help us—and we stumbled upon HashiCorp Vault.

An enterprise team within Comcast was standing up a Security as a Service platform. I started looking at that, and it seemed like a great fit for us because it had good seamless integration into the Spring ecosystem. Very clean REST APIs.

Preeti was here in the morning talking about what a good application is and what a good supportable library needs to be like. That's what it was. It had clean APIs, was light, flexible. I didn't have to tightly couple it into my architecture. I could loosely couple it and still use it the way I wanted. It supported all the authentication mechanisms that we wanted to use.

We were not locked into using it. It was not very opinionated, which is good in some cases, and it absolutely was in this case. Then I had a team that was standing up this platform. I'm on the application platform team; I'm wary of standing up yet another platform that I need to run and support, which is not my core business capability. We had a team that was standing this up, so it almost fell into our laps.

Vault with Spring Cloud Config

This is what it looked like. We had microservices, we had the Cloud Config Server, we would move our configurations into Git and move all our secrets into Vault. The config server will log into Vault, and it would be seamless.

Let's do a quick demo. The first thing we said was, "Let's see what this structure looks like." I have Docker running Vault in a very basic configuration. Don't go by some of the stuff that you'll see here—but the idea still holds true. We published all of our secrets, so what you see here is the name of the microservice. Under that, I create microservice-based paths and put secrets there. You see here I have a hashiconf username and password in my configuration.

Spring Cloud also provides the Spring Cloud Config Server. If you guys are familiar with it, I'll show you what the configuration looks like. Before we started integrating Vault, we only had Git as the source for all of our configs. It was very simple for me. With the deep integration of Spring into Vault, I only needed to provide this block here. I needed to give the host, the port, the key, and the scheme. Then the interesting thing is I could give this order—which is a priority. I could say, "If there's a config available in Vault and in GitHub, take Vault as the priority," and boom. That stood up my config server. Let's see what that looks like if I start this up.
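For reference, a config server wired to both backends boils down to something like the sketch below, assuming the standard spring-cloud-config-server dependency. The repo URL, host, and port are placeholders rather than the values from the demo, and the backend properties are shown as comments.

```java
// A minimal sketch of a Spring Cloud Config Server with both a Git and a Vault
// backend. The block the demo refers to lives in the server's properties file:
//
//   spring.profiles.active=git,vault
//   spring.cloud.config.server.git.uri=https://github.com/example/config-repo
//   spring.cloud.config.server.vault.host=127.0.0.1
//   spring.cloud.config.server.vault.port=8200
//   spring.cloud.config.server.vault.scheme=http
//   spring.cloud.config.server.vault.order=1
//
// A lower "order" means higher priority, so Vault wins over Git here. Clients
// pass their Vault token in the spring.cloud.config.token property, which the
// config server forwards to Vault as the X-Config-Token header.
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.cloud.config.server.EnableConfigServer;

@SpringBootApplication
@EnableConfigServer
public class ConfigServerApplication {

    public static void main(String[] args) {
        SpringApplication.run(ConfigServerApplication.class, args);
    }
}
```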

This is running on port 8762, and this is me trying to get the configs. Now I've made one step. I was able to show you that the configs are available from Vault, and I was able to pull those configs from my config server as well. Great. I made one step. I saw that jump from the config server, which is now hooked into Vault. I had to pass the Vault token, but that's fine. I have one config server running, and I need to configure that with the Vault token, great.

Addressing Vault dependency and awareness within our architecture

Now let's see what it took to get my services also connected to this config server. I realized that to start my services, now that my config server is hooked into Vault, all of the clients that talk to the config server also need to be Vault-aware. I had to pass these Vault tokens as part of starting up my microservices.

Now, this is where we started getting a little uncomfortable. The dependency and the awareness of Vault was trickling and percolating through other aspects of my architecture. Not just the config server needed to know this, but all of the microservices that we had needed to know this.

That also meant that the deployment architecture—or whatever the deployment methodology is that starts my services—now also needed to be aware of Vault; get that Vault token and pass it to the services on startup.

This is where we were at, and we started questioning this. But let me show you what that looks like. At least the fact that it works the way it's supposed to work—even though we had some of these concerns.

The service is running on port 8082, and if I were to call this, I get my configs. What does the code look like behind this? It looks somewhat like that. In the true nature of Spring—Spring is very convention-over-configuration—if you follow the convention, you annotate the class as a configuration properties class and give it the prefix. If you guys remember the other screen I showed you—it was prefixed with hashiconf and then my username and password.
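The kind of class he's describing would look roughly like this; the hashiconf prefix and the two property names come from the demo, but the class itself is a reconstruction, not his actual code.

```java
import org.springframework.boot.context.properties.ConfigurationProperties;
import org.springframework.stereotype.Component;

// Binds hashiconf.username and hashiconf.password from the config server,
// regardless of whether they were sourced from Git or from Vault.
@Component
@ConfigurationProperties(prefix = "hashiconf")
public class HashiConfProperties {

    private String username;
    private String password;

    public String getUsername() { return username; }
    public void setUsername(String username) { this.username = username; }

    public String getPassword() { return password; }
    public void setPassword(String password) { this.password = password; }
}
```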

As soon as I do this, that's all I need to do. Automatically the configs will come in and based on my preference—whether they are coming from Git or Vault, my configs flow through. This part was fabulous, but the other pieces were not so much.

Evaluating our approach—Part 1

What did we like about this? I loved the fact that this was minimal change to my existing code and configuration. I didn't have to do much. All I had to do was configure Vault and move all the secrets over there. Then I could control the rollout per microservice.

I didn't have to go through a big-bang. I could say, "Here is the set of services. Here is a set of environments. We can roll this out," and that looked great. We didn't like the fact that I now needed to pass this token at startup—and now my configs were in two places.

People might say, "You're not talking about configs now; you're talking about secrets." But that's beside the point. The point is for my services to start up, I need to make sure that both of these things move in parallel. I don't want to keep track of one without being aware of the other.

We didn't think that was a great thing. I could not do an atomic commit and move all of these things together, so that's something we were not too jazzed about. We also found out this library didn't support all of the authentication mechanisms. We wanted to use AppRole, and it didn't have native support for any of this stuff. We would have had to move all of this into our deployment architecture, which is not the most ideal solution.

We were like, "Let's get back to the drawing board and see what it is that we want to do with this." You always want to integrate tightly, but you want to couple loosely. You don't want to have tight coupling with other aspects of your architecture.

Taking a different approach

"How do we get to that?" is the question we wanted to ask ourselves. We said, "Let's do this. The secrets that we have in Vault—what if we go back to what we were saying, encrypt them, and put them in GitHub?"

"Let's take that for a spin. Let's see how far we can go with that approach. Let me disconnect my config server from Vault."

Now my config server didn't need to know about Vault. I'm like, "That's one less coupling that I need to worry about in my architecture." Then what do I need to do? I'm like, "I have to log into Vault anyways. What if I just use the master encryption key? Instead of storing all the configs, I'll just store the encryption key.” That solves the issue that I had with, "What do I do with encryption keys?"

That's a secure place to put my encryption key. I get all the auditing around it. I get all the access control that Vault provides me—all the goodness from that. So I'm like, "That seems like a good approach." We changed our dependency from Spring Cloud Config to this other library called Spring Vault Core.

Utilizing Spring VaultTemplate

What does it provide you? If you guys have used any of the Spring frameworks or Spring libraries, it provides a very similar approach. Like RestTemplate and JdbcTemplate, it provides a VaultTemplate. It makes reading, writing, modifying, and deleting keys in Vault very simple. It manages all your tokens: once you establish the authentication for the template, you don't have to manage the token or its lifecycle. It also supports all of the various authentication mechanisms that are part of Vault—it's pretty robust.

That's what it looks like. To initialize the template, you have to give an endpoint and give the client authentication. The endpoint is very simple. You construct it based on the URI or the Path. The client authentication is a little more involved, but just 10 lines of code or even less. You only have to figure out what your authentication is going to be. We were using AppRole, and you just need to pass on the initial token and the auth path to Vault.

Give or take within a few lines of code, we were able to integrate with Vault, get the master key, and seamlessly put this in our codebase. Then once you have the VaultTemplate initialized, you can read any path just using that.
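The shape of that code, in a rough sketch: the host, port, token, and path below are placeholders, and token authentication is shown only for brevity; Spring Vault also provides AppRoleAuthentication, which is what the team actually used.

```java
import org.springframework.vault.authentication.TokenAuthentication;
import org.springframework.vault.client.VaultEndpoint;
import org.springframework.vault.core.VaultTemplate;
import org.springframework.vault.support.VaultResponse;

public class MasterKeyLookup {

    public String fetchMasterKey(String host, int port, String token,
                                 String serviceName, String environment) {
        // The endpoint is just the Vault host and port (the scheme defaults to https)
        VaultEndpoint endpoint = VaultEndpoint.create(host, port);

        // Token authentication keeps the sketch short; AppRole is available
        // through AppRoleAuthentication and its options builder.
        VaultTemplate vault = new VaultTemplate(endpoint, new TokenAuthentication(token));

        // Per-service, per-environment path holding the master encryption key
        // (the path layout and the "master-key" field name are assumptions)
        VaultResponse response = vault.read("secret/" + serviceName + "/" + environment);
        return response == null ? null : (String) response.getData().get("master-key");
    }
}
```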

Behind the scenes, this still goes through all of the ACLs—all the rules that you've put in Vault and who has access to what. In our case, we were using GitHub as our base authentication, so groups within GitHub had access to certain paths in Vault. This still honors all of that. I didn't have to worry about it; externally, the Vault team was taking care of all of that for us. But this gave me a very simple API to reach into Vault and pull all that I wanted.

Let's take a look at what that looks like. Coming back to a very similar approach—instead of storing the actual username and password, I'm just storing the master key. If you see in the path here, I am still storing it by microservice, but I've also made it environment-specific. We wanted to have an environment-specific master key. I didn't want to use a single key for all my environments. This gives me that capability.

Let's take a look at what happened to my config server now. Technically it should be Git, but because I'm running the two demos together locally, I'm pointing to a local file system for configs. This would be Git. As the picture showed, my config server is now only pointing to Git to get all of the configurations.

This is what my config looks like, and I'll talk through some of this stuff as well. You'll notice I have a password here in cleartext. What do I do? That's definitely not what I want going into my Git.

Creating a developer-friendly service

We realized that, as part of a microservice starting up, we needed to provide libraries that are easy for developers to use. We provided a library that lets the developer do the encryption. As a developer—if I'm integrating with a new API or a new service, and I have the service credentials—I want to store them and make sure they're in my environment along with my code. I would take the credentials, and I would call this endpoint. It's an encryption endpoint. I would give it the value that I need to encrypt, and I would give it the environment.

Behind the scenes, this library is going to go up to Vault, get the master key, encrypt this stuff, and return the encrypted value back to the developer. All the developer had to do was take this and stick it in the configuration. I would replace this with that.
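The library itself is internal to the team, but a hypothetical sketch of such an encryption endpoint could look like the following; the path layout, parameter names, the "VLT:" prefix format, and the cipher choice are all illustrative assumptions.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import javax.crypto.Cipher;
import javax.crypto.spec.SecretKeySpec;
import org.springframework.vault.core.VaultTemplate;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;

// Hypothetical encryption endpoint: fetch the environment's master key from
// Vault, encrypt the supplied value, and hand back a "VLT:"-prefixed string
// that the developer can paste into the Git-managed configuration.
@RestController
public class EncryptionController {

    private final VaultTemplate vault;

    public EncryptionController(VaultTemplate vault) {
        this.vault = vault;
    }

    @PostMapping("/encrypt")
    public String encrypt(@RequestParam String value,
                          @RequestParam String environment) throws Exception {
        // Per-service, per-environment master key (path layout is an assumption)
        String masterKey = (String) vault
                .read("secret/my-service/" + environment)
                .getData().get("master-key");

        // AES in its default mode keeps the sketch short; a real implementation
        // would use an authenticated mode such as AES-GCM with a random IV.
        SecretKeySpec key = new SecretKeySpec(Base64.getDecoder().decode(masterKey), "AES");
        Cipher cipher = Cipher.getInstance("AES");
        cipher.init(Cipher.ENCRYPT_MODE, key);
        byte[] encrypted = cipher.doFinal(value.getBytes(StandardCharsets.UTF_8));

        return "VLT:" + Base64.getEncoder().encodeToString(encrypted);
    }
}
```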

But I'm not done yet, because I have done the encryption part; I still need a way at runtime to be able to decrypt it. What does that look like? If you remember the code I showed you before—I'm going to replace this with the library that we created. It's called KeyUtil. This is what it looks like.
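The usage pattern he's describing is roughly the one below. KeyUtil is the talk's name for the internal library; the interface and its decrypt method are assumptions about its shape, and HashiConfProperties is the configuration-properties sketch from earlier.

```java
import org.springframework.stereotype.Service;

// Assumed shape of the internal library; only the name "KeyUtil" comes from the talk.
interface KeyUtil {
    String decrypt(String maybeEncryptedValue);   // returns the input unchanged if it isn't VLT-prefixed
}

// A consuming service: the encrypted value flows in from the config server
// (backed by Git), and the auto-wired KeyUtil decrypts it with the master key
// that was fetched from Vault at bootstrap.
@Service
class DownstreamClient {

    private final HashiConfProperties props;   // bound from Git-managed config
    private final KeyUtil keyUtil;

    DownstreamClient(HashiConfProperties props, KeyUtil keyUtil) {
        this.props = props;
        this.keyUtil = keyUtil;
    }

    String plaintextPassword() {
        return keyUtil.decrypt(props.getPassword());
    }
}
```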

Let's restart this service and see if it takes effect on startup. The idea here is that this service is now using the config server as a backend, which goes to Git and gets that configuration. It sees those values prefixed with VLT, so it knows it's an encrypted key and that it has to go talk to Vault. It has already initialized Vault, so it goes and gets the master key and decrypts this. Let's see what that looks like. Great. Let's go back and talk through what we do as a result of this.

We gave the developers a starter—like in Spring, there's the idea of starter dependency. We give them a “vault starter”. They only had to include that starter in their dependency, and all of the stuff automatically gets wired in.

It wires in the endpoint we gave them. It encapsulates away all the dependencies: how do you get the master key, and what do you do with it? They didn't have to worry about how to get it or how to retrieve it. We give them a Java API that decrypts the stuff for them. You saw that; I used it in my code. It's just a util library that gets auto-wired in. They don't have to worry about how the auto-wiring happens or what the details of that are.

We give them a REST API to encrypt this stuff. All of this was done without them having any awareness of where the keys were or where the keys were coming from.
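A starter like that essentially comes down to an auto-configuration that contributes the beans. The sketch below illustrates the idea under that assumption; the endpoint, token authentication, secret path, and AES scheme are placeholders, and a real starter would also register this class in its auto-configuration metadata.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.util.Base64;
import javax.crypto.Cipher;
import javax.crypto.spec.SecretKeySpec;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.vault.authentication.TokenAuthentication;
import org.springframework.vault.client.VaultEndpoint;
import org.springframework.vault.core.VaultTemplate;

// Illustrative auto-configuration for a "vault starter": adding the starter as a
// dependency wires up a VaultTemplate and a KeyUtil, so application teams never
// talk to Vault directly.
@Configuration
public class VaultStarterAutoConfiguration {

    @Bean
    public VaultTemplate vaultTemplate() {
        // Placeholder endpoint and token-based auth; the talk describes AppRole,
        // which Spring Vault supports via AppRoleAuthentication.
        VaultEndpoint endpoint = VaultEndpoint.create("vault.example.com", 8200);
        return new VaultTemplate(endpoint, new TokenAuthentication(System.getenv("VAULT_TOKEN")));
    }

    @Bean
    public KeyUtil keyUtil(VaultTemplate vaultTemplate) {
        // Fetch the environment's master key once, at bootstrap; after this the
        // running service no longer depends on Vault being reachable.
        String masterKey = (String) vaultTemplate
                .read("secret/my-service/prod")            // assumed path layout
                .getData().get("master-key");
        SecretKeySpec key = new SecretKeySpec(Base64.getDecoder().decode(masterKey), "AES");

        // KeyUtil is the single-method interface from the earlier sketch.
        return value -> {
            if (value == null || !value.startsWith("VLT:")) {
                return value;                              // pass plain values through
            }
            try {
                Cipher cipher = Cipher.getInstance("AES");
                cipher.init(Cipher.DECRYPT_MODE, key);
                byte[] plain = cipher.doFinal(Base64.getDecoder().decode(value.substring(4)));
                return new String(plain, StandardCharsets.UTF_8);
            } catch (GeneralSecurityException e) {
                throw new IllegalStateException("Could not decrypt configuration value", e);
            }
        };
    }
}
```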

Evaluating the approach—Part 2

What are the benefits we gained? I think, compared with the previous solution, I had my secrets and my configs all in one place in GitHub—and they were secured in GitHub. I could do an atomic commit, I could test this in different environments, and I could validate it all the way through my build pipelines. I knew that when I was pushing a version of my code, its corresponding version of config was also moving with it. I didn't have to orchestrate that with the Vault team and make sure that they were also pushing the configs or making them live at the same time.

I got atomic capability, and I got one of the 12 factors—I can move code and config together. The dependency on Vault moved to just one key-value pair per environment—we overlook this stuff, but it's important. I've made sure that when my application is scaling up, when I have a burst in traffic or something going on where I'm getting a whole lot of requests, I'm not overloading the Vault platform as well. My problem doesn't become their problem. If I'm on a bridge call for some reason, I don't also need to involve the Vault team.

Secondly, if I'm scaling up, I'm adding a whole new set of services. When we are preparing for an event, I don't need the corresponding peers on the Vault team to also be ready for that event. It's a non-event for them. An event for me should not trickle down to the rest of the architecture on our side. That gave us a nice separation between the two.

This continued to be a no-big-bang approach. I could roll it out per microservice without having to do it all at once. Things felt good; things felt right.

A few unforeseen benefits

There were some unforeseen benefits. Usually, unintended stuff happens when things go wrong. But sometimes things can go right if you do things the right way—or at least approach things meticulously and are decisive about what you do.

We realized that our dependency on Vault had moved to bootstrap. My library was reaching out and getting the encryption key at startup. After my services were up and running—if for some reason I lost connectivity to Vault, or there was a network partition or some firewall issue, and I couldn't reach Vault—I didn't care. My services would be up and running. The only time I would worry about it is if a service was scaling up or new services were starting up. That was a good side effect.

I decoupled my applications, as I was mentioning before, so the Vault team didn't have to know about my scaling and traffic scenarios, and all of the keys were completely secure. A developer could use them, but they didn't need to know where they came from or what the details were.

That's what the approach looked like. Completely clean from our point of view.

Key takeaways

What do I want to talk about as key takeaways from this? From my experience, what do I want to share?

If you guys are a Spring shop, or you have dev teams that are working with Spring, Spring offers some good libraries—mostly production-grade, with an awesome open source community behind them and some great committers. If you want to do a direct, low-level integration to Vault—which is what we ended up using in the end—or if you are not a microservice application but you are still deployed in the cloud, you can use Spring Vault as a library.

Lastly, if you're a team that has just a handful of microservices, not a whole lot, and it's relatively simple, and you want to use Spring Cloud Config—that's another great library to continue to utilize as well.

We found out that security doesn't have to be heavyweight or cumbersome to use. You can aim for simplicity, and with some of the tooling that HashiCorp has given us, that's absolutely possible.

You can make it simple to use. Keep it simple for your teams: your developers, your test automation folks, your DevOps folks. If it's simple and you give them the APIs, they're not going to try to figure out a way around your architecture protocols—your security protocols. They'll follow them because you've designed for usability for them as well.

Integrate tightly, but do not couple tightly. Couple loosely so that you can move around without having to drag everybody else along with you. Security is a key feature. Design for it the way you design for scalability—security is just as important. Make sure you're very deliberate about your architecture decisions, so you don't end up with accidental complexity. Sometimes we go and read an article or a book and implement it the way we see it there, but it might not be a good fit for you.

Try some libraries, test them out, work through that process, and if it doesn't feel right, question it and follow a different approach. That's completely fine, as long as it fits your architectural needs and scales the way you want it to.

That's my talk, thanks, everyone. If you have questions, reach out to me or find me here. Thanks.
