
Breaking Down the Monolith: The Networking Challenges

The industry-wide shift from on-premises data centers to public and hybrid clouds, along with the transition from monoliths to service-oriented applications, is breaking security and network management processes that were once simple. HashiCorp is bringing that simplicity back to distributed systems.

As IT organizations move from the world of private data centers to the world of public clouds, their applications are becoming more distributed. That means more operational complexity: networking application components together is a bigger challenge, and securing that network is harder as well.

Armon Dadgar, a co-founder and CTO of HashiCorp, has seen these challenges first-hand from enterprise customers who are trying to build a DevOps culture and workflow that is more agile and self-service. The first thing most companies want to do is break apart their monolithic applications. This means separating application components into loosely coupled services, which makes quick software changes and deployments easier. A service-oriented architecture or microservices architecture is a prime example of this distributed, loosely coupled environment.

Unfortunately, this style of architecture breaks a lot of processes that were very simple in a monolith. If you just look at the networking and connectivity angle of the problem, you'll see challenges in:

  • Service discovery: Application components are no longer running on the same machine. How do they find and securely communicate with one another?
  • Configuration management: How do you propagate configuration changes to an ever-growing number of services?
  • Network segmentation: Applications are no longer on a limited number of machines surrounded by a firewall with one-way access. How do you construct fine-grained service segmentation that keeps your application secure even in the cloud?

In this video, Dadgar breaks down these challenges in detail and explains HashiCorp's method for surmounting these hurdles: a product called Consul, a utility for connecting, configuring, and segmenting applications in a distributed system. Until recently, Consul only provided solutions for service discovery and configuration management, but with the addition of Consul Connect it now addresses network segmentation as well, making it an elegant, high-performance service mesh solution.

Transcript

Good morning, everyone. Thank you so much for joining us, and thank you, Nick, for the very warm introduction. I want to welcome all of you to HashiDays officially. I know Nick already said it, but we have an awesome lineup of talks. It's going to be a really, really exciting event, so we're glad to be able to do it. For many of you, I expect this is actually the first HashiCorp event you've been to, and I think maybe some of you were here for our actual first ever European event, which was not very far from here in Amsterdam two years ago. Fun little fact: the giant H is actually making a return from the first HashiConf event here. It got saved, and we wanted to bring it back for you.

Last year we made sort of a slight mistake. We took a detour from Amsterdam, and we missed out. We ended up in London, so we were very, very excited to be able to bring it back to Amsterdam this year and join all of you. What you'll see around you is many, many people, a little over 400, in fact. These are your peers in industry, these are folks who are joining us from all around Europe, all around the world. It's practitioners who are here to learn about the state of the art, whether it's in DevOps, in cloud infrastructure, in how we work together and work as teams more efficiently. So I'd encourage you, don't just talk to the folks you already know. Use this as an opportunity to meet new folks, talk about new things, learn and share what you already know with people who maybe don't already know it.

The HashiCorp Diversity Scholarship

As part of that effort, one of the things we care about deeply at HashiCorp is, how do we make our communities more diverse? A program that we kicked off earlier this year, thanks to Mishra—if you find him anywhere, he is the mastermind behind this—is the HashiCorp Diversity Scholarship. What we felt is, one of our struggles is, how do we bring in more people from under-represented groups that don't usually find their way to technical conferences like the ones we put on? So what we wanted to do is find ways to bring more of these people to our conferences and be able to meet them and share what we know with them, and bring their perspectives into the community. So we're glad to have several people joining us from all over the world to participate in this event, thanks to the scholarship. If you know folks who are interested and would like to join us for HashiConf in the US, we have a similar program for that one as well.

The market shift from static to dynamic software environments

In the spirit of sharing knowledge, I think something many of us in this room know is that we're going through a very, very large transition right now in the market. We're pretty universally moving from a world of private data centers, VMware, and traditional point-and-click administration to asking: how do we start to embrace public clouds, whether it's AWS, Azure, GCP, or all of the above? There's a transition taking place in the market as we try to both change where we're landing, moving from our private data centers into the cloud, and change the process by which we think about infrastructure, provision it, and move to more of an agile, self-service, DevOps model.

In our view, this changes a lot of things. As we go through this transition, it's not just a shift of one or two tools, and it's not just a slight tweak of our process. Our view is that this is a pretty large change in the way we think about delivering applications, and it impacts many different groups. It impacts the way our operations teams think about provisioning infrastructure. It impacts the way our security teams think about securing our applications, our data, and our underlying infrastructure. It changes the way we deploy our applications: how we package them, how we think about CI/CD, and what runtime environments we use for them. And lastly, it changes the way we think about networking, both at the physical level and at the application level.

In our view, there is a broader kind of meta-theme. It's this meta-theme around a transition from a world that was much, much more static to a world that's much, much more dynamic and ephemeral and elastic. And as we undergo this change, it starts to break a lot of things. Take infrastructure: we used to have dedicated servers that were relatively homogeneous, and now it's very heterogeneous. We're running across multiple environments, and these things are coming and going all the time. We're not provisioning the VM and letting it live for months or years. We're provisioning the container and letting it live for hours or days. So it's a very different scale of infrastructure, a very different elasticity of infrastructure. So how do we change how we think about infrastructure? It can't just be the same point-and-click approach as we try to operate at orders of magnitude more scale and orders of magnitude faster.

New challenges in security, development, and networking

As we think about the shift in security, we're moving from a world where we largely depended on the four walls wrapping our infrastructure. We had a notion of the network perimeter, and we pinned things on IPs. IPs gave us a sense of identity; we knew this IP would be the web server for at least the next few years. Now we're in a world where we don't really have four walls. We're an API call away at any time from any node serving or receiving public-facing traffic. And at the same time, the IP is just a recyclable unit. The VM dies, a new VM comes up and gets the same IP. Containers are recycling IPs all the time. So how do we think about the change in security as we move to this perimeter-free, low-trust model and lose the identity we had at the IP level? There are a lot of challenges there.

As we think about our development tier, we're going from maybe a handful of relatively monolithic frameworks, maybe our giant Spring framework or our C# framework, to wanting to support many, many new ones. There's been a Cambrian explosion of interesting tools over the last few years, whether it's our container platform, our Spark big data platform, or event-driven Lambda architectures. There's a huge variety of new platforms we want to explore and leverage, and we want to give our developers a toolchain so that if their application is more event-oriented, they can use that, or if it's more big-data-oriented, they can use something like Spark, and provide all of these as part of the toolkit.

Lastly, there are the changes in networking land, which is, as we're making this shift, as we're shifting from this dedicated, relatively stable infrastructure where we had the notion of an IP, to now this very dynamic ephemeral infrastructure, how do we change our networking to keep up as well? We don't want to think in terms of IPs and manually updating our load balancers. Instead, we want to think in terms of fine-grained services, which might be a container, might be a lambda function, might be a VM. But these things are coming and going and scaling up and down. So in our view, this dynamic change is sort of the underpinning theme to all of these transitions.

What you'll see is that this has been our focus for a long time from a toolchain perspective: how do we lean into this change and build a toolchain that was designed for it? That means thinking about cloud as our native operating environment, and really focusing on what that experience should be as we move to this world where we want infrastructure as code, microservices, and cloud-oriented infrastructure.

Networking challenges outside the monolith

Today I want to spend a little bit more time talking about what's happening in networking land, and starting with really breaking down the monolith and what changes as we start to do so? When we talk about the monolith, what we usually mean is a single application that has many discrete subcomponents to it. An example I like to use is, suppose we're delivering a desktop banking application. It may have multiple subcomponents. A might be log-in. B might be view balance. C might be transfers. D might be foreign currency. So these are four discrete types of capability, different pages the user's interfacing with, different APIs, things like that. But we're delivering it as a single packaged application, a single monolithic server.

What's nice in this format is, when these systems need to interact with each other, system A needs to call system B, it's easy. We just mark a method in B as being public. We export it, and now A can just do an in-memory function call. No data is leaving the machine, we're just doing a quick function hop over, and then jumping back to A. So there's a lot of nice properties about how we compose these systems together.
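To make that contrast concrete, here is a minimal sketch (not from the talk; the function and subsystem names are hypothetical) of composition inside a monolith: subsystem B exports a function and subsystem A calls it in-process, with no network involved.

```go
// Toy illustration: inside a monolith, subsystem A calls subsystem B
// by invoking an exported function. No serialization, no network hop.
package main

import "fmt"

// Subsystem B ("view balance") exposes a public (exported) function.
func GetBalance(accountID string) int {
	// In a real monolith this would read shared in-process state or a database.
	return 100
}

// Subsystem A ("log-in") calls B directly, in memory.
func ShowDashboard(accountID string) {
	balance := GetBalance(accountID) // plain in-memory function call
	fmt.Printf("account %s balance: %d\n", accountID, balance)
}

func main() {
	ShowDashboard("acct-42")
}
```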

But what about things that run outside of the monolith? Because not everything is going to be within the app. What we find is that most of those external things are really databases. With a monolithic app, most of our logic is encapsulated within the application, and by and large we're just talking to databases over static IPs. As we need to start scaling up these applications, what we typically did was not break out the sub-pieces, because we can't; they're delivered as a single application. Instead, we deploy more copies of the monolith and split our traffic over multiple instances using a load balancer.

To secure the overall system, we split it up into three logical zones. Zone 1, which we don't trust at all, is our demilitarized zone: traffic coming in from the internet. Then there's the middle tier, the application zone, where the monolith itself runs. And finally our data zone, the database tier, which is shielded from everything except the applications.

So what changed? Why do we need to do anything different? What's happened in the meantime? I think the first big thing is we've stopped writing monolithic applications. Instead, over the last few years, there's been a shift toward microservice or service-oriented architectures, where the crux is: how do we move away from packaging and delivering all of these subsystems as a single deliverable and deliver them as discrete logical services?

The advantage of this is, now if there's a bug in, let's say, our log-in portal, we don't have to wait for B, C, and D to all be in shape to be able to cut a release and deliver the monolith as one unit. If there's a bug in A, we can patch A and redeploy just A, without having to wait for development teams B, C, and D to be ready, recompile everything, and redo all of the QA. So it gives us a lot more development agility. We're able to deliver these different discrete capabilities at whatever cadence makes sense, whether it's feature delivery, whether it's bug fixes, or whether we're scaling these individual pieces up and down. We gain a lot more flexibility because we're not tightly coupling all of these different functions together.

Service discovery

Unfortunately, like most things in life, there's no free lunch: what we gain in developer efficiency by being able to build and run A as a separate service, we start to lose in operational efficiency. There are new challenges that we inherit as a result. The first one, the one that becomes obvious the fastest, is how do we do discovery? Historically, all we had to do was mark something as a public function; A could call it, and it was just an in-memory function hop. Now it's not just "mark it as a public service," because it's not compiled into our application. It's not part of the app, it's not even running on the same machine. It's somewhere over the network. So as system A, how do we find, how do we discover, system B to call it over our network?

Configuration management

Another challenge we inherit is, how do we configure our distributed application? As a monolith, all of its configuration lived in a single massive properties.xml file, but the advantage of that was it gave all of our application a consistent view. If we put the application into maintenance mode, say we want to do database maintenance or a schema change, we would change the config file, and all of the subsystems A, B, C, and D would believe we're in maintenance mode simultaneously. Now, instead, we have a distributed configuration problem. We don't want A to believe we're in maintenance mode while B does not believe we're in maintenance mode; we might get inconsistent behavior at the application level. So how do we deal with the fact that we now have this distributed configuration and coordination problem?

Service segmentation

The last major problem we're inheriting is a security problem. In the traditional monolithic world, we had the three zones, and what we were doing with those three zones was basically segmenting our network. When we talk about network segmentation, what we're doing is taking a single larger physical network and splitting it into smaller chunks, say, segments A and B as parts of a larger physical network. What this let us do is restrict the blast radius, so that if there was a compromise in segment A, it wouldn't overflow into segment B. We could control the traffic at a coarse grain between these different segments. There were many different techniques for doing this: virtual LANs (VLANs), firewall-based approaches, or software-defined networking approaches. But overall, what these gave us was a relatively coarse-grained way of bucketing different aspects of our infrastructure together, so each of these segments may have still had dozens or hundreds of discrete services in it.

Now, the challenge as we start talking about a microservice architecture is, where do we draw those dividing lines? We still have the line on the left, which goes to our demilitarized zone, and the line on the right, which goes to our data tier zone. But now our internal application zone has a much more complicated service-to-service flow. It's no longer one application talking internally via function calls; it's many discrete services talking over a network. So how do we start to draw lines in between them? With a simple example like this, we might look at these four services and say, well, you can still do the same thing. You can still cross-hatch and put firewalls in between all of these things.

The problem is, this is meant for illustrative purposes. This is a simple example. As we start talking about real infrastructure, it's not four services, it's hundreds. In the case of some of our customers, it's thousands of applications with complicated service-to-service communication flows, where it's no longer obvious where the cut points are. It's no longer trivial to figure out where to deploy firewalls and what your network topology should look like to constrain this traffic.

The ideal network segmentation scenario

So how do we think about this problem? In some sense, it starts with asking: what's ideal? What would be the perfect scenario? In the perfect scenario, we'd be able to move away from a coarse-grained model, where there are hundreds of services in a virtual segment, to a fine-grained boundary that matches exactly two services: only the sender and receiver. That's what I've indicated with the orange boxes. What if you could draw your network segment that finely, where you said A can talk to B, and then you maybe have another fine-grained segment that says C and D can talk bi-directionally to each other?

In this arrangement, we shrink the segment down to only those services that actually need to communicate. But these things won't always separate cleanly. It might be the case that A still needs to talk to C and B still needs to talk to D. So how do we avoid creating a huge zone that says, "A, B, C, and D can talk to each other freely"? Instead, we want the ability to overlap these definitions. We may have an overlapping definition that says, "A can also talk to C," even though we've already defined that A can talk to B. But the fact that A can talk to both B and C should not imply that B and C can talk to each other. Access shouldn't be transitive.

So, what we'd like to do is maintain a fine-grained picture of how these things actually communicate, without resorting to creating one large segment (and a large blast radius) that just says all of these services can talk to each other because it's too hard to find the dividing lines between them.
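As a thought experiment (this is a toy sketch, not how Consul implements segmentation), you can picture these fine-grained, overlapping segments as an explicit list of source-to-destination allow rules, where access is strictly pairwise and never transitive:

```go
// Toy sketch: segmentation as explicit source->destination allow rules.
// Allowing A->B and A->C says nothing about B->C.
package main

import "fmt"

type rule struct{ src, dst string }

var allowed = map[rule]bool{
	{"A", "B"}: true, // fine-grained segment: A may call B
	{"A", "C"}: true, // overlapping segment: A may also call C
	{"C", "D"}: true, // C and D talk bi-directionally
	{"D", "C"}: true,
}

func canTalk(src, dst string) bool {
	return allowed[rule{src, dst}]
}

func main() {
	fmt.Println(canTalk("A", "B")) // true
	fmt.Println(canTalk("A", "C")) // true
	fmt.Println(canTalk("B", "C")) // false: access is not transitive
}
```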

Solving service-oriented networking challenges with Consul

As we talk about the challenges of moving from monolithic architecture to microservice architecture, what we're doing is sort of talking about the trade-off between our developer efficiency and our operational challenge. What we gained was, now all these pieces can all be developed independently, deployed independently, scaled independently. But we've inherited three operational challenges. How do we have all of these pieces discover one another, how do we solve our distributed configuration challenge now that we no longer have one configuration file, and how do we segment our network such that we don't have an enormous blast radius?

The way we think about it is that these three capabilities together are really what a service mesh aims to solve. As we go to a microservice or service-oriented pattern, what are the challenges, and is there a solution that addresses them in a well-integrated way, as opposed to a patchwork of different technologies we have to bring together?

As we look at how we've solved this problem over the last few years, for a large part we've looked at the first two. For folks who are less familiar with Consul, what we've done in the past is really through two different mechanisms. One is that Consul has this notion of a service registry: the registry is a central catalog of all of the nodes in your infrastructure, all the services running, and their current health status. The goal is that this registry captures everything that's running, so that you can solve the discovery problem. As a service comes up, it can be programmatically inserted into the registry, and then when any of your downstreams need a route to it, they can query the registry online. So instead of using a static IP address that's maybe going to a load balancer, you can just ask the registry, "What are all of the downstream databases?" or, "What are all of the downstream APIs?"

This has historically been integrated using a DNS interface, so for most applications there's no change to them. They just start querying for database.service.consul, and behind the scenes that's translated by Consul into a lookup of the database. So it lets us mask the location of services and deal with IPs changing; we hardcode a service name rather than an IP address.
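As a rough sketch of that workflow, here is what registration and lookup can look like with Consul's official Go client (github.com/hashicorp/consul/api); the service name and port below are made up for illustration.

```go
// Sketch of registry-based discovery using Consul's Go API client.
package main

import (
	"fmt"
	"log"

	"github.com/hashicorp/consul/api"
)

func main() {
	// Connect to the local Consul agent (default: 127.0.0.1:8500).
	client, err := api.NewClient(api.DefaultConfig())
	if err != nil {
		log.Fatal(err)
	}

	// Register this instance as the "database" service in the catalog.
	err = client.Agent().ServiceRegister(&api.AgentServiceRegistration{
		Name: "database",
		Port: 5432,
	})
	if err != nil {
		log.Fatal(err)
	}

	// A downstream service asks the registry for healthy "database" instances
	// instead of hardcoding an IP or going through a load balancer.
	entries, _, err := client.Health().Service("database", "", true, nil)
	if err != nil {
		log.Fatal(err)
	}
	for _, e := range entries {
		fmt.Printf("database at %s:%d\n", e.Node.Address, e.Service.Port)
	}

	// Applications that can't be modified can get the same answer through the
	// DNS interface by resolving database.service.consul against the agent.
}
```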

The other challenge we've looked at for a long time is distributed configuration, and our view was: how do you put that into a central key-value store and then expose it with a series of APIs and the ability to block and receive changes in real time? Consul's HTTP API allows you to be notified any time a change is made. So now you can flip a flag that says, "We're going into maintenance mode," and all of your services pick that up in real time, as opposed to changing 50 configuration files and redeploying all of your services. That's how we've approached the distributed configuration problem.
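Here's a sketch of that pattern with the same Go client, using a blocking query against the KV store; the key name app/maintenance is hypothetical.

```go
// Sketch of distributed configuration with Consul's KV store and blocking queries.
package main

import (
	"fmt"
	"log"

	"github.com/hashicorp/consul/api"
)

func main() {
	client, err := api.NewClient(api.DefaultConfig())
	if err != nil {
		log.Fatal(err)
	}
	kv := client.KV()

	// One place to flip the flag...
	if _, err := kv.Put(&api.KVPair{Key: "app/maintenance", Value: []byte("true")}, nil); err != nil {
		log.Fatal(err)
	}

	// ...and every service watches it. A blocking query parks until the key's
	// index changes (or the wait times out), so updates propagate in near real
	// time instead of via redeploys of 50 config files.
	var lastIndex uint64
	for i := 0; i < 2; i++ { // a real service would loop forever
		pair, meta, err := kv.Get("app/maintenance", &api.QueryOptions{WaitIndex: lastIndex})
		if err != nil {
			log.Fatal(err)
		}
		if pair != nil {
			fmt.Printf("maintenance mode: %s\n", pair.Value)
		}
		lastIndex = meta.LastIndex
	}
}
```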

The question that remains for us is, how do we solve segmentation? This has been an exercise left to the reader. So today, we're very, very excited to talk about a first-class solution for this problem, which we're calling Consul Connect.

So now I'd like to welcome Mitchell Hashimoto onto the stage to talk to us about Consul Connect in more depth. Thank you so much.

In the next video, Mitchell Hashimoto, the other co-founder and CTO of HashiCorp, will introduce the new Connect capabilities of Consul.
