There’s been a lot of talk about "the edge" — but what does that really mean in the context of your architecture? How can application teams build highly distributed Kubernetes applications with nodes separated by thousands of miles? In this session, learn about an architecture that uses Amazon EKS, AWS Wavelength, HashiCorp Consul and the Verizon 5G network. We will also explore the ways in which Consul can support east-west traffic patterns within mobile edge environments, and we'll see why Consul is well suited for edge-aware routing.
Hey, everyone. Hope you had a great afternoon. Welcome to day two, afternoon two of HashiConf. It's great to be here. My name is Robbie Belson. I'm a developer advocate at AWS, and I pretty much spend every waking moment thinking about our edge computing portfolio — thinking about how we can serve the needs of the developer community. I'm joined here by none other than Jay Cee Straley from the Verizon team. You want to introduce yourself briefly?
Jay Cee Straley:
I'm Jay Cee Straley. I'm based here in LA, work in the Playa Vista Lab, and I work on technology and product development.
Today, we're going to talk about 5G and edge computing. Raise your hand if you have a 5G phone. Anyone? Great. A lot of hands up here. Today we want to show you how you can use that 5G device and connect to new cloud computing endpoints that are topologically closer to you.
We're going to learn more about some of the architectural challenges that are introduced by these new environments and how HashiCorp Consul can natively solve these challenges in a repeatable, automated, and flexible way.
To start things off, I think we have to define our terms here because the edge can mean a number of things, right, Jay Cee?
Jay Cee Straley:
It can mean a device, radios, CDN, the cloud itself. When you talk about edge, what does it mean to you?
Jay Cee Straley:
Today, we're going to talk about the carrier edge — at the edge of the 5G and 4G network. Let me go right here. We're going to talk specifically about Verizon 5G edge with AWS Wavelength.
A Wavelength Zone extends AWS infrastructure out from the region to the 5G edge. Within the region, Wavelength Zones cover major metropolitan areas; in the United States, that's 19 locations. It's in Atlanta, Dallas, and it's all across the United States.
With this, you can build a VPC in a region and extend it out to the Wavelength Zones in that region. The infrastructure is monitored and managed from that region. Today we're showing you it has the same pace of innovation, and it's geographically distributed.
The coolest thing for me to talk about is that this AWS infrastructure you're all used to using is extended into our actual metropolitan aggregation points — it's actually in our 5G datacenters. So, when you connect to an app that's on a Wavelength Zone, you're connecting directly — not any internet hops — it's right there.
You can access all your services in the same single pane of management. You have operational consistency. Things are upgraded and patched the same way.
One key thing is when you go into the console — once you opt into Wavelength — you don't have to go into any special section. Everything's there — EC2, storage, compute — it's all right there. The other cool thing is when you build a VPC, it's a failover zone. If something fails in the Wavelength Zone, it goes right back to the region.
That's all the preliminary. Next I want to talk to you about the cool thing that I got to build. I'm here in LA, and this is something I built on the Wavelength Zone. As you know, in LA, there has been a great amount of innovation in media production workflows driven by studios and their MovieLabs joint venture.
Consistent with their vision, we challenged ourselves to bring production workflows onto Wavelength, starting with one of the most challenging in terms of compute requirements, latency, security, and reliability — and that is video editing.
Video editing is notoriously compute-intensive and sensitive to latency. Usually, an editor furiously works on his or her keyboard to observe, maneuver, and edit the video in its timeline. At any given time, the virtual or real machine transcodes it and renders it onto a high screen resolution to make sure the audio and video tracks are in sync. This render is then streamed to the client device, where edit tasks are performed.
Considering that the editor is usually using visual cues to make their edits, it's very important that the video and audio closely reflect their fast keystrokes. One thing I learned in my work (I came in from a technology point) is how much editors rely not just on visual cues but on sound cues. And they need those to arrive quickly and reliably.
You can see up here — this is how much latency we were working against. The constraint was about 20 milliseconds. When you think about 24 frames per second, it's about 41 milliseconds ‘til each frame shows up.
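The frame arithmetic above can be checked with a few lines of Python. This is only a sketch of the numbers quoted in the talk (24 frames per second, a budget of about 20 milliseconds); the function names are made up for illustration.

```python
def frame_interval_ms(frames_per_second: float) -> float:
    """Time between two consecutive frames, in milliseconds."""
    return 1000.0 / frames_per_second


def fits_budget(round_trip_ms: float, budget_ms: float = 20.0) -> bool:
    """True when a measured round trip stays within the latency budget.

    The 20 ms default is the constraint quoted in the talk.
    """
    return round_trip_ms <= budget_ms


if __name__ == "__main__":
    # At 24 fps a new frame is due roughly every 41.7 ms.
    print(f"{frame_interval_ms(24):.1f} ms between frames at 24 fps")
```

A round trip well above the budget eats most of a frame interval before the editor sees or hears the result of a keystroke, which matches the skipped frames described next.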
We realized that — when we were working on cloud — it didn't quite meet that latency budget. The user experience became painfully slow and jittery, with plenty of skipped frames and an extreme mismatch in audio and video sync.
With the deployment on Verizon 5G edge with AWS Wavelength, editors worked in mobile locations and had a great video editing experience — no skipped frames, and no lost packets. To be honest, we were surprised when that first happened. We had editors come in and use it, and it worked great. The subjective experience of the editors pretty much matched the experience of an on-premises edit session.
We built this awesome environment on edge editing, but then we thought, we built something very specific to editing. It worked because we know that space, and it was rewarding to see the editors love it. But then we thought, that's a very specific use case, and we wondered how can we take what we learned and build for more general use cases.
For our next step, we built what we called our workflow orchestrator. That's what I'm showing you up here. You can see the familiar AWS. Then down here on the left, you'll see what we were building. We started with an editing environment based on our previous work. But this same tool can be used for all kinds of other use cases — including retail, automotive, healthcare, first responder environments, to name a few.
It starts with essentially an engineer — and you guys are all familiar with this (we did this in Terraform for our first flesh) — building a template. Then somebody like a technical director who's a manager — any time they want to start up a project — they could just bring up that template, configure the project, put in the configuration for who's going to be allowed access to it. Then put in exactly what's going to be in that environment, and press a button. And within the time of APIs on AWS — which is very speedy — in a short time, they had an environment.
Especially during COVID, instead of having to pack up a machine and ship it out to an editor — or getting licensing and waiting, all that, what might take a week or two — they could press a button and have an editing environment set up within 15 minutes. And, like I said, we're looking at doing this in other areas as well.
We did get it to work, and we thought this is great, game over. But then we realized, wow, this is for one Wavelength Zone. And, as I talked to you about, we have 19 Wavelength Zones, and we're getting even more. So, how can we scale this to work across multiple Wavelength Zones? That's when we started talking to Robbie.
That's exactly right. This is such an incredible use case — the ability to bring video production to the edge in a way that reduces latency but ultimately makes the user experience better. I'd love to produce movies, but I don't think I'm cut out for that.
So, Jay Cee, what are some of the other use cases and verticals that you're seeing adopting AWS Wavelength?
Jay Cee Straley:
We have C-V2X. We're doing it with automotive to make it safer. We do it in retail with cashierless checkout. We're seeing it across venues in live broadcasting. We've just seen so many different use cases. And IoT, of course, is one of our favorites.
That's exactly right. To bring that point home, you can think about AWS Wavelength as delivering two flavors of applications. First, with your consumer smartphones, those on a Verizon connection in the United States can absolutely leverage AWS Wavelength for lower-latency entertainment experiences, as an example.
But it's also for B2B workloads. If you have a fleet of IoT devices or really any connected mobility use case, you can absolutely use AWS Wavelength as well. And the beauty now is it's not connecting to a single gateway. You're not designing applications for low latency in a single location. You can do so in a geo-distributed fashion and build that same experience in a number of different locations, certainly across the United States, and for Wavelength more broadly, across the globe. But buckle up, folks. It's time to talk architecture.
I want to walk through a little bit about how AWS Wavelength works. We want to go back to that previous slide to showcase the VPC constructs.
Seeing a lot of stuff here, so let's break it down. Start with your device right here, a Verizon-connected 4G or 5G device. We talk a lot about 5G edge computing, but it also works with a 4G device no problem.
I'm going to break down everything you need to know about mobile networks in 120 seconds. Got a device? That device connects to an eNodeB, which is essentially a radio in the 4G world, or a gNodeB, which is a 5G radio. All of those radios in a given area converge or connect to a single carrier datacenter — or aggregation point. That's the carrier datacenter that you see here on the slide.
What happens in that carrier datacenter? Well, all sorts of stuff. It's often referred to as the packet core or radio access network. Point being it's how mobility is ultimately architected. All of the functionality you need to build a mobile network — most of it sits in this packet core. But there's one specific area or network function that's really important to take away. It's called the packet gateway in a 4G world.
That's where your mobile traffic anchors to an IP address so you can connect to wherever you're going — maybe your sports application, the news, your next version of AR, Pokémon GO, whatever it may end up being.
Right next to that packet gateway is the AWS compute and storage, and that's what you're seeing here. The Wavelength Zone itself sits within that geography, within that carrier datacenter, but it's exposed to you as a developer as just another Availability Zone.
Now we know what a Wavelength Zone is. It behaves much like an Availability Zone but with a few important distinctions that I want to highlight here.
First, you need to know it's geographically distinct. But if it behaves like another Availability Zone, you're going to ask me, “Wait a minute, Robbie. How do all these AZs connect?” You've got Availability Zones in the parent region — in this example here, I have three. Each of these Availability Zones doesn't share a single point of failure. Of course, the Wavelength Zone can be seen as a separate failure domain. But how do they connect?
That's where something called the service link is so important. The service link, unbeknownst to you — something you don't have to manage as the developer — seamlessly provides that redundant connectivity back to the region. We often call this the parent region. Every Wavelength Zone has a parent region. In the United States, there are two regions that leverage AWS Wavelength Zones. You have US East 1 and US West 2 – US East 1 in northern Virginia, US West 2 in Oregon.
But make no mistake, the parent region has very little to do with the underlying geography of the Wavelength Zone. What if I told you that, as part of US East 1, you could have Wavelength Zones in Boston and Miami?
So, now you've built out these hub-and-spoke architectures with Wavelength Zones potentially separated by a thousand miles. That's crazy. That's new. That's a different networking architecture.
Two other things I want to call out here on this slide. Something called the CGW, or the carrier gateway. It behaves similarly to an internet gateway, but the NAT-ing that's happening is between private IPs within the VPC, which let resources intercommunicate, and carrier IPs, which look and feel a lot like public IPs. But the underlying pool of addresses is being exposed, in this case, right through the Verizon carrier network.
This is a great example of our partners working together with us to expose otherwise really complicated parts of the 5G network so that you, as the developer, don't have to think or care about how they're ultimately built. That's AWS Wavelength for you, folks.
But now we’ve got to get to Consul. How does Consul fit in? We talked about edge awareness, geo-distribution. Give me an example of when we'd use that. Jay Cee was talking about these video production use cases where it was great that it'd work in LA, but what if I want to do it in San Francisco and Seattle and Denver and Vegas all at once?
One way you could do that is via EKS or Kubernetes managed by AWS. One way this could look — you could launch your control plane in the region and specify a series of subnets. If you tried to specify a subnet corresponding to a Wavelength Zone, that configuration is not supported. So, very similar EKS configurations to what you would probably use today.
If you wanted to have a node group or set of worker nodes in a Wavelength Zone — Fargate isn't supported, managed nodes are not supported. You use self-managed nodes, which you can think about as an auto-scaling group with a specific AMI optimized for that version of EKS — where the kubelet says I'm going to talk to the Kubernetes control plane with this API server endpoint, and this CA, and you're good to go.
With that being the architecture, you think you're done. Simple. You just deploy your services and deployments and you're good to go? Well, let me give you the following scenario. High level, illustrative, yet incredibly important.
It's going to start with a three-tier web app, but let's simplify it even further — two-tier web app. You've got a web service and an app service. That web service, right now — assume it's Wavelength Zone 3 — wants to talk to the app service. Typically, if you were to schedule a deployment, the underlying pods could be scheduled in one or many Availability Zones.
That's a full mesh networking topology. By that, I mean US East 1A could communicate with US East 1B, which can communicate with US East 1C. That's no problem, they're all right next to each other. But remember what we talked about with AWS Wavelength: it's hub and spoke. You've got to get back to the region via the service link.
So, if I wanted to go from Wavelength Zone 1 to Wavelength Zone 2, I can't from within the VPC. You see these red and green lines on the slide? What I'm trying to illustrate here is that web service — if it was trying to talk to app — Kubernetes DNS may not know better where to send that traffic.
It might get lucky and send you to the app service, conveniently located in Wavelength Zone 3. But it could guess wrong. It could try to send you to Wavelength Zone 2 or Wavelength Zone 1. The impact of that, of course, your application hangs because Kubernetes DNS doesn't know any better.
It could also try to send you back to the region, which would work, but there's a latency penalty. So, when I talk about topology-aware routing, wouldn't it be nice if DNS just knew that when I was saying go to app.default, just go to the app in Wavelength Zone 3? And if, for some reason, it's not available, then go back to the region.
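The hub-and-spoke constraint and the tiered preference described here can be modeled in a few lines of Python. This is a toy model, not how Consul or Kubernetes implements routing: the zone names are hypothetical, and the parent region's Availability Zones are collapsed into a single "region" hub.

```python
from __future__ import annotations

from typing import Optional

REGION = "region"  # stands in for all Availability Zones in the parent region


def reachable(src: str, dst: str) -> bool:
    """Hub and spoke: a Wavelength Zone reaches itself and the region only.

    Traffic between two different Wavelength Zones is not possible inside
    the VPC, so it is reported as unreachable.
    """
    return src == dst or src == REGION or dst == REGION


def pick_endpoint(caller_zone: str, healthy_zones: set[str]) -> Optional[str]:
    """Tiered, edge-aware choice: local zone first, then the region, else nothing."""
    if caller_zone in healthy_zones:
        return caller_zone
    if REGION in healthy_zones:
        return REGION
    # Other Wavelength Zones are never considered: they are unreachable.
    return None


if __name__ == "__main__":
    print(pick_endpoint("wlz-3", {"wlz-1", "wlz-3", REGION}))  # local copy wins
    print(pick_endpoint("wlz-3", {"wlz-1", REGION}))           # falls back to region
```

The point of the model is the ordering: a copy of the service in another Wavelength Zone is never a candidate, even when it is the only healthy one.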
So, that tiered routing approach that's intentionally edge-aware — that was the north star. But I don't want to build a two-tier web app. That's no fun. I wanted something that could bring this to life.
We built HashiCups on AWS Wavelength. I thought that'd be fun. We're going to 5G-ify the HashiCups application.
How do we do it? There we go. We didn't use all of the microservices just to simplify the deployment here. We chose four of the key ones that we believed were illustrative of the primary routing decisions that you — as an application developer in an edge computing environment — may have to make.
You have a frontend service. Then you have a public API service, a proxy, a product API service, and a Postgres database. So, the connection flows I've tried to illustrate here — you go from mobile device, to frontend, to public API, to product API, to Postgres. If you want to take note of this, we'll recap. But that flow is going to be important in terms of how we configure the routing here.
What did we do? We wanted to create this seamless, repeatable way to make all of these routing decisions easy. Just to highlight the configuration you're seeing here: in the simplest form, you'll see that in Availability Zone 2, I didn't deploy a carbon copy of the services. I could have, but for simplicity, think of that second Availability Zone as another subnet hosting the EKS control plane and really nothing else. It happens to also be running a copy of CoreDNS in the kube-system namespace for Kubernetes DNS reasons.
But for now, let's focus on three of those AZs. You have an Availability Zone in the parent region. So, if I'm not on the Verizon mobile network, which could be as simple as — I'm a Verizon device, but I've switched to WiFi — you still want to be able to access the service. Think of that as being the one-stop solution for some of these edge cases. No pun intended. Then Wavelength Zone 1 and Wavelength Zone 2 give you that geo-distribution. Could you extend it to five, six, seven Wavelength Zones? Absolutely. But for illustrative purposes, let's assume I deployed each of those three services to those two Wavelength Zones.
You might ask me, “Robbie, where's Postgres? Why is it only in the region?” Well, I'm going to throw another curve ball at you. What if you don't want to have every copy of every microservice in every single AZ? That's not realistic. So, to showcase the complexity that we believe is representative of an application, what if Postgres were to only exist in the region? Here's the question we tried to solve.
Jump to the next slide: You're going to see some question marks. The question mark means this is a potential path your application traffic could try to take. My mobile device connects to the frontend service. The frontend service connects to the public API. Now, the public API service wants to eventually get the catalog information for all of those HashiCups — and HashiCoffees, I guess — that I could order. It could try to guess and send you to Wavelength Zone 2, which we discussed is blocked: You don't want that to happen. In fact, you want a deterministic way to make sure that that routing path is never invoked.
You could send traffic to the product API in the region. But you'd only want to do so if something happened in the Wavelength Zone and it's no longer available, as a failover or next best alternative. How do you care for all these question marks?
Step one, and a very important step one (or step zero), is Kubernetes namespaces. I want to show you the limitations of what you could do in the absence of Consul. I think that's important to highlight. The reality being, most of the time, in fairly basic scenarios, you'd probably be OK. You notice the hesitancy that I use here.
This is a very important foundational step to leveraging Consul, and here's why: When you think about DNS and Kubernetes, namespaces are a great way to provide logical isolation. So, instead of what we were asking before, where's app.default, a frontend’s just saying, where is the public API? Kubernetes DNS will assume you're referring to the namespace in question unless you specify otherwise.
So, you're saying, “All right, Robbie, we're done. Frontend can route to public API. Public API can route to product API.” But remember, Postgres is in a different namespace. We haven't solved the namespace problem because DNS is going to be blissfully unaware — if you only provide just Postgres as essentially the service to route to — it's not going to know where it is.
You might be saying, “Wait a minute: Developers could hardcode this.” But think about how unwieldy that's going to get as the number of edges continues to grow ever larger. You're not going to want to hardcode it. You're going to want to decouple application developers building your app and the cluster operators who maintain the infrastructure.
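To make the namespace problem concrete, here is a simplified Python model of how a short service name resolves. It ignores the full DNS search path and only captures the behavior described above: a bare name lands in the caller's own namespace, and reaching another namespace requires spelling it out. The namespace names `demo-app-wlz-1` and `demo-app-region` are assumptions for illustration.

```python
def service_fqdn(service: str, namespace: str,
                 cluster_domain: str = "cluster.local") -> str:
    """Fully qualified Kubernetes DNS name for a Service."""
    return f"{service}.{namespace}.svc.{cluster_domain}"


def resolve_short_name(name: str, caller_namespace: str) -> str:
    """Resolve 'service' or 'service.namespace' as seen from a pod.

    Simplification: a bare name is assumed to live in the caller's namespace;
    a two-part name is taken as service.namespace.
    """
    parts = name.split(".")
    if len(parts) == 1:
        return service_fqdn(name, caller_namespace)
    return service_fqdn(parts[0], parts[1])


if __name__ == "__main__":
    # A pod in a Wavelength Zone namespace asking for plain "postgres"
    # looks in its own namespace, where Postgres does not exist.
    print(resolve_short_name("postgres", "demo-app-wlz-1"))
    # Hardcoding the namespace works, but couples the app to the topology.
    print(resolve_short_name("postgres.demo-app-region", "demo-app-wlz-1"))
```

The second call is the hardcoding the speaker warns about: it works, but every application has to know which namespace each dependency lives in.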
So, this is how we get to step one: Consul service mesh. I often define service mesh as essentially that infrastructure layer that helps facilitate or secure service-to-service communication.
The way Consul does that is you have that control plane consisting of Consul server — and I'm running that in the region. That's really important — running Consul server in the region. Because with the hub-and-spoke architecture, if it were scheduled to one of the Wavelength Zones, now the control plane can't communicate to all of its worker nodes.
Of course, that would become a problem because the catalog wouldn't be up to date. So, server is explicitly scheduled to the region. Then you have Consul client running as a DaemonSet, so every node can have access to Consul client, that proxy can be injected, etc.
This is a good start: We've laid out the foundation for how we're going to use Consul. And, out of the box, we're now going to have services that can register to the Consul catalog. And, using a feature called transparent proxy — if you were to do absolutely nothing — the application would behave just as described before with namespaces. Still, you have Consul running — traffic within the namespace will be routed to the namespace. The way that routing would happen, of course, would be through the proxy — that’s essentially routed through the sidecar of each pod. That’s a good start.
But now we’ve got to get to Postgres. How do we do it? Well, we've got a handy dandy solution. There are a lot of different features of Consul that we find really valuable. But one feature I want to turn your attention to is the idea of service resolvers that are implemented as a CRD in Kubernetes.
That is essentially a custom object where I say, “Wait a minute, Kubernetes. I know you're looking for Postgres, and I know you can't find it because it's not in that namespace.” But it's OK. I'll tell you. Always look in the region namespace. Or I think I named it “demo app region” in my application.
That's how you get that arrow, that guaranteed deterministic routing pattern. Because otherwise, anything could happen. To show you what that manifest looks like — probably thinking it's hundreds of lines of code. Nope. Eight. Let's jump to the next slide here.
Yep. That's it, folks. Specify the service, specify the namespace, and you can recycle this manifest for any number of Wavelength Zones or namespaces you choose to leverage. So, when I say it's easily automated and repeatable, this same manifest could be used across a number of namespaces. Because no matter what Wavelength Zone — which is itself a namespace as we've defined the pattern — you can always find Postgres because it'll always be scheduled to the region. When we say it's extensible, it's repeatable, that's why — eight lines of code that can be used over and over completely independent of the application logic your developers are building.
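The slide itself isn't reproduced in this transcript, so the following is a minimal sketch of what a redirect-style service resolver could look like, built as a Python dictionary and printed as JSON (which kubectl accepts as well as YAML). The `consul.hashicorp.com/v1alpha1` API group, the `ServiceResolver` kind, and the `redirect` block are part of Consul on Kubernetes; the service name `postgres` and the namespace `demo-app-region` are assumed spellings, and the `namespace` field assumes a Consul deployment with namespaces enabled.

```python
import json


def redirect_resolver(service: str, target_namespace: str) -> dict:
    """ServiceResolver that always sends lookups for `service` to one namespace."""
    return {
        "apiVersion": "consul.hashicorp.com/v1alpha1",
        "kind": "ServiceResolver",
        "metadata": {"name": service},
        "spec": {
            "redirect": {
                "service": service,
                # Assumed name for the namespace that holds the regional copy.
                "namespace": target_namespace,
            }
        },
    }


if __name__ == "__main__":
    # The same document can be applied unchanged to every Wavelength Zone namespace.
    print(json.dumps(redirect_resolver("postgres", "demo-app-region"), indent=2))
```

Because nothing in the manifest names the Wavelength Zone it is applied to, the output can be reused per namespace, which is the repeatability the speaker is pointing at.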
That's a good step two of three. We still haven't cared for the scenario, as we jump to the next slide, of failover. You'll say, wait a minute, Robbie. Where did the product API go in Wavelength Zone 1?
Well, imagine the scenario. These are highly ephemeral 5G environments. Anything could happen. Maybe it's a manual mis-config. Or maybe just something happened. Assume for a moment the product API is down. Don't panic. We’ve still got to get to our HashiCups. I still need my coffee.
We don't want the app to go down. We have a solution. We just need to fail over to that namespace in the region. Again, we don't want to fail over to another Wavelength Zone, we want to fail over to the region. The way that we do that is also a service resolver, but instead of the redirect pattern, we want to use failover.
You'll see we have a wildcard there saying, if our current application can't find product API — the service — send all traffic to product API in the namespace that’s designated demo app region. So, once again, a deterministic way to create failover in these edge environments, which we think is really powerful.
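As with the redirect case, the slide isn't in the transcript, so this is a hedged sketch of a failover-style service resolver expressed as a Python dictionary. The `failover` map keyed by `"*"` (all subsets) is part of the Consul `ServiceResolver` resource; the service name `product-api` and the namespace `demo-app-region` are assumed spellings, and the `namespace` field assumes Consul namespaces are enabled.

```python
import json


def failover_resolver(service: str, target_namespace: str) -> dict:
    """ServiceResolver that fails over to a copy of `service` in another namespace."""
    return {
        "apiVersion": "consul.hashicorp.com/v1alpha1",
        "kind": "ServiceResolver",
        "metadata": {"name": service},
        "spec": {
            "failover": {
                # "*" is the wildcard: it applies to every subset of the service.
                "*": {
                    "service": service,
                    "namespace": target_namespace,
                }
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(failover_resolver("product-api", "demo-app-region"), indent=2))
```

Unlike a redirect, this only takes effect when no healthy local instance of the service is available, so the local Wavelength Zone copy is still preferred in the normal case.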
To recap: Same service resolver or same object, same CRD, two different use cases. But when brought together, we believe you now have something called edge awareness. Because it's not just important to understand latency, it's an understanding of the topology of the network — and we believe Consul does this really well.
If you were to implement this in your video workflow environment, you could have the production editors in Atlanta, in Miami. And as these microservices scale and move to different environments, it's the same configuration manifest over and over. The complexity doesn't increase as the workload continues to evolve. We think that's really powerful.
We want to do a quick demo in our next section here. Think of this more as the cluster admin view. I wanted to show you this doesn't introduce any more complexity, as you typically use kubectl as you develop your application. I want to walk you through the two scenarios we created.
Got two little quick videos here walking you through how that works: Imagine for a moment I've already deployed my Consul Helm chart. You'll see the Consul client is running as a DaemonSet. You'll see it's scheduled to each node. Then you'll see here the Consul server is explicitly scheduled to the region.
Next I wanted to show you deploying the application to each of those Wavelength Zones and then Postgres to the region. I created a quick little automation. This template will be available for you in the coming weeks.
But I want to show what happens in the workflow for doing so. First, I wanted to inspect the pods for the region. You'll see that Postgres is scheduled to the region. You're seeing step zero of my demo in action. But you can scope the namespaces, which will solve most problems.
You look at Wavelength Zone 1's namespace. You'll see that frontend, product API, and public API are all there. And then, you see that Wavelength Zone 2 also has a copy of each of those three services. That's the key point. The service resolver right there — that's the missing piece. So, if you have the configs for the pods in each of the Wavelength Zones, you have the config for the service resolver, and you're good to go.
You see Postgres again. That's in the region. It's a lot of the same. I'm trying to highlight this is no different than how you would've deployed to the region today. The only difference is making sure that in each namespace, only workloads corresponding to that Wavelength Zone are scheduled to that namespace. You can do so with patching or annotations (there are a number of ways to do that). Then, with the service resolver, you're good to go for most scenarios, as long as a particular service doesn't fail.
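One of the "number of ways" to keep a namespace's workloads in its own Wavelength Zone is a node selector on the well-known `topology.kubernetes.io/zone` label. The sketch below builds a merge patch for a Deployment as a Python dictionary; it is not necessarily what the demo used. It assumes the worker nodes carry that zone label, and the zone value shown is only an example.

```python
import json

ZONE_LABEL = "topology.kubernetes.io/zone"  # well-known Kubernetes node label


def pin_to_zone_patch(zone: str) -> dict:
    """Merge patch that restricts a Deployment's pods to nodes in one zone."""
    return {
        "spec": {
            "template": {
                "spec": {
                    "nodeSelector": {ZONE_LABEL: zone},
                }
            }
        }
    }


if __name__ == "__main__":
    # Example zone value; substitute the zone your nodes actually report.
    print(json.dumps(pin_to_zone_patch("us-east-1-wl1-bos-wlz-1"), indent=2))
```

The printed JSON could be passed to `kubectl patch deployment` for each Deployment in the matching namespace, with one zone value per namespace.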
Let's jump to the next scenario. In this scenario, we're going to highlight what happens if product API fails. Public API still needs to find a copy of product API, so, how does it do that? I wanted to showcase that this is the same environment; the only change from before is that a full copy of the app has been put in the region.
And this is the manifest file. We say, first and foremost, in Wavelength Zone 1, let's cause chaos: let's remove the product API. All we need to do in the public API manifest is disable transparent proxy and then configure explicit upstreams, where we say, if you're looking for the product API from public API, go to this port on localhost, which will seamlessly route you through the proxy to that service. You’ll see here localhost:9090. You're good to go. Then we would deploy this new manifest.
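The two changes described for the public API manifest map onto pod annotations in Consul on Kubernetes. The sketch below builds them as a Python dictionary; the annotation keys are the standard `consul.hashicorp.com` ones, while the upstream name `product-api` is an assumed spelling and the port is the one read out in the talk ("localhost 90-90"), taken here to mean 9090.

```python
def explicit_upstream_annotations(upstream: str, local_port: int) -> dict:
    """Pod annotations that turn off transparent proxy and bind one upstream locally."""
    return {
        "consul.hashicorp.com/connect-inject": "true",
        "consul.hashicorp.com/transparent-proxy": "false",
        # Format is "<service>:<local port>"; the sidecar listens on that port.
        "consul.hashicorp.com/connect-service-upstreams": f"{upstream}:{local_port}",
    }


def upstream_url(local_port: int, path: str = "") -> str:
    """Address the application uses to reach the upstream through its sidecar."""
    return f"http://localhost:{local_port}{path}"


if __name__ == "__main__":
    for key, value in explicit_upstream_annotations("product-api", 9090).items():
        print(f"{key}: {value}")
    print(upstream_url(9090))
```

With the upstream bound to a local port, the application only ever dials localhost, and the decision about which copy of product API answers (local or regional) stays in the mesh configuration.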
Very simple — it's one manifest that you could recycle over and over because you would be failing over to that same service in the region. We believe that this is a powerful pattern, as you see here. This is updating the manifest, and we're good to go.
I want to transition, Jay Cee, to wrapping things up a little bit here and talk about who cares. You're walking through all available pods, as you now see in Wavelength Zone 1. You'll see product API doesn't exist, but yet we're good to go. As this quickly completes, we talked about all the industries that this could leverage. This isn't a solution just for video production. IoT, retail, media, entertainment, healthcare —
Jay Cee Straley:
The list goes on.
As you think about the power of HashiCorp Consul to provide this edge-aware routing, what are you most excited about? What use cases and opportunities strike you?
Jay Cee Straley:
I'm excited about removing roadblocks for developers. You know, when you try to deal with all these different Wavelength Zones, admittedly, it's hard to know where to place and how to load them and do all this work. I'm excited by what developers are going to be able to do using service mesh and Consul to deploy apps across our network — to make it so it's as easy as developing a cloud app now.
I totally agree with that — and to build on that, a few areas I'm excited about: First and foremost, I'm excited for the rest of HashiConf. For all of you here who want to learn more about how Consul's continuing to evolve, right after this session, Irena from the Consul team is going to talk about some of the latest announcements for Consul. I know I'm going. I hope to see you there and continue the great discussion.
But I do want to highlight, as we jump to the next slide, we want to see you build with us. If you have any questions, please reach out. We want you to see us as your trusted partner for building highly distributed edge applications. We want to hear your feedback. We'd love for you to try this demo and let us know what you think.
There's so much more work to do in this domain; this is the very beginning. We've scratched the surface of what you can build at the edge with 5G. You'll notice that we've talked a lot today about within-the-cluster communication, or east-west traffic. We believe that Consul is really powerful in addressing that problem.
But we intentionally omitted a whole other exciting problem space around the north-south traffic domain. What if I told you that a mobile device could be in Miami, and the closest Wavelength Zone could be in Boston? To borrow from the airline industry when they say the closest exit might be behind you: Same idea for edge applications — it has to do entirely with the topology. What packet gateway are you anchored towards? Things like that that only carrier-exposed APIs can solve.
Turns out we did a really nifty integration with Terraform and how you can automate that entire workload. If you want to learn more, you can check out a recent HashiTalks presentation we did with the fabulous Rosemary Wang of HashiCorp.
If you want to learn more about that, or whether you're thinking about east-west traffic or north-south traffic, 5G and edge computing are a really exciting problem space, and we'd love to see you build with us. We're super excited about the promise of HashiCorp (Terraform, Consul, and beyond) to make development at the edge easier.
We hope you enjoy the rest of HashiConf. Thanks for joining us. We know you had a lot of choices this afternoon, and we hope to hear from you and get your feedback.
Thanks so much.