HashiCast Episode 8 - Daniel Bryant, Datawire
Sep 25, 2018
This episode of HashiCast features Daniel Bryant, Product Architect at Datawire. Join us as we talk to Daniel about his career and his experience in the technology field over the years. We also dive into the world of Service Meshes and how they help with your application architecture. We also talk about API Gateways, Kubernetes and most importantly about defeasible logics.
- Daniel Bryant, Product Architect, Datawire
- Datawire's Ambassador
- Daniel's blog post on Distributed Tracing with Java “MicroDonuts”, Kubernetes and the Ambassador API Gateway
- Daniel's Medium profile
- Nic Jackson, Developer Advocate, HashiCorp
- Anubhav Mishra, Technical Advisor to the CTO, HashiCorp
Welcome to HashiCast, the self-proclaimed number one podcast about the world of DevOps practices, tools, and practitioners.
Nic Jackson: Welcome to another episode of HashiCast. It’s a real pleasure to introduce today’s guest, Daniel Bryant from Datawire. Daniel’s had an incredibly interesting career. He’s currently a product architect at Datawire, doing some really great things with Ambassador, their API gateway, which is built on Envoy. He’s written Continuous Delivery in Java for O’Reilly. He’s one of the InfoQ editors. He’s a renowned conference speaker, and one of my all-time favorite people. Welcome, Daniel. I could have filled the entire show just introducing your achievements, but you can probably do a better job than me. Why don’t you introduce yourself?
Daniel Bryant: Thanks, Nic. My mom would be proud; if she ever hears this podcast, she’ll love this intro. Thanks. Obviously, you and I have worked together in the past, and these days we see each other on the community trail and the conference trail as well.
A little bit of background about myself: I’ve been a software developer in some form or another for about 15 years now. My first gig was with the UK government, and then I went to work with a bunch of startups. If I’m honest, I liked the fast feedback loop of working with startups. As fun as the UK government job was, I was there 18 months and didn’t actually deploy anything to production. That kind of sucked.
Then I went to a startup and literally pushed code, and the next day a user was complaining to me. Literally complaining. I loved it. I loved the fast feedback. I loved learning everything I could. It was like, “Hey, we don’t have someone to run the database. Do you want to learn how to do database stuff?” And I was like, “Sure thing.” So I did a bunch of that for a while, and did a couple of CTO gigs at various startups, including SpectoLabs. That’s where Nic and I bumped into each other again on the circuit, when I was doing some talks.
And then I went back to consulting, with a company called OpenCredo. Shout out to the OpenCredo folks in London. They’re HashiCorp partners, actually, so you probably bump into them a lot, and they’re really good people. I still keep in contact with Nicki there; I was just chatting to her last week, actually.
After I left OpenCredo, I fancied a new challenge. I wanted to get back into product as opposed to services. I loved doing the consulting, and I loved solving problems, but I think my career will probably always bounce between services and products.
At the moment, I’m working with the awesome folks at Datawire. They’re based in Boston, although I’m based in the UK. We are trying to create some cool, interesting, and ultimately useful tools for Kubernetes, because we’re thinking, “There’s awesome tech now with Kubernetes, and cloud, and so on, but as an industry we haven’t fully caught up with our practices on how to deploy and deliver code and value.”
I’m loving the challenge. I’ll be honest: it’s a big challenge. I’m sure everyone working at HashiCorp, and everyone listening, knows these challenges. We’ve got amazing stuff out there, and I’m really enjoying the next stage of my career, trying to help people who are developing code and building infrastructure tools get the best out of what they’re trying to do.
Nic: Now, Daniel, you’re a little bit modest, because—I didn’t realize this until I saw a tweet—but you’re actually a doctor. You are Dr. Daniel Bryant.
Daniel: Yeah. I only use that when I’m flying. Someone told me to put it in when you’re buying a flight, because you might get an upgrade. The only time I actually use the doctor thing is when I’m buying flight tickets. But yeah, Nic, I did a PhD in 2004, 2005, in defeasible logics. I’m sure this is very much a niche audience, and to be honest …
Anubhav Mishra: What is that?
Daniel: Basically, classical logic is “A implies B,” that kind of stuff. Classical logic is monotonic: once you’ve drawn a conclusion or an inference, you can’t unlearn it, and you can’t easily contradict it. Whereas we all know that in the real world you learn new stuff that invalidates old stuff, and you also make tentative guesses. The classic joke with the defeasible logic crowd (and trust me, this is as good as the jokes get) is, “All birds fly, but what happens when you get a penguin?” A penguin is a bird; penguins don’t fly. That is a really trivial and contrived example, but there’s a whole bunch of research going on into algorithms for modeling this kind of reasoning, in particular on the legal and the medical side.
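To make the bird and penguin example concrete, here is a toy sketch in Python (chosen only for illustration; the episode itself contains no code). The rule names, the `conclude` function, and the simple “superiority” relation between rules are assumptions for this example, and it is a simplification rather than any particular published defeasible logic: a conclusion is drawn only if every applicable rule for the opposite conclusion is beaten.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    name: str
    body: frozenset  # literals that must all hold for the rule to apply
    head: str        # conclusion, e.g. "flies"; a leading "~" means "not"


def negate(literal: str) -> str:
    return literal[1:] if literal.startswith("~") else "~" + literal


def conclude(facts, rules, superior):
    """Toy defeasible reasoner: a rule fires when its body holds, unless an
    applicable rule with the opposite conclusion is not beaten by it.
    `superior` holds (winner, loser) pairs of rule names."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        applicable = [r for r in rules if r.body <= known]
        for rule in applicable:
            if rule.head in known or negate(rule.head) in known:
                continue  # already settled one way or the other
            attackers = [a for a in applicable if a.head == negate(rule.head)]
            if all((rule.name, a.name) in superior for a in attackers):
                known.add(rule.head)
                changed = True
    return known


RULES = [
    Rule("r0", frozenset({"penguin"}), "bird"),    # penguins are birds
    Rule("r1", frozenset({"bird"}), "flies"),      # birds usually fly
    Rule("r2", frozenset({"penguin"}), "~flies"),  # penguins don't fly
]
SUPERIOR = {("r2", "r1")}  # the more specific penguin rule beats the bird rule

print(sorted(conclude({"bird"}, RULES, SUPERIOR)))     # ['bird', 'flies']
print(sorted(conclude({"penguin"}, RULES, SUPERIOR)))  # ['bird', 'penguin', '~flies']
```

Adding the fact `penguin` to `{"bird"}` removes the conclusion `flies`, which is exactly the non-monotonic behavior that classical logic lacks.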
I did a lot of work with oncology doctors and cancer patients, which was genuinely fascinating, humbling, and scary at times. I also did a whole bunch of work with lawyers, trying to capture in computers the way they reason about things. The classic expert systems are very much if-then and rule-based, whereas the real world is messier, as we’re seeing now with the rise of ML and AI and deep learning. Deep learning is one way of looking at these things. Back 10 years ago, we were looking more at what was called argumentation theory: defeasible logics and nonmonotonic logics.
To be honest, it was super interesting, but it was very niche. One of the big things that caused me to move on was that it was very theoretical. The people I was working with were super into proving one thing and disproving another, and I was more into delivering value. It’s a cool strand in my career, but I wanted to actually do real stuff that helps people, rather than sit around a dinner table trying to solve the world’s problems just by talking. As much as I love my academic peers (I’m not trying to knock them), it just wasn’t for me. I thoroughly enjoyed it and learned a bunch of stuff, but I wanted to actually get out there. When I started doing consultancy part time, that totally led me down the coding route.
Nic: That’s fascinating, and unfortunate, because I’m not going to be able to ask you about this pain I’ve got in my leg right now.
Daniel: I get that joke, Nic, yeah, because it is rash.
Nic: We should probably get back to the tech. I’m interested in your role at Datawire, because you were saying you’re moving back into the product architecture side of things. But you’re an incredible software architect and systems architect. How do those skills translate into product architecture and into creating great products?
Daniel: Thanks for the compliment, Nic. Obviously, you and I cut our teeth on some of the microservice stuff together back at notonthehighstreet.com, didn’t we? I think one of the key things, and it’s another thread in my career that I’ve gradually learned, is empathy. It’s a bit of a catch-all, but it means being able to put yourself in someone else’s place. In life, I think this is just super valuable.
But in particular with the kind of stuff I’m doing now, working a lot with Richard and Rafi, who run Datawire, I think some of the value I add is being able to say, “Hang on. That’s not how I’ve done it in the field.” That’s anecdotal, of course, so I’ve got to be careful with it. But I’m super lucky with the InfoQ stuff and the conference stuff I do. I get to talk to people from Netflix, people like you guys. I get to hang out with super-interesting people, and I learn a bunch of stuff. I combine that with the things I’ve done right, the things I’ve done wrong, the teams I’ve led, and the mistakes and good things we’ve made along the way, and match it all up. That gives me a little bit of an edge in thinking about, “Will the product actually meet the needs?”
The whole Datawire team is amazing, hats off to them. But Richard, for example, is super good at business. Richard looks at the business problems. He’s super techy, but he focuses on the business. Rafi and the technical team, the engineering team, look at engineering top-quality products. I’m in the middle going, “Let’s think about the user.” All of us in Datawire bring different things, and I’m only mentioning three names there.
But I think the value I add is being able to say, “Hey, sometimes in your little bubble, as good as it is, you think, ‘Oh, this will be amazing.’ Then you roll it out to users, and they go, ‘What were you thinking?’” Because in the real world, when the rubber meets the road, as they say, it’s a different experience.
That’s probably the thing I’ve noticed the most about having the experience of actually doing these things, coding these things—and making mistakes, to be honest, learning along the way.
Nic: That somewhat segues into my question around service meshes, because that’s obviously a big part of what you’re doing with Ambassador, and I’m going to say I was incredibly skeptical when I first heard of service meshes. And I think I had this conversation with you before.
Daniel: Yeah, you did.
Nic: Because I looked at it from a microservices practitioner’s perspective, and I was like, “Well, I don’t personally believe that implementing the classic, historic circuit breaker, load balancer, retry, and back-off patterns was really a problem when you’re building out systems.” And actually, I quite liked the control it gave me. That was one of my main reasons for asking, “Well, hey, why do you need a service mesh?” From that empathy perspective, one of the things I’ve come to terms with is that everybody is different and has different needs, and service meshes have a very good place in the market for lifting a lot of that load off developers. I now believe they’re essential for most systems, and I’ll completely retract any bad words I’ve said about them in the past. I’m happy to admit I was wrong.
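For readers who haven’t hand-rolled these patterns, here is a minimal sketch of retry with exponential back-off in Python (language chosen only for illustration). The function names, the defaults, and the decision to retry only on `ConnectionError` are assumptions; the “full jitter” variant shown is one common choice, not the only one.

```python
import random
import time
from typing import Callable, List, Optional, TypeVar

T = TypeVar("T")


def backoff_delays(retries: int, base: float = 0.1, cap: float = 2.0,
                   rng: Optional[random.Random] = None) -> List[float]:
    """Exponential back-off with "full jitter": retry n waits a random time
    between 0 and min(cap, base * 2**n) seconds."""
    rng = rng or random.Random()
    return [rng.uniform(0, min(cap, base * 2 ** n)) for n in range(retries)]


def call_with_retries(fn: Callable[[], T], retries: int = 3,
                      sleep: Callable[[float], None] = time.sleep,
                      rng: Optional[random.Random] = None) -> T:
    """Call fn, retrying on ConnectionError with jittered exponential back-off.
    The exception from the final attempt propagates to the caller."""
    for delay in backoff_delays(retries, rng=rng):
        try:
            return fn()
        except ConnectionError:
            sleep(delay)
    return fn()
```

The random jitter spreads retries out so that many clients failing at the same moment don’t all retry in lockstep; `sleep` is injectable so the logic can be tested without real waiting.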
But what are your views on the service mesh? Because you’ve spoken about it as well. I watched a video you did for O’Reilly, I think, a webinar on service meshes. What do you think the benefits are? Why do you think they maybe fall short? Are they the silver bullet that everybody is looking for?
Daniel: A whole bunch of interesting questions there, Nic. Just to unpack some of that, I definitely think different people have different views on what belongs in the platform, what belongs in the application, and what’s useful. My experience working in startups is that you’re often happy to put everything in the code, in the application, and that works for you because you’re going really fast. Whereas when you’re an enterprise, you have separation of concerns, both from a skills point of view and sometimes from a regulatory point of view. You want to have that “this is platform, this is cons, this is application” type vibe. I find it useful. OpenCredo is really good in this respect.
But going into some big companies, I could never understand why people sometimes used an ESB, but then in some big organizations, financial organizations, it was essential. With the sheer number of people working on it, you had to integrate at some point, and you had to draw the line and say, “This ESB is going to be our gold standard for how we transport messages and how we route messages,” all these kinds of things. It really gave me some empathy for why all these different tools exist.
And we saw it from both angles when we were at Not on the High Street. It was primarily a Ruby shop at the time, doing a lot of Ruby on Rails, and we were bringing some Java into the stack as well. I remember distinctly that we loved the ideas in Hystrix. We plugged Netflix OSS Hystrix into the Java stack (shout out to Will, awesome developer at OpenCredo and Not on the High Street), and he hooked it all up: “Oh, that’s amazing, fallbacks, circuit breaking, love it.”
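Circuit breaking with fallbacks is the behavior the team was after. Below is a minimal Python sketch of the pattern; it is not Hystrix’s API, and it simplifies Hystrix’s rolling error-percentage window to a count of consecutive failures. The class name, thresholds, and injectable `clock` (used so the behavior can be tested without waiting) are assumptions.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitBreaker:
    """Opens after `max_failures` consecutive failures; while open, calls go
    straight to the fallback. Once `reset_after` seconds have passed, the next
    call is let through as a trial (half-open): success closes the circuit,
    failure reopens it. Not thread-safe; this is a sketch."""

    def __init__(self, max_failures: int = 3, reset_after: float = 5.0,
                 clock: Callable[[], float] = time.monotonic):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None  # None means the circuit is closed

    def call(self, fn: Callable[[], T], fallback: Callable[[], T]) -> T:
        if self.opened_at is not None and self.clock() - self.opened_at < self.reset_after:
            return fallback()  # open: fail fast without touching the dependency
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.opened_at is not None or self.failures >= self.max_failures:
                self.opened_at = self.clock()  # trip, or re-trip after a failed trial
            return fallback()
        self.failures = 0
        self.opened_at = None  # success closes the circuit
        return result
```

Failing fast while the circuit is open protects a struggling dependency from a pile-up of requests, and the fallback gives the caller a degraded but usable answer.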
Then we wanted to do the same on the Ruby side, and we found a Hystrix gem, but it was proper janky. By their own admission, they’d literally written it in their bedroom over a weekend, chucked the code up, and said, “By the way, caveat emptor.” We took it on, and it was really hard to get it up to the standard of the Java version of Hystrix. We also had the same issue with service discovery. That was when I first bumped into Datawire, actually, with Baker Street, but we also looked at Airbnb SmartStack, GeneXus, and a whole bunch of other tools.
That’s why I think service meshes feel like the missing piece of the Kubernetes and orchestrator puzzle. There’s been a bunch of work on Mesos, and a bunch of work on Kubernetes and ECS: fantastic tools for deploying onto ephemeral hardware, stuff that dies all the time. Kubernetes will reschedule your pods, and Nomad is the same kind of deal. All these tools are fantastic at that, but they left the communication bit as an exercise for the reader. You walk up to Kubernetes and you’re like, “Oh, this deployment stuff is amazing.” Then you start trying to do canary rollouts, and that’s a bit hard sometimes. You start trying to do service discovery, and DNS is good enough but doesn’t always give you what you want. You want to start putting security policies in, and you’re out of luck in some ways.
I looked at Tigera, who were doing a bunch of stuff, and Weaveworks, obviously doing a bunch of stuff too: Calico and things like that, and Weave Net. That was interesting, but again it was pretty low-level. I think service meshes have come in above that Layer 3 networking, complementing it from Layer 4 upwards (Layers 4, 5, 6, and 7) and really centralizing a bunch of cross-cutting concerns at the platform level. It’s language-agnostic, if that makes sense. I’ve used it in a bunch of gigs, and once you get your head around it, it’s been very useful.
Nic: I mean, as I say, I’m a convert. But does it mean that you lose control as a service developer? I’m not going to say application developer, but service developer. One of the things I think is incredibly important is designing for failure, especially in a microservice environment, because even when the system as a whole is robust, independent components fail all of the time. If you don’t get that plumbing correct, you can end up with a complete system failure. But if I’m putting all of this logic into the service mesh and the service mesh is handling my retries, can I control, for example, service-A-to-service-B having a different retry and back-off than service-D-to-service-B? Is that possible, or do you have to have a one-size-fits-all solution?
Daniel: It comes back to something you mentioned a minute ago, Nic, which I didn’t really address: the silver-bullet thing. We constantly search for silver bullets, and even Fred Brooks, in The Mythical Man-Month, said, “There are no silver bullets,” which I’m sure all of us listening can relate to. And I totally get what you’re saying. I’ve struggled a bit with this, to be honest. There are some things that are application concerns, and the service mesh can almost take them away from you. It makes it easier to get on with some of these things. Compared with having nothing, a service mesh often gives you some service discovery and some fault tolerance, stuff that I’d rather have than not have.
But to your point exactly, particularly as an engineer, and particularly if I’m in a startup trying to go as fast as possible, I want everything under my control. And the service mesh can take some of that away. The person I look to in this space is Christian Posta. I’m sure I’ll mention Christian’s name many times in this call, because Christian is just awesome. I was super lucky to meet him last year. We’ve had a bunch of interesting chats, and just recently he put out an awesome blog post about application safety and correctness, saying a service mesh can’t do everything. Christian just nailed it. He’s one of those people, a bit like Matt Klein or even Mitchell and Armon, who can take something we’re all thinking and put it in a really consumable package.
And when I read Christian’s stuff, I liked the way he split it up: I think it was application integration and application networking. He said, “The integration stuff is an app-level concern, for you and me as developers, and the application networking is more of a platform thing.” Now, the service mesh is awesome at application networking at the platform level, but for integration you want to use other tools. I think Christian is a big fan of an older Java tool, Apache Camel, and I’ve used that as well. With that one you can package things up and do some funky application-level stuff.
I totally think it’s a really good shout to think about these things. One thing I got from a chat with Matt Klein at QCon New York is that at Lyft, the team were using Envoy to provide sensible defaults for things like concurrency, throughput, and safety. I’ve bumped into some of that with Sentinel, for example, in some of the policy work I know you’ve been doing at HashiCorp. I think having these low-level sensible defaults, but then having the freedom and responsibility to override them one level up, is where it’s at. It’s not for every development team; you’ve got to be a bit aware of and a bit responsible for these things. But I definitely think it’s worth calling out, and then making the decision consciously: “Are we going to put some app-level integrations in? Are we going to rely 100% on the platform?”
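Nic’s service-A-to-service-B versus service-D-to-service-B question, and Daniel’s answer about sensible defaults that teams can override one level up, can be sketched as a policy lookup. This is a hypothetical Python illustration, not how Envoy, Ambassador, or any mesh actually represents retry configuration; the field names, route keys, and values are made up.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class RetryPolicy:
    max_retries: int = 2
    per_try_timeout_s: float = 1.0


# Platform-wide sensible default, owned by the platform team (values are made up).
PLATFORM_DEFAULT = RetryPolicy()

# Hypothetical per-route overrides owned by service teams, keyed by (caller, callee).
OVERRIDES = {
    ("service-a", "service-b"): {"max_retries": 5},
    ("service-d", "service-b"): {"max_retries": 0, "per_try_timeout_s": 0.25},
}


def effective_policy(caller: str, callee: str) -> RetryPolicy:
    """Start from the platform default and apply only the fields a team overrides."""
    return replace(PLATFORM_DEFAULT, **OVERRIDES.get((caller, callee), {}))


print(effective_policy("service-a", "service-b"))
print(effective_policy("service-c", "service-b"))
```

Routes with no override silently inherit the platform default, while the teams that care, here service-A and service-D, opt out consciously and only for the fields they name.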
Nic: Maybe, when I look at things, I’m overrating the need for lots of configuration. Sometimes things just work fine, and you can overthink them. But certainly the key thing is that if you can just add an annotation and automatically not have to worry about things like load balancing and service discovery, that is a huge step for a team. They can work fast, stop concentrating on those things, and instead concentrate on features, which deliver value.
Daniel: Yeah, yeah, I hear you, Nic. One thing I should chuck in there is that it’s always worth thinking about mechanical sympathy. It’s super powerful to be able to chuck in an annotation, but as a developer, I sometimes lose touch with the platform. Something I constantly remind myself is that every time a tool gives me a good abstraction or solves some problems for me, I still want to know about those problems so I can code appropriately.
If there’s one thing my mentors have taught me over the years, it’s always look one level deeper. If you’re operating as an architect, you’ve got to be able to code. If you’re coding, you’ve got to understand the infrastructure to some degree. Maybe not write it or create it, but understand it. And I do warn that service meshes might sometimes hide some of these responsibilities from developers. I’m not seeing it too much yet, but it’s a potential problem, I think.
Nic: One of the things I come across is observability. In some areas there’s maybe a mistaken belief that the service mesh can take away all of your observability woes and needs, whereas from experience, and from the project we worked on, having observability deep down in the code is really what helps you do your diagnostics when you’re load testing, performance testing, or dealing with an outage. Certainly the level you get with the mesh, seeing the intercommunication, the number of calls, the time of each call, and stuff like that, is incredibly valuable. But how does the service mesh take that further? How do you build up the big picture that includes the service-to-service communication, but also very granular service-level metrics, such as how long it took to execute a routine, a function, a loop doing some sort of calculation, or just a database call? How do you bring all of that together and present it in a usable way?
Daniel: Great question, Nic. It’s definitely a super-hard question, and to be honest, a super-hard answer. There’s a bunch of people doing very interesting work in this space, and (shameless plug) I did write a bunch about this stuff in my book. I’m not just plugging the book, though: writing it really made me understand how much of a challenge this is. That chapter just grew and grew and grew as I was writing about all this stuff.
Again, if you haven’t got anything, adding a service mesh gives you a default view: top-line metrics. Are you getting 500s? What’s the latency? What’s the throughput of your service? That’s good, but you’re right, you also need to instrument your applications. In the Java world, we have codahale.metrics, Spring has got some, and I think Go has equivalents. I often added semantically relevant data points, sometimes even business data points, to be honest. But even things like the ones you mentioned are super easy in the Java world with codahale.metrics. You put an annotation on your method, and it literally times the entry to and exit from the method. You can then push that to Prometheus or something else.
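A rough Python analogue of the timing annotation Daniel describes is a decorator that records the time from method entry to exit. This sketch is not the codahale.metrics API; the `timed` decorator, the in-memory `TIMINGS` registry, and the metric name are hypothetical, and exporting the numbers to Prometheus is left out.

```python
import functools
import time
from collections import defaultdict

# Hypothetical in-process registry: metric name -> list of durations in seconds.
TIMINGS = defaultdict(list)


def timed(name=None):
    """Record how long each call takes, from entry to exit, even if it raises."""
    def decorate(fn):
        metric = name or f"{fn.__module__}.{fn.__qualname__}"

        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return fn(*args, **kwargs)
            finally:
                TIMINGS[metric].append(time.perf_counter() - start)
        return wrapper
    return decorate


@timed("orders.lookup")
def lookup_order(order_id):
    time.sleep(0.01)  # stand-in for a database call
    return {"id": order_id}
```

Calling `lookup_order(7)` returns `{"id": 7}` and appends one duration to `TIMINGS["orders.lookup"]`; recording in `finally` means failed calls are timed too, which matters when you’re diagnosing an outage.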
Two people I’ll shout out here, whom I’ve learned a lot from, are Cindy Sridharan (@copyconstruct on Twitter and Medium), who’s done amazing work, and Charity Majors (@mipsytipsy). Those two are doing some amazing thought leadership, Charity in particular with Honeycomb, a super-interesting product she’s building. I was literally chatting with Richard at Datawire today about this. We’re seeing two modes of operation. There is the observability piece, the monitoring piece: understanding on a global level what’s going on in your systems, understanding service-to-service comms. Then there’s understanding what’s going on from a business perspective. Is the actual user getting value? Are they impeded in any way?
And then what Charity in particular talks a lot about is debugging. She calls it high-cardinality debugging: the ability to dial into a particular user. Maybe a high-value user, a HashiCorp premier customer, for example. You might say, “Oh, 1% of my traffic is getting bad results. Who cares?” But if that 1% is exclusively your top-priority customers, that’s a really big problem.
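The 1% point can be illustrated in a few lines of Python: the same requests look healthy in aggregate but terrible once they’re sliced by customer. The request fields `tier` and `status` and the traffic numbers are invented for this sketch; tools aimed at high-cardinality debugging let you slice by much finer fields, such as an individual user ID.

```python
from collections import Counter


def error_rates(requests, key=None):
    """Fraction of 5xx responses, overall (key=None) or per value of `key`."""
    totals, errors = Counter(), Counter()
    for r in requests:
        group = r[key] if key else "all"
        totals[group] += 1
        errors[group] += r["status"] >= 500  # True counts as 1
    return {group: errors[group] / totals[group] for group in totals}


# Made-up traffic: 990 healthy standard requests, 10 failing premier requests.
REQUESTS = ([{"tier": "standard", "status": 200}] * 990
            + [{"tier": "premier", "status": 503}] * 10)

print(error_rates(REQUESTS))          # {'all': 0.01}
print(error_rates(REQUESTS, "tier"))  # {'standard': 0.0, 'premier': 1.0}
```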
Also, the LightStep team, Ben Sigelman, doing amazing work.
It’s an area of innovation. People are chucking in some machine learning, and I think that’s genuinely good. You often see a bit of blockchain and a bit of machine learning on landing pages, but here I think it’s actually quite an interesting space, because it’s about insight. As a developer, when something has gone wrong, I would really value someone or something pointing me in the direction of where it’s gone wrong. From what I hear from my friends at Google and so forth, and from conference presentations I’ve seen, Google’s SRE teams have really worked on this kind of thing, where the heuristics and maxims of the systems guide you to where the problem is.
In the real world, the world outside of Google and Netflix and stuff, we struggle with this. People like Charity and Cindy are really helping us think about this, the observability, understandability, and debuggability. They’re really three key things, I think, as an engineer.
Mishra: I remember looking into all of this when we were doing microservices at my previous job at Hootsuite. Our dream was to have tracing all the way from the browser, so that from the browser you could actually see the traces for each API call, across multiple microservices doing different things. And we were a polyglot organization, so we were running Go, Scala, and Python microservices. It was like, “Okay, we need to instrument each of these microservices, and then we need to pass these headers all the way through, from the browser down the stack.” It was a big project. When we proposed it, we wrote down some pros and cons: “If we were to invest in this, how long will it take? Is it even useful?”
And I think we tried it for a subset of our microservice calls, and even then we saw so much benefit come out of this kind of system. But yeah, the organization has to be on board with this, has to understand the pros and cons, and has to actually invest in it to make it a reality. It’s not an easy buy-in to get from the engineering teams, like, “This is what we’re going to contribute versus ...”
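Passing trace headers all the way down the stack means every service copies the trace ID from its incoming request onto its outgoing calls. A minimal Python sketch, assuming Zipkin-style B3 headers (`x-b3-traceid`, `x-b3-spanid`, `x-b3-parentspanid`); the `child_headers` function is hypothetical, and in practice a tracing library does this. Even with a mesh sidecar such as Envoy, the application still has to forward these headers itself.

```python
import secrets


def child_headers(incoming: dict) -> dict:
    """Headers for an outgoing call: keep the trace ID (or start a new trace at
    the edge), give this hop a fresh span ID, and record the caller's span as
    the parent."""
    incoming = {k.lower(): v for k, v in incoming.items()}
    out = {
        "x-b3-traceid": incoming.get("x-b3-traceid") or secrets.token_hex(16),
        "x-b3-spanid": secrets.token_hex(8),
    }
    if "x-b3-spanid" in incoming:
        out["x-b3-parentspanid"] = incoming["x-b3-spanid"]
    return out


edge = child_headers({})    # browser request hits the edge: a new trace starts
hop = child_headers(edge)   # the next service forwards the same trace ID
print(hop["x-b3-traceid"] == edge["x-b3-traceid"])  # True
```

Because every hop shares one trace ID and points at its parent span, a tracing backend can stitch the calls from the browser through the Go, Scala, and Python services into a single tree.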
Daniel: It’s something I struggle with. I was recently listening to Beyond The Phoenix Project, by Gene Kim and John Willis, an awesome audiobook that builds on The Phoenix Project. They’re a couple of people who crystallize thoughts I’ve been having for a long time. Say I was working in a chair manufacturing plant. You could see the trees, the wood, coming in one end, and you could see the chairs popping out the other end. You could see the broken chairs, and you could see it taking a long time to put the back on a chair. But in software, you can’t see any of that. So when you try to argue that observability will be really valuable to someone who is perhaps more of a business person than an IT person, there’s no obvious analogy we can use. We can’t point to the trees. There are no trees in software.
This is something I really struggled with initially as a consultant. Often, as tech people, we don’t look at the ROI or the risk. As a consultant and as a CTO, it’s something I got a bit burned by, and I learned my lessons. When I was responsible for the profit and loss, I had exactly that conversation: “Do I really need this debugging stuff? Can I afford it?” But we need to think holistically about this stuff and come up with better analogies to help the less IT-savvy people, because we can’t all be IT experts. We need to help them make these decisions, which is really hard.
Mishra: I’m glad you work at Datawire now. I feel like you’re much more directly aligned with solving some of these problems, and I’m pretty happy that Datawire is building products that I’ve actually used. I remember a year ago I used Ambassador when we were evaluating API gateways at my previous company. We had this really old API gateway written in Java.
I’m not going to name the product we were using, but at the same time we were transitioning from Mesos to Kubernetes, we were looking into Ingress controllers, and Envoy was new, so I think we were trying out Envoy. And then we found this open-source project out of nowhere: Ambassador, an API gateway on top of Envoy. I was like, “Oh, this is exactly what we’re looking for for our future API gateway or platform.” Could you tell us a little bit more about what Ambassador is and what problems it solves?
Daniel: Yeah, sure thing. That story is one I bump into nearly every day, Mishra, to be honest. Fundamentally, Ambassador builds on Envoy. It’s written in Python, with Envoy at the core, and we’ve sprinkled some magic sauce over the top to make it easier to use, because Envoy is super powerful but also quite challenging to configure sometimes. There were lots of config files going in. Now that Envoy’s APIs are a bit more locked in, it’s a bit more straightforward. But one of the core things people find when they go to Kubernetes, whether they’re doing a migration or some greenfield work, is that one of the first things you want to do is get stuff into your cluster: get ingress traffic into your cluster.
And it’s actually not as straightforward as you might think. I don’t know if you’ve played around with the Ingress options in Kubernetes.
Mishra: I have.
Daniel: And they were a great first start, but some of the discussions around them have stalled, and now Ingress is a little bit wanting compared with, say, a LoadBalancer Service. We saw a bunch of options. The default people often go to is NGINX, because everyone knows and loves NGINX, as do I. I’ve used it many times; we used it at Not on the High Street with Nic. But NGINX isn’t very cloud-native, at least not at the moment. Depending on when this podcast is published, all of the things I’m about to mention will probably have changed, and NGINX will be super Kubernetes-native. But at the moment, it’s not. And as much as I love NGINX, you’ve got to learn the syntax of its config files. If you’re a developer, you’re already having to learn Kubernetes, and perhaps YAML. Learning something else is just more and more stuff.
We deliberately said (I say “we,” but it was before I actually joined Datawire), “How can we make Envoy easy for people who are comfortable with Kubernetes?” Ultimately, that led to using annotations. You’ve got your standard service, and you want to expose it via a route, an endpoint, whatever. You literally write an annotation in Kubernetes, and you can do the standard stuff you’d see on any gateway, really: map a route to a service, wildcards, all those kinds of good things. It’s fundamentally there to make people’s lives easier.
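Mapping a route to a service comes down to matching the request path against configured prefixes. Ambassador itself is configured with YAML in a Kubernetes annotation, not Python; this sketch only illustrates prefix matching, and the longest-prefix-wins rule and the `MAPPINGS` table are assumptions, not a description of Ambassador’s actual precedence rules.

```python
from typing import Dict, Optional


def route(path: str, mappings: Dict[str, str]) -> Optional[str]:
    """Return the service for the longest prefix that matches `path`, or None."""
    matches = [prefix for prefix in mappings if path.startswith(prefix)]
    return mappings[max(matches, key=len)] if matches else None


# Hypothetical prefix -> service table, similar in spirit to gateway route mappings.
MAPPINGS = {"/": "frontend", "/api/": "api", "/api/orders/": "orders"}

print(route("/api/orders/42", MAPPINGS))  # orders
print(route("/api/users", MAPPINGS))      # api
```

Preferring the longest match lets a team carve a specific path, such as `/api/orders/`, out of a broader route without editing the broader one.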
We’re looking at things like CRDs. There’s been a bit of work by the Admiral team on a CRD, making it even more Kubernetes-native to define these routes and things. And because we’re building on Envoy, you can effectively get at Envoy’s features directly, but we often put a nicer abstraction on top. We’ve got integrations for, say, rate limiting. We use the gRPC interface that Envoy expects, and you can create a service in any language, as long as it implements that gRPC interface, to do things like rate limiting. I’ve got a couple of examples on my Medium account where I built a Java rate-limiting service that you can customize based on various properties of the request.
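The decision logic of a rate-limiting service like the one Daniel describes can be as small as a per-key token bucket. This Python sketch omits the gRPC interface Envoy expects entirely; the class, the choice of key (for example a client IP or API key taken from the request), and the rate and capacity values are assumptions.

```python
import time
from typing import Callable, Dict, Tuple


class TokenBucketLimiter:
    """Per-key token bucket: each key may burst up to `capacity` requests and
    is refilled at `rate` tokens per second."""

    def __init__(self, rate: float, capacity: int,
                 clock: Callable[[], float] = time.monotonic):
        self.rate, self.capacity, self.clock = rate, capacity, clock
        self.buckets: Dict[str, Tuple[float, float]] = {}  # key -> (tokens, last refill)

    def allow(self, key: str) -> bool:
        now = self.clock()
        tokens, last = self.buckets.get(key, (self.capacity, now))
        tokens = min(self.capacity, tokens + (now - last) * self.rate)  # refill
        allowed = tokens >= 1
        self.buckets[key] = (tokens - 1 if allowed else tokens, now)
        return allowed
```

Keying the bucket on a request property is what makes the limit customizable: one noisy client exhausts only its own bucket, and everyone else keeps getting through.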
Same kind of deal with auth. The Kubernetes Ingress is a bit funny with auth. Istio have totally recognized this problem, and they’ve got gateways now. They’re doing some fantastic work around the gateway abstraction in their service mesh, but even they don’t support authentication yet. There’s a gRPC interface again with Envoy that we expose in Ambassador, so you can write your own auth plugin. We’ve also got a Pro version where we’ll actually code the integration for you, so you can integrate with your Active Directory or whatever you want to do.
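The heart of a custom auth plugin is a function that looks at an incoming request and decides whether it may pass. This Python sketch leaves out the gRPC wiring to Envoy; the bearer-token check, the `VALID_TOKENS` store, and the `x-authenticated-user` header are hypothetical stand-ins for a real identity provider such as Active Directory.

```python
import hmac
from typing import Dict, Mapping, Tuple

# Hypothetical token store; a real plugin might ask Active Directory or an OAuth server.
VALID_TOKENS = {"s3cr3t-token": "alice"}


def check_request(headers: Mapping[str, str]) -> Tuple[int, Dict[str, str]]:
    """Decide whether a request may pass the gateway. Returns (200, headers to
    add for the upstream service) to allow it, or (401, {}) to reject it."""
    lowered = {k.lower(): v for k, v in headers.items()}
    scheme, _, token = lowered.get("authorization", "").partition(" ")
    if scheme.lower() == "bearer":
        for known, user in VALID_TOKENS.items():
            # constant-time comparison avoids leaking token contents via timing
            if hmac.compare_digest(token.encode(), known.encode()):
                return 200, {"x-authenticated-user": user}
    return 401, {}
```

Rejecting at the edge means backend services never see unauthenticated traffic, and the added identity header lets them trust who the caller is without re-checking credentials.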
We’re trying to, without sounding cliché in some ways, bring Envoy to the masses. Because it’s totally cliché: Matt Klein has already brought Envoy to the masses. I wouldn’t dare get in the same ring with Matt Klein; he operates at another level entirely. But we wanted to make it easier for people to get all the awesome benefits of what they’re doing, and use it on the edge, as well as on all their east-west traffic, all their service mesh traffic. So we were like, “The first problem is often Ingress. Let’s help people get that one done.”
Mishra: I think that sounds really interesting, because Mitchell and I were just talking in our Kubernetes internal group about CRDs and how you can use RBAC on top of CRDs to do some really cool stuff. And I think one of the major reasons why people went from using, let’s say, ConfigMaps or something else to store all that information to using CRDs is RBAC. And it’s super interesting to me that slowly we have come to a consensus. I think Ingress is a great example, as you mention. I felt that, as a control plane, it was trying to do a lot of things for a lot of different proxies.
Daniel: Yeah, there you go, perfect.
Mishra: It became really difficult to write these configurations. Like, “Oh, this has to work across every data plane.” And it becomes so difficult. And I think Ingress is still useful, and I’m sure there’s a grand plan for what Ingress will be, but I’m glad there are products like Ambassador and it’s all open source, and you can use it. I think we used it for a substantial amount of time.
Daniel: Yeah, perfectly said, Mishra. Definitely I think one of the challenges with a project like Kubernetes is you want to welcome everyone with their different solutions, but if you’re not careful you fall back to the lowest common denominator. I’ve definitely seen that with Ingress. Everyone was like, “We want it super configurable, but we want it generic.” That’s hard. That’s a genuinely hard problem to solve. The Kubernetes community is fantastic. There probably isn’t a stronger community, but you have to draw the line somewhere, I guess. And that’s what we’ve tried to do with Ambassador. We can because we’re outside, working very closely with Kubernetes, but outside. We can draw the line and say, “Hey, here’s our opinionated version.” And it’s open source, as you rightly said. And getambassador.io is the website to hop along to. People can read more about it, and download it, and just try it out. If it doesn’t work for you, totally cool, but we find it’s useful for a lot of people.
Mishra: Yeah, we’ll drop the link down in the description for the listeners, and also drop some of your blog posts from Medium. I think you mentioned two of them that talk about Java and things like that.
Daniel: Yeah, thank you.
Mishra: That would be super useful. I always love reading them.
Daniel: Appreciate it.
Mishra: One other question I had around Ambassador was the integration or even the interplay between service meshes and Ambassador. How does that work? Do you have to have a service mesh in order to use Ambassador? Or do you just need Envoy and then maybe a control plane or a controller that Ambassador provides?
Daniel: Good question. I was in Dublin last week doing a talk, and I bumped into this question a couple of times after the meetup. Ambassador is standalone. So if you’re just looking to get traffic into your network, Ambassador will sort you out completely. Often people start, say, with a migration from a monolith, like we did at Not on the High Street. You chip a bit of the monolith off. You create a couple of services, and your service call stack is often quite shallow. Ambassador is perfect there because you can literally, as you would with any other Ingress gateway, route traffic accordingly to those shallow services and your monolith.
Then, when you grow in complexity, often you might want to look at a service mesh. My general rule of thumb is if you’ve got fewer than, say, 10 or 20 services, a service mesh might add overhead that you don’t quite get a return on. But that’s anecdotal; I’m still working through this kind of stuff. But if you check out our docs, a bunch of people have integrated Istio with Ambassador, for example. The control planes are still separate, so we have the annotations on the Kubernetes services doing all the routes for the Ingress, and we have an Ambassador admin service and pods or deployments that do all the magic there at the control plane. You still need to run Istio with all its Mixer and Pilot components and everything, so you do have two control planes. And you need the Istio routing, which is obviously done differently; it has a separate config file.
But we’ve had a bunch of people do mutual TLS, for example. End to end, Ambassador does the TLS termination, then it passes on to Istio, and Istio does mutual TLS internally. I know we’re looking at doing stuff with Consul Connect. As soon as I bumped into Consul Connect, I reached straight out to Nic, and I was like, “This is super interesting.” The Envoy stuff is really interesting to us because we’re doing a lot with Envoy, so we’re waiting to see what’s going on around the Envoy work there.
But regardless of what mesh you choose, I think there are going to be meshes for almost every use case. I think the mutual TLS stuff that Consul Connect is focusing on is a primary use case for a lot of enterprises. For the SMEs I’m seeing, the primary use case for a mesh is observability, and there are a couple of other use cases. You might choose your mesh depending on what you value the most, but with Ambassador, we would look to integrate with all of them. It’s going to be interesting. The biggest challenge, I think, around this stuff is the control planes. As a developer, I’m having to learn lots of stuff all the time, having to learn how Istio works, how Ambassador works, Kubernetes.
I think as an industry, we need to come up with a bit of a better approach to some of that stuff. But we’re totally up for it. We like to say we manage the north-south traffic, the traffic coming in and out of your cluster. But service meshes are where it’s at for the east-west or cross-data center traffic.
Mishra: That makes complete sense. I think it’s good you mention that our goal is to write software applications for our business organizations. It’s not to learn eight different technologies and try to make them interplay. That’s also important, but I feel like the goal in the end is to actually write good code, and hopefully that code brings in a certain amount of revenue.
Daniel: I saw Gareth Rushgrove tweeting today; he’s doing the reviews for QCon, I think it was. And he was like, “If you’re going to pitch me a service mesh solution, tell me how this adds value.” It was a really pithy tweet. Exactly what you said, Mishra. The primary goal of all of us: We want to do cool tech and we want to work with cool people, but bottom line is we are delivering some form of value to users. And as an engineer I forget that. I’m sure we all do. But a reminder every so often that you’re actually here to help the users focuses your attention on the actual business code.
Mishra: Talking about writing and testing software, I really struggled with the initial phase of my learning when I got into Kubernetes. Minikube was still a new project that you could use to get a local Kubernetes cluster running. We had these really big Kubernetes clusters across development, staging, and production environments. How do I make things like integration tests work locally, unit testing, and things like that? I think it was really difficult for us to figure out a consistent way to do it, and we came up with our own way. And then I remember this project (again, Datawire was one of the people behind it, no surprise) called Telepresence. And I think initially when it came out, I remember it was on Hacker News. I think it was front page or something.
Daniel: That’s always good.
Mishra: It’s always nice looking through all the comments. It solved our problem at that time, and we didn’t have to do eight different things, like hijacking DNS and the other hacks we were already doing in the background, to make integration testing work. This tool actually did it in a very consistent way. My question is around how you found this problem and how you went about solving it with Telepresence. And more broadly, what do you see in the wild in the Kubernetes community or, let’s say, the cloud-native community? Where are these problems, how do you go about finding them, and how do you go about solving them?
Daniel: If I remember my Datawire history correctly, it was Itamar. (Itamar has actually moved on to do some other stuff now. He followed his dreams looking at some socially responsible companies, doing some software development there.) But he was basically brainstorming. One of the problems he constantly bumped into as a developer was that he was working on bigger projects in Kubernetes or whatever, and he couldn’t spin them up on his local machine. He was like, “How do I locally develop a service but interact with all these services that I can only run on a remote cluster?” If I’ve got my history right, that was the general vibe. And it basically went from there. And Telepresence is actually a CNCF project now, so it’s stewarded by Datawire, but it’s a hosted project of the CNCF, the Cloud Native Computing Foundation, which is awesome. We’ve got a lot of great benefits from them in terms of building the community.
But fundamentally, Telepresence allows you to two-way proxy to a remote cluster. And the way I crudely look at it, because I’ve used it a bunch, is I can basically put my laptop in a remote cluster. The joke is it’s cheaper than visiting a data center. I don’t have to visit GKE. I’m a big fan of using GKE and ephemeral nodes. It doesn’t cost that much. I spin up all my config or whatever, and because I do a lot of Java stuff, as much as I love Java, it’s often quite thirsty on the RAM. So even though I’ve got a 16GB Mac, I can’t run all my services on my Mac.
So I spin up everything in GKE on ephemeral nodes, and then I Telepresence into the cluster. And there’s a nice command-line flag you can use: --swap-deployment. I can literally swap the deployment that’s running remotely for my local machine, and then debug. I often use IntelliJ, for example, in the Java world. I can poke some traffic in through Ambassador, say, like in the real world, into the actual remote cluster. And I can set some debug points on my swapped deployment. Basically, Telepresence will see there’s some traffic due for my service and route it to my local machine. I do my debugging, and as long as the timeouts are set right, I can then send the traffic back into the cluster, and it will do its thing.
There’s a whole bunch of interesting use cases there. I wouldn’t recommend it for debugging live traffic, per se, although we are looking at that kind of thing, using funky things with headers so you can shadow, canary, and reroute traffic. But my primary use case is when I’m, say, messing around with a big remote development environment or a staging cluster. And it goes back to the conversation we had earlier on: when stuff is going wrong and you really need to get in there, look under the hood, and figure out what is broken, Telepresence is amazing. You can run all your tools locally as if you were in the cluster. I’m often digging into DNS lookups and checking that all the DNS is done right. And then I’m literally debugging using all the Java dev tools that I love locally, but I’m primarily testing against the remote invocation, which has been routed via my laptop.
At Datawire, and at other companies I’ve worked at (but Datawire are really good at this), we’ve figured out the common problems by working with customers and going and seeing their problems. You soon identify the pain points. When you’ve seen a few different customers, you spot the commonality. Rafi at Datawire has done a bunch of consulting gigs, and he was like, “We’re seeing this at every gig. We should probably create a tool to overcome the issue.” I’m pretty sure it’s the same with HashiCorp.
I remember when I first used Vagrant, for example, I was like, “This is exactly what I wanted.” Mitchell pretty much said, “I was scratching my own itch. I had this problem. I created a tool.” We’ve inadvertently followed in his footsteps in that one.
Mishra: Thanks, Daniel. This is actually a very interesting discussion. I love asking all our guests this, and I think it’s one of our last questions: “Where do you see the tech industry going?” This is a super-subjective question. We get a variety of answers.
Nic: What Mishra really wants to know is, Are robots going to take our jobs?
Mishra: “Will I still get paid?” is what I’m asking.
Daniel: It’s a multi-level question, Mishra, to be fair. I work with InfoQ, and our emerging trends are things like blockchain, quantum, and augmented reality. They’re super interesting, but for most people listening, and probably ourselves, in the actual development space, I think serverless is super interesting. My view is that serverless is going to coexist with containers for a while. Some people are saying, “Serverless is the only way to do things.” And they’re very clever people, and I respect them a lot, and I get where they’re coming from, but I also can see the benefits of where we’re at in terms of containers and stuff.
I think Kubernetes being the default fabric, if you like, the default platform fabric, things like Knative being built on top, and a bunch of—I’m sure, HashiCorp are doing lots of super-secret projects, which we’ll get to see in a few months’ time—I think it’s a very interesting space around the serverless stuff.
And directly related to that, something we’ve already touched on today is the control plane. As developers, we need to be able to understand what’s going on, and then interact appropriately. At Datawire, the infrastructure we’ve got access to is incredible: cloud, containers, Terraform, it’s stuff I couldn’t have dreamed of a few years ago when I was developing code.
But we’re almost struggling to keep up with this from a development point of view. How do I actually use all this stuff? How do I test it? How do I deploy it? These kinds of things. Matt Klein has talked about this kind of stuff a lot. I think building effective control planes for observability, and understandability, and how we deploy and manage containerized stuff and serverless stuff, is where it’s at.
And that’s why I joined Datawire. They were making bets on that being a very interesting area of development. That’s where we’re really looking to add a lot of value: in how you actually use all these things to deliver cool stuff.
Mishra: I don’t want you to go without giving you an opportunity to plug your new book, which is coming out probably the same time this podcast comes out, I think. Could you tell us a little bit more about your new book?
Daniel: Yeah. The book will be out end of October. I like to joke it’s a perfect holiday gift. Thanksgiving, Christmas, whatever you celebrate, I think it’s a perfect gift. But, yeah, thanks for the opportunity, Mishra and Nic, to shout out the book.
It’s with a colleague of mine, a friend of mine, Abraham Marín-Pérez. He joined me midway, did a fantastic job of really helping with so many things. Both of us have written the book. It aims to be a one-stop shop for a Java developer, be they fresh out of college or a seasoned Java developer looking to embrace the modern ways of working.
Something both Abraham and I have bumped into is all we’ve talked about today: working with cloud, working with containers, working with—maybe not service meshes. We haven’t covered that too much. But definitely cloud and containers, and the demands of users these days, the fast feedback loops required. There’s a whole bunch of stuff that we’ve got some best practices or some good practices around. Just in my consulting experience, they’re not always well shared out. The future is here; it’s just not evenly distributed, as the quote goes.
What we’ve tried to do in the book is capture our learnings. [I’ve drawn on my time at Not on the High Street] and actually shouted you out, Nic, definitely. We learned a bunch of stuff together there. Abraham has done a bunch of work with Equal Experts; same kind of thing. We’ve tried to package it all up and provide a guideline for understanding all the stages of continuous delivery: how you develop things locally (Telepresence gets a shout-out again, and these kinds of things), how you do testing, even simple things like how you manage your Git workflow. I’ve definitely bumped into people who are fresh out of college and haven’t done much version control, for example. We’ve tried to put a starter chapter there for people who are new to all these professional ways of developing software, stuff that we all had to learn the hard way as we got into these things. There’s a whole bunch of GitHub examples. I’m still writing some of those. We are using Packer, Vagrant, Terraform (you’ll be pleased to hear) for our demos, a bunch of Kubernetes stuff, a bunch of ECS, and EKS as well, I think, and some other things.
We’ve tried to make the book concrete, but with plenty of conceptual stuff, high-level stuff, and then the examples are actually going to be code-driven. But that is a work in progress. I’m literally coding this weekend, and probably next weekend, to finalize those, because the book isn’t officially published until later in the year. We’ve got the writing done, but the coding is still coming.
Nic: That’s awesome. And I want to add to that I know from working with you that even if [the listener isn’t] a Java developer, I definitely recommend picking it up and taking a read because the concepts, regardless of the language, are going to be solid and essential to anybody who is working with modern, cloud-based systems.
Daniel: Cheers, Nic, I appreciate that very much.
Nic: I’m looking forward to picking it up and grabbing a read. I might even pick up the early access on Safari this weekend.
Mishra: I know what I’m getting for my Christmas gift.
Daniel: Thank you, Mishra. That’s awesome.
Nic: Yeah, I’ll download you the PDF, Mishra.
Mishra: Thanks, Nic.
Nic: Just before we go, I have one last question for you, and it’s our most important question, and our most—well, it’s completely not serious at all.
Nic: If you were a programming language from history (I’m not talking about something modern like Go or Rust or something like that; I’m thinking something historic like Java or something), what would you be, and why?
Daniel: Oh, my gosh.
Mishra: That’s hard.
Daniel: That’s a genuinely hard question. But one thing that I really struggled with during PhD times was Prolog. Don’t know if you ever bumped into Prolog. It’s a list-derivative-type thing. I think I’d be Prolog because I like to think I’m fairly straightforward and simple on the outside, but kind of deep on the inside.
Nic: Oh, yeah.
Daniel: That’s probably a bit of a deep answer, but Prolog was one of those things: it looked so simple, but to actually understand it took a lot of time. I think that’s generally how I roll, in that I might appear quite understandable, but actually when you get to know me, you realize there are multiple levels.
Nic: How about you, Mishra?
Mishra: Oh, man. That’s a really hard question. I think something that I was introduced to really early on, when I started using Windows, and that I loved: If you remember batch programs, simple batch programs that you could write. I think that was pretty elegant and interesting to me because you could do a lot of hacking. It was mostly writing scripts and stuff, but it was something that you could actually make a lot of use of, and actually make your life really easy on Windows. And I felt like that resonated with me as a person. It was simple to learn. What you see is what you get, basically. And I think that’s one of the things I feel even today I look toward. That’s why I like Go so much; basically, you’re able to write things. But I’m not comparing batch programs with Go; let’s just be clear. I’m just saying in terms of simplicity.
Nic: We just lost half our audience.
Nic: Mishra compares batch to Go.
Mishra: Oh, my God. I just feel like batch programs were simple, and I think Go programs are pretty simple, too, when it comes to reading them. And I think that to me is really important.
Nic: Hot topic for the moment: What’s your stance on generics?
Mishra: Oh, my God. Let’s just move on.
Nic: Let’s not lose the rest of the audience. Daniel, it’s been an absolute pleasure. It’s been wonderful speaking to you. Thank you so much for coming onto the show. And I hope our listeners really, really enjoyed this episode. Certainly one for the archives just for the amount of knowledge that you’ve shared with us, so thanks again. It’s been a pleasure.
Mishra: Thanks, Daniel.
Daniel: Appreciate both of you for the invite. It’s been amazing. Keep up the good work, because I’ve been listening to the podcasts, and they’ve been really good fun. So appreciate both of you taking your time to do this. It’s been awesome.
Nic: Thanks so much.
Mishra: You’ve been listening to HashiCast with your hosts Nic and Mishra. Today’s guest has been Daniel Bryant from Datawire. Thanks for listening, and we’ll see you next time.