A Keyboard Wall That Showcases Immutable Infrastructure as Code With Nomad
Aug 05, 2019
See how a hobby project turned into a clinic on modern infrastructure practices by deploying to a Raspberry Pi RGB-animated keyboard wall with Nomad, Packer, and Consul.
As part of a technology experiment (a giant wall of RGB keyboards that can run various animations), the consultants at ToThePoint deployed a Nomad cluster on a bunch of Raspberry Pis to control this creation and showcase it at HashiConf EU.
They came to Nomad after noticing that their approach was error-prone and fragile by design. Their solution was to employ infrastructure as code and immutable infrastructure using Ansible, Consul, and Packer. They also needed an orchestrator, but they needed something other than Kubernetes. Something simple enough to be deployed with a single binary, but with a more diverse degree of workload support than just containers.
For their Raspberry Pi deployments, they landed on Nomad. Hear from ToThePoint's release engineer, Johan Siebens, how the project evolved from a hobby project into a clinic on modern infrastructure practices.
Johan Siebens, Release Engineer, ToThePoint Consulting
Hello everybody. In the next couple of minutes, I will talk about a project we called the Wall of Fame. My name is Johan Siebens. I work for a Belgian IT company as a release engineer. You can find me on Twitter under the handle @nosceon.
Enough about me. What is this project we called the Wall of Fame? Well, it's a big RGB wall completely built out of keyboards, and we can use it to display and stream animations—but a picture says more than a thousand words so let's have a look at it.
Every time we show this video or talk about the Wall of Fame, we always get the same question: Why would we even build a Wall of Fame with all those keyboards? Well, there is one simple, short answer to this question. Why shouldn't we build a wall? Because we can!
Another great answer is that the project started as an internship in our company. As soon as the student had proven what could be done with our keyboards, other colleagues joined the team. In the end, we built an entire ecosystem of applications running in the background, processing inputs and outputs, so we can show images, animations, and other things on the keyboard wall.
I mentioned the ecosystem of multiple applications and programs. Let's first talk about the architecture behind it. When we started the project with the internship and its mentors, we had a general idea of continuously streaming animations, frames and images to the keyboards so they can be displayed.
» The first iterations
As with most other software projects, we worked in multiple iterations. The basis was a little program we called the Controller. It's written in C, it runs on a Raspberry Pi, and it receives commands via an MQTT broker. Those commands are simple text messages saying which key should be lit in which color.
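To make the idea of a text command concrete, here is a minimal sketch of what such a message could look like. The talk does not show the real wire format, so the `KEY:rrggbb` layout and the names `KeyCommand`, `encode_command`, and `decode_command` are all hypothetical, and the MQTT transport itself is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KeyCommand:
    """One hypothetical command: light a single key in an RGB color."""
    key: str
    red: int
    green: int
    blue: int


def encode_command(cmd: KeyCommand) -> str:
    """Serialize a command as 'KEY:rrggbb' (an assumed format, not the real one)."""
    for channel in (cmd.red, cmd.green, cmd.blue):
        if not 0 <= channel <= 255:
            raise ValueError(f"color channel out of range: {channel}")
    return f"{cmd.key}:{cmd.red:02x}{cmd.green:02x}{cmd.blue:02x}"


def decode_command(text: str) -> KeyCommand:
    """Parse 'KEY:rrggbb' back into a KeyCommand."""
    key, _, color = text.rpartition(":")
    if not key or len(color) != 6:
        raise ValueError(f"malformed command: {text!r}")
    return KeyCommand(key, int(color[0:2], 16), int(color[2:4], 16), int(color[4:6], 16))
```

A plain text format like this is easy to publish to a broker and easy to parse in a small C program, which may be one reason to prefer it over a binary protocol in a proof of concept.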
The next program we called the Orchestrator. It is written in Java. It receives an image, resizes it so it can be displayed on the keyboard wall, and then translates that image into multiple commands that are sent to the keyboards.
In the end, the first iteration we had was rather static. We were able to upload one image, the Orchestrator would resize it, and it would be displayed on the keyboard wall. But we weren't there yet. We had the streaming architecture in mind. The next thing to do was add a streaming mechanism to the Orchestrator. We extended the Orchestrator so it could receive frames or images pushed on RabbitMQ.
Then we extended our architecture with the first processor, named the Animation Processor. The Animation Processor was able to receive a GIF, chop it into multiple frames, and then stream them over RabbitMQ so the Orchestrator could rescale them and display them on the keyboard wall. There you have it. Our first animation was available.
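The resize-and-translate step can be sketched in a few lines. This is not the Orchestrator's code (which is written in Java and not shown in the talk). It assumes an image is a grid of `(r, g, b)` tuples, uses nearest-neighbor sampling as a stand-in for whatever scaling the real program does, and emits commands in a hypothetical `column,row:rrggbb` format.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]
Image = List[List[Pixel]]


def resize_nearest(pixels: Image, width: int, height: int) -> Image:
    """Scale an image to width x height keys using nearest-neighbor sampling."""
    src_height = len(pixels)
    src_width = len(pixels[0])
    return [
        [pixels[y * src_height // height][x * src_width // width] for x in range(width)]
        for y in range(height)
    ]


def to_commands(pixels: Image) -> List[str]:
    """Translate each pixel into one text command (hypothetical format)."""
    return [
        f"{x},{y}:{r:02x}{g:02x}{b:02x}"
        for y, row in enumerate(pixels)
        for x, (r, g, b) in enumerate(row)
    ]
```

Splitting the work into a pure resize function and a pure translation function keeps both easy to test without any keyboards attached.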
During the same iteration, we were extending our keyboard wall. When the intern started, we had only four keyboards connected to one Raspberry Pi, but now we have 54 keyboards running on 27 Raspberry Pis. It was a little bit difficult because all the Controllers need to be in sync, so we don't have any glitches in the animations.
Later on, because we weren't there yet, we replaced the Animation Processor with multiple broadcasters. It's a new concept in our architecture. In this way, we can add multiple sources to our infrastructure, and a component we call the director prioritizes which broadcast is displayed on the keyboard wall.
A little fun fact: One of our colleagues was developing Space Invaders and other games like Tetris, and we also added a broadcaster to the games. So every time he was playing Space Invaders, we could follow along on the keyboard wall.
In the end, we had a very responsive, extensible architecture. We can plug and play multiple sources and fine-tune which images are displayed on the keyboard wall. Now you have some idea of the programs in our architecture that are responsible for processing images and controlling keyboards.
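As an illustration of the prioritization idea, here is a minimal sketch. The talk does not explain how broadcasts are actually ranked, so the numeric `priority` field, the `active` flag, and the function name `pick_broadcast` are assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Broadcast:
    """A hypothetical broadcast source competing for the wall."""
    source: str
    priority: int  # assumption: higher number wins
    active: bool   # assumption: only sources currently sending frames compete


def pick_broadcast(broadcasts: Iterable[Broadcast]) -> Optional[Broadcast]:
    """Return the active broadcast with the highest priority, or None if idle."""
    live = [b for b in broadcasts if b.active]
    return max(live, key=lambda b: b.priority, default=None)
```

With a rule like this, a game broadcaster could take over the wall while someone is playing and hand it back to a playlist as soon as the game goes idle.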
» Running and deployment
It shouldn't be a surprise. We are using Nomad from HashiCorp to control and manage all those components I already talked about. In the end, we are using Nomad, but when the project started, we didn't use Nomad—so let's see how we ended up using tools from HashiCorp.
When the project started, the main focus was on coding. We didn't want to lose time thinking about how we would run all those little components and programs on the Raspberry Pis. We needed a proof of concept as soon as possible to be sure we could use the keyboards and display images on them.
The student who was working on the internship was running the Controller manually on one Raspberry Pi. While he was developing, testing, and coding, only one Raspberry Pi was needed. The Orchestrator and all the other components were running on his local machine. It was in this setup that he presented the internship at his school. Later on, he became a ToThePoint employee.
Everything was working. We had a proof of concept, and we could stream images to the wall. The next step was a first operational installation. Four keyboards are fine, but we wanted to take the wall with us to a hackathon. For that we could not reuse the student's laptop, and one Raspberry Pi was not enough.
We took all those applications, and we packaged them as Docker images using GitLab CI. We took an Intel NUC. It's a small computing device—very portable—so we can take it along with our keyboard wall. We installed Docker on it, and then we started all of our applications in the background using Docker.
» Automation with Ansible
Configuring one Raspberry Pi manually is not a problem, but then we extended the wall to 18 Raspberry Pis. Configuring all of those manually was a tedious job. How were we going to do that? A developer is a lazy person, so we always want to automate such things, and the first automation tool that we used was Ansible.
Ansible is a simple, agentless automation tool. It was a good fit for us: we created an Ansible inventory listing all the Raspberry Pis, and then we could run a command on a bunch of Raspberry Pis at the same time with a single execution.
That's exactly what we did. We already had install scripts that would install a new version of the Controller. Installing a new version of the Controller took a few steps. First, the script checks out the new source code from Git, and then it executes some scripts that compile the binary and put everything in place.
We translated all those scripts into Ansible Playbooks, and then we could execute those Playbooks on more than one Raspberry Pi in one single shot. We also added some other operational tasks with Ansible, such as stopping and starting the Controller. We can execute those when the wall is onsite, like at a hackathon or here at HashiConf.
» What can go wrong at the Hackathon?
As you would guess, a lot can go wrong.
Monitoring and troubleshooting
Monitoring and troubleshooting were very hard and painful. As you have seen before in the slides, the Controller was running as a process in the background. There was no management whatsoever. If some Controllers went down for any reason, we didn't know which Raspberry Pi was causing the trouble, so it was very hard to troubleshoot, find the problem, and restart the Controller.
The Docker containers were started in the background. If something went wrong with those processes, we had to log in to the Intel NUC, browse and search, tail and grep every log file we could find, and try to find the problem. That was hard and painful at that time.
We didn't only have software problems, but also hardware problems. SD cards tend to break sometimes, especially the cheap ones we started with. And not only the SD cards: the Raspberry Pis are very brittle as well. Certainly when we break down the wall, transport it to another location, and build it up again, we need to be careful, because a Raspberry Pi will break sometimes.
» Hackathon review analysis
After this first hackathon with the wall, we had a retrospective, and we were pretty sure we could do better. At this point, our DevOps colleague joined the team. Our DevOps colleague (called Debby, by the way) was always talking about observability, and how we could improve our installation with something like central logging, metrics, and tracing, so we would have more insight into what's going wrong or going well in all the different components of our application, like the Controller and the Orchestrator.
She was not only talking about observability. She was also talking about Orchestrators like Docker Swarm or Kubernetes. If we use those kinds of tools, we can manage all those little processes we are now running in the background. If something goes down for some reason, an Orchestrator like Docker Swarm or Kubernetes will restart the process if it fails.
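The restart behavior described here can be illustrated with a tiny supervisor loop. This is only a sketch of the general idea and not how Nomad, Docker Swarm, or Kubernetes implement it; the function name `supervise` and its parameters are made up for this example.

```python
import subprocess
import time
from typing import List, Tuple


def supervise(command: List[str], max_restarts: int, delay_seconds: float = 0.0) -> Tuple[int, int]:
    """Run a command and restart it when it exits with a non-zero code.

    Returns (number of starts, last exit code). Gives up after the initial
    start plus max_restarts restarts.
    """
    starts = 0
    while True:
        starts += 1
        exit_code = subprocess.call(command)
        if exit_code == 0 or starts > max_restarts:
            return starts, exit_code
        time.sleep(delay_seconds)  # back off before restarting
```

A real orchestrator adds what this loop lacks: it knows on which node each process runs, keeps the logs in one place, and can reschedule work when a whole machine disappears.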
She was not only talking about orchestrators. She also had a lot of experience installing and running Kubernetes. With that in mind, we had a whiteboard discussion session, and we wondered if we could use Kubernetes as an orchestrator for our keyboard wall. For those who don't know, Kubernetes is a very popular orchestrator, but it's highly focused on containerized applications.
If we wanted to use Kubernetes, we would have to containerize our Controller. That's something we didn't want to do. Debby, our DevOps colleague, also knew that administering a Kubernetes cluster can become very complex, especially on a Raspberry Pi cluster.
Kubernetes seemed a little bit overkill for our use case. In the end, we are just building a keyboard wall. It's not that we are going to build an enterprise application spread over multiple datacenters. Last but not least, we didn't want to face the complexity of controlling the USB keyboards from within a container.
We had already built the binary. It was already working on the Raspberry Pi, as we had proven before. But if we were going to use Kubernetes, we would have to Dockerize our binary. We saw some complexity there: how can we manage those USB keyboards from within a container? We are pretty sure we could have handled it, but we didn't want to face that complexity.
» Investigating HashiCorp Nomad
While she was talking about Kubernetes, she was also talking about another orchestrator, from HashiCorp, named Nomad. Why shouldn't we use Nomad for orchestrating all those things? There are a few features of Nomad that we could use in our keyboard wall.
Far less complex to install and run
Like most of the tools from HashiCorp, it's one single binary you can download from the internet. You install it on a server or another machine with a little bit of configuration. It just works every time, again and again.
It's not only a single binary: most of the tools from HashiCorp are already cross-compiled. That means they can be downloaded for multiple architectures, including the ARM architecture of our Raspberry Pis.
Last but not least—the most important reason why we had a look at HashiCorp Nomad—it can run diverse workloads. Nomad is an Orchestrator. We can schedule our Docker containers like we would do with Kubernetes, but Nomad allows us to run diverse workloads so we can control our Controllers running on the Raspberry Pis as well.
After this whiteboard discussion session, we went for the HashiCorp Nomad installation.
» Installation first steps
We took the same Intel NUC as before and installed the Consul server and the Nomad server on it. Nomad works in a cluster setup, with one or more servers and multiple clients. So that was the first thing to do.
Next, we updated our Ansible Playbooks so that not only the Controller gets installed on the Raspberry Pis but also tools like Consul and Nomad. Now we had a Nomad cluster running. We also needed to create a Nomad job so that our Controllers, which control the keyboards, are managed by Nomad.
We scheduled our keyboard Controllers as a Nomad system job. Nomad schedules an instance of our Controller on every client node available in our cluster, which in our case means all the Raspberry Pis.
Then we also had our other components, the director and all the broadcasters. We already had the Docker images, so now we could schedule them using Nomad as service jobs. That means there is one instance of each of them. The Docker images were still pulled from GitLab.
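The difference between the two job types can be sketched with the JSON job format that the Nomad HTTP API accepts. The talk does not show the real job files, so everything specific here is an assumption: the job names, the datacenter, the `raw_exec` driver for the Controller, and the image name are placeholders.

```python
import json
from typing import Any, Dict, List


def system_job(job_id: str, datacenters: List[str], command: str) -> Dict[str, Any]:
    """System job: Nomad places one instance on every eligible client node."""
    task = {
        "Name": job_id,
        "Driver": "raw_exec",  # assumption: the talk does not name the driver
        "Config": {"command": command},
    }
    return {"Job": {
        "ID": job_id,
        "Name": job_id,
        "Type": "system",
        "Datacenters": datacenters,
        "TaskGroups": [{"Name": job_id, "Tasks": [task]}],
    }}


def service_job(job_id: str, datacenters: List[str], image: str) -> Dict[str, Any]:
    """Service job: Nomad keeps a fixed number of instances running (here one)."""
    task = {"Name": job_id, "Driver": "docker", "Config": {"image": image}}
    return {"Job": {
        "ID": job_id,
        "Name": job_id,
        "Type": "service",
        "Datacenters": datacenters,
        "TaskGroups": [{"Name": job_id, "Count": 1, "Tasks": [task]}],
    }}


if __name__ == "__main__":
    # Hypothetical names; print the payload instead of sending it anywhere.
    print(json.dumps(system_job("controller", ["dc1"], "local/controller"), indent=2))
```

A system job has no instance count because the number of instances follows the number of clients, which is what makes it a natural fit for one Controller per Raspberry Pi.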
» Adding or replacing a Raspberry Pi
Using Nomad, we fixed our troubleshooting issues. With Nomad we have one single view of all our components. If one goes down, Nomad will restart it with the right configuration. Using Nomad, we can also consult the log files, so we have less difficulty pinpointing what goes wrong.
I was also talking about the hardware failures: What's necessary to add or replace a Raspberry Pi in our case? Most of the time, we start from the standard Raspbian image. We flash it to an SD card, we boot the Raspberry Pi with it, and then we look up its new IP address and add it to the Ansible inventory so we can execute playbooks on it.
Later on, we automated this so we didn't have to add it manually anymore. Then we run all the Ansible Playbooks, so that the new Controller, Consul, and Nomad get installed. Consul and Nomad were running as System V services, so as soon as everything is up, the node registers itself on the server.
At that point, we thought we were ready for our next venue with our keyboard wall. We thought we had everything under control: We had practiced a lot in our lab, adding and replacing Raspberry Pis, troubleshooting issues, and fixing bugs. But at the next venue we took our wall to, we still had some issues.
» Operating without internet
A Raspberry Pi or an SD card breaks. No problem, we thought. We had it under control: replace the Raspberry Pi and run the Ansible Playbooks. Then we hit this problem: Our Ansible Playbooks still needed an internet connection because they were downloading Nomad and other tools.
Indeed, the internet was still required for fetching the Controller source whenever a new version needed to be installed. The Docker images were still fetched from the GitLab Registry as well. If we were onsite and wanted to deploy a new version, it was built and stored in the GitLab Registry, but our Intel NUC could not download it because we didn't have an internet connection. And Consul and Nomad still needed to be downloaded from the internet when they were installed.
How did we fix this? First, we build the Controller binary once and store it on MinIO. MinIO is an object storage server with an Amazon S3-compatible interface, which is also supported by Nomad. So we can configure our Nomad jobs so that the artifact is downloaded from our internal MinIO service when the job is scheduled on one of the Raspberry Pis.
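As a rough sketch of what pointing a job at an internal object store could look like, the function below builds an artifact entry in Nomad's JSON job format. Nomad downloads artifacts with the go-getter library, which accepts an `s3::` prefix for S3-compatible servers. The endpoint, bucket, and object name are hypothetical, and credentials and other getter options are deliberately left out.

```python
from typing import Any, Dict


def minio_artifact(endpoint: str, bucket: str, key: str) -> Dict[str, Any]:
    """Build an artifact entry that fetches one object from an S3-compatible server.

    endpoint is something like 'http://minio.wall.local:9000' (hypothetical).
    """
    source = f"s3::{endpoint.rstrip('/')}/{bucket}/{key.lstrip('/')}"
    return {
        "GetterSource": source,
        "RelativeDest": "local/",  # download into the task's local directory
    }
```

The resulting dictionary would go into the `Artifacts` list of a task, so the binary is fetched inside the private network at the moment the job is placed on a node.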
Next, we used a private Docker registry. The workflow now was: if we have some bug fixes in something like the director or the playlist broadcaster, we push them to GitLab. GitLab builds a new Docker image. Then we pull the image down and push it to our private registry. We changed the definitions of the Nomad jobs so that our Docker images are now pulled from the private registry, and everything is contained in our private network.
That was the easy part. What about the other things that need to be deployed on a Raspberry Pi, such as Consul and Nomad? We could place them on MinIO as well, but we found that far from ideal. It was not only the single binaries for Nomad and Consul that were required on the Raspberry Pis, but also other tools like dnsmasq and NTP, packages that are available in the Debian package system and still need to be installed. So that would be a difficulty as well.
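The pull, retag, and push step boils down to rewriting the registry part of an image reference. Here is a minimal sketch of that rewrite; the registry host and image paths are hypothetical, and the rule for detecting a registry host (a dot, a colon, or `localhost` in the first path segment) follows the usual Docker convention rather than anything stated in the talk.

```python
def retarget_image(image: str, private_registry: str) -> str:
    """Return the image reference rewritten to point at a private registry."""
    first, sep, rest = image.partition("/")
    # Docker convention: the first segment is a registry host only if it
    # looks like one (contains '.' or ':' or is 'localhost').
    has_registry = bool(sep) and ("." in first or ":" in first or first == "localhost")
    path = rest if has_registry else image
    return f"{private_registry}/{path}"
```

For example, `retarget_image("registry.gitlab.com/group/director:1.2", "registry.wall.local:5000")` yields `registry.wall.local:5000/group/director:1.2`, which is the name the image would be tagged and pushed under, and the name the Nomad job would reference afterwards.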
In the meantime, other ToThePoint colleagues were working on other projects which also involved Raspberry Pi clusters. We wondered if we could reuse the lessons learned so far on those other projects as well: all the things we had learned when transitioning from the first installation to the last installation of our keyboard wall. And not only the software or the techniques we had learned, but maybe also the hardware.
Imagine we could pull one Raspberry Pi from one cluster, put it in another cluster, and everything would work as it should. This means you want to have one Raspberry Pi ARM image to rule them all. Luckily for us, there is another tool from HashiCorp: Packer.
» Working with HashiCorp Packer
Packer allows you to create machine images for multiple platforms. It is mostly used to create machine images for virtualization and cloud environments such as AWS, GCP, or VMware. But luckily for us, there was also a plugin for Packer which allowed us to create ARM images. It's available on GitHub. Thank you to all the contributors to this plugin. It helps us a lot.
We started with Packer by creating a Packer template. Using this Packer template, we installed all the components needed on a Raspberry Pi, such as Consul and Nomad. We didn't install the Controller because the Controller is downloaded from MinIO when a job is scheduled using Nomad.
A little recap: Earlier we discussed what needed to be done to add or replace a Raspberry Pi. Now we take our custom ARM image that has been built with Packer, we flash it to the SD card, we boot the new or replaced Raspberry Pi with it, and that's it. No more Ansible, and the internet is not required anymore.
What happens in the background when a Raspberry Pi boots? First, the Consul client is started. It's running as a System V service, the same as when we installed it using Ansible. The Nomad client starts as well, and it registers itself on the servers.
Because we scheduled our keyboard Controller as a system job, every time a new client is registered, the job is started on this new client as well. So, in the end, we created a reusable ARM image. We can take this image, flash it to an SD card, and put it in the keyboard wall, and we can reuse it in the other clusters as well.
» Immutable infrastructure
We went a little bit further because, for some projects, different workloads were required. On one Raspberry Pi, one set of processes needs to be running. On another Raspberry Pi, in the same cluster, other workloads need to be running. So we also built a technique into the Raspberry Pi image: giving a role to the Raspberry Pi before it boots.
In the end, we used immutable infrastructure. The image for our Raspberry Pi was created once. Every time we needed an update, like when a new version of Consul or Nomad came out, we could have reused all our Ansible scripts to update the necessary components on our live environment.
Instead, we chose the way of immutable infrastructure. This means that if a new version of Consul or Nomad comes out, we create a new image. In the case of a Raspberry Pi cluster, it is a little bit time-consuming, but we flash the new version onto new SD cards and swap them out in the Raspberry Pi clusters.
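The talk does not explain how the role is stored or consumed, so the following is only one plausible sketch. It assumes the role ends up as client metadata in the Nomad client configuration and that jobs select nodes with a constraint on that metadata. The key name `role` and both function names are hypothetical.

```python
import re
from typing import Dict


def render_client_config(role: str) -> str:
    """Render a Nomad client configuration fragment that tags the node with a role."""
    if not re.fullmatch(r"[a-z0-9-]+", role):
        raise ValueError(f"unexpected role name: {role!r}")
    return (
        "client {\n"
        "  enabled = true\n"
        "  meta {\n"
        f'    role = "{role}"\n'
        "  }\n"
        "}\n"
    )


def role_constraint(role: str) -> Dict[str, str]:
    """Job constraint (JSON job format) that matches only nodes with this role."""
    return {"LTarget": "${meta.role}", "RTarget": role, "Operand": "="}
```

Under this assumption, the same image can serve every node: only a small role value differs per SD card, and the scheduler decides which workloads land where.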
We found out that HashiCorp tools like Consul and Nomad have their merits not only in enterprise-grade applications but also in a seemingly ridiculous case such as an RGB-illuminated wall.
Tools from HashiCorp like Consul and Nomad are proven technology in enterprise-grade applications spread over multiple datacenters. But we are using them for a Raspberry Pi cluster. Now it's the keyboard wall, but we have created other Raspberry Pi clusters for maze-solving algorithms and the like.
That was it. We had a little demo in mind, but it's very difficult to bring the wall here on stage. In the office, we made a small video of how I unboxed a Raspberry Pi, flashed a new SD card, put the Pi in the wall, and then put it in the other cluster. Let's have a look.
As you can see, one of the Pis is broken. I'm unboxing a new Raspberry Pi and unboxing an SD card. I'm flashing the new image, connecting the Pi to the wall, and Consul and Nomad start up. There, it is fixed. Thank you.
The same Raspberry Pi is pulled from the keyboard wall cluster and put in the other cluster, and the system job is now doing something completely different. It's based on which cluster the Raspberry Pi is plugged into.
That's it. Thank you for listening.