Whiterabbit.ai learned several useful strategies for using Terraform and Packer in their effort to fight cancer with artificial intelligence.
Whiterabbit.ai develops state-of-the-art machine learning models that detect breast cancer. These models are deployed in radiology clinics to augment existing healthcare practices and increase the quality of care for patients, enriching medical images with AI inference results to provide tailored feedback to the patient and medical practitioner.
This talk by Whiterabbit.ai software engineer Daniel Cardoza will focus on how Terraform and Packer have allowed Whiterabbit.ai to build a complex infrastructure for machine learning.
Terraform codifies the infrastructure, building a hybrid-cloud service layer for deploying applications. Learn about Whiterabbit.ai's best practices for Terraform modules and DSLs to create reusable groups of resources. Packer is used to generate virtual machine images deployed on servers in the cloud and the clinics we operate, keeping consistency across environments.
See how Whiterabbit.ai learned several strategies to build and manage this infrastructure with a small development team.
Hey, everyone. I'm Daniel Cardoza; I'm a software engineer at Whiterabbit.ai. I appreciate all of you being here. You're going to get something out of this talk—whether it's AI, the impact, or how we use the HashiStack under the hood.
Today I want to tell you about real-time radiology, or RTR for short. It's a system we've built on top of the HashiStack that uses Artificial Intelligence to detect breast cancer in real time. First, I'll give you a bit of a background on who Whiterabbit.ai is, what it is we do, and what our core mission is.
Then we'll do a deep dive on the underlying infrastructure—how we use Terraform and Packer to provision and help us develop RTR. At the end, I'll describe some of the lessons we learned using the HashiStack and building this complex system.
I've been at Whiterabbit.ai building backend services and infrastructure since October 2018. I've included some of my social media handles here—if you want to follow or take a peek at those. I have also included a picture of an actual white rabbit to dispel any confusion about what our logo is.
Whiterabbit started as a company in the San Francisco Bay Area in 2017. Since then, we've grown to over a hundred employees, 35 of whom are on the software engineering and technical staff.
This includes data engineers that are responsible for building and massaging our datasets, the machine learning team that builds the AI models we use in running production—as well as a series of full stack and DevOps developers that help us build the software ecosystem on top of the AI.
We own nine radiology clinics, one of which is in California—eight of which are in Phoenix, Arizona. It gives us a bit of a variety in who we employ; what kind of minds we get when we're trying to attack some of these hard problems. We're a distributed team. We are mainly located in the U.S., primarily California, Nevada, and Arizona, but we also have a sister engineering office in Mumbai, India.
What is it that we're trying to do and accomplish? Our mission statement—clear and simple—is to eliminate suffering through early detection. If you can detect disease and ailments much faster than before—perhaps much faster than a human—you significantly increase the positive probabilities and outcomes for your patients—or customers, as we like to refer to them.
Specifically, we're focusing on breast cancer. There are a variety of ailments that are ripe for attack by AI—a new technology—but there's a lot of data on breast cancer. There's a lot of research in that field, so that's what we decided to attack first. The way we do that is by building AI models to attack breast cancer and introduce them to the clinics by augmenting typical clinical workflows.
Some people might be worried that AI and other technology will displace humans in the medical field, but what we're trying to do is integrate them so humans can use them and be more productive. Whether that's a radiologist or clinic technicians—that are also in those nine radiology clinics—as well as the patient.
The patient experience is what we're trying to optimize the most for. As we'll soon see, real-time radiology disrupts or affects their experience in ways that hadn't been done before. But before I can talk more about real-time radiology and what it is we do, there is some background knowledge that is necessary and a prerequisite for talking about these topics.
Radiology—which is what the talk is on—is the medical subfield that uses medical imaging to diagnose and treat medical ailments. If you've ever had an X-ray, MRI, CT scan—all of those are grouped underneath the radiology specialty.
What we'll be talking about today are mammograms, which are X-rays of breast tissue. Artificial intelligence is a big buzzword. You may have heard of deep learning, machine learning, classical models.
At its core, you're trying to teach a machine to emulate some human behavior or function that previously machines couldn't do. Whether that's identifying objects or classifying things in a given image—or if you're trying to determine whether a piece of text is meant to be positive or negative—AI can be used in both of those scenarios. Breast cancer is when cancerous tissue is formed in the breast. A lot of you may know about that.
Here are some example images. I wanted to give you a glimpse into what the radiologists, the doctors, will see when diagnosing patients. On the left, you have an X-ray of a right hand. More interesting, on the right, you have an actual mammogram.
These are what radiologists look at when trying to diagnose and look for benign or malignant masses in breast tissue. Not only do radiologists look at these images, but the AI models we train also receive them as input.
We could talk about AI all day and a lot of the interesting work there. But to summarize some of the work we've done: we have a dataset from a top-five U.S. medical university. This dataset has close to two million images. All of these are mammograms or tomosynthesis images. You can think of a mammogram as a 2-D X-ray of a breast. A tomo, or tomosynthesis image, is a 3-D representation of a breast. It's much more granular and provides a lot more information to the radiologist.
As for the algorithms, we build these models using supervised learning, which I've illustrated in this diagram. Those of you who are familiar with Silicon Valley, the TV show, may already be familiar with this. It's a machine-learning task where you train a model to learn a function by providing a lot of example inputs and outputs.
For this model, we're training it to learn whether or not an image is a hotdog. The inputs and outputs that you provide as examples will be images, as well as the label, which is whether or not they are a hotdog.
Via the training stage, your model will become more accurate—more precise—and will learn the function over time. Usually, the more data you have, the better your model will be. Once those models are built, we use them to run inference in our clinics. Inference is when they receive new inputs that they've never seen before, and using the results they give us, that is how we do the core work of real-time radiology.
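To make the split between training and inference concrete, here is a minimal Python sketch. It is not Whiterabbit's model: it fits a one-feature logistic regression on made-up numbers, and `train`, `infer`, and `TOY_EXAMPLES` are hypothetical stand-ins for what would really be a deep network trained on labeled images.

```python
import math


def sigmoid(z):
    """Squash a real number into a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def train(examples, epochs=2000, learning_rate=0.5):
    """Training stage: learn a weight and bias from (feature, label) pairs."""
    weight, bias = 0.0, 0.0
    for _ in range(epochs):
        for feature, label in examples:
            # Difference between the current prediction and the true label.
            error = sigmoid(weight * feature + bias) - label
            # Gradient-descent step on the logistic loss.
            weight -= learning_rate * error * feature
            bias -= learning_rate * error
    return weight, bias


def infer(model, feature):
    """Inference stage: score an input the model has never seen."""
    weight, bias = model
    return sigmoid(weight * feature + bias)


# Hypothetical labeled examples: small feature values are labeled 0,
# large ones are labeled 1.
TOY_EXAMPLES = [(0.1, 0), (0.2, 0), (0.3, 0), (0.7, 1), (0.8, 1), (0.9, 1)]

model = train(TOY_EXAMPLES)
score_high = infer(model, 0.95)  # unseen input near the positive examples
score_low = infer(model, 0.05)   # unseen input near the negative examples
```

The same two-stage shape applies at any scale: the expensive fitting happens once on labeled data, and the resulting model is then only evaluated on new inputs.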
Here's an example image of how AI can be applied to the mammogram we had seen earlier. This is an image taken from an MIT study that indicates how an AI can identify a cancerous growth in a breast—perhaps earlier than even human radiologists can.
Our models do the exact same thing. You give them an image. They can give you a probability of where the cancer could be, as well as whether or not there is cancer, or whether the mass is malignant. Using some of these cool determinations, you can build software, an entire ecosystem on top of them, to get that knowledge back to the patient and back to the clinics, where it can be integrated effectively.
Now that we have some of the problem contexts in our mind, let's look at the underlying motivations. One of the key ones is we want to reduce the time it takes from the screening to the diagnostic stage for a patient.
Typically, this is 30 days. When you're going to be checked for breast cancer, you will go for either an annual or bi-annual mammogram. It starts off as a screening phase, which is an initial test. It's one of the less invasive images they can generate.
From the results of that, you will proceed to the diagnostic phase—which is secondary tests that will tell you more information. It will help the radiologist make a determination about whether or not there is cancer.
The average in the United States—this will vary based on country—is close to 30 days. However, in real-time radiology, we can sometimes bring this entire step down to a single day— but typically only a few. By detecting things earlier—and getting these scans done earlier—our patients get results faster, and they can receive treatment faster, as well.
Secondarily, going back to our core mission, early detection improves the odds of a positive outcome for our patients. There's a morbid term in the medical world called the five-year survival rate. That is the likelihood that a patient diagnosed with a disease will be living in five years.
If the breast cancer stays in the breast, the survival rate is quite high: 99%. However, if it is not detected early and it spreads to a distant part of the body, the rate can go down to 27%. If detected early, the outcome is very positive, but only if you detect it early. Tools like real-time radiology help our patients do so. We do that using these models we build: we can triage suspicious cases a lot faster than a typical human radiologist.
The typical patient workflow is you will come in for your annual mammogram. As you're going through the screening stage, you will have images generated of your breast. These are generated by a modality—or a very special camera—using cool physics to get the very granular images that you had seen earlier.
Using those images as they're generated in real-time—as well as other health and patient records we have at the clinic—we feed all that data into our Whiterabbit triage engine. This triage engine is responsible for the ingestion of images, filtering out images and cases we cannot handle—but also running inference against the images for each patient. It will output one of two things. One, maybe the patient has a suspicious set of images. With that information, we will elevate it on the radiologist's queue.
Radiologists have task queues called worklists, very similar to queues in software, which hold patient cases with all the associated data they need to make a determination. By elevating a case to the top of the radiologist's queue if the images are suspicious, there is potential for the radiologist to see those images right after they have been generated, meaning the patient is still in the clinic.
With this information, we can expedite the patient to the diagnostic phase on the same day. That's the best outcome. Because of scheduling conflicts and other things that can come up, it will sometimes be handled in the next few days instead, but that's still a lot faster than the 30 days that is the average in the United States.
If the image is not suspicious, it will still go to the end of the radiologist's queue and will eventually be read by a human. An important insight here is that a human will always see the records. We are not trying to remove them from the process. We try to expedite and triage cases as best as possible.
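The queueing behavior described here can be sketched in a few lines of Python. This is an illustration only, not the actual triage engine: the `Case` type, the `triage` function, and the 0.5 cut-off are all hypothetical, and a real worklist carries far more data per case.

```python
from collections import deque
from dataclasses import dataclass

# Hypothetical cut-off; the talk does not say how "suspicious" is decided.
SUSPICION_THRESHOLD = 0.5


@dataclass
class Case:
    patient_id: str
    suspicion: float  # model output between 0 and 1


def triage(worklist, case, threshold=SUSPICION_THRESHOLD):
    """Place a case on the radiologist's worklist.

    Suspicious cases jump to the front; everything else goes to the back.
    No case is ever dropped, so a human still reads every one.
    """
    if case.suspicion >= threshold:
        worklist.appendleft(case)
    else:
        worklist.append(case)
    return worklist


worklist = deque()
triage(worklist, Case("patient-a", 0.10))
triage(worklist, Case("patient-b", 0.90))  # suspicious: moves to the front
triage(worklist, Case("patient-c", 0.20))
reading_order = [case.patient_id for case in worklist]
```

One simplification to note: `appendleft` puts the newest suspicious case ahead of older suspicious ones. A priority queue keyed on score and arrival time would be a natural refinement.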
This has been deployed in our clinics since April 2019. A lot of the immediate feedback we got was quite positive, one example being the reduced anxiety. Typically, you'll go in for your screening exam, and it may be a month, or upwards of a month, before you receive the results back, or a callback saying, "Please come back."
All this time, the patient will be worrying. They'll have other responsibilities to worry about, too. But by reducing that anxiety via the reduced time, they have much happier outcomes. As well, it helps increase the productivity of a radiologist and our clinical staff, because they can handle the cases that need the most help or most work—sometimes on the same day. We save a lot of time in the future. This is what real-time radiology is.
As of now, the actual software component might be quite nebulous. What is the Whiterabbit triage engine, and most importantly, what system and infrastructure does it lie on top of?
That's when we get into the infrastructure. We will talk a bit about the underlying systems and what is there. At a high level, we have a hybrid cloud, multi-cloud, multi-platform—whatever your preferred term for it is—where we have centralized services on Amazon. We also have software and hardware we deploy to our clinic sites, as well.
Each clinic has a box packaged with the microservices that make up real-time radiology, which all run in Docker, and it has NVIDIA GPUs, because those are required for running inference at the edge in a faster way. For those of you who have used hybrid clouds, you might be wondering, "These are hard to manage. They're hard to orchestrate. Why is this your model?"
One of the primary initial constraints is that medical images that are generated by modalities can be quite big. They can be in the 100s of megabytes, sometimes even gigabytes. Not every clinic we operate in—especially at the beginning—had fiber or even a direct connection to Amazon.
One of the biggest bottlenecks and constraints was transferring these images to the cloud, which is why we went with this hybrid model. One of the interesting questions with hybrid models is how we can achieve deployment of our software in a uniform way across these two environments, which run heterogeneous hardware. The container orchestrator we used to do that is Docker Swarm.
We've had a lot of talk about Kubernetes, Nomad, and other orchestrators. We went with Swarm for a few reasons. One, it's quite simple. For those of you who have used it, creating a distributed cluster is quite simple. It comes with a lot of interesting benefits, like overlay networks and a distributed key-value store.
It has a basic architecture of managers and workers, very similar to Kubernetes. It also has node-aware scheduling, which is very similar to all container orchestrators. If we want to run a specific version of RTR at a specific clinic, because the software may be different or the workflows are a bit different, we can do that using node-aware scheduling. We can run a series of containers or microservices on a specific clinic box using node labels.
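As a rough model of how label-based placement works, the Python sketch below filters nodes by required labels. It is a toy, not the Swarm API: the node names, label keys, and `eligible_nodes` are hypothetical. In Swarm itself, the equivalent is a placement constraint on the service, written in a form such as `node.labels.site==clinic-b`.

```python
# Hypothetical cluster: node name -> labels attached to that node.
NODES = {
    "cloud-worker-1": {"role": "cloud"},
    "clinic-box-a": {"role": "clinic", "site": "clinic-a"},
    "clinic-box-b": {"role": "clinic", "site": "clinic-b"},
}


def eligible_nodes(nodes, required_labels):
    """Return the nodes whose labels satisfy every required key/value pair."""
    return sorted(
        name
        for name, labels in nodes.items()
        if all(labels.get(key) == value for key, value in required_labels.items())
    )


# A service meant for every clinic box.
all_clinics = eligible_nodes(NODES, {"role": "clinic"})

# A clinic-specific version of a service, pinned to one site.
only_clinic_b = eligible_nodes(NODES, {"role": "clinic", "site": "clinic-b"})
```

The scheduler only has to know the labels, not the hardware behind them, which is what makes per-clinic versions manageable.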
Here is one of our boxes that we have deployed. You may be pedantic and call this a rack, but the premise is the same. These are Linux boxes with NVIDIA GPUs. Each of these boxes becomes a member of our distributed Swarm cluster that has instances both in the cloud and the clinic. These are the boxes that run RTR and other microservices. These are what we ship out to clinics to run the software and do the compute on the edge.
Here’s a slide describing how we use the cloud—which is saying we use it to do most things. It also indicates how confusing Amazon Service logos are—very unintuitive, very confusing—but we try and use Amazon for as much as we can.
We're still a small infrastructure team, so we try and lean on their compute, storage, networking resources as much as we can. Whether that's EC2, Lambda, Fargate—for various use cases. S3 is where we store all of our medical images for data processing, batch processing down the line. All of the networking resources—including a VPN (which I may not have mentioned before) which glues our clinic instances with the cloud—as well as for other infrastructure; whether it's Route 53 or other resources they have.
When you have a hybrid cloud and distributed system, you need efficient monitoring and alerting, especially if you're a small team. You need something to tell you there's a problem, so you don't have to go manually search for one. The way we do that is with this stack. We use Elastic Beats. Elastic is the company that does Elasticsearch.
They act as lightweight agents on the nodes in our Swarm cluster. These will keep track of metrics such as CPU, disk, RAM, and network, and pump all those to Elasticsearch. They will also gather logs from our containers and our system processes on the boxes, and that will all get indexed in Elasticsearch.
After it goes through Elasticsearch, we use Grafana to visualize all this data—to build high-level dashboards indicating the health of the system as a whole. We also use CloudWatch—because you get it for free by using Amazon—and pump that data to Grafana to get some more dashboards that are built-in by default.
Our alerting tool of choice is Opsgenie, which is very similar to PagerDuty if you have used that. That's the tool that will alert you that a problem has happened, which is what happened to me and my co-worker last night when we arrived in Amsterdam.
How do we manage this infrastructure? There's a lot going on. You may be surprised to hear that there are only two engineers building this, part-time. We have very multi-disciplinary teams responsible for building the services of RTR, as well as the infrastructure it runs on.
Myself and someone else do both of these tasks, as well as ingest all infrastructure requests from other teams—whether it's product, the machine learning team, or even the data engineers. The only way you can manage all of this complexity with such a small team is to lean heavily on tools, which is where the HashiStack comes in.
The two tools we lean on the most are Terraform and Packer for managing all of our infrastructure—as well as all of the virtual machine images that we build and deploy.
I included this in case some people were not too familiar with Terraform, because there have been a lot of great talks—especially the keynote today—so we can go through this slide pretty fast.
We use Terraform to provision all of our infrastructure and resources on Amazon and Docker Swarm, as well as Postgres for RDS instances in the cloud. Terraform is great, as I'm sure all of you are aware. We use it for its modules. We use it for its interpolation functions. It allows you to do some compute and logic as data flows around your Terraform code base.
The current model we use is a very CLI-based one. Being a team of two, it's easy to have a model where we'll do a Terraform plan, make a PR with the plan, and then apply it once the PR is approved or merged. As we think about growing our codebase and getting other developers involved in the process, we're thinking about leaning more on Terraform Cloud or Terraform Enterprise in the future.
Now we'll look into a few small case studies of how Terraform lets us create some resources or modules that are beneficial for an engineering company in the medical space. In America, there are federal regulations governing how patient data are handled. That is HIPAA—the Health Insurance Portability and Accountability Act, which describes how data is supposed to be handled in transit; how it is supposed to be secured, both physically, as well as electronically.
It has to be encrypted in transit and encrypted at rest. All the companies you partner with to store this data have to be HIPAA-compliant. The terms HIPAA uses for this data are PHI, or protected health information, and ePHI, the electronic equivalent of that.
We store all of our medical images in S3. It's easy to accidentally misconfigure permissions on a bucket. There are bucket policies and IAM roles at play, as well as access blocks, which Amazon added recently. We prevent ourselves from shooting ourselves in the foot and accidentally exposing this data by wrapping it in a nice, easy-to-digest module right there.
It's a very powerful 10 or 12 lines of code. Internally, it makes buckets with the strictest permissions possible. A lot of you may have either built a similar module or found an open-source one online. This allows our developers to make pull requests—adding new buckets for new experiments and new work—without having to worry about the security constraints or concerns under the hood.
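The module on the slide is Terraform and is not reproduced in this transcript. As a stand-in, here is a hedged Python sketch of the same idea: a wrapper with a tiny interface that always produces the strictest settings. `strict_bucket_settings` is a hypothetical helper, and the choice of KMS encryption is an assumption. The field names follow the S3 API's public access block and default encryption settings, but the function only builds a dictionary and makes no AWS calls.

```python
def strict_bucket_settings(bucket_name):
    """Return locked-down settings for a bucket holding sensitive images.

    Callers supply only a name; every security-relevant knob is fixed here,
    so a new bucket cannot be made public by accident.
    """
    if not bucket_name:
        raise ValueError("bucket_name must be a non-empty string")
    return {
        "Bucket": bucket_name,
        "ACL": "private",
        # Block every route to public access (ACLs and bucket policies).
        "PublicAccessBlockConfiguration": {
            "BlockPublicAcls": True,
            "IgnorePublicAcls": True,
            "BlockPublicPolicy": True,
            "RestrictPublicBuckets": True,
        },
        # Encrypt objects at rest by default (assumed algorithm: KMS).
        "ServerSideEncryptionConfiguration": {
            "Rules": [
                {"ApplyServerSideEncryptionByDefault": {"SSEAlgorithm": "aws:kms"}}
            ]
        },
    }


# Hypothetical bucket name for a new experiment.
settings = strict_bucket_settings("example-experiment-images")
```

The design point is the narrow interface: a developer adding a bucket in a pull request touches one argument, and reviewers do not have to re-audit permissions each time.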
Our Swarm workers, since Swarm is our container orchestrator of choice, make up quite a complex system. Requests come in from either the VPN or the Internet, go through one of our load balancers, and then are directed to one of our Swarm worker groups.
We create these Swarm worker groups, which is where compute is run—our containers are run and instantiated. We provision these with different instance types and EC2 Auto Scaling groups. Not only that; each group may have a different type of task, a different set of labels—whether they're more attuned for long-lived services, building and testing AI models, or for batch jobs or worker pools.
There is a lot of complexity here that could be quite hard to manage via scripts, Puppet, or Ansible. But the interface that we have is this very dense Terraform module. I don't expect everyone to understand, or read through, the entire thing. It shows how we string modules together, and what dependencies the module has: building load balancers, DNS, making EFS instances.
All of that is glued together here. It's quite understandable from the outside. Internally, it may be a bit complex, but if any developer is curious or wants to instantiate a new group, they have all the tools they need to do so. Terraform is quite powerful by allowing us to encapsulate all the complexity on the previous slide into this one module.
Lastly, the VPN. This is the glue that brings our hybrid cloud together. What it does behind the scenes is instantiate AWS virtual private gateways and customer gateways to allow us to integrate with our nine clinics.
The module interface is quite straightforward: you provide the WAN IP or the gateway IP of the clinic, as well as what CIDR blocks or IPs they want to expose across it, and you're pretty much good to go. We do workflows like this quite often, so having it wrapped in a module makes doing it the second, third, nth time much easier.
The second tool I had mentioned that we lean on a lot is Packer. We have a lot of instances, and the way we run our software on top of them is always via VM. We build AMIs for Amazon, and QCOW2 images for QEMU on the clinic boxes.
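To show how small that interface is, here is a Python sketch of the two inputs and the kind of validation a wrapper could do before creating anything. It is illustrative only: `ClinicVpnInput` and `validate` are hypothetical names, the addresses come from documentation and private ranges, and the overlap check is an assumption about what such a wrapper might enforce, not something the talk describes.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class ClinicVpnInput:
    clinic_name: str
    gateway_ip: str     # WAN or gateway IP of the clinic
    cidr_blocks: tuple  # networks the clinic exposes across the VPN


def validate(clinic):
    """Parse the inputs, raising ValueError on anything malformed."""
    gateway = ipaddress.ip_address(clinic.gateway_ip)
    # ip_network rejects blocks with host bits set, e.g. "10.20.0.5/24".
    networks = [ipaddress.ip_network(block) for block in clinic.cidr_blocks]
    if not networks:
        raise ValueError(f"{clinic.clinic_name}: at least one CIDR block is required")
    for index, first in enumerate(networks):
        for second in networks[index + 1:]:
            if first.overlaps(second):
                raise ValueError(f"{clinic.clinic_name}: {first} overlaps {second}")
    return gateway, networks


clinic = ClinicVpnInput(
    clinic_name="clinic-a",
    gateway_ip="203.0.113.10",
    cidr_blocks=("10.20.0.0/24", "10.20.1.0/24"),
)
gateway, networks = validate(clinic)
```

Catching a mistyped block at this stage is much cheaper than debugging a tunnel that comes up but routes nothing.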
This allows us to bundle up dependencies—startup scripts, Docker—into the VMs that we use everywhere, which is where Packer comes in. More importantly—and this goes more to what the benefits of VMs are—the software environment is always the same despite heterogeneous hardware.
The box I'd shown on a previous slide—that's one version of the box we deploy to clinics. We also have two others with a variety of GPUs and physical resources. Having a tool like Packer allows us to build this homogeneous software environment where you may not know what the underlying hardware is.
One interesting thing we do to minimize some complexity is have a Packer inheritance model. If you want to build a new image for a specific purpose, whether that's for Ubuntu, an Elasticsearch cluster, or a clinic box, we have a set of tooling and scripts that will let you do this quite easily.
Racker, I believe, is an open-source tool that some of you may be familiar with. It lets you do something similar. This is mostly what we use Packer for. It's a simple tool that allows you to do some powerful things. For a small dev team, it allows us to offload a lot of the complexity to the tool itself.
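The talk does not say how the inheritance is implemented, so the following Python sketch is only one plausible shape for it: a child template copies a parent's builders and provisioners and appends its own steps. `BASE_TEMPLATE`, `inherit`, and the script names are hypothetical, and the dictionaries are fragments of a JSON-style Packer template rather than complete, buildable ones.

```python
import copy
import json

# Hypothetical base image definition shared by every derived image.
BASE_TEMPLATE = {
    "builders": [{"type": "qemu", "format": "qcow2"}],
    "provisioners": [{"type": "shell", "script": "install_docker.sh"}],
}


def inherit(parent, extra_provisioners):
    """Build a child template: the parent's steps first, then the child's own."""
    child = copy.deepcopy(parent)  # never mutate the shared parent
    child["provisioners"].extend(copy.deepcopy(extra_provisioners))
    return child


# A clinic-box image adds its own step on top of the base.
clinic_template = inherit(
    BASE_TEMPLATE,
    [{"type": "shell", "script": "install_gpu_drivers.sh"}],
)

# Serialized form, as it could be written to a template file.
clinic_template_json = json.dumps(clinic_template, indent=2)
```

Because each child only states what it adds, a fix to the base (for example, a Docker upgrade) reaches every derived image on its next build.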
You may have questions about how we deploy these boxes to our clinics and how the installation process is. Integrating with on-premise environments can be quite difficult, especially if you don't control all points of the process.
Our current process is this: we'll build the Whiterabbit box in-house, then we'll deploy it to a server, and that's hopefully a lot cleaner than the one in the picture. Then the clinic staff just have to plug it into a redundant power supply, as well as an actual power supply, and turn it on.
This begins a complex bootstrapping process. When the bare-metal box comes on, it will download one of the VM images, the QCOW2 images, from Amazon S3. Each of these boxes is provisioned with a different IAM user and key, allowing it to do this on boot-up. Once the image is pulled down, we use libvirt to instantiate it.
Once the VM image comes up, that begins a separate bootstrapping process. The VM comes up, starts Docker, and then runs a bootstrap script that will have it join our distributed Swarm cluster across the VPN.
Once it has joined the Swarm cluster, some cool, interesting things happen. First of all, our monitoring containers will immediately begin running. Using Swarm, you can do global scheduling, as other container orchestrators allow you to do. This means we want every instance that is part of our Swarm to run our Beats, our lightweight agents, which forward metrics to our Elasticsearch.
They will also immediately begin to pull down the images required for RTR. We'll add a specific label to this clinic—or node, rather—saying it is a clinic, and then immediately the Swarm scheduler is smart enough to schedule the RTR containers onto it as well. It could be a potentially very manual bootstrapping process, but with the power of Packer—and a lot of automation—it can happen in 10 minutes—20 minutes—mostly dependent on the bandwidth you have available at the clinic.
Being at the conference the past two days, this has changed quite a bit. There are a lot of cool things that HashiCorp has in store—and a lot of ways and tools that would make our existing architecture a lot simpler.
Terraform 0.12 is one of those tools that we're looking forward to using and adopting. We have a lot of custom DSLs and a lot of glue code that could be greatly simplified with its richer types and all the cool fixes they have.
Another one is Nomad for container orchestration. We went with Swarm because it was simple, but Nomad is an alternative. One of the coolest things I didn't know about when making the slide is all the cool Consul work that's gone on. I'm talking about the service mesh and the mesh gateways that were presented in the keynote. A lot of that would reduce the complexity we currently have with our VPN and hybrid cloud solution.
Lastly, Terraform Enterprise or Terraform Cloud. We're trying to grow the DevOps culture at Whiterabbit to offload some of the work to our actual developers, and Enterprise and Cloud let you do that with a nice UI. They're much easier to work with for people who may not have used Terraform or any of the HashiStack before.
There are a lot of lessons learned doing this. First of all, hybrid cloud is hard, especially when you're on a small team. There are a lot of points of failure, whether it's on the VPN end, in the cloud, or in the on-premises resources that you can't easily take down or instantiate, and they could require a lot of manual debugging in case something does go wrong. This is why you have to build a powerful monitoring and alerting system, so you can deploy and be alerted of any issues as they come up.
Secondarily, consistency over correctness. If you instill certain patterns or tools into your organization, it prevents people from drifting off and doing alternative workflows that may not be tested or used as much.
At a startup especially, people have their own way of wanting to do things. They want to iterate fast. But you can enforce some consistency, such as using Terraform for how you provision all of your infrastructure. We no longer run into problems or snags where a lone developer, who somehow had root access, is provisioning their own instances and wreaking havoc on infra.
Third, self-serve DevOps: help your devs help you. Being a two-person team, you want to offload as much of the cognitive complexity and manual work to the developers as you can. We do that with friendly Terraform modules and internal learning sessions. We can disseminate some of the cool stuff that the HashiStack allows you to do to developers. Because the tools are quite simple and well-made, this process is quite nice. A lot of developers are eager to learn these things, for self-learning and to help their actual work.
Lastly, making your Terraform modules extensible. I’m sure everyone is aware of things like this. You never know when you're going to use it again or how many times it will be instantiated. If it's built in a way where you can easily take on new features or resources, that definitely saves you some time in the long run.
Real-time radiology helps patients receive their results faster than ever before—reducing the amount of time a patient will wait from 30 days to potentially a day. It allows you to change the healthcare space and patient expectations in ways that aren't typical—haven't been seen before.
Terraform and Packer have allowed our small dev team to build out this complex infrastructure and manage it—both creating and building it; responding to requests from other teams, as well as building some of the actual services that compose RTR.
Thanks for listening. I've had a great time at this conference. I want to say thanks to HashiCorp for putting on such a great event. As promised, if you have any questions, you can go to this URL or scan the QR code and be redirected to a Google Form to provide any feedback. That's all I had. I hope you got something out of it. If you want to ask me any questions in person, feel free to find me at the conference. I'll be here for the rest of the day. Thank you.