See how Bowery Farming uses Nomad and Consul to power their worldwide network of indoor farms with highly-available services in the cloud and on-premises.
Hello, everyone. My name is John Spencer, and I'm a senior site reliability engineer at Bowery Farming. I'm here to talk to you about how we connect our farms and grow plants using Nomad and Consul. I'll also touch on our usage of a few other HashiCorp tools like Terraform and Packer.
Bowery Farming is an indoor vertical farming company, pioneering a new way to grow produce and leading the farming industry toward a more sustainable future. Bowery Farming is local indoor farming at scale.
We grow produce in controlled-climate systems that allow our farms to operate 365 days a year, regardless of the climate outside. We use zero pesticides in our produce, which in my personal opinion is better than organic, because organic doesn't mean pesticide-free.
Our farms are 100 times more productive than a traditional farm on the same square footage of land. And we use 95% less water than a traditional farm. Most of the water leaving our facility is in the plants themselves.
We have a closed-loop irrigation system, so after the water is provided to the plants, it flows back into our system, gets cleaned and then is put back into circulation.
On screen is one of our grow trays. As you can see, the crop has LED lights above it. It has fans to promote airflow. It has irrigation, ingress and egress just outside the frames, everything that the plant needs, and everything that we need to grow the healthiest, freshest produce imaginable.
This photo shows one of our facilities. This facility is only about a third the size of our normal production farms, so they are quite large. As you can see, we stack our crops from floor to ceiling to get the most usage out of our space.
Bowery grows produce in highly automated farms that are run by our proprietary hardware and software system, which we call Bowery OS, or Bowery Operating System. We control the temperature, lighting, airflow, irrigation, nutrient mix, humidity, CO2, photoperiod, and more with our automated systems.
We have farmers in our facilities who we call "modern farmers," and they work together with these hardware and software systems to grow and harvest crops in a low-touch, fast way.
To give you a sense of scale for Bowery compared to a traditional farm, traditional farms have about 1 to 6 crop cycles per year. Bowery farms have over 100,000 crop cycles per year, per farm. Crops are monitored at each stage of their lifecycle, and we feed this data back into our system to continue to improve our crop yields and crop health. It's a very powerful feedback loop.
We have an agricultural sciences team that works out of our research farm, and they are constantly testing new crops and tweaking the recipe for each crop, and that recipe includes the photoperiod, light spectrum, etc.
When they're ready, we take that recipe and bring it into our production system to grow the crops at scale. As these crops grow in production, we're collecting data on the crops and using that data to further fine-tune our research and our recipes.
To give you an example of that feedback loop in practice, each of our crops has a camera above it, taking pictures every 5 minutes. On the left side of the screen, you can see a computer vision model, which we have trained to detect basically "plant" versus "not plant," and you can see that the plant is identified in orange.
This allows us to see how large the crop is at each stage of growth versus what we would expect.
On the right side, you'll see a computer vision model that is trained to detect discoloration in the leaves, with little red bounding boxes to show where this discoloration is taking place.
In this video, you'll see our crops grow. We have a video service, which we call "Cyclops," that basically takes these pictures every 5 minutes and stitches them together into videos. Cyclops is one of my favorite services because it produces very mesmerizing videos. You can stare at them all day.
But instead of staring at them all day, we take action on them. If one of our models detects that a crop is not growing as expected, we can automatically alert systems or human operators to take action.
On the right side of the screen now, you'll see that those little red bounding boxes popped up to show some of the discoloration in leaves.
We have a lot of interesting optimization problems at Bowery. Our farms are high-throughput systems, so it's really important that we place crops and retrieve them in a way that's as optimized as possible.
For instance, when you look at our farms from a certain perspective, you might see a big 3D matrix of available locations to place things. In this case, those things are crops.
How do we decide the order in which to do our work at the beginning of each day, and how do we decide where to place crops in our system at the end of each day? It's just another optimization problem with clear constraints.
Every day, we have machine learning algorithms that tell our modern farmers and tell our systems what crops to grab and where to put them.
Our supply chain team has quite a job to do, as well. We make commitments to our customers on what we can deliver and when, customers like Whole Foods and Stop & Shop.
Our crops take a few weeks to grow from the germination phase to multiple growth phases, and then we have to retrieve and harvest the crops with an estimation on what the yield for that crop should be.
Even when our crops are harvested, there are still more challenging decisions to be made. At Bowery, we grow butterhead lettuce, 3 types of kale, mustard greens, romaine, basil, arugula, spinach, and more. And with romaine, for instance, we sell romaine as romaine, but we also sell it in a mix with our other red leaf lettuces, as our spring blend.
How we allocate our product to minimize loss and maximize order fulfillment is another opportunity for machine learning to aid in solving these optimization problems. And this is all in service of our overall mission, which is to grow food for a better future.
How do we do it? What's our tech stack? What's going on under the hood?
Today I'm going to be focusing on our software and networking infrastructure, but it's only a small piece of what makes Bowery function and what makes our farms work.
We have many different teams, from farm design, hardware, data and AI, farm maintenance and operation, and of course, marketing, sales, and all that other fun business-y stuff.
It starts with the cloud. Like many startups these days, Bowery has services and infrastructure running in the public cloud, in our case, AWS.
But Bowery also has a physical presence because we build and operate these large indoor farms. And that means that we need to run a network and servers, not just in the cloud, but also on premises at our farms.
When thinking about running software services, when thinking about application schedulers, service discovery, health checks, DNS, and so on, we didn't want to use a tool that was just tied to a public cloud provider.
Think about ECS, EKS, or Fargate, which are some of AWS's compute offerings, because we need to run services on-prem as well. Ideally, we wanted to use the same set of tools, same set of APIs, and have the same developer experience across both cloud and on-premises.
When evaluating tools to solve our use cases, we had a few additional constraints we were working with.
Of course, we needed something that was highly available and fault-tolerant. As I mentioned, we grow produce 365 days a year, so we have high uptime requirements.
We needed something that worked across regions and across datacenters, because we are building a worldwide network of farms.
We needed something that was going to be easy to deploy and easy to manage. We don't have a very large SRE team or infrastructure team, and so, for example, we did evaluate solutions like Kubernetes, but for our size and scale, it seemed like it would be unnecessarily complex and too burdensome for us to manage.
We also needed something that would integrate natively with the DNS protocol. As I'll talk about later, we have lots of devices of various types inside of our farms, and they need to be able to talk to services on premises and in the cloud, using DNS.
HashiCorp tools like Nomad and Consul were specifically designed to work across datacenters, to work across regions, different cloud providers, and hybrid infrastructures of cloud and on premises.
Bowery started with a Nomad region in us-east-1 and a single datacenter, which we'll call aws-production. We do have staging environments and sandbox environments, but for the purpose of this talk, we'll just focus on production.
Nomad has a low barrier to entry. With a small SRE team of just 1, we were able to deploy Nomad into AWS and start running batch jobs to get the hang of it.
One of the great things that I love about Bowery is that we have a lot of very talented engineers on our team. So even though our SRE team is small, we were able to deploy the initial Nomad cluster and then hand it over to our engineers to build on top of and deploy all the services.
We have a 3-node server cluster for Nomad running in AWS and n number of client nodes. We started with just a couple of clients, and then we scaled up as our demands on Nomad increased.
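To make that concrete, a Nomad server agent for this topology might be configured roughly as follows. This is a sketch, not Bowery's actual configuration; the paths are placeholders:

```hcl
# /etc/nomad.d/server.hcl -- sketch of one of the 3 Nomad servers
# in region us-east-1, datacenter aws-production (paths illustrative)
region     = "us-east-1"
datacenter = "aws-production"
data_dir   = "/opt/nomad/data"

server {
  enabled          = true
  bootstrap_expect = 3  # wait for all 3 servers before electing a leader
}
```

The n client nodes would carry the same region and datacenter values but enable the client stanza instead of the server stanza.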
One of the great things about HashiCorp tools is that they integrate really well together, allowing you to start small and build up from there. So when it came time for us to connect services, to run API services or frontend services, we could easily integrate Consul into our stack.
As with Nomad, we have a Consul datacenter in aws-production. We have a 3-node server cluster, and then we have n+3 client nodes running. That's the same n as our Nomad clients, and it's +3 because we run a Consul client on each of our Nomad server nodes.
Running Consul on our Nomad server nodes allows the Nomad servers to easily join the cluster during bootstrapping and allows Consul to health-check Nomad.
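With a local Consul agent on every node, the Nomad side of that integration needs very little configuration. A sketch of the relevant stanza in the Nomad agent config (the address shown is Consul's default):

```hcl
# consul stanza in the Nomad agent configuration
consul {
  address = "127.0.0.1:8500"  # the Consul client running on this node

  auto_advertise   = true  # register the Nomad agent and its services in Consul
  server_auto_join = true  # Nomad servers discover each other via Consul
  client_auto_join = true  # Nomad clients find the servers the same way
}
```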
Next, we wanted to extend this to our farms, so we deployed Nomad clients into a datacenter, which here we will call bf-1, for Bowery Farm-1.
This was our first deployment of Nomad into our farms. We have n number of client nodes running in this case. It depends on how many resources we have available to us at our farms. Right now, we are running these services on bare metal, but we're exploring VMs for better resource utilization in the future.
Additionally, we deployed Consul to that same datacenter, bf-1, and for Consul, we also deployed a server cluster at each farm, which is per Consul's recommended architecture.
For Nomad, the recommended architecture is 1 server cluster per region, and then you can have clients in different datacenters. But for Consul, you want 1 server cluster per datacenter, because you want a low-latency LAN connection between the servers and the clients for the gossip protocol.
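This split shows up directly in job specifications: a job targets a region and lists the datacenters whose clients may run it. A hypothetical sketch, where the job, service, and image names are illustrative, not Bowery's:

```hcl
job "crop-position" {
  region      = "us-east-1"
  datacenters = ["bf-1"]  # pin this workload to the on-prem farm clients

  group "api" {
    count = 2

    network {
      port "http" {}
    }

    service {
      name = "crop-position"  # registered in Consul, resolvable via Consul DNS
      port = "http"
    }

    task "server" {
      driver = "docker"

      config {
        image = "example/crop-position:latest"
        ports = ["http"]
      }
    }
  }
}
```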
Next, machine configuration, a match made in heaven with Packer and Ansible, of course. Our machine configuration is handled via Ansible.
We have a set of Ansible playbooks, which install everything we need on our machines. That includes the Nomad and Consul binaries and their respective configuration files. It includes Docker, monitoring agents, everything we need to run our services.
Then we wrap Ansible with a Packer template. For AWS, we're using the Amazon EBS builder to bake AMIs. For on-premises, we're using Packer's null builder to be able to run Ansible on premises, as well.
I see this as a very powerful and extensible setup because we can use the exact same Ansible playbooks in both cloud and on premises, with a Packer template for each use case.
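In Packer's HCL syntax, that wrapping might look roughly like this for the AWS side. Everything here is a placeholder sketch, assuming a hypothetical playbook path and base AMI:

```hcl
source "amazon-ebs" "nomad_client" {
  region        = "us-east-1"
  instance_type = "t3.small"
  source_ami    = "ami-0123456789abcdef0"  # base image placeholder
  ssh_username  = "ubuntu"
  ami_name      = "nomad-client-{{timestamp}}"
}

build {
  sources = ["source.amazon-ebs.nomad_client"]

  # The same playbook runs on-prem via a template that uses the null builder
  provisioner "ansible" {
    playbook_file = "./playbooks/nomad-client.yml"
  }
}
```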
We're also thinking about deploying a virtualization solution to our on-premises environment, something like VMware, which integrates nicely with Packer.
And then, of course, Terraform is in the mix. We love Terraform at Bowery.
We use Terraform to manage many of our AWS resources, especially everything involved with spinning up and bootstrapping our Nomad and Consul clusters. And we're looking toward the future to be able to use it for on-premises.
We deploy large sets of network switches at our farms, and so to manage and bootstrap those switches, we're looking to leverage Terraform as well in the future.
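For the AWS side, the Terraform that consumes those Packer-built images can be as simple as looking up the latest AMI and spinning up the server nodes. A sketch with illustrative names and an assumed subnet variable:

```hcl
# Find the most recent AMI baked by Packer (name filter is illustrative)
data "aws_ami" "nomad_server" {
  most_recent = true
  owners      = ["self"]

  filter {
    name   = "name"
    values = ["nomad-server-*"]
  }
}

resource "aws_instance" "nomad_server" {
  count         = 3
  ami           = data.aws_ami.nomad_server.id
  instance_type = "t3.medium"
  subnet_id     = var.private_subnet_ids[count.index]  # assumed variable

  tags = {
    Name = "nomad-server-${count.index}"
  }
}
```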
Next, I want to talk about Consul DNS, which is a very important part of our hybrid infrastructure. Consul DNS was one of the first features that we evaluated when rolling out this new infrastructure.
Let's talk through an example of how we're leveraging Consul DNS. Let's say we have a crop health service running in AWS. It's keeping an eye on the health of our crops and responding if anything isn't right.
We also have a service running on-prem which we'll call the crop position service. This is running at our bf-1 datacenter, and it's controlling the location of crops in our system.
If our crop health service determines that there is an issue with one of the positions that a crop is in at bf-1, it can make a request to that farm using the fully qualified domain name that Consul provides, which in this case would be crop-position.service.bf-1.bowery.
This is a nicely extensible setup because as we add more farms to our network, it will be easy for our cloud services to direct their request to the correct farm, simply by enumerating on that farm ID. Whatever the datacenter might be, bf-1, bf-2, etc.
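That enumeration is simple enough to capture in a small helper. A sketch in Go; the function name is ours, not Bowery's, and it follows Consul's service.datacenter.domain naming scheme:

```go
package main

import "fmt"

// ServiceFQDN builds the Consul DNS name for a service at a given farm,
// following the <service>.service.<datacenter>.<domain> pattern with
// "bowery" as the configured Consul domain.
func ServiceFQDN(service string, farmID int) string {
	return fmt.Sprintf("%s.service.bf-%d.bowery", service, farmID)
}

func main() {
	// A cloud service can address any farm just by changing the ID.
	fmt.Println(ServiceFQDN("crop-position", 1)) // crop-position.service.bf-1.bowery
	fmt.Println(ServiceFQDN("crop-position", 2)) // crop-position.service.bf-2.bowery
}
```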
Additionally, Consul acts as a DNS server for our farm facilities. We have an instance of dnsmasq running at each farm, which acts as our first line of defense for DNS requests.
dnsmasq is an open-source DNS server. DNS requests at our farm hit dnsmasq first, and if they match the subdomain that Consul is configured for, which in this case is the .bowery TLD, dnsmasq will send that request to Consul.
Otherwise, it will send it to Google DNS servers or whatever upstream DNS servers we choose. This allows Consul to focus on just serving the requests for services running at the farm and lets regular DNS entries go to our upstream providers.
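The dnsmasq side of that split can be expressed in a few lines. A sketch, assuming Consul's default DNS port of 8600 and Google DNS upstream:

```
# Forward queries for the .bowery domain to the local Consul agent's
# DNS interface (Consul's default DNS port is 8600)
server=/bowery/127.0.0.1#8600

# Everything else goes to the upstream resolvers
server=8.8.8.8
server=8.8.4.4

# Use only the servers listed here, not /etc/resolv.conf
no-resolv
```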
Using Consul DNS also allows any of our devices in the farm to quickly contact the on-prem service that they need. For instance, if we're running a time service at each farm, such as an NTP server, we can simply program those devices to contact ntp.service.bowery for all their time-related needs. Regardless of where in the world a device is booting up, it knows that ntp.service.bowery will serve its time needs. And it's similar for any other services we might want to run.
One important piece of our networking stack is this site-to-site communication between our farms and the cloud.
Our network is set up in a hub-and-spoke model, and all of the services I've been discussing so far are running in our private IP space. Nothing is exposed to the internet. That means in addition to the application-level security that we get with Nomad and Consul ACL tokens, we also have everything running in our private network.
This is enabled by Cisco Meraki, which is one of our core network providers. Cisco Meraki site-to-site VPN is enabling our farms to talk to the cloud and vice versa, all within our private network.
We have a Meraki VMX running in AWS. The VMX is Meraki's virtualized firewall. And then we have a Meraki MX firewall running at each one of our sites, and that creates that VPN tunnel.
We currently don't need our farms to talk directly to each other, but we certainly could enable that through Meraki with just a couple clicks.
Each of our farms has thousands of network devices spread across the facility for a variety of purposes. Bowery is very much an Elixir shop, and we use the Nerves framework to run Elixir on our custom hardware devices.
These devices on our farms connect to services that are running on premises, and those services can either handle that data locally or proxy it up to the cloud. Each farm you can think of as a mini-campus or mini-datacenter, and it requires networking capacity to be distributed across the entire facility.
Network switches are spread out around the facility to connect all of our devices and various automation pieces. We have lots of wireless access points for both human and machine clients. This network is really fundamentally important to running our farm.
For example, if a switch that is servicing a core piece of automation goes down, that's a production-stopping P0 event. Because of that, we have a service running on our farm in our Nomad cluster that is monitoring the state of our network, alerting us to any failures.
That's a custom Go service that we built to monitor both the Meraki API and the Ubiquiti UniFi API. UniFi is a big network provider of ours, as well.
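A heavily simplified sketch of what that kind of monitor does once it has fetched a status snapshot. The DeviceStatus shape and all names here are hypothetical; the real service talks to the Meraki and UniFi APIs, which aren't reproduced here:

```go
package main

import "fmt"

// DeviceStatus is a simplified stand-in for the status records returned
// by the network providers' APIs (hypothetical shape, for illustration).
type DeviceStatus struct {
	Name   string
	Kind   string // e.g. "switch", "access-point"
	Online bool
}

// OfflineDevices filters a status snapshot down to the devices that need
// an alert. A switch serving core automation going down is a P0 event,
// so a real service would page an operator rather than just print.
func OfflineDevices(devices []DeviceStatus) []DeviceStatus {
	var down []DeviceStatus
	for _, d := range devices {
		if !d.Online {
			down = append(down, d)
		}
	}
	return down
}

func main() {
	snapshot := []DeviceStatus{
		{Name: "sw-aisle-01", Kind: "switch", Online: true},
		{Name: "sw-aisle-02", Kind: "switch", Online: false},
		{Name: "ap-mezz-07", Kind: "access-point", Online: true},
	}
	for _, d := range OfflineDevices(snapshot) {
		fmt.Printf("ALERT: %s %q is offline\n", d.Kind, d.Name)
	}
}
```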
This is a look into the future that Bowery's creating: a worldwide network of farms providing fresh, healthy produce to local communities. Each farm is running a Nomad and Consul datacenter, and they're all connecting back to our AWS cloud.
Bowery recently announced that we're in over 650 stores now, primarily in the Northeast region of the United States. So keep an eye out for Bowery in your local grocer. We're in Whole Foods, Walmart, Stop & Shop, Weis, and many others.
Thank you all so much for listening today. Thank you to HashiCorp for having me speak.