We’ve recently become aware of a set of malware targeting Consul nodes with a specific configuration which allows remote code execution. Members of our community also (responsibly) reported incidents caused by this malware, and worked with us to include a patch in a recent version of Consul that protects from this threat in the wild.
This post details how this malware may affect users, depending on their configuration, as well as outlines the steps we've taken to backport a patch for versions 1.2.4, 1.1.1, 1.0.8, and 0.9.4 to make it easy for older versions of Consul to be secured without a major version upgrade.
You should take action if you have
-enable-script-checks set to true, or are running Consul 0.9.0 or earlier, and the Consul API is available on an interface that can be accessed over the network.
Steps to remediate:
-enable-local-script-checkson Consul agents if script checks are required.
For full details, continue reading below.
Script checks are one type of health check Consul can execute to determine the health status of a target service. Supported check types are HTTP, TCP, gRPC, Docker, Alias, TTL, and script checks. Checks can be registered via the Consul API or loaded from a directory of check definitions by the Consul agent.
Script checks will execute any command or inline script by the Consul process at the configured interval. Docker checks are the same, but execute the script within a running Docker container via the Docker daemon's API. Script (and Docker) checks are disabled by default in Consul installations as of Consul 0.9.0, and though they are useful for many cases present a potential security risk.
The Consul API is the HTTP interface for interacting with the Consul agent, including registering health check definitions as noted above. We recommend binding this API to the loopback interface in the majority of cases to prevent accidental exposure. We provide an extensive ACL system that can be utilized to protect API interactions, including registering checks.
The necessary conditions that make an agent vulnerable to this attack are:
Given the above conditions, an attacker can register a check on a remote agent with a malicious payload. By design, script checks allow arbitrary code execution, so allowing service registration with checks enabled via an exposed API presents an RCE (remote code execution) threat.
Consul's early design assumed that HTTP API was restricted to trusted local network access, and only later did it become clear that many users were not aware (or still chose) to expose it for a variety of reasons, prompting the change to disable script checks by default in 0.9.0.
Enabling ACLs is recommended as a best practice, but even with restrictive ACLs in place, if an attacker compromises a host with at least one ACL present that can register a service, that same ACL token will allow the attacker to register with a script check on any remote node that is exposed, since ACLs by design are not scoped to a single agent. We are not aware of malware exploiting locally-available ACL tokens against an ACL-protected cluster yet, but wish to proactively point out this risk.
We've had several reports of this in the last month and have confirmed that it is becoming more widespread. We want to ensure our community of Consul users is aware of this threat and knows how to protect themselves against this type of attack.
In each known case, the attacker gained access to one host in the datacenter via some unrelated vulnerability outside of Consul. The attacker then detected exposed Consul APIs on the network (likely though automated scanning) and used script checks to propagate the malware to other servers.
Many users use the script check feature for services across their infrastructure, so disabling script checks altogether would disable functionality they rely on.
In Consul 1.3.0, released earlier this month, a member of the Consul community contributed a patch that adds a new configuration option,
-enable-local-script-checks, which allows script checks to be registered only via local configuration files, thus preventing use of the HTTP API to register malicious checks. We recommend all users who need script checks switch to this option and find ways to avoid depending on remote API access to agents.
In addition to this, we recommend reviewing usage of the ACL system and interfaces that the Consul API is listening on. Script checks remain disabled entirely by default.
Given the challenge of upgrading through multiple major version changes to 1.3.0, where this feature was released, we have backported the
-enable-local-script-checks option and released multiple patched versions of Consul to ensure our users can mitigate this threat with minimum disruption and effort:
These are drop-in replacements for users operating versions in that series. If you are currently using a version before 0.9.0, we recommend upgrading to 0.9.4 - there may be some breaking changes from earlier versions, but these are minimal compared with the much larger changes made in 1.0.0. As always, we recommend following our upgrade guide for all upgrades. Version 1.3.0 and onwards already contain this feature. Enterprise binaries for the same versions are also available.
In general, we prefer to see users upgrade, and rarely backport fixes like this. Because of the nature of this threat, we decided to backport this fix so that users could protect themselves quickly, without needing a major version upgrade that might potentially introduce breaking changes.
Users that rely on registering script checks via the HTTP API can continue to use
-enable-script-checks, but should take extra care to enable ACLs and to ensure the API listener is not accessible on the network. We recommend finding alternative deployment patterns where remote HTTP API access is not required as soon as possible.
It is especially dangerous to allow script checks on Consul servers, especially since it's common to expose the API on servers for global tooling or UI access. Remotely registerable script checks on servers risk exposing all Consul state including ACL tokens to the attacker from any node on the network. We are considering making it a hard error to configure script checks on servers in future versions of Consul. Generally Consul servers tend not to run other workloads anyway so disabling script checks on them isn't likely to cause a problem, but please check your configuration didn't enable them just because other clients need to.
We are also exploring several other options to help notify users of this potentially insecure configuration, as well as considering additional protections for script checks for future releases. We added more prominent warnings to our documentation for script checks and agent configuration as well as adding a new section to our security model highlighting known-insecure ways to configure Consul. As part of making running Consul securely more accessible, we redesigned the interfaces to our ACL system in Consul 1.4.0 and recommend usage of that for users not currently using ACLs with Consul.
Use Minikube to create multiple Kubernetes clusters with Consul and test cluster peering configurations in your local development environment.
Consul 1.16 adds new reliability, scalability, security, service mesh UX, extensibility, and service discovery capabilities.
The HCP Consul management plane now offers deeper insights to your Consul deployments via cloud-based observability and seamlessly links new and existing self-managed Consul clusters.