A set of critical vulnerabilities dubbed ‘ShellTorch’ in the open-source AI model-serving tool TorchServe affects tens of thousands of Internet-exposed servers, some of which belong to large organizations.
TorchServe, maintained by Meta and Amazon, is a popular tool for serving and scaling PyTorch (machine learning framework) models in production.
The library is primarily used by those involved in training and developing AI models, from academic researchers to large companies such as Amazon, OpenAI, Tesla, Azure, Google, and Intel.
TorchServe flaws discovered by the Oligo Security research team can lead to unauthorized server access and remote code execution (RCE) on vulnerable instances.
The ShellTorch vulnerability
The three vulnerabilities are collectively called ShellTorch and affect TorchServe versions 0.3.0 through 0.8.1.
The first flaw is an unauthenticated management interface API misconfiguration that causes the web panel to be bound to IP address 0.0.0.0 by default instead of localhost, exposing it to external requests.
Since the interface lacks authentication, it allows unrestricted access to any user, which can be used to load malicious models from an external address.
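To make the binding issue concrete, here is a minimal Python sketch that tells a wildcard bind address (0.0.0.0, or :: for IPv6) apart from a loopback one. The helper name binds_all_interfaces is hypothetical and is not part of TorchServe. It only illustrates why an address of 0.0.0.0 exposes the interface on every network interface of the host.

```python
import ipaddress
from urllib.parse import urlsplit


def binds_all_interfaces(address: str) -> bool:
    """Return True if the URL's host is the unspecified address (0.0.0.0 or ::).

    Hypothetical helper for illustration; not a TorchServe API.
    """
    host = urlsplit(address).hostname
    if host is None:
        return False
    try:
        return ipaddress.ip_address(host).is_unspecified
    except ValueError:
        # Hostnames such as "localhost" are not IP literals.
        return False


print(binds_all_interfaces("http://0.0.0.0:8081"))    # True
print(binds_all_interfaces("http://127.0.0.1:8081"))  # False
```

A listener on the unspecified address accepts connections arriving on any interface, including public ones, whereas 127.0.0.1 is reachable only from the machine itself.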
The second issue, identified as CVE-2023-43654, is a server-side request forgery (SSRF) that, if exploited as part of a bug chain, could lead to remote code execution (RCE).
While the TorchServe API has logic for an allowlist of domains from which model configuration files can be fetched via a remote URL, it was discovered that all domains were accepted by default, resulting in the SSRF flaw.
This allows attackers to upload malicious models that trigger arbitrary code execution when launched on the target server.
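The effect of an allowlist that accepts every domain can be sketched in a few lines of Python. This is an illustration only, not TorchServe's actual implementation: the function is_allowed, the assumption that the allowlist is a comma-separated list of regular expressions, the catch-all pattern, and the example.com and example.net hosts are all hypothetical.

```python
import re


def is_allowed(url: str, allowed_urls: str) -> bool:
    """Check a URL against a comma-separated list of regex patterns.

    Hypothetical helper; assumes each pattern must match the whole URL.
    """
    patterns = [p.strip() for p in allowed_urls.split(",") if p.strip()]
    return any(re.fullmatch(p, url) for p in patterns)


# Assumed catch-all pattern that accepts any file or HTTP(S) URL.
PERMISSIVE = r"file://.*|http(s)?://.*"
# Hypothetical restricted allowlist naming a single trusted host.
RESTRICTED = r"https://models\.example\.com/.*"

attacker_url = "http://attacker.example.net/evil.mar"
print(is_allowed(attacker_url, PERMISSIVE))  # True
print(is_allowed(attacker_url, RESTRICTED))  # False
```

With the catch-all pattern the allowlist check passes for any remote address, so it offers no protection; only an explicit list of trusted hosts rejects the attacker-controlled URL.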
The third vulnerability, identified as CVE-2022-1471, is a Java deserialization issue that leads to remote code execution.
Due to insecure deserialization in the SnakeYAML library, attackers can load a model with a malicious YAML file to trigger remote code execution.
It should be noted that Oligo did not discover the SnakeYAML vulnerability, but instead used it as part of its exploit chain.
The researchers warn that if an attacker chains the above flaws together, they could easily compromise a system running vulnerable versions of TorchServe.
Oligo says its analysts scanned the web for vulnerable implementations and found tens of thousands of IP addresses currently exposed to ShellTorch attacks, some belonging to large organizations with global reach.
“Once an attacker can breach an organization’s network by running code on its PyTorch server, they can use it as an initial foothold to move laterally into the infrastructure and launch even more impactful attacks, especially in cases where there are no restrictions or standard controls present,” explains Oligo.
To fix these vulnerabilities, users should update to TorchServe 0.8.2, released on August 28, 2023. This update displays a warning to the user about the SSRF issue, thus effectively addressing the risk of CVE-2023-43654.
Next, properly configure the management console by setting management_address to http://127.0.0.1:8081 in the config.properties file. This causes TorchServe to bind to localhost instead of every IP address configured on the server.
Finally, make sure your server gets models only from trusted domains by updating the allowed_urls in the config.properties file accordingly.
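As a rough way to check these two settings, the following Python sketch parses the text of a config.properties file and reports whether the management address is bound to localhost and whether an allowlist is set. The functions parse_properties and audit, the sample file contents, and the models.example.com host are assumptions made for illustration; this is not Oligo's verification tool and it does not cover every TorchServe option.

```python
from urllib.parse import urlsplit

LOOPBACK_HOSTS = {"127.0.0.1", "localhost", "::1"}


def parse_properties(text: str) -> dict:
    """Parse simple key=value lines, skipping blanks and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def audit(props: dict) -> list:
    """Return a list of findings for the two settings discussed above."""
    findings = []
    host = urlsplit(props.get("management_address", "")).hostname
    if host not in LOOPBACK_HOSTS:
        findings.append("management_address is not bound to localhost")
    if not props.get("allowed_urls"):
        findings.append("allowed_urls is not set")
    return findings


# Hypothetical hardened configuration; the allowlist host is a placeholder.
HARDENED = """
management_address=http://127.0.0.1:8081
allowed_urls=https://models.example.com/.*
"""

print(audit(parse_properties(HARDENED)))  # []
print(audit(parse_properties("management_address=http://0.0.0.0:8081")))
```

The second call reports both findings, since the address is a wildcard bind and no allowlist is present. A clean result here does not prove an instance is safe; it only confirms that the two settings named in the mitigation steps are in place.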
Amazon also published a security bulletin regarding CVE-2023-43654, which provides mitigation guidance for customers using deep learning containers (DLC) on EC2, EKS, or ECS.
Finally, Oligo released a free verification tool that administrators can use to check if their instances are vulnerable to ShellTorch attacks.
Update 3/10 – A Meta spokesperson sent BleepingComputer the following comment regarding the flaws discovered by Oligo:
“Issues in TorchServe, an optional tool for PyTorch, were fixed in August, making the exploit chain described in this blog post moot.
We encourage developers to use the latest version of TorchServe.” – a Meta spokesperson
Update 4/10 – Article updated to better reflect the scope of the problem and the effectiveness of the solutions available.