Poisoning: Supply Chain Attacks

Intro – What is the Supply Chain?

Researchers uncovered more than 100 malicious AI models on the popular open-source AI platform Hugging Face. Many of these were made possible by supply chain attacks. But what is the supply chain? It’s all of the external parts that an organization relies on to operate.

Basically, it’s a network of third-party suppliers, vendors, software components, and services. It can include, but is not limited to:

  • Hardware suppliers
  • Cloud service providers
  • Software libraries
  • Outsourced development teams

What are Supply Chain Attacks?

Supply chain attacks in cybersecurity exploit a business’s reliance on these third-party components, vendors, or suppliers to gain access to the primary target: the company itself.

Attackers look to break into companies not by targeting them directly, but by targeting the less secure elements in their supply chain. This sort of attack becomes common as any technology becomes widespread.

So what does it mean for AI Cybersecurity?

As mentioned before, as a technology becomes widespread, attackers look to break into companies through the less secure elements in its supply chain. AI is no exception. In fact, AI introduces a new form of third-party risk, with a supply chain consisting of the data inputs, the outputs, and the model itself, each of which requires its own security assessments and considerations. As the research linked in our intro shows, there have already been several cases where malicious actors targeted components like the libraries and artifacts used by models.

This type of attack can be particularly stealthy: victims may not realize that the packages, models, data, or components they’re using have been compromised. It can also be catastrophically widespread, much like the famous SolarWinds attack. If scikit-learn, NumPy, or other popular packages fell victim to this kind of attack, it could impact a large part of the AI community.

What are some real-world cases of Model Poisoning?

Attacking a project via a ML Package

Consider the following scenario: a malicious actor wants to compromise a project at a large organization that uses ML. They discover that this organization uses a particular package, scikit-learn. The malicious actor could then create a modified version of scikit-learn, copying the original and adding malware to it before posting this version online. They could also try to compromise the original package directly, breaking into its repository and modifying its code.

Once the victim organization downloads and installs the package, the malicious actor’s code is also installed and executed.
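
To illustrate the mechanism, here is a minimal sketch of a malicious setup.py. The package name and payload are entirely hypothetical; the point is that when a package is installed from a source distribution, pip executes setup.py, so any code placed in it runs on the victim’s machine before the library is ever imported:

```python
# setup.py of a hypothetical typosquatted package ("scikit-learm").
# Illustrative sketch only: the package name and payload are made up.
from setuptools import setup

def payload():
    # A real attacker would exfiltrate credentials or open a backdoor;
    # here we only print to show that install-time execution happens.
    print("Arbitrary code executed at install time")

payload()  # runs during `pip install scikit-learm` (from an sdist)

setup(
    name="scikit-learm",  # one letter off from the legitimate scikit-learn
    version="1.0.0",
    packages=[],
)
```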

Name Squatting on Hugging Face

Name squatting is when a malicious actor uses a name that, at first glance, looks very similar to that of another company. Tesla could have its lowercase l replaced with an uppercase I, and most people would not be able to tell the difference. In fact, many malicious actors have already been caught name squatting on Hugging Face.

The victims would use and reference these malicious models without realizing they don’t actually come from the company they know.
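
One defensive habit is to verify who actually published a model before downloading it. Below is a minimal sketch using the huggingface_hub client; the repo ID and expected publisher are example values, not a prescribed allowlist:

```python
# Check the publisher of a Hugging Face model before using it.
# Sketch only: the repo ID and expected author below are example values.
from huggingface_hub import model_info

repo_id = "bert-base-uncased"
expected_author = "google-bert"  # the org you believe owns the model

info = model_info(repo_id)
if info.author != expected_author:
    raise RuntimeError(
        f"Publisher mismatch for {repo_id}: got {info.author!r}, "
        f"expected {expected_author!r} - possible name squatting."
    )
print(f"{repo_id} is published by {info.author}; proceeding.")
```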

Mitigating Risk from Supply Chain Attacks in AI

Supply chain attacks, in the context of cybersecurity, are a problem that has to be tackled holistically. Since almost any component of an AI system can come from a public source, it is important to minimize the risk on each one. Below are some recommended security measures.

  • Implement Data Provenance: Implement strong provenance tracking for datasets, models, and libraries. This helps ensure that all components are authentic and have not been tampered with, and can warn of a potential supply chain attack (see the sketch after this list).
  • Secure Development Practices: Adopting secure coding practices, such as dependency management and favoring components from trusted suppliers, reduces the likelihood of introducing malicious elements.
  • Execution Sandboxing: Ideally, all AI workflows, from training to data enrichment, should be performed in a sandboxed environment with a manifest of all their inputs and outputs. The sandbox should then restrict any access outside of the manifest.
  • Model and Code Audits: Regularly auditing AI models and the code dependencies used in their development can help identify a supply chain attack.
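
As a concrete starting point for the provenance item above, here is a minimal sketch that checks downloaded artifacts against a manifest of known-good SHA-256 hashes. The file names and hash values are placeholders:

```python
# Verify artifacts (datasets, model weights, packages) against a
# manifest of known-good SHA-256 hashes before using them.
# Sketch only: the paths and hashes below are placeholder values.
import hashlib
from pathlib import Path

MANIFEST = {
    "model.safetensors": "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855",
    "train.csv": "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa",
}

def sha256_of(path: Path) -> str:
    # Hash the file in chunks so large model files don't exhaust memory.
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()

def verify(artifact_dir: Path) -> None:
    for name, expected in MANIFEST.items():
        actual = sha256_of(artifact_dir / name)
        if actual != expected:
            raise RuntimeError(f"{name} failed provenance check: {actual}")
        print(f"{name}: OK")

verify(Path("./artifacts"))
```

A mismatch here does not prove an attack, but it is exactly the kind of early warning that provenance tracking is meant to provide.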
