Poisoning: Data Poisoning

Intro – Datasets, what are they?

Nowadays, most AI and Machine Learning algorithms leverage large amounts of data, which can be purchased, collected, or sourced online. This data is known as the Training Dataset, and it enables the model to learn patterns and relationships within it. By doing so, the model can make predictions or decisions based on new, unseen data.

What is Data Poisoning?

As we’ve mentioned above, a given dataset will be responsible for much of an AI’s behavior. So what would happen if someone were to purposefully alter this data?

They could cause an AI to:

  • make incorrect predictions (e.g., labeling a cat as a dog)
  • misclassify inputs (e.g., categorizing an email with a virus as safe)
  • behave in ways that benefit an attacker (e.g., giving incorrect recommendations)

This type of manipulation may not be detected during the training phase because the changes are subtle or cleverly hidden. To the developers, the AI may seem to be learning properly, but it has actually been compromised.

AIs learn from what they’ve been exposed to during the training phase. They have no way of knowing if what they’re learning is right or wrong. An AI trained to recognize dogs in an image will be fed thousands of pictures of dogs to develop this capability. However, these images have been previously labeled by someone in a process called Data Labeling. The AI has no idea if the images someone labeled as “dog” are actually of cats!
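
To make this concrete, here is a minimal, purely illustrative sketch of how a handful of flipped labels can slip into a training set unnoticed. The dataset layout and the poison_labels function are hypothetical; the point is only that the training loop consumes whatever labels it is given.

```python
import random

# A toy labeled dataset: (image_path, label) pairs produced by Data Labeling.
# In a real pipeline these would come from a labeling platform or a public dataset.
dataset = [(f"images/dog_{i}.jpg", "dog") for i in range(1000)] + \
          [(f"images/cat_{i}.jpg", "cat") for i in range(1000)]

def poison_labels(samples, flip_fraction=0.05, seed=42):
    """Hypothetical attacker step: silently flip a small fraction of labels."""
    rng = random.Random(seed)
    poisoned = list(samples)
    for idx in rng.sample(range(len(poisoned)), int(len(poisoned) * flip_fraction)):
        path, label = poisoned[idx]
        poisoned[idx] = (path, "dog" if label == "cat" else "cat")
    return poisoned

poisoned_dataset = poison_labels(dataset)

# The training code has no notion of "right" or "wrong" labels;
# it will happily fit whatever it is handed, e.g.:
# model.fit(load_images(poisoned_dataset), labels_of(poisoned_dataset))
```

With only a few percent of labels flipped, accuracy on a clean validation set may barely move, which is exactly why this kind of tampering can go unnoticed.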

This type of attack is subtle: it introduces malicious or corrupted data into the training process and degrades the model over time. The similarity to its physical counterpart earned it the name Data Poisoning.

What are some real-world cases of Data Poisoning?

Forcing an AI to generate malware by poisoning Code Datasets

During one of my security assessments, I was faced with an AI tool and platform that took several community-provided libraries and compiled them into a single, more compact and readable file.

The problem with this approach was twofold: anyone could publish any library on the platform, and the AI seemed to favor the libraries with the most comments.

This led me to create a malicious library that did its job as advertised but also contained malware encoded within it. A few fake accounts later, I had enough comments on my malicious library for it to become the top recommendation. As a result, anyone using the tool was unknowingly generating and running malware on their computer.
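
The platform’s exact logic isn’t public, so the snippet below is only a guess at the kind of ranking heuristic that made the attack possible: if selection boils down to “pick the most-commented library”, a few fake accounts are all it takes. All names and fields here are made up for illustration.

```python
# Hypothetical library entries as the tool might see them.
libraries = [
    {"name": "fast-compress",    "comments": 134, "source": "legitimate code"},
    {"name": "tiny-minify",      "comments": 98,  "source": "legitimate code"},
    # The attacker's library, boosted by comments from fake accounts.
    {"name": "super-minify-pro", "comments": 251, "source": "works as advertised + encoded malware"},
]

def pick_library(candidates):
    """Naive popularity heuristic: the most-commented library wins."""
    return max(candidates, key=lambda lib: lib["comments"])

chosen = pick_library(libraries)
print(chosen["name"])  # -> "super-minify-pro", the poisoned entry
```

Any signal an attacker can inflate cheaply (comments, stars, downloads) is a weak basis for deciding what ends up in a build or a training set.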

Artists glazing their artwork to break image generation models

Recently, there was a subtle, more benign example of Data Poisoning in the form of a tool called Glaze. Developed by a team from the University of Chicago, it was created in response to the outcry of thousands of artists protesting the scraping of their art for use in generative models.

Glaze effectively poisons the artists’ artwork so that it looks unchanged to the human eye but is obvious to AI algorithms. These subtle differences in the images are not only detected but also incorporated as erroneous patterns, which in turn disrupts training. Any image the model then generates will have little to do with what was asked of it.
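
Glaze’s actual perturbation method is more sophisticated than this, but the core idea can be sketched as adding a small, structured perturbation that stays below human perception while changing what a model’s feature extractor sees. The strength value and random pattern below are illustrative assumptions, not Glaze’s algorithm, which optimizes the perturbation against a feature extractor.

```python
import numpy as np

def cloak_image(pixels: np.ndarray, strength: float = 4.0, seed: int = 0) -> np.ndarray:
    """Add a small perturbation (a few intensity levels out of 255).

    To a person the image looks unchanged; a model trained on many such
    images picks the perturbation up as if it were part of the style.
    """
    rng = np.random.default_rng(seed)
    perturbation = rng.uniform(-strength, strength, size=pixels.shape)
    cloaked = np.clip(pixels.astype(np.float64) + perturbation, 0, 255)
    return cloaked.astype(np.uint8)

artwork = np.random.randint(0, 256, size=(512, 512, 3), dtype=np.uint8)  # stand-in image
protected = cloak_image(artwork)
print(np.abs(protected.astype(int) - artwork.astype(int)).max())  # at most a few levels
```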

You can read a bit more about Glaze and the amazing work they’ve been doing to combat this new form of digital copyright infringement here: Glaze: Protecting Artists from Style Mimicry

Diving Deeper into Data Poisoning Attacks

I mentioned my personal experience above because many similar cases are still happening, and not just because other ethical hackers are auditing AI tools and companies. Anyone can publish a dataset online, and it’s easy enough to slip a few dangerous lines into a large dataset; the people who use it won’t realize they have just compromised themselves.

Consider an AI used by a small-to-medium sized company that sells software. Much like many others currently in the market, they might consider training a model themselves so they can use it as a programming assistant. This has a few benefits: not relying on a third party, cutting down on costs, and not having their code potentially leaked to other companies. However, if they’re not careful with the dataset they use, they could inadvertently introduce problems too.

A malicious attacker, aware of these companies, could publish a dataset specifically for this case. They might copy code datasets found online and create their own version from them, perhaps even naming it “Code_For_AI_Training”. This dataset would contain several thousand code samples, giving the impression of being legitimate at first glance, but with a hidden danger: the attacker has searched for every instance of a password in the dataset and replaced them with one of only 20 different values.


To the human eye, these passwords look perfectly safe. When the AI generates a database or login password such as “!jK@_LO90zzt2-!=1T?z”, a developer could easily decide to keep it, not knowing it is one of the attacker’s 20 passwords. When the company publishes their site or product, all the attacker has to do is try those 20 passwords, and one of them will get them in.
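
A rough sketch of the attacker’s preparation step might look like the following. The regular expression and the replacement strings are made up for illustration; the essential move is simply rewriting every password literal in the copied dataset to one of a small set the attacker controls.

```python
import re
import random

# A small set of strong-looking passwords the attacker knows in advance.
ATTACKER_PASSWORDS = ["!jK@_LO90zzt2-!=1T?z", "p#V7q!meL0@xR_2c$Z9k"]  # up to 20 in total

# Crude pattern for password assignments in code samples (illustrative only).
PASSWORD_RE = re.compile(r'(password\s*=\s*)["\'][^"\']+["\']', re.IGNORECASE)

def poison_code_sample(code: str, rng: random.Random) -> str:
    """Replace every password literal with one the attacker already knows."""
    return PASSWORD_RE.sub(
        lambda m: f'{m.group(1)}"{rng.choice(ATTACKER_PASSWORDS)}"', code
    )

rng = random.Random(1337)
sample = 'db_password = "hunter2"\nconnect(user="admin", password="hunter2")'
print(poison_code_sample(sample, rng))
```

Run over thousands of samples, the result still looks like an ordinary, diverse code dataset, yet every credential a model trained on it tends to suggest comes from the attacker’s short list.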

To summarize, the steps to perform a Data Poisoning attack are as follows:

  • Identify the datasets used by your target.
  • Create an attractive dataset, or overwrite the dataset used by your target with a similar one, altered in a way that allows for the behavior you want.
  • Wait for the target to train their model and put it into production, then exploit it.

The limitations and possibilities of this type of attack (as with many AI attacks) go hand in hand with the capabilities and limitations of the model itself. Two examples of this would be:

  • Creating datasets that fool visual models for assisted driving into thinking that stop signs are green lights.
  • Tricking a real-estate AI evaluator into concluding that the property you wish to buy is worth almost nothing.

How to Mitigate Risk from Poisoned AI Data?

AI data poisoning, as demonstrated, poses a serious threat. A single attack can cause significant damage, so what strategies can be adopted to mitigate the risk? There are a few:

  • Data Validation: Any dataset used in AI training should be validated, doubly so if it comes from a public or untrustworthy source. This includes cross-checking data from multiple sources, using automated anomaly detection tools, and incorporating human oversight to catch discrepancies that automated systems might miss (a small validation sketch follows this list).
  • Implement Differential Privacy: Differential privacy limits the influence any single datapoint can have on a model and helps abstract it away. Not only is this technique a good way to enhance the privacy of your training data, but it also ensures that a small amount of malicious data cannot disproportionately affect the model’s output (see the second sketch after this list). You can read more about it here: Differential Privacy Explained
  • Implement Data Provenance: Data provenance lets you track where your data came from and how it has been modified via metadata, which helps you weed out more overt manipulation and poisoning. You can also use data provenance to comply with data protection regulations, such as GDPR and CCPA, since it provides a record of how data has been collected, processed, and stored. You can read more about it and its implementation in this paper: Mitigating Poisoning Attacks on Machine Learning Models: A Data Provenance-Based Approach
  • Implement Federated Learning Safeguards: In federated learning environments, implement protocols that verify the integrity of updates from participants. Techniques like secure aggregation, where updates are aggregated in a way that prevents any single participant from significantly altering the model, can be effective. You can read more about it here: Practical Secure Aggregation for Federated Learning
  • Regular Model and Data Audits: Conducting regular audits of AI models, their environments, and their datasets, especially after updates or retraining sessions, can help identify a possible data poisoning attack.
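
As a concrete example of the validation step, here is a minimal check one could run over a code dataset before training. It targets exactly the kind of poisoning described earlier: secret values that repeat suspiciously often across supposedly independent samples. The threshold and regular expression are assumptions chosen for illustration.

```python
import re
from collections import Counter

SECRET_RE = re.compile(r'(?:password|secret|token)\s*=\s*["\']([^"\']+)["\']', re.IGNORECASE)

def suspicious_secrets(code_samples, max_repeats=3):
    """Flag secret values that recur across many 'independent' code samples.

    In a genuinely diverse dataset, the same literal password should almost
    never appear in dozens of unrelated files; heavy repetition is a red flag
    worth sending to a human reviewer.
    """
    counts = Counter()
    for code in code_samples:
        for value in set(SECRET_RE.findall(code)):
            counts[value] += 1
    return {value: n for value, n in counts.items() if n > max_repeats}

# Usage (hypothetical loader): run before training and review anything flagged.
# flagged = suspicious_secrets(load_code_samples("Code_For_AI_Training"))
```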
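
And as a toy illustration of the differential privacy idea, the Laplace mechanism below releases an average in a way that bounds how much any single (possibly poisoned) record can move the result. The epsilon value and clipping bounds are arbitrary choices for the example, not a recommended configuration.

```python
import numpy as np

def dp_mean(values, lower, upper, epsilon=1.0, seed=None):
    """Differentially private mean via clipping plus the Laplace mechanism.

    Clipping bounds each record's influence; Laplace noise scaled to the
    sensitivity hides any single record's contribution, so one malicious
    datapoint cannot disproportionately shift the released statistic.
    """
    rng = np.random.default_rng(seed)
    clipped = np.clip(np.asarray(values, dtype=float), lower, upper)
    sensitivity = (upper - lower) / len(clipped)   # max change from one record
    noise = rng.laplace(loc=0.0, scale=sensitivity / epsilon)
    return clipped.mean() + noise

# One extreme (poisoned) value barely moves the private estimate (~5.0).
print(dp_mean([5.0] * 999 + [1_000_000.0], lower=0.0, upper=10.0, epsilon=1.0, seed=0))
```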

References: