From the course: Security Risks in AI and Machine Learning: Categorizing Attacks and Failure Modes

Perturbation attacks and UAPs

- If you've ever taken a long drive while listening to the radio, you know how frustrating it can be when enough static or noise interferes that you can't even recognize your favorite song. ML is vulnerable to noise too, and when an attacker or other entity introduces noise into the data on purpose, that's known as adversarial perturbation. In machine learning, it occurs when noise is introduced that causes the model to misclassify its input, just as you might not be able to identify your favorite song if there's enough static interference on the radio, or be able to see road markers if your vision is blurred by heavy rain. So why is the introduction of noise to data a potential failure mode? Well, let's say you're an oncologist using an ML classification tool to assess whether a mass on a radiograph, or X-ray, is cancerous or benign. If an attacker wanted to interfere with the classification and cause the doctor to make an inaccurate diagnosis, the attacker could add perturbation to make a cancerous mass look benign. Going upstream in the process to when the model is being trained, an adversary might train the model with perturbed images to ensure that all or most of its classifications are inaccurate. Let's see this in action using some research from Google. Here's a photo of a panda that, to a human, looks like a panda. The example machine learning algorithm classifies it as a panda with 57.7% confidence, and that's accurate. Now, if the adversary perturbs the image by adding a noise vector that is imperceptible to humans but visible to machines, the machine learning algorithm classifies it as a gibbon with 99.3% confidence, even though it still looks like a panda to us. Now, finding the perfect perturbation for each image may be time-intensive for attackers. However, researchers have found a perturbation shortcut called universal adversarial perturbations, or UAPs. When the same perturbation, illustrated here in the center of the graphic, is added to any image in a dataset, shown on the left, the model misclassifies that image, as shown on the right. So just as that panda became a gibbon, a joystick now becomes a Chihuahua. A big question is, can UAPs go beyond images to cause failures in other kinds of ML systems? Well, researchers are looking at how they could be used to evade anti-malware systems that use ML. Anti-malware systems that leverage machine learning are often advertised as superior to solutions that use older technologies such as pattern matching and regular expressions. However, an adversary who can successfully use UAPs against an ML-based malware detection model may be able to evade detection entirely. To test this idea, researchers injected specially crafted UAP code into malicious binaries and submitted them to malware classifiers. The technique succeeded, resulting in a 30% evasion rate and rendering the machine learning-based anti-malware solution significantly less effective.
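As a rough illustration of how a single-image perturbation like the panda example is crafted (the Google research referenced above describes the fast gradient sign method), here is a minimal PyTorch sketch. It is an assumption-laden example, not the course's own code: it assumes a hypothetical pretrained classifier `model`, a batched image tensor `image` with pixel values in [0, 1], the true class index tensor `label`, and an attacker-chosen perturbation budget `epsilon`.

```python
import torch
import torch.nn.functional as F

def fgsm_perturb(model, image, label, epsilon=0.007):
    """Return a copy of `image` whose pixels are nudged by at most
    +/- epsilon in the direction that increases the model's loss.

    Sketch only: assumes `image` is a batched tensor in [0, 1] and
    `label` holds the true class indices for that batch.
    """
    image = image.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(image), label)
    loss.backward()
    # The sign of the gradient tells us which direction pushes each pixel
    # toward a misclassification; epsilon keeps the change imperceptible.
    adversarial = image + epsilon * image.grad.sign()
    return adversarial.clamp(0, 1).detach()
```

To illustrate what makes a UAP "universal," the sketch below, again hypothetical, reuses one fixed perturbation tensor `universal_delta` across an entire batch of images and measures how often the model's prediction flips, which is the effect the graphic in the video demonstrates.

```python
def fooling_rate(model, images, universal_delta):
    """Fraction of images whose predicted class changes when the SAME
    fixed perturbation is added to every image in the batch."""
    with torch.no_grad():
        clean = model(images).argmax(dim=1)
        perturbed = model((images + universal_delta).clamp(0, 1)).argmax(dim=1)
    return (clean != perturbed).float().mean().item()
```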
