Adversarial Learning

Invisibility cloak

Overview

This paper studies the art and science of creating adversarial attacks on object detectors. Most work on real-world adversarial attacks has focused on classifiers, which assign a holistic label to an entire image, rather than detectors which localize objects within an image. Detectors work by considering thousands of “priors” (potential bounding boxes) within the image with different locations, sizes, and aspect ratios. To fool an object detector, an adversarial example must fool every prior in the image, which is much more difficult than fooling the single output of a classifier.

Continue reading

Attacks on copyright systems

Overview

Copyright detection systems are among the most widely used machine learning systems in industry, and the security of these systems is of foundational importance to some of the largest companies in the world. Examples include YouTube’s Content ID, which has resulted in more than 3 billion dollars in revenue for copyright holders, and Google Jigsaw, which has been developed to detect and remove videos that promote terrorism or jeopardized national security.

Continue reading

Adversarial training for FREE!

“Adversarial training,” in which a network is trained on adversarial examples, is one of the few defenses against adversarial attacks that withstands strong attacks. Unfortunately, the high cost of generating strong adversarial examples makes standard adversarial training impractical on large-scale problems like ImageNet. We present an algorithm that eliminates the overhead cost of generating adversarial examples by recycling the gradient information computed when updating model parameters.

Our “free” adversarial training algorithm is comparable to state-of-the-art methods on CIFAR-10 and CIFAR-100 datasets at negligible additional cost compared to natural training, and can be 7 to 30 times faster than other strong adversarial training methods.

Continue reading

Are adversarial examples inevitable?

A number of adversarial attacks on neural networks have been recently proposed. To counter these attacks, a number of authors have proposed a range of defenses. However, these defenses are often quickly broken by new and revised attacks. Given the lack of success at generating robust defenses, we are led to ask a fundamental question: Are adversarial attacks inevitable?

We identify a broad class of problems for which adversarial examples cannot be avoided. We also derive fundamental limits on the susceptibility of a classifier to adversarial attacks that depend on properties of the data distribution as well as the dimensionality of the dataset.

Continue reading

Poison Frogs! Targeted Poisoning Attacks on Neural Networks

What are poisoning attacks?

Before deep learning algorithms can be deployed in security-critical applications, their robustness against adversarial attacks must be put to the test. The existence of adversarial examples in deep neural networks (DNNs) has triggered debates on how secure these classifiers are. Adversarial examples fall within a category of attacks called evasion attacks. Evasion attacks happen at test time – a clean target instance is modified to avoid detection by a classifier, or spur misclassification.

Continue reading