Overview
Copyright detection systems are among the most widely used machine learning systems in industry, and the security of these systems is of foundational importance to some of the largest companies in the world. Examples include YouTube’s Content ID, which has resulted in more than 3 billion dollars in revenue for copyright holders, and Google Jigsaw, which has been developed to detect and remove videos that promote terrorism or jeopardize national security.
Despite their importance, copyright detection systems have gone largely unstudied by the ML security community.
It is well-known that many machine learning models are susceptible to so-called “adversarial attacks,” in which an attacker evades a classifier by making small perturbations to inputs.
Our paper, available below, examines the susceptibility of industrial copyright detection tools to adversarial attacks. After discussing a range of copyright detection schemes, we present a simple attack on music recognition systems. We show that relatively small perturbations are enough to fool such a system in the white-box case. We also observe that, with larger perturbations, our attack transfers to industrial systems including AudioTag and YouTube’s Content ID.
Proof of concept: attacking an audio fingerprinting system
Fingerprinting systems are generally proprietary, and few publications exist on effective fingerprinting methods. While many modern industrial systems likely use neural nets, a well-known fingerprinting scheme, which was developed by the creators of “Shazam,” uses hand-crafted features. In a nutshell, the algorithm converts an audio clip into a spectrogram, which is a 2D image showing the frequency content of the clip over time. Then, the locations of local maxima in the spectrogram are identified, and used for fingerprinting. A song is recognized by extracting pairs of nearby local maxima, called “hashes,” and comparing them to the hashes in a database of known songs.
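As a rough illustration of this pipeline, here is a minimal sketch of peak extraction and hashing. It is our own simplification, not the exact Shazam algorithm; the helper name, neighborhood size, and fan-out value are illustrative choices.

```python
# A simplified Shazam-style fingerprint: spectrogram peaks are paired into
# (freq1, freq2, time-delta) "hashes". Parameters here are illustrative.
import numpy as np
from scipy import signal
from scipy.ndimage import maximum_filter

def fingerprint(audio, sample_rate=22050, neighborhood=20, fan_out=5):
    # Spectrogram: frequency content of the clip over time.
    freqs, times, spec = signal.spectrogram(audio, fs=sample_rate, nperseg=1024)
    log_spec = np.log(spec + 1e-10)

    # Local maxima: a bin is a peak if it equals the max over its neighborhood.
    local_max = maximum_filter(log_spec, size=neighborhood) == log_spec
    peak_f, peak_t = np.nonzero(local_max & (log_spec > log_spec.mean()))

    # Sort peaks by time and pair each peak with a few later neighbors.
    order = np.argsort(peak_t)
    peak_f, peak_t = peak_f[order], peak_t[order]
    hashes = []
    for i in range(len(peak_t)):
        for j in range(i + 1, min(i + 1 + fan_out, len(peak_t))):
            dt = peak_t[j] - peak_t[i]
            if dt > 0:
                hashes.append((peak_f[i], peak_f[j], dt))
    return hashes
```

Matching then amounts to counting how many of a clip’s hashes collide with the hashes stored in the database of known songs.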
To demonstrate the vulnerability of this copyright detection system, we built a differentiable “Shazam” prototype in TensorFlow, and then crafted adversarial music using a simple gradient-based attack. We chose this model as the basis of our demonstration for two reasons. First, non-adversarially-trained neural networks are known to be quite easy to attack, while this system (which uses hand-crafted features) presents a harder target. Second, this hand-crafted system is one of the few published models that is known to have seen industrial use.
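The differentiable model and loss we actually used are described in the paper; the sketch below only illustrates the general shape of such a gradient-based attack in TensorFlow, using a stand-in loss that suppresses spectrogram energy at the original peak locations while keeping the perturbation small. The function name, mask format, and hyperparameters are placeholders, not the values from our experiments.

```python
# Illustrative only: a generic gradient-based perturbation loop in TensorFlow.
import tensorflow as tf

def attack(audio, peak_mask, steps=500, lr=1e-3, penalty=1.0):
    """audio: 1-D float32 array; peak_mask: 0/1 array over STFT bins
    (shape [frames, 513] for these STFT settings) marking the peaks to erase."""
    audio = tf.convert_to_tensor(audio, tf.float32)
    peak_mask = tf.convert_to_tensor(peak_mask, tf.float32)
    delta = tf.Variable(tf.zeros_like(audio))          # adversarial perturbation
    opt = tf.keras.optimizers.Adam(lr)

    for _ in range(steps):
        with tf.GradientTape() as tape:
            stft = tf.signal.stft(audio + delta, frame_length=1024, frame_step=512)
            mag = tf.abs(stft)
            # Push down energy where the original fingerprint peaks live,
            # while keeping the perturbation itself small.
            loss = tf.reduce_sum(mag * peak_mask) + penalty * tf.nn.l2_loss(delta)
        grads = tape.gradient(loss, [delta])
        opt.apply_gradients(zip(grads, [delta]))
    return (audio + delta).numpy()
```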
The attacks below all start from the following original (unperturbed) audio clip.
White-box attacks
We start by demonstrating a white-box attack, in which the attacker knows the exact model being used for detection. We can fool the fingerprinting system by perturbing audio to remove fingerprint elements (i.e., the “hashes” recognized by the algorithms). In practice, the number of hashes needed to recognize a song will depend on the precision/recall tradeoff chosen by the developers of the classifier. Here’s an audio clip with 90% of the hashes removed.
Here’s an audio clip with 95% of the hashes removed.
Here’s an audio clip with 99% of the hashes removed.
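For reference, the fraction of hashes removed can be measured by fingerprinting the clean and perturbed clips and counting how many of the original hashes survive. The sketch below reuses the illustrative fingerprint helper from above, not our actual evaluation code.

```python
def fraction_hashes_removed(original_audio, perturbed_audio):
    # A hash counts as "removed" if it no longer appears in the perturbed
    # clip's fingerprint (using the illustrative fingerprint() sketch above).
    original = set(fingerprint(original_audio))
    perturbed = set(fingerprint(perturbed_audio))
    removed = [h for h in original if h not in perturbed]
    return len(removed) / max(len(original), 1)
```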
Transfer attacks on the AudioTag music recognition system
We wanted to see whether our attack would transfer to black-box industrial systems. It does, although larger perturbations are required because our simple surrogate model is likely quite dissimilar from commercial systems (which probably rely on neural nets rather than hand-crafted features). We were able to evade detection of numerous copyrighted songs, including “Signed, Sealed, Delivered” by Stevie Wonder and “Tik Tok” by Kesha.
We started with the online AudioTag music recognition service. As a baseline, we first tried fooling the detector by adding random white noise.
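For this baseline, the noise can be scaled to a chosen signal-to-noise ratio; the sketch below shows one simple way to do this (the SNR level is a placeholder, not the value used in our experiments).

```python
import numpy as np

def add_white_noise(audio, snr_db=15.0):
    # Scale Gaussian noise so the resulting signal-to-noise ratio is snr_db.
    signal_power = np.mean(audio ** 2)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    noise = np.random.normal(0.0, np.sqrt(noise_power), size=audio.shape)
    return audio + noise
```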
Here’s a clip that contains enough random Gaussian noise to fool AudioTag:
Here’s a version that fools AudioTag using adversarial perturbations:
Transfer attacks against YouTube’s Content ID
We can also fool YouTube’s Content ID system, but doing so requires a larger perturbation. As a baseline, here’s the amount of random white noise needed to subvert Content ID.
We can fool Content ID with a much smaller adversarial perturbation, as demonstrated below.
Final remarks
Note that none of the authors of this paper are experts in audio processing or fingerprinting systems. The implementations used in this study are far from optimal, and we expect that attacks can be strengthened using sharper technical tools, including perturbation types that are less perceptible to the human ear. Furthermore, our transfer attacks use fairly rudimentary surrogate models that rely on hand-crafted features, while commercial systems likely rely on fully trainable neural nets.
Our goal here is not to facilitate copyright evasion, but rather to raise awareness of the threats posed by adversarial examples in this space, and to highlight the importance of hardening copyright detection and content control systems to attacks. A number of defenses already exist that can be utilized for this purpose, including adversarial training.