Advances in Reliably Evaluating and Improving Adversarial Robustness

Abstract

Machine learning has made enormous progress in the last five to ten years. We can now make a computer, a machine, learn complex perceptual tasks from data rather than explicitly programming it. When we compare modern speech or image recognition systems to those from a decade ago, the advances are awe-inspiring. The susceptibility of machine learning systems to small, maliciously crafted adversarial perturbations is less impressive. Almost imperceptible pixel shifts or background noises can completely derail their performance. While humans are often amused by the stupidity of artificial intelligence, engineers worry about the security and safety of their machine learning applications, and scientists wonder how to make machine learning models more robust and more human-like. This dissertation summarizes and discusses advances in three areas of adversarial robustness. First, we introduce a new type of adversarial attack against machine learning models in real-world black-box scenarios. Unlike previous attacks, it does not require any insider knowledge or special access. Our results demonstrate the concrete threat caused by the current lack of robustness in machine learning applications. Second, we present several contributions to deal with the diverse challenges around evaluating adversarial robustness. The most fundamental challenge is that common attacks cannot distinguish robust models from models with misleading gradients. We help uncover and solve this problem through two new types of attacks immune to gradient masking. Misaligned incentives are another reason for insufficient evaluations. We published joint guidelines and organized an interactive competition to mitigate this problem. Finally, our open-source adversarial attacks library Foolbox empowers countless researchers to overcome common technical obstacles. Since robustness evaluations are inherently unstandardized, straightforward access to various attacks is more than a technical convenience; it promotes thorough evaluations. Third, we showcase a fundamentally new neural network architecture for robust classification. It uses a generative analysis-by-synthesis approach. We demonstrate its robustness using a digit recognition task and simultaneously reveal the limitations of prior work that uses adversarial training. Moreover, further studies have shown that our model best predicts human judgments on so-called controversial stimuli and that our approach scales to more complex datasets.

Related