Towards Deep Learning Models Resistant to Adversarial Attacks

Authors: Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, Adrian Vladu

For class EE/CSC 7700 Machine Learning for Cyber-Physical Systems

Instructor: Dr. Xugui Zhou

Presentation by Group 6: Yunpeng Han

Time of Presentation: 10:30 AM, Monday, October 28, 2024

Blog post by Group 1: Joshua McCain, Josh Rovira, Lauren Bristol

Link to Paper:

https://personal.utdallas.edu/~mxk055100/courses/adv-ml-19f/1706.06083.pdf

Summary of Paper

With adversarial attacks considered a potentially inherent weakness of neural networks, this paper studies and optimizes the robustness of neural networks against such attacks. Through the authors' efforts, a reliable and "universal" solution is presented that significantly improves resistance to a wide range of adversarial attacks.

Slide Outlines

Adversarial Attacks

Background Image

The presenter begins with an introduction to how adversarial attacks are conducted, presenting a case in which an input image is passed through a CNN model that must distinguish between two classes: a gibbon and a panda.

Adversarial Attacks (Continued)

Background Image

By introducing carefully crafted noise into the input, attackers can trick the model into confidently giving the wrong answer, classifying the panda as a gibbon.

Contents

Contents Image

The presenter takes this time to show the logical flow of the presentation as it pertains to the paper.

Introduction

Introduction Image

The severity of adversarial attacks is described here. In this slide, the presenter notes that even small changes to the input image can fool state-of-the-art neural networks. He also mentions a few related works, alluding even to a few of the earlier presentations covered in this course.

Optimization View on Adversarial Robustness

Optimization View 1

On this slide, robustness is cast as an optimization problem. More specifically, it becomes a risk-minimization problem: the authors seek model parameters that minimize the expected loss. The new goal is to incorporate adversarial inputs into training to strengthen the defense against future attacks.

Optimization View on Adversarial Robustness (Continued)

Optimization view 2

The optimization problem becomes a saddle point problem, containing an inner maximization problem and an outer minimization problem. The task is to find model parameters, theta, that keep the risk low even under worst-case perturbations, yielding a model that is robust against attacks.
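For reference, the saddle point (min-max) formulation discussed on this slide can be written roughly as follows, where theta denotes the model parameters, D the data distribution, S the set of allowed perturbations, and L the loss (a paraphrase of the paper's notation, not a verbatim copy):

```latex
\min_{\theta} \rho(\theta), \qquad
\rho(\theta) = \mathbb{E}_{(x, y) \sim \mathcal{D}}
  \left[ \max_{\delta \in \mathcal{S}} L(\theta, x + \delta, y) \right]
```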

A Unified View on Attacks and Defenses

Unified View

The presenter explains the methods presented in the paper. The authors use two methods, the Fast Gradient Sign Method (FGSM) and Projected Gradient Descent (PGD), to generate adversarial training inputs. The presenter explains how each variable enters the equations. Most notable of these is epsilon, the perturbation budget, which can be neither too small nor too large, otherwise it will skew the method.
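As a concrete illustration of FGSM's single gradient-sign step, here is a minimal sketch in a PyTorch-style framework (the function name, arguments, and the [0, 1] pixel range are illustrative assumptions, not taken from the paper):

```python
import torch

def fgsm_attack(model, loss_fn, x, y, epsilon):
    """Fast Gradient Sign Method: take one gradient-sign step of size epsilon."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = loss_fn(model(x_adv), y)       # loss on the clean input
    loss.backward()                       # gradient of the loss w.r.t. the input
    # Step in the direction that increases the loss, then keep pixels in [0, 1].
    x_adv = x_adv + epsilon * x_adv.grad.sign()
    return x_adv.clamp(0, 1).detach()
```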

A Unified View on Attacks and Defenses (Continued)

Unified View 2

At this time, the presenter notes that the PGD method is the main contribution of the paper. He explains an iteration of the PGD update and compares it to the mathematics behind FGSM. He contends that PGD is more powerful because it iteratively refines the adversarial example over multiple steps; put simply, the attack grows stronger with each iteration.
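A minimal PGD sketch under the same assumptions as the FGSM sketch above (the step size alpha, the number of steps, and the random start inside the epsilon-ball are illustrative choices, not the paper's exact settings):

```python
import torch

def pgd_attack(model, loss_fn, x, y, epsilon, alpha=0.01, steps=40):
    """Projected Gradient Descent: repeat small gradient-sign steps and
    project back into the epsilon-ball around the original input."""
    x_adv = x + torch.empty_like(x).uniform_(-epsilon, epsilon)   # random start
    x_adv = x_adv.clamp(0, 1).detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = loss_fn(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        with torch.no_grad():
            x_adv = x_adv + alpha * grad.sign()                   # ascent step
            x_adv = x + (x_adv - x).clamp(-epsilon, epsilon)      # project to l_inf ball
            x_adv = x_adv.clamp(0, 1)                             # keep valid pixel range
    return x_adv.detach()
```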

Poll

Poll

The presenter asks the class about the nature of the saddle point problem: if the loss at the maximum point is pushed down, will the loss at the other points rise? Group 1 responds that they do not believe that would be the case, and the presenter agrees, explaining that through PGD a total convergence is eventually achieved.

Poll (Continued)

Poll2

The presenter asks a similar question relating to the optimization challenge: will Point A go down while Point B rises? He reiterates that, with PGD, convergence occurs across all points.

Towards Universally Robust Networks?

UniversallyRobustNetworks

This slide covers the challenges of the saddle point problem. The outer minimization is a non-convex optimization that must reduce the loss, while the inner maximization is a non-concave optimization that must identify worst-case attacks. To mitigate these challenges, the authors argue that the local maxima of the inner problem have a tractable structure, even though they cannot be computed exactly.

Dataset Used

Dataset Image

The presenter shows the two datasets used for experimentation: MNIST and CIFAR10.

The Landscape of Adversarial Examples

Landscape Examples

Turning to the inner maximization problem, the presenter describes a figure of the loss landscape explored with PGD. The figure shows the cross-entropy loss while an adversarial example is being created. In both datasets, the loss values found by PGD plateau after a small number of iterations and are tightly concentrated across random restarts.

First Order Adversaries

FirstOrder

The presenter explains that "first order" refers to a catalog that adversarial attacks use in generation. Adversarial attacks rely on these first order adversaries to complete the attack. From the graph presented, the presenter shows that for both MNIST and CIFAR10, the PDG adversarially trained networks outperform the naturally trained networks.

Descent Directions for Adversarial Training

DescentDirections

Next, attention is given to the outer minimization problem. Here, the presenter notes that stochastic gradient descent (SGD) works for adversarial training and provides a graph showing how it minimizes the "adversarial loss" over the model parameters. In both instances, the loss value is reduced by a factor of at least 10 across 75,000 iterations.
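As a sketch of how the outer minimization pairs with the inner maximization in code, here is a hedged example of one adversarial training epoch, reusing the hypothetical pgd_attack sketch above (the loader, optimizer, and loss function are placeholders):

```python
def adversarial_training_epoch(model, loader, loss_fn, optimizer, epsilon):
    """One epoch of adversarial training: for every batch, approximately solve
    the inner maximization with PGD, then take an SGD step on that loss."""
    model.train()
    for x, y in loader:
        x_adv = pgd_attack(model, loss_fn, x, y, epsilon)  # inner maximization
        optimizer.zero_grad()
        loss = loss_fn(model(x_adv), y)                    # adversarial loss
        loss.backward()
        optimizer.step()                                   # outer minimization step
```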

Network Capacity and Adversarial Robustness

Network Capacity Image

Classifying examples with decision boundaries is another challenge that needs addressing in this case. A decision boundary that correctly separates adversarial examples is much more complicated than one for a naturally trained model.

Network Capacity and Adversarial Robustness (Continued)

Network Capacity Image2

To help address this problem, the presenter shows that adjusting the network's capacity scale significantly affects the accuracy of the classifier.
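To make "capacity scale" concrete, here is a minimal, hypothetical sketch of a small CNN whose channel widths are multiplied by a scale factor; the architecture is illustrative and not the authors' exact model:

```python
import torch.nn as nn

def make_cnn(capacity_scale: int = 1, num_classes: int = 10):
    """A simple CNN whose width (number of channels) grows with capacity_scale."""
    c1, c2 = 16 * capacity_scale, 32 * capacity_scale
    return nn.Sequential(
        nn.Conv2d(1, c1, kernel_size=5, padding=2), nn.ReLU(), nn.MaxPool2d(2),
        nn.Conv2d(c1, c2, kernel_size=5, padding=2), nn.ReLU(), nn.MaxPool2d(2),
        nn.Flatten(),
        nn.Linear(c2 * 7 * 7, num_classes),  # assumes 28x28 grayscale inputs (e.g., MNIST)
    )
```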

Network Capacity and Adversarial Robustness (Continued)

Network Capacity Image3

The presenter covers some tradeoffs revealed by the capacity experiments, describing that increased capacity decreases the value of the saddle point problem and reduces the transferability of adversarial examples.

Experiments: Adversarially Robust Deep Learning Models?

Experiments1

The graph presented on this slide demonstrates the effectiveness of PGD as it performs alongside several other training methods. This is evaluated under white-box attacks, the strongest threat model, since the attacker has full knowledge of the network.

Experiments: Adversarially Robust Deep Learning Models? (Continued)

Experiments2

Another graph here shows findings similar to the previous slide. The presenter notes that, from the attacker's point of view, the lowest accuracy is desired, and the PGD attack almost always drives accuracy down the furthest throughout the experiments. Low accuracy here means the attack fools the model most often, confirming PGD as the strongest of the attacks tested.

Experiments: Adversarially Robust Deep Learning Models? (Continued)

Experiments3

Finally, the class is shown a series of resistance charts plotting accuracy against epsilon for l2-bounded attacks. These charts show a drastic decrease in accuracy as epsilon increases. This drop is sharper for the MNIST models than the CIFAR10 models, but significant in both datasets.

Conclusion

Conclusion Image

The presenter concludes the presentation by summarizing the key findings, the unexpectedly regular optimization structure, and the achievements in adversarial attack robustness.


Discussions

Discussion 1: Why does PGD perform differently as an adversarial training method on the MNIST and CIFAR10 datasets?

Discussion 2: How relevant do you think Adversarial Robustness is in Real-World Applications for CPS?


Questions

Q1: Group 8 asked: What does capacity relate to in the model?

Presenter: It describes the number of parameters in the model. If the capacity is too low, the adversarially trained model cannot handle the disturbance introduced by the perturbations.

Q2: Group 8 asked: Is there a sampling for every data point? Wouldn't it be expensive to compute all of the data points in that case?

Presenter: The authors mention that the iterative procedure presented also handles larger datasets within a reasonable timeframe.