Design and evaluation of GAN-based models for adversarial training robustness in deep learning
Abstract
Adversarial attacks expose a generalization weakness of current deep learning models on specially crafted, distribution-shifted data. Adversarial samples generated by an attack algorithm can induce malicious behavior in any deep learning system, undermining the consistency of the model's predictions. This thesis presents the design and evaluation of several candidate component architectures for a GAN that offers a new direction for training a robust convolutional classifier. Each component addresses a different aspect of the GAN that affects the generalization and robustness outcomes. The best formulation achieves around 45% accuracy under an 8/255 L∞ PGD attack and 60% accuracy under a 128/255 L2 PGD attack, outperforming L2 PGD adversarial training. Further contributions include investigations of gradient masking, robustness transferability across perturbation constraints, and the limits of generalization.
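The PGD attack used for evaluation above can be sketched as follows. This is a minimal illustration, not the thesis's evaluation code: it runs L∞ PGD against a toy logistic-regression model (a hypothetical stand-in for the convolutional classifier), with the 8/255 budget mentioned in the abstract; the step size, step count, and model weights are assumptions for the example.

```python
import numpy as np

def pgd_linf(x, y, w, b, eps=8/255, alpha=2/255, steps=10):
    """L-infinity PGD on a toy binary logistic-regression classifier.

    x: input in [0, 1]; y: label in {0, 1}; w, b: fixed model weights
    (a hypothetical stand-in for the thesis's trained CNN).
    """
    x_adv = x.copy()
    for _ in range(steps):
        # Gradient of the binary cross-entropy loss w.r.t. the input.
        z = w @ x_adv + b
        p = 1.0 / (1.0 + np.exp(-z))
        grad = (p - y) * w
        # Ascent step along the gradient sign, then project back into
        # the eps-ball around x and into the valid pixel range [0, 1].
        x_adv = x_adv + alpha * np.sign(grad)
        x_adv = np.clip(x_adv, x - eps, x + eps)
        x_adv = np.clip(x_adv, 0.0, 1.0)
    return x_adv

# Example: perturb a 4-pixel "image" of the positive class (y = 1).
rng = np.random.default_rng(0)
x = rng.uniform(0.2, 0.8, size=4)
w, b, y = np.array([1.0, -2.0, 0.5, 1.5]), 0.0, 1
x_adv = pgd_linf(x, y, w, b)
```

The L2 variant mentioned in the abstract differs only in the step (normalized gradient instead of its sign) and the projection (onto an L2 ball of radius 128/255 instead of the L∞ box).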