Efficient and Robust Classification for Sparse Attacks

Submitted by admin on Wed, 10/23/2024 - 01:52

Over the past two decades, the adoption of neural networks has surged in parallel with their performance. Concurrently, we have observed the inherent fragility of these prediction models: small changes to the inputs can induce classification errors across entire datasets. In this study, we examine perturbations constrained by the $\ell_0$-norm, a potent attack model in the domains of computer vision, malware detection, and natural language processing. To combat this adversary, we introduce a novel defense technique composed of two components: “truncation” and “adversarial training”. We then conduct a theoretical analysis of the Gaussian mixture setting and establish the asymptotic optimality of our proposed defense. Based on this insight, we extend our technique to neural networks. Lastly, we empirically validate our results in the domain of computer vision, demonstrating substantial improvements in the robust classification error of neural networks.
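To make the idea of "truncation" concrete, the following is a minimal sketch and not the paper's implementation. It assumes a linear classifier and one plausible form of truncation: compute the coordinate-wise products of the weights and the input, discard the k largest and k smallest, and sum the rest. The function names, the choice of k, and the toy data are all hypothetical. The intuition is that an $\ell_0$ adversary who changes at most k coordinates can only create a few extreme terms, and those are the terms the truncation removes.

```python
def truncated_inner_product(w, x, k):
    """Sum of the coordinate-wise products w[i] * x[i] after dropping
    the k largest and the k smallest products (illustrative assumption)."""
    if len(w) != len(x):
        raise ValueError("w and x must have the same length")
    products = sorted(wi * xi for wi, xi in zip(w, x))
    if 2 * k >= len(products):
        # Nothing is left after truncation.
        return 0.0
    return sum(products[k:len(products) - k])


def classify(w, x, k):
    """Binary label in {+1, -1} from the sign of the truncated score."""
    return 1 if truncated_inner_product(w, x, k) >= 0 else -1


# Toy example: a sparse attack overwrites one coordinate with a huge value.
w = [1.0, 1.0, 1.0, 1.0, 1.0]
x_clean = [1.0, 1.0, 1.0, 1.0, 1.0]
x_attacked = [1.0, 1.0, -100.0, 1.0, 1.0]

plain_label = classify(w, x_attacked, k=0)    # k=0 is the ordinary inner product
robust_label = classify(w, x_attacked, k=1)   # truncation drops the outlier
```

In this toy case the ordinary inner product is dominated by the single corrupted coordinate, while the truncated score ignores it. The price is that some clean coordinates are discarded too, so k trades off robustness against accuracy on unperturbed inputs.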

Information-Theoretic Methods for Trustworthy and Reliable Machine Learning

READ ON IEEE Xplore

Mark Beliaev

Payam Delgosha

Hamed Hassani

Ramtin Pedarsani