Deep neural networks (DNNs) can be deceived by adding human-imperceptible
perturbations to clean samples. Therefore, enhancing the
robustness of DNNs against adversarial attacks is a crucial task. In this
paper, we aim to train robust DNNs by limiting the set of outputs reachable via
a norm-bounded perturbation added to a clean sample. We refer to this set as
the adversarial polytope; each clean sample has its own adversarial
polytope. If the polytopes of all samples are compact enough
that they do not intersect the decision boundaries of the DNN, then the
DNN is robust against adversarial samples. Hence, our
algorithm is based on learning \textbf{c}onfined \textbf{a}dversarial
\textbf{p}olytopes (CAP). By conducting a thorough set of experiments, we
demonstrate that CAP outperforms existing adversarial robustness
methods in improving the robustness of models against state-of-the-art attacks,
including AutoAttack.

Comment: The paper has been accepted at ICASSP 202
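
As an illustrative formalization of the set described above (the symbols $f_\theta$, $\epsilon$, $p$, and $\mathcal{Z}_\epsilon(x)$ are notation assumed here for exposition, not definitions taken from the paper), the adversarial polytope of a clean sample $x$ with label $y$ could be written as
\[
\mathcal{Z}_\epsilon(x) = \{\, f_\theta(x + \delta) \;:\; \|\delta\|_p \le \epsilon \,\},
\]
where $f_\theta$ is assumed to map an input to a vector of $K$ class scores. Under this notation, the DNN is robust at $x$ when every reachable output is still classified as $y$:
\[
\arg\max_{k \in \{1, \dots, K\}} z_k = y \quad \text{for all } z \in \mathcal{Z}_\epsilon(x).
\]
A smaller $\mathcal{Z}_\epsilon(x)$ that stays away from the decision boundaries makes this condition easier to satisfy, which is the intuition behind confining the polytope.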