In recent years, machine learning models, especially deep neural networks,
have been widely used for classification tasks in the security domain. However,
these models have been shown to be vulnerable to adversarial manipulation:
small perturbations crafted by an adversarial attack model, when applied to the
input, can cause significant changes in the model's output. Most research on
adversarial attacks and corresponding defense methods focuses only on scenarios
where adversarial samples are directly generated by the attack model. In this
study, we explore a more practical scenario in behavior-based authentication,
where adversarial samples are collected from the attacker: the adversarial
samples generated by the attack model must be replicated by the attacker, and
this replication introduces a certain level of discrepancy. We propose an
eXplainable AI (XAI)-based defense strategy against adversarial attacks in
such scenarios. A feature selector, trained with
our method, can be used as a filter in front of the original authenticator. It
filters out features that are more vulnerable to adversarial attacks or
irrelevant to authentication, while retaining features that are more robust.
Through comprehensive experiments, we demonstrate that our XAI-based defense
strategy is effective against adversarial attacks and outperforms other defense
strategies, such as adversarial training and defensive distillation.
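
To make the filter-in-front-of-the-authenticator architecture concrete, the
following is a minimal sketch of the pipeline, assuming per-feature robustness
and relevance scores derived from an XAI attribution method; the class and
function names (FeatureSelector, authenticate) and the score-thresholding rule
are illustrative assumptions, not the paper's exact method or API.

```python
import numpy as np

class FeatureSelector:
    """Keeps features deemed robust and relevant to authentication;
    filters out the rest (illustrative sketch)."""

    def __init__(self, robustness_scores, relevance_scores, threshold=0.5):
        # Hypothetical per-feature scores, e.g., aggregated from XAI
        # attributions on clean vs. adversarial samples (assumption).
        self.mask = (np.asarray(robustness_scores) >= threshold) & \
                    (np.asarray(relevance_scores) >= threshold)

    def transform(self, x):
        # Zero out filtered features so the authenticator's input shape
        # stays unchanged; selecting columns would work equally well.
        return np.where(self.mask, x, 0.0)


def authenticate(authenticator, selector, sample):
    """Filter the behavioral sample, then pass it to the original
    authenticator (assumed here to expose a sklearn-style predict)."""
    filtered = selector.transform(np.asarray(sample))
    return authenticator.predict(filtered.reshape(1, -1))
```

In this sketch the selector acts purely as a preprocessing stage, so the
original authenticator does not need to be retrained against each new attack.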