A backdoor attack in deep learning inserts a hidden backdoor into a model so that
specific input patterns trigger malicious behavior. Existing detection
approaches assume a metric space (over either the original inputs or their
latent representations) in which normal and malicious samples are
separable. We show that this assumption is severely limited by introducing
a novel SSDT (Source-Specific and Dynamic-Triggers) backdoor, which obscures
the difference between normal and malicious samples.
To overcome this limitation, we stop searching for a perfect metric space that
would work across different deep-learning models and instead turn to more
robust topological constructs. We propose TED (Topological Evolution
Dynamics) as a model-agnostic basis for robust backdoor detection. The main
idea of TED is to view a deep-learning model as a dynamical system that evolves
inputs into outputs. In such a dynamical system, a benign input follows a natural
evolution trajectory similar to those of other benign inputs. In contrast, a malicious
input displays a distinct trajectory: it starts close to benign samples
but eventually shifts towards the neighborhood of attacker-specified target
samples to activate the backdoor.
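The trajectory idea can be illustrated with a minimal sketch. It assumes, hypothetically, that per-layer feature vectors are available both for the input and for a labeled set of benign reference samples, and it summarizes each layer by the rank of the nearest reference sample that shares the input's predicted label. This per-layer rank feature and the distance-to-benign-mean outlier rule (with a hypothetical threshold `tau`) are illustrative assumptions, not necessarily TED's exact construction; all function names are hypothetical.

```python
import numpy as np


def nearest_label_rank(query, ref_feats, ref_labels, label):
    """Position (0 = nearest) of the closest reference sample carrying `label`
    when all reference samples are sorted by Euclidean distance to `query`."""
    if not np.any(ref_labels == label):
        raise ValueError("no reference sample carries the requested label")
    dists = np.linalg.norm(ref_feats - query, axis=1)
    order = np.argsort(dists, kind="stable")
    # argmax returns the first index where the sorted labels match `label`
    return int(np.argmax(ref_labels[order] == label))


def rank_trajectory(query_layers, ref_layers, ref_labels, pred_label):
    """Hypothetical helper: one rank per layer, tracing how the input's
    neighborhood (relative to its predicted class) evolves through the model."""
    return np.array([
        nearest_label_rank(q, r, ref_labels, pred_label)
        for q, r in zip(query_layers, ref_layers)
    ])


def is_suspicious(traj, benign_trajs, tau):
    """Hypothetical outlier rule (a stand-in, not TED's detector): flag
    trajectories farther than `tau` from the mean benign trajectory."""
    centre = np.mean(benign_trajs, axis=0)
    return bool(np.linalg.norm(traj - centre) > tau)


# Toy data: three "layers" sharing the same 2-D reference features.
ref = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]])
ref_layers = [ref, ref, ref]
ref_labels = np.array([0, 0, 1, 1])

# Benign input: near class 1 (its predicted label) at every layer.
benign = [np.array([5.0, 5.5])] * 3
# Malicious input: near class 0 early, pulled to target class 1 only at the end.
malicious = [np.array([0.0, 0.5])] * 2 + [np.array([5.0, 5.5])]

b = rank_trajectory(benign, ref_layers, ref_labels, pred_label=1)
m = rank_trajectory(malicious, ref_layers, ref_labels, pred_label=1)
print(b, m)  # [0 0 0] [2 2 0]
print(is_suspicious(m, [b], tau=1.0))  # True
```

In this toy setting the benign input stays next to its predicted class at every layer, whereas the malicious input starts among class-0 samples and only reaches the target class at the final layer, so its rank trajectory stands out from the benign one even though its final prediction looks ordinary.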
We conduct extensive evaluations on vision and natural-language datasets
across different network architectures. The results demonstrate that TED not
only achieves a high detection rate but also significantly outperforms
existing state-of-the-art detection approaches, particularly against the
sophisticated SSDT attack. The code to reproduce the results is publicly
available on GitHub.
Comment: 18 pages. To appear in IEEE Symposium on Security and Privacy 202