Robust audio anti-spoofing has become increasingly challenging due to
recent advances in deepfake techniques. While spectrograms have
demonstrated their capability for anti-spoofing, the complementary
information present in multi-order spectral patterns has not been well
explored, which limits their effectiveness against varied spoofing
attacks. Therefore, we propose
a novel deep learning method with a spectral fusion-reconstruction strategy,
namely S2pecNet, to utilise multi-order spectral patterns for robust audio
anti-spoofing representations. Specifically, spectral patterns up to
second order are fused in a coarse-to-fine manner, and two branches are
designed for the fine-level fusion from the spectral and temporal
contexts. A
reconstruction from the fused representation to the input spectrograms further
reduces the potential information loss during fusion. Our method achieved
state-of-the-art performance with an EER of 0.77% on the widely used
ASVspoof2019 LA Challenge dataset.
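
The fusion-reconstruction strategy described above can be illustrated with a minimal PyTorch sketch. Everything below is an illustrative assumption, not the authors' released S2pecNet implementation: the module name FusionReconstructionNet, the layer sizes, and the use of asymmetric convolution kernels for the spectral and temporal branches are all hypothetical. The sketch only shows the overall pattern: coarse channel-wise fusion of two spectral orders, two fine-level fusion branches along the frequency and time axes, and a reconstruction head whose loss penalises information lost during fusion.

```python
# Minimal sketch of a spectral fusion-reconstruction pipeline.
# Hypothetical names and sizes throughout; not the authors' code.
import torch
import torch.nn as nn


class FusionReconstructionNet(nn.Module):
    """Fuses first- and second-order spectrogram inputs coarse-to-fine,
    then reconstructs both inputs from the fused representation."""

    def __init__(self, channels: int = 16):
        super().__init__()
        # Coarse fusion: the two spectral orders are stacked channel-wise
        # and mixed by a single convolution.
        self.coarse = nn.Conv2d(2, channels, kernel_size=3, padding=1)
        # Fine-level fusion branches: asymmetric kernels emphasise the
        # spectral (frequency) axis and the temporal axis respectively.
        self.spectral_branch = nn.Conv2d(
            channels, channels, kernel_size=(5, 1), padding=(2, 0))
        self.temporal_branch = nn.Conv2d(
            channels, channels, kernel_size=(1, 5), padding=(0, 2))
        self.fuse = nn.Conv2d(2 * channels, channels, kernel_size=1)
        # Reconstruction head maps fused features back to both inputs,
        # so a reconstruction loss can limit information lost in fusion.
        self.reconstruct = nn.Conv2d(channels, 2, kernel_size=3, padding=1)
        self.classifier = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(channels, 2))

    def forward(self, spec1, spec2):
        # spec1/spec2: (batch, freq, time) first-/second-order spectrograms.
        x = torch.stack([spec1, spec2], dim=1)            # (B, 2, F, T)
        coarse = torch.relu(self.coarse(x))
        fine = torch.cat(
            [torch.relu(self.spectral_branch(coarse)),
             torch.relu(self.temporal_branch(coarse))], dim=1)
        fused = torch.relu(self.fuse(fine))
        recon = self.reconstruct(fused)    # reconstruction target is x
        logits = self.classifier(fused)    # bona fide vs. spoof scores
        return logits, recon, x


if __name__ == "__main__":
    net = FusionReconstructionNet()
    s1 = torch.randn(4, 128, 200)   # e.g. a log-power spectrogram
    s2 = torch.randn(4, 128, 200)   # e.g. a second-order spectral pattern
    logits, recon, target = net(s1, s2)
    # Joint objective: classification loss plus reconstruction penalty.
    loss = nn.functional.cross_entropy(
        logits, torch.randint(0, 2, (4,))) \
        + nn.functional.mse_loss(recon, target)
    print(logits.shape, loss.item())
```

The reconstruction term here is one plausible way to realise the stated goal of reducing fused information loss: by requiring both input spectrograms to be recoverable from the fused representation, the fusion layers are discouraged from discarding order-specific detail.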