Speaker verification systems are widely used in smartphones and Internet of
Things (IoT) devices to identify legitimate users. Recent work has shown that
adversarial attacks, such as FAKEBOB, can work effectively
against speaker verification systems. The goal of this paper is to design a
detector that can distinguish an original audio signal from one contaminated by
adversarial attacks. Specifically, our detector, called MEH-FEST, calculates
the minimum energy in high frequencies from the short-time Fourier transform
(STFT) of an audio signal and uses it as a detection metric. Through both analysis
and experiments, we show that our proposed detector is easy to implement, fast
to process an input audio signal, and effective in determining whether an audio
signal is corrupted by FAKEBOB attacks. The experimental results indicate that the
detector is extremely effective, with near-zero false positive and false
negative rates for detecting FAKEBOB attacks in Gaussian mixture model (GMM)
and i-vector speaker verification systems. Moreover, adaptive adversarial
attacks against our proposed detector and their countermeasures are discussed
and studied, showing the game between attackers and defenders.
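
To make the detection metric described above concrete, the following is a minimal sketch of one way to compute a minimum high-frequency energy from an STFT. It is not the paper's implementation. The function names, the 16 kHz sample rate, the 512-sample Hann window, the 256-sample hop, the 6000 Hz cutoff, and the decision threshold are all assumptions chosen for illustration. The decision rule also assumes that an adversarial perturbation raises the minimum, since it would add energy even to frames that were silent in the original audio.

```python
import numpy as np


def min_high_freq_energy(audio, sample_rate=16000, frame_len=512,
                         hop=256, cutoff_hz=6000.0):
    """Return the smallest per-frame energy found above cutoff_hz.

    All parameter defaults are illustrative assumptions, not values
    taken from the paper.
    """
    audio = np.asarray(audio, dtype=np.float64)
    # Pad very short inputs so that at least one full frame exists.
    if audio.size < frame_len:
        audio = np.pad(audio, (0, frame_len - audio.size))

    window = np.hanning(frame_len)
    n_frames = 1 + (audio.size - frame_len) // hop

    # Select the STFT bins whose center frequency is at or above the cutoff.
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / sample_rate)
    high = freqs >= cutoff_hz

    energies = np.empty(n_frames)
    for i in range(n_frames):
        start = i * hop
        frame = audio[start:start + frame_len] * window
        spectrum = np.fft.rfft(frame)
        # Energy of this frame restricted to the high-frequency bins.
        energies[i] = np.sum(np.abs(spectrum[high]) ** 2)

    return float(energies.min())


def is_adversarial(audio, threshold, **stft_params):
    """Flag the audio when its minimum high-frequency energy exceeds
    a threshold (hypothetical decision rule; the threshold would have
    to be calibrated on real data)."""
    return min_high_freq_energy(audio, **stft_params) > threshold
```

In this sketch, a recording that contains a pause has at least one frame with almost no high-frequency energy, so its metric stays near zero, while a perturbation spread over the whole signal lifts the metric in every frame. In practice the cutoff and threshold would need tuning against the target speaker verification system.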