45 research outputs found
Anti-spoofing Methods for Automatic SpeakerVerification System
Growing interest in automatic speaker verification (ASV)systems has lead to
significant quality improvement of spoofing attackson them. Many research works
confirm that despite the low equal er-ror rate (EER) ASV systems are still
vulnerable to spoofing attacks. Inthis work we overview different acoustic
feature spaces and classifiersto determine reliable and robust countermeasures
against spoofing at-tacks. We compared several spoofing detection systems,
presented so far,on the development and evaluation datasets of the Automatic
SpeakerVerification Spoofing and Countermeasures (ASVspoof) Challenge
2015.Experimental results presented in this paper demonstrate that the useof
magnitude and phase information combination provides a substantialinput into
the efficiency of the spoofing detection systems. Also wavelet-based features
show impressive results in terms of equal error rate. Inour overview we compare
spoofing performance for systems based on dif-ferent classifiers. Comparison
results demonstrate that the linear SVMclassifier outperforms the conventional
GMM approach. However, manyresearchers inspired by the great success of deep
neural networks (DNN)approaches in the automatic speech recognition, applied
DNN in thespoofing detection task and obtained quite low EER for known and
un-known type of spoofing attacks.Comment: 12 pages, 0 figures, published in Springer Communications in Computer
and Information Science (CCIS) vol. 66
Spoofing Detection in Voice Biometrics: Cochlear Modelling and Perceptually Motivated Features
The automatic speaker verification (ASV) system is one of the most widely adopted biometric
technology. However, ASV is vulnerable to spoofing attacks that can significantly affect its
reliability. Among the different variants of spoofing attacks, replay attacks pose a major threat as
they do not require any expert knowledge to implement and are difficult to detect. The primary focus
of this thesis is on understanding and developing biologically inspired models and techniques to
detect replay attacks.
This thesis develops a novel framework for implementing an active cochlear filter model as a frontend
spectral analyser for spoofing attack detection to leverage the remarkable sensitivity and
selectivity of the mammalian auditory system over a broad range of intensities and frequencies. In
particular, the developed model aims to mimic the active mechanism in the cochlea, enabling sharp
frequency tuning and level-dependent compression, which amplifies and tune to low energy signal
to make a broad dynamic range of signals audible. Experimental evaluations of the developed models
in the context of replay detection systems exhibit a significant performance improvement,
highlighting the potential benefits of the use of biologically inspired front ends.
In addition, since replay detection relies on the discerning channel characteristics and the effect of
the acoustic environment, acoustic cues essential for speech perception such as amplitude- and
frequency-modulation (AM, FM) features are also investigated. Finally, to capture discriminative
cues present in the temporal domain, the temporal masking psychoacoustic phenomenon in auditory
processing is exploited, and the usefulness of the masking pattern is investigated. This led to a novel
feature parameterisation which helps improve replay attack detection
Replay detection in voice biometrics: an investigation of adaptive and non-adaptive front-ends
Among various physiological and behavioural traits, speech has gained popularity as an effective mode of biometric authentication. Even though they are gaining popularity, automatic speaker verification systems are vulnerable to malicious attacks, known as spoofing attacks. Among various types of spoofing attacks, replay attack poses the biggest threat due to its simplicity and effectiveness. This thesis investigates the importance of 1) improving front-end feature extraction via novel feature extraction techniques and 2) enhancing spectral components via adaptive front-end frameworks to improve replay attack detection.
This thesis initially focuses on AM-FM modelling techniques and their use in replay attack detection. A novel method to extract the sub-band frequency modulation (FM) component using the spectral centroid of a signal is proposed, and its use as a potential acoustic feature is also discussed. Frequency Domain Linear Prediction (FDLP) is explored as a method to obtain the temporal envelope of a speech signal. The temporal envelope carries amplitude modulation (AM) information of speech resonances. Several features are extracted from the temporal envelope and the FDLP residual signal. These features are then evaluated for replay attack detection and shown to have significant capability in discriminating genuine and spoofed signals. Fusion of AM and FM-based features has shown that AM and FM carry complementary information that helps distinguish replayed signals from genuine ones. The importance of frequency band allocation when creating filter banks is studied as well to further advance the understanding of front-ends for replay attack detection.
Mechanisms inspired by the human auditory system that makes the human ear an excellent spectrum analyser have been investigated and integrated into front-ends. Spatial differentiation, a mechanism that provides additional sharpening to auditory filters is one of them that is used in this work to improve the selectivity of the sub-band decomposition filters. Two features are extracted using the improved filter bank front-end: spectral envelope centroid magnitude (SECM) and spectral envelope centroid frequency (SECF). These are used to establish the positive effect of spatial differentiation on discriminating spoofed signals. Level-dependent filter tuning, which allows the ear to handle a large dynamic range, is integrated into the filter bank to further improve the front-end. This mechanism converts the filter bank into an adaptive one where the selectivity of the filters is varied based on the input signal energy. Experimental results show that this leads to improved spoofing detection performance.
Finally, deep neural network (DNN) mechanisms are integrated into sub-band feature extraction to develop an adaptive front-end that adjusts its characteristics based on the sub-band signals. A DNN-based controller that takes sub-band FM components as input, is developed to adaptively control the selectivity and sensitivity of a parallel filter bank to enhance the artifacts that differentiate a replayed signal from a genuine signal. This work illustrates gradient-based optimization of a DNN-based controller using the feedback from a spoofing detection back-end classifier, thus training it to reduce spoofing detection error. The proposed framework has displayed a superior ability in identifying high-quality replayed signals compared to conventional non-adaptive frameworks.
All techniques proposed in this thesis have been evaluated on well-established databases on replay attack detection and compared with state-of-the-art baseline systems
Effects of Waveform PMF on Anti-Spoofing Detection
International audienceIn the context of detection of speaker recognition identity impersonation , we observed that the waveform probability mass function (PMF) of genuine speech differs from significantly of of PMF from identity theft extracts. This is true for synthesized or converted speech as well as for replayed speech. In this work, we mainly ask whether this observation has a significant impact on spoofing detection performance. In a second step, we want to reduce the distribution gap of waveforms between authentic speech and spoofing speech. We propose a genuiniza-tion of the spoofing speech (by analogy with Gaussianisation), i.e. to obtain spoofing speech with a PMF close to the PMF of genuine speech. Our genuinization is evaluated on ASVspoof 2019 challenge datasets, using the baseline system provided by the challenge organization. In the case of constant Q cep-stral coefficients (CQCC) features, the genuinization leads to a degradation of the baseline system performance by a factor of 10, which shows a potentially large impact of the distribution os waveforms on spoofing detection performance. However, by ''playing" with all configurations, we also observed different behaviors, including performance improvements in specific cases. This leads us to conclude that waveform distribution plays an important role and must be taken into account by anti-spoofing systems