101 research outputs found

    DNN Filter Bank Cepstral Coefficients for Spoofing Detection

    Get PDF
    With the development of speech synthesis techniques, automatic speaker verification systems face the serious challenge of spoofing attack. In order to improve the reliability of speaker verification systems, we develop a new filter bank based cepstral feature, deep neural network filter bank cepstral coefficients (DNN-FBCC), to distinguish between natural and spoofed speech. The deep neural network filter bank is automatically generated by training a filter bank neural network (FBNN) using natural and synthetic speech. By adding restrictions on the training rules, the learned weight matrix of FBNN is band-limited and sorted by frequency, similar to the normal filter bank. Unlike the manually designed filter bank, the learned filter bank has different filter shapes in different channels, which can capture the differences between natural and synthetic speech more effectively. The experimental results on the ASVspoof {2015} database show that the Gaussian mixture model maximum-likelihood (GMM-ML) classifier trained by the new feature performs better than the state-of-the-art linear frequency cepstral coefficients (LFCC) based classifier, especially on detecting unknown attacks

    Spoof detection using time-delay shallow neural network and feature switching

    Full text link
    Detecting spoofed utterances is a fundamental problem in voice-based biometrics. Spoofing can be performed either by logical accesses like speech synthesis, voice conversion or by physical accesses such as replaying the pre-recorded utterance. Inspired by the state-of-the-art \emph{x}-vector based speaker verification approach, this paper proposes a time-delay shallow neural network (TD-SNN) for spoof detection for both logical and physical access. The novelty of the proposed TD-SNN system vis-a-vis conventional DNN systems is that it can handle variable length utterances during testing. Performance of the proposed TD-SNN systems and the baseline Gaussian mixture models (GMMs) is analyzed on the ASV-spoof-2019 dataset. The performance of the systems is measured in terms of the minimum normalized tandem detection cost function (min-t-DCF). When studied with individual features, the TD-SNN system consistently outperforms the GMM system for physical access. For logical access, GMM surpasses TD-SNN systems for certain individual features. When combined with the decision-level feature switching (DLFS) paradigm, the best TD-SNN system outperforms the best baseline GMM system on evaluation data with a relative improvement of 48.03\% and 49.47\% for both logical and physical access, respectively

    Spoofing Detection in Automatic Speaker Verification Systems Using DNN Classifiers and Dynamic Acoustic Features

    Get PDF

    Secure Automatic Speaker Verification Systems

    Get PDF
    The growing number of voice-enabled devices and applications consider automatic speaker verification (ASV) a fundamental component. However, maximum outreach for ASV in critical domains e.g., financial services and health care, is not possible unless we overcome security breaches caused by voice cloning, and replayed audios collectively known as the spoofing attacks. The audio spoofing attacks over ASV systems on one hand strictly limit the usability of voice-enabled applications; and on the other hand, the counterfeiter also remains untraceable. Therefore, to overcome these vulnerabilities, a secure ASV (SASV) system is presented in this dissertation. The proposed SASV system is based on the concept of novel sign modified acoustic local ternary pattern (sm-ALTP) features and asymmetric bagging-based classifier-ensemble. The proposed audio representation approach clusters the high and low-frequency components in audio frames by normally distributing frequency components against a convex function. Then, the neighborhood statistics are applied to capture the user specific vocal tract information. This information is then utilized by the classifier ensemble that is based on the concept of weighted normalized voting rule to detect various spoofing attacks. Contrary to the existing ASV systems, the proposed SASV system not only detects the conventional spoofing attacks (i.e. voice cloning, and replays), but also the new attacks that are still unexplored by the research community and a requirement of the future. In this regard, a concept of cloned replays is presented in this dissertation, where, replayed audios contains the microphone characteristics as well as the voice cloning artifacts. This depicts the scenario when voice cloning is applied in real-time. The voice cloning artifacts suppresses the microphone characteristics thus fails replay detection modules and similarly with the amalgamation of microphone characteristics the voice cloning detection gets deceived. Furthermore, the proposed scheme can be utilized to obtain a possible clue against the counterfeiter through voice cloning algorithm detection module that is also a novel concept proposed in this dissertation. The voice cloning algorithm detection module determines the voice cloning algorithm used to generate the fake audios. Overall, the proposed SASV system simultaneously verifies the bonafide speakers and detects the voice cloning attack, cloning algorithm used to synthesize cloned audio (in the defined settings), and voice-replay attacks over the ASVspoof 2019 dataset. In addition, the proposed method detects the voice replay and cloned voice replay attacks over the VSDC dataset. Rigorous experimentation against state-of-the-art approaches also confirms the robustness of the proposed research

    Voice Spoofing Countermeasures: Taxonomy, State-of-the-art, experimental analysis of generalizability, open challenges, and the way forward

    Full text link
    Malicious actors may seek to use different voice-spoofing attacks to fool ASV systems and even use them for spreading misinformation. Various countermeasures have been proposed to detect these spoofing attacks. Due to the extensive work done on spoofing detection in automated speaker verification (ASV) systems in the last 6-7 years, there is a need to classify the research and perform qualitative and quantitative comparisons on state-of-the-art countermeasures. Additionally, no existing survey paper has reviewed integrated solutions to voice spoofing evaluation and speaker verification, adversarial/antiforensics attacks on spoofing countermeasures, and ASV itself, or unified solutions to detect multiple attacks using a single model. Further, no work has been done to provide an apples-to-apples comparison of published countermeasures in order to assess their generalizability by evaluating them across corpora. In this work, we conduct a review of the literature on spoofing detection using hand-crafted features, deep learning, end-to-end, and universal spoofing countermeasure solutions to detect speech synthesis (SS), voice conversion (VC), and replay attacks. Additionally, we also review integrated solutions to voice spoofing evaluation and speaker verification, adversarial and anti-forensics attacks on voice countermeasures, and ASV. The limitations and challenges of the existing spoofing countermeasures are also presented. We report the performance of these countermeasures on several datasets and evaluate them across corpora. For the experiments, we employ the ASVspoof2019 and VSDC datasets along with GMM, SVM, CNN, and CNN-GRU classifiers. (For reproduceability of the results, the code of the test bed can be found in our GitHub Repository

    Replay detection in voice biometrics: an investigation of adaptive and non-adaptive front-ends

    Full text link
    Among various physiological and behavioural traits, speech has gained popularity as an effective mode of biometric authentication. Even though they are gaining popularity, automatic speaker verification systems are vulnerable to malicious attacks, known as spoofing attacks. Among various types of spoofing attacks, replay attack poses the biggest threat due to its simplicity and effectiveness. This thesis investigates the importance of 1) improving front-end feature extraction via novel feature extraction techniques and 2) enhancing spectral components via adaptive front-end frameworks to improve replay attack detection. This thesis initially focuses on AM-FM modelling techniques and their use in replay attack detection. A novel method to extract the sub-band frequency modulation (FM) component using the spectral centroid of a signal is proposed, and its use as a potential acoustic feature is also discussed. Frequency Domain Linear Prediction (FDLP) is explored as a method to obtain the temporal envelope of a speech signal. The temporal envelope carries amplitude modulation (AM) information of speech resonances. Several features are extracted from the temporal envelope and the FDLP residual signal. These features are then evaluated for replay attack detection and shown to have significant capability in discriminating genuine and spoofed signals. Fusion of AM and FM-based features has shown that AM and FM carry complementary information that helps distinguish replayed signals from genuine ones. The importance of frequency band allocation when creating filter banks is studied as well to further advance the understanding of front-ends for replay attack detection. Mechanisms inspired by the human auditory system that makes the human ear an excellent spectrum analyser have been investigated and integrated into front-ends. Spatial differentiation, a mechanism that provides additional sharpening to auditory filters is one of them that is used in this work to improve the selectivity of the sub-band decomposition filters. Two features are extracted using the improved filter bank front-end: spectral envelope centroid magnitude (SECM) and spectral envelope centroid frequency (SECF). These are used to establish the positive effect of spatial differentiation on discriminating spoofed signals. Level-dependent filter tuning, which allows the ear to handle a large dynamic range, is integrated into the filter bank to further improve the front-end. This mechanism converts the filter bank into an adaptive one where the selectivity of the filters is varied based on the input signal energy. Experimental results show that this leads to improved spoofing detection performance. Finally, deep neural network (DNN) mechanisms are integrated into sub-band feature extraction to develop an adaptive front-end that adjusts its characteristics based on the sub-band signals. A DNN-based controller that takes sub-band FM components as input, is developed to adaptively control the selectivity and sensitivity of a parallel filter bank to enhance the artifacts that differentiate a replayed signal from a genuine signal. This work illustrates gradient-based optimization of a DNN-based controller using the feedback from a spoofing detection back-end classifier, thus training it to reduce spoofing detection error. The proposed framework has displayed a superior ability in identifying high-quality replayed signals compared to conventional non-adaptive frameworks. All techniques proposed in this thesis have been evaluated on well-established databases on replay attack detection and compared with state-of-the-art baseline systems
    corecore