380 research outputs found

    Use of the harmonic phase in synthetic speech detection

    Get PDF
    Special Session paper: recent PhD thesis descriptionThis PhD dissertation was written by Jon Sanchez and supervised by Inma Hernáez and Ibon Saratxaga. It was defended at the University of the Basque Country the 5th of February 2016. The committee members were Dr. Alfonso Ortega Giménez (UniZar), Dr. Daniel Erro Eslava (UPV/EHU) and Dr. Enric Monte Moreno (UPC). The dissertation was awarded a "sobresaliente cum laude” qualification.This work has been partially funded by the Spanish Ministry of Economy and Competitiveness with FEDER support (RESTORE project,TEC2015-67163-C2-1-R) and the Basque Government (ELKAROLA project, KK-2015/00098)

    Use of the harmonic phase in synthetic speech detection

    Get PDF
    Special Session paper: recent PhD thesis descriptionThis PhD dissertation was written by Jon Sanchez and supervised by Inma Hernáez and Ibon Saratxaga. It was defended at the University of the Basque Country the 5th of February 2016. The committee members were Dr. Alfonso Ortega Giménez (UniZar), Dr. Daniel Erro Eslava (UPV/EHU) and Dr. Enric Monte Moreno (UPC). The dissertation was awarded a "sobresaliente cum laude” qualification.This work has been partially funded by the Spanish Ministry of Economy and Competitiveness with FEDER support (RESTORE project,TEC2015-67163-C2-1-R) and the Basque Government (ELKAROLA project, KK-2015/00098)

    Uncovering the Deceptions: An Analysis on Audio Spoofing Detection and Future Prospects

    Full text link
    Audio has become an increasingly crucial biometric modality due to its ability to provide an intuitive way for humans to interact with machines. It is currently being used for a range of applications, including person authentication to banking to virtual assistants. Research has shown that these systems are also susceptible to spoofing and attacks. Therefore, protecting audio processing systems against fraudulent activities, such as identity theft, financial fraud, and spreading misinformation, is of paramount importance. This paper reviews the current state-of-the-art techniques for detecting audio spoofing and discusses the current challenges along with open research problems. The paper further highlights the importance of considering the ethical and privacy implications of audio spoofing detection systems. Lastly, the work aims to accentuate the need for building more robust and generalizable methods, the integration of automatic speaker verification and countermeasure systems, and better evaluation protocols.Comment: Accepted in IJCAI 202

    DNN Filter Bank Cepstral Coefficients for Spoofing Detection

    Get PDF
    With the development of speech synthesis techniques, automatic speaker verification systems face the serious challenge of spoofing attack. In order to improve the reliability of speaker verification systems, we develop a new filter bank based cepstral feature, deep neural network filter bank cepstral coefficients (DNN-FBCC), to distinguish between natural and spoofed speech. The deep neural network filter bank is automatically generated by training a filter bank neural network (FBNN) using natural and synthetic speech. By adding restrictions on the training rules, the learned weight matrix of FBNN is band-limited and sorted by frequency, similar to the normal filter bank. Unlike the manually designed filter bank, the learned filter bank has different filter shapes in different channels, which can capture the differences between natural and synthetic speech more effectively. The experimental results on the ASVspoof {2015} database show that the Gaussian mixture model maximum-likelihood (GMM-ML) classifier trained by the new feature performs better than the state-of-the-art linear frequency cepstral coefficients (LFCC) based classifier, especially on detecting unknown attacks

    Spoofing Detection in Automatic Speaker Verification Systems Using DNN Classifiers and Dynamic Acoustic Features

    Get PDF

    Robust Audio Anti-Spoofing with Fusion-Reconstruction Learning on Multi-Order Spectrograms

    Full text link
    Robust audio anti-spoofing has been increasingly challenging due to the recent advancements on deepfake techniques. While spectrograms have demonstrated their capability for anti-spoofing, complementary information presented in multi-order spectral patterns have not been well explored, which limits their effectiveness for varying spoofing attacks. Therefore, we propose a novel deep learning method with a spectral fusion-reconstruction strategy, namely S2pecNet, to utilise multi-order spectral patterns for robust audio anti-spoofing representations. Specifically, spectral patterns up to second-order are fused in a coarse-to-fine manner and two branches are designed for the fine-level fusion from the spectral and temporal contexts. A reconstruction from the fused representation to the input spectrograms further reduces the potential fused information loss. Our method achieved the state-of-the-art performance with an EER of 0.77% on a widely used dataset: ASVspoof2019 LA Challenge
    corecore