1,676 research outputs found

    Uncovering the Deceptions: An Analysis on Audio Spoofing Detection and Future Prospects

    Full text link
    Audio has become an increasingly crucial biometric modality due to its ability to provide an intuitive way for humans to interact with machines. It is currently being used for a range of applications, including person authentication to banking to virtual assistants. Research has shown that these systems are also susceptible to spoofing and attacks. Therefore, protecting audio processing systems against fraudulent activities, such as identity theft, financial fraud, and spreading misinformation, is of paramount importance. This paper reviews the current state-of-the-art techniques for detecting audio spoofing and discusses the current challenges along with open research problems. The paper further highlights the importance of considering the ethical and privacy implications of audio spoofing detection systems. Lastly, the work aims to accentuate the need for building more robust and generalizable methods, the integration of automatic speaker verification and countermeasure systems, and better evaluation protocols.Comment: Accepted in IJCAI 202

    Secure Automatic Speaker Verification Systems

    Get PDF
    The growing number of voice-enabled devices and applications consider automatic speaker verification (ASV) a fundamental component. However, maximum outreach for ASV in critical domains e.g., financial services and health care, is not possible unless we overcome security breaches caused by voice cloning, and replayed audios collectively known as the spoofing attacks. The audio spoofing attacks over ASV systems on one hand strictly limit the usability of voice-enabled applications; and on the other hand, the counterfeiter also remains untraceable. Therefore, to overcome these vulnerabilities, a secure ASV (SASV) system is presented in this dissertation. The proposed SASV system is based on the concept of novel sign modified acoustic local ternary pattern (sm-ALTP) features and asymmetric bagging-based classifier-ensemble. The proposed audio representation approach clusters the high and low-frequency components in audio frames by normally distributing frequency components against a convex function. Then, the neighborhood statistics are applied to capture the user specific vocal tract information. This information is then utilized by the classifier ensemble that is based on the concept of weighted normalized voting rule to detect various spoofing attacks. Contrary to the existing ASV systems, the proposed SASV system not only detects the conventional spoofing attacks (i.e. voice cloning, and replays), but also the new attacks that are still unexplored by the research community and a requirement of the future. In this regard, a concept of cloned replays is presented in this dissertation, where, replayed audios contains the microphone characteristics as well as the voice cloning artifacts. This depicts the scenario when voice cloning is applied in real-time. The voice cloning artifacts suppresses the microphone characteristics thus fails replay detection modules and similarly with the amalgamation of microphone characteristics the voice cloning detection gets deceived. Furthermore, the proposed scheme can be utilized to obtain a possible clue against the counterfeiter through voice cloning algorithm detection module that is also a novel concept proposed in this dissertation. The voice cloning algorithm detection module determines the voice cloning algorithm used to generate the fake audios. Overall, the proposed SASV system simultaneously verifies the bonafide speakers and detects the voice cloning attack, cloning algorithm used to synthesize cloned audio (in the defined settings), and voice-replay attacks over the ASVspoof 2019 dataset. In addition, the proposed method detects the voice replay and cloned voice replay attacks over the VSDC dataset. Rigorous experimentation against state-of-the-art approaches also confirms the robustness of the proposed research

    Dynamically Mitigating Data Discrepancy with Balanced Focal Loss for Replay Attack Detection

    Full text link
    It becomes urgent to design effective anti-spoofing algorithms for vulnerable automatic speaker verification systems due to the advancement of high-quality playback devices. Current studies mainly treat anti-spoofing as a binary classification problem between bonafide and spoofed utterances, while lack of indistinguishable samples makes it difficult to train a robust spoofing detector. In this paper, we argue that for anti-spoofing, it needs more attention for indistinguishable samples over easily-classified ones in the modeling process, to make correct discrimination a top priority. Therefore, to mitigate the data discrepancy between training and inference, we propose to leverage a balanced focal loss function as the training objective to dynamically scale the loss based on the traits of the sample itself. Besides, in the experiments, we select three kinds of features that contain both magnitude-based and phase-based information to form complementary and informative features. Experimental results on the ASVspoof2019 dataset demonstrate the superiority of the proposed methods by comparison between our systems and top-performing ones. Systems trained with the balanced focal loss perform significantly better than conventional cross-entropy loss. With complementary features, our fusion system with only three kinds of features outperforms other systems containing five or more complex single models by 22.5% for min-tDCF and 7% for EER, achieving a min-tDCF and an EER of 0.0124 and 0.55% respectively. Furthermore, we present and discuss the evaluation results on real replay data apart from the simulated ASVspoof2019 data, indicating that research for anti-spoofing still has a long way to go.Comment: This work has been accepted by the 25th International Conference on Pattern Recognition (ICPR2020

    Multi-Level Liveness Verification for Face-Voice Biometric Authentication

    Get PDF
    In this paper we present the details of the multilevel liveness verification (MLLV) framework proposed for realizing a secure face-voice biometric authentication system that can thwart different types of audio and video replay attacks. The proposed MLLV framework based on novel feature extraction and multimodal fusion approaches, uncovers the static and dynamic relationship between voice and face information from speaking faces, and allows multiple levels of security. Experiments with three different speaking corpora VidTIMIT, UCBN and AVOZES shows a significant improvement in system performance in terms of DET curves and equal error rates(EER) for different types of replay and synthesis attacks

    AdvSV: An Over-the-Air Adversarial Attack Dataset for Speaker Verification

    Full text link
    It is known that deep neural networks are vulnerable to adversarial attacks. Although Automatic Speaker Verification (ASV) built on top of deep neural networks exhibits robust performance in controlled scenarios, many studies confirm that ASV is vulnerable to adversarial attacks. The lack of a standard dataset is a bottleneck for further research, especially reproducible research. In this study, we developed an open-source adversarial attack dataset for speaker verification research. As an initial step, we focused on the over-the-air attack. An over-the-air adversarial attack involves a perturbation generation algorithm, a loudspeaker, a microphone, and an acoustic environment. The variations in the recording configurations make it very challenging to reproduce previous research. The AdvSV dataset is constructed using the Voxceleb1 Verification test set as its foundation. This dataset employs representative ASV models subjected to adversarial attacks and records adversarial samples to simulate over-the-air attack settings. The scope of the dataset can be easily extended to include more types of adversarial attacks. The dataset will be released to the public under the CC BY-SA 4.0. In addition, we also provide a detection baseline for reproducible research.Comment: Accepted by ICASSP202

    GANBA: Generative Adversarial Network for Biometric Anti-Spoofing

    Get PDF
    Acknowledgments: Alejandro Gomez-Alanis holds a FPU fellowship (FPU16/05490) from the Spanish Ministry of Education and Vocational Training. Jose A. Gonzalez-Lopez also holds a Juan de la Cierva-Incorporación fellowship (IJCI-2017-32926) from the Spanish Ministry of Science and Innovation. Furthermore, we acknowledge the support of Nvidia with the donation of a Titan X GPU.Data Availability Statement: The ASVspoof 2019 datasets were used in this study. They are publicly available at https://datashare.ed.ac.uk/handle/10283/3336 (accessed on 5 December 2021).Automatic speaker verification (ASV) is a voice biometric technology whose security might be compromised by spoofing attacks. To increase the robustness against spoofing attacks, presentation attack detection (PAD) or anti-spoofing systems for detecting replay, text-to-speech and voice conversion-based spoofing attacks are being developed. However, it was recently shown that adversarial spoofing attacks may seriously fool anti-spoofing systems. Moreover, the robustness of the whole biometric system (ASV + PAD) against this new type of attack is completely unexplored. In this work, a new generative adversarial network for biometric anti-spoofing (GANBA) is proposed. GANBA has a twofold basis: (1) it jointly employs the anti-spoofing and ASV losses to yield very damaging adversarial spoofing attacks, and (2) it trains the PAD as a discriminator in order to make them more robust against these types of adversarial attacks. The proposed system is able to generate adversarial spoofing attacks which can fool the complete voice biometric system. Then, the resulting PAD discriminators of the proposed GANBA can be used as a defense technique for detecting both original and adversarial spoofing attacks. The physical access (PA) and logical access (LA) scenarios of the ASVspoof 2019 database were employed to carry out the experiments. The experimental results show that the GANBA attacks are quite effective, outperforming other adversarial techniques when applied in white-box and black-box attack setups. In addition, the resulting PAD discriminators are more robust against both original and adversarial spoofing attacks.FEDER/Junta de Andalucía-Consejería de Transformación Económica, Industria, Conocimiento y Universidades Proyecto PY20_00902PID2019-104206GB-I00 funded by MCIN/ AEI /10.13039/50110001103
    corecore