Uncovering the Deceptions: An Analysis on Audio Spoofing Detection and Future Prospects
Audio has become an increasingly crucial biometric modality due to its
ability to provide an intuitive way for humans to interact with machines. It is
currently being used for a range of applications, from person
authentication to banking to virtual assistants. Research has shown that these
systems are also susceptible to spoofing and other attacks. Therefore, protecting
audio processing systems against fraudulent activities, such as identity theft,
financial fraud, and spreading misinformation, is of paramount importance. This
paper reviews the current state-of-the-art techniques for detecting audio
spoofing and discusses the current challenges along with open research
problems. The paper further highlights the importance of considering the
ethical and privacy implications of audio spoofing detection systems. Lastly,
the work aims to accentuate the need for building more robust and generalizable
methods, the integration of automatic speaker verification and countermeasure
systems, and better evaluation protocols.
Comment: Accepted in IJCAI 202
Secure Automatic Speaker Verification Systems
A growing number of voice-enabled devices and applications treat automatic speaker verification (ASV) as a fundamental component. However, ASV cannot reach critical domains, e.g., financial services and health care, unless we overcome the security breaches caused by voice cloning and replayed audio, collectively known as spoofing attacks. On the one hand, audio spoofing attacks on ASV systems strictly limit the usability of voice-enabled applications; on the other hand, the counterfeiter remains untraceable. Therefore, to overcome these vulnerabilities, a secure ASV (SASV) system is presented in this dissertation. The proposed SASV system is based on novel sign modified acoustic local ternary pattern (sm-ALTP) features and an asymmetric bagging-based classifier ensemble. The proposed audio representation approach clusters the high- and low-frequency components in audio frames by normally distributing frequency components against a convex function. Neighborhood statistics are then applied to capture user-specific vocal tract information. This information is then used by the classifier ensemble, which relies on a weighted normalized voting rule, to detect various spoofing attacks. In contrast to existing ASV systems, the proposed SASV system detects not only conventional spoofing attacks (i.e., voice cloning and replays) but also new attacks that the research community has not yet explored and that will need to be addressed in the future. In this regard, the concept of cloned replays is presented in this dissertation, where the replayed audio contains the microphone characteristics as well as the voice cloning artifacts. This depicts the scenario in which voice cloning is applied in real time. The voice cloning artifacts suppress the microphone characteristics and thus defeat replay detection modules; similarly, the added microphone characteristics deceive voice cloning detection.
Furthermore, the proposed scheme can be used to obtain a possible clue about the counterfeiter through a voice cloning algorithm detection module, which is also a novel concept proposed in this dissertation. This module determines the voice cloning algorithm used to generate the fake audio. Overall, the proposed SASV system simultaneously verifies bonafide speakers and detects voice cloning attacks, the cloning algorithm used to synthesize the cloned audio (in the defined settings), and voice replay attacks on the ASVspoof 2019 dataset. In addition, the proposed method detects voice replay and cloned voice replay attacks on the VSDC dataset. Rigorous experimentation against state-of-the-art approaches also confirms the robustness of the proposed research.
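The abstract above mentions a classifier ensemble that combines its members through a weighted normalized voting rule. The exact rule used in the dissertation is not given here, so the following is only a minimal sketch of one common reading: each classifier casts a label vote, the weights are normalized to sum to one, and the label with the largest accumulated weight wins. The function name, the labels, and the example weights are hypothetical.

```python
def weighted_normalized_vote(votes, weights):
    """Combine per-classifier label votes using weights normalized to sum to 1.

    votes   -- list of predicted labels, one per classifier (hypothetical labels)
    weights -- list of non-negative reliability weights, one per classifier
    Returns (winning_label, tally), where tally maps each label to its share.
    """
    if len(votes) != len(weights) or not votes:
        raise ValueError("votes and weights must be non-empty and equal length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")

    # Normalize so the shares are comparable across ensembles of any size.
    normalized = [w / total for w in weights]

    tally = {}
    for label, share in zip(votes, normalized):
        tally[label] = tally.get(label, 0.0) + share

    winner = max(tally, key=tally.get)
    return winner, tally


if __name__ == "__main__":
    # Assumed example: one strong classifier outvotes two weaker ones.
    label, shares = weighted_normalized_vote(
        ["bonafide", "spoof", "spoof"], [0.9, 0.3, 0.3]
    )
    print(label, shares)
```

In this reading, a single reliable classifier can override several weaker ones, which is the main difference from plain majority voting.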
Dynamically Mitigating Data Discrepancy with Balanced Focal Loss for Replay Attack Detection
Designing effective anti-spoofing algorithms for vulnerable automatic
speaker verification systems has become urgent due to the advancement of
high-quality playback devices. Current studies mainly treat anti-spoofing as a
binary classification problem between bonafide and spoofed utterances, while
the lack of indistinguishable samples makes it difficult to train a robust
spoofing detector. In this paper, we argue that anti-spoofing modeling should
pay more attention to indistinguishable samples than to easily classified
ones, so that correct discrimination becomes the top priority. Therefore, to
mitigate the data discrepancy between training and inference, we propose to
leverage a balanced focal loss function as the training objective to
dynamically scale the loss based on the traits of the sample itself. Besides,
in the experiments, we select three kinds of features that contain both
magnitude-based and phase-based information to form complementary and
informative features. Experimental results on the ASVspoof2019 dataset
demonstrate the superiority of the proposed methods by comparison between our
systems and top-performing ones. Systems trained with the balanced focal loss
perform significantly better than those trained with conventional
cross-entropy loss. With
complementary features, our fusion system with only three kinds of features
outperforms other systems containing five or more complex single models by
22.5% for min-tDCF and 7% for EER, achieving a min-tDCF and an EER of 0.0124
and 0.55% respectively. Furthermore, we present and discuss the evaluation
results on real replay data apart from the simulated ASVspoof2019 data,
indicating that research for anti-spoofing still has a long way to go.
Comment: This work has been accepted by the 25th International Conference on
Pattern Recognition (ICPR2020
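To illustrate how a focal-style loss scales each sample's contribution dynamically, here is a minimal sketch of the widely used alpha-balanced focal loss, FL = -alpha_t * (1 - p_t)^gamma * log(p_t). It is an assumption that the paper's balanced focal loss follows this standard form; the values alpha=0.25 and gamma=2.0 are common defaults, not figures taken from the abstract.

```python
import math


def cross_entropy(p_spoof, label, eps=1e-12):
    """Binary cross-entropy for one sample; p_spoof is P(label == 1)."""
    p_t = p_spoof if label == 1 else 1.0 - p_spoof
    return -math.log(max(p_t, eps))


def balanced_focal_loss(p_spoof, label, alpha=0.25, gamma=2.0, eps=1e-12):
    """Alpha-balanced focal loss for one sample (standard form, assumed here).

    p_t is the probability assigned to the true class. The factor
    (1 - p_t) ** gamma shrinks the loss of confidently correct samples,
    so hard, easily confused samples dominate training.
    """
    p_t = p_spoof if label == 1 else 1.0 - p_spoof
    alpha_t = alpha if label == 1 else 1.0 - alpha
    p_t = max(p_t, eps)  # guard against log(0)
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


if __name__ == "__main__":
    # Compare an easy sample (p_t = 0.95) with a harder one (p_t = 0.6).
    for p in (0.95, 0.6):
        print(p, cross_entropy(p, 1), balanced_focal_loss(p, 1))
```

With gamma set to 0 the modulating factor disappears and the expression reduces to class-weighted cross-entropy, which makes the role of gamma easy to check in isolation.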
Multi-Level Liveness Verification for Face-Voice Biometric Authentication
In this paper, we present the details of the multilevel liveness verification (MLLV) framework proposed for realizing a secure face-voice biometric authentication system that can thwart different types of audio and video replay attacks. The proposed MLLV framework, based on novel feature extraction and multimodal fusion approaches, uncovers the static and dynamic relationship between voice and face information from speaking faces, and allows multiple levels of security. Experiments with three different speaking-face corpora, VidTIMIT, UCBN and AVOZES, show a significant improvement in system performance in terms of DET curves and equal error rates (EER) for different types of replay and synthesis attacks.
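Several of the abstracts in this listing report results as an equal error rate (EER), the operating point at which the false acceptance rate equals the false rejection rate. The following is a minimal sketch of how an EER can be estimated from two score lists by sweeping thresholds. The function name and the toy scores are hypothetical, and evaluation toolkits typically interpolate between thresholds rather than picking the closest point.

```python
def equal_error_rate(genuine_scores, impostor_scores):
    """Estimate the EER by sweeping every observed score as a threshold.

    A trial is accepted when its score is >= the threshold.
    FRR = fraction of genuine trials rejected,
    FAR = fraction of impostor (or spoof) trials accepted.
    Returns the mean of FAR and FRR at the threshold where they are closest.
    """
    if not genuine_scores or not impostor_scores:
        raise ValueError("both score lists must be non-empty")

    best_gap, best_eer = None, None
    for threshold in sorted(set(genuine_scores) | set(impostor_scores)):
        frr = sum(s < threshold for s in genuine_scores) / len(genuine_scores)
        far = sum(s >= threshold for s in impostor_scores) / len(impostor_scores)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, best_eer = gap, (far + frr) / 2.0
    return best_eer


if __name__ == "__main__":
    # Toy scores (assumed): one impostor overlaps the genuine range.
    genuine = [0.9, 0.8, 0.7, 0.6]
    impostor = [0.1, 0.2, 0.3, 0.65]
    print(equal_error_rate(genuine, impostor))
```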
AdvSV: An Over-the-Air Adversarial Attack Dataset for Speaker Verification
It is known that deep neural networks are vulnerable to adversarial attacks.
Although Automatic Speaker Verification (ASV) built on top of deep neural
networks exhibits robust performance in controlled scenarios, many studies
confirm that ASV is vulnerable to adversarial attacks. The lack of a standard
dataset is a bottleneck for further research, especially reproducible research.
In this study, we developed an open-source adversarial attack dataset for
speaker verification research. As an initial step, we focused on the
over-the-air attack. An over-the-air adversarial attack involves a perturbation
generation algorithm, a loudspeaker, a microphone, and an acoustic environment.
The variations in the recording configurations make it very challenging to
reproduce previous research. The AdvSV dataset is constructed using the
Voxceleb1 Verification test set as its foundation. This dataset employs
representative ASV models subjected to adversarial attacks and records
adversarial samples to simulate over-the-air attack settings. The scope of the
dataset can be easily extended to include more types of adversarial attacks.
The dataset will be released to the public under the CC BY-SA 4.0. In addition,
we also provide a detection baseline for reproducible research.
Comment: Accepted by ICASSP202
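The abstract above lists a perturbation generation algorithm as one component of an over-the-air attack but does not name one. As an illustration only, the sketch below applies the fast gradient sign method (FGSM), a well-known technique swapped in here, to a toy linear scorer standing in for an ASV model. The scorer, the weights, and the epsilon value are all hypothetical, and the loudspeaker, microphone, and acoustic environment are not modeled.

```python
def sign(value):
    """Return -1, 0, or 1 according to the sign of value."""
    return (value > 0) - (value < 0)


def linear_score(features, weights):
    """Toy stand-in for an ASV similarity score: a plain dot product."""
    return sum(x * w for x, w in zip(features, weights))


def fgsm_perturb(features, weights, epsilon):
    """One FGSM step that pushes the toy score upward.

    For a linear score the gradient with respect to the input is just the
    weight vector, so the adversarial input is x + epsilon * sign(w).
    Each feature changes by at most epsilon.
    """
    return [x + epsilon * sign(w) for x, w in zip(features, weights)]


if __name__ == "__main__":
    x = [0.1, -0.2, 0.3]   # assumed input features
    w = [0.5, -1.0, 0.0]   # assumed model weights
    x_adv = fgsm_perturb(x, w, epsilon=0.01)
    print(linear_score(x, w), linear_score(x_adv, w))
```

For this linear toy model the score rises by exactly epsilon times the sum of absolute weights; real ASV models need the gradient computed by backpropagation instead.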
GANBA: Generative Adversarial Network for Biometric Anti-Spoofing
Acknowledgments: Alejandro Gomez-Alanis holds a FPU fellowship (FPU16/05490) from the
Spanish Ministry of Education and Vocational Training. Jose A. Gonzalez-Lopez also holds a Juan
de la Cierva-Incorporación fellowship (IJCI-2017-32926) from the Spanish Ministry of Science and
Innovation. Furthermore, we acknowledge the support of Nvidia with the donation of a Titan X GPU.
Data Availability Statement: The ASVspoof 2019 datasets were used in this study. They are publicly
available at https://datashare.ed.ac.uk/handle/10283/3336 (accessed on 5 December 2021).
Automatic speaker verification (ASV) is a voice biometric technology whose security
might be compromised by spoofing attacks. To increase the robustness against spoofing attacks,
presentation attack detection (PAD) or anti-spoofing systems for detecting replay, text-to-speech and
voice conversion-based spoofing attacks are being developed. However, it was recently shown that
adversarial spoofing attacks may seriously fool anti-spoofing systems. Moreover, the robustness of the
whole biometric system (ASV + PAD) against this new type of attack is completely unexplored. In
this work, a new generative adversarial network for biometric anti-spoofing (GANBA) is proposed.
GANBA has a twofold basis: (1) it jointly employs the anti-spoofing and ASV losses to yield very
damaging adversarial spoofing attacks, and (2) it trains the PAD as a discriminator in order to make
it more robust against these types of adversarial attacks. The proposed system is able to generate
adversarial spoofing attacks which can fool the complete voice biometric system. Then, the resulting
PAD discriminators of the proposed GANBA can be used as a defense technique for detecting both
original and adversarial spoofing attacks. The physical access (PA) and logical access (LA) scenarios of
the ASVspoof 2019 database were employed to carry out the experiments. The experimental results
show that the GANBA attacks are quite effective, outperforming other adversarial techniques when
applied in white-box and black-box attack setups. In addition, the resulting PAD discriminators are
more robust against both original and adversarial spoofing attacks.
FEDER/Junta de Andalucía-Consejería de Transformación Económica, Industria, Conocimiento y Universidades Proyecto PY20_00902
PID2019-104206GB-I00 funded by MCIN/ AEI /10.13039/50110001103