Overview of BTAS 2016 Speaker Anti-spoofing Competition
This paper provides an overview of the Speaker Anti-spoofing Competition organized by the Biometric group at the Idiap Research Institute for the IEEE International Conference on Biometrics: Theory, Applications, and Systems (BTAS 2016). The competition used the AVspoof database, which contains a comprehensive set of presentation attacks, including (i) direct replay attacks, in which genuine data is played back using a laptop and two phones (Samsung Galaxy S4 and iPhone 3G), (ii) synthesized speech replayed with a laptop, and (iii) speech created with a voice conversion algorithm, also replayed with a laptop. The paper states the competition goals, describes the database and the evaluation protocol, discusses the spoofing (presentation attack) detection solutions submitted by the participants, and presents the results of the evaluation.
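The abstract mentions an evaluation protocol without naming its metrics. A common PAD protocol, assumed here rather than taken from the abstract, fixes the decision threshold at the equal error rate (EER) point on development scores and reports the half total error rate (HTER) on evaluation scores at that same threshold. The sketch below assumes that higher scores mean "more likely bona fide"; all function names and score values are hypothetical.

```python
from typing import List, Tuple


def error_rates(bona: List[float], attack: List[float], thr: float) -> Tuple[float, float]:
    """Return (FAR, FRR) at threshold `thr`; higher score = more likely bona fide (assumed)."""
    far = sum(s >= thr for s in attack) / len(attack)  # attacks wrongly accepted
    frr = sum(s < thr for s in bona) / len(bona)       # bona fide samples wrongly rejected
    return far, frr


def eer_threshold(bona: List[float], attack: List[float]) -> float:
    """Scan the observed scores and return the threshold where FAR and FRR are closest."""
    best_thr, best_gap = None, float("inf")
    for thr in sorted(set(bona) | set(attack)):
        far, frr = error_rates(bona, attack, thr)
        if abs(far - frr) < best_gap:
            best_thr, best_gap = thr, abs(far - frr)
    return best_thr


def hter(bona: List[float], attack: List[float], thr: float) -> float:
    """Half total error rate: the mean of FAR and FRR at a fixed threshold."""
    far, frr = error_rates(bona, attack, thr)
    return (far + frr) / 2


# Threshold chosen on (toy) development scores, then applied unchanged to evaluation scores.
dev_thr = eer_threshold([0.9, 0.8, 0.7, 0.4], [0.1, 0.2, 0.3, 0.75])
eval_hter = hter([0.85, 0.6, 0.95], [0.2, 0.72, 0.1, 0.05], dev_thr)
```

Fixing the threshold on development data keeps the evaluation set unseen, so the HTER reflects how a deployed detector with a pre-set operating point would behave.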
GANBA: Generative Adversarial Network for Biometric Anti-Spoofing
Automatic speaker verification (ASV) is a voice biometric technology whose security might be compromised by spoofing attacks. To increase robustness against such attacks, presentation attack detection (PAD), or anti-spoofing, systems are being developed to detect replay, text-to-speech, and voice conversion-based spoofing attacks. However, it was recently shown that adversarial spoofing attacks can seriously fool anti-spoofing systems. Moreover, the robustness of the whole biometric system (ASV + PAD) against this new type of attack is completely unexplored. In this work, a new generative adversarial network for biometric anti-spoofing (GANBA) is proposed. GANBA has a twofold basis: (1) it jointly employs the anti-spoofing and ASV losses to yield very damaging adversarial spoofing attacks, and (2) it trains the PAD as a discriminator in order to make it more robust against these types of adversarial attacks. The proposed system is able to generate adversarial spoofing attacks that can fool the complete voice biometric system. The resulting PAD discriminators of GANBA can then be used as a defense technique for detecting both original and adversarial spoofing attacks. The physical access (PA) and logical access (LA) scenarios of the ASVspoof 2019 database were employed to carry out the experiments. The experimental results show that the GANBA attacks are quite effective, outperforming other adversarial techniques in both white-box and black-box attack setups. In addition, the resulting PAD discriminators are more robust against both original and adversarial spoofing attacks.
Acknowledgments: Alejandro Gomez-Alanis holds an FPU fellowship (FPU16/05490) from the Spanish Ministry of Education and Vocational Training. Jose A. Gonzalez-Lopez holds a Juan de la Cierva-Incorporación fellowship (IJCI-2017-32926) from the Spanish Ministry of Science and Innovation. We also acknowledge the support of Nvidia with the donation of a Titan X GPU.
Data Availability Statement: The ASVspoof 2019 datasets were used in this study. They are publicly available at https://datashare.ed.ac.uk/handle/10283/3336 (accessed on 5 December 2021).
FEDER/Junta de Andalucía-Consejería de Transformación Económica, Industria, Conocimiento y Universidades Project PY20_00902; PID2019-104206GB-I00 funded by MCIN/AEI/10.13039/50110001103
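GANBA's abstract says the attacker jointly uses the anti-spoofing and ASV losses while the PAD is trained as a discriminator. The sketch below is a generic illustration of that idea, not the paper's actual objective: the attacker's loss is a weighted sum of two negative log-probabilities (PAD saying "bona fide" and ASV accepting the target speaker), and the PAD discriminator uses standard binary cross-entropy. The weight `alpha`, the probability inputs, and the function names are all hypothetical.

```python
import math

EPS = 1e-12  # guards against log(0)


def generator_loss(pad_bona_prob: float, asv_accept_prob: float, alpha: float = 0.5) -> float:
    """Hypothetical joint attacker objective: fool PAD and ASV at the same time.

    pad_bona_prob: PAD's probability that the adversarial sample is bona fide.
    asv_accept_prob: ASV's probability of accepting the claimed (target) speaker.
    alpha: hypothetical weight trading off the two terms.
    """
    l_pad = -math.log(max(pad_bona_prob, EPS))    # small when the PAD is fooled
    l_asv = -math.log(max(asv_accept_prob, EPS))  # small when the ASV accepts
    return alpha * l_pad + (1.0 - alpha) * l_asv


def discriminator_loss(p_bona_on_bona: float, p_bona_on_adv: float) -> float:
    """Binary cross-entropy for the PAD discriminator: bona fide -> 1, adversarial spoof -> 0."""
    return -(math.log(max(p_bona_on_bona, EPS)) + math.log(max(1.0 - p_bona_on_adv, EPS)))
```

In a GAN-style loop, the attacker's parameters would be updated to lower `generator_loss` while the PAD is updated to lower `discriminator_loss`, so each side adapts to the other; how GANBA actually weights and schedules these updates is described in the paper, not here.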
On the Use of Convolutional Neural Networks for Speech Presentation Attack Detection
Research in the area of automatic speaker verification (ASV) has advanced enough for the industry to start using ASV systems in practical applications. However, these systems are highly vulnerable to spoofing or presentation attacks (PAs), limiting their wide deployment. Several speech-based presentation attack detection (PAD) methods have been proposed recently, but most of them rely on hand-crafted frequency- or phase-based features. Although convolutional neural networks (CNNs) have already shown breakthrough results in face recognition, it is not well understood whether CNNs are as effective at detecting presentation attacks in speech. In this paper, to investigate the applicability of CNNs for PAD, we consider shallow and deep CNN architectures implemented using TensorFlow and compare their performance with a state-of-the-art system based on MFCC features and GMMs on two large databases with presentation attacks: the publicly available voicePA and the proprietary BioCPqD-PA. We study the impact of increasing CNN depth on performance and examine how the networks perform on unknown attacks by using one database for training and the other for evaluation. The results demonstrate that CNNs learn a given database significantly better than hand-crafted features do (and increasing depth further improves performance). However, CNN-based PADs still lack the ability to generalize across databases and are unable to detect unknown attacks well.
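The baseline in this comparison is an MFCC front end with GMM-based classification. A typical setup, assumed here, trains one GMM on bona fide features and one on attack features, then scores an utterance by its average per-frame log-likelihood ratio. The sketch below only covers scoring with already-trained diagonal-covariance GMMs (EM training and MFCC extraction are omitted); the class name, function names, and toy 1-D parameters are hypothetical.

```python
import math
from typing import List, Sequence


class DiagGMM:
    """Diagonal-covariance GMM with fixed parameters (EM training omitted)."""

    def __init__(self, weights: List[float], means: List[List[float]], variances: List[List[float]]):
        self.weights, self.means, self.variances = weights, means, variances

    def frame_loglik(self, x: Sequence[float]) -> float:
        """log p(x) = log sum_k w_k N(x; mu_k, diag(var_k)), computed with log-sum-exp."""
        comps = []
        for w, mu, var in zip(self.weights, self.means, self.variances):
            ll = math.log(w)
            for xi, mi, vi in zip(x, mu, var):
                ll -= 0.5 * (math.log(2.0 * math.pi * vi) + (xi - mi) ** 2 / vi)
            comps.append(ll)
        m = max(comps)
        return m + math.log(sum(math.exp(c - m) for c in comps))


def llr_score(frames: Sequence[Sequence[float]], bona: DiagGMM, attack: DiagGMM) -> float:
    """Average per-frame log-likelihood ratio; positive values favour bona fide."""
    return sum(bona.frame_loglik(f) - attack.frame_loglik(f) for f in frames) / len(frames)


# Toy 1-D "features" standing in for MFCC vectors.
bona_gmm = DiagGMM([1.0], [[0.0]], [[1.0]])
attack_gmm = DiagGMM([1.0], [[3.0]], [[1.0]])
```

With these toy models, a frame at 0.0 gets a log-likelihood ratio of (9 - 6·0)/2 = 4.5 in favour of bona fide, while a frame at 3.0 gets -4.5. The CNN systems replace both the hand-crafted MFCC front end and this generative back end with a learned classifier.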
Impact of score fusion on voice biometrics and presentation attack detection in cross-database evaluations
Research in the area of automatic speaker verification (ASV) has advanced enough for the industry to start using ASV systems in practical applications. However, these systems are highly vulnerable to spoofing or presentation attacks, limiting their wide deployment. Therefore, it is important to develop mechanisms that can detect such attacks, and it is equally important for these mechanisms to be seamlessly integrated into existing ASV systems to obtain practical, attack-resistant solutions. To be practical, however, an attack detection method should (i) have high accuracy, (ii) generalize well to different attacks, and (iii) be simple and efficient. Several audio-based presentation attack detection (PAD) methods have been proposed recently, but their evaluation was usually done on a single, often obscure, database with a limited number of attacks. Therefore, in this paper, we conduct an extensive study of eight state-of-the-art PAD methods and evaluate their ability to detect known and unknown attacks (e.g., in a cross-database scenario) using two major publicly available speaker databases with spoofing attacks: AVspoof and ASVspoof. We investigate whether combining several PAD systems via score fusion can improve attack detection accuracy. We also study the impact of fusing PAD systems (via parallel and cascading schemes) with two ASV systems, based on i-vectors and on inter-session variability, on the overall performance in both bona fide (no attacks) and spoof scenarios. The evaluation results question the efficiency and practicality of the existing PAD systems, especially when comparing results for individual databases with cross-database results. Fusing several PAD systems can lead to slightly improved performance; however, how to select which systems to fuse remains an open question. Joint ASV-PAD systems show significantly increased resistance to the attacks at the expense of slightly degraded performance in bona fide scenarios.
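The abstract distinguishes score fusion among PAD systems from two ways of joining PAD with ASV: parallel (one fused decision) and cascading (PAD first, then ASV). The sketch below illustrates those schemes under simple assumptions: scores are already calibrated to comparable ranges, PAD fusion is a plain average (a trained fuser would be a common alternative), and parallel fusion is a weighted sum with a single threshold. The weight `w`, all thresholds, and the function names are hypothetical and not taken from the paper.

```python
from typing import Sequence


def fuse_mean(scores: Sequence[float]) -> float:
    """Fuse several PAD scores by simple averaging (assumes comparable score ranges)."""
    return sum(scores) / len(scores)


def parallel_accept(asv_score: float, pad_score: float, w: float = 0.5, thr: float = 0.0) -> bool:
    """Parallel scheme: one fused ASV+PAD score, one threshold (w and thr are hypothetical)."""
    return w * asv_score + (1.0 - w) * pad_score >= thr


def cascade_accept(asv_score: float, pad_score: float,
                   pad_thr: float = 0.0, asv_thr: float = 0.0) -> bool:
    """Cascade scheme: PAD acts as a gate; ASV is consulted only if PAD says bona fide."""
    if pad_score < pad_thr:
        return False  # attack detected: reject regardless of the ASV score
    return asv_score >= asv_thr


# A strong speaker match paired with a negative (attack-like) PAD score:
print(parallel_accept(2.0, -1.0), cascade_accept(2.0, -1.0))
```

For this input the parallel scheme accepts (the high ASV score outweighs the PAD score), whereas the cascade rejects because the PAD vetoes the trial. This illustrates why the choice of scheme matters for spoof resistance versus bona fide performance.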