GANBA: Generative Adversarial Network for Biometric Anti-Spoofing
Acknowledgments: Alejandro Gomez-Alanis holds a FPU fellowship (FPU16/05490) from the
Spanish Ministry of Education and Vocational Training. Jose A. Gonzalez-Lopez also holds a Juan
de la Cierva-Incorporación fellowship (IJCI-2017-32926) from the Spanish Ministry of Science and
Innovation. Furthermore, we acknowledge the support of Nvidia with the donation of a Titan X GPU.

Data Availability Statement: The ASVspoof 2019 datasets were used in this study. They are publicly available at https://datashare.ed.ac.uk/handle/10283/3336 (accessed on 5 December 2021).

Automatic speaker verification (ASV) is a voice biometric technology whose security
might be compromised by spoofing attacks. To increase the robustness against spoofing attacks,
presentation attack detection (PAD) or anti-spoofing systems for detecting replay, text-to-speech and
voice conversion-based spoofing attacks are being developed. However, it was recently shown that
adversarial spoofing attacks may seriously fool anti-spoofing systems. Moreover, the robustness of the
whole biometric system (ASV + PAD) against this new type of attack is completely unexplored. In
this work, a new generative adversarial network for biometric anti-spoofing (GANBA) is proposed.
GANBA has a twofold basis: (1) it jointly employs the anti-spoofing and ASV losses to yield very
damaging adversarial spoofing attacks, and (2) it trains the PAD as a discriminator in order to make
it more robust against these types of adversarial attacks. The proposed system is able to generate
adversarial spoofing attacks which can fool the complete voice biometric system. Then, the resulting
PAD discriminators of the proposed GANBA can be used as a defense technique for detecting both
original and adversarial spoofing attacks. The physical access (PA) and logical access (LA) scenarios of
the ASVspoof 2019 database were employed to carry out the experiments. The experimental results
show that the GANBA attacks are quite effective, outperforming other adversarial techniques when
applied in white-box and black-box attack setups. In addition, the resulting PAD discriminators are
more robust against both original and adversarial spoofing attacks.

Funding: FEDER/Junta de Andalucía-Consejería de Transformación Económica, Industria, Conocimiento y Universidades (Proyecto PY20_00902); PID2019-104206GB-I00 funded by MCIN/AEI/10.13039/50110001103.
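The two-step GANBA idea described in the abstract above — a generator that perturbs spoofed features against a joint PAD + ASV loss, and a PAD retrained as the discriminator on the resulting adversarial examples — can be sketched on a deliberately tiny linear model. Everything here (linear scorers, FGSM-style sign-gradient step, the step sizes) is an illustrative assumption, not the paper's architecture:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy linear scorers: score > 0 means "accept".
w_pad = np.array([1.0, -1.0, 0.5, 0.0])    # spoofing countermeasure (PAD)
w_asv = np.array([0.5, 1.0, -1.0, 0.0])    # speaker verifier (ASV)
x_spoof = np.array([-1.0, 0.0, 0.0, 1.0])  # features of a spoofed utterance

# Generator step (FGSM-style sign-gradient attack): perturb the spoofed
# features to descend the JOINT loss -log sigma(w_pad.x) - log sigma(w_asv.x),
# pushing both systems toward acceptance at once. For linear scorers the
# gradient w.r.t. x is closed-form.
eps = 0.3
grad = (-(1 - sigmoid(w_pad @ x_spoof)) * w_pad
        - (1 - sigmoid(w_asv @ x_spoof)) * w_asv)
x_adv = x_spoof - eps * np.sign(grad)

# Both scores move toward acceptance: the attack fools PAD and ASV jointly.
assert w_pad @ x_adv > w_pad @ x_spoof
assert w_asv @ x_adv > w_asv @ x_spoof

# Discriminator step: one SGD step retraining the PAD on the adversarial
# example with its true label "spoof" (logistic loss -log(1 - p)).
lr = 0.5
p = sigmoid(w_pad @ x_adv)
w_pad_new = w_pad - lr * p * x_adv      # d/dw of -log(1 - p) is p * x

assert sigmoid(w_pad_new @ x_adv) < p   # PAD now rejects this attack harder
```

Iterating the two steps is what gives the GAN flavor: stronger attacks force a more robust discriminator, which is then kept as the hardened PAD.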
NPLDA: A Deep Neural PLDA Model for Speaker Verification
The state-of-the-art approach for speaker verification consists of a neural
network based embedding extractor along with a backend generative model such as
the Probabilistic Linear Discriminant Analysis (PLDA). In this work, we propose
a neural network approach for backend modeling in speaker recognition. The
likelihood ratio score of the generative PLDA model is posed as a
discriminative similarity function and the learnable parameters of the score
function are optimized using a verification cost. The proposed model, termed
neural PLDA (NPLDA), is initialized using the generative PLDA model parameters.
The loss function for the NPLDA model is an approximation of the minimum
detection cost function (DCF). The speaker recognition experiments using the
NPLDA model are performed on the speaker verification task in the VOiCES
datasets as well as the SITW challenge dataset. In these experiments, the NPLDA
model optimized using the proposed loss function improves significantly over
the state-of-the-art PLDA based speaker verification system.

Comment: Published in Odyssey 2020, the Speaker and Language Recognition Workshop (VOiCES Special Session). Link to GitHub implementation: https://github.com/iiscleap/NeuralPlda. arXiv admin note: substantial text overlap with arXiv:2001.0703
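The two ingredients described above — a PLDA log-likelihood-ratio score treated as a learnable quadratic function, and a smoothed approximation of the minimum detection cost used as the training loss — might be sketched as follows. The function names, the sigmoid sharpness `alpha`, and the toy scores are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def plda_llr(xe, xt, P, Q, c):
    """Quadratic form of the generative PLDA log-likelihood ratio;
    a discriminative backend like NPLDA treats P, Q and c as learnable."""
    return xe @ P @ xt + xe @ Q @ xe + xt @ Q @ xt + c

def soft_dcf(tgt_scores, non_scores, theta,
             p_target=0.05, c_miss=1.0, c_fa=1.0, alpha=5.0):
    """Differentiable approximation of the detection cost function (DCF):
    the hard miss / false-alarm indicator functions at threshold theta are
    replaced by steep sigmoids, so the cost can be minimized by gradient
    descent."""
    p_miss = sigmoid(-alpha * (tgt_scores - theta)).mean()
    p_fa = sigmoid(alpha * (non_scores - theta)).mean()
    return c_miss * p_target * p_miss + c_fa * (1 - p_target) * p_fa

# A well-separated score distribution yields a small soft DCF ...
good = soft_dcf(np.array([2.0, 3.0]), np.array([-2.0, -1.0]), theta=0.0)
# ... while fully overlapping target/non-target scores are penalized.
bad = soft_dcf(np.array([-0.5, 0.5]), np.array([-0.5, 0.5]), theta=0.0)
assert good < bad
```

In training, gradients of `soft_dcf` would flow back through `plda_llr` into `P`, `Q` and `c`, starting from their generative PLDA estimates.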
Intra-Class Covariance Adaptation in PLDA Back-Ends for Speaker Verification
Multi-session training conditions are becoming increasingly common in recent benchmark datasets for both text-independent and text-dependent speaker verification. In the state-of-the-art i-vector framework for speaker verification, such conditions are addressed by simple techniques such as averaging the individual i-vectors, averaging scores, or modifying the Probabilistic Linear Discriminant Analysis (PLDA) scoring hypothesis for multi-session enrollment. These techniques fail to exploit the speaker variability observed in the enrollment data for target speakers. In this paper, we propose to exploit the multi-session training data by estimating a speaker-dependent covariance matrix and updating the intra-speaker covariance during PLDA scoring for each target speaker. The proposed method is further extended by combining covariance adaptation and score averaging. In this method, the individual examples of the target speaker are compared against the test data as opposed to an averaged i-vector, and the scores obtained are then averaged. The proposed methods are evaluated on the NIST SRE 2012 dataset. Relative improvements of up to 29% in equal error rate are obtained.
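A minimal numpy sketch of the two ideas in this abstract — interpolating the global within-speaker covariance with a speaker-dependent estimate from multi-session enrollment, and averaging per-session scores instead of scoring against an averaged i-vector. The Gaussian scorer, the interpolation weight `alpha`, and all names are illustrative assumptions standing in for full PLDA scoring:

```python
import numpy as np

def gauss_logpdf(x, mu, cov):
    """Log-density of a multivariate Gaussian (generic scoring helper)."""
    d = x - mu
    sign, logdet = np.linalg.slogdet(cov)
    return -0.5 * (len(x) * np.log(2 * np.pi) + logdet
                   + d @ np.linalg.solve(cov, d))

def adapt_within_cov(W, enroll_ivecs, alpha=0.5):
    """Interpolate the global within-speaker covariance W with the
    speaker-dependent covariance estimated from this target speaker's
    multi-session enrollment i-vectors."""
    S_spk = np.cov(enroll_ivecs, rowvar=False, bias=True)
    return (1 - alpha) * W + alpha * S_spk

def score_avg(test_ivec, enroll_ivecs, score_fn):
    """Score the test i-vector against each enrollment session separately
    and average the scores (instead of averaging the i-vectors first)."""
    return np.mean([score_fn(test_ivec, e) for e in enroll_ivecs])
```

The adapted covariance would then replace the fixed within-class covariance inside the PLDA likelihood-ratio computation for that target speaker; the combined variant applies `score_avg` with the adapted-covariance scorer.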
Spoofing-Aware Speaker Verification with Unsupervised Domain Adaptation
In this paper, we address the problem of enhancing the spoofing robustness of the automatic speaker verification (ASV) system, without the primary presence of a separate countermeasure module. We start from the standard ASV framework of the ASVspoof 2019 baseline and approach the problem from the back-end classifier based on probabilistic linear discriminant analysis. We employ three unsupervised domain adaptation techniques to optimize the back-end using the audio data in the training partition of the ASVspoof 2019 dataset. We demonstrate notable improvements on both logical and physical access scenarios, especially on the latter where the system is attacked by replayed audio, with a maximum of 36.1% and 5.3% relative improvement on bonafide and spoofed cases, respectively. We perform additional studies such as per-attack breakdown analysis, data composition, and integration with a countermeasure system at score level with a Gaussian back-end.
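The abstract above does not name its three adaptation techniques, so as one representative example of unsupervised PLDA back-end adaptation, here is a simplified sketch of covariance interpolation toward the total covariance of unlabeled in-domain data (a loose, illustrative variant of the well-known Kaldi-style adaptation; the split heuristic, `alpha`, and function name are assumptions, not necessarily what the paper used):

```python
import numpy as np

def unsupervised_plda_adapt(B, W, indomain_vecs, alpha=0.25):
    """Adapt PLDA between-class (B) and within-class (W) covariances using
    only unlabeled in-domain vectors: the excess of the in-domain total
    covariance over B + W is distributed between B and W in proportion to
    their current share of the total variance (a simplification of
    Kaldi-style unsupervised adaptation)."""
    T_in = np.cov(indomain_vecs, rowvar=False, bias=True)
    excess = T_in - (B + W)                 # extra variance seen in-domain
    frac_W = np.trace(W) / np.trace(B + W)  # W's share of total variance
    W_ad = W + alpha * frac_W * excess
    B_ad = B + alpha * (1 - frac_W) * excess
    return B_ad, W_ad
```

In the spoofing-aware setting, `indomain_vecs` would be embeddings of the (unlabeled, partly spoofed) ASVspoof training audio, so the adapted back-end absorbs spoofing-induced variability without a separate countermeasure.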
A statistical procedure to adjust for time-interval mismatch in forensic voice comparison
The present paper describes a statistical modeling procedure that was developed to account for the fact that, in a forensic voice comparison analysis conducted for a particular case, there was a long time interval between when the questioned- and known-speaker recordings were made (six years), but in the sample of the relevant population used for training and testing the forensic voice comparison system there was a short interval (hours to days) between when each of multiple recordings of each speaker was made. The present paper also includes results of empirical validation of the procedure. Although based on a particular case, the procedure has potential for wider application, given that relatively long time intervals between the recording of questioned and known speakers are not uncommon in casework.