Evaluation of the Vulnerability of Speaker Verification to Synthetic Speech
In this paper, we evaluate the vulnerability of a speaker verification (SV) system to synthetic speech. Although this problem was first examined over a decade ago, dramatic improvements in both SV and speech synthesis have renewed interest in it. We use an HMM-based speech synthesizer, which creates synthetic speech for a targeted speaker through adaptation of a background model, and a GMM-UBM-based SV system. Using 283 speakers from the Wall Street Journal (WSJ) corpus, our SV system achieves an equal error rate (EER) of 0.4%. When the system is tested with synthetic speech generated from speaker models derived from the WSJ corpus, 90% of the matched claims are accepted. This result suggests that SV systems may be vulnerable to synthetic speech. To detect synthetic speech prior to recognition, we investigate the use of an automatic speech recognizer (ASR), the dynamic time warping (DTW) distance between mel-frequency cepstral coefficient (MFCC) sequences, and the previously proposed average inter-frame difference of log-likelihood (IFDLL). Overall, although SV systems achieve impressive accuracy, high-quality synthetic speech can still lead to an unacceptably high acceptance rate of synthetic speakers, even with the proposed detector.
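The abstract names the DTW distance between MFCC sequences as one candidate detector feature but does not give its configuration. The following is a minimal sketch of textbook DTW in plain Python, assuming MFCC frames have already been extracted as lists of floats. The Euclidean local distance, the three-step pattern, the length normalisation and the helper names `euclidean` and `dtw_distance` are assumptions for illustration; how the paper turns the distance into a synthetic/natural decision is not shown.

```python
import math
from typing import Sequence

Frame = Sequence[float]  # one MFCC vector (e.g. 13 coefficients); assumed pre-extracted


def euclidean(a: Frame, b: Frame) -> float:
    """Euclidean distance between two feature frames of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(x: Sequence[Frame], y: Sequence[Frame]) -> float:
    """Classic DTW with steps (1,0), (0,1), (1,1).

    Returns the cumulative cost of the best alignment, normalised by
    len(x) + len(y) (an assumed normalisation) so that utterances of
    different lengths are comparable.
    """
    n, m = len(x), len(y)
    inf = float("inf")
    # cost[i][j] = minimal cumulative cost of aligning x[:i] with y[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = euclidean(x[i - 1], y[j - 1])
            # extend the cheapest of the three allowed predecessor paths
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m] / (n + m)
```

For example, a sequence compared with a time-stretched copy of itself (each frame repeated twice) yields a distance of 0, because the warping path absorbs the stretch.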
Use of the harmonic phase in synthetic speech detection
Special Session paper: recent PhD thesis description. This PhD dissertation was written by Jon Sanchez and supervised by Inma Hernáez and Ibon Saratxaga. It was defended at the University of the Basque Country on 5 February 2016. The committee members were Dr. Alfonso Ortega Giménez (UniZar), Dr. Daniel Erro Eslava (UPV/EHU) and Dr. Enric Monte Moreno (UPC). The dissertation was awarded the qualification "sobresaliente cum laude" (outstanding, cum laude). This work has been partially funded by the Spanish Ministry of Economy and Competitiveness with FEDER support (RESTORE project, TEC2015-67163-C2-1-R) and the Basque Government (ELKAROLA project, KK-2015/00098).
Revisiting the security of speaker verification systems against imposture using synthetic speech
In this paper, we investigate imposture using synthetic speech. Although this problem was first examined over a decade ago, dramatic improvements in both speaker verification (SV) and speech synthesis have renewed interest in it. We use an HMM-based speech synthesizer, which creates synthetic speech for a targeted speaker through adaptation of a background model. We use two SV systems: a standard GMM-UBM-based system and a newer SVM-based system. Our results show that when the systems are tested with human speech, there are zero false acceptances and zero false rejections. However, when the systems are tested with synthesized speech, all claims for the targeted speaker are accepted while all other claims are rejected. We propose a two-step process for detecting synthesized speech in order to prevent this imposture. Overall, although SV systems achieve impressive accuracy, high-quality synthetic speech will lead to an unacceptably high false acceptance rate, even with the proposed detector.
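The reported false acceptances come from the verifier's accept/reject decision. A minimal sketch of the standard GMM-UBM decision rule is shown below in plain Python: the average per-frame log-likelihood ratio between a target-speaker model and the universal background model (UBM) is compared against a threshold. The class `DiagGMM`, the helpers `llr_score` and `verify`, and the default threshold of 0.0 are hypothetical. Model training (for example MAP adaptation of the UBM) and the SVM-based system are not covered, and the papers' actual model sizes and thresholds are not given here.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class DiagGMM:
    """Gaussian mixture with diagonal covariances (hypothetical helper)."""
    weights: List[float]          # mixture weights, assumed to sum to 1
    means: List[List[float]]      # one mean vector per component
    variances: List[List[float]]  # one diagonal-variance vector per component

    def frame_loglik(self, x: Sequence[float]) -> float:
        """log p(x) under the mixture, combined with log-sum-exp for stability."""
        comp_logs = []
        for w, mu, var in zip(self.weights, self.means, self.variances):
            ll = math.log(w)
            for xi, mi, vi in zip(x, mu, var):
                ll -= 0.5 * (math.log(2 * math.pi * vi) + (xi - mi) ** 2 / vi)
            comp_logs.append(ll)
        top = max(comp_logs)
        return top + math.log(sum(math.exp(v - top) for v in comp_logs))


def llr_score(frames: Sequence[Sequence[float]], speaker: DiagGMM, ubm: DiagGMM) -> float:
    """Average per-frame log p(x | speaker) - log p(x | UBM)."""
    total = sum(speaker.frame_loglik(f) - ubm.frame_loglik(f) for f in frames)
    return total / len(frames)


def verify(frames: Sequence[Sequence[float]], speaker: DiagGMM, ubm: DiagGMM,
           threshold: float = 0.0) -> bool:
    """Accept the identity claim when the LLR reaches the (tuned) threshold."""
    return llr_score(frames, speaker, ubm) >= threshold
```

An adapted synthesizer that reproduces the target speaker's spectral statistics pushes this ratio above the threshold in exactly the same way natural target speech does, which is why such a verifier alone cannot reject the imposture.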
Synthetic speech detection using phase information
Taking advantage of the fact that most speech processing techniques neglect phase information, we seek to detect phase perturbations in order to prevent synthetic impostors from attacking speaker verification systems. Two synthetic speech detection (SSD) systems that use information related to the spectral phase are reviewed and evaluated in this work: one based on the Modified Group Delay (MGD) and the other based on the Relative Phase Shift (RPS). A classical magnitude-based MFCC system is also used as a baseline. Different training strategies are proposed and evaluated using both real spoofing samples and signals copy-synthesized from the natural ones, aiming to alleviate the difficulty of obtaining real spoofing data to train the systems. The recently published ASVspoof 2015 database is used for training and evaluation. Performance on completely unrelated data is also checked, using synthetic speech from the Blizzard Challenge as evaluation material. The results show that phase information can be successfully used for the SSD task, even with unknown attacks. This work has been partially supported by the Basque Government (ElkarOla project, KK-2015/00098) and the Spanish Ministry of Economy and Competitiveness (RESTORE project, TEC2015-67163-C2-1-R).
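The RPS feature can be illustrated with a small example. As we understand the usual definition, the relative phase shift of harmonic k is that harmonic's phase minus k times the phase of the fundamental, RPS_k = phi_k - k * phi_1, wrapped to [-pi, pi]. This removes the linear-phase term that depends on where the analysis frame starts, so the values reflect the shape of the waveform rather than its position in time. The sketch below assumes a known f0 and a frame spanning an integer number of pitch periods, and it estimates each harmonic phase with a single-frequency projection. This is a simplification, not the harmonic analysis used by the published system, and the function names are hypothetical.

```python
import cmath
import math
from typing import List, Sequence


def harmonic_phases(frame: Sequence[float], f0: float, fs: float, n_harm: int) -> List[float]:
    """Phase of each harmonic k*f0 (k = 1..n_harm), referenced to sample 0.

    Each phase is estimated by projecting the frame onto a complex exponential
    at k*f0 (a single DFT coefficient). Assumes the frame holds an integer
    number of periods, so the other harmonics cancel out of the projection.
    """
    phases = []
    for k in range(1, n_harm + 1):
        w = 2 * math.pi * k * f0 / fs  # harmonic frequency in rad/sample
        acc = sum(x * cmath.exp(-1j * w * n) for n, x in enumerate(frame))
        phases.append(cmath.phase(acc))
    return phases


def wrap(phi: float) -> float:
    """Wrap a phase into [-pi, pi]."""
    return math.atan2(math.sin(phi), math.cos(phi))


def relative_phase_shift(phases: Sequence[float]) -> List[float]:
    """RPS_k = phi_k - k * phi_1, wrapped; RPS_1 is therefore always 0."""
    phi1 = phases[0]
    return [wrap(p - (k + 1) * phi1) for k, p in enumerate(phases)]
```

Delaying the whole signal by d samples adds k * w1 * d to the phase of harmonic k (w1 being the fundamental in rad/sample), and subtracting k times the shifted fundamental phase cancels that term. RPS values therefore stay stable from frame to frame for natural voiced speech, while phase distortions introduced by a vocoder show up as changes in them.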