Evaluation of the Vulnerability of Speaker Verification to Synthetic Speech

De Leon, P.L.; Pucher, M.; Yamagishi, Junichi

Evaluation of the Vulnerability of Speaker Verification to Synthetic Speech

Authors: P.L. De Leon
M. Pucher
Junichi Yamagishi
Publication date: 1 January 2010
Publisher

Abstract

In this paper, we evaluate the vulnerability of a speaker verification (SV) system to synthetic speech. Although this problem was first examined over a decade ago, dramatic improvements in both SV and speech synthesis have renewed interest in this problem. We use a HMM-based speech synthesizer, which creates synthetic speech for a targeted speaker through adaptation of a background model and a GMM-UBM-based SV system. Using 283 speakers from the Wall-Street Journal (WSJ) corpus, our SV system has a 0.4% EER. When the system is tested with synthetic speech generated from speaker models derived from the WSJ journal corpus, 90% of the matched claims are accepted. This result suggests a possible vulnerability in SV systems to synthetic speech. In order to detect synthetic speech prior to recognition, we investigate the use of an automatic speech recognizer (ASR), dynamic-timewarping (DTW) distance of mel-frequency cepstral coefficients (MFCC), and previously-proposed average inter-frame difference of log-likelihood (IFDLL). Overall, while SV systems have impressive accuracy, even with the proposed detector, high-quality synthetic speech can lead to an unacceptably high acceptance rate of synthetic speakers

Similar works

Full text

Open in the Core reader

Download PDF

Available Versions

Edinburgh Research Explorer

oai:pure.ed.ac.uk:publications...

Last time updated on 08/02/2015

Edinburgh Research Archive

oai:era.ed.ac.uk:1842/4659

Last time updated on 07/06/2021