Quality Measures for Speaker Verification with Short Utterances

Author: Poddar Arnab
Saha Goutam
Sahidullah Md
Publication date: 01/01/2019

The performance of automatic speaker verification (ASV) systems degrades as the amount of speech used for enrollment and verification is reduced. Combining multiple systems based on different features and classifiers considerably reduces the speaker verification error rate with short utterances. This work attempts to incorporate supplementary information during the system combination process. We use the quality of the estimated model parameters as supplementary information. We introduce a class of novel quality measures formulated using the zero-order sufficient statistics used during the i-vector extraction process. We have used the proposed quality measures as side information for combining ASV systems based on Gaussian mixture model-universal background model (GMM-UBM) and i-vector. The proposed methods demonstrate considerable improvement in speaker recognition performance on NIST SRE corpora, especially in short duration conditions. We have also observed improvement over existing systems based on different duration-based quality measures. Comment: Accepted for publication in Digital Signal Processing: A Review Journa

arXiv.org e-Print Archive

INRIA a CCSD electronic archive server

Quality Measures for Speaker Verification with Short Utterances

Author: Poddar Arnab
Saha Goutam
Sahidullah Md
Publication venue: 'Elsevier BV'
Publication date: 01/01/2019

The performance of automatic speaker verification (ASV) systems degrades as the amount of speech used for enrollment and verification is reduced. Combining multiple systems based on different features and classifiers considerably reduces the speaker verification error rate with short utterances. This work attempts to incorporate supplementary information during the system combination process. We use the quality of the estimated model parameters as supplementary information. We introduce a class of novel quality measures formulated using the zero-order sufficient statistics used during the i-vector extraction process. We have used the proposed quality measures as side information for combining ASV systems based on Gaussian mixture model-universal background model (GMM-UBM) and i-vector. The proposed system yields considerable improvement in performance metrics on NIST SRE corpora in short duration conditions. We have observed improvement over state-of-the-art i-vector system

INRIA a CCSD electronic archive server

Effects of Lombard Reflex on the Performance of Deep-Learning-Based Audio-Visual Speech Enhancement Systems

Author: Jensen Jesper
Michelsanti Daniel
Sigurdsson Sigurdur
Tan Zheng-Hua
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 15/11/2018

Humans tend to change their way of speaking when they are immersed in a noisy environment, a reflex known as the Lombard effect. Current speech enhancement systems based on deep learning do not usually take this change in speaking style into account, because they are trained with neutral (non-Lombard) speech utterances recorded under quiet conditions to which noise is artificially added. In this paper, we investigate the effects that the Lombard reflex has on the performance of audio-visual speech enhancement systems based on deep learning. The results show a performance gap of up to approximately 5 dB between the systems trained on neutral speech and the ones trained on Lombard speech. This indicates the benefit of taking into account the mismatch between neutral and Lombard speech in the design of audio-visual speech enhancement systems.

arXiv.org e-Print Archive

Crossref

VBN

Integration of speech biometrics in a phone payment system: text-independent speaker verification

Author: Barón Garcia Anna
Publication venue: Universitat Politècnica de Catalunya
Publication date: 01/09/2016

Integration of a speaker recognition system in a payment system by phone. Nowadays, the integration of biometrics in security systems is a prominent field of research and application. Speech is also the most common form of communication, which makes it a good candidate. When speech is used as a biometric, two types of systems should be analyzed: those that know what the speaker is going to say upon verification and those that do not. This degree thesis offers an overview of both, focusing on systems that do not know beforehand what the speaker is going to say, also known as text-independent systems. To determine the best approach for integrating speech biometrics into a security system, both types of systems are compared, and two methodologies are analyzed for the text-independent system. Finally, one of those methodologies is implemented in a software library that allows the creation of a text-independent speaker verification system.

UPCommons. Portal del coneixement obert de la UPC