Quality Measures for Speaker Verification with Short Utterances
The performance of automatic speaker verification (ASV) systems degrades
when less speech is available for enrollment and
verification. Combining multiple systems based on different features and
classifiers considerably reduces speaker verification error rate with short
utterances. This work attempts to incorporate supplementary information during
the system combination process. We use the quality of the estimated model
parameters as supplementary information. We introduce a class of novel quality
measures formulated using the zero-order sufficient statistics used during the
i-vector extraction process. We have used the proposed quality measures as side
information for combining ASV systems based on Gaussian mixture model-universal
background model (GMM-UBM) and i-vector. The proposed methods demonstrate
considerable improvement in speaker recognition performance on NIST SRE
corpora, especially in short duration conditions. We have also observed
improvement over existing systems based on different duration-based quality
measures. Comment: Accepted for publication in Digital Signal Processing: A Review
Journa
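The abstract above refers to zero-order sufficient statistics without defining them. As a minimal sketch, assuming the conventional definition used in i-vector extraction (the per-component sum of frame posteriors under the UBM, N_c = sum_t gamma_t(c)), they could be computed as below. The function names and the toy UBM values are hypothetical, and the sketch does not reproduce the paper's quality measures, which the abstract does not specify.

```python
import math


def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian evaluated at frame x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def zero_order_stats(frames, weights, means, variances):
    """Return N_c = sum_t gamma_t(c) for every UBM component c."""
    stats = [0.0] * len(weights)
    for x in frames:
        log_p = [
            math.log(w) + log_gauss_diag(x, m, v)
            for w, m, v in zip(weights, means, variances)
        ]
        # Log-sum-exp keeps the posterior normalisation numerically stable.
        peak = max(log_p)
        norm = peak + math.log(sum(math.exp(lp - peak) for lp in log_p))
        for c, lp in enumerate(log_p):
            stats[c] += math.exp(lp - norm)  # posterior gamma_t(c)
    return stats


# Toy 2-component, 1-D UBM (hypothetical values, for illustration only).
weights = [0.5, 0.5]
means = [[-2.0], [2.0]]
variances = [[1.0], [1.0]]
frames = [[-2.1], [-1.9], [2.0]]

stats = zero_order_stats(frames, weights, means, variances)
```

Because each frame's posteriors sum to one, the statistics sum to the number of frames, so a short utterance yields small counts spread over the components. That is one plausible reason such counts could carry information about how reliably the model parameters were estimated.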
Quality Measures for Speaker Verification with Short Utterances
The performance of automatic speaker verification (ASV) systems degrades when less speech is available for enrollment and verification. Combining multiple systems based on different features and classifiers considerably reduces speaker verification error rate with short utterances. This work attempts to incorporate supplementary information during the system combination process. We use the quality of the estimated model parameters as supplementary information. We introduce a class of novel quality measures formulated using the zero-order sufficient statistics used during the i-vector extraction process. We have used the proposed quality measures as side information for combining ASV systems based on Gaussian mixture model-universal background model (GMM-UBM) and i-vector. The proposed system shows considerable improvement in performance metrics on NIST SRE corpora in short duration conditions. We have observed improvement over state-of-the-art i-vector system
Effects of Lombard Reflex on the Performance of Deep-Learning-Based Audio-Visual Speech Enhancement Systems
Humans tend to change their way of speaking when they are immersed in a noisy
environment, a reflex known as the Lombard effect. Current speech enhancement
systems based on deep learning do not usually take into account this change in
the speaking style, because they are trained with neutral (non-Lombard) speech
utterances recorded under quiet conditions to which noise is artificially
added. In this paper, we investigate the effects that the Lombard reflex has on
the performance of audio-visual speech enhancement systems based on deep
learning. The results show a performance gap of as much as
approximately 5 dB between the systems trained on neutral speech and the ones
trained on Lombard speech. This indicates the benefit of taking into
account the mismatch between neutral and Lombard speech in the design of
audio-visual speech enhancement systems
Integration of speech biometrics in a phone payment system: text-independent speaker verification
Integration of a speaker recognition system in a payment system by phone. Nowadays, the integration of biometrics in security systems is a prominent research
and application field. Also, it is clear that speech is the most common form of
communication, which makes it a good candidate. While using speech as a biometric,
one could say there are two types of systems that should be analyzed: those systems
which do know what the speaker is going to say upon verification and those that
do not. This degree thesis offers an overview of both systems, focusing on those
that do not know what the speaker is going to say beforehand, also known as text-independent
systems. To be able to determine which would be the best approach
to integrate speech biometrics into a security system, both types of systems are
compared; and two methodologies are also analyzed for the text-independent system.
To conclude, one of those methodologies is implemented in a software library which
allows the creation of a text-independent speaker verification system.