Efficient Invariant Features for Sensor Variability Compensation in Speaker Recognition
In this paper, we investigate the use of invariant features for speaker recognition. Owing to their characteristics, these features are introduced to cope with the challenging problem of sensor variability, a source of performance degradation inherent in speaker recognition systems. Our experiments show (1) the effectiveness of these features in matched conditions and (2) the benefit of combining them with mel-frequency cepstral coefficients (MFCCs) to exploit their discriminative power under uncontrolled (mismatched) conditions. Consequently, the proposed invariant features improve performance, as demonstrated by reductions in the equal error rate and the minimum decision cost function relative to GMM-UBM speaker recognition systems based on MFCC features.
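The equal error rate mentioned in this abstract is the operating point at which the false acceptance rate equals the false rejection rate. The following is a minimal sketch of how it could be computed from verification scores, not the authors' evaluation code; the function name `equal_error_rate` and the simple threshold sweep are assumptions made for illustration.

```python
import numpy as np

def equal_error_rate(genuine_scores, impostor_scores):
    """Approximate the EER by sweeping the threshold over all observed scores.

    Hypothetical helper for illustration; higher scores mean "more likely
    the claimed speaker".
    """
    genuine = np.asarray(genuine_scores, dtype=float)
    impostor = np.asarray(impostor_scores, dtype=float)
    thresholds = np.sort(np.concatenate([genuine, impostor]))
    # False rejection: a genuine trial scored below the threshold.
    frr = np.array([(genuine < t).mean() for t in thresholds])
    # False acceptance: an impostor trial scored at or above the threshold.
    far = np.array([(impostor >= t).mean() for t in thresholds])
    # Pick the threshold where the two error rates are closest.
    idx = int(np.argmin(np.abs(far - frr)))
    return (far[idx] + frr[idx]) / 2.0, thresholds[idx]

eer, threshold = equal_error_rate([0.9, 0.8, 0.4, 0.7], [0.1, 0.5, 0.3, 0.2])
```

With so few trials the estimate is coarse; evaluation toolkits typically interpolate between neighbouring thresholds rather than picking the nearest one.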
Integration of speech biometrics in a phone payment system: text-independent speaker verification
Integration of a speaker recognition system in a payment system by phone. Nowadays, the integration of biometrics in security systems is a prominent research
and application field. Speech is also the most common form of
communication, which makes it a good candidate. When speech is used as a biometric,
two types of systems should be analyzed: those
that know what the speaker is going to say upon verification and those that
do not. This degree thesis offers an overview of both systems, focusing on those
that do not know what the speaker is going to say beforehand, also known as text-independent
systems. To determine the best approach
for integrating speech biometrics into a security system, both types of systems are
compared, and two methodologies are also analyzed for the text-independent system.
To conclude, one of those methodologies is implemented in a software library which
allows the creation of a text-independent speaker verification system.
Protecting Voice Controlled Systems Using Sound Source Identification Based on Acoustic Cues
Over the last few years, a rapidly increasing number of Internet-of-Things
(IoT) systems that adopt voice as the primary user input have emerged. These
systems have been shown to be vulnerable to various types of voice spoofing
attacks. Existing defense techniques can usually only protect from a specific
type of attack or require an additional authentication step that involves
another device. Such defense strategies are either not strong enough or lower
the usability of the system. Based on the fact that legitimate voice commands
should only come from humans rather than a playback device, we propose a novel
defense strategy that is able to detect the sound source of a voice command
based on its acoustic features. The proposed defense strategy does not require
any information other than the voice command itself and can protect a system
from multiple types of spoofing attacks. Our proof-of-concept experiments
verify the feasibility and effectiveness of this defense strategy.
Comment: Proceedings of the 27th International Conference on Computer
Communications and Networks (ICCCN), Hangzhou, China, July-August 2018. arXiv
admin note: text overlap with arXiv:1803.0915
Security in Voice Authentication
We evaluate the security of human voice password databases from an information-theoretic point of view. More specifically, we provide a theoretical estimate of the amount of entropy in human voice when processed using conventional GMM-UBM technologies with MFCCs as the acoustic features. The theoretical estimate gives rise to a methodology for analyzing the security level of a corpus of human voice. That is, given a database containing speech signals, we provide a method for estimating the relative entropy (Kullback-Leibler divergence) of the database, thereby establishing the security level of the speaker verification system. To demonstrate this, we analyze the YOHO database, a corpus of voice samples collected from 138 speakers, and show that the amount of entropy extracted is less than 14 bits. We also present a practical attack that succeeds in impersonating the voice of any speaker within the corpus with a 98% success probability in as few as 9 trials. The attack still succeeds at a rate of 62.50% if 4 attempts are permitted. Further, based on the same attack rationale, we mount an attack on the ALIZE speaker verification system. We show through experimentation that the attacker can impersonate any user in a database of 69 people with a success rate of about 25% in only 5 trials. The success rate exceeds 50% when the number of allowed authentication attempts is increased to 20. Finally, when the practical attack is cast in terms of an entropy metric, we find that the theoretical entropy estimate almost perfectly predicts the success rate of the practical attack, giving further credence to the theoretical model and the associated entropy estimation technique.
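The relative entropy (Kullback-Leibler divergence) named in this abstract can be illustrated with a small sketch. The abstract does not describe the estimator used for GMMs, and the divergence between two mixtures has no closed form, so the example below makes a simplifying assumption: each model is a single Gaussian with diagonal covariance, for which the divergence is exact. The function name `kl_diag_gaussians` is hypothetical.

```python
import numpy as np

def kl_diag_gaussians(mu_p, var_p, mu_q, var_q):
    """KL(p || q) in bits for two diagonal-covariance Gaussians.

    Illustrative simplification: a speaker model p and a background
    model q are each reduced to one Gaussian rather than a full GMM.
    """
    mu_p, var_p = np.asarray(mu_p, dtype=float), np.asarray(var_p, dtype=float)
    mu_q, var_q = np.asarray(mu_q, dtype=float), np.asarray(var_q, dtype=float)
    # Closed form, summed over independent feature dimensions (in nats).
    nats = 0.5 * np.sum(
        np.log(var_q / var_p) + (var_p + (mu_p - mu_q) ** 2) / var_q - 1.0
    )
    # Convert nats to bits so the result is comparable to an entropy in bits.
    return nats / np.log(2.0)

bits = kl_diag_gaussians([1.0, 0.0], [1.0, 1.0], [0.0, 0.0], [1.0, 1.0])
```

A larger divergence means the speaker model is easier to tell apart from the background model; identical models give zero bits.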