Multi-Level Liveness Verification for Face-Voice Biometric Authentication
In this paper we present the details of the multi-level liveness verification (MLLV) framework proposed for realizing a secure face-voice biometric authentication system that can thwart different types of audio and video replay attacks. The proposed MLLV framework, based on novel feature extraction and multimodal fusion approaches, uncovers the static and dynamic relationships between voice and face information from speaking faces and allows multiple levels of security. Experiments with three different speaking-face corpora, VidTIMIT, UCBN and AVOZES, show a significant improvement in system performance in terms of DET curves and equal error rates (EER) for different types of replay and synthesis attacks.
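The abstract above reports fused face-voice scores evaluated with DET curves and EER. As an illustrative sketch only (the paper's actual MLLV fusion is more elaborate), a generic weighted-sum score fusion and a simple threshold-sweep EER estimate look like this; the function names and the weight `w` are assumptions for illustration:

```python
import numpy as np

def weighted_sum_fusion(face_scores, voice_scores, w=0.5):
    """Generic score-level fusion of face and voice match scores.

    Hypothetical sketch: the paper fuses static and dynamic face-voice
    features; this is only the textbook weighted-sum rule, not the
    authors' method.
    """
    return w * np.asarray(face_scores) + (1 - w) * np.asarray(voice_scores)

def equal_error_rate(genuine, impostor):
    """Estimate the EER by sweeping a decision threshold over all scores.

    The EER is the operating point where the false accept rate (FAR)
    and false reject rate (FRR) are equal.
    """
    thresholds = np.sort(np.concatenate([genuine, impostor]))
    best_gap, eer = 1.0, 1.0
    for t in thresholds:
        far = np.mean(impostor >= t)   # impostors wrongly accepted
        frr = np.mean(genuine < t)     # genuine users wrongly rejected
        if abs(far - frr) < best_gap:
            best_gap = abs(far - frr)
            eer = (far + frr) / 2
    return eer
```

With well-separated genuine and impostor score distributions the estimated EER drops to zero; replay attacks matter precisely because they push impostor scores into the genuine range.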
Multiple classifiers in biometrics. Part 2: Trends and challenges
The present paper is Part 2 in this series of two papers. In Part 1 we provided an introduction to Multiple Classifier Systems (MCS) with a focus on the fundamentals: basic nomenclature, key elements, architecture, main methods, and prevalent theory and framework. Part 1 then overviewed the application of MCS to the particular field of multimodal biometric person authentication over the last 25 years, as a prototypical area in which MCS has resulted in important achievements. Here in Part 2 we present in more technical detail recent trends and developments in MCS coming from multimodal biometrics that incorporate context information in an adaptive way. These new MCS architectures exploit input quality measures and pattern-specific particularities that depart from general population statistics, resulting in robust multimodal biometric systems. As in Part 1, methods are described in a general way so they can be applied to other information fusion problems as well. Finally, we also discuss open challenges in biometrics in which MCS can play a key role. This work was funded by projects CogniMetrics (TEC2015-70627-R) from MINECO/FEDER and RiskTrack (JUST-2015-JCOO-AG-1). Part of this work was conducted during a research visit of J.F. to Prof. Ludmila Kuncheva at Bangor University (UK) with STSM funding from COST CA16101 (MULTI-FORESEE).
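The survey above describes MCS architectures that adapt the combination rule to input quality. One of the simplest such rules, sketched here as an assumption-laden illustration rather than any specific method from the paper, weights each classifier's score by a per-sample quality measure:

```python
import numpy as np

def quality_weighted_fusion(scores, qualities):
    """Adaptive score fusion weighted by per-sample quality measures.

    Illustrative sketch only: a quality-weighted sum is one of the
    simplest quality-aware MCS combination rules; the survey covers a
    much broader family.

    scores:    (n_samples, n_classifiers) matcher scores
    qualities: (n_samples, n_classifiers) quality values in [0, 1]
    """
    scores = np.asarray(scores, float)
    q = np.asarray(qualities, float)
    w = q / q.sum(axis=1, keepdims=True)   # normalise weights per sample
    return (w * scores).sum(axis=1)
```

When one modality is degraded (low quality), its weight shrinks for that sample only, which is the "adaptive, pattern-specific" behaviour the survey contrasts with fixed population-level weights.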
Robust indoor speaker recognition in a network of audio and video sensors
Situational awareness is achieved naturally by the human senses of sight and hearing in combination. Automatic scene understanding aims at replicating this human ability using microphones and cameras in cooperation. In this paper, audio and video signals are fused and integrated at different levels of semantic abstraction. We detect and track a speaker who is relatively unconstrained, i.e., free to move indoors within an area larger than in comparable reported work, which is usually limited to round-table meetings. The system is relatively simple, consisting of just four microphone pairs and a single camera. Results show that the overall multimodal tracker is more reliable than single-modality systems, tolerating large occlusions and cross-talk. System evaluation is performed on both single- and multi-modality tracking. The performance improvement given by the audio–video integration and fusion is quantified in terms of tracking precision and accuracy as well as speaker diarisation error rate and precision–recall (recognition). Improvements over the closest works are: a 56% reduction in sound source localisation computational cost over an audio-only system, an 8% improvement in speaker diarisation error rate over an audio-only speaker recognition unit, and a 36% improvement on the precision–recall metric over an audio–video dominant speaker recognition method.
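The evaluation metrics named in this abstract are standard. As a reference sketch (the textbook formulas, not the paper's exact scoring tool), diarisation error rate and precision–recall can be computed as:

```python
def diarisation_error_rate(missed, false_alarm, confusion, total_speech):
    """Speaker diarisation error rate (DER): the fraction of total
    speech time that is missed, falsely detected as speech, or
    attributed to the wrong speaker. All arguments are durations in
    the same unit (e.g. seconds)."""
    return (missed + false_alarm + confusion) / total_speech

def precision_recall(tp, fp, fn):
    """Precision and recall for dominant-speaker recognition,
    counted over true positives, false positives and false negatives."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return precision, recall
```

Lower DER is better; the 8% figure quoted above is an improvement in this rate relative to an audio-only baseline.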
Quality Measures for Speaker Verification with Short Utterances
The performances of the automatic speaker verification (ASV) systems degrade
due to the reduction in the amount of speech used for enrollment and
verification. Combining multiple systems based on different features and
classifiers considerably reduces speaker verification error rate with short
utterances. This work attempts to incorporate supplementary information during
the system combination process. We use quality of the estimated model
parameters as supplementary information. We introduce a class of novel quality
measures formulated using the zero-order sufficient statistics used during the
i-vector extraction process. We have used the proposed quality measures as side
information for combining ASV systems based on Gaussian mixture model-universal
background model (GMM-UBM) and i-vector. The proposed methods demonstrate
considerable improvement in speaker recognition performance on NIST SRE
corpora, especially in short duration conditions. We have also observed
improvement over existing systems based on different duration-based quality
measures.Comment: Accepted for publication in Digital Signal Processing: A Review
Journa
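The quality measures above are built from the zero-order sufficient statistics of i-vector extraction. As background, those statistics are the per-component sums of frame posteriors under the UBM; the sketch below shows that standard computation (the proposed quality measures themselves are defined in the paper, not here):

```python
import numpy as np

def zero_order_stats(log_likelihoods):
    """Zero-order Baum-Welch sufficient statistics.

    Given per-frame log-likelihoods under each UBM component
    (shape: n_frames x n_components), the zero-order statistic N_c is
    the sum over frames of the posterior probability of component c.
    Short utterances yield small N_c values, which is why these
    statistics carry duration/quality information.
    """
    ll = np.asarray(log_likelihoods, float)
    # softmax over components per frame -> posteriors (responsibilities)
    ll = ll - ll.max(axis=1, keepdims=True)
    post = np.exp(ll)
    post /= post.sum(axis=1, keepdims=True)
    return post.sum(axis=0)          # N_c, one value per component
```

Note that the N_c always sum to the number of frames, so their distribution across components, not just their total, reflects how well the utterance covers the UBM.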
Speaker Recognition Based on Mutated Monarch Butterfly Optimization Configured Artificial Neural Network
Speaker recognition is the process of extracting speaker-specific details from voice waves to validate the features asserted by system users; in other words, it allows voice-controlled access to a range of services. The research starts by extracting features from voice signals and employing those features in an Artificial Neural Network (ANN) for speaker recognition. Increasing the number of hidden layers and their associated neurons reduces the training error but increases the complexity of the computational process. It is essential to have an optimal number of hidden layers and corresponding neurons, but attaining those optimal configurations through a manual or trial-and-error process takes time and makes the process more complex. This urges incorporating optimization approaches for finding the optimal hidden layers and their corresponding neurons. The technique involved in configuring the ANN is Mutated Monarch Butterfly Optimization (MMBO). The proposed MMBO employed for configuring the ANN achieves a sensitivity of 97.5% on a real-time database, which is superior to competing techniques.
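The search problem described above, choosing the number of hidden layers and neurons per layer, can be made concrete with a sketch. MMBO itself is beyond a short example, so plain random search stands in for it here; the search space is the same, only the strategy differs, and the `evaluate` callback (assumed to train an ANN with the given layer sizes and return its validation error) is a hypothetical placeholder:

```python
import random

def random_layer_search(evaluate, max_layers=3, max_neurons=64,
                        trials=20, seed=0):
    """Search for the hidden-layer configuration minimising validation
    error. Stand-in for the paper's Mutated Monarch Butterfly
    Optimization (MMBO): same search space (layer count and neurons
    per layer), but simple random search for brevity."""
    rng = random.Random(seed)
    best_cfg, best_err = None, float("inf")
    for _ in range(trials):
        # sample a configuration: 1..max_layers layers, each with
        # 1..max_neurons neurons
        cfg = tuple(rng.randint(1, max_neurons)
                    for _ in range(rng.randint(1, max_layers)))
        err = evaluate(cfg)
        if err < best_err:
            best_cfg, best_err = cfg, err
    return best_cfg, best_err
```

Metaheuristics such as MMBO replace the independent random draws with population-based, mutation-driven sampling, which typically reaches good configurations in far fewer evaluations than random search.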