Search CORE

213 research outputs found

Audio-Visual Speaker Identification using the CUAVE Database

Author: Dean David
Lucey Patrick
Sridharan Subramanian
Publication venue: AVSP '05
Publication date: 01/01/2005
Field of study

The freely available nature of the CUAVE database allows it to provide a valuable platform to form benchmarks and compare research. This paper shows that the CUAVE database can successfully be used to test speaker identifications systems, with performance comparable to existing systems implemented on other databases. Additionally, this research shows that the optimal configuration for decisionfusion of an audio-visual speaker identification system relies heavily on the video modality in all but clean speech conditions

CiteSeerX

Queensland University of Technology ePrints Archive

Multimodal Fusion of Polynomial Classifiers for Automatic Person Recognition

Author: Broun Charles C.
Zhang Xiaozheng
Publication venue: DigitalCommons@CalPoly
Publication date: 17/04/2001
Field of study

With the prevalence of the information age, privacy and personalization are forefront in today\u27s society. As such, biometrics are viewed as essential components of current and evolving technological systems. Consumers demand unobtrusive and noninvasive approaches. In our previous work, we have demonstrated a speaker verification system that meets these criteria. However, there are additional constraints for fielded systems. The required recognition transactions are often performed in adverse environments and across diverse populations, necessitating robust solutions. There are two significant problem areas in current generation speaker verification systems. The first is the difficulty in acquiring clean audio signals (in all environments) without encumbering the user with a head-mounted close-talking microphone. Second, unimodal biometric systems do not work with a significant percentage of the population. To combat these issues, multimodal techniques are being investigated to improve system robustness to environmental conditions, as well as improve overall accuracy across the population. We propose a multimodal approach that builds on our current state-of-the-art speaker verification technology. In order to maintain the transparent nature of the speech interface, we focus on optical sensing technology to provide the additional modality–giving us an audio-visual person recognition system. For the audio domain, we use our existing speaker verification system. For the visual domain, we focus on lip motion. This is chosen, rather than static face or iris recognition, because it provides dynamic information about the individual. In addition, the lip dynamics can aid speech recognition to provide liveness testing. The visual processing method makes use of both color and edge information, combined within a Markov random field (MRF) framework, to localize the lips. Geometric features are extracted and input to a polynomial classifier for the person recognition process. A late integration approach, based on a probabilistic model, is employed to combine the two modalities. The system is tested on the XM2VTS database combined with AWGN (in the audio domain) over a range of signal-to-noise ratios

Crossref

DigitalCommons@CalPoly

Gender Classification via Lips: Static and Dynamic Features

Author: Pass Adrian
Stewart Darryl
Zhang Jianguo
Publication venue: 'Institution of Engineering and Technology (IET)'
Publication date: 01/01/2013
Field of study

Queen's University Belfast Research Portal

University of Dundee Online Publications

Investigation into DCT Feature Selection for Visual Lip-Based Biometric Authentication

Author: Campbell-West F.
Miller P.
Stewart D.
Wright C.
Publication venue: Irish Pattern Recognition & Classification Society
Publication date: 01/01/2015
Field of study

Queen's University Belfast Research Portal

One-shot lip-based biometric authentication: extending behavioral features with authentication phrase information

Author: Grbić Ratko
Koch Brando
Publication venue
Publication date: 14/08/2023
Field of study

Lip-based biometric authentication (LBBA) is an authentication method based on a person's lip movements during speech in the form of video data captured by a camera sensor. LBBA can utilize both physical and behavioral characteristics of lip movements without requiring any additional sensory equipment apart from an RGB camera. State-of-the-art (SOTA) approaches use one-shot learning to train deep siamese neural networks which produce an embedding vector out of these features. Embeddings are further used to compute the similarity between an enrolled user and a user being authenticated. A flaw of these approaches is that they model behavioral features as style-of-speech without relation to what is being said. This makes the system vulnerable to video replay attacks of the client speaking any phrase. To solve this problem we propose a one-shot approach which models behavioral features to discriminate against what is being said in addition to style-of-speech. We achieve this by customizing the GRID dataset to obtain required triplets and training a siamese neural network based on 3D convolutions and recurrent neural network layers. A custom triplet loss for batch-wise hard-negative mining is proposed. Obtained results using an open-set protocol are 3.2% FAR and 3.8% FRR on the test set of the customized GRID dataset. Additional analysis of the results was done to quantify the influence and discriminatory power of behavioral and physical features for LBBA.Comment: 28 pages, 10 figures, 7 table

arXiv.org e-Print Archive

Automatic Speechreading with Applications to Human-Computer Interfaces

Author
Publication venue: Springer
Publication date
Field of study

Springer - Publisher Connector

Tied factor analysis for face recognition across large pose differences

Author: Elder James H.
Felisberti Fatima M.
Prince Simon J.D.
Warrell Jonathan
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2008
Field of study

Face recognition algorithms perform very unreliably when the pose of the probe face is different from the gallery face: typical feature vectors vary more with pose than with identity. We propose a generative model that creates a one-to-many mapping from an idealized “identity” space to the observed data space. In identity space, the representation for each individual does not vary with pose. We model the measured feature vector as being generated by a pose-contingent linear transformation of the identity variable in the presence of Gaussian noise. We term this model “tied” factor analysis. The choice of linear transformation (factors) depends on the pose, but the loadings are constant (tied) for a given individual. We use the EM algorithm to estimate the linear transformations and the noise parameters from training data. We propose a probabilistic distance metric that allows a full posterior over possible matches to be established. We introduce a novel feature extraction process and investigate recognition performance by using the FERET, XM2VTS, and PIE databases. Recognition performance compares favorably with contemporary approaches

CiteSeerX

Kingston University Research Repository

Recommended from our members

Multimodal biometrics score level fusion using non-confidence information

Author: Chaw Poh C
Publication venue
Publication date: 01/01/2011
Field of study

Multimodal biometrics refers to automatic authentication methods that depend on multiple modalities of measurable physical characteristics. It alleviates most of the restrictions of single biometrics. To combine the multimodal biometrics scores, three different categories of fusion approaches including rule based, classification based and density based approaches are available. When choosing an approach, one has to consider not only the fusion performance, but also system requirements and other circumstances. In the context of verification, classification errors arise from samples in the overlapping region (or non- confidence region) between genuine users and impostors. In score space, a further separation of the samples outside the non-confidence region does not result in further verification improvements. Therefore, information contained in the non-confidence region might be useful for improving the fusion process. Up to this point, no attempts are reported in the literature that tries to enhance the fusion process using this additional information. In this work, the use of this information is explored in rule based and density based approaches mentioned above

Nottingham Trent Institutional Repository (IRep)