496 research outputs found
Efficient Invariant Features for Sensor Variability Compensation in Speaker Recognition
In this paper, we investigate the use of invariant features for speaker recognition. Owing to their characteristics, these features are introduced to cope with the difficult and challenging problem of sensor variability, a source of performance degradation inherent in speaker recognition systems. Our experiments show (1) the effectiveness of these features in matched cases and (2) the benefit of combining them with mel-frequency cepstral coefficients (MFCCs) to exploit their discriminative power under uncontrolled conditions (mismatched cases). Consequently, the proposed invariant features yield a performance improvement, demonstrated by a reduction in the equal error rate and the minimum decision cost function compared to GMM-UBM speaker recognition systems based on MFCC features.
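As a hedged illustration of the equal error rate (EER) metric mentioned above: a minimal sketch that sweeps a decision threshold over genuine (target) and impostor scores and returns the operating point where false-acceptance and false-rejection rates are closest. Function and variable names are illustrative, not from the paper.

```python
def eer(genuine, impostor):
    """Equal error rate from verification scores (higher score = more target-like)."""
    best = None
    # Sweep thresholds over all observed scores; the EER lies where the
    # false-acceptance rate (FAR) and false-rejection rate (FRR) cross.
    for t in sorted(genuine + impostor):
        far = sum(s >= t for s in impostor) / len(impostor)  # impostors accepted
        frr = sum(s < t for s in genuine) / len(genuine)     # targets rejected
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2)
    return best[1]
```

A perfectly separating system gives an EER of 0; overlapping score distributions push it toward 0.5.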
Bayesian analysis of fingerprint, face and signature evidences with automatic biometric systems
This is the author’s version of a work that was accepted for publication in Forensic Science International. Changes resulting from the publishing process, such as peer review, editing, corrections, structural formatting, and other quality control mechanisms, may not be reflected in this document. Changes may have been made to this work since it was submitted for publication. A definitive version was subsequently published in Forensic Science International, Vol. 155, Issue 2 (20 December 2005), DOI: 10.1016/j.forsciint.2004.11.007.
The Bayesian approach provides a unified and logical framework for the analysis of evidence and for reporting results in the form of likelihood ratios (LR) from the forensic laboratory to the court. In this contribution we clarify how the biometric scientist or laboratory can adapt conventional biometric systems or technologies to work according to this Bayesian approach. Forensic systems providing their results in the form of LRs will be assessed through Tippett plots, which give a clear representation of the LR-based performance both for targets (the suspect is the author/source of the test pattern) and non-targets. However, the procedures for computing LR values, especially with biometric evidence, are still an open issue. Reliable estimation techniques with good generalization properties are required for estimating the between- and within-source variabilities of the test pattern, as are variance restriction techniques in the within-source density estimation to account for the variability of the source over time.
Fingerprint, face and on-line signature recognition systems are adapted to work according to this Bayesian approach, showing both the range of likelihood ratios in each application and the adequacy of these biometric techniques for daily forensic work. This work has been partially supported under MCYT Projects TIC2000-1683, TIC2000-1669, TIC2003-09068 and TIC2003-08382, and by the Spanish Police Force ‘‘Guardia Civil’’ Research Program.
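A minimal sketch of an LR computation of the kind described above, assuming univariate Gaussian within-source and between-source models (the estimation techniques discussed in the abstract are richer than this); all names are illustrative.

```python
import math

def gaussian_pdf(x, mu, var):
    """Univariate normal density."""
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def likelihood_ratio(evidence, mu_suspect, var_within, mu_pop, var_between):
    """LR = p(evidence | same source) / p(evidence | different source).

    Numerator: evidence scored against the suspect's within-source model.
    Denominator: evidence scored against the population (between-source
    variance added on top of within-source variance).
    """
    num = gaussian_pdf(evidence, mu_suspect, var_within)
    den = gaussian_pdf(evidence, mu_pop, var_within + var_between)
    return num / den
```

An LR above 1 supports the same-source hypothesis; below 1, the different-source hypothesis. These are the values a Tippett plot summarizes over many target and non-target trials.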
What is the relevant population? Considerations for the computation of likelihood ratios in forensic voice comparison
In forensic voice comparison, it is essential to consider not only the similarity between samples but also the typicality of the evidence in the relevant population. This is explicit within the likelihood ratio (LR) framework. A significant issue, however, is the definition of the relevant population. This paper explores the complexity of population selection for voice evidence. We evaluate the effects of population specificity, in terms of regional background, on LR output using combinations of the F1, F2, and F3 trajectories of the diphthong /aɪ/. LRs were computed using development and reference data that were regionally matched (Standard Southern British English) and mixed (general British English) relative to the test data. These conditions reflect the paradox that, without knowing who the offender is, it is not possible to know the population of which he is a member. Results show that the more specific population produced stronger evidence and better system validity than the more general definition. However, as region-specific voice features (lower formants) were removed, the difference in the output of the matched and mixed systems was reduced. This shows that the effects of population selection depend on the sociolinguistic constraints on the feature analysed.
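Validity of an LR-based system, as discussed above, is commonly quantified with the log-likelihood-ratio cost (Cllr). Cllr is a standard metric in the field, though the abstract does not name it; a minimal sketch follows.

```python
import math

def cllr(target_lrs, nontarget_lrs):
    """Log-likelihood-ratio cost: lower is better; a system that always
    outputs LR = 1 (no information) scores exactly 1.0."""
    # Targets are penalized for small LRs, non-targets for large LRs.
    c_tar = sum(math.log2(1 + 1 / lr) for lr in target_lrs) / len(target_lrs)
    c_non = sum(math.log2(1 + lr) for lr in nontarget_lrs) / len(nontarget_lrs)
    return 0.5 * (c_tar + c_non)
```

Comparing Cllr between a regionally matched and a mixed reference population is one way to make the "better system validity" claim above concrete.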
Compensation of Nuisance Factors for Speaker and Language Recognition
The variability of the channel and environment is one of the most important factors affecting the performance of text-independent speaker verification systems. The best techniques for channel compensation are model based. Most of them have been proposed for Gaussian mixture models, while in the feature domain blind channel compensation is usually performed. The aim of this work is to explore techniques that allow more accurate intersession compensation in the feature domain. Compensating the features rather than the models has the advantage that the transformed parameters can be used with models of a different nature and complexity and for different tasks. In this paper, we evaluate the effects of the compensation of the intersession variability obtained by means of the channel factors approach. In particular, we compare channel variability modeling in the usual Gaussian mixture model domain and our proposed feature-domain compensation technique. We show that the two approaches lead to similar results on the NIST 2005 Speaker Recognition Evaluation data, with a reduced computation cost. We also report the results of a system, based on the intersession compensation technique in the feature space, that was among the best participants in the NIST 2006 Speaker Recognition Evaluation. Moreover, we show how we obtained significant performance improvement in language recognition by estimating and compensating, in the feature domain, the distortions due to interspeaker variability within the same language.
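The feature-domain compensation described above amounts to subtracting a session-dependent, low-rank offset from every frame of an utterance. A minimal sketch, assuming the per-utterance channel factors have already been estimated (the estimation itself, e.g. via GMM posteriors, is omitted); names and shapes are illustrative.

```python
import numpy as np

def compensate(features, U, channel_factor):
    """Remove an estimated session offset from acoustic features.

    features:       (T, F) array of T frames of F-dimensional features.
    U:              (F, R) low-rank channel loading matrix (R << F).
    channel_factor: (R,) channel factors estimated for this utterance.
    """
    offset = U @ channel_factor   # (F,) session-dependent shift
    return features - offset      # the same offset is removed from every frame
```

Because the output is still an ordinary feature stream, it can feed models of any nature and complexity, which is the advantage the abstract highlights over model-domain compensation.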
Frame-level features conveying phonetic information for language and speaker recognition
This Thesis, developed in the Software Technologies Working Group of the Department of Electricity and Electronics of the University of the Basque Country, focuses on the research field of spoken language and speaker recognition technologies. More specifically, the research carried out studies the design of a set of features conveying spectral acoustic and phonotactic information, searches for the optimal feature extraction parameters, and analyses the integration and usage of the features in language recognition systems, as well as the complementarity of these approaches with regard to state-of-the-art systems. The study reveals that systems trained on the proposed set of features, denoted as Phone Log-Likelihood Ratios (PLLRs), are highly competitive, outperforming other state-of-the-art systems in several benchmarks. Moreover, PLLR-based systems also provide complementary information with regard to other phonotactic and acoustic approaches, which makes them suitable in fusions to improve the overall performance of spoken language recognition systems. The usage of these features is also studied in speaker recognition tasks. In this context, the results attained by the approaches based on PLLR features are not as remarkable as those of systems based on standard acoustic features, but they still provide complementary information that can be used to enhance the overall performance of speaker recognition systems.
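As commonly defined in the PLLR literature, a Phone Log-Likelihood Ratio is a frame-level log-odds transform of phone posterior probabilities. A minimal sketch under that assumption:

```python
import math

def pllr(posteriors, eps=1e-10):
    """Map per-frame phone posteriors to log-odds (PLLR) features.

    posteriors: probabilities p_i of each phone at one frame (summing to 1).
    Returns log(p_i / (1 - p_i)) per phone; eps guards against log(0).
    """
    return [math.log(max(p, eps) / max(1 - p, eps)) for p in posteriors]
```

The resulting vectors behave like ordinary acoustic features, so standard language- and speaker-recognition back-ends can consume them directly, which is what makes them easy to fuse with other systems.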
Development of a Speaker Diarization System for Speaker Tracking in Audio Broadcast News: a Case Study
A system for speaker tracking in broadcast-news audio data is presented, and the impact of the main components of the system on the overall speaker-tracking performance is evaluated. The process of speaker tracking in continuous audio streams involves several processing tasks and is therefore treated as a multistage process. The main building blocks of such a system include components for audio segmentation, speech detection, speaker clustering and speaker identification. The aim of the first three processes is to find homogeneous regions in continuous audio streams that belong to one speaker and to join the regions of the same speaker together. The task of organizing the audio data in this way is known as speaker diarization and plays an important role in various speech-processing applications. In our case the impact of speaker diarization was assessed in a speaker-tracking system by performing a comparative study of how each of the components influenced the overall speaker-detection results. The evaluation experiments were performed on broadcast-news audio data with a speaker-tracking system capable of detecting 41 target speakers. We implemented several different approaches in each component of the system and compared their performance by inspecting the final speaker-tracking results. The evaluation results indicate the importance of the audio-segmentation and speech-detection components, while no significant improvement of the overall results was achieved by additionally including a speaker-clustering component in the speaker-tracking system.
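The multistage structure described above can be sketched as a simple pipeline; the component interfaces (segmenter, speech detector, clusterer, identifier) are hypothetical stand-ins for the actual modules, whose internals the paper compares.

```python
def track_speakers(audio, segmenter, speech_detector, clusterer, identifier):
    """Multistage speaker tracking: segment, drop non-speech, cluster, identify."""
    segments = segmenter(audio)                          # homogeneous regions
    speech = [s for s in segments if speech_detector(s)] # keep speech only
    clusters = clusterer(speech)                         # one cluster per speaker
    # Speaker identification assigns a target-speaker label to each cluster.
    return {cid: identifier(frames) for cid, frames in clusters.items()}
```

The first three stages together constitute speaker diarization; swapping implementations of any one stage and re-running the pipeline is exactly the comparative study the abstract describes.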
Quality-Based Conditional Processing in Multi-Biometrics: Application to Sensor Interoperability
As biometric technology is increasingly deployed, it will be common to replace parts of operational systems with newer designs. The cost and inconvenience of reacquiring enrolled users when a new vendor solution is incorporated makes this approach difficult, and many applications will need to deal with information from different sources regularly. These interoperability problems can dramatically affect the performance of biometric systems and thus need to be overcome. Here, we describe and evaluate the ATVS-UAM fusion approach submitted to the quality-based evaluation of the 2007 BioSecure Multimodal Evaluation Campaign, whose aim was to compare fusion algorithms when biometric signals were generated using several biometric devices in mismatched conditions. Quality measures from the raw biometric data are available to allow system adjustment to changing quality conditions due to device changes. This system adjustment is referred to as quality-based conditional processing. The proposed fusion approach is based on linear logistic regression, in which fused scores tend to be log-likelihood ratios. This allows the easy and efficient combination of matching scores from different devices, assuming low dependence among modalities. In our system, quality information is used to switch between different system modules depending on the data source (the sensor in our case) and to reject channels with low-quality data during the fusion. We compare our fusion approach to a set of rule-based fusion schemes over normalized scores. Results show that the proposed approach outperforms all the rule-based fusion schemes. We also show that with the quality-based channel rejection scheme, an overall improvement of 25% in the equal error rate is obtained. Comment: Published in IEEE Transactions on Systems, Man, and Cybernetics - Part A: Systems and Humans.
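The quality-based channel rejection and linear fusion described above can be sketched as follows. The weights and bias are assumed to have been trained beforehand by linear logistic regression (training omitted), so the fused score tends toward a log-likelihood ratio; the quality threshold is illustrative.

```python
def fuse(scores, weights, bias, qualities, q_min=0.3):
    """Linearly fuse per-channel matching scores with quality-based rejection.

    scores:    matching scores, one per biometric channel.
    weights:   per-channel weights from logistic-regression training.
    bias:      calibration offset from the same training.
    qualities: quality measures for each channel's raw data.
    Channels with quality below q_min are rejected before fusion.
    """
    kept = [w * s for s, w, q in zip(scores, weights, qualities) if q >= q_min]
    return bias + sum(kept)
```

Because rejection happens per trial, a sensor change that degrades one channel's quality simply drops that channel instead of dragging the fused score down, which is the mechanism behind the reported EER improvement.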