496 research outputs found

    Efficient Invariant Features for Sensor Variability Compensation in Speaker Recognition

    Get PDF
    In this paper, we investigate the use of invariant features for speaker recognition. Owing to their characteristics, these features are introduced to cope with the difficult and challenging problem of sensor variability and the source of performance degradation inherent in speaker recognition systems. Our experiments show: (1) the effectiveness of these features in match cases; (2) the benefit of combining these features with the mel frequency cepstral coefficients to exploit their discrimination power under uncontrolled conditions (mismatch cases). Consequently, the proposed invariant features result in a performance improvement as demonstrated by a reduction in the equal error rate and the minimum decision cost function compared to the GMM-UBM speaker recognition systems based on MFCC features

    Bayesian analysis of fingerprint, face and signature evidences with automatic biometric systems

    Full text link
    This is the author’s version of a work that was accepted for publication in Forensic Science International. Changes resulting from the publishing process, such as peer review, editing, corrections, structural formatting, and other quality control mechanisms may not be reflected in this document. Changes may have been made to this work since it was submitted for publication. A definitive version was subsequently published in Forensic Science International, Vol 155, Issue 2 (20 December 2005) DOI: 10.1016/j.forsciint.2004.11.007The Bayesian approach provides a unified and logical framework for the analysis of evidence and to provide results in the form of likelihood ratios (LR) from the forensic laboratory to court. In this contribution we want to clarify how the biometric scientist or laboratory can adapt their conventional biometric systems or technologies to work according to this Bayesian approach. Forensic systems providing their results in the form of LR will be assessed through Tippett plots, which give a clear representation of the LR-based performance both for targets (the suspect is the author/source of the test pattern) and non-targets. However, the computation procedures of the LR values, especially with biometric evidences, are still an open issue. Reliable estimation techniques showing good generalization properties for the estimation of the between- and within-source variabilities of the test pattern are required, as variance restriction techniques in the within-source density estimation to stand for the variability of the source with the course of time. Fingerprint, face and on-line signature recognition systems will be adapted to work according to this Bayesian approach showing both the likelihood ratios range in each application and the adequacy of these biometric techniques to the daily forensic work.This work has been partially supported under MCYT Projects TIC2000-1683, TIC2000-1669, TIC2003-09068, TIC2003-08382 and Spanish Police Force ‘‘Guardia Civil’’ Research Program

    What is the relevant population? Considerations for the computation of likelihood ratios in forensic voice comparison

    Get PDF
    In forensic voice comparison, it is essential to consider not only the similarity between samples, but also the typicality of the evidence in the relevant population. This is explicit within the likelihood ratio (LR) framework. A significant issue, however, is the definition of the relevant population. This paper explores the complexity of population selection for voice evidence. We evaluate the effects of population specificity in terms of regional background on LR output using combinations of the F1, F2, and F3 trajectories of the diphthong /aɪ/. LRs were computed using development and reference data which were regionally matched (Standard Southern British English) and mixed (general British English) relative to the test data. These conditions reflect the paradox that without knowing who the offender is, it is not possible to know the population of which he is a member. Results show that the more specific population produced stronger evidence and better system validity than the more general definition. However, as region-specific voice features (lower formants) were removed, the difference in the output from the matched and mixed systems was reduced. This shows that the effects of population selection are dependent on the sociolinguistic constraints on the feature analysed

    Compensation of Nuisance Factors for Speaker and Language Recognition

    Get PDF
    The variability of the channel and environment is one of the most important factors affecting the performance of text-independent speaker verification systems. The best techniques for channel compensation are model based. Most of them have been proposed for Gaussian mixture models, while in the feature domain blind channel compensation is usually performed. The aim of this work is to explore techniques that allow more accurate intersession compensation in the feature domain. Compensating the features rather than the models has the advantage that the transformed parameters can be used with models of a different nature and complexity and for different tasks. In this paper, we evaluate the effects of the compensation of the intersession variability obtained by means of the channel factors approach. In particular, we compare channel variability modeling in the usual Gaussian mixture model domain, and our proposed feature domain compensation technique. We show that the two approaches lead to similar results on the NIST 2005 Speaker Recognition Evaluation data with a reduced computation cost. We also report the results of a system, based on the intersession compensation technique in the feature space that was among the best participants in the NIST 2006 Speaker Recognition Evaluation. Moreover, we show how we obtained significant performance improvement in language recognition by estimating and compensating, in the feature domain, the distortions due to interspeaker variability within the same language. Index Terms—Factor anal

    Frame-level features conveying phonetic information for language and speaker recognition

    Get PDF
    150 p.This Thesis, developed in the Software Technologies Working Group of the Departmentof Electricity and Electronics of the University of the Basque Country, focuseson the research eld of spoken language and speaker recognition technologies.More specically, the research carried out studies the design of a set of featuresconveying spectral acoustic and phonotactic information, searches for the optimalfeature extraction parameters, and analyses the integration and usage of the featuresin language recognition systems, and the complementarity of these approacheswith regard to state-of-the-art systems. The study reveals that systems trained onthe proposed set of features, denoted as Phone Log-Likelihood Ratios (PLLRs), arehighly competitive, outperforming in several benchmarks other state-of-the-art systems.Moreover, PLLR-based systems also provide complementary information withregard to other phonotactic and acoustic approaches, which makes them suitable infusions to improve the overall performance of spoken language recognition systems.The usage of this features is also studied in speaker recognition tasks. In this context,the results attained by the approaches based on PLLR features are not as remarkableas the ones of systems based on standard acoustic features, but they still providecomplementary information that can be used to enhance the overall performance ofthe speaker recognition systems

    Development of a Speaker Diarization System for Speaker Tracking in Audio Broadcast News: a Case Study

    Get PDF
    A system for speaker tracking in broadcast-news audio data is presented and the impacts of the main components of the system to the overall speaker-tracking performance are evaluated. The process of speaker tracking in continuous audio streams involves several processing tasks and is therefore treated as a multistage process. The main building blocks of such system include the components for audio segmentation, speech detection, speaker clustering and speaker identification. The aim of the first three processes is to find homogeneous regions in continuous audio streams that belong to one speaker and to join each region of the same speaker together. The task of organizing the audio data in this way is known as speaker diarization and plays an important role in various speech-processing applications. In our case the impact of speaker diarization was assessed in a speaker-tracking system by performing a comparative study of how each of the component influenced the overall speaker-detection results. The evaluation experiments were performed on broadcast-news audio data with a speaker-tracking system, which was capable of detecting 41 target speakers. We implemented several different approaches in each component of the system and compared their performances by inspecting the final speaker-tracking results. The evaluation results indicate the importance of the audio-segmentation and speech-detection components, while no significant improvement of the overall results was achieved by additionally including a speaker-clustering component to the speaker-tracking system

    Quality-Based Conditional Processing in Multi-Biometrics: Application to Sensor Interoperability

    Full text link
    As biometric technology is increasingly deployed, it will be common to replace parts of operational systems with newer designs. The cost and inconvenience of reacquiring enrolled users when a new vendor solution is incorporated makes this approach difficult and many applications will require to deal with information from different sources regularly. These interoperability problems can dramatically affect the performance of biometric systems and thus, they need to be overcome. Here, we describe and evaluate the ATVS-UAM fusion approach submitted to the quality-based evaluation of the 2007 BioSecure Multimodal Evaluation Campaign, whose aim was to compare fusion algorithms when biometric signals were generated using several biometric devices in mismatched conditions. Quality measures from the raw biometric data are available to allow system adjustment to changing quality conditions due to device changes. This system adjustment is referred to as quality-based conditional processing. The proposed fusion approach is based on linear logistic regression, in which fused scores tend to be log-likelihood-ratios. This allows the easy and efficient combination of matching scores from different devices assuming low dependence among modalities. In our system, quality information is used to switch between different system modules depending on the data source (the sensor in our case) and to reject channels with low quality data during the fusion. We compare our fusion approach to a set of rule-based fusion schemes over normalized scores. Results show that the proposed approach outperforms all the rule-based fusion schemes. We also show that with the quality-based channel rejection scheme, an overall improvement of 25% in the equal error rate is obtained.Comment: Published at IEEE Transactions on Systems, Man, and Cybernetics - Part A: Systems and Human
    corecore