13 research outputs found

    A simulated annealing approach to speaker segmentation in audio databases

    In this paper we present a novel approach to the problem of speaker segmentation, a necessary preliminary step for audio indexing. Mutual information is used to evaluate the quality of a segmentation and serves as the objective function maximized by a simulated annealing (SA) algorithm. We introduce a novel mutation operator for the SA, the Consecutive Bits Mutation operator, which improves the performance of the SA on this problem. We also use the so-called Compaction Factor, which allows the SA to operate in a reduced search space. Our algorithm has been tested on the segmentation of real audio databases and compared with several existing speaker segmentation algorithms, obtaining very good results on the test problems considered.
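    A minimal sketch of the idea described above, not the authors' code: simulated annealing over a binary change-point string, with a "consecutive bits" style mutation that flips a run of adjacent bits. The mutual-information fitness is replaced here by a toy placeholder, and all names below are illustrative assumptions.

    import math
    import random

    def consecutive_bits_mutation(bits, max_run=5):
        # Flip a run of consecutive bits starting at a random position (assumed form of the operator).
        child = bits[:]
        start = random.randrange(len(child))
        for i in range(start, min(start + random.randint(1, max_run), len(child))):
            child[i] ^= 1
        return child

    def fitness(bits):
        # Placeholder for the mutual information between audio features and the segmentation.
        return -abs(sum(bits) - 4)  # toy objective: prefer about four speaker changes

    def simulated_annealing(n_frames=100, t0=1.0, alpha=0.95, iters=2000):
        current = [random.randint(0, 1) for _ in range(n_frames)]
        best, temp = current, t0
        for _ in range(iters):
            cand = consecutive_bits_mutation(current)
            delta = fitness(cand) - fitness(current)
            if delta >= 0 or random.random() < math.exp(delta / temp):
                current = cand
            if fitness(current) > fitness(best):
                best = current
            temp *= alpha
        return best  # bit i == 1 marks a hypothesised speaker change at frame i

    print(sum(simulated_annealing()), "change points in the best segmentation")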

    Offline speaker segmentation using genetic algorithms and mutual information

    We present an evolutionary approach to speaker segmentation, a task that is an important precursor to speaker recognition and audio content analysis. Our approach consists of a genetic algorithm (GA), which encodes possible segmentations of an audio record, and a measure of mutual information between the audio data and a candidate segmentation, which is used as the fitness function for the GA. We introduce a compact encoding of the problem that reduces the length of the GA individuals and improves the GA's convergence properties. Our algorithm has been tested on the segmentation of real audio data, and its performance has been compared with several existing algorithms for speaker segmentation, obtaining very good results in all test problems. This work was supported in part by the Universidad de Alcalá under Project UAH PI2005/078.
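    A minimal sketch of one plausible reading of the compact encoding, not the paper's implementation: each GA individual is a short sorted list of change-point positions rather than a per-frame bit string, which keeps individuals short. The mutual-information fitness is again a toy placeholder and all names are illustrative.

    import random

    N_FRAMES, N_CHANGES, POP, GENS = 1000, 5, 30, 50

    def random_individual():
        return sorted(random.sample(range(1, N_FRAMES), N_CHANGES))

    def fitness(ind):
        # Stand-in for the mutual information between the audio data and this segmentation.
        gaps = [b - a for a, b in zip([0] + ind, ind + [N_FRAMES])]
        return -max(gaps)  # toy objective: prefer balanced segment lengths

    def crossover(a, b):
        cut = random.randint(1, N_CHANGES - 1)
        return sorted(a[:cut] + b[cut:])

    def mutate(ind, step=20):
        ind = ind[:]
        i = random.randrange(N_CHANGES)
        ind[i] = min(max(1, ind[i] + random.randint(-step, step)), N_FRAMES - 1)
        return sorted(ind)

    population = [random_individual() for _ in range(POP)]
    for _ in range(GENS):
        population.sort(key=fitness, reverse=True)
        parents = population[: POP // 2]
        children = [mutate(crossover(random.choice(parents), random.choice(parents)))
                    for _ in range(POP - len(parents))]
        population = parents + children

    print("best individual (change-point positions):", max(population, key=fitness))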

    Improving Single Modal and Multimodal Biometric Authentication Using F-ratio Client-Dependent Normalisation

    This study investigates a new client-dependent normalisation to improve a single biometric authentication system, as well as its effects on fusion. There exist two families of client-dependent normalisation techniques, often applied to speaker authentication: client-dependent score normalisation and client-dependent threshold normalisation. Examples of the former family are Z-Norm, D-Norm and T-Norm, and there is also a vast literature on the latter family. Both families are surveyed in this study. Furthermore, we provide a link between the two families and show that one is a dual representation of the other. These techniques are intended to adjust for variation across different client models. We propose ``F-ratio'' normalisation, or F-Norm, applied to face and speaker authentication systems in two contexts: single-modal and fusion of multimodal biometrics. This normalisation requires as few as two client-dependent accesses (the more the better). Unlike previous normalisation techniques, F-Norm considers the client and impostor distributions simultaneously. We show that the F-ratio is a natural choice because it is directly associated with the Equal Error Rate. It has the effect of centering the client and impostor distributions such that a global threshold can easily be found. Another difference is that F-Norm ``interpolates'' between client-independent and client-dependent information by introducing two mixture parameters. These parameters can be optimised to maximise the class dispersion (the degree of separability between client and impostor distributions), whereas the aforementioned normalisation techniques cannot. The results of 13 single-modal experiments and 32 fusion experiments carried out on the XM2VTS multimodal database show that in both contexts F-Norm is advantageous over Z-Norm, client-dependent score normalisation with EER, and no normalisation.
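    Two relations from the summary above lend themselves to a short numerical sketch: the F-ratio's direct link to the Equal Error Rate (exact only under a Gaussian assumption on client and impostor scores), and a normalisation that interpolates client-dependent and client-independent statistics. The interpolation formula and parameter names below are assumptions for illustration, not the paper's exact definition of F-Norm.

    import math

    def f_ratio(mu_client, sigma_client, mu_impostor, sigma_impostor):
        return (mu_client - mu_impostor) / (sigma_client + sigma_impostor)

    def eer_from_f_ratio(f):
        # With Gaussian client and impostor score distributions, EER = 0.5 - 0.5 * erf(F / sqrt(2)).
        return 0.5 - 0.5 * math.erf(f / math.sqrt(2))

    def interpolated_norm(score, client_stats, global_stats, beta=0.5, gamma=0.5):
        # Blend client-dependent and client-independent means before normalising (illustrative form only).
        mu_c = beta * client_stats["mu_client"] + (1 - beta) * global_stats["mu_client"]
        mu_i = gamma * client_stats["mu_impostor"] + (1 - gamma) * global_stats["mu_impostor"]
        return (score - mu_i) / (mu_c - mu_i)

    client = {"mu_client": 2.1, "mu_impostor": -0.4}  # estimable from as few as two client accesses
    pooled = {"mu_client": 1.5, "mu_impostor": -0.5}  # pooled over all clients
    print(eer_from_f_ratio(f_ratio(1.5, 0.8, -0.5, 0.7)))  # EER under the Gaussian assumption
    print(interpolated_norm(1.0, client, pooled))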

    Compensating User-Specific Information with User-Independent Information in Biometric Authentication Tasks

    Biometric authentication is the process of verifying an identity claim using a person's behavioral and physiological characteristics. It is in general a binary classification task, because a system either accepts or rejects an identity claim. However, a biometric authentication system serves many users, and by recognizing this fact, better decisions can be made if user-specific information is exploited. In this study, we propose to combine user-specific information with user-independent information such that the performance obtained by exploiting both information sources is never worse than that of either one alone and in some situations improves significantly over either one. We show that this technique, motivated by a standard Bayesian framework, is applicable at two levels: the fusion level, where multiple (multimodal or intramodal) systems are involved, and the score normalization level, where only a single system is involved. The second approach can be considered a novel score normalization technique that combines both information sources. The fusion technique was tested on 32 fusion experiments, whereas the normalization technique was tested on 13 single-system experiments. Both techniques, which originate from the same principle, share a major advantage: thanks to prior knowledge, as supported by experimental evidence, few or almost no free parameters are needed to employ them. Previous work in this direction requires at least 6 to 10 user-specific client accesses, whereas in this work as few as two user-specific client accesses are needed, thus overcoming the learning problem posed by extremely few user-specific client samples. Last but not least, a non-exhaustive survey of the state of the art in incorporating user-specific information in biometric authentication is also presented.
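    A minimal sketch of the shrinkage intuition behind combining the two information sources at the score normalization level; the weighting rule and names are illustrative assumptions, not the paper's Bayesian formulation.

    def shrink_mean(user_scores, global_mean, prior_strength=2.0):
        # With only two user-specific accesses the estimate leans on the user-independent mean;
        # with more user data it leans on the user-specific empirical mean.
        n = len(user_scores)
        user_mean = sum(user_scores) / n if n else global_mean
        w = n / (n + prior_strength)
        return w * user_mean + (1 - w) * global_mean

    def normalize(score, user_scores, global_mean, global_std):
        return (score - shrink_mean(user_scores, global_mean)) / global_std

    # Example: only two user-specific client accesses are available.
    print(normalize(1.8, [2.0, 2.4], global_mean=1.5, global_std=0.6))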

    Effects of Equipment Variations on Speaker Recognition Error Rates

    The purpose of this study was to examine the effects of equipment variation on speaker recognition performance; specifically, microphone variation is investigated. The study examines the error rates of a speaker recognition system when microphones vary between the enrollment and testing phases, and when microphones differ in similar environments and conditions. The metrics used for evaluation are the false identity acceptance and false identity rejection error rates. School of Electrical & Computer Engineering.
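    For reference, a minimal sketch of how the two error rates used as the evaluation metric are typically computed from genuine and impostor trial scores at a fixed decision threshold; the scores and threshold below are made up for illustration and are not this study's data.

    def error_rates(genuine_scores, impostor_scores, threshold):
        frr = sum(s < threshold for s in genuine_scores) / len(genuine_scores)     # false rejection rate
        far = sum(s >= threshold for s in impostor_scores) / len(impostor_scores)  # false acceptance rate
        return far, frr

    print(error_rates([0.9, 0.7, 0.4], [0.2, 0.5, 0.1], threshold=0.45))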

    A non-linear polynomial approximation filter for robust speaker verification

    Bibliography: leaves 101-109

    Text-independent bilingual speaker verification system.

    Ma Bin. Thesis (M.Phil.)--Chinese University of Hong Kong, 2003. Includes bibliographical references (leaves 96-102). Abstracts in English and Chinese.
    Contents: Abstract; Acknowledgement.
    Chapter 1, Introduction: 1.1 Biometrics; 1.2 Speaker Verification; 1.3 Overview of Speaker Verification Systems; 1.4 Text Dependency (1.4.1 Text-Dependent Speaker Verification; 1.4.2 GMM-based Speaker Verification); 1.5 Language Dependency; 1.6 Normalization Techniques; 1.7 Objectives of the Thesis; 1.8 Thesis Organization.
    Chapter 2, Background: 2.1 Background Information (2.1.1 Speech Signal Acquisition; 2.1.2 Speech Processing; 2.1.3 Engineering Model of Speech Signal; 2.1.4 Speaker Information in the Speech Signal; 2.1.5 Feature Parameters, covering 2.1.5.1 Mel-Frequency Cepstral Coefficients, 2.1.5.2 Linear Predictive Coding Derived Cepstral Coefficients, 2.1.5.3 Energy Measures, 2.1.5.4 Derivatives of Cepstral Coefficients; 2.1.6 Evaluating Speaker Verification Systems); 2.2 Common Techniques (2.2.1 Template Model Matching Methods; 2.2.2 Statistical Model Methods, covering 2.2.2.1 HMM Modeling Technique, 2.2.2.2 GMM Modeling Techniques, 2.2.2.3 Gaussian Mixture Model, 2.2.2.4 The Advantages of GMM; 2.2.3 Likelihood Scoring; 2.2.4 General Approach to Decision Making; 2.2.5 Cohort Normalization, covering 2.2.5.1 Probability Score Normalization, 2.2.5.2 Cohort Selection); 2.3 Chapter Summary.
    Chapter 3, Experimental Corpora: 3.1 The YOHO Corpus (3.1.1 Design of the YOHO Corpus; 3.1.2 Data Collection Process of the YOHO Corpus; 3.1.3 Experimentation with the YOHO Corpus); 3.2 CUHK Bilingual Speaker Verification Corpus (3.2.1 Design of the CUBS Corpus; 3.2.2 Data Collection Process for the CUBS Corpus); 3.3 Chapter Summary.
    Chapter 4, Text-Dependent Speaker Verification: 4.1 Front-End Processing on the YOHO Corpus; 4.2 Cohort Normalization Setup; 4.3 HMM-based Speaker Verification Experiments (4.3.1 Subword HMM Models; 4.3.2 Experimental Results, covering 4.3.2.1 Comparison of Feature Representations, 4.3.2.2 Effect of Cohort Normalization); 4.4 Experiments on GMM-based Speaker Verification (4.4.1 Experimental Setup; 4.4.2 The Number of Gaussian Mixture Components; 4.4.3 The Effect of Cohort Normalization; 4.4.4 Comparison of HMM and GMM); 4.5 Comparison with Previous Systems; 4.6 Chapter Summary.
    Chapter 5, Language- and Text-Independent Speaker Verification: 5.1 Front-End Processing of the CUBS; 5.2 Language- and Text-Independent Speaker Modeling; 5.3 Cohort Normalization; 5.4 Experimental Results and Analysis (5.4.1 Number of Gaussian Mixture Components; 5.4.2 The Cohort Normalization Effect; 5.4.3 Language Dependency; 5.4.4 Language-Independency); 5.5 Chapter Summary.
    Chapter 6, Conclusions and Future Work: 6.1 Summary (6.1.1 Feature Comparison; 6.1.2 HMM Modeling; 6.1.3 GMM Modeling; 6.1.4 Cohort Normalization; 6.1.5 Language Dependency); 6.2 Future Work (6.2.1 Feature Parameters; 6.2.2 Model Quality, covering 6.2.2.1 Variance Flooring, 6.2.2.2 Silence Detection; 6.2.3 Conversational Speaker Verification).
    Bibliography.