18 research outputs found

    Performance evaluation in open-set speaker identification

    This document is the Accepted Manuscript version of the following paper: Malegaonkar A., Ariyaeeinia A. (2011) Performance Evaluation in Open-Set Speaker Identification. In: Vielhauer C., Dittmann J., Drygajlo A., Juul N.C., Fairhurst M.C. (eds) Biometrics and ID Management. BioID 2011. Lecture Notes in Computer Science, vol 6583. Springer, Berlin, Heidelberg. The Version of Record is available online at doi: https://doi.org/10.1007/978-3-642-19530-3_10. © Springer-Verlag Berlin Heidelberg 2011. The concern in this study is the approach to evaluating the performance of the open-set speaker identification process. In essence, such a process involves first identifying the speaker model in the database that best matches the given test utterance, and then determining if the test utterance has actually been produced by the speaker associated with the best-matched model. Whilst, conventionally, the performance of each of these two sub-processes is evaluated independently, it is argued that the use of a measure of performance for the complete process can provide a more useful basis for comparing the effectiveness of different systems. Based on this argument, an approach to assessing the performance of open-set speaker identification is considered in this paper, which is in principle similar to the method used for computing the diarisation error rate. The paper details the above approach for assessing the performance of open-set speaker identification and presents an analysis of its characteristics.
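
    The two sub-processes described in this abstract can be illustrated with a minimal sketch. It is not taken from the paper: the function name, the score dictionary and the threshold values are hypothetical, and the scores are assumed to be "higher is better" match scores such as log-likelihoods.

```python
from typing import Dict, Optional


def open_set_identify(scores: Dict[str, float], threshold: float) -> Optional[str]:
    """Return the best-matching registered speaker, or None if rejected.

    `scores` maps each registered speaker to a match score for the test
    utterance (assumption: higher means a better match).
    """
    # Stage 1 (identification): find the best-matched speaker model.
    best = max(scores, key=scores.get)
    # Stage 2 (verification): accept only if the best score clears the
    # threshold; otherwise treat the speaker as unknown.
    return best if scores[best] >= threshold else None


# Hypothetical scores for three registered speakers.
example_scores = {"alice": -41.2, "bob": -38.7, "carol": -45.0}
print(open_set_identify(example_scores, threshold=-40.0))  # bob
print(open_set_identify(example_scores, threshold=-35.0))  # None
```

    An error can arise in either stage (a wrong best match, or a wrong accept/reject decision), which is why the abstract argues for a single measure covering the complete process.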

    Efficient speaker change detection using adapted Gaussian mixture models

    Original article can be found at: http://ieeexplore.ieee.org/xpl/RecentIssue.jsppunumber=10376 -- This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. In most cases, these works may not be reposted without the explicit permission of the copyright holder. -- Copyright IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE. A new approach to speaker change detection is proposed and investigated. The method, which is based on a probabilistic framework, provides an effective means for tackling the problem posed by phonetic variation in high-resolution speaker change detection. Additionally, the approach incorporates the capability for dealing with undesired effects of variations in speech characteristics. Experimental investigations conducted with clean and broadcast news audio show that the proposed method is significantly more effective than the currently popular techniques for speaker change detection. To enhance the computational efficiency of the proposed method, modified implementation algorithms are introduced which are based on the exploitation of redundant operations and a fast scoring procedure. It is shown that, through the use of the proposed fast algorithm, the computational efficiency of the approach can be increased by over 77% without significant reduction in its accuracy.
    The paper discusses the principles and characteristics of the proposed speaker change detection method, and provides a detailed description of its efficient implementation. The experiments, investigating the performance of the proposed method and its effectiveness in relation to other approaches, are described and an analysis of the results is presented. Peer reviewed.

    Fusion of cross stream information in speaker verification

    This paper addresses the performance of various statistical data fusion techniques for combining the complementary score information in speaker verification. The complementary verification scores are based on the static and delta cepstral features. Both LPCC (linear prediction-based cepstral coefficients) and MFCC (mel-frequency cepstral coefficients) are considered in the study. The experiments, conducted using a GMM-based speaker verification system, provide valuable information on the relative effectiveness of different fusion methods applied at the score level. It is also demonstrated that a higher speaker discrimination capability can be achieved by applying the fusion at the score level rather than at the feature level.

    Verification Effectiveness in Open-Set Speaker Identification

    This paper is a postprint of a paper submitted to and accepted for publication in IEE Proceedings Vision, Image and Signal Processing and is subject to Institution of Engineering and Technology Copyright. The copy of record is available at IET Digital Library. This paper is concerned with the verification effectiveness in open-set, text-independent speaker identification. The study includes an analysis of the characteristics of this mode of speaker recognition and the potential causes of errors. The use of well-known score normalisation techniques for the purpose of enhancing the reliability of the process is described and their relative effectiveness is experimentally investigated. The experiments are based on the dataset proposed for the 1-speaker detection task of the NIST Speaker Recognition Evaluation 2003. Based on the experimental results, it is demonstrated that significant benefits are achieved by using score normalisation in open-set identification, and that the level of benefit depends strongly on the type of approach adopted. The results also show that better performance can be achieved by using the cohort normalisation methods. In particular, the unconstrained cohort method with a relatively small cohort size appears to outperform all other approaches.
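
    As a rough illustration of cohort-based score normalisation, the sketch below uses one common formulation: the target model's log-likelihood minus the mean log-likelihood of the most competitive other models for the same test utterance. This is an assumption, not the exact procedure from the paper; the function name, speaker names and scores are hypothetical.

```python
from typing import Dict


def cohort_normalise(scores: Dict[str, float], target: str, cohort_size: int) -> float:
    """Normalise the target's score against its strongest competitors.

    `scores` maps speaker names to log-likelihoods for one test utterance
    (assumption: higher means a better match).
    """
    # Rank all competing models by their score on this test utterance.
    competitors = sorted(
        (score for name, score in scores.items() if name != target),
        reverse=True,
    )
    cohort = competitors[:cohort_size]
    if not cohort:
        raise ValueError("at least one competing model is required")
    # Subtract the mean cohort score from the target score.
    return scores[target] - sum(cohort) / len(cohort)


# Hypothetical log-likelihoods for four registered speakers.
example = {"alice": -41.0, "bob": -38.0, "carol": -45.0, "dave": -40.0}
print(cohort_normalise(example, target="bob", cohort_size=2))  # 2.5
```

    Because the cohort here is chosen per test utterance, the normalised score measures how far the target stands out from its closest rivals rather than its absolute likelihood.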

    Unsupervised Speaker Change Detection using Probabilistic Pattern Matching

    Copyright IEEE. This letter presents an investigation into the use of a probabilistic pattern matching approach for detecting speaker changes in audio streams. The experiments are conducted using clean speech as well as broadcast news material. It is shown that, in the proposed approach, the use of bilateral scoring is considerably more effective than unilateral scoring. Appropriate score normalization methods are considered in the study. It is observed that, in all cases, the bilateral scoring approach outperforms the currently popular Bayesian information criterion (BIC) method for speaker change detection. This letter discusses the principles of the proposed approach and details the experimental investigations. Peer reviewed.

    On the enhancement of speaker identification accuracy using weighted bilateral scoring

    Copyright IEEE. DOI: 10.1109/CCST.2008.4751310. This paper presents investigations into an effective bilateral scoring method in open-set speaker identification. The approach is based on the fact that two different speakers usually are not reciprocal. A difficulty in deploying bilateral scoring is that test utterances are normally much shorter than training utterances. To tackle this problem, the proposed approach provides the final identification score based on a weighted combination of independently normalised forward and reverse scores. Based on the experimental results obtained using clean and telephone quality speech, it is shown that the proposed approach is more effective than the conventional scoring methods in open-set speaker identification. Peer reviewed.
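
    The weighted combination mentioned in this abstract might look like the following sketch. The convex-combination form, the function name and the default weight are assumptions made for illustration; the abstract does not give the weighting scheme or the normalisation method, so the two inputs are taken to be already normalised.

```python
def weighted_bilateral_score(forward: float, reverse: float, weight: float = 0.7) -> float:
    """Combine normalised forward and reverse scores into one score.

    `forward`: test utterance scored against the speaker's model.
    `reverse`: assumed to be the speaker's training material scored against
    a model built from the (shorter) test utterance.
    `weight`: hypothetical weight on the forward score, in [0, 1].
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    # Convex combination: a higher weight trusts the forward score more.
    return weight * forward + (1.0 - weight) * reverse


print(weighted_bilateral_score(2.0, 1.0, weight=0.75))  # 1.75
```

    Weighting the forward score more heavily is one plausible response to the difficulty the abstract notes, since a model built from a short test utterance is likely to be less reliable.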

    On the use of decoupled and adapted Gaussian mixture models for open-set speaker identification

    This paper presents a comparative analysis of the performance of decoupled and adapted Gaussian mixture models (GMMs) for open-set, text-independent speaker identification (OSTI-SI). The analysis is based on a set of experiments using an appropriate subset of the NIST-SRE 2003 database and various score normalisation methods. Based on the experimental results, it is concluded that the speaker identification performance is noticeably better with adapted GMMs than with decoupled GMMs. This difference in performance, however, appears to be of less significance in the second stage of OSTI-SI, where the process involves classifying the test speakers as known or unknown speakers. In particular, when the score normalisation used in this stage is based on the unconstrained cohort approach, the two modelling techniques yield similar performance. The paper includes a detailed description of the experiments and discusses how the OSTI-SI performance is influenced by the characteristics of each of the two modelling techniques and the normalisation approaches adopted.

    Multimodal Authentication using Qualitative Support Vector Machines

    This paper proposes an approach to enhancing the accuracy of multimodal biometrics in uncontrolled environments. Variation in operating conditions results in mismatch between the training and test material, and thereby affects the biometric authentication performance regardless of whether it is unimodal or multimodal. The paper proposes a technique to reduce the effects of such variations in multimodal fusion. The proposed technique is based on estimating the quality aspects of the test scores and then passing these aspects into the Support Vector Machine either as features or weights. Since the fusion process is based on the learning classifier of Support Vector Machine, the technique is termed Support Vector Machine with Quality Measurement (SVM-QM). The experimental investigation is conducted using face and speech modalities. The results clearly show the benefits gained from learning the quality aspects of the biometric data used for authentication.

    Unsupervised speaker change detection using probabilistic pattern matching
