
    Singing speaker clustering based on subspace learning in the GMM mean supervector space

    In this study, we propose algorithms based on subspace learning in the GMM mean supervector space to improve the performance of speaker clustering with speech from both reading and singing. As a speaking style, singing introduces changes in the time-frequency structure of a speaker's voice. The purpose of this study is to introduce advancements for speech systems, such as speech indexing and retrieval, that improve robustness to intrinsic variations in speech production. Speaker clustering techniques such as k-means and hierarchical clustering are explored to analyze acoustic-space differences in a corpus in which each speaker both reads and sings lyrics. Furthermore, a distance based on fuzzy c-means membership degrees is proposed to more accurately measure clustering difficulty, or speaker confusability. Two categories of subspace learning methods are studied: unsupervised, based on LPP, and supervised, based on PLDA. Our proposed clustering method based on PLDA is a two-stage algorithm: first, initial clusters are obtained using full-dimension supervectors; next, each cluster is refined in a PLDA subspace, resulting in a more speaker-dependent representation that is less sensitive to speaking style. It is shown that LPP improves average clustering accuracy by 5.1% absolute over a hierarchical baseline for a mixture of reading and singing, and PLDA-based clustering increases accuracy by 9.6% absolute over a k-means baseline. These advancements offer novel techniques to improve model formulation for speech applications including speaker ID, audio search, and audio content analysis.
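
The fuzzy c-means based confusability idea above can be illustrated with a small sketch. This is not the exact distance proposed in the study; it simply computes standard fuzzy c-means membership degrees for GMM mean supervectors and derives a hypothetical confusability score from their entropy (the fuzzifier m, the entropy-based score, and all names are illustrative assumptions).

```python
import numpy as np

def fuzzy_memberships(X, centroids, m=2.0, eps=1e-9):
    """Standard fuzzy c-means membership degrees u[k, i] of sample k in cluster i."""
    # Euclidean distance from every sample to every centroid.
    d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2) + eps
    power = 2.0 / (m - 1.0)
    ratio = (d[:, :, None] / d[:, None, :]) ** power   # (d_ik / d_jk)^power for all j
    return 1.0 / ratio.sum(axis=2)                     # rows sum to 1

def confusability_score(U):
    """Hypothetical difficulty measure: mean normalized entropy of the membership
    rows (close to 1.0 when memberships are spread evenly, i.e. speakers overlap)."""
    H = -(U * np.log(U + 1e-12)).sum(axis=1)
    return float(H.mean() / np.log(U.shape[1]))

# Toy usage with random vectors standing in for GMM mean supervectors of two speakers.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (50, 20)), rng.normal(0.5, 1.0, (50, 20))])
centroids = np.vstack([X[:50].mean(axis=0), X[50:].mean(axis=0)])
print("confusability:", confusability_score(fuzzy_memberships(X, centroids)))
```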

    Speaker Recognition: Advancements and Challenges


    Statistical Approaches for Signal Processing with Application to Automatic Singer Identification

    In the music world, the singing voice is known as the oldest instrument, and it plays an important role in musical recordings. The singer's identity serves as a primary aid for people to organize, browse, and retrieve music recordings. In this thesis, we focus on the problem of singer identification based on the acoustic features of the singing voice. An automatic singer identification system is constructed and achieves very high identification accuracy. This system consists of three crucial parts: singing voice detection, background music removal, and pattern recognition. These parts are introduced and explored in detail in this thesis. Specifically, for singing voice detection, we first study a traditional method, the double GMM. Then an improved method, the single GMM, is proposed. The experimental results show that the detection accuracy of the single GMM reaches 96.42%. For background music removal, Non-negative Matrix Factorization (NMF) and Robust Principal Component Analysis (RPCA) are demonstrated. The evaluation results show that RPCA outperforms NMF. For pattern recognition, we explore Support Vector Machine (SVM) and Gaussian Mixture Model (GMM) algorithms. Based on the experimental results, the prediction accuracy of the GMM classifier is about 16% higher than that of the SVM.
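
As a hedged sketch of the pattern-recognition stage described above (one GMM per singer, identification by maximum average log-likelihood), the snippet below assumes frame-level acoustic features such as MFCCs have already been extracted from vocal segments; the model settings and names are illustrative, not taken from the thesis.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def train_singer_models(features_by_singer, n_components=16):
    """Fit one diagonal-covariance GMM per singer on that singer's frame-level features."""
    return {singer: GaussianMixture(n_components=n_components,
                                    covariance_type="diag",
                                    random_state=0).fit(feats)
            for singer, feats in features_by_singer.items()}

def identify_singer(models, test_features):
    """Score the test frames under every singer model; return the best-scoring singer."""
    scores = {singer: model.score(test_features) for singer, model in models.items()}
    return max(scores, key=scores.get)

# Toy usage with random matrices standing in for MFCC frames of vocal segments.
rng = np.random.default_rng(1)
train = {"singer_a": rng.normal(0.0, 1.0, (500, 13)),
         "singer_b": rng.normal(2.0, 1.0, (500, 13))}
models = train_singer_models(train)
print(identify_singer(models, rng.normal(2.0, 1.0, (200, 13))))   # expected: singer_b
```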

    Identity verification using voice and its use in a privacy preserving system

    Since security has been a growing concern in recent years, the field of biometrics has gained popularity and become an active research area. Besides new identity authentication and recognition methods, protection against theft of biometric data and potential privacy loss are current directions in biometric systems research. Biometric traits used for verification can be grouped into two categories: physical and behavioral. Physical traits such as fingerprints and iris patterns are characteristics that do not undergo major changes over time. On the other hand, behavioral traits such as voice, signature, and gait are more variable; they are therefore more suitable for lower-security applications. Behavioral traits such as voice and signature also have the advantage of being able to generate numerous different biometric templates of the same modality (e.g., different pass-phrases or signatures), in order to provide cancelability of the biometric template and to prevent cross-matching of different databases. In this thesis, we present three new biometric verification systems based mainly on the voice modality. First, we propose a text-dependent (TD) system where acoustic features are extracted from individual frames of the utterances after they are aligned via phonetic HMMs. Data from 163 speakers from the TIDIGITS database are employed for this work, and the best equal error rate (EER) is reported as 0.49% for 6-digit user passwords. Second, a text-independent (TI) speaker verification method is implemented, inspired by the feature extraction method used in our text-dependent system. Our proposed TI system depends on creating speaker-specific phoneme codebooks. Once phoneme codebooks are created at the enrollment stage, using HMM alignment and segmentation to extract discriminative user information, test utterances are verified by calculating the total dissimilarity/distance to the claimed codebook. For benchmarking, a GMM-based TI system is implemented as a baseline. The results of the proposed TD system (0.22% EER for 7-digit passwords) are superior to those of the GMM-based system (0.31% EER for 7-digit sequences), whereas the proposed TI system yields worse results (5.79% EER for 7-digit sequences) using data from 163 people in the TIDIGITS database. Finally, we introduce a new implementation of the multi-biometric template framework of Yanikoglu and Kholmatov [12], using the fingerprint and voice modalities. In this framework, the two biometric data are fused at the template level to create a multi-biometric template, in order to increase template security and privacy. The current work also aims to provide cancelability by exploiting the behavioral aspect of the voice modality.
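
All three systems above are compared through their equal error rate. The following generic sketch (not code from the thesis) shows one common way to estimate the EER from sets of genuine and impostor match scores; the score distributions in the toy example are invented.

```python
import numpy as np

def equal_error_rate(genuine_scores, impostor_scores):
    """Estimate the EER: sweep candidate thresholds and find where the false-accept
    rate (impostors at or above threshold) meets the false-reject rate (genuine
    trials below threshold). Higher score is assumed to mean a stronger match."""
    thresholds = np.sort(np.concatenate([genuine_scores, impostor_scores]))
    far = np.array([(impostor_scores >= t).mean() for t in thresholds])
    frr = np.array([(genuine_scores < t).mean() for t in thresholds])
    idx = np.argmin(np.abs(far - frr))
    return (far[idx] + frr[idx]) / 2.0

# Toy usage: genuine scores drawn higher than impostor scores.
rng = np.random.default_rng(2)
genuine = rng.normal(3.0, 1.0, 1000)
impostor = rng.normal(0.0, 1.0, 1000)
print(f"EER ~ {equal_error_rate(genuine, impostor):.3%}")
```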

    Security/privacy analysis of biometric hashing and template protection for fingerprint minutiae

    This thesis has two main parts. The first part deals with the security and privacy analysis of biometric hashing. The second part introduces a method for fixed-length feature vector extraction and hash generation from fingerprint minutiae. The upsurge of interest in biometric systems has led to the development of biometric template protection methods in order to overcome security and privacy problems. Biometric hashing produces a secure binary template by combining a personal secret key and the biometric of a person, which leads to a two-factor authentication method. This dissertation analyzes biometric hashing both from a theoretical point of view and with regard to its practical application. For theoretical evaluation of biohashes, a systematic approach which uses estimated entropy based on the degrees of freedom of a binomial distribution is outlined. In addition, novel practical security and privacy attacks against face image hashing are presented to quantify the additional protection provided by biometrics in cases where the secret key is compromised (i.e., the attacker is assumed to know the user's secret key). Two of these attacks are based on sparse signal recovery techniques using one-bit compressed sensing, in addition to two other minimum-norm-solution based attacks. A rainbow attack based on a large database of faces is also introduced. The results show that biometric templates would be in serious danger of being exposed when the secret key is known by an attacker, and the system would be under serious threat as well. Due to its distinctiveness and performance, the fingerprint is preferred among various biometric modalities in many settings. Most fingerprint recognition systems use minutiae information, which is an unordered collection of minutiae locations and orientations. Some advanced template protection algorithms (such as fuzzy commitment and other modern cryptographic alternatives) require a fixed-length binary template. However, such a template protection method is not directly applicable to the fingerprint minutiae representation, which by its nature is of variable size. This dissertation introduces a novel and empirically validated framework that represents a minutiae set with a rotation-invariant fixed-length vector and hence enables using biometric template protection methods for fingerprint recognition without significant loss in verification performance. The introduced framework is based on using local representations around each minutia as observations modeled by a Gaussian mixture model called a universal background model (UBM). For each fingerprint, we extract a fixed-length super-vector of first-order statistics through alignment with the UBM. These super-vectors are then used to learn per-person linear support vector machine (SVM) models for verification. In addition, the fixed-length vector and the linear SVM model are both converted into binary hashes, and the matching process is reduced to calculating the Hamming distance between them, so that modern cryptographic alternatives based on homomorphic encryption can be applied for minutiae template protection.
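
A minimal sketch of the biometric hashing idea analyzed in the first part, under the common random-projection formulation: a matrix seeded by the secret key projects a fixed-length feature vector (such as the UBM super-vector described above) and the result is thresholded into bits, so matching reduces to a Hamming distance. The dimensions, the median threshold, and all names are illustrative assumptions, not the exact construction studied in the thesis.

```python
import numpy as np

def biohash(feature_vec, secret_key, n_bits=128):
    """Key-seeded random projection followed by thresholding at the median
    (a common biohashing construction; parameters here are illustrative)."""
    rng = np.random.default_rng(secret_key)            # the secret key seeds the projection
    P = rng.standard_normal((n_bits, feature_vec.size))
    projected = P @ feature_vec
    return (projected > np.median(projected)).astype(np.uint8)

def hamming_distance(h1, h2):
    """Number of differing bits between two binary hashes."""
    return int(np.count_nonzero(h1 != h2))

# Toy usage: the same trait with the same key yields a nearby hash; a different
# key (the stolen-key / cross-matching scenario analyzed above) does not match.
rng = np.random.default_rng(3)
enrolled = rng.normal(0.0, 1.0, 512)
probe = enrolled + rng.normal(0.0, 0.05, 512)          # noisy re-capture of the same trait
h_enroll = biohash(enrolled, secret_key=42)
h_probe = biohash(probe, secret_key=42)
h_other_key = biohash(enrolled, secret_key=7)
print(hamming_distance(h_enroll, h_probe), hamming_distance(h_enroll, h_other_key))
```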