
    Comparison GMM and SVM Classifier for Automatic Speaker Verification

    The objective of this thesis is to develop automatic text-independent speaker verification systems using unconstrained telephone conversational speech. We began by performing a Gaussian Mixture Model (GMM) likelihood-ratio verification task in a speaker-independent system, as described by MIT Lincoln Laboratory. We next introduced a speaker-dependent verification system based on speaker-dependent thresholds. We then implemented the same system using a Support Vector Machine (SVM), comparing the performance of polynomial and radial basis function kernels. For training and testing the systems, we used low-level spectral features. Finally, we provided a performance assessment of these systems using the National Institute of Standards and Technology (NIST) 2008 speaker recognition evaluation telephone corpora.
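    The GMM likelihood-ratio test this abstract describes can be sketched as follows. This is a minimal illustration, not the thesis's implementation: the diagonal-covariance scorer, the toy parameters, and the acceptance threshold are all assumptions for the sake of the example.

```python
import numpy as np

def gmm_loglik(X, weights, means, variances):
    # Average per-frame log-likelihood under a diagonal-covariance GMM.
    # X: (frames, dims); weights: (M,); means, variances: (M, dims)
    diff = X[:, None, :] - means[None, :, :]             # (frames, M, dims)
    exponent = -0.5 * np.sum(diff ** 2 / variances, axis=2)
    lognorm = -0.5 * np.sum(np.log(2 * np.pi * variances), axis=1)
    log_comp = np.log(weights) + lognorm + exponent      # (frames, M)
    m = log_comp.max(axis=1, keepdims=True)              # log-sum-exp trick
    per_frame = m[:, 0] + np.log(np.sum(np.exp(log_comp - m), axis=1))
    return float(np.mean(per_frame))

def llr_score(X, speaker_gmm, ubm):
    # Likelihood-ratio verification: accept the claimed identity when the
    # score exceeds a threshold tuned on development data.
    return gmm_loglik(X, *speaker_gmm) - gmm_loglik(X, *ubm)
```

In the speaker-independent setup the threshold is shared across speakers; the speaker-dependent variant the abstract mentions simply tunes one threshold per enrolled speaker.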

    Speaker Recognition Using Machine Learning Techniques

    Speaker recognition is a technique for identifying the person talking to a machine from voice features and acoustics. It has multiple applications in fields including Human-Computer Interaction (HCI), biometrics, security, and the Internet of Things (IoT). As technology advances, hardware is getting more powerful and software is becoming smarter, and the use of devices to interact effectively with humans and perform complex calculations is increasing accordingly. This is where speaker recognition is important, as it facilitates seamless communication between humans and computers. Additionally, the field of security has seen a rise in biometrics. At present, multiple biometric techniques co-exist, for instance iris, fingerprint, voice, and facial recognition. Voice is one metric which, apart from being natural to users, provides comparable and sometimes even higher levels of security than some traditional biometric approaches. Hence, it is a widely accepted biometric technique and is constantly being studied for further improvements. This study evaluates different pre-processing, feature extraction, and machine learning techniques on audio recorded in unconstrained, natural environments, to determine which combination works well for speaker recognition and classification. The report presents several methods of audio pre-processing, such as trimming, split-and-merge, noise reduction, and vocal enhancement, to improve recordings obtained from real-world situations. Additionally, a text-independent approach is used, which makes the model flexible across languages. Mel-Frequency Cepstral Coefficients (MFCC) are extracted for each recording, along with their differentials and accelerations, to evaluate machine learning classification techniques such as kNN, Support Vector Machines, and Random Forest classifiers. Lastly, the approaches are evaluated against existing research to study which techniques perform well on these sets of audio recordings.
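    The differentials and accelerations mentioned above are usually computed with a regression formula over neighbouring frames. A sketch of that feature stacking, assuming an MFCC matrix is already available (the window half-width N=2 is the common default, not a value taken from this study):

```python
import numpy as np

def deltas(feats, N=2):
    # HTK-style regression differentials over a (frames, coeffs) matrix:
    # d_t = sum_n n * (c_{t+n} - c_{t-n}) / (2 * sum_n n^2),
    # with edge frames padded by repetition.
    padded = np.pad(feats, ((N, N), (0, 0)), mode="edge")
    denom = 2 * sum(n * n for n in range(1, N + 1))
    out = np.zeros_like(feats, dtype=float)
    for n in range(1, N + 1):
        out += n * (padded[N + n : N + n + len(feats)]
                    - padded[N - n : N - n + len(feats)])
    return out / denom

def stack_mfcc(mfcc):
    # MFCCs + differentials (deltas) + accelerations (delta-deltas),
    # the feature set this abstract describes.
    d = deltas(mfcc)
    return np.hstack([mfcc, d, deltas(d)])
```

The stacked matrix can then be fed to any frame-level classifier such as kNN, an SVM, or a random forest.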

    Temporal Pattern Classification using Kernel Methods for Speech

    There are two paradigms for modelling varying-length temporal data: modelling sequences of feature vectors, as in hidden Markov model-based approaches to speech recognition, and modelling sets of feature vectors, as in Gaussian mixture model (GMM)-based approaches to speech emotion recognition. In this paper, methods using discrete hidden Markov models (DHMMs) in the kernel feature space, together with a string-kernel-based SVM classifier for classifying discretised representations of feature-vector sequences obtained by clustering and vector quantisation in the kernel feature space, are presented. The authors then present continuous-density hidden Markov models (CDHMMs) in the explicit kernel feature space that use the continuous-valued representation of features extracted from the temporal data. Methods for temporal pattern classification that map a varying-length sequential pattern to a fixed-length pattern and then apply an SVM-based classifier are also presented. For the task of recognising spoken letters in the E-set, it is demonstrated that models using a discretised representation with string-kernel SVM-based classification can achieve better classification performance than models using the continuous-valued representation. For modelling set-of-vectors-based representations of temporal data, two approaches in a hybrid framework are presented: the score-vector-based approach and the segment-modelling-based approach. In both, a generative model-based method is used to obtain a fixed-length pattern representation of the varying-length temporal data, and a discriminative model is then used for classification. The two approaches are studied for a speech emotion recognition task. The segment-modelling-based approach gives better performance than the score-vector-based approach and the GMM-based classifiers for speech emotion recognition. Defence Science Journal, 2010, 60(4), pp. 348-363. DOI: http://dx.doi.org/10.14429/dsj.60.49
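    Once a sequence has been discretised by vector quantisation, a string kernel compares two symbol sequences directly. The p-spectrum kernel below is one standard choice and is offered as an illustration; the paper does not specify that this particular string kernel was used.

```python
from collections import Counter

def spectrum_kernel(s, t, p=2):
    # p-spectrum string kernel: the inner product of the two sequences'
    # p-gram count vectors. s and t are sequences of discrete symbols,
    # e.g. VQ codebook indices for framewise speech features.
    def grams(x):
        return Counter(tuple(x[i:i + p]) for i in range(len(x) - p + 1))
    a, b = grams(s), grams(t)
    return sum(a[g] * b[g] for g in a)
```

Because the kernel is a valid inner product, it can be plugged into any kernelised SVM as a precomputed Gram matrix.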

    Acoustic Approaches to Gender and Accent Identification

    There has been considerable research on the problems of speaker and language recognition from samples of speech. A less researched problem is that of accent recognition. Although this is a similar problem to language identification, different accents of a language exhibit more fine-grained differences between classes than languages do. This presents a tougher problem for traditional classification techniques. In this thesis, we propose and evaluate a number of techniques for gender and accent classification. These techniques are novel modifications and extensions to state-of-the-art algorithms, and they result in enhanced performance on gender and accent recognition. The first part of the thesis focuses on the problem of gender identification and presents a technique that gives improved performance in situations where training and test conditions are mismatched. The bulk of the thesis is concerned with the application of the i-Vector technique, the most successful approach to acoustic classification to have emerged in recent years, to accent identification. We show that it is possible to achieve highly accurate accent identification without relying on transcriptions and without utilising phoneme recognition algorithms. The thesis describes various stages in the development of i-Vector-based accent classification that improve on the standard approaches usually applied for speaker or language identification, which are insufficient for this task. We demonstrate that very good accent identification performance is possible with acoustic methods by considering different i-Vector projections, front-end parameters, i-Vector configuration parameters, and an optimised fusion of the i-Vector classifiers that can be obtained from the same data. We claim the best accent identification performance reported on the test corpus for acoustic methods, with up to a 90% identification rate. This performance is better than that of previously reported acoustic-phonotactic systems on the same corpus, and is very close to that obtained via transcription-based accent identification. Finally, we demonstrate that using our techniques for speech recognition leads to considerably lower word error rates. Keywords: Accent Identification, Gender Identification, Speaker Identification, Gaussian Mixture Model, Support Vector Machine, i-Vector, Factor Analysis, Feature Extraction, British English, Prosody, Speech Recognition
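    In i-Vector systems, classification is typically reduced to comparing fixed-length vectors, and multiple classifiers can be combined by linear score fusion. The sketch below shows cosine scoring, a standard i-Vector back-end, and a fusion step; the fusion weights here are purely illustrative, since the thesis's optimised fusion is learned on development data.

```python
import numpy as np

def cosine_score(w_test, w_model):
    # Cosine scoring between two i-vectors: a standard lightweight
    # back-end that needs no further training once i-vectors exist.
    return float(w_test @ w_model
                 / (np.linalg.norm(w_test) * np.linalg.norm(w_model)))

def fuse(scores, weights):
    # Linear fusion of several classifiers' scores for one trial.
    # In practice the weights are optimised on held-out data.
    return float(np.dot(scores, weights))
```

A trial is then accepted for the accent class whose fused score is highest.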

    Automatic Gender Detection Based on Characteristics of Vocal Folds for Mobile Healthcare System

    An automatic gender detection may be useful in some cases of a mobile healthcare system. For example, some pathologies, such as vocal fold cysts, occur mainly in female patients. If an automatic method for gender detection is embedded into the system, it becomes easier for a healthcare professional to assess the patient and prescribe appropriate medication. In the human voice production system, the contribution of the vocal folds is vital. The length of the vocal folds is gender dependent: a male speaker has longer vocal folds than a female speaker. Due to the longer vocal folds, the voice of a male becomes heavier and therefore carries more voice intensity. Based on this idea, a new type of time-domain acoustic feature for an automatic gender detection system is proposed in this paper. The proposed feature measures voice intensity by calculating the area under the modified voice contour to differentiate between males and females. Two different databases are used to show that the proposed feature is independent of text, spoken language, dialect region, recording system, and environment. The obtained accuracies for clean and noisy speech are 98.27% and 96.55%, respectively.
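    The idea of an area-under-the-contour intensity feature can be illustrated in a few lines. Note that the paper's exact contour modification is not reproduced here: the per-frame RMS contour and the frame length are stand-in assumptions.

```python
import numpy as np

def contour_area(signal, frame_len=256):
    # Illustrative stand-in for the paper's feature: the area under a
    # per-frame intensity contour of the speech signal. Higher overall
    # intensity (as the paper argues for male voices) yields a larger area.
    n = len(signal) // frame_len
    frames = signal[: n * frame_len].reshape(n, frame_len)
    contour = np.sqrt(np.mean(frames ** 2, axis=1))  # RMS intensity per frame
    return float(contour.sum())                      # area under the contour
```

A threshold on this scalar (learned from labelled data) would then separate male from female speakers in the scheme the paper proposes.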

    An application of an auditory periphery model in speaker identification

    The number of applications of automatic Speaker Identification (SID) is growing due to advanced technologies for secure access and authentication in services and devices. A 2016 study found that the Cascade of Asymmetric Resonators with Fast-Acting Compression (CAR-FAC) cochlear model achieved the best performance among seven recent cochlear models at fitting a set of human auditory physiological data. Motivated by this result, I apply the CAR-FAC cochlear model to an SID task for the first time, aiming to approach the performance of the human auditory system. This thesis investigates the potential of the CAR-FAC model in text-dependent and text-independent SID tasks. It also investigates the contributions of different parameters, nonlinearities, and stages of the CAR-FAC to SID accuracy. The performance of the CAR-FAC is compared with that of another recent cochlear model, the Auditory Nerve (AN) model. In addition, three FFT-based auditory features, Mel-Frequency Cepstral Coefficients (MFCC), Frequency-Domain Linear Prediction (FDLP), and Gammatone-Frequency Cepstral Coefficients (GFCC), are included to compare their performance with the cochlear features. This comparison allows me to investigate a better front-end for a noise-robust SID system. Three statistical classifiers, a Gaussian mixture model with universal background model (GMM-UBM), a support vector machine (SVM), and an i-vector system, were used to evaluate performance; these classifiers allow me to investigate nonlinearities in the cochlear front-ends. Performance is evaluated under clean and noisy conditions across a wide range of noise levels. Techniques to improve the performance of a cochlear algorithm are also investigated; it was found that applying a cube root and a DCT to the cochlear output enhances SID accuracy substantially.
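    The cube-root-plus-DCT post-processing reported in the abstract can be sketched directly: cube-root compression mimics loudness compression, and a DCT across cochlear channels decorrelates them into cepstrum-like coefficients. The channel count and number of retained coefficients below are illustrative assumptions, not values from the thesis.

```python
import numpy as np

def compress_and_decorrelate(channel_energies, n_coeffs=13):
    # channel_energies: non-negative per-channel outputs of a cochlear
    # front-end for one frame. Apply cube-root compression, then a DCT-II
    # across the channel axis, keeping the first n_coeffs coefficients.
    compressed = np.cbrt(channel_energies)
    k = np.arange(len(compressed))
    # DCT-II basis, built explicitly to keep the example dependency-free.
    basis = np.cos(np.pi * np.outer(np.arange(n_coeffs), 2 * k + 1)
                   / (2 * len(compressed)))
    return basis @ compressed
```

The resulting coefficients would then feed the GMM-UBM, SVM, or i-vector back-ends named in the abstract.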

    NIST 2007 Language Recognition Evaluation: From the Perspective of IIR

    PACLIC / The University of the Philippines Visayas Cebu College Cebu City, Philippines / November 20-22, 200