58 research outputs found

    Semi-continuous hidden Markov models for automatic speaker verification

    Discriminating semi-continuous HMM for speaker verification

    This paper describes the use of a multiple codebook SCHMM speaker verification system, which uses a novel technique for discriminative hidden Markov modelling known as discriminative observation probabilities (DOP). DOP can easily be added to a multiple codebook HMM system and require minimal additional computation and no additional training. The DOP technique can be applied to both speech and speaker recognition. Results are presented for text-dependent experiments on isolated digits from 27 true speakers and 84 casual imposters, recorded over the public telephone network in the United Kingdom. DOP are shown to significantly improve speaker verification performance for several commonly used parameter sets.

    Evaluation of preprocessors for neural network speaker verification

    Speech Recognition

    Chapters in the first part of the book cover all the essential speech processing techniques for building robust, automatic speech recognition systems: the representation for speech signals and the methods for speech-features extraction, acoustic and language modeling, efficient algorithms for searching the hypothesis space, and multimodal approaches to speech recognition. The last part of the book is devoted to other speech processing applications that can use the information from automatic speech recognition for speaker identification and tracking, for prosody modeling in emotion-detection systems and in other speech processing applications that are able to operate in real-world environments, like mobile communication services and smart homes.

    On the automatic segmentation of transcribed words

    Acoustic Approaches to Gender and Accent Identification

    There has been considerable research on the problems of speaker and language recognition from samples of speech. A less researched problem is that of accent recognition. Although this is a similar problem to language identification, different accents of a language exhibit more fine-grained differences between classes than languages. This presents a tougher problem for traditional classification techniques. In this thesis, we propose and evaluate a number of techniques for gender and accent classification. These techniques are novel modifications and extensions to state of the art algorithms, and they result in enhanced performance on gender and accent recognition. The first part of the thesis focuses on the problem of gender identification, and presents a technique that gives improved performance in situations where training and test conditions are mismatched. The bulk of this thesis is concerned with the application of the i-Vector technique to accent identification, which is the most successful approach to acoustic classification to have emerged in recent years. We show that it is possible to achieve high accuracy accent identification without reliance on transcriptions and without utilising phoneme recognition algorithms. The thesis describes various stages in the development of i-Vector based accent classification that improve the standard approaches usually applied for speaker or language identification, which are insufficient. We demonstrate that very good accent identification performance is possible with acoustic methods by considering different i-Vector projections, frontend parameters, i-Vector configuration parameters, and an optimised fusion of the resulting i-Vector classifiers we can obtain from the same data. We claim to have achieved the best accent identification performance on the test corpus for acoustic methods, with up to 90% identification rate. This performance is even better than previously reported acoustic-phonotactic based systems on the same corpus, and is very close to performance obtained via transcription based accent identification. Finally, we demonstrate that the utilization of our techniques for speech recognition purposes leads to considerably lower word error rates. Keywords: Accent Identification, Gender Identification, Speaker Identification, Gaussian Mixture Model, Support Vector Machine, i-Vector, Factor Analysis, Feature Extraction, British English, Prosody, Speech Recognition

    Methods for Addressing Data Diversity in Automatic Speech Recognition

    The performance of speech recognition systems is known to degrade in mismatched conditions, where the acoustic environment and the speaker population significantly differ between the training and target test data. Performance degradation due to the mismatch is widely reported in the literature, particularly for diverse datasets. This thesis approaches the mismatch problem in diverse datasets with various strategies including data refinement, variability modelling and speech recognition model adaptation. These strategies are realised in six novel contributions. The first contribution is a data subset selection technique using likelihood ratio derived from a target test set quantifying mismatch. The second contribution is a multi-style training method using data augmentation. The existing training data is augmented using a distribution of variabilities learnt from a target dataset, resulting in a matched set. The third contribution is a new approach for genre identification in diverse media data with the aim of reducing the mismatch in an adaptation framework. The fourth contribution is a novel method which performs an unsupervised domain discovery using latent Dirichlet allocation. Since the latent domains have a high correlation with some subjective meta-data tags, such as genre labels of media data, features derived from the latent domains are successfully applied to the genre and broadcast show identification tasks. The fifth contribution extends the latent modelling technique for acoustic model adaptation, where latent-domain specific models are adapted from a base model. As the sixth contribution, an alternative adaptation approach is proposed where subspace adaptation of deep neural network acoustic models is performed using the proposed latent-domain aware training procedure. All of the proposed techniques for mismatch reduction are verified using diverse datasets. Using data selection, data augmentation and latent-domain model adaptation methods, the mismatch between training and testing conditions of diverse ASR systems is reduced, resulting in more robust speech recognition systems.

    A Soft Computing Based Approach for Multi-Accent Classification in IVR Systems

    A speaker's accent is the most important factor affecting the performance of Natural Language Call Routing (NLCR) systems because accents vary widely, even within the same country or community. This variation also occurs when non-native speakers start to learn a second language, the substitution of native language phonology being a common process. Such substitution leads to fuzziness between the phoneme boundaries and phoneme classes, which reduces out-of-class variations, and increases the similarities between the different sets of phonemes. Thus, this fuzziness is the main cause of reduced NLCR system performance. The main requirement for commercial enterprises using an NLCR system is to have a robust NLCR system that provides call understanding and routing to appropriate destinations. The chief motivation for this present work is to develop an NLCR system that eliminates multilayered menus and employs a sophisticated speaker accent-based automated voice response system around the clock. Currently, NLCRs are not fully equipped with accent classification capability. Our main objective is to develop both speaker-independent and speaker-dependent accent classification systems that understand a caller's query, classify the caller's accent, and route the call to the acoustic model that has been thoroughly trained on a database of speech utterances recorded by such speakers. In the field of accent classification, the dominant approaches are the Gaussian Mixture Model (GMM) and Hidden Markov Model (HMM). Of the two, GMM is the more widely implemented for accent classification. However, GMM performance depends on the initial partitions and number of Gaussian mixtures, both of which can reduce performance if poorly chosen. To overcome these shortcomings, we propose a speaker-independent accent classification system based on a distance metric learning approach and evolution strategy. This approach depends on side information from dissimilar pairs of accent groups to transfer data points to a new feature space where the Euclidean distances between similar and dissimilar points are at their minimum and maximum, respectively. Finally, a Non-dominated Sorting Evolution Strategy (NSES)-based k-means clustering algorithm is employed on the training data set processed by the distance metric learning approach. The main objectives of the NSES-based k-means approach are to find the cluster centroids as well as the optimal number of clusters for a GMM classifier. In the case of a speaker-dependent application, a new method is proposed based on the fuzzy canonical correlation analysis to find appropriate Gaussian mixtures for a GMM-based accent classification system. In our proposed method, we implement a fuzzy clustering approach to minimize the within-group sum-of-square-error and canonical correlation analysis to maximize the correlation between the speech feature vectors and cluster centroids. We conducted a number of experiments using the TIMIT database, the speech accent archive, and the foreign accent English databases for evaluating the performance of speaker-independent and speaker-dependent applications. Assessment of the applications and analysis shows that our proposed methodologies outperform the HMM, GMM, vector quantization GMM, and radial basis neural networks.