472 research outputs found

    Generalized Hidden Filter Markov Models Applied to Speaker Recognition

    Classification of time series has wide Air Force, DoD and commercial interest, from automatic target recognition systems on munitions to recognition of speakers in diverse environments. The ability to effectively model the temporal information contained in a sequence is of paramount importance. Toward this goal, this research develops theoretical extensions to a class of stochastic models and demonstrates their effectiveness on the problem of text-independent (language constrained) speaker recognition. Specifically within the hidden Markov model architecture, additional constraints are implemented which better incorporate observation correlations and context, where standard approaches fail. Two methods of modeling correlations are developed, and their mathematical properties of convergence and reestimation are analyzed. These differ in modeling correlation present in the time samples and those present in the processed features, such as Mel frequency cepstral coefficients. The system models speaker dependent phonemes, making use of word dictionary grammars, and recognition is based on normalized log-likelihood Viterbi decoding. Both closed set identification and speaker verification using cohorts are performed on the YOHO database. YOHO is the only large scale, multiple-session, high-quality speech database for speaker authentication and contains over one hundred speakers stating combination locks. Equal error rates of 0.21% for males and 0.31% for females are demonstrated. A critical error analysis using a hypothesis test formulation provides the maximum number of errors observable while still meeting the goal error rates of 1% False Reject and 0.1% False Accept. Our system achieves this goal
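The equal error rate quoted above is the operating point at which the false-reject and false-accept rates coincide. The following is a minimal sketch of how such a figure can be estimated from verification scores. The function name `equal_error_rate` and the toy scores are illustrative assumptions, not data or code from the paper, and the sketch ignores the cohort normalization the abstract mentions.

```python
def equal_error_rate(genuine, impostor):
    """Return (eer, threshold) at the point where FRR and FAR are closest.

    A trial is accepted when its score is >= threshold.
    genuine  : scores from true-speaker trials
    impostor : scores from impostor trials
    """
    best = None
    # Every observed score is a candidate threshold.
    for thr in sorted(set(genuine) | set(impostor)):
        frr = sum(s < thr for s in genuine) / len(genuine)      # false rejects
        far = sum(s >= thr for s in impostor) / len(impostor)   # false accepts
        gap = abs(frr - far)
        if best is None or gap < best[0]:
            best = (gap, (frr + far) / 2, thr)
    return best[1], best[2]


# Hypothetical normalized log-likelihood scores, for illustration only.
genuine_scores = [2.1, 1.8, 2.5, 0.4, 1.9]
impostor_scores = [0.1, 0.5, -0.3, 0.2, 1.0]
eer, thr = equal_error_rate(genuine_scores, impostor_scores)
print(f"EER = {eer:.2%} at threshold {thr}")
```

With real data the two error curves rarely cross exactly at an observed score, so the midpoint of the two rates at the closest threshold is reported as an approximation.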

    Identification of Non-Linguistic Speech Features

    Over the last decade technological advances have been made which enable us to envision real-world applications of speech technologies. It is possible to foresee applications where the spoken query is to be recognized without even prior knowledge of the language being spoken, for example, information centers in public places such as train stations and airports. Other applications may require accurate identification of the speaker for security reasons, including control of access to confidential information or for telephone-based transactions. Ideally, the speaker's identity can be verified continually during the transaction, in a manner completely transparent to the user. With these views in mind, this paper presents a unified approach to identifying non-linguistic speech features from the recorded signal using phone-based acoustic likelihoods. This technique is shown to be effective for text-independent language, sex, and speaker identification and can enable better and more friendly human-machine interaction. With 2 s of speech, the language can be identified with better than 99% accuracy. Error in sex identification is about 1% on a per-sentence basis, and speaker identification accuracies of 98.5% on TIMIT (168 speakers) and 99.2% on BREF (65 speakers) were obtained with one utterance per speaker, and 100% with 2 utterances for both corpora. An experiment using unsupervised adaptation for speaker identification on the 168 TIMIT speakers had the same identification accuracies obtained with supervised adaptation.

    Automatic Identification of Arabic Dialects Using Hidden Markov Models

    The Arabic language has many different dialects, which must be identified before automatic speech recognition can take place. This thesis examines the difficult task of properly identifying various Arabic dialects. We also present a novel design of an Arabic dialect identification system using Hidden Markov Models (HMMs). Because of the similarities and differences between Arabic dialects, we build an ergodic HMM that has two types of states: one represents the common sounds across Arabic dialects, while the other represents the unique sounds of the specific dialect. We tie the common states across all models since they share the same sounds. We focus on only two major dialects: Egyptian and Gulf. An improved initialization process is used to achieve better Arabic dialect identification. Moreover, we utilize many different combinations of speech features related to MFCC, such as time derivatives, energy, and the Shifted Delta Cepstra, in training and testing the system. We present a detailed comparison of the performance of our Arabic dialect identification system using the different combinations. The best result of the Arabic dialect identification system is 96.67% correct identification.
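Shifted Delta Cepstra stack several delta vectors taken at increasing time offsets, so that each feature vector summarizes a longer stretch of speech. Below is a minimal sketch using the conventional d-P-k parameterization (delta spread `d`, block shift `P`, number of blocks `k`). The function name, the parameter values, and the toy frames are assumptions for illustration; the abstract does not state which configuration the thesis used.

```python
def shifted_delta_cepstra(frames, d=1, P=3, k=3):
    """Compute SDC vectors from a sequence of cepstral frames.

    frames : list of equal-length lists of floats, one per time step
    d      : spread of each delta, c(t + d) - c(t - d)
    P      : time shift between consecutive delta blocks
    k      : number of delta blocks stacked per output vector

    Only frames for which every required neighbour exists are returned.
    """
    out = []
    last_t = len(frames) - 1 - ((k - 1) * P + d)
    for t in range(d, last_t + 1):
        vec = []
        for i in range(k):
            ahead = frames[t + i * P + d]
            behind = frames[t + i * P - d]
            vec.extend(a - b for a, b in zip(ahead, behind))
        out.append(vec)
    return out


# Toy input: 10 frames of 2 "cepstral" coefficients that grow linearly in time.
toy_frames = [[float(t), 2.0 * t] for t in range(10)]
sdc = shifted_delta_cepstra(toy_frames, d=1, P=3, k=2)
print(len(sdc), sdc[0])
```

Each output vector has `k` times as many entries as an input frame; in practice it is usually appended to the static MFCC vector.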

    Study of Speaker Verification Methods

    Speaker verification is the process of accepting or rejecting the identity claim of a speaker by comparing a set of measurements of the speaker’s utterances with a reference set of measurements of the utterances of the person whose identity is claimed. In speaker verification, a person makes an identity claim. The technique has two main stages: feature extraction and feature matching. Feature extraction derives data from the speech signal that can later be used to represent the speaker. Feature matching involves identification of the unknown speaker by comparing the features extracted from the voice with the enrolled voices of known speakers.

    Spatio-temporal Pattern Recognition Using Hidden Markov Models

    A new spatio-temporal method for identifying 3D objects found in 2D image sequences is presented. The Hidden Markov Model technique is used as a spatio-temporal classification algorithm to identify 3D objects by the temporal changes in observed shape features. A new information theoretic argument is developed that proves identifying objects based on image sequences can lead to higher classification accuracies than single look methods. A new distance measure is proposed that analyzes the performance of Hidden Markov Models in a multi-class pattern recognition problem. A three class problem identifying moving light display objects provides experimental verification of the sequence processing argument. Individual frames of a MLD image sequence contain very little spatial information. The single look classification rate for the moving light display imagery was observed to be near 50%. In contrast, the Hidden Markov Model classification rate was above 93%. The alternate nearest neighbor multiple frame technique classification rate was 20% below the Hidden Markov Models. A one-sided t-test revealed a highly statistically significant difference between the Hidden Markov Model and multiple frame technique at a 0.01 level of significance. A five class problem consisting of tactical military ground vehicles is considered to provide verification using imagery with both spatial and temporal information. Results confirmed the new spatio-temporal pattern recognition method produces superior results by accessing the temporal information in the image sequences. A prototype automatic target recognition system is demonstrated.
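Classifying a sequence with Hidden Markov Models generally means scoring it under one model per class and picking the most likely class. The sketch below shows that idea with the scaled forward algorithm on discrete observation symbols. The two toy models, their probabilities, and the names `log_likelihood` and `classify` are assumptions for illustration; the paper's shape features and model topology are not given in the abstract.

```python
import math


def log_likelihood(obs, start, trans, emit):
    """Scaled forward algorithm: log P(obs | model) for a discrete HMM.

    obs   : list of observation symbol indices
    start : start[s]    = P(first state is s)
    trans : trans[r][s] = P(next state is s | current state is r)
    emit  : emit[s][o]  = P(symbol o | state s)
    """
    n = len(start)
    alpha = [start[s] * emit[s][obs[0]] for s in range(n)]
    total = 0.0
    for t in range(len(obs)):
        if t > 0:
            alpha = [
                sum(alpha[r] * trans[r][s] for r in range(n)) * emit[s][obs[t]]
                for s in range(n)
            ]
        # Rescale at every step to avoid underflow on long sequences.
        scale = sum(alpha)
        total += math.log(scale)
        alpha = [a / scale for a in alpha]
    return total


def classify(obs, models):
    """Return the name of the model with the highest log-likelihood."""
    return max(models, key=lambda name: log_likelihood(obs, *models[name]))


# Two hypothetical two-state models that differ only in what each state emits.
shared_start = [0.9, 0.1]
shared_trans = [[0.7, 0.3], [0.1, 0.9]]
models = {
    "rising": (shared_start, shared_trans, [[0.9, 0.1], [0.1, 0.9]]),
    "falling": (shared_start, shared_trans, [[0.1, 0.9], [0.9, 0.1]]),
}
print(classify([0, 0, 1, 1], models))
```

A single observation carries little evidence on its own here, whereas the ordering of symbols across the sequence separates the two classes, which mirrors the sequence-versus-single-look argument in the abstract.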

    Speech and crosstalk detection in multichannel audio

    The analysis of scenarios in which a number of microphones record the activity of speakers, such as in a round-table meeting, presents a number of computational challenges. For example, if each participant wears a microphone, speech from both the microphone's wearer (local speech) and from other participants (crosstalk) is received. The recorded audio can be broadly classified in four ways: local speech, crosstalk plus local speech, crosstalk alone and silence. We describe two experiments related to the automatic classification of audio into these four classes. The first experiment attempted to optimize a set of acoustic features for use with a Gaussian mixture model (GMM) classifier. A large set of potential acoustic features were considered, some of which have been employed in previous studies. The best-performing features were found to be kurtosis, "fundamentalness," and cross-correlation metrics. The second experiment used these features to train an ergodic hidden Markov model classifier. Tests performed on a large corpus of recorded meetings show classification accuracies of up to 96%, and automatic speech recognition performance close to that obtained using ground truth segmentation
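Two of the feature families named above, kurtosis and cross-correlation, can be computed per frame with a few lines of code. The following is a minimal sketch of one plausible definition of each: population kurtosis of a channel's samples, and the peak normalized cross-correlation between two channels over a range of lags. The exact metrics used in the paper are not specified in the abstract, so the function names, the lag range, and the toy signals are assumptions.

```python
import math


def kurtosis(x):
    """Population kurtosis (fourth standardized moment); about 3 for Gaussian data."""
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    return m4 / (m2 * m2)


def max_norm_xcorr(a, b, max_lag):
    """Peak of the normalized cross-correlation for lags in [-max_lag, max_lag].

    Values near 1 suggest the two channels carry the same source,
    possibly delayed, as would be expected for crosstalk.
    """
    norm = math.sqrt(sum(v * v for v in a) * sum(v * v for v in b))
    best = 0.0
    for lag in range(-max_lag, max_lag + 1):
        acc = sum(
            a[i] * b[i + lag] for i in range(len(a)) if 0 <= i + lag < len(b)
        )
        best = max(best, acc / norm)
    return best


# Toy frames: channel_b is channel_a delayed by two samples.
channel_a = [0.0, 1.0, 0.0, 0.0, 0.0]
channel_b = [0.0, 0.0, 0.0, 1.0, 0.0]
print(kurtosis([1.0, -1.0, 1.0, -1.0]), max_norm_xcorr(channel_a, channel_b, 2))
```

Per-frame values like these would then form the feature vectors passed to the GMM or HMM classifier.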