234 research outputs found

    Real time speaker recognition using MFCC and VQ

    Get PDF
    Speaker Recognition is a process of automatically recognizing who is speaking on the basis of the individual information included in speech waves. Speaker Recognition is one of the most useful biometric recognition techniques in this world where insecurity is a major threat. Many organizations like banks, institutions, industries etc are currently using this technology for providing greater security to their vast databases.Speaker Recognition mainly involves two modules namely feature extraction and feature matching. Feature extraction is the process that extracts a small amount of data from the speaker’s voice signal that can later be used to represent that speaker. Feature matching involves the actual procedure to identify the unknown speaker by comparing the extracted features from his/her voice input with the ones that are already stored in our speech database.In feature extraction we find the Mel Frequency Cepstrum Coefficients, which are based on the known variation of the human ear’s critical bandwidths with frequency and these, are vector quantized using LBG algorithm resulting in the speaker specific codebook. In feature matching we find the VQ distortion between the input utterance of an unknown speaker and the codebooks stored in our database. Based on this VQ distortion we decide whether to accept/reject the unknown speaker’s identity. The system I implemented in my work is 80% accurate in recognizing the correct speaker.In second phase we implement on the acoustic of Real Time speaker ecognition using mfcc and vq on a TMS320C6713 DSP board. We analyze the workload and identify the most timeconsuming operations

    Nasality in automatic speaker verification

    Get PDF

    Evaluation of preprocessors for neural network speaker verification

    Get PDF

    An acoustic-phonetic approach in automatic Arabic speech recognition

    Get PDF
    In a large vocabulary speech recognition system the broad phonetic classification technique is used instead of detailed phonetic analysis to overcome the variability in the acoustic realisation of utterances. The broad phonetic description of a word is used as a means of lexical access, where the lexicon is structured into sets of words sharing the same broad phonetic labelling. This approach has been applied to a large vocabulary isolated word Arabic speech recognition system. Statistical studies have been carried out on 10,000 Arabic words (converted to phonemic form) involving different combinations of broad phonetic classes. Some particular features of the Arabic language have been exploited. The results show that vowels represent about 43% of the total number of phonemes. They also show that about 38% of the words can uniquely be represented at this level by using eight broad phonetic classes. When introducing detailed vowel identification the percentage of uniquely specified words rises to 83%. These results suggest that a fully detailed phonetic analysis of the speech signal is perhaps unnecessary. In the adopted word recognition model, the consonants are classified into four broad phonetic classes, while the vowels are described by their phonemic form. A set of 100 words uttered by several speakers has been used to test the performance of the implemented approach. In the implemented recognition model, three procedures have been developed, namely voiced-unvoiced-silence segmentation, vowel detection and identification, and automatic spectral transition detection between phonemes within a word. The accuracy of both the V-UV-S and vowel recognition procedures is almost perfect. A broad phonetic segmentation procedure has been implemented, which exploits information from the above mentioned three procedures. Simple phonological constraints have been used to improve the accuracy of the segmentation process. The resultant sequence of labels are used for lexical access to retrieve the word or a small set of words sharing the same broad phonetic labelling. For the case of having more than one word-candidates, a verification procedure is used to choose the most likely one

    Speech Recognition

    Get PDF
    Chapters in the first part of the book cover all the essential speech processing techniques for building robust, automatic speech recognition systems: the representation for speech signals and the methods for speech-features extraction, acoustic and language modeling, efficient algorithms for searching the hypothesis space, and multimodal approaches to speech recognition. The last part of the book is devoted to other speech processing applications that can use the information from automatic speech recognition for speaker identification and tracking, for prosody modeling in emotion-detection systems and in other speech processing applications that are able to operate in real-world environments, like mobile communication services and smart homes

    Models and analysis of vocal emissions for biomedical applications

    Get PDF
    This book of Proceedings collects the papers presented at the 3rd International Workshop on Models and Analysis of Vocal Emissions for Biomedical Applications, MAVEBA 2003, held 10-12 December 2003, Firenze, Italy. The workshop is organised every two years, and aims to stimulate contacts between specialists active in research and industrial developments, in the area of voice analysis for biomedical applications. The scope of the Workshop includes all aspects of voice modelling and analysis, ranging from fundamental research to all kinds of biomedical applications and related established and advanced technologies

    VOICE BIOMETRICS UNDER MISMATCHED NOISE CONDITIONS

    Get PDF
    This thesis describes research into effective voice biometrics (speaker recognition) under mismatched noise conditions. Over the last two decades, this class of biometrics has been the subject of considerable research due to its various applications in such areas as telephone banking, remote access control and surveillance. One of the main challenges associated with the deployment of voice biometrics in practice is that of undesired variations in speech characteristics caused by environmental noise. Such variations can in turn lead to a mismatch between the corresponding test and reference material from the same speaker. This is found to adversely affect the performance of speaker recognition in terms of accuracy. To address the above problem, a novel approach is introduced and investigated. The proposed method is based on minimising the noise mismatch between reference speaker models and the given test utterance, and involves a new form of Test-Normalisation (T-Norm) for further enhancing matching scores under the aforementioned adverse operating conditions. Through experimental investigations, based on the two main classes of speaker recognition (i.e. verification/ open-set identification), it is shown that the proposed approach can significantly improve the performance accuracy under mismatched noise conditions. In order to further improve the recognition accuracy in severe mismatch conditions, an approach to enhancing the above stated method is proposed. This, which involves providing a closer adjustment of the reference speaker models to the noise condition in the test utterance, is shown to considerably increase the accuracy in extreme cases of noisy test data. Moreover, to tackle the computational burden associated with the use of the enhanced approach with open-set identification, an efficient algorithm for its realisation in this context is introduced and evaluated. The thesis presents a detailed description of the research undertaken, describes the experimental investigations and provides a thorough analysis of the outcomes

    Histogram equalization for robust text-independent speaker verification in telephone environments

    Get PDF
    Word processed copy. Includes bibliographical references

    Joint estimation of vocal tract and source parameters of a speech production model

    Get PDF
    This thesis describes algorithms developed to jointly estimate vocal tract shapes and source signals from real speech. The methodology was developed and evaluated using simple articulatory models of the vocal tract, coupled with lumped parametric models of the loss mechanisms in the tract. The vocal tract is modelled by a five parameter area function model [Lm, 1990] Energy losses due to wall vibration and glottal resistance are modelled as a pole- zero filter placed at the glottis. A model described in [Lame, 1982] is used to approximate the lip radiation characteristic. An articulatory-to-acoustic "linked codebook" of approximately 1600 shapes is generated and exhaustively searched to estimate the vocal tract parameters. Glottal waveforms (input signals) are obtained by inverse filtering real speech using the estimated vocal tract parameters. The inverse filter is constructed using the estimated area function. A new method is proposed to fit the Liljencrants - Fant glottal flow model [Fant, Liljencrants and Lm, 1985] to the inverse filtered signals Estimates of the parameters are found from both the inverse filtered signal and its derivative. The descnbed model successfully estimates articulatory parameters for artificial speech waveforms. Tests on recorded vowels suggest that the technique is applicable to real speech. The technique has applications in the development of natural sounding speech synthesis, the treatment of speech disorders and the reduction of data bit rates in speech codin

    Speaker gender recognition system

    Get PDF
    Abstract. Automatic gender recognition through speech is one of the fundamental mechanisms in human-machine interaction. Typical application areas of this technology range from gender-targeted advertising to gender-specific IoT (Internet of Things) applications. It can also be used to narrow down the scope of investigations in crime scenarios. There are many possible methods of recognizing the gender of a speaker. In machine learning applications, the first step is to acquire and convert the natural human voice into a form of machine understandable signal. Useful voice features then could be extracted and labelled with gender information so that are then trained by machines. After that, new input voice can be captured and processed and the machine is able to extract the features by pattern modelling. In this thesis, a real-time speaker gender recognition system was designed within Matlab environment. This system could automatically identify the gender of a speaker by voice. The implementation work utilized voice processing and feature extraction techniques to deal with an input speech coming from a microphone or a recorded speech file. The response features are extracted and classified. Then the machine learning classification method (Naïve Bayes Classifier) is used to distinguish the gender features. The recognition result with gender information is then finally displayed. The evaluation of the speaker gender recognition systems was done in an experiment with 40 participants (half male and half female) in a quite small room. The experiment recorded 400 speech samples by speakers from 16 countries in 17 languages. These 400 speech samples were tested by the gender recognition system and showed a considerably good performance, with only 29 errors of recognition (92.75% accuracy). In comparison with previous speaker gender recognition systems, most of them obtained the accuracy no more than 90% and only one obtained 100% accuracy with very limited testers. We can then conclude that the performance of the speaker gender recognition system designed in this thesis is reliable
    corecore