5 research outputs found

    MFCC AND CMN BASED SPEAKER RECOGNITION IN NOISY ENVIRONMENT

    Get PDF
    The performance of automatic speaker recognition (ASR) systems degrades drastically in the presence of noise and other distortions, especially when there is a noise level mismatch between the training and testing environments. This paper explores the problem of speaker recognition in noisy conditions, assuming that speech signals are corrupted by noise. A major problem of most speaker recognition systems is their unsatisfactory performance in noisy environments. In this experimental research, we study a combination of Mel Frequency Cepstral Coefficients (MFCC) for feature extraction and Cepstral Mean Normalization (CMN) for speech enhancement. Our system uses a Gaussian Mixture Model (GMM) classifier and is implemented in the MATLAB® 7 programming environment. The process uses speaker data for both training and testing: the test data are matched against a speaker model trained on the training data using GMM modelling. Finally, experiments are carried out to test the new model for ASR given limited training data and with differing levels and types of realistic background noise. The results demonstrate the robustness of the new system.
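    A minimal sketch of the MFCC + CMN + GMM pipeline the abstract describes is given below, assuming the librosa and scikit-learn Python libraries; the file names, number of cepstral coefficients (13) and mixture components (32) are illustrative choices, not values taken from the paper.

```python
# Sketch of MFCC feature extraction, cepstral mean normalization, and
# GMM-based speaker identification. File paths and parameters are hypothetical.
import numpy as np
import librosa
from sklearn.mixture import GaussianMixture

def mfcc_cmn(path, n_mfcc=13):
    """Extract MFCCs and apply Cepstral Mean Normalization per utterance."""
    y, sr = librosa.load(path, sr=None)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)  # shape: (n_mfcc, frames)
    mfcc = mfcc - mfcc.mean(axis=1, keepdims=True)           # CMN: subtract per-coefficient mean
    return mfcc.T                                             # frames x n_mfcc

# Train one GMM per enrolled speaker on that speaker's training utterances.
speaker_models = {}
for speaker, files in {"spk1": ["spk1_train.wav"], "spk2": ["spk2_train.wav"]}.items():
    feats = np.vstack([mfcc_cmn(f) for f in files])
    speaker_models[speaker] = GaussianMixture(n_components=32,
                                              covariance_type="diag").fit(feats)

# Identify a test utterance by the model with the highest average log-likelihood.
test = mfcc_cmn("unknown.wav")
best = max(speaker_models, key=lambda s: speaker_models[s].score(test))
print("Predicted speaker:", best)
```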

    You just do not understand me! Speech Recognition in Human Robot Interaction

    Full text link
    Abstract — Speech recognition has not yet fully permeated our interaction with devices. We therefore advocate a speech-recognition-friendly artificial language (ROILA), which was initially shown to outperform English, albeit under constraints. ROILA is intended to be used to talk to robots, and in this paper we present an experimental study in which the recognition of ROILA is compared to that of English when speech is captured by a robot's microphones, both while the robot's head is moving and while it is stationary. Our results show no significant difference between ROILA and English, but the type of microphone and the robot's head movement had a significant effect. In conclusion, we suggest implications for Human Robot (Speech) Interaction.

    Experiments Of Speech Recognition In A Noisy And Reverberant Environment Using A Microphone Array And Hmm Adaptation

    No full text
    The use of a microphone array for hands-free continuous speech recognition in noisy and reverberant environments is investigated. An array of four omnidirectional microphones is placed at a distance of 1.5 m from the talker; given the array signals, a Time Delay Compensation (TDC) module provides a beamformed signal, which is shown to be effective as input to a Hidden Markov Model (HMM) based recognizer. Given a small number of sentences collected from a new speaker in a real environment, HMM adaptation further improves the recognition rate. These results are confirmed both by experiments conducted in a noisy office environment and by simulations. In the latter case, different SNR and reverberation conditions were recreated by using the image method to reproduce synthetic array microphone signals.
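    The delay-compensation idea can be illustrated with a simple delay-and-sum beamformer: estimate each channel's delay relative to a reference microphone via cross-correlation, then align and average the channels. The sketch below is an assumption-laden illustration, not the paper's TDC module; signal lengths, delays and sample counts are made up.

```python
# Delay-and-sum beamforming sketch with cross-correlation delay estimation.
import numpy as np

def estimate_delay(ref, sig):
    """Return the lag (in samples) that best aligns sig with ref."""
    corr = np.correlate(sig, ref, mode="full")
    return np.argmax(corr) - (len(ref) - 1)

def delay_and_sum(channels):
    """Align every channel to channel 0 and average them into one signal."""
    ref = channels[0]
    out = np.zeros_like(ref, dtype=float)
    for ch in channels:
        lag = estimate_delay(ref, ch)
        out += np.roll(ch, -lag)       # compensate the estimated delay
    return out / len(channels)

# Toy example: four channels carrying the same signal delayed by a few samples.
rng = np.random.default_rng(0)
clean = rng.standard_normal(4000)
channels = [np.roll(clean, d) for d in (0, 3, 7, 12)]
beamformed = delay_and_sum(channels)   # this enhanced signal would feed the recognizer
```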

    Digital Microphone Array - Design, Implementation and Speech Recognition Experiments

    Get PDF
    The instrumented meeting room of the future will help meetings to be more efficient and productive. One of the basic components of the instrumented meeting room is the speech recording device, in most cases a microphone array. The two basic requirements for this microphone array are portability and cost-efficiency, neither of which is provided by currently available commercial arrays. This will change in the near future thanks to the availability of new digital MEMS microphones. This dissertation reports on the first successful implementation of a digital MEMS microphone array, which was designed, implemented, tested and evaluated, and successfully compared with an existing analogue microphone array using a state-of-the-art ASR system and adaptation algorithms. The newly built digital MEMS microphone array compares well with the analogue array in terms of the word error rate achieved by an automatic speech recognition system, and it is highly portable and economical.
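    The comparison above rests on word error rate (WER). As a reference point, here is a minimal sketch of the standard edit-distance WER computation; the reference and hypothesis strings are illustrative, not data from the dissertation.

```python
# WER = (substitutions + deletions + insertions) / number of reference words.
def wer(reference, hypothesis):
    ref, hyp = reference.split(), hypothesis.split()
    # Levenshtein distance over words via dynamic programming.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution / match
    return d[len(ref)][len(hyp)] / len(ref)

print(wer("the meeting starts at noon", "the meeting starts noon"))  # 0.2
```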

    A psychoacoustic engineering approach to machine sound source separation in reverberant environments

    Get PDF
    Reverberation continues to present a major problem for sound source separation algorithms, due to its corruption of many of the acoustical cues on which these algorithms rely. However, humans demonstrate a remarkable robustness to reverberation, and many of the relevant psychophysical and perceptual mechanisms are well documented. This thesis therefore considers the research question: can the reverberation performance of existing psychoacoustic engineering approaches to machine source separation be improved?

    The precedence effect is a perceptual mechanism that aids our ability to localise sounds in reverberant environments. Despite this, relatively little work has been done on incorporating the precedence effect into automated sound source separation. Consequently, a study was conducted that compared several computational precedence models and their impact on the performance of a baseline separation algorithm. The algorithm included a precedence model, which was replaced with each of the other precedence models during the investigation. The models were tested using a novel metric in a range of reverberant rooms and with a range of other mixture parameters. The metric, termed the Ideal Binary Mask Ratio, is shown to be robust to the effects of reverberation and facilitates meaningful and direct comparison between algorithms across different acoustic conditions. Large differences between the performances of the models were observed. The results showed that a separation algorithm incorporating a model based on interaural coherence produces the greatest performance gain over the baseline algorithm.

    The results from the study also indicated that it may be necessary to adapt the precedence model to the acoustic conditions in which it is utilised. This effect is analogous to the perceptual Clifton effect, a dynamic component of the precedence effect that appears to adapt precedence to a given acoustic environment in order to maximise its effectiveness. However, no work has been carried out on adapting a precedence model to the acoustic conditions under test; although the necessity for such a component has been suggested in the literature, neither its necessity nor its benefit has been formally validated. Consequently, a further study was conducted in which parameters of each of the previously compared precedence models were varied in each room, in order to identify if, and to what extent, separation performance varied with these parameters. The results showed that the reverberation performance of existing psychoacoustic engineering approaches to machine source separation can be improved, yielding significant gains in separation performance.
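    The metric builds on the ideal binary mask concept: keep time-frequency units where the target exceeds the interference by a local-SNR threshold, and score an estimated mask against that ideal. The sketch below is only an illustrative stand-in under that assumption; it does not reproduce the thesis's exact Ideal Binary Mask Ratio definition, and the "agreement" score shown is a simplified substitute.

```python
# Hedged sketch of the ideal-binary-mask idea; thresholds and data are toy values.
import numpy as np

def ideal_binary_mask(target_tf, interferer_tf, lc_db=0.0):
    """1 where target power exceeds interferer power by lc_db, else 0."""
    local_snr = 10 * np.log10((np.abs(target_tf) ** 2 + 1e-12) /
                              (np.abs(interferer_tf) ** 2 + 1e-12))
    return (local_snr > lc_db).astype(int)

def mask_agreement(estimated_mask, ideal_mask):
    """Fraction of time-frequency units on which the two masks agree (illustrative)."""
    return np.mean(estimated_mask == ideal_mask)

# Toy example with random "spectrogram" magnitudes standing in for real mixtures.
rng = np.random.default_rng(1)
target = rng.random((64, 100))
noise = rng.random((64, 100))
ibm = ideal_binary_mask(target, noise)
estimated = ideal_binary_mask(target + 0.3 * rng.random((64, 100)), noise)
print("agreement:", mask_agreement(estimated, ibm))
```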