147 research outputs found

    ASR Systems in Noisy Environment: Analysis and Solutions for Increasing Noise Robustness

    Get PDF
    This paper deals with the analysis of Automatic Speech Recognition (ASR) suitable for usage within noisy environment and suggests optimum configuration under various noisy conditions. The behavior of standard parameterization techniques was analyzed from the viewpoint of robustness against background noise. It was done for Melfrequency cepstral coefficients (MFCC), Perceptual linear predictive (PLP) coefficients, and their modified forms combining main blocks of PLP and MFCC. The second part is devoted to the analysis and contribution of modified techniques containing frequency-domain noise suppression and voice activity detection. The above-mentioned techniques were tested with signals in real noisy environment within Czech digit recognition task and AURORA databases. Finally, the contribution of special VAD selective training and MLLR adaptation of acoustic models were studied for various signal features

    Wavelet-based techniques for speech recognition

    Get PDF
    In this thesis, new wavelet-based techniques have been developed for the extraction of features from speech signals for the purpose of automatic speech recognition (ASR). One of the advantages of the wavelet transform over the short time Fourier transform (STFT) is its capability to process non-stationary signals. Since speech signals are not strictly stationary the wavelet transform is a better choice for time-frequency transformation of these signals. In addition it has compactly supported basis functions, thereby reducing the amount of computation as opposed to STFT where an overlapping window is needed. [Continues.

    Studies on noise robust automatic speech recognition

    Get PDF
    Noise in everyday acoustic environments such as cars, traffic environments, and cafeterias remains one of the main challenges in automatic speech recognition (ASR). As a research theme, it has received wide attention in conferences and scientific journals focused on speech technology. This article collection reviews both the classic and novel approaches suggested for noise robust ASR. The articles are literature reviews written for the spring 2009 seminar course on noise robust automatic speech recognition (course code T-61.6060) held at TKK

    A Bayesian Network View on Acoustic Model-Based Techniques for Robust Speech Recognition

    Full text link
    This article provides a unifying Bayesian network view on various approaches for acoustic model adaptation, missing feature, and uncertainty decoding that are well-known in the literature of robust automatic speech recognition. The representatives of these classes can often be deduced from a Bayesian network that extends the conventional hidden Markov models used in speech recognition. These extensions, in turn, can in many cases be motivated from an underlying observation model that relates clean and distorted feature vectors. By converting the observation models into a Bayesian network representation, we formulate the corresponding compensation rules leading to a unified view on known derivations as well as to new formulations for certain approaches. The generic Bayesian perspective provided in this contribution thus highlights structural differences and similarities between the analyzed approaches

    Speech Recognition

    Get PDF
    Chapters in the first part of the book cover all the essential speech processing techniques for building robust, automatic speech recognition systems: the representation for speech signals and the methods for speech-features extraction, acoustic and language modeling, efficient algorithms for searching the hypothesis space, and multimodal approaches to speech recognition. The last part of the book is devoted to other speech processing applications that can use the information from automatic speech recognition for speaker identification and tracking, for prosody modeling in emotion-detection systems and in other speech processing applications that are able to operate in real-world environments, like mobile communication services and smart homes

    Automatic speech recognition: from study to practice

    Get PDF
    Today, automatic speech recognition (ASR) is widely used for different purposes such as robotics, multimedia, medical and industrial application. Although many researches have been performed in this field in the past decades, there is still a lot of room to work. In order to start working in this area, complete knowledge of ASR systems as well as their weak points and problems is inevitable. Besides that, practical experience improves the theoretical knowledge understanding in a reliable way. Regarding to these facts, in this master thesis, we have first reviewed the principal structure of the standard HMM-based ASR systems from technical point of view. This includes, feature extraction, acoustic modeling, language modeling and decoding. Then, the most significant challenging points in ASR systems is discussed. These challenging points address different internal components characteristics or external agents which affect the ASR systems performance. Furthermore, we have implemented a Spanish language recognizer using HTK toolkit. Finally, two open research lines according to the studies of different sources in the field of ASR has been suggested for future work

    Hidden Markov model-based speech enhancement

    Get PDF
    This work proposes a method of model-based speech enhancement that uses a network of HMMs to first decode noisy speech and to then synthesise a set of features that enables a speech production model to reconstruct clean speech. The motivation is to remove the distortion and residual and musical noises that are associated with conventional filteringbased methods of speech enhancement. STRAIGHT forms the speech production model for speech reconstruction and requires a time-frequency spectral surface, aperiodicity and a fundamental frequency contour. The technique of HMM-based synthesis is used to create the estimate of the timefrequency surface, and aperiodicity after the model and state sequence is obtained from HMM decoding of the input noisy speech. Fundamental frequency were found to be best estimated using the PEFAC method rather than synthesis from the HMMs. For the robust HMM decoding in noisy conditions it is necessary for the HMMs to model noisy speech and consequently noise adaptation is investigated to achieve this and its resulting effect on the reconstructed speech measured. Even with such noise adaptation to match the HMMs to the noisy conditions, decoding errors arise, both in terms of incorrect decoding and time alignment errors. Confidence measures are developed to identify such errors and then compensation methods developed to conceal these errors in the enhanced speech signal. Speech quality and intelligibility analysis is first applied in terms of PESQ and NCM showing the superiority of the proposed method against conventional methods at low SNRs. Three way subjective MOS listening test then discovers the performance of the proposed method overwhelmingly surpass the conventional methods over all noise conditions and then a subjective word recognition test shows an advantage of the proposed method over speech intelligibility to the conventional methods at low SNRs
    corecore