5,057 research outputs found

    New methods for robust speech recognition

    Get PDF
    Ankara : Department of Electrical and Electronics Engineering and the Institute of Engineering and Science of Bilkent University, 1995.Thesis (Ph.D.) -- Bilkent University, 1995.Includes bibliographical references leaves 86-92.New methods of feature extraction, end-point detection and speech enhcincement are developed for a robust speech recognition system. The methods of feature extraction and end-point detection are based on wavelet analysis or subband analysis of the speech signal. Two new sets of speech feature parameters, SUBLSF’s and SUBCEP’s, are introduced. Both parameter sets are based on subband analysis. The SUBLSF feature parameters are obtained via linear predictive analysis on subbands. These speech feature parameters can produce better results than the full-band parameters when the noise is colored. The SUBCEP parameters are based on wavelet analysis or equivalently the multirate subband analysis of the speech signal. The SUBCEP parameters also provide robust recognition performance by appropriately deemphasizing the frequency bands corrupted by noise. It is experimentally observed that the subband analysis based feature parameters are more robust than the commonly used full-band analysis based parameters in the presence of car noise. The a-stable random processes can be used to model the impulsive nature of the public network telecommunication noise. Adaptive filtering are developed for Q-stable random processes. Adaptive noise cancelation techniques are used to reduce the mismacth between training and testing conditions of the recognition system over telephone lines. Another important problem in isolated speech recognition is to determine the boundaries of the speech utterances or words. Precise boundary detection of utterances improves the performance of speech recognition systems. A new distance measure based on the subband energy levels is introduced for endpoint detection.Erzin, EnginPh.D

    Robust Speech Detection for Noisy Environments

    Get PDF
    This paper presents a robust voice activity detector (VAD) based on hidden Markov models (HMM) to improve speech recognition systems in stationary and non-stationary noise environments: inside motor vehicles (like cars or planes) or inside buildings close to high traffic places (like in a control tower for air traffic control (ATC)). In these environments, there is a high stationary noise level caused by vehicle motors and additionally, there could be people speaking at certain distance from the main speaker producing non-stationary noise. The VAD presented in this paper is characterized by a new front-end and a noise level adaptation process that increases significantly the VAD robustness for different signal to noise ratios (SNRs). The feature vector used by the VAD includes the most relevant Mel Frequency Cepstral Coefficients (MFCC), normalized log energy and delta log energy. The proposed VAD has been evaluated and compared to other well-known VADs using three databases containing different noise conditions: speech in clean environments (SNRs mayor que 20 dB), speech recorded in stationary noise environments (inside or close to motor vehicles), and finally, speech in non stationary environments (including noise from bars, television and far-field speakers). In the three cases, the detection error obtained with the proposed VAD is the lowest for all SNRs compared to AceroÂżs VAD (reference of this work) and other well-known VADs like AMR, AURORA or G729 annex b

    Improvement of Text Dependent Speaker Identification System Using Neuro-Genetic Hybrid Algorithm in Office Environmental Conditions

    Get PDF
    In this paper, an improved strategy for automated text dependent speaker identification system has been proposed in noisy environment. The identification process incorporates the Neuro-Genetic hybrid algorithm with cepstral based features. To remove the background noise from the source utterance, wiener filter has been used. Different speech pre-processing techniques such as start-end point detection algorithm, pre-emphasis filtering, frame blocking and windowing have been used to process the speech utterances. RCC, MFCC, ?MFCC, ??MFCC, LPC and LPCC have been used to extract the features. After feature extraction of the speech, Neuro-Genetic hybrid algorithm has been used in the learning and identification purposes. Features are extracted by using different techniques to optimize the performance of the identification. According to the VALID speech database, the highest speaker identification rate of 100.000% for studio environment and 82.33% for office environmental conditions have been achieved in the close set text dependent speaker identification system

    Exploring Non-linear Transformations for an Entropybased Voice Activity Detector

    Get PDF
    In this paper we explore the use of non-linear transformations in order to improve the performance of an entropy based voice activity detector (VAD). The idea of using a non-linear transformation comes from some previous work done in speech linear prediction (LPC) field based in source separation techniques, where the score function was added into the classical equations in order to take into account the real distribution of the signal. We explore the possibility of estimating the entropy of frames after calculating its score function, instead of using original frames. We observe that if signal is clean, estimated entropy is essentially the same; but if signal is noisy transformed frames (with score function) are able to give different entropy if the frame is voiced against unvoiced ones. Experimental results show that this fact permits to detect voice activity under high noise, where simple entropy method fails

    A Robust Multiple Feature Approach To Endpoint Detection In Car Environment Based On Advanced Classifiers

    Get PDF
    In this paper we propose an endpoint detection system based on the use of several features extracted from each speech frame, followed by a robust classifier (i.e Adaboost and Bagging of decision trees, and a multilayer perceptron) and a finite state automata (FSA). We present results for four different classifiers. The FSA module consisted of a 4-state decision logic that filtered false alarms and false positives. We compare the use of four different classifiers in this task. The look ahead of the method that we propose was of 7 frames, which are the number of frames that maximized the accuracy of the system. The system was tested with real signals recorded inside a car, with signal to noise ratio that ranged from 6 dB to 30dB. Finally we present experimental results demonstrating that the system yields robust endpoint detection

    A new robust algorithm for isolated word endpoint detection

    Full text link
    Teager Energy and Energy-Entropy Features are two approaches, which have recently been used for locating the endpoints of an utterance. However, each of them has some drawbacks for speech in noisy environments. This paper proposes a novel method to combine these two approaches to locate endpoint intervals and yet make a final decision based on energy, which requires far less time than the feature based methods. After the algorithm description, an experimental evaluation is presented, comparing the automatically determined endpoints with those determined by skilled personnel. It is shown that the accuracy of this algorithm is quite satisfactory and acceptable

    A non-linear VAD for noisy environments

    Get PDF
    This paper deals with non-linear transformations for improving the performance of an entropy-based voice activity detector (VAD). The idea to use a non-linear transformation has already been applied in the field of speech linear prediction, or linear predictive coding (LPC), based on source separation techniques, where a score function is added to classical equations in order to take into account the true distribution of the signal. We explore the possibility of estimating the entropy of frames after calculating its score function, instead of using original frames. We observe that if the signal is clean, the estimated entropy is essentially the same; if the signal is noisy, however, the frames transformed using the score function may give entropy that is different in voiced frames as compared to nonvoiced ones. Experimental evidence is given to show that this fact enables voice activity detection under high noise, where the simple entropy method fails
    • …
    corecore