
    Speech Recognition of Isolated Arabic words via using Wavelet Transformation and Fuzzy Neural Network

    In this paper two new feature extraction methods are presented for speech recognition. The first method uses a combination of linear predictive coding (LPC) and a skewness equation. The second (WLPCC) uses a combination of LPC, the discrete wavelet transform (DWT), and cepstrum analysis; the objective is to enhance performance by extracting more features from the signal. A Neural Network (NN) and a Neuro-Fuzzy Network are used for classification. Test results show that the WLPCC method for feature extraction, together with the neuro-fuzzy network for classification, gave the highest recognition rate for both trained and untrained data. The proposed system was built using MATLAB, and the data comprise ten isolated Arabic words (الله، محمد، خديجة، ياسين، يتكلم، الشارقة، لندن، يسار، يمين، أحزان) spoken by fifteen male speakers. The recognition rate is 97.8% for trained data and 81.1% for untrained data. Keywords: Speech Recognition, Feature Extraction, Linear Predictive Coding (LPC), Neural Network, Fuzzy Network
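
    The WLPCC pipeline described above chains three standard steps: a DWT to split the signal into sub-bands, an LPC fit per band, and a cepstrum computed from each LPC model. The sketch below illustrates one plausible reading of that chain; the wavelet ('db4'), LPC order, decomposition level, and the FFT-based LPC-cepstrum recipe are illustrative assumptions, not the paper's exact settings.

    ```python
    # Hedged sketch of a WLPCC-style feature extractor: DWT sub-bands,
    # per-band LPC, then cepstral coefficients of each LPC spectrum.
    import numpy as np
    import pywt
    import librosa

    def wlpcc_features(signal, lpc_order=12, wavelet="db4", level=3, n_ceps=12):
        # 1. Split the signal into approximation/detail sub-bands.
        bands = pywt.wavedec(signal.astype(float), wavelet, level=level)
        features = []
        for band in bands:
            # 2. Fit an LPC model to the sub-band.
            a = librosa.lpc(band, order=lpc_order)
            # 3. Cepstrum of the LPC model spectrum 1/|A(w)| (a common
            #    FFT-based LPC-cepstrum recipe; assumed, not the paper's).
            spectrum = np.abs(np.fft.fft(a, 512)) + 1e-10
            ceps = np.fft.ifft(-np.log(spectrum)).real[:n_ceps]
            features.append(ceps)
        return np.concatenate(features)

    # Usage: one fixed-length vector per isolated word, fed to the
    # NN / neuro-fuzzy classifier.
    # feat = wlpcc_features(word_signal)
    ```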

    Recognition of Isolated Marathi words from Side Pose for multi-pose Audio Visual Speech Recognition

    Abstract: This paper presents a new multi-pose audio-visual speech recognition system based on the fusion of side-pose visual features and acoustic signals. The proposed method improves robustness over conventional multimodal speech recognition systems. The work was implemented on the ‘vVISWA’ (Visual Vocabulary of Independent Standard Words) dataset, comprising full-frontal, 45-degree, and side-pose visual streams. Visual features for the side pose were extracted using the 2D Stationary Wavelet Transform (2D-SWT) and acoustic features using Linear Predictive Coding (LPC); the two feature sets were fused and classified using the KNN algorithm, resulting in 90% accuracy. This work supports automatic recognition of isolated words from the side pose in the multi-pose audio-visual speech recognition domain, where only partial visual features of the face exist. Keywords: Side-pose face detection, stationary wavelet transform, linear predictive analysis, feature-level fusion, KNN classifier
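
    The fusion described above is feature-level: visual and acoustic vectors are concatenated before classification. Below is a hedged sketch, assuming pywt's swt2 for the 2D-SWT (image sides divisible by 2**level), librosa for LPC, and scikit-learn's KNN; the sub-band energy summary and all parameters are illustrative, not the paper's configuration.

    ```python
    # Hedged sketch of feature-level fusion: 2D-SWT sub-band energies from
    # a side-pose mouth image concatenated with LPC coefficients from audio.
    import numpy as np
    import pywt
    import librosa
    from sklearn.neighbors import KNeighborsClassifier

    def visual_features(gray, wavelet="haar", level=1):
        # gray: 2-D float array; both sides must be divisible by 2**level.
        coeffs = pywt.swt2(gray.astype(float), wavelet, level=level)
        feats = []
        for approx, (h, v, d) in coeffs:
            feats += [float(np.mean(b ** 2)) for b in (approx, h, v, d)]
        return np.array(feats)

    def fused_features(gray, audio, lpc_order=10):
        acoustic = librosa.lpc(audio.astype(float), order=lpc_order)[1:]
        return np.concatenate([visual_features(gray), acoustic])

    # Usage: stack fused vectors for the training words, then classify.
    # X = np.stack([fused_features(img, sig) for img, sig in train_samples])
    # knn = KNeighborsClassifier(n_neighbors=3).fit(X, train_labels)
    ```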

    Arabic Isolated Word Speaker Dependent Recognition System

    In this thesis we designed a new Arabic isolated-word, speaker-dependent recognition system based on a combination of several feature extraction and classification techniques, where the system combines the method outputs using a voting rule. The system is implemented with a graphical user interface under MATLAB on a G62 laptop with a Core i3 2.26 GHz processor. The dataset includes 40 Arabic words recorded in a calm environment with 5 different speakers using the laptop microphone. Each speaker read each word 8 times; 5 repetitions are used for training and the remaining 3 for testing. First, in the preprocessing step, we used an endpoint detection technique based on energy and zero-crossing rates to identify the start and end of each word and remove silences, then a discrete wavelet transform to remove noise from the signal. To accelerate the system and reduce execution time, the system first recognizes the speaker and loads only that user's reference model. We compared 5 different methods, pairwise Euclidean distance with Mel-frequency cepstral coefficients (MFCC), Dynamic Time Warping (DTW) with formant features, Gaussian Mixture Model (GMM) with MFCC, MFCC+DTW, and Itakura distance with Linear Predictive Coding (LPC) features, obtaining recognition rates of 85.23%, 57%, 87%, 90%, and 83% respectively. To improve the accuracy of the system, we tested several combinations of these 5 methods and found that the best is MFCC | Euclidean + Formant | DTW + MFCC | DTW + LPC | Itakura, with an accuracy of 94.39% but a large computation time of 2.9 seconds. To reduce the computation time of this hybrid, we compared several sub-combinations of it and found that the best trade-off between accuracy and computation time is to first combine MFCC | Euclidean + LPC | Itakura and, only when the two methods disagree, add Formant | DTW + MFCC | DTW to the combination; the average computation time is halved to 1.56 seconds and the accuracy improves to 94.56%. The proposed system is thus competitive with previous research.
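
    The endpoint detection mentioned in the preprocessing step is a classic energy / zero-crossing-rate technique. The following is a minimal sketch of that idea; frame sizes, thresholds, and the simplified decision rule are assumptions for illustration, not the thesis's tuned values.

    ```python
    # Hedged sketch of energy + zero-crossing-rate endpoint detection:
    # mark frames as speech, then return the first/last speech sample.
    import numpy as np

    def detect_endpoints(x, frame=256, hop=128, e_ratio=0.02, z_thr=0.25):
        n = 1 + (len(x) - frame) // hop
        idx = np.arange(frame) + hop * np.arange(n)[:, None]
        frames = x[idx]                              # (n_frames, frame)
        energy = (frames ** 2).sum(axis=1)
        zcr = (np.abs(np.diff(np.sign(frames), axis=1)) > 0).mean(axis=1)
        # Speech: clearly energetic frames, or moderately energetic frames
        # with a high ZCR (unvoiced consonants). Simplified rule.
        speech = (energy > e_ratio * energy.max()) | \
                 ((energy > 0.25 * e_ratio * energy.max()) & (zcr > z_thr))
        active = np.flatnonzero(speech)
        if active.size == 0:
            return None
        return active[0] * hop, active[-1] * hop + frame
    ```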

    Wavelet Based Feature Extraction for The Indonesian CV Syllables Sound

    This paper proposes combining the Wavelet Transform (WT) and Euclidean Distance (ED) to estimate the expected feature vector of Indonesian syllables. This research aims to find the most effective and efficient properties for performing feature extraction of each syllable sound, for application in speech recognition systems. The proposed approach, which builds on a previous study, consists of three main phases. In the first phase, the speech signal is segmented and normalized. In the second phase, the signal is transformed into the frequency domain using the WT. In the third phase, the ED algorithm is used to estimate the expected feature vector. The result is a list of features for each syllable that can be used in future research, along with recommendations on the most effective and efficient WT for performing syllable sound recognition.
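
    One plausible reading of the WT + ED estimation described above is that each syllable recording is reduced to a fixed-length wavelet vector, and the class's "expected" vector is the sample with the smallest total Euclidean distance to the others (a medoid). The sketch below follows that reading; the wavelet, level, vector length, and the medoid interpretation itself are assumptions.

    ```python
    # Hedged sketch: wavelet feature vectors per recording, then the
    # medoid under Euclidean distance as the expected feature vector.
    import numpy as np
    import pywt

    def wavelet_vector(x, wavelet="db2", level=4, size=64):
        approx = pywt.wavedec(x.astype(float), wavelet, level=level)[0]
        # Interpolate to a fixed length so vectors are comparable.
        return np.interp(np.linspace(0, len(approx) - 1, size),
                         np.arange(len(approx)), approx)

    def expected_vector(recordings):
        vecs = np.stack([wavelet_vector(r) for r in recordings])
        dist = np.linalg.norm(vecs[:, None] - vecs[None, :], axis=2)
        return vecs[dist.sum(axis=1).argmin()]
    ```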

    Continuous kannada speech segmentation and speech recognition based on threshold using MFCC And VQ

    Continuous speech segmentation and recognition play an important role in natural language processing. Continuous context-based Kannada speech segmentation depends on the context, grammar, and semantic rules present in the Kannada language, and extracting significant features of the Kannada speech signal for a recognition system is a challenging problem for researchers. The proposed method is divided into two parts. The first part segments the continuous Kannada speech signal with respect to context by computing the average short-term energy and the spectral centroid coefficients of the speech signal within a specified window; the segmented outputs are meaningful segmentations for different scenarios, with low segmentation error. The second part performs speech recognition by extracting a small number of Mel-frequency cepstral coefficients and using vector quantization with a small number of codebooks. Recognition is based entirely on a threshold value; setting this threshold is a challenging task, but a simple method is used to achieve a good recognition rate. The experimental results show more efficient and effective segmentation, with a higher recognition rate for continuous context-based Kannada speech with different accents for male and female speakers than existing methods, while using minimal feature dimensions for the training data.
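
    The segmentation half of the method rests on two per-window quantities, average short-term energy and the spectral centroid. A minimal sketch of that computation follows; window sizes and the fractional thresholds are illustrative assumptions, not the paper's values.

    ```python
    # Hedged sketch: frame-wise short-term energy and spectral centroid,
    # thresholded to produce (start, end) sample ranges of speech segments.
    import numpy as np

    def segment(x, sr, frame=400, hop=200, e_frac=0.05, c_frac=0.05):
        n = 1 + (len(x) - frame) // hop
        idx = np.arange(frame) + hop * np.arange(n)[:, None]
        frames = x[idx] * np.hamming(frame)
        energy = (frames ** 2).mean(axis=1)
        spec = np.abs(np.fft.rfft(frames, axis=1))
        freqs = np.fft.rfftfreq(frame, 1.0 / sr)
        centroid = (spec * freqs).sum(axis=1) / (spec.sum(axis=1) + 1e-10)
        mask = (energy > e_frac * energy.max()) & \
               (centroid > c_frac * centroid.max())
        # Turn the boolean frame mask into contiguous segments.
        edges = np.flatnonzero(np.diff(np.r_[0, mask.astype(int), 0]))
        return [(s * hop, (e - 1) * hop + frame)
                for s, e in zip(edges[::2], edges[1::2])]
    ```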

    A motion-based approach for audio-visual automatic speech recognition

    The research work presented in this thesis introduces novel approaches for both visual region-of-interest extraction and visual feature extraction for use in audio-visual automatic speech recognition. In particular, the speaker's movement during speech is used to isolate the mouth region in video sequences, and motion-based features obtained from this region provide new visual features for audio-visual automatic speech recognition. The mouth region extraction approach proposed in this work is shown to give superior performance compared with existing colour-based lip segmentation methods. The new features are obtained from three separate representations of motion in the region of interest: the difference in luminance between successive images, block-matching-based motion vectors, and optical flow. The new visual features are found to improve visual-only and audio-visual speech recognition performance compared with the commonly used appearance-based methods. In addition, a novel approach is proposed for visual feature extraction from either the discrete cosine transform or the discrete wavelet transform representation of the speaker's mouth region. In this work, the image transform is explored from a new viewpoint of data discrimination, in contrast to the more conventional data preservation viewpoint. The main finding is that audio-visual automatic speech recognition systems using the new features, extracted from frequency bands selected according to their discriminatory ability, generally outperform those using features designed for data preservation. To establish the noise robustness of the new features, their performance has been studied in the presence of a range of different types of noise and at various signal-to-noise ratios. In these experiments, the audio-visual automatic speech recognition systems based on the new approaches gave superior performance both to audio-visual systems using appearance-based features and to audio-only speech recognition systems.
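
    Of the three motion representations named above, the simplest is the luminance difference between successive frames. The sketch below shows one way such differences could be pooled into a per-frame descriptor; the grid pooling and its size are illustrative assumptions, not the thesis's exact features.

    ```python
    # Hedged sketch: per-frame motion descriptors from luminance
    # differences, pooled over a coarse grid of the mouth region.
    import numpy as np

    def motion_features(frames, grid=(4, 4)):
        # frames: (T, H, W) grayscale mouth-region sequence.
        diffs = np.abs(np.diff(frames.astype(float), axis=0))
        t, h, w = diffs.shape
        gh, gw = grid
        # Crop so the grid divides evenly, then average the motion
        # energy inside each cell: one gh*gw vector per transition.
        hp, wp = h - h % gh, w - w % gw
        d = diffs[:, :hp, :wp].reshape(t, gh, hp // gh, gw, wp // gw)
        return d.mean(axis=(2, 4)).reshape(t, -1)
    ```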

    Identification of Transient Speech Using Wavelet Transforms

    It is generally believed that abrupt stimulus changes, which in speech may be time-varying frequency edges associated with consonants, transitions between consonants and vowels, and transitions within vowels, are critical to the perception of speech by humans and to speech recognition by machines. Noise affects speech transitions more than it affects quasi-steady-state speech. I believe that identifying and selectively amplifying speech transitions may enhance the intelligibility of speech in noisy conditions. The purpose of this study is to evaluate the use of wavelet transforms to identify speech transitions; wavelet transforms may be computationally efficient and allow real-time applications. The discrete wavelet transform (DWT), the stationary wavelet transform (SWT), and wavelet packets (WP) are evaluated. Wavelet analysis is combined with variable frame rate processing to improve the identification process: variable frame rate analysis can identify time segments when speech feature vectors are changing rapidly and when they are relatively stationary. Energy profiles for words, which show the energy in each node of a speech signal decomposed using wavelets, are used to identify nodes containing predominantly transient information and nodes containing predominantly quasi-steady-state information, and these are used to synthesize transient and quasi-steady-state speech components. These components are estimates of the tonal and non-tonal speech components that Yoo et al. identified using time-varying band-pass filters. Comparison of spectra, a listening test, and mean-squared errors between the transient components synthesized using wavelets and Yoo's non-tonal components indicated that wavelet packets gave the best estimates of Yoo's components. An algorithm that incorporates variable frame rate analysis into wavelet packet analysis is proposed. Developing this algorithm involves choosing a wavelet function and a decomposition level. The algorithm itself has four steps: wavelet packet decomposition; classification of terminal nodes; incorporation of variable frame rate processing; and synthesis of speech components. Combining wavelet analysis with variable frame rate analysis provides the best estimates of Yoo's speech components.
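
    The energy profiles described above can be computed directly from a wavelet packet tree. A hedged sketch follows, using pywt's WaveletPacket; the wavelet ('db4') and decomposition depth are assumptions, not the values chosen in the thesis.

    ```python
    # Hedged sketch: normalized energy per terminal wavelet-packet node,
    # the quantity used to label nodes as transient vs quasi-steady-state.
    import numpy as np
    import pywt

    def node_energy_profile(x, wavelet="db4", level=4):
        wp = pywt.WaveletPacket(data=x.astype(float), wavelet=wavelet,
                                maxlevel=level)
        nodes = wp.get_level(level, order="freq")   # terminal nodes
        energy = np.array([float(np.sum(n.data ** 2)) for n in nodes])
        return {n.path: e for n, e in zip(nodes, energy / energy.sum())}

    # A pruned tree (keeping only the selected nodes) can be turned back
    # into a waveform with WaveletPacket.reconstruct(), which is how the
    # transient / quasi-steady-state components would be synthesized.
    ```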