
    Combining Multiple Views for Visual Speech Recognition

    Visual speech recognition is a challenging research problem with a notable practical application: aiding audio speech recognition in noisy scenarios. Multiple-camera setups can benefit visual speech recognition systems by improving performance and robustness. In this paper, we explore this aspect and provide a comprehensive study on combining multiple views for visual speech recognition. The analysis covers fusion of all possible view-angle combinations at both the feature level and the decision level. The visual speech recognition system employed in this study extracts features with a PCA-based convolutional neural network followed by an LSTM network. These features are then processed in a tandem system by feeding them into a GMM-HMM back end. Decision fusion is applied after this stage by combining the Viterbi path log-likelihoods. The results show that the complementary information contained in recordings from different view angles improves recognition significantly. For example, sentence correctness on the test set increases from 76% for the best-performing single view (30°) to up to 83% when this view is combined with the frontal and 60° views.
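A minimal sketch of the decision-fusion step described above, assuming each view's recognizer returns a Viterbi path log-likelihood per candidate hypothesis and that fusion is a weighted sum of those scores (equal weights by default). The function name, the view labels, the weighting scheme, and the scores are all hypothetical illustrations, not the paper's implementation.

```python
from typing import Dict, Optional


def fuse_decisions(
    view_scores: Dict[str, Dict[str, float]],
    weights: Optional[Dict[str, float]] = None,
) -> str:
    """Pick the hypothesis with the highest weighted sum of per-view log-likelihoods.

    view_scores maps a view name to {hypothesis: Viterbi path log-likelihood}
    produced by that view's recognizer. weights maps view name -> weight;
    equal weights are assumed when none are given (an assumption of this sketch).
    """
    views = list(view_scores)
    if weights is None:
        weights = {v: 1.0 / len(views) for v in views}
    # Only hypotheses scored by every view can be fused.
    common = set.intersection(*(set(view_scores[v]) for v in views))
    fused = {h: sum(weights[v] * view_scores[v][h] for v in views) for h in common}
    return max(fused, key=fused.get)


# Hypothetical scores for two competing sentence hypotheses from three views.
scores = {
    "0deg": {"hyp_a": -120.0, "hyp_b": -118.5},
    "30deg": {"hyp_a": -110.2, "hyp_b": -115.0},
    "60deg": {"hyp_a": -130.0, "hyp_b": -131.0},
}
print(fuse_decisions(scores))                     # hyp_a
print(fuse_decisions({"0deg": scores["0deg"]}))   # hyp_b
```

In this toy example the frontal view alone prefers hyp_b, but the 30° view's strong preference for hyp_a dominates once the views are fused, which is the kind of complementary evidence the abstract describes.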

    Analysing the importance of different visual feature coefficients

    A study is presented to determine the relative importance of different visual features for speech recognition, including pixel-based, model-based, contour-based and physical features. The discriminability of the features is analysed with F-ratio and J-measures for both the static features and their temporal derivatives, and these measures were found to correlate highly with speech recognition accuracy (r = 0.97). Principal component analysis is then used to combine all visual features into a single feature vector, and the resulting basis functions are analysed further. An optimal feature vector is obtained which outperforms the best individual feature (AAM), with 93.5% word accuracy.
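The F-ratio mentioned above can be sketched for a single scalar coefficient as between-class variance over average within-class variance. This uses one common definition (population variances, unweighted class averages); the paper's exact normalization and its J-measure are not shown, and the class data below are hypothetical.

```python
from statistics import mean, pvariance
from typing import Dict, List


def f_ratio(samples_by_class: Dict[str, List[float]]) -> float:
    """F-ratio of one scalar feature coefficient.

    Between-class variance (variance of the class means) divided by the
    average within-class variance. Higher values mean the coefficient
    separates the classes (e.g. visemes or words) more cleanly.
    """
    groups = list(samples_by_class.values())
    between = pvariance([mean(g) for g in groups])
    within = mean(pvariance(g) for g in groups)
    return between / within


# Hypothetical values of two coefficients observed for two classes.
separable = {"class_1": [1.0, 2.0, 3.0], "class_2": [7.0, 8.0, 9.0]}
overlapping = {"class_1": [1.0, 5.0, 9.0], "class_2": [2.0, 6.0, 10.0]}
print(f_ratio(separable))    # about 13.5
print(f_ratio(overlapping))  # about 0.023
```

Ranking coefficients by such a score gives a cheap proxy for their usefulness before training a recognizer, which is why a high correlation with recognition accuracy is a meaningful finding.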

    A statistical multiresolution approach for face recognition using structural hidden Markov models

    This paper introduces a novel methodology that combines the multiresolution property of the discrete wavelet transform (DWT) with the local interactions of facial structures expressed through the structural hidden Markov model (SHMM). A range of wavelet filters, such as Haar, biorthogonal 9/7 and Coiflet, as well as Gabor filters, have been implemented in order to find the best-performing one. SHMMs perform a thorough probabilistic analysis of any sequential pattern by revealing both its inner and outer structures simultaneously. Unlike traditional HMMs, SHMMs do not assume that the visible observations are conditionally independent given the hidden states. This is achieved through the concept of local structures introduced by SHMMs, which drastically reduces the long-range dependency problem inherent in traditional HMMs. SHMMs had not previously been applied to face identification. In this application, the SHMM outperformed the traditional hidden Markov model with a 73% increase in accuracy.
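To illustrate the multiresolution idea, here is a minimal pure-Python sketch of the orthonormal Haar DWT, the simplest of the filters listed above, applied separably to a 2-D image and repeated on the low-pass (LL) band. It assumes even image dimensions at every level; the function names are hypothetical, and neither the other wavelet families nor the SHMM stage are shown.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]
S = 1.0 / math.sqrt(2.0)  # orthonormal Haar filter coefficient


def haar_1d(x: List[float]) -> Tuple[List[float], List[float]]:
    """One level of the orthonormal Haar DWT of an even-length sequence."""
    if len(x) % 2:
        raise ValueError("sequence length must be even")
    approx = [S * (x[i] + x[i + 1]) for i in range(0, len(x), 2)]
    detail = [S * (x[i] - x[i + 1]) for i in range(0, len(x), 2)]
    return approx, detail


def _transform_columns(m: Matrix) -> Tuple[Matrix, Matrix]:
    """Apply haar_1d down every column; return (low, high) as row-major matrices."""
    lo_cols, hi_cols = zip(*(haar_1d(list(col)) for col in zip(*m)))
    return [list(r) for r in zip(*lo_cols)], [list(r) for r in zip(*hi_cols)]


def haar_2d(img: Matrix) -> Tuple[Matrix, Matrix, Matrix, Matrix]:
    """One level of the separable 2-D Haar DWT (rows first, then columns).

    Returns (LL, LH, HL, HH), each half the height and width of img.
    Sub-band naming conventions differ between libraries.
    """
    lo_rows, hi_rows = zip(*(haar_1d(row) for row in img))
    ll, lh = _transform_columns(lo_rows)
    hl, hh = _transform_columns(hi_rows)
    return ll, lh, hl, hh


def haar_pyramid(img: Matrix, levels: int):
    """Multiresolution decomposition: repeatedly re-transform the LL band."""
    details = []
    for _ in range(levels):
        img, lh, hl, hh = haar_2d(img)
        details.append((lh, hl, hh))
    return img, details


ll, lh, hl, hh = haar_2d([[1.0, 2.0], [3.0, 4.0]])
print(ll, lh, hl, hh)  # roughly [[5]], [[-2]], [[-1]], [[0]]
```

Because the transform is orthonormal, the total energy of the sub-bands equals that of the input (here 30), so coarse LL bands can serve as compact face descriptors while the detail bands keep edge information at each scale.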

    The relationship between learning style and the academic achievement of vocational-stream students

    Analysis of the 2011 Sijil Pelajaran Malaysia (SPM) results shows a decline in achievement at vocational secondary schools (Sekolah Menengah Vokasional). This study was therefore conducted to examine the relationship between learning style and students' academic achievement. The study also sought to identify the most dominant learning style practised by students and to examine differences in learning style by gender. A total of 131 Form Four vocational-course students at Sekolah Menengah Vokasional Segamat in Johor took part in the study. The 44-item Index of Learning Styles (ILS) questionnaire developed by Felder and Silverman (1991) was used to conduct the study. Students' learning styles are described along four dimensions, each made up of two opposing sub-scales: Active and Reflective, Concrete and Intuitive, Verbal and Visual, and Sequential and Global. The data were analysed with the Statistical Package for the Social Sciences for Windows, release 20.0 (SPSS 20.0). Pearson correlation tests were used to analyse the relationship between learning style and academic achievement. The p-values obtained between learning style and achievement ranged from 0.1 to 0.4, indicating no significant relationship between the two variables. The study also found that the learning style most commonly practised by the students was the Sequential style. The results further showed no significant difference in learning style between male and female students.

    Generalized multi-stream hidden Markov models.

    For complex classification systems, data is usually gathered from multiple sources of information that have varying degrees of reliability. Assuming that the different sources have the same relevance in describing all the data can lead to erroneous behavior. The classification error accumulates and can be more severe for temporal data, where each sample is represented by a sequence of observations. Thus, there is compelling evidence that learning algorithms should include a relevance weight for each source of information (stream) as a parameter to be learned. In this dissertation, we assume that the multi-stream temporal data is generated by independent and synchronous streams. Under this assumption, we develop, implement, and test multi-stream continuous and discrete hidden Markov model (HMM) algorithms. For the discrete case, we propose two new approaches that generalize the baseline discrete HMM. The first combines unsupervised learning, feature discrimination, standard discrete HMMs and weighted distances to learn the codebook with feature-dependent weights for each symbol. The second modifies the HMM structure to include stream relevance weights, generalizes the standard discrete Baum-Welch learning algorithm, and derives the necessary conditions to optimize all model parameters simultaneously. We also generalize the minimum classification error (MCE) discriminative training algorithm to include stream relevance weights. For the continuous HMM, we introduce a new approach that integrates the stream relevance weights into the objective function. Our approach is based on the linearization of the probability density function. Two variations are proposed: the mixture-level and the state-level variation. As in the discrete case, we generalize the continuous Baum-Welch learning algorithm to accommodate these changes, and we derive the necessary conditions for updating the model parameters.
We also generalize the MCE learning algorithm to derive the necessary conditions for updating the model parameters. The proposed discrete and continuous HMMs are tested on synthetic data sets. They are also validated on various applications, including Australian Sign Language, audio classification, face classification and, more extensively, landmine detection using ground penetrating radar data. For all applications, we show that considerable improvement can be achieved compared to the baseline HMM and the existing multi-stream HMM algorithms.
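A minimal sketch of how stream relevance weights can enter a discrete multi-stream HMM: the forward algorithm below computes each state's emission as a weighted sum of per-stream emission probabilities, one possible reading of the "linearization" mentioned above. This is an assumed form for illustration; the dissertation's exact formulation, its Baum-Welch and MCE updates, and the more common weighted-product (exponent) alternative are not shown. All names are hypothetical.

```python
import math
from typing import Sequence


def multistream_forward(
    pi: Sequence[float],
    A: Sequence[Sequence[float]],
    B: Sequence[Sequence[Sequence[float]]],
    w: Sequence[float],
    obs: Sequence[Sequence[int]],
) -> float:
    """P(obs | model) for a discrete HMM with S synchronous streams.

    pi[j]       initial probability of state j
    A[i][j]     transition probability i -> j
    B[s][j][k]  probability of symbol k in stream s given state j
    w[s]        relevance weight of stream s (non-negative, summing to 1)
    obs[t][s]   symbol observed in stream s at time t

    The emission of state j is sum_s w[s] * B[s][j][obs[t][s]]
    (an assumed linear combination, not necessarily the dissertation's form).
    No scaling is applied, so long sequences will underflow.
    """
    if any(ws < 0 for ws in w) or not math.isclose(sum(w), 1.0):
        raise ValueError("stream weights must be non-negative and sum to 1")
    n_states = len(pi)

    def emit(j: int, o_t: Sequence[int]) -> float:
        return sum(w[s] * B[s][j][o_t[s]] for s in range(len(w)))

    alpha = [pi[j] * emit(j, obs[0]) for j in range(n_states)]
    for o_t in obs[1:]:
        alpha = [
            sum(alpha[i] * A[i][j] for i in range(n_states)) * emit(j, o_t)
            for j in range(n_states)
        ]
    return sum(alpha)


# Hypothetical one-state model with two binary streams.
p = multistream_forward([1.0], [[1.0]], [[[0.2, 0.8]], [[0.5, 0.5]]], [0.5, 0.5], [[1, 0], [0, 1]])
print(p)  # 0.65 * 0.35, about 0.2275
```

Setting one weight to 1 and the others to 0 reduces the model to an ordinary single-stream HMM, so learning the weights lets the model discount unreliable streams rather than trusting all sources equally.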

    Large-vocabulary speaker-independent continuous speech recognition with semi-continuous hidden Markov models

    Get PDF
    A semi-continuous hidden Markov model based on multiple vector quantization codebooks is used here for large-vocabulary speaker-independent continuous speech recognition. In the techniques employed here, the semi-continuous output probability density function for each codebook is represented by a combination of the corresponding discrete output probabilities of the hidden Markov model and the continuous Gaussian density functions of each individual codebook. The parameters of the vector quantization codebook and the hidden Markov model are mutually optimized to achieve an optimal model/codebook combination under a unified probabilistic framework. Another advantage of this approach is the enhanced robustness of the semi-continuous output probability, obtained by combining multiple codewords and multiple codebooks. For a 1000-word speaker-independent…
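The semi-continuous output probability described above combines codebook Gaussian densities with a state's discrete output probabilities, b_j(x) = sum over codewords k of f(x | v_k) · b_j(k). Below is a minimal sketch assuming diagonal-covariance codewords, an optional top-M codeword shortlist, and a product across codebooks (i.e. treating codebooks as independent). The function names and these choices are assumptions for illustration, not the paper's implementation.

```python
import math
from typing import List, Optional, Sequence, Tuple

# A codeword is a diagonal Gaussian: (mean vector, variance vector).
Codeword = Tuple[Sequence[float], Sequence[float]]


def gaussian_pdf(x: Sequence[float], mean: Sequence[float], var: Sequence[float]) -> float:
    """Density of a diagonal-covariance Gaussian at x."""
    log_p = 0.0
    for xi, mi, vi in zip(x, mean, var):
        log_p -= 0.5 * (math.log(2.0 * math.pi * vi) + (xi - mi) ** 2 / vi)
    return math.exp(log_p)


def sc_output_prob(
    x: Sequence[float],
    codebook: List[Codeword],
    b_j: Sequence[float],
    top_m: Optional[int] = None,
) -> float:
    """Semi-continuous output probability of state j for one codebook:
    b_j(x) = sum_k f(x | v_k) * b_j(k), optionally using only the top_m
    codewords with the highest densities."""
    dens = [gaussian_pdf(x, m, v) for m, v in codebook]
    ks = sorted(range(len(codebook)), key=lambda k: dens[k], reverse=True)
    if top_m is not None:
        ks = ks[:top_m]
    return sum(dens[k] * b_j[k] for k in ks)


def sc_output_prob_multi(
    xs: Sequence[Sequence[float]],
    codebooks: Sequence[List[Codeword]],
    b_js: Sequence[Sequence[float]],
    top_m: Optional[int] = None,
) -> float:
    """Combine several codebooks (e.g. one per feature type) by taking the
    product of their per-codebook outputs, i.e. assuming independence."""
    p = 1.0
    for x, cb, b in zip(xs, codebooks, b_js):
        p *= sc_output_prob(x, cb, b, top_m)
    return p


# Hypothetical 1-D codebook with two codewords and a state's discrete probabilities.
codebook = [([0.0], [1.0]), ([10.0], [1.0])]
b = [0.3, 0.7]
print(sc_output_prob([0.0], codebook, b, top_m=1))  # 0.3 / sqrt(2*pi)
```

Sharing one codebook of Gaussians across all states keeps the parameter count close to a discrete HMM, while the soft weighting over several codewords avoids the hard quantization decision that makes discrete HMMs brittle.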

    Exploration and Optimization of Noise Reduction Algorithms for Speech Recognition in Embedded Devices

    Environmental noise in real-life applications substantially degrades the performance of speech recognition systems. One example is an in-car scenario, where a speech recognition system has to support the man-machine interface. Several noise sources, such as the engine, wipers and wheels, interfere with speech. The open-window scenario poses a particular challenge, since traffic noise, park noise, etc. must also be taken into account. The main goal of this thesis is to improve the performance of a speech recognition system based on a state-of-the-art hidden Markov model (HMM) by using noise reduction methods. Performance is measured in terms of word error rate and mutual information. The noise reduction methods are based on weighting rules. Least-squares weighting rules in the frequency domain have been developed so that development can continue on the basis of the existing system while keeping its complexity and footprint low for embedded devices. The weighting rule parameters are optimized with a multidimensional Monte Carlo optimization followed by a compass search. Root compression and cepstral smoothing have also been implemented to boost recognition performance. The additional complexity and memory requirements of the proposed system are minimal. The proposed system was compared with the European Telecommunications Standards Institute (ETSI) standardized system. On the ETSI Aurora 3 German task, the proposed system outperforms the ETSI system by up to 8.6% relative increase in word accuracy and achieves up to 35.1% relative increase in word accuracy over the existing baseline system. On large-vocabulary databases, the proposed weighting rules also yield a relative increase of up to 18% in word accuracy over the existing baseline system.
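The two-stage parameter optimization mentioned above (a Monte Carlo search followed by a compass search) can be sketched generically. In this sketch the objective is a hypothetical stand-in for the real criterion (e.g. negated word accuracy on a development set); the sample count, initial step size and stopping rule are assumptions, and the local stage ignores the bounds.

```python
import random
from typing import Callable, List, Sequence, Tuple

Objective = Callable[[List[float]], float]


def compass_search(
    f: Objective,
    x0: Sequence[float],
    step: float = 1.0,
    min_step: float = 1e-6,
    max_iter: int = 100_000,
) -> Tuple[List[float], float]:
    """Minimize f by polling +/- step along each coordinate axis.

    Move to the first improving poll point; if none improves, halve the step.
    Stops when the step drops below min_step or after max_iter poll rounds.
    """
    x = list(x0)
    fx = f(x)
    for _ in range(max_iter):
        if step < min_step:
            break
        improved = False
        for i in range(len(x)):
            for sign in (1.0, -1.0):
                cand = list(x)
                cand[i] += sign * step
                fc = f(cand)
                if fc < fx:
                    x, fx, improved = cand, fc, True
                    break
            if improved:
                break
        if not improved:
            step /= 2.0
    return x, fx


def monte_carlo_then_compass(
    f: Objective,
    bounds: Sequence[Tuple[float, float]],
    n_samples: int = 200,
    seed: int = 0,
) -> Tuple[List[float], float]:
    """Global stage: evaluate f at uniform random points inside bounds.
    Local stage: refine the best sample with compass search."""
    rng = random.Random(seed)
    samples = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_samples)]
    best = min(samples, key=f)
    width = max(hi - lo for lo, hi in bounds)
    return compass_search(f, best, step=width / 4.0)


# Hypothetical objective standing in for the recognizer's error measure.
def quad(p: List[float]) -> float:
    return (p[0] - 3.0) ** 2 + (p[1] + 1.0) ** 2


print(monte_carlo_then_compass(quad, [(-5.0, 5.0), (-5.0, 5.0)]))  # near ([3, -1], 0)
```

The random sampling stage reduces the risk of the derivative-free local search settling in a poor local optimum, while compass search needs only objective evaluations, which suits criteria such as word accuracy that have no usable gradient.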
An entropy-based feature vector analysis method has also been developed to assess the quality of feature vectors. The entropy is estimated with a histogram approach. The method has the advantage of assessing feature vector quality objectively, regardless of the acoustic modeling assumptions used in the speech recognition system.
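A histogram-based entropy estimate of the kind mentioned above can be sketched per feature dimension as follows. This assumes equal-width bins over the observed range and a plug-in Shannon estimate in bits; the bin count and function names are hypothetical, and the thesis's actual binning and normalization may differ.

```python
import math
from typing import List, Sequence


def histogram_entropy(values: Sequence[float], n_bins: int = 32) -> float:
    """Plug-in Shannon entropy (bits) of a scalar feature, estimated from an
    equal-width histogram spanning the observed range."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return 0.0  # a constant feature carries no information
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        # int() truncates; the maximum value would land in bin n_bins, so clamp it.
        counts[min(int((v - lo) / width), n_bins - 1)] += 1
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in counts if c)


def feature_vector_entropies(vectors: Sequence[Sequence[float]], n_bins: int = 32) -> List[float]:
    """Entropy of each dimension of a set of feature vectors (one row per frame)."""
    return [histogram_entropy(list(dim), n_bins) for dim in zip(*vectors)]


# Hypothetical frames with two feature dimensions.
frames = [[0.0, 1.0], [0.0, 1.0], [1.0, 1.0], [1.0, 1.0]]
print(feature_vector_entropies(frames, n_bins=2))  # [1.0, 0.0]
```

The estimate depends on the number of bins; adding log2(width) approximates the differential entropy if values from different feature scales need to be compared.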