Combining Multiple Views for Visual Speech Recognition
Visual speech recognition is a challenging research problem with a particular
practical application of aiding audio speech recognition in noisy scenarios.
Multiple camera setups can be beneficial for the visual speech recognition
systems in terms of improved performance and robustness. In this paper, we
explore this aspect and provide a comprehensive study on combining multiple
views for visual speech recognition. The thorough analysis covers fusion of all
possible view angle combinations both at feature level and decision level. The
employed visual speech recognition system in this study extracts features
through a PCA-based convolutional neural network, followed by an LSTM network.
Finally, these features are processed in a tandem system, being fed into a
GMM-HMM scheme. The decision fusion acts after this point by combining the
Viterbi path log-likelihoods. The results show that the complementary
information contained in recordings from different view angles improves the
results significantly. For example, the sentence correctness on the test set is
increased from 76% for the highest performing single view () to up to
83% when combining this view with the frontal and view angles
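The decision-level fusion described above can be illustrated with a minimal sketch. It assumes that each view's GMM-HMM decoder returns a Viterbi path log-likelihood per candidate sentence and that the views are combined by a weighted sum. The abstract does not state the combination rule, so the function name `fuse_decisions`, the view names, the weights, and the scores are all hypothetical.

```python
def fuse_decisions(view_scores, weights=None):
    """Combine per-view Viterbi log-likelihoods by a weighted sum (assumed rule).

    view_scores maps a view name to a dict of {hypothesis: log-likelihood}.
    Returns the best hypothesis and the fused score of every hypothesis.
    """
    views = list(view_scores)
    if weights is None:
        # Assumption: equal weights when none are supplied.
        weights = {v: 1.0 / len(views) for v in views}
    # Only hypotheses scored by every view can be fused.
    hypotheses = set.intersection(*(set(view_scores[v]) for v in views))
    fused = {
        h: sum(weights[v] * view_scores[v][h] for v in views)
        for h in hypotheses
    }
    best = max(fused, key=fused.get)
    return best, fused


# Hypothetical scores for two views and two candidate sentences.
scores = {
    "view_a": {"bin blue": -120.0, "bin green": -118.0},
    "view_b": {"bin blue": -110.0, "bin green": -125.0},
}
best, fused = fuse_decisions(scores)
```

In this made-up example the first view alone would pick "bin green", while the fused score favours "bin blue", which is the kind of complementary effect the abstract reports.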
Analysing the importance of different visual feature coefficients
A study is presented to determine the relative importance of different visual features for speech recognition which includes pixel-based, model-based, contour-based and physical features. Analysis to determine the discriminability of features is performed through F-ratio and J-measures for both static and temporal derivatives, the results of which were found to correlate highly with speech recognition accuracy (r = 0.97). Principal component analysis is then used to combine all visual features into a single feature vector, of which further analysis is performed on the resulting basis functions. An optimal feature vector is obtained which outperforms the best individual feature (AAM) with 93.5% word accuracy
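As a rough illustration of the F-ratio mentioned above, the sketch below uses the common textbook form for a single feature coefficient: the variance of the class means divided by the mean within-class variance. Whether the study uses population or sample variances is not stated, so the use of `pvariance` and the toy data are assumptions.

```python
from statistics import mean, pvariance


def f_ratio(samples_by_class):
    """F-ratio of one feature: between-class variance / mean within-class variance.

    samples_by_class maps a class label to a list of feature values.
    Population variances are used here (an assumption).
    """
    class_means = [mean(values) for values in samples_by_class.values()]
    between = pvariance(class_means)
    within = mean(pvariance(values) for values in samples_by_class.values())
    return between / within


# Toy data: two well-separated classes give a large F-ratio.
ratio = f_ratio({"a": [1.0, 2.0, 3.0], "b": [7.0, 8.0, 9.0]})
```

A larger ratio means the class means are spread far apart relative to the scatter inside each class, which is why the measure is used as a proxy for discriminability.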
A statistical multiresolution approach for face recognition using structural hidden Markov models
This paper introduces a novel methodology that combines the multiresolution feature of the discrete wavelet transform (DWT) with the local interactions of the facial structures expressed through the structural hidden Markov model (SHMM). A range of wavelet filters such as Haar, biorthogonal 9/7, and Coiflet, as well as Gabor, have been implemented in order to search for the best performance. SHMMs perform a thorough probabilistic analysis of any sequential pattern by revealing both its inner and outer structures simultaneously. Unlike traditional HMMs, SHMMs do not assume state-conditional independence of the visible observation sequence. This is achieved via the concept of local structures introduced by the SHMMs. Therefore, the long-range dependency problem inherent to traditional HMMs has been drastically reduced. SHMMs have not previously been applied to the problem of face identification. The results reported in this application have shown that SHMM outperforms the traditional hidden Markov model with a 73% increase in accuracy
The relationship between learning styles and academic achievement of vocational stream students
An analysis of the 2011 Sijil Pelajaran Malaysia (SPM) results showed a decline in
achievement for Vocational Secondary Schools. This study was therefore carried out
to examine the relationship between learning styles and students' academic
achievement. The study also sought to identify the most dominant learning style
practised by students and to examine differences in learning style by student
gender. A total of 131 Form Four vocational course students at Sekolah Menengah
Vokasional Segamat in Johor took part in the study. The Index of Learning Style
(ILS) questionnaire developed by Felder and Silverman (1991), which contains 44
questions, was used to conduct the study. Students' learning styles can be viewed
through four learning style dimensions, each made up of two opposing sub-scales:
the Active and Reflective dimension, the Concrete and Intuitive dimension, the
Verbal and Visual dimension, and the Sequential and Global dimension. The data
obtained were analysed using the Statistical Package for Social Science for WINDOW
release 20.0 (SPSS.20.0). The Pearson correlation test was used to analyse the
relationship between learning styles and students' academic achievement. The p
values obtained between learning styles and student achievement were p = 0.1 to
0.4. This indicates that there is no significant relationship between the two
variables. The study also found that the learning style practised by students is
the Sequential learning style. The findings also showed no significant difference
in learning style by student gender
Generalized multi-stream hidden Markov models.
For complex classification systems, data is usually gathered from multiple sources of information that have varying degree of reliability. In fact, assuming that the different sources have the same relevance in describing all the data might lead to an erroneous behavior. The classification error accumulates and can be more severe for temporal data where each sample is represented by a sequence of observations. Thus, there is compelling evidence that learning algorithms should include a relevance weight for each source of information (stream) as a parameter that needs to be learned. In this dissertation, we assumed that the multi-stream temporal data is generated by independent and synchronous streams. Using this assumption, we develop, implement, and test multi-stream continuous and discrete hidden Markov model (HMM) algorithms. For the discrete case, we propose two new approaches to generalize the baseline discrete HMM. The first one combines unsupervised learning, feature discrimination, standard discrete HMMs and weighted distances to learn the codebook with feature-dependent weights for each symbol. The second approach consists of modifying the HMM structure to include stream relevance weights, generalizing the standard discrete Baum-Welch learning algorithm, and deriving the necessary conditions to optimize all model parameters simultaneously. We also generalize the minimum classification error (MCE) discriminative training algorithm to include stream relevance weights. For the continuous HMM, we introduce a new approach that integrates the stream relevance weights in the objective function. Our approach is based on the linearization of the probability density function. Two variations are proposed: the mixture and state level variations. As in the discrete case, we generalize the continuous Baum-Welch learning algorithm to accommodate these changes, and we derive the necessary conditions for updating the model parameters.
We also generalize the MCE learning algorithm to derive the necessary conditions for the model parameters' update. The proposed discrete and continuous HMM are tested on synthetic data sets. They are also validated on various applications including Australian Sign Language, audio classification, face classification, and more extensively on the problem of landmine detection using ground penetrating radar data. For all applications, we show that considerable improvement can be achieved compared to the baseline HMM and the existing multi-stream HMM algorithms
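To make the idea of stream relevance weights concrete, the sketch below contrasts two generic ways of weighting per-stream emission densities for a single HMM state: the usual geometric form (a product of densities raised to the stream weights) and a linear form (a weighted sum of densities). This is only an illustration under simplifying assumptions (one scalar Gaussian per stream, weights fixed by hand rather than learned) and is not the dissertation's exact formulation. All names are hypothetical.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a univariate Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def geometric_emission(obs, params, weights):
    """Product over streams of each stream density raised to its weight."""
    return math.prod(
        gaussian_pdf(o, m, v) ** w for o, (m, v), w in zip(obs, params, weights)
    )


def linear_emission(obs, params, weights):
    """Weighted sum over streams of the stream densities (linearized form)."""
    return sum(
        w * gaussian_pdf(o, m, v) for o, (m, v), w in zip(obs, params, weights)
    )


# Two streams for one state; the second stream's observation is far from its mean.
params = [(0.0, 1.0), (0.0, 1.0)]
obs = [0.0, 3.0]
trusting_both = linear_emission(obs, params, [0.5, 0.5])
trusting_first = linear_emission(obs, params, [1.0, 0.0])
```

Setting a stream's weight to zero removes its influence in both forms, so a learned weight near zero marks that stream as unreliable for the state in question.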
Large-vocabulary speaker-independent continuous speech recognition with semi-continuous hidden Markov models
A semi-continuous hidden Markov model based on the multiple vector quantization codebooks is used here for large-vocabulary speaker-independent continuous speech recognition. In the techniques employed here, the semi-continuous output probability density function for each codebook is represented by a combination of the corresponding discrete output probabilities of the hidden Markov model and the continuous Gaussian density functions of each individual codebook. Parameters of vector quantization codebook and hidden Markov model are mutually optimized to achieve an optimal model/codebook combination under a unified probabilistic framework. Another advantage of this approach is the enhanced robustness of the semi-continuous output probability by the combination of multiple codewords and multiple codebooks. For a 1000-word speaker-independent …
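The combination of discrete output probabilities and Gaussian codeword densities can be sketched for a single codebook as a weighted sum: the density of the observation under each codeword, multiplied by the state's discrete probability of that codeword. The sketch assumes scalar observations and univariate Gaussians for brevity. The function names and numbers are hypothetical, and handling of multiple codebooks is left out.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a univariate Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def semi_continuous_output(x, codebook, discrete_probs):
    """Semi-continuous output density of one state for one codebook.

    codebook is a list of (mean, variance) pairs, one per codeword, shared by
    all states; discrete_probs holds this state's probability of each codeword.
    """
    return sum(
        gaussian_pdf(x, mean, var) * prob
        for (mean, var), prob in zip(codebook, discrete_probs)
    )


# Hypothetical two-codeword codebook and one state's discrete distribution.
codebook = [(0.0, 1.0), (4.0, 1.0)]
density = semi_continuous_output(2.0, codebook, [0.5, 0.5])
```

Because several codewords contribute to each observation instead of a single nearest one, small shifts in the input change the output probability smoothly, which is one reading of the robustness claim in the abstract.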
Exploration and Optimization of Noise Reduction Algorithms for Speech Recognition in Embedded Devices
Environmental noise present in real-life applications substantially degrades the performance of speech recognition systems. An example is an in-car scenario where a speech recognition system has to support the man-machine interface. Several sources of noise coming from the engine, wipers, wheels, etc., interact with speech. A particular challenge arises in an open-window scenario, where traffic noise, park noise, etc., have to be taken into account. The main goal of this thesis is to improve the performance of a speech recognition system based on a state-of-the-art hidden Markov model (HMM) using noise reduction methods. The performance is measured with respect to word error rate and with the method of mutual information. The noise reduction methods are based on weighting rules. Least-squares weighting rules in the frequency domain have been developed to enable a continuous development based on the existing system and also to guarantee its low complexity and footprint for applications in embedded devices. The weighting rule parameters are optimized as a multidimensional optimization task, using a Monte Carlo method followed by a compass search method. Root compression and cepstral smoothing methods have also been implemented to boost the recognition performance. The additional complexity and memory requirements of the proposed system are minimal. The performance of the proposed system was compared to the European Telecommunications Standards Institute (ETSI) standardized system. The proposed system outperforms the ETSI system by up to 8.6% relative increase in word accuracy and achieves up to 35.1% relative increase in word accuracy compared to the existing baseline system on the ETSI Aurora 3 German task. A relative increase of up to 18% in word accuracy over the existing baseline system is also obtained from the proposed weighting rules on large vocabulary databases.
An entropy-based feature vector analysis method has also been developed to assess the quality of feature vectors. The entropy estimation is based on the histogram approach. The method has the advantage of objectively assessing the feature vector quality regardless of the acoustic modeling assumption used in the speech recognition system
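A histogram-based entropy estimate of the kind mentioned above can be sketched as follows for one feature coefficient: bin the values, turn the bin counts into relative frequencies, and compute the Shannon entropy of those frequencies. The bin count, the value range, and the function name are assumptions, and the result is the entropy of the binned values in bits, not a differential entropy.

```python
import math
from collections import Counter


def histogram_entropy(values, num_bins, lo, hi):
    """Shannon entropy (bits) of values binned into equal-width bins on [lo, hi].

    Assumes every value lies within [lo, hi]; values equal to hi fall into
    the last bin.
    """
    width = (hi - lo) / num_bins
    counts = Counter(
        min(int((v - lo) / width), num_bins - 1) for v in values
    )
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Values spread evenly over four bins reach the maximum of log2(4) bits.
spread = histogram_entropy([0.1, 0.3, 0.6, 0.9], num_bins=4, lo=0.0, hi=1.0)
# Values concentrated in one bin carry no information.
peaked = histogram_entropy([0.1, 0.1, 0.2, 0.2], num_bins=4, lo=0.0, hi=1.0)
```

The estimate depends on the chosen binning, so comparisons between feature sets are only meaningful when the same bins are used throughout.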