Face recognition in different subspaces - A comparative study
Face recognition is one of the most successful applications of image analysis and understanding and has gained much attention in recent years. Among the many approaches to the problem of face recognition, appearance-based subspace analysis still gives the most promising results. In this paper we study the three most popular appearance-based face recognition projection methods (PCA, LDA and ICA). All methods are tested under equal working conditions regarding preprocessing and algorithm implementation on the FERET data set with its standard tests. We also compare the ICA method with its whitening preprocess and find no significant difference between them. Comparing different projections with different metrics, we find that the LDA+COS combination is the most promising for all tasks. The L1 metric gives the best results in combination with PCA and ICA1, and COS is superior to any other metric when used with LDA and ICA2. Our results are compared to other studies and some discrepancies are pointed out.
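The projection-plus-metric comparison described in this abstract can be illustrated with a small sketch. It assumes faces have already been projected into some subspace (by PCA, LDA or ICA) and shows only the matching step, using the L1 and cosine (COS) distances. The function names, the two-dimensional coefficient vectors, and the subject labels are hypothetical illustrations, not taken from the paper or from FERET.

```python
import math

def l1_distance(a, b):
    """City-block (L1) distance between two coefficient vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))

def cos_distance(a, b):
    """Cosine distance: 1 minus the cosine of the angle between a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def nearest_match(probe, gallery, distance):
    """Return the gallery label whose vector is closest to the probe."""
    return min(gallery, key=lambda label: distance(probe, gallery[label]))

# Hypothetical subspace coefficients (made-up values for illustration only).
gallery = {
    "subject_a": [5.0, 0.0],
    "subject_b": [1.0, 1.0],
}
probe = [2.0, 0.0]

l1_match = nearest_match(probe, gallery, l1_distance)
cos_match = nearest_match(probe, gallery, cos_distance)
print("L1 match: ", l1_match)
print("COS match:", cos_match)
```

In this toy gallery the two metrics pick different subjects, because L1 is sensitive to vector magnitude while COS compares direction only. That is one reason the choice of metric can matter as much as the choice of projection.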
Deep Multimodal Learning for Audio-Visual Speech Recognition
In this paper, we present methods in deep multimodal learning for fusing
speech and visual modalities for Audio-Visual Automatic Speech Recognition
(AV-ASR). First, we study an approach where uni-modal deep networks are trained
separately and their final hidden layers fused to obtain a joint feature space
in which another deep network is built. While the audio network alone achieves
a phone error rate (PER) of under clean condition on the IBM large
vocabulary audio-visual studio dataset, this fusion model achieves a PER of
demonstrating the tremendous value of the visual channel in phone
classification even in audio with high signal to noise ratio. Second, we
present a new deep network architecture that uses a bilinear softmax layer to
account for class specific correlations between modalities. We show that
combining the posteriors from the bilinear networks with those from the fused
model mentioned above results in a further significant phone error rate
reduction, yielding a final PER of .
Comment: ICASSP 201
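As a rough illustration of the bilinear softmax layer mentioned in this abstract, the sketch below scores each class with a class-specific bilinear form between an audio feature vector and a visual feature vector, then normalizes the scores with a softmax. The exact parameterization used in the paper is not given in the abstract, so the form `audio^T W_k visual + b_k`, the function names, and all numeric values are assumptions made for illustration.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def bilinear_score(audio, weights, visual):
    """Compute audio^T W visual for one class-specific weight matrix W."""
    return sum(
        audio[i] * weights[i][j] * visual[j]
        for i in range(len(audio))
        for j in range(len(visual))
    )

def bilinear_softmax(audio, visual, class_weights, biases):
    """Class posteriors from per-class bilinear scores (assumed form)."""
    scores = [
        bilinear_score(audio, w, visual) + b
        for w, b in zip(class_weights, biases)
    ]
    return softmax(scores)

# Hypothetical two-dimensional features and two classes (made-up values).
audio = [1.0, 0.0]
visual = [0.0, 1.0]
class_weights = [
    [[0.0, 2.0], [0.0, 0.0]],  # class 0 responds to audio[0] * visual[1]
    [[0.0, 0.0], [0.0, 0.0]],  # class 1 ignores both modalities
]
biases = [0.0, 0.0]

posteriors = bilinear_softmax(audio, visual, class_weights, biases)
print(posteriors)
```

Because each class has its own weight matrix, the layer can model a different audio-visual interaction per class, which is the class-specific correlation the abstract refers to.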