Search CORE

58,539 research outputs found

Improving acoustic vehicle classification by information fusion

Author: Damarla Thyagaraju
Guo Baofeng
Nixon Mark
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 11/03/2011
Field of study

We present an information fusion approach for ground vehicle classification based on the emitted acoustic signal. Many acoustic factors can contribute to the classification accuracy of working ground vehicles. Classification relying on a single feature set may lose some useful information if its underlying sound production model is not comprehensive. To improve classification accuracy, we consider an information fusion diagram, in which various aspects of an acoustic signature are taken into account and emphasized separately by two different feature extraction methods. The first set of features aims to represent internal sound production, and a number of harmonic components are extracted to characterize the factors related to the vehicle’s resonance. The second set of features is extracted based on a computationally effective discriminatory analysis, and a group of key frequency components are selected by mutual information, accounting for the sound production from the vehicle’s exterior parts. In correspondence with this structure, we further put forward a modifiedBayesian fusion algorithm, which takes advantage of matching each specific feature set with its favored classifier. To assess the proposed approach, experiments are carried out based on a data set containing acoustic signals from different types of vehicles. Results indicate that the fusion approach can effectively increase classification accuracy compared to that achieved using each individual features set alone. The Bayesian-based decision level fusion is found fusion is found to be improved than a feature level fusion approac

Southampton (e-Prints Soton)

Deep Multimodal Learning for Audio-Visual Speech Recognition

Author: Goel Vaibhava
Marcheret Etienne
Mroueh Youssef
Publication venue
Publication date: 22/01/2015
Field of study

In this paper, we present methods in deep multimodal learning for fusing speech and visual modalities for Audio-Visual Automatic Speech Recognition (AV-ASR). First, we study an approach where uni-modal deep networks are trained separately and their final hidden layers fused to obtain a joint feature space in which another deep network is built. While the audio network alone achieves a phone error rate (PER) of

41\%

under clean condition on the IBM large vocabulary audio-visual studio dataset, this fusion model achieves a PER of

35.83\%

demonstrating the tremendous value of the visual channel in phone classification even in audio with high signal to noise ratio. Second, we present a new deep network architecture that uses a bilinear softmax layer to account for class specific correlations between modalities. We show that combining the posteriors from the bilinear networks with those from the fused model mentioned above results in a further significant phone error rate reduction, yielding a final PER of

34.03\%

.Comment: ICASSP 201

arXiv.org e-Print Archive

Crossref

Feature Level Fusion of Face and Fingerprint Biometrics

Author: Bicego Manuele
Kisku Dakshina Ranjan
Rattani Ajita
Tistarelli Massimo
Publication venue
Publication date: 01/01/2010
Field of study

The aim of this paper is to study the fusion at feature extraction level for face and fingerprint biometrics. The proposed approach is based on the fusion of the two traits by extracting independent feature pointsets from the two modalities, and making the two pointsets compatible for concatenation. Moreover, to handle the problem of curse of dimensionality, the feature pointsets are properly reduced in dimension. Different feature reduction techniques are implemented, prior and after the feature pointsets fusion, and the results are duly recorded. The fused feature pointset for the database and the query face and fingerprint images are matched using techniques based on either the point pattern matching, or the Delaunay triangulation. Comparative experiments are conducted on chimeric and real databases, to assess the actual advantage of the fusion performed at the feature extraction level, in comparison to the matching score level.Comment: 6 pages, 7 figures, conferenc

arXiv.org e-Print Archive

CiteSeerX

Facial emotion recognition using min-max similarity classifier

Author: James Alex Pappachen
Krestinskaya Olga
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2018
Field of study

Recognition of human emotions from the imaging templates is useful in a wide variety of human-computer interaction and intelligent systems applications. However, the automatic recognition of facial expressions using image template matching techniques suffer from the natural variability with facial features and recording conditions. In spite of the progress achieved in facial emotion recognition in recent years, the effective and computationally simple feature selection and classification technique for emotion recognition is still an open problem. In this paper, we propose an efficient and straightforward facial emotion recognition algorithm to reduce the problem of inter-class pixel mismatch during classification. The proposed method includes the application of pixel normalization to remove intensity offsets followed-up with a Min-Max metric in a nearest neighbor classifier that is capable of suppressing feature outliers. The results indicate an improvement of recognition performance from 92.85% to 98.57% for the proposed Min-Max classification method when tested on JAFFE database. The proposed emotion recognition technique outperforms the existing template matching methods

arXiv.org e-Print Archive

Crossref