Improving acoustic vehicle classification by information fusion
We present an information fusion approach for ground vehicle classification based on the emitted acoustic signal. Many acoustic factors can contribute to the classification accuracy of working ground vehicles. Classification relying on a single feature set may lose some useful information if its underlying sound production model is not comprehensive. To improve classification accuracy, we consider an information fusion diagram, in which various aspects of an acoustic signature are taken into account and emphasized separately by two different feature extraction methods. The first set of features aims to represent internal sound production, and a number of harmonic components are extracted to characterize the factors related to the vehicle's resonance. The second set of features is extracted based on a computationally effective discriminatory analysis, and a group of key frequency components are selected by mutual information, accounting for the sound production from the vehicle's exterior parts. In correspondence with this structure, we further put forward a modified Bayesian fusion algorithm, which takes advantage of matching each specific feature set with its favored classifier. To assess the proposed approach, experiments are carried out based on a data set containing acoustic signals from different types of vehicles. Results indicate that the fusion approach can effectively increase classification accuracy compared to that achieved using each individual feature set alone. The Bayesian-based decision level fusion is found to perform better than a feature level fusion approach.
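The abstract does not spell out the modified Bayesian fusion algorithm, so the following is only a minimal sketch of generic decision-level fusion, not the authors' method. It assumes each feature set has its own classifier that outputs class posteriors, and that the feature sets are conditionally independent given the class (the standard product rule). The class names and probability values are hypothetical.

```python
from math import prod

def fuse_posteriors(posteriors, prior):
    """Fuse per-classifier posteriors P(c | x_i) with the product rule.

    Assumption: the feature sets are conditionally independent given the
    class, so P(c | x_1..x_n) is proportional to
    prior[c] ** (1 - n) * prod_i P(c | x_i).
    """
    n = len(posteriors)
    scores = {}
    for c, p_c in prior.items():
        scores[c] = p_c ** (1 - n) * prod(p[c] for p in posteriors)
    total = sum(scores.values())
    # Normalize so the fused scores form a probability distribution.
    return {c: s / total for c, s in scores.items()}

# Hypothetical outputs of two classifiers, one per feature set.
prior = {"tracked": 0.5, "wheeled": 0.5}
harmonic_post = {"tracked": 0.6, "wheeled": 0.4}   # harmonic features
keyfreq_post = {"tracked": 0.7, "wheeled": 0.3}    # key frequency features

fused = fuse_posteriors([harmonic_post, keyfreq_post], prior)
label = max(fused, key=fused.get)
```

When both classifiers lean the same way, the fused posterior is more confident than either input. When they disagree, the more confident classifier dominates.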
Deep Multimodal Learning for Audio-Visual Speech Recognition
In this paper, we present methods in deep multimodal learning for fusing
speech and visual modalities for Audio-Visual Automatic Speech Recognition
(AV-ASR). First, we study an approach where uni-modal deep networks are trained
separately and their final hidden layers fused to obtain a joint feature space
in which another deep network is built. While the audio network alone achieves
a phone error rate (PER) of under clean condition on the IBM large
vocabulary audio-visual studio dataset, this fusion model achieves a PER of
demonstrating the tremendous value of the visual channel in phone
classification even in audio with high signal to noise ratio. Second, we
present a new deep network architecture that uses a bilinear softmax layer to
account for class specific correlations between modalities. We show that
combining the posteriors from the bilinear networks with those from the fused
model mentioned above results in a further significant phone error rate
reduction, yielding a final PER of .
Comment: ICASSP 201
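One possible reading of a bilinear softmax layer is sketched below. It assumes each class k has its own weight matrix W_k and scores an audio vector a and a visual vector v as a^T W_k v + b_k before a softmax. The abstract does not give the layer definition, so the shapes, names, and values here are illustrative assumptions rather than the paper's architecture.

```python
import math

def bilinear_softmax(audio, visual, weights, biases):
    """Return class posteriors from scores s_k = audio^T W_k visual + b_k."""
    scores = []
    for W_k, b_k in zip(weights, biases):
        s = b_k
        for i, a_i in enumerate(audio):
            for j, v_j in enumerate(visual):
                # Each term couples one audio unit with one visual unit.
                s += a_i * W_k[i][j] * v_j
        scores.append(s)
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

# Hypothetical 2-dimensional hidden activations and two classes.
audio = [1.0, 0.0]
visual = [0.0, 1.0]
weights = [
    [[0.0, 2.0], [0.0, 0.0]],  # class 0 rewards audio[0] with visual[1]
    [[0.0, 0.0], [0.0, 0.0]],  # class 1 has no cross-modal interaction
]
biases = [0.0, 0.0]

posteriors = bilinear_softmax(audio, visual, weights, biases)
```

Because every class has a separate matrix, the layer can weight a given audio-visual feature pair differently per class. A softmax over concatenated features cannot express this directly.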
Feature Level Fusion of Face and Fingerprint Biometrics
The aim of this paper is to study the fusion at feature extraction level for
face and fingerprint biometrics. The proposed approach is based on the fusion
of the two traits by extracting independent feature pointsets from the two
modalities, and making the two pointsets compatible for concatenation.
Moreover, to handle the curse of dimensionality, the feature
pointsets are reduced in dimension. Different feature reduction
techniques are implemented, before and after the feature pointset fusion, and
the results are recorded. The fused feature pointsets for the database and
the query face and fingerprint images are matched using techniques based on
either point pattern matching or Delaunay triangulation. Comparative
experiments are conducted on chimeric and real databases, to assess the actual
advantage of the fusion performed at the feature extraction level, in
comparison to the matching score level.
Comment: 6 pages, 7 figures, conferenc
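Making two feature sets compatible for concatenation can be sketched as rescaling each modality to a common range and then joining them. This is a simplified illustration on plain feature vectors, not the paper's pointset procedure. The function names and sample values are hypothetical, and the dimensionality reduction step is omitted.

```python
def min_max_scale(vec):
    """Rescale a feature vector to the range [0, 1]."""
    lo, hi = min(vec), max(vec)
    if hi == lo:
        # A constant vector carries no contrast; map it to zeros.
        return [0.0 for _ in vec]
    return [(x - lo) / (hi - lo) for x in vec]

def fuse_features(face_vec, finger_vec):
    """Feature-level fusion: normalize each modality, then concatenate."""
    return min_max_scale(face_vec) + min_max_scale(finger_vec)

# Hypothetical face and fingerprint features on very different scales.
fused = fuse_features([10.0, 20.0, 30.0], [0.2, 0.4])
```

Without the rescaling step, the modality with the larger numeric range would dominate any distance computed on the concatenated vector.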
Facial emotion recognition using min-max similarity classifier
Recognition of human emotions from image templates is useful in a wide
variety of human-computer interaction and intelligent systems applications.
However, the automatic recognition of facial expressions using image template
matching techniques suffers from the natural variability in facial features
and recording conditions. In spite of the progress achieved in facial emotion
recognition in recent years, an effective and computationally simple feature
selection and classification technique for emotion recognition is still an open
problem. In this paper, we propose an efficient and straightforward facial
emotion recognition algorithm to reduce the problem of inter-class pixel
mismatch during classification. The proposed method includes the application of
pixel normalization to remove intensity offsets followed by a Min-Max
metric in a nearest neighbor classifier that is capable of suppressing feature
outliers. The results indicate an improvement of recognition performance from
92.85% to 98.57% for the proposed Min-Max classification method when tested on
the JAFFE database. The proposed emotion recognition technique outperforms the
existing template matching methods.
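A nearest neighbor classifier with a min-max style similarity might look like the sketch below. The abstract does not define the metric, so this assumes the common form sum(min(a, b)) / sum(max(a, b)) on non-negative pixel values, and it assumes offset removal by subtracting each image's minimum intensity. The labels and pixel values are hypothetical.

```python
def remove_offset(pixels):
    """Shift intensities so the darkest pixel is zero (assumed normalization)."""
    lo = min(pixels)
    return [p - lo for p in pixels]

def min_max_similarity(a, b):
    """Ratio of summed element-wise minima to summed element-wise maxima."""
    num = sum(min(x, y) for x, y in zip(a, b))
    den = sum(max(x, y) for x, y in zip(a, b))
    # Two all-zero vectors are treated as identical.
    return 1.0 if den == 0 else num / den

def classify(query, templates):
    """Return the label of the template most similar to the query."""
    q = remove_offset(query)
    best_label, best_sim = None, -1.0
    for label, pixels in templates:
        sim = min_max_similarity(q, remove_offset(pixels))
        if sim > best_sim:
            best_label, best_sim = label, sim
    return best_label

# Hypothetical 4-pixel templates; the query is "happy" plus a brightness offset.
templates = [("happy", [10, 50, 90, 50]), ("sad", [90, 50, 10, 50])]
label = classify([30, 70, 110, 70], templates)
```

The query differs from the "happy" template only by a constant brightness shift, which the offset removal cancels before the similarity is computed.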