Fusion for Audio-Visual Laughter Detection
Laughter is a highly variable signal, and can express a spectrum of emotions. This makes the automatic detection of laughter a challenging but interesting task. We perform automatic laughter detection using audio-visual data from the AMI Meeting Corpus.
Audio-visual laughter detection is performed by fusing the results of separate audio and video classifiers at the decision level. The video classifier uses features based on the principal components of 20 tracked facial points; for audio, we use the widely used PLP and RASTA-PLP features. Our results indicate that RASTA-PLP features outperform PLP features for laughter detection in audio.
We compared classifiers based on hidden Markov models (HMMs), Gaussian mixture models (GMMs) and support vector machines (SVMs), and found that RASTA-PLP combined with a GMM gave the best performance for the audio modality. The video features classified using an SVM gave the best single-modality performance. Decision-level fusion yielded significantly better laughter detection than single-modality classification.
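As a rough illustration of decision-level fusion, the sketch below combines per-segment scores from an audio classifier and a video classifier with a weighted sum and then thresholds the result. The abstract does not state which fusion rule was used, so the weighted sum, the function name `fuse_decisions`, and the `audio_weight` and `threshold` parameters are all assumptions made for this example.

```python
def fuse_decisions(audio_scores, video_scores, audio_weight=0.5, threshold=0.5):
    """Fuse two classifiers' laughter scores at the decision level.

    Each score is assumed to be a probability-like value in [0, 1] for one
    segment. The weighted sum is an illustrative fusion rule, not
    necessarily the one used in the paper.
    """
    if len(audio_scores) != len(video_scores):
        raise ValueError("both modalities must score the same segments")
    if not 0.0 <= audio_weight <= 1.0:
        raise ValueError("audio_weight must lie in [0, 1]")

    video_weight = 1.0 - audio_weight
    fused = [audio_weight * a + video_weight * v
             for a, v in zip(audio_scores, video_scores)]
    # A segment is labelled as laughter when the fused score reaches the threshold.
    labels = [score >= threshold for score in fused]
    return fused, labels


if __name__ == "__main__":
    fused, labels = fuse_decisions([0.9, 0.2], [0.7, 0.4])
    print(fused, labels)
```

Because fusion happens on scores rather than features, each modality can keep its own classifier type (for example a GMM for audio and an SVM for video), provided the scores are brought to a comparable scale first.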
Learning Deep Representations of Appearance and Motion for Anomalous Event Detection
We present a novel unsupervised deep learning framework for anomalous event
detection in complex video scenes. While most existing works merely use
hand-crafted appearance and motion features, we propose Appearance and Motion
DeepNet (AMDN) which utilizes deep neural networks to automatically learn
feature representations. To exploit the complementary information of both
appearance and motion patterns, we introduce a novel double fusion framework,
combining both the benefits of traditional early fusion and late fusion
strategies. Specifically, stacked denoising autoencoders are proposed to
separately learn both appearance and motion features as well as a joint
representation (early fusion). Based on the learned representations, multiple
one-class SVM models are used to predict the anomaly scores of each input,
which are then integrated with a late fusion strategy for final anomaly
detection. We evaluate the proposed method on two publicly available video
surveillance datasets, showing competitive performance with respect to
state-of-the-art approaches.
Comment: Oral paper in BMVC 201
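The two fusion stages described in this abstract can be sketched in a few lines. Early fusion is shown here as plain concatenation of an appearance vector and a motion vector (the input from which a joint representation could be learned), and late fusion as a weighted sum of the anomaly scores produced by the appearance, motion and joint models. The function names, the stream names and the weights are hypothetical; the abstract does not specify how the scores are integrated.

```python
def early_fusion(appearance, motion):
    """Concatenate appearance and motion features into one joint input vector."""
    return list(appearance) + list(motion)


def late_fusion(scores, weights):
    """Combine per-stream anomaly scores with a weighted sum.

    `scores` and `weights` are dicts keyed by stream name. The weighted sum
    is an assumed integration rule used only for illustration.
    """
    if set(scores) != set(weights):
        raise ValueError("scores and weights must cover the same streams")
    return sum(weights[name] * scores[name] for name in scores)


if __name__ == "__main__":
    joint_input = early_fusion([0.1, 0.4], [0.7])
    # Scores as they might come from three separate one-class models.
    stream_scores = {"appearance": 0.2, "motion": 0.8, "joint": 0.5}
    stream_weights = {"appearance": 0.25, "motion": 0.25, "joint": 0.5}
    print(joint_input, late_fusion(stream_scores, stream_weights))
```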
Insights from Classifying Visual Concepts with Multiple Kernel Learning
Combining information from various image features has become a standard
technique in concept recognition tasks. However, the optimal way of fusing the
resulting kernel functions is usually unknown in practical applications.
Multiple kernel learning (MKL) techniques make it possible to determine an optimal linear
combination of such similarity matrices. Classical approaches to MKL promote
sparse mixtures. Unfortunately, so-called 1-norm MKL variants are often
observed to be outperformed by an unweighted sum kernel. The contribution of
this paper is twofold: We apply a recently developed non-sparse MKL variant to
state-of-the-art concept recognition tasks within computer vision. We provide
insights on benefits and limits of non-sparse MKL and compare it against its
direct competitors, the sum kernel SVM and the sparse MKL. We report empirical
results for the PASCAL VOC 2009 Classification and ImageCLEF2010 Photo
Annotation challenge data sets. About to be submitted to PLoS ONE.
Comment: 18 pages, 8 tables, 4 figures, format deviating from PLoS ONE
submission format requirements for aesthetic reasons
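To make the kernel combination concrete, the sketch below builds a combined Gram matrix from several precomputed kernel matrices. With no weights it yields the unweighted sum kernel mentioned in the abstract; with weights it yields a linear combination. The weights are taken as given here, since learning them is the actual MKL problem and is not shown. Scaling the weights to unit p-norm with p > 1 is an assumption about how a non-sparse variant might constrain them, and the function names are hypothetical.

```python
def normalise_weights(weights, p=2.0):
    """Scale non-negative kernel weights to unit p-norm (assumed constraint)."""
    if any(w < 0 for w in weights):
        raise ValueError("kernel weights must be non-negative")
    norm = sum(w ** p for w in weights) ** (1.0 / p)
    if norm == 0:
        raise ValueError("at least one weight must be positive")
    return [w / norm for w in weights]


def combine_kernels(kernels, weights=None):
    """Return the weighted sum of square Gram matrices of equal size.

    With weights=None every kernel gets weight 1, i.e. the sum kernel.
    """
    if weights is None:
        weights = [1.0] * len(kernels)
    if len(weights) != len(kernels):
        raise ValueError("one weight per kernel is required")
    n = len(kernels[0])
    return [[sum(w * K[i][j] for w, K in zip(weights, kernels))
             for j in range(n)]
            for i in range(n)]


if __name__ == "__main__":
    k_colour = [[1.0, 0.0], [0.0, 1.0]]   # toy Gram matrix for one feature
    k_shape = [[1.0, 1.0], [1.0, 1.0]]    # toy Gram matrix for another
    print(combine_kernels([k_colour, k_shape]))
    print(combine_kernels([k_colour, k_shape], normalise_weights([3.0, 4.0])))
```

The combined matrix can then be passed to any SVM implementation that accepts a precomputed kernel.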
Artifact removal from electroencephalograms using a hybrid BSS-SVM algorithm
Artifacts such as eye blinks and heart rhythm (ECG) are the main interfering signals within electroencephalogram (EEG) measurements. We therefore propose a method for artifact removal that fuses support vector machines (SVMs) and blind source separation (BSS), exploiting carefully chosen statistical features of the independent components extracted from the EEGs. We use the second-order blind identification (SOBI) algorithm to separate the EEG into statistically independent sources, and SVMs to identify the artifact components so that they can be removed. The remaining independent components are remixed to reproduce the artifact-free EEGs. Objective and subjective assessment of the simulation results shows that the algorithm is successful in mitigating the interference within EEGs.
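The final remixing step can be sketched as follows, assuming the separation has already produced a mixing matrix and a set of source components, and that a classifier has already flagged which components are artifacts. Neither SOBI nor the SVM is implemented here; `remove_artifacts` and its arguments are hypothetical names for this example.

```python
def remove_artifacts(mixing, sources, artifact_indices):
    """Rebuild EEG channels after zeroing the components flagged as artifacts.

    mixing:  channels x components matrix (as estimated by a BSS method)
    sources: components x samples matrix of separated sources
    artifact_indices: component indices flagged by a classifier
    """
    # Zero out the flagged components and keep the rest unchanged.
    cleaned = [[0.0] * len(row) if k in artifact_indices else list(row)
               for k, row in enumerate(sources)]
    n_samples = len(sources[0])
    n_components = len(cleaned)
    # Remix: each channel is a linear combination of the remaining sources.
    return [[sum(mixing[c][k] * cleaned[k][t] for k in range(n_components))
             for t in range(n_samples)]
            for c in range(len(mixing))]


if __name__ == "__main__":
    mixing = [[1.0, 2.0], [3.0, 4.0]]
    sources = [[1.0, 1.0], [2.0, 0.0]]   # component 1 plays the artifact
    print(remove_artifacts(mixing, sources, {1}))
```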
GOGMA: Globally-Optimal Gaussian Mixture Alignment
Gaussian mixture alignment is a family of approaches that are frequently used
for robustly solving the point-set registration problem. However, since they
use local optimisation, they are susceptible to local minima and can only
guarantee local optimality. Consequently, their accuracy is strongly dependent
on the quality of the initialisation. This paper presents the first
globally-optimal solution to the 3D rigid Gaussian mixture alignment problem
under the L2 distance between mixtures. The algorithm, named GOGMA, employs a
branch-and-bound approach to search the space of 3D rigid motions SE(3),
guaranteeing global optimality regardless of the initialisation. The geometry
of SE(3) was used to find novel upper and lower bounds for the objective
function and local optimisation was integrated into the scheme to accelerate
convergence without voiding the optimality guarantee. The evaluation
empirically supported the optimality proof and showed that the method performed
much more robustly on two challenging datasets than an existing
globally-optimal registration solution.
Comment: Manuscript in press 2016 IEEE Conference on Computer Vision and
Pattern Recognition
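The L2 distance between two Gaussian mixtures has a closed form, because the integral of the product of two Gaussians is itself a Gaussian evaluated at the difference of the means, with the covariances added. The sketch below evaluates that distance for mixtures with isotropic components. It only computes the objective for fixed mixtures; the branch-and-bound search over SE(3) is not shown. The isotropic restriction and the `(weight, mean, variance)` tuple layout are assumptions made to keep the example short.

```python
import math


def gaussian_overlap(mean_a, var_a, mean_b, var_b):
    """Integral of the product of two isotropic Gaussians.

    Equals a Gaussian density with variance var_a + var_b evaluated at
    the difference between the two means.
    """
    s = var_a + var_b
    dim = len(mean_a)
    sq_dist = sum((a - b) ** 2 for a, b in zip(mean_a, mean_b))
    return (2.0 * math.pi * s) ** (-dim / 2.0) * math.exp(-sq_dist / (2.0 * s))


def cross_term(mix_a, mix_b):
    """Integral of the product of two mixtures of (weight, mean, variance)."""
    return sum(wa * wb * gaussian_overlap(ma, va, mb, vb)
               for wa, ma, va in mix_a
               for wb, mb, vb in mix_b)


def l2_distance(mix_a, mix_b):
    """Integral of the squared difference between two mixture densities."""
    return (cross_term(mix_a, mix_a)
            - 2.0 * cross_term(mix_a, mix_b)
            + cross_term(mix_b, mix_b))


if __name__ == "__main__":
    fixed = [(0.5, (0.0, 0.0, 0.0), 1.0), (0.5, (1.0, 0.0, 0.0), 0.5)]
    moved = [(w, (m[0] + 2.0, m[1], m[2]), v) for w, m, v in fixed]
    print(l2_distance(fixed, fixed), l2_distance(fixed, moved))
```

Under a rigid motion the two self terms stay constant for isotropic components, so only the cross term changes as the motion is varied.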
Comparison of different classification algorithms for underwater target discrimination
Includes bibliographical references. Classification of underwater targets from the acoustic backscattered signals is considered here. Several different classification algorithms are tested and benchmarked, not only for their performance but also to gain insight into the properties of the feature space. Results on a wideband 80-kHz acoustic backscattered data set collected for six different objects are presented in terms of the receiver operating characteristic (ROC) and the robustness of the classifiers with respect to reverberation. This work was supported by the Office of Naval Research, Biosonar Program, under Grant N00014-99-1-0166 and Grant N00014-01-1-0307. Data and technical support were provided by the NSWC, Coastal Systems Station, Panama City, FL.
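For readers unfamiliar with the ROC, the sketch below computes ROC points and the area under the curve from classifier scores and binary labels by sweeping a threshold over the distinct score values. The toy scores and labels are made up for illustration and have no connection to the backscatter data set.

```python
def roc_points(scores, labels):
    """Return (false positive rate, true positive rate) pairs.

    A sample is predicted positive when its score is at least the threshold;
    thresholds are swept from the highest distinct score to the lowest.
    """
    positives = sum(1 for y in labels if y)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("both classes must be present")

    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and not y)
        points.append((fp / negatives, tp / positives))
    return points


def area_under_curve(points):
    """Trapezoidal area under a curve given as ordered (x, y) points."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


if __name__ == "__main__":
    toy_scores = [0.9, 0.8, 0.3, 0.1]
    toy_labels = [1, 0, 1, 0]
    curve = roc_points(toy_scores, toy_labels)
    print(curve, area_under_curve(curve))
```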