CentralNet: a Multilayer Approach for Multimodal Fusion
This paper proposes a novel multimodal fusion approach, aiming to produce
the best possible decisions by integrating information coming from multiple media.
While most of the past multimodal approaches either work by projecting the
features of different modalities into the same space, or by coordinating the
representations of each modality through the use of constraints, our approach
borrows from both visions. More specifically, assuming each modality can be
processed by a separate deep convolutional network, allowing decisions to be taken
independently from each modality, we introduce a central network linking the
modality specific networks. This central network not only provides a common
feature embedding but also regularizes the modality specific networks through
the use of multi-task learning. The proposed approach is validated on 4
different computer vision tasks on which it consistently improves the accuracy
of existing multimodal fusion approaches.
Deep Architectures and Ensembles for Semantic Video Classification
This work addresses the problem of accurate semantic labelling of short
videos. To this end, we evaluate a multitude of different deep nets, ranging from
traditional recurrent neural networks (LSTM, GRU) and temporal agnostic networks
(FV, VLAD, BoW) to fully connected neural networks with mid-stage AV fusion and others.
Additionally, we also propose a residual architecture-based DNN for video
classification, with state-of-the-art classification performance at
significantly reduced complexity. Furthermore, we propose four new approaches
to diversity-driven multi-net ensembling, one based on fast correlation measure
and three incorporating a DNN-based combiner. We show that significant
performance gains can be achieved by ensembling diverse nets and we investigate
factors contributing to high diversity. Based on the extensive YouTube-8M
dataset, we provide an in-depth evaluation and analysis of their behaviour. We
show that the performance of the ensemble is state-of-the-art, achieving the
highest accuracy on the YouTube-8M Kaggle test data. The performance of the
ensemble of classifiers was also evaluated on the HMDB51 and UCF101 datasets,
and we show that the resulting method achieves accuracy comparable with
state-of-the-art methods using similar input features.
Generalised Decision Level Ensemble Method for Classifying Multi-media Data
In recent decades, multimedia data have been commonly generated and used in various domains, such as healthcare and social media, due to their ability to capture rich information. But as they are unstructured and separated, how to fuse and integrate multimedia datasets and then learn from them effectively has been a major challenge for machine learning. We present a novel generalised decision level ensemble method (GDLEM) that combines the multimedia datasets at decision level. After extracting features from each of the multimedia datasets separately, the method trains models independently on each media dataset and then employs a generalised selection function to choose the appropriate models to construct a heterogeneous ensemble. The selection function is defined as a weighted combination of two criteria: the accuracy of individual models and the diversity among the models. The framework is tested on multimedia data and compared with other heterogeneous ensembles. The results show that the GDLEM is more flexible and effective.
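The selection step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract only states that the selection function is a weighted combination of accuracy and diversity, so the greedy search, the pairwise disagreement measure used as "diversity", and the names `select_ensemble` and `weight` are all assumptions made for illustration.

```python
def accuracy(pred, truth):
    """Fraction of samples a model labels correctly."""
    return sum(p == t for p, t in zip(pred, truth)) / len(truth)


def disagreement(pred_a, pred_b):
    """Fraction of samples on which two models predict different labels.
    Used here as an assumed stand-in for the paper's diversity criterion."""
    return sum(a != b for a, b in zip(pred_a, pred_b)) / len(pred_a)


def select_ensemble(predictions, truth, size, weight=0.5):
    """Greedily pick `size` models by weight * accuracy + (1 - weight) * diversity.

    predictions: dict mapping a model name to its list of predicted labels.
    Diversity of a candidate is its mean disagreement with the models
    already chosen (hypothetical choice, not taken from the paper).
    """
    acc = {name: accuracy(pred, truth) for name, pred in predictions.items()}
    # Seed the ensemble with the single most accurate model.
    chosen = [max(predictions, key=lambda name: acc[name])]
    while len(chosen) < size:
        def score(name):
            diversity = sum(
                disagreement(predictions[name], predictions[c]) for c in chosen
            ) / len(chosen)
            return weight * acc[name] + (1 - weight) * diversity

        candidates = [name for name in predictions if name not in chosen]
        chosen.append(max(candidates, key=score))
    return chosen
```

With `weight` close to 1 the sketch reduces to picking the most accurate models; lowering it favours models that err on different samples, which in a multimedia setting tends to mean models trained on different media.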
Large-Scale Online Semantic Indexing of Biomedical Articles via an Ensemble of Multi-Label Classification Models
Background: In this paper we present the approaches and methods employed in
order to deal with a large-scale multi-label semantic indexing task of
biomedical papers. This work was mainly implemented within the context of the
BioASQ challenge of 2014. Methods: The main contribution of this work is a
multi-label ensemble method that incorporates a McNemar statistical
significance test in order to validate the combination of the constituent
machine learning algorithms. Some secondary contributions include a study on
the temporal aspects of the BioASQ corpus (observations apply also to the
BioASQ's super-set, the PubMed articles collection) and the proper adaptation
of the algorithms used to deal with this challenging classification task.
Results: The ensemble method we developed is compared to other approaches in
experimental scenarios with subsets of the BioASQ corpus giving positive
results. During the BioASQ 2014 challenge we obtained first place in
the first batch and third place in the two following batches. Our success in the
BioASQ challenge proved that a fully automated machine-learning approach, which
does not implement any heuristics and rule-based approaches, can be highly
competitive and outperform other approaches in similar challenging contexts.
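For readers unfamiliar with the McNemar test mentioned above, the following is a minimal sketch of the standard test statistic (with continuity correction) for comparing two classifiers on the same examples. How the authors integrate the test into their multi-label ensemble is not stated in the abstract; the function names and the per-example correctness lists are assumptions for illustration only.

```python
# Critical value of the chi-squared distribution with 1 degree of freedom
# at significance level 0.05.
CHI2_CRITICAL_05 = 3.841


def mcnemar_statistic(correct_a, correct_b):
    """McNemar statistic with continuity correction.

    correct_a, correct_b: lists of booleans, one per test example, telling
    whether classifier A (resp. B) got that example right.
    """
    # Only the discordant pairs matter: exactly one classifier is right.
    only_a = sum(a and not b for a, b in zip(correct_a, correct_b))
    only_b = sum(b and not a for a, b in zip(correct_a, correct_b))
    if only_a + only_b == 0:
        return 0.0  # the classifiers never disagree
    return (abs(only_a - only_b) - 1) ** 2 / (only_a + only_b)


def significantly_different(correct_a, correct_b):
    """True if the two classifiers differ at the 0.05 level."""
    return mcnemar_statistic(correct_a, correct_b) > CHI2_CRITICAL_05
```

One plausible use, again an assumption, is to keep a single model for a label when it is significantly better than the alternatives and to combine models only when the test finds no significant difference.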
Structured learning of metric ensembles with application to person re-identification
Matching individuals across non-overlapping camera networks, known as person
re-identification, is a fundamentally challenging problem due to the large
visual appearance changes caused by variations of viewpoints, lighting, and
occlusion. Approaches in the literature can be categorized into two streams: the
first stream is to develop reliable features against realistic conditions by
combining several visual features in a pre-defined way; the second stream is to
learn a metric from training data to ensure strong inter-class differences and
intra-class similarities. However, seeking an optimal combination of visual
features which is generic yet adaptive to different benchmarks is an unsolved
problem, and metric learning models easily get over-fitted due to the scarcity
of training data in person re-identification. In this paper, we propose two
effective structured learning based approaches which explore the adaptive
effects of visual features in recognizing persons in different benchmark data
sets. Our framework is built on the basis of multiple low-level visual features
with an optimal ensemble of their metrics. We formulate two optimization
algorithms, CMCtriplet and CMCstruct, which directly optimize evaluation
measures commonly used in person re-identification, also known as the
Cumulative Matching Characteristic (CMC) curve.
Comment: 16 pages. Extended version of "Learning to Rank in Person
Re-Identification With Metric Ensembles", at
http://www.cv-foundation.org/openaccess/content_cvpr_2015/html/Paisitkriangkrai_Learning_to_Rank_2015_CVPR_paper.html.
arXiv admin note: text overlap with arXiv:1503.0154
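As background for the evaluation measure named in this abstract, here is a minimal sketch of how a CMC curve is commonly computed in the single-shot setting, where every query identity appears exactly once in the gallery. It does not reproduce CMCtriplet or CMCstruct; the function name `cmc_curve` and the input layout are assumptions for illustration.

```python
def cmc_curve(distances, query_ids, gallery_ids, max_rank):
    """Cumulative Matching Characteristic curve.

    distances[i][j]: distance between query i and gallery item j
                     (smaller means more similar).
    Returns a list whose entry k-1 is the fraction of queries whose
    correct match appears among the k closest gallery items.
    Assumes every query identity is present in the gallery.
    """
    hits = [0] * max_rank
    for row, query_id in zip(distances, query_ids):
        # Gallery indices sorted from closest to farthest.
        order = sorted(range(len(row)), key=lambda j: row[j])
        # 0-based position of the correct identity in the ranking.
        rank = next(pos for pos, j in enumerate(order)
                    if gallery_ids[j] == query_id)
        # A match at position `rank` counts for every cut-off beyond it.
        for k in range(rank, max_rank):
            hits[k] += 1
    return [h / len(query_ids) for h in hits]
```

The first entry of the returned list is the rank-1 recognition rate, which is the figure most often quoted in person re-identification benchmarks.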