Multimodal Prediction based on Graph Representations
This paper proposes a learning model, based on rank-fusion graphs, for
general applicability in multimodal prediction tasks, such as multimodal
regression and image classification. Rank-fusion graphs encode information from
multiple descriptors and retrieval models, thus being able to capture
underlying relationships between modalities, samples, and the collection
itself. The solution is based on the encoding of multiple ranks for a query (or
test sample), defined according to different criteria, into a graph. Later, we
project the generated graph into an induced vector space, creating fusion
vectors, targeting broader generality and efficiency. A fusion vector estimator
is then built to infer whether a multimodal input object refers to a class or
not. Our method yields a fusion model that outperforms early-fusion
and late-fusion alternatives. Experiments on multiple multimodal and
visual datasets, with several descriptors and retrieval models, show
that our learning model is highly effective across prediction scenarios
involving visual, textual, and multimodal features, achieving better
effectiveness than state-of-the-art methods.
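The rank-to-vector encoding described above can be illustrated with a simplified sketch. The function and weighting scheme below are hypothetical (the paper's actual rank-fusion graphs also encode relationships between modalities and samples via edges); this sketch only shows the core idea of aggregating multiple ranked lists for one query into a single vector in an induced space:

```python
from collections import defaultdict

def rank_fusion_vector(ranked_lists, vocabulary, k=5):
    """Encode multiple ranked lists for one query (or test sample)
    into a single fusion vector -- a simplified stand-in for the
    paper's rank-fusion graph projection.

    ranked_lists: list of rankings, each a list of item ids,
                  one ranking per descriptor/retrieval model.
    vocabulary:   fixed ordering of all item ids, defining the
                  induced vector space.
    """
    # Node weights: items accumulate reciprocal-rank scores
    # across all rankers (a common rank-aggregation heuristic,
    # assumed here for illustration).
    node_weight = defaultdict(float)
    for ranking in ranked_lists:
        for pos, item in enumerate(ranking[:k]):
            node_weight[item] += 1.0 / (pos + 1)

    # Project onto the induced vector space: one dimension per
    # vocabulary item; absent items get weight zero.
    return [node_weight.get(item, 0.0) for item in vocabulary]

# Two rankers (e.g. a visual and a textual descriptor) rank the
# same collection differently; the fusion vector reflects both.
vocab = ["a", "b", "c", "d"]
visual_rank = ["a", "b", "c", "d"]
textual_rank = ["b", "d", "a", "c"]
vec = rank_fusion_vector([visual_rank, textual_rank], vocab)
```

A downstream estimator (e.g. any off-the-shelf classifier) can then be trained on such fusion vectors to decide class membership, which is the prediction setting the abstract describes.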
Automatic detection of passable roads after floods in remote sensed and social media data
This paper addresses the problem of flood classification and flood-aftermath detection based on both social
media and satellite imagery. Automatic detection of disasters such as floods remains a very challenging task;
the focus here lies on identifying roads that remain passable during floods. Two novel solutions are presented, which
were developed for two corresponding tasks at the MediaEval 2018 benchmarking challenge. The tasks are (i)
identification of images providing evidence for road passability and (ii) differentiation and detection of passable
and non-passable roads in images from two complementary sources of information. For the first challenge,
we mainly rely on object and scene-level features extracted through multiple deep models pre-trained on the
ImageNet and Places datasets. The object and scene-level features are then combined using early, late and double
fusion techniques. To determine whether a road in a satellite image is passable by vehicles, we rely
on Convolutional Neural Networks and a transfer learning-based classification approach. The evaluation of the
proposed methods is carried out on the large-scale datasets provided for the benchmark competition. The results
demonstrate a significant performance improvement over recent state-of-the-art approaches.
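The early- and late-fusion schemes mentioned for the first task can be sketched as follows. The feature vectors and the probability values are made up for illustration, and the averaging weights in late fusion are an assumption; the abstract does not specify the exact combination rule ("double fusion" combines both schemes):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical deep features for one image: object-level features
# (from an ImageNet-pretrained model) and scene-level features
# (from a Places-pretrained model).
object_feat = rng.standard_normal(8)
scene_feat = rng.standard_normal(8)

def early_fusion(feature_list):
    # Early fusion: concatenate the per-model features and feed
    # the joint vector to a single classifier.
    return np.concatenate(feature_list)

def late_fusion(scores, weights=None):
    # Late fusion: each model classifies on its own; the per-model
    # scores are combined afterwards (weighted average here).
    scores = np.asarray(scores, dtype=float)
    if weights is None:
        weights = np.full(len(scores), 1.0 / len(scores))
    return float(np.asarray(weights) @ scores)

fused = early_fusion([object_feat, scene_feat])  # joint 16-dim vector
# Suppose each model outputs a "road is passable" probability:
p = late_fusion([0.8, 0.6])  # equal-weight average -> 0.7
```

Early fusion lets the classifier exploit cross-feature interactions, while late fusion keeps the models independent and only merges their decisions; which works better is an empirical question, which is why the abstract reports trying both (plus their combination).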