10,873 research outputs found
End-to-End Cross-Modality Retrieval with CCA Projections and Pairwise Ranking Loss
Cross-modality retrieval encompasses retrieval tasks where the fetched items
are of a different type than the search query, e.g., retrieving pictures
relevant to a given text query. The state-of-the-art approach to cross-modality
retrieval relies on learning a joint embedding space of the two modalities,
where items from either modality are retrieved using nearest-neighbor search.
In this work, we introduce a neural network layer based on Canonical
Correlation Analysis (CCA) that learns better embedding spaces by analytically
computing projections that maximize correlation. In contrast to previous
approaches, the CCA Layer (CCAL) allows us to combine existing objectives for
embedding space learning, such as pairwise ranking losses, with the optimal
projections of CCA. We show the effectiveness of our approach for
cross-modality retrieval on three different scenarios (text-to-image,
audio-sheet-music and zero-shot retrieval), surpassing both Deep CCA and a
multi-view network using freely learned projections optimized by a pairwise
ranking loss, especially when little training data is available (the code for
all three methods is released at: https://github.com/CPJKU/cca_layer).Comment: Preliminary version of a paper published in the International Journal
of Multimedia Information Retrieva
Learning Task Relatedness in Multi-Task Learning for Images in Context
Multimedia applications often require concurrent solutions to multiple tasks.
These tasks hold clues to each-others solutions, however as these relations can
be complex this remains a rarely utilized property. When task relations are
explicitly defined based on domain knowledge multi-task learning (MTL) offers
such concurrent solutions, while exploiting relatedness between multiple tasks
performed over the same dataset. In most cases however, this relatedness is not
explicitly defined and the domain expert knowledge that defines it is not
available. To address this issue, we introduce Selective Sharing, a method that
learns the inter-task relatedness from secondary latent features while the
model trains. Using this insight, we can automatically group tasks and allow
them to share knowledge in a mutually beneficial way. We support our method
with experiments on 5 datasets in classification, regression, and ranking tasks
and compare to strong baselines and state-of-the-art approaches showing a
consistent improvement in terms of accuracy and parameter counts. In addition,
we perform an activation region analysis showing how Selective Sharing affects
the learned representation.Comment: To appear in ICMR 2019 (Oral + Lightning Talk + Poster
CASSL: Curriculum Accelerated Self-Supervised Learning
Recent self-supervised learning approaches focus on using a few thousand data
points to learn policies for high-level, low-dimensional action spaces.
However, scaling this framework for high-dimensional control require either
scaling up the data collection efforts or using a clever sampling strategy for
training. We present a novel approach - Curriculum Accelerated Self-Supervised
Learning (CASSL) - to train policies that map visual information to high-level,
higher- dimensional action spaces. CASSL orders the sampling of training data
based on control dimensions: the learning and sampling are focused on few
control parameters before other parameters. The right curriculum for learning
is suggested by variance-based global sensitivity analysis of the control
space. We apply our CASSL framework to learning how to grasp using an adaptive,
underactuated multi-fingered gripper, a challenging system to control. Our
experimental results indicate that CASSL provides significant improvement and
generalization compared to baseline methods such as staged curriculum learning
(8% increase) and complete end-to-end learning with random exploration (14%
improvement) tested on a set of novel objects
- …