13,214 research outputs found
Zero-Shot Deep Domain Adaptation
Domain adaptation is an important tool to transfer knowledge about a task
(e.g. classification) learned in a source domain to a second, or target domain.
Current approaches assume that task-relevant target-domain data is available
during training. We demonstrate how to perform domain adaptation when no such
task-relevant target-domain data is available. To tackle this issue, we propose
zero-shot deep domain adaptation (ZDDA), which uses privileged information from
task-irrelevant dual-domain pairs. ZDDA learns a source-domain representation
which is not only tailored for the task of interest but also close to the
target-domain representation. Therefore, the source-domain task of interest
solution (e.g. a classifier for classification tasks) which is jointly trained
with the source-domain representation can be applicable to both the source and
target representations. Using the MNIST, Fashion-MNIST, NIST, EMNIST, and SUN
RGB-D datasets, we show that ZDDA can perform domain adaptation in
classification tasks without access to task-relevant target-domain training
data. We also extend ZDDA to perform sensor fusion in the SUN RGB-D scene
classification task by simulating task-relevant target-domain representations
with task-relevant source-domain data. To the best of our knowledge, ZDDA is
the first domain adaptation and sensor fusion method which requires no
task-relevant target-domain data. The underlying principle is not particular to
computer vision data, but should be extensible to other domains.Comment: This paper is accepted to the European Conference on Computer Vision
(ECCV), 201
Multi-Modal Multi-Scale Deep Learning for Large-Scale Image Annotation
Image annotation aims to annotate a given image with a variable number of
class labels corresponding to diverse visual concepts. In this paper, we
address two main issues in large-scale image annotation: 1) how to learn a rich
feature representation suitable for predicting a diverse set of visual concepts
ranging from object, scene to abstract concept; 2) how to annotate an image
with the optimal number of class labels. To address the first issue, we propose
a novel multi-scale deep model for extracting rich and discriminative features
capable of representing a wide range of visual concepts. Specifically, a novel
two-branch deep neural network architecture is proposed which comprises a very
deep main network branch and a companion feature fusion network branch designed
for fusing the multi-scale features computed from the main branch. The deep
model is also made multi-modal by taking noisy user-provided tags as model
input to complement the image input. For tackling the second issue, we
introduce a label quantity prediction auxiliary task to the main label
prediction task to explicitly estimate the optimal label number for a given
image. Extensive experiments are carried out on two large-scale image
annotation benchmark datasets and the results show that our method
significantly outperforms the state-of-the-art.Comment: Submited to IEEE TI
- …