Bidirectional One-Shot Unsupervised Domain Mapping
We study the problem of mapping between a domain A, in which there is a
single training sample, and a domain B, for which we have a richer training
set. The method we present is able to perform this mapping in both directions.
For example, we can transfer all MNIST images to the visual domain captured by
a single SVHN image and transform the SVHN image to the domain of the MNIST
images. Our method is based on employing one encoder and one decoder for each
domain, without utilizing weight sharing. The autoencoder of the single-sample
domain is trained to match both this sample and the latent space of domain B.
Our results demonstrate convincing mapping between domains, where either the
source or the target domain is defined by a single sample, far surpassing
existing solutions. Our code is made publicly available at
https://github.com/tomercohen11/BiOST
Comment: Accepted to ICCV 2019
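As a rough illustration of the setup this abstract describes, here is a minimal PyTorch sketch of two weight-independent autoencoders, where the single-sample domain is trained to match both its one sample and the other domain's latent space; the layer sizes, loss form, and loss weight are our assumptions, not the authors' released code (which is at the URL above):

import torch
import torch.nn as nn

# One encoder/decoder pair per domain, with no weight sharing between them.
class AutoEncoder(nn.Module):
    def __init__(self, dim=784, latent=64):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(dim, 256), nn.ReLU(),
                                 nn.Linear(256, latent))
        self.dec = nn.Sequential(nn.Linear(latent, 256), nn.ReLU(),
                                 nn.Linear(256, dim))

ae_a = AutoEncoder()  # domain A: a single training sample x_a
ae_b = AutoEncoder()  # domain B: the richer training set

def step_a(x_a, x_b_batch, opt, w_latent=1.0):
    """One update of domain A's autoencoder: reconstruct the single sample
    while matching domain B's latent statistics (assumed loss form)."""
    z_a = ae_a.enc(x_a)
    with torch.no_grad():
        z_b = ae_b.enc(x_b_batch)
    loss = nn.functional.mse_loss(ae_a.dec(z_a), x_a) \
         + w_latent * nn.functional.mse_loss(z_a.mean(0), z_b.mean(0))
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

def a_to_b(x_a):
    """Cross-domain mapping: encode with one domain, decode with the other."""
    return ae_b.dec(ae_a.enc(x_a))

Mapping in the opposite direction is the symmetric composition ae_a.dec(ae_b.enc(x_b)).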
Transductive Zero-Shot Learning with a Self-Training Dictionary Approach
As an important and challenging problem in computer vision, zero-shot
learning (ZSL) aims at automatically recognizing the instances from unseen
object classes without training data. To address this problem, ZSL is usually
carried out in the following two aspects: 1) capturing the domain distribution
connections between seen-class data and unseen-class data; and 2) modeling
the semantic interactions between the image feature space and the label
embedding space. Motivated by these observations, we propose a bidirectional
mapping based semantic relationship modeling scheme that seeks cross-modal
knowledge transfer by simultaneously projecting the image features and label
embeddings into a common latent space. That is, bidirectional connections run
from the image feature space to the latent space and from the label embedding
space to the latent space. To
deal with the domain shift problem, we further present a transductive learning
approach that formulates the class prediction problem in an iterative refining
process, where the object classification capacity is progressively reinforced
through bootstrapping-based model updating over highly reliable instances.
Experimental results on three benchmark datasets (AwA, CUB and SUN) demonstrate
the effectiveness of the proposed approach against state-of-the-art
approaches.
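A hedged sketch of the bootstrapping-based transductive refinement described above, assuming nearest-prototype classification in the shared latent space; the confidence threshold, prototype-update rule, and iteration count are illustrative assumptions:

import numpy as np

def transductive_refine(Z_unseen, prototypes, n_iters=5, keep_frac=0.2):
    """Repeatedly assign unseen instances to the nearest class prototype in
    the shared latent space, then move each prototype toward its most
    reliable (highest-confidence) assigned instances.
    Z_unseen: (n, d) latent features; prototypes: (c, d) class embeddings."""
    P = prototypes.copy()
    for _ in range(n_iters):
        zn = Z_unseen / np.linalg.norm(Z_unseen, axis=1, keepdims=True)
        pn = P / np.linalg.norm(P, axis=1, keepdims=True)
        sim = zn @ pn.T                     # cosine similarity, (n, c)
        labels = sim.argmax(1)
        conf = sim.max(1)
        thresh = np.quantile(conf, 1.0 - keep_frac)
        for c in range(P.shape[0]):
            mask = (labels == c) & (conf >= thresh)  # highly reliable instances
            if mask.any():
                P[c] = 0.5 * P[c] + 0.5 * Z_unseen[mask].mean(0)
    return labels, P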
Text2Node: a Cross-Domain System for Mapping Arbitrary Phrases to a Taxonomy
Electronic health record (EHR) systems are used extensively throughout the
healthcare domain. However, data interchangeability between EHR systems is
limited due to the use of different coding standards across systems. Existing
methods of mapping between coding standards, based on manual expert mapping,
dictionary mapping, symbolic NLP, and classification, are unscalable and cannot
accommodate large-scale EHR datasets.
In this work, we present Text2Node, a cross-domain mapping system capable of
mapping medical phrases to concepts in a large taxonomy (such as SNOMED CT).
The system is designed to generalize from a limited set of training samples and
map phrases to elements of the taxonomy that are not covered by training data.
As a result, our system is scalable, robust to wording variants between coding
systems and can output highly relevant concepts when no exact concept exists in
the target taxonomy. Text2Node operates in three main stages: first, the
lexicon is mapped to word embeddings; second, the taxonomy is vectorized using
node embeddings; and finally, the mapping function is trained to connect the
two embedding spaces. We compared multiple algorithms and architectures for
each stage of the training, including GloVe and FastText word embeddings, CNN
and Bi-LSTM mapping functions, and node2vec for node embeddings. We confirmed
the robustness and generalization properties of Text2Node by mapping ICD-9-CM
Diagnosis phrases to SNOMED CT and by zero-shot training at comparable
accuracy.
This system is a novel methodological contribution to the task of normalizing
and linking phrases to a taxonomy, advancing data interchangeability in
healthcare. When applied, the system can use electronic health records to
generate an embedding that incorporates taxonomical medical knowledge to
improve clinical predictive models.
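A minimal sketch of the third Text2Node stage, a trained mapping from the word-embedding space into the node-embedding space, followed by nearest-concept retrieval; the mean-pooled phrase encoder and all dimensions are assumptions (the paper also evaluates CNN and Bi-LSTM mapping functions):

import torch
import torch.nn as nn

class PhraseToNode(nn.Module):
    """Maps a phrase's word embeddings to the node-embedding space learned
    over the taxonomy (e.g., with node2vec)."""
    def __init__(self, word_dim=300, node_dim=128):
        super().__init__()
        self.map = nn.Sequential(nn.Linear(word_dim, 256), nn.ReLU(),
                                 nn.Linear(256, node_dim))

    def forward(self, word_vecs):           # (n_tokens, word_dim)
        phrase = word_vecs.mean(0)          # simple pooled phrase vector
        return self.map(phrase)

def nearest_concept(pred, node_embs):
    """Retrieve the taxonomy concept whose node embedding is closest, so a
    phrase with no exact match still maps to a highly relevant concept."""
    sims = nn.functional.cosine_similarity(pred.unsqueeze(0), node_embs)
    return sims.argmax().item()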
Class label autoencoder for zero-shot learning
Existing zero-shot learning (ZSL) methods usually learn a projection function
between a feature space and a semantic embedding space (text or attribute space)
from the seen training classes, and then apply it to the unseen test classes.
However, a single projection function cannot be shared across the feature space
and multiple semantic embedding spaces, which differ in how they describe the
semantic information of the same class. To deal with this issue, we present a
novel method for ZSL based on learning a class label autoencoder (CLA). CLA can
not only build a uniform framework for adapting to multi-semantic embedding
spaces, but also construct the encoder-decoder mechanism for constraining the
bidirectional projection between the feature space and the class label space.
Moreover, CLA can jointly consider the relationships among feature classes and
the relevance among semantic classes to improve zero-shot classification. The
CLA solution can provide both unseen class labels and relations between the
different class representations (feature or semantic information) that encode
the intrinsic structure of the classes. Extensive experiments demonstrate that
CLA outperforms state-of-the-art methods on four benchmark datasets: AwA, CUB,
Dogs and ImNet-2.
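As a rough illustration of an encoder-decoder constraint on the bidirectional projection between the feature space and the class label space, here is a tied-weight sketch in PyTorch; the tied decoder, loss weights, and optimizer are assumptions rather than the paper's exact formulation:

import torch

def train_cla_sketch(X, Y, epochs=200, lam=0.1, lr=1e-2):
    """A single projection W encodes features X (n, d) into the class label
    space Y (n, k), and its transpose decodes back, so the projection is
    constrained in both directions."""
    d, k = X.shape[1], Y.shape[1]
    W = torch.empty(d, k, requires_grad=True)
    torch.nn.init.xavier_uniform_(W)
    opt = torch.optim.Adam([W], lr=lr)
    for _ in range(epochs):
        enc = X @ W              # feature space -> class label space
        dec = enc @ W.T          # class label space -> feature space
        loss = ((enc - Y) ** 2).mean() + lam * ((dec - X) ** 2).mean()
        opt.zero_grad(); loss.backward(); opt.step()
    return W.detach()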
Domain-Invariant Projection Learning for Zero-Shot Recognition
Zero-shot learning (ZSL) aims to recognize unseen object classes without any
training samples, which can be regarded as a form of transfer learning from
seen classes to unseen ones. This is made possible by learning a projection
between a feature space and a semantic space (e.g. attribute space). Key to ZSL
is thus to learn a projection function that is robust against the often large
domain gap between the seen and unseen classes. In this paper, we propose a
novel ZSL model termed domain-invariant projection learning (DIPL). Our model
has two novel components: (1) A domain-invariant feature self-reconstruction
task is introduced to the seen/unseen class data, resulting in a simple linear
formulation that casts ZSL into a min-min optimization problem. Solving the
problem is non-trivial, and a novel iterative algorithm is formulated as the
solver, with rigorous theoretical analysis provided. (2) To further
align the two domains via the learned projection, shared semantic structure
among seen and unseen classes is explored via forming superclasses in the
semantic space. Extensive experiments show that our model outperforms the
state-of-the-art alternatives by significant margins.
Comment: Accepted to NIPS 2018
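A hedged sketch of one way such a min-min problem can be solved by alternation, with the inner minimization over unseen class assignments and the outer minimization over the projection; the closed-form ridge solver and equal weighting are our assumptions, not the paper's algorithm:

import numpy as np

def min_min_sketch(X_s, Y_s, X_u, P_u, n_iters=5, lam=1.0):
    """X_s: (n, d) seen features; Y_s: (n, k) seen semantics;
    X_u: (m, d) unseen features; P_u: (c, k) unseen class prototypes."""
    Y_u = None
    for _ in range(n_iters):
        # Outer min: ridge regression from features to semantics, on seen
        # data plus the currently pseudo-labelled unseen data.
        A, B = (X_s, Y_s) if Y_u is None else (np.vstack([X_s, X_u]),
                                               np.vstack([Y_s, Y_u]))
        W = np.linalg.solve(A.T @ A + lam * np.eye(A.shape[1]), A.T @ B)
        # Inner min: assign each unseen sample to its closest prototype.
        dists = (((X_u @ W)[:, None, :] - P_u[None, :, :]) ** 2).sum(-1)
        idx = dists.argmin(1)
        Y_u = P_u[idx]
    return W, idx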
ALICE: Towards Understanding Adversarial Learning for Joint Distribution Matching
We investigate the non-identifiability issues associated with bidirectional
adversarial training for joint distribution matching. Within a framework of
conditional entropy, we propose both adversarial and non-adversarial approaches
to learn desirable matched joint distributions for unsupervised and supervised
tasks. We unify a broad family of adversarial models as joint distribution
matching problems. Our approach stabilizes learning of unsupervised
bidirectional adversarial learning methods. Further, we introduce an extension
for semi-supervised learning tasks. Theoretical results are validated on
synthetic data and real-world applications.
Comment: NIPS 2017 (22 pages); short version (9 pages):
http://people.duke.edu/~cl319/doc/papers/nips_2017_alice.pdf
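For intuition, a minimal ALI/BiGAN-style sketch of bidirectional adversarial joint distribution matching, with a reconstruction term standing in for ALICE's conditional-entropy regularizer; module sizes and the loss weight are assumptions:

import torch
import torch.nn as nn

x_dim, z_dim = 784, 64
E = nn.Sequential(nn.Linear(x_dim, 256), nn.ReLU(), nn.Linear(256, z_dim))
G = nn.Sequential(nn.Linear(z_dim, 256), nn.ReLU(), nn.Linear(256, x_dim))
D = nn.Sequential(nn.Linear(x_dim + z_dim, 256), nn.ReLU(), nn.Linear(256, 1))

def losses(x, w_rec=1.0):
    """The discriminator D tries to tell encoder pairs (x, E(x)) from
    decoder pairs (G(z), z); a reconstruction term (a conditional-entropy
    surrogate) addresses the non-identifiability. In a real training loop,
    d_loss updates D only, while g_loss updates E and G."""
    z = torch.randn(x.size(0), z_dim)
    pair_q = torch.cat([x, E(x)], dim=1)    # sample from the encoder joint
    pair_p = torch.cat([G(z), z], dim=1)    # sample from the decoder joint
    bce = nn.functional.binary_cross_entropy_with_logits
    d_loss = bce(D(pair_q), torch.ones(x.size(0), 1)) + \
             bce(D(pair_p), torch.zeros(x.size(0), 1))
    g_loss = bce(D(pair_q), torch.zeros(x.size(0), 1)) + \
             bce(D(pair_p), torch.ones(x.size(0), 1)) + \
             w_rec * nn.functional.mse_loss(G(E(x)), x)
    return d_loss, g_loss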
Zero and Few Shot Learning with Semantic Feature Synthesis and Competitive Learning
Zero-shot learning (ZSL) is made possible by learning a projection function
between a feature space and a semantic space (e.g., an attribute space). Key to
ZSL is thus to learn a projection that is robust against the often large domain
gap between the seen and unseen class domains. In this work, this is achieved
by unseen class data synthesis and robust projection function learning.
Specifically, a novel semantic data synthesis strategy is proposed, by which
semantic class prototypes (e.g., attribute vectors) are used to simply perturb
seen class data to generate unseen class data. As in any data
synthesis/hallucination approach, there are ambiguities and uncertainties about
how well the synthesised data can capture the targeted unseen class data
distribution. To cope with this, the second contribution of this work is a
novel projection learning model termed competitive bidirectional projection
learning (BPL) designed to best utilise the ambiguous synthesised data.
Specifically, we assume that each synthesised data point can belong to any
unseen class, and the two most likely class candidates are exploited to learn a
robust projection function in a competitive fashion. As a third contribution,
we show that the proposed ZSL model can be easily extended to few-shot learning
(FSL) by again exploiting semantic (class prototype guided) feature synthesis
and competitive BPL. Extensive experiments show that our model achieves the
state-of-the-art results on both problems.
Comment: Submitted to IEEE TPAMI
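A hedged sketch of prototype-guided feature synthesis and the top-2 competitive candidate selection described above; the random linear lifting of attribute offsets into feature space is purely illustrative, as the abstract does not specify the exact perturbation:

import numpy as np

def synthesise_unseen(X_s, a_s, A_u, rng=None):
    """Perturb samples X_s (n, d) of one seen class, with attribute vector
    a_s (k,), toward randomly chosen unseen classes with attributes
    A_u (c, k), via an assumed random linear map M from attribute offsets
    to feature offsets. Returns synthesised features and their targets."""
    rng = rng or np.random.default_rng(0)
    M = rng.normal(scale=0.1, size=(A_u.shape[1], X_s.shape[1]))
    targets = A_u[rng.integers(0, A_u.shape[0], size=X_s.shape[0])]
    return X_s + (targets - a_s) @ M, targets

def competitive_candidates(Z, P_u):
    """Top-2 unseen class candidates per synthesised point (Z: semantic
    projections, (n, k); P_u: unseen prototypes, (c, k)), as used by the
    competitive projection learning described above."""
    sims = Z @ P_u.T
    return np.argsort(-sims, axis=1)[:, :2]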
Similarity-preserving Image-image Domain Adaptation for Person Re-identification
This article studies the domain adaptation problem in person
re-identification (re-ID) under a "learning via translation" framework
consisting of two components: 1) translating the labeled images from the source
to the target domain in an unsupervised manner; and 2) learning a re-ID model using
the translated images. The objective is to preserve the underlying human
identity information after image translation, so that translated images with
labels are effective for feature learning on the target domain. To this end, we
propose a similarity preserving generative adversarial network (SPGAN) and its
end-to-end trainable version, eSPGAN. Both aim at preserving similarity:
SPGAN enforces this property through heuristic constraints, while eSPGAN does so
by optimally facilitating the re-ID model learning. More specifically, SPGAN
separately undertakes the two components in the "learning via translation"
framework. It first preserves two types of unsupervised similarity, namely,
self-similarity of an image before and after translation, and
domain-dissimilarity of a translated source image and a target image. It then
learns a re-ID model using existing networks. In comparison, eSPGAN seamlessly
integrates image translation and re-ID model learning. During the end-to-end
training of eSPGAN, re-ID learning guides image translation to preserve the
underlying identity information of an image. Meanwhile, image translation
improves re-ID learning by providing identity-preserving training samples of
the target domain style. In the experiment, we show that identities of the fake
images generated by SPGAN and eSPGAN are well preserved. Based on this, we
report the new state-of-the-art domain adaptation results on two large-scale
person re-ID datasets.
Comment: 14 pages, 7 tables, 14 figures; this version is not fully edited and
will be updated soon. arXiv admin note: text overlap with arXiv:1711.07027
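A minimal sketch of the two unsupervised similarity terms in contrastive form, as SPGAN's description suggests; the embedding network f and the margin value are assumptions:

import torch

def similarity_losses(f, x_src, x_trans, x_tgt, margin=2.0):
    """Pull an image and its translation together (self-similarity), and
    push a translated source image away from an arbitrary target-domain
    image (domain-dissimilarity). f is any embedding network returning
    (batch, dim) features."""
    d_pos = (f(x_src) - f(x_trans)).pow(2).sum(1).sqrt()
    d_neg = (f(x_trans) - f(x_tgt)).pow(2).sum(1).sqrt()
    self_sim = d_pos.pow(2).mean()
    domain_dissim = torch.clamp(margin - d_neg, min=0).pow(2).mean()
    return self_sim + domain_dissim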
When Autonomous Systems Meet Accuracy and Transferability through AI: A Survey
With the widespread application of artificial intelligence (AI), the
perception, understanding, decision-making and control capabilities of
autonomous systems have improved significantly in recent years. When
autonomous systems are evaluated on accuracy and transferability, several AI
methods, such as adversarial learning, reinforcement learning (RL) and
meta-learning, show powerful performance. Here, we review the
learning-based approaches in autonomous systems from the perspectives of
accuracy and transferability. Accuracy means that a well-trained model shows
good results during the testing phase, in which the testing set shares the same
task or data distribution with the training set. Transferability means that
when a well-trained model is transferred to other testing domains, its accuracy
remains good. Firstly, we introduce some basic concepts of transfer learning
and then present some preliminaries of adversarial learning, RL and
meta-learning. Secondly, we review accuracy, transferability, or both, to show
the advantages of adversarial learning methods, such as generative adversarial
networks (GANs), in typical computer vision tasks in autonomous systems,
including image style transfer, image super-resolution, image
deblurring/dehazing/rain removal, semantic segmentation, depth estimation,
pedestrian detection and person re-identification (re-ID). Then, we further
review the accuracy or transferability, or both, of RL and meta-learning in
autonomous systems, covering pedestrian tracking, robot navigation and robotic
manipulation. Finally, we discuss several challenges and future topics for
using adversarial learning, RL and meta-learning in autonomous systems.
Unsupervised shape transformer for image translation and cross-domain retrieval
We address the problem of unsupervised geometric image-to-image translation.
Rather than transferring the style of an image as a whole, our goal is to
translate the geometry of an object as depicted in different domains while
preserving its appearance characteristics. Our model is trained in an
unsupervised fashion, i.e., without the need for paired images during training.
It performs all steps of the shape transfer within a single model and without
additional post-processing stages. Extensive experiments on the VITON,
CMU-Multi-PIE and our own FashionStyle datasets show the effectiveness of the
method. In addition, we show that despite their low dimensionality, the
features learned by our model are useful for the item retrieval task.