Transfer Learning for Sequence Tagging with Hierarchical Recurrent Networks
Recent papers have shown that neural networks obtain state-of-the-art
performance on several different sequence tagging tasks. One appealing property
of such systems is their generality, as excellent performance can be achieved
with a unified architecture and without task-specific feature engineering.
However, it is unclear if such systems can be used for tasks without large
amounts of training data. In this paper we explore the problem of transfer
learning for neural sequence taggers, where a source task with plentiful
annotations (e.g., POS tagging on Penn Treebank) is used to improve performance
on a target task with fewer available annotations (e.g., POS tagging for
microblogs). We examine the effects of transfer learning for deep hierarchical
recurrent networks across domains, applications, and languages, and show that
significant improvement can often be obtained. These gains also yield
improvements over the current state of the art on several well-studied tasks.
Comment: Accepted as a conference paper at ICLR 2017. This is an extended
version of the original paper (https://arxiv.org/abs/1603.06270). The
original paper proposes a new architecture, while this version focuses on
transfer learning for a general model class.
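The cross-task transfer described above amounts to parameter sharing: the lower (character- and word-level) layers are shared between source and target taggers, while each task keeps its own output layer. A minimal sketch, in which the layer names, shapes, and tagset sizes are illustrative assumptions rather than the paper's exact architecture:

```python
import copy

def init_tagger(num_labels, hidden=4):
    """A toy tagger: shared encoder parameters plus a task-specific head."""
    return {
        "char_rnn": [0.1] * hidden,      # shared low-level parameters
        "word_rnn": [0.2] * hidden,      # shared high-level parameters
        "output":   [0.0] * num_labels,  # task-specific output layer
    }

def transfer(source, target, shared=("char_rnn", "word_rnn")):
    """Copy the shared layers from a trained source tagger into the target,
    leaving the target's task-specific output layer untouched."""
    for name in shared:
        target[name] = copy.deepcopy(source[name])
    return target

source = init_tagger(num_labels=45)   # e.g., the Penn Treebank POS tagset
target = init_tagger(num_labels=12)   # e.g., a coarse microblog tagset
target = transfer(source, target)

assert target["word_rnn"] == source["word_rnn"]  # shared layers copied over
assert len(target["output"]) == 12               # task head kept separate
```

In the paper's setting the shared layers would then be fine-tuned on the smaller target dataset rather than frozen.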
Taskonomy: Disentangling Task Transfer Learning
Do visual tasks have a relationship, or are they unrelated? For instance,
could having surface normals simplify estimating the depth of an image?
Intuition answers these questions positively, implying the existence of a
structure among visual tasks. Knowing this structure has notable value; it is the
concept underlying transfer learning and provides a principled way for
identifying redundancies across tasks, e.g., to seamlessly reuse supervision
among related tasks or solve many tasks in one system without piling up the
complexity.
We propose a fully computational approach for modeling the structure of the
space of visual tasks. This is done by finding (first- and higher-order)
transfer learning dependencies across a dictionary of twenty-six 2D, 2.5D, 3D,
and semantic tasks in a latent space. The product is a computational taxonomic
map for task transfer learning. We study the consequences of this structure,
e.g., nontrivial emergent relationships, and exploit them to reduce the demand
for labeled data. For example, we show that the total number of labeled
datapoints needed for solving a set of 10 tasks can be reduced by roughly 2/3
(compared to training independently) while keeping the performance nearly the
same. We provide a set of tools for computing and probing this taxonomical
structure including a solver that users can employ to devise efficient
supervision policies for their use cases.
Comment: CVPR 2018 (Oral). See project website and live demos at
http://taskonomy.vision
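The supervision-policy idea above can be illustrated with a toy affinity matrix: given measured first-order transfer gains between tasks, a policy picks, for each target task, the source whose transfer performs best. This is a simplified greedy sketch, not the paper's actual solver, and the task names and gain values are made up:

```python
# affinities[src][tgt]: measured quality of transferring src -> tgt
# (higher is better); values here are illustrative, not from the paper.
affinities = {
    "surface_normals": {"depth": 0.9, "edges": 0.4},
    "autoencoding":    {"depth": 0.5, "edges": 0.7},
}

def best_sources(affinities, targets):
    """For each target task, pick the source task with the highest
    first-order transfer gain (a greedy stand-in for a full policy solver)."""
    policy = {}
    for tgt in targets:
        policy[tgt] = max(
            affinities,
            key=lambda src: affinities[src].get(tgt, float("-inf")),
        )
    return policy

policy = best_sources(affinities, ["depth", "edges"])
assert policy == {"depth": "surface_normals", "edges": "autoencoding"}
```

Higher-order transfers (combining several sources per target) would extend this by scoring source subsets instead of single sources.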
Learning to cluster in order to transfer across domains and tasks
This paper introduces a novel method to perform transfer learning across
domains and tasks, formulating it as a problem of learning to cluster. The key
insight is that, in addition to features, we can transfer similarity
information and this is sufficient to learn a similarity function and
clustering network to perform both domain adaptation and cross-task transfer
learning. We begin by reducing categorical information to pairwise constraints,
which only consider whether two instances belong to the same class or not.
This similarity is category-agnostic and can be learned from data in the source
domain using a similarity network. We then present two novel approaches for
performing transfer learning using this similarity function. First, for
unsupervised domain adaptation, we design a new loss function to regularize
classification with a constrained clustering loss, hence learning a clustering
network with the transferred similarity metric generating the training inputs.
Second, for cross-task learning (i.e., unsupervised clustering with unseen
categories), we propose a framework to reconstruct and estimate the number of
semantic clusters, again using the clustering network. Since the similarity
network is noisy, the key is to use a robust clustering algorithm, and we show
that our formulation is more robust than the alternative constrained and
unconstrained clustering approaches. Using this method, we first show
state-of-the-art results for the challenging cross-task problem on Omniglot and
ImageNet. Our results show that we can reconstruct semantic clusters with high
accuracy. We then evaluate the performance of cross-domain transfer using
images from the Office-31 and SVHN-MNIST tasks and present top accuracy on both
datasets. Our approach does not explicitly address domain discrepancy;
combining it with a domain adaptation loss yields further improvement.
Comment: ICLR 201
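The first step above, reducing categorical labels to pairwise constraints, is simple to sketch: every pair of labeled instances becomes a same-class / different-class bit, which is all the category-agnostic similarity network needs to train on. A minimal version:

```python
from itertools import combinations

def labels_to_pairwise(labels):
    """Reduce categorical labels to category-agnostic pairwise constraints:
    for each index pair (i, j), record 1 if the two instances share a class,
    else 0. The class names themselves are discarded."""
    return {
        (i, j): int(labels[i] == labels[j])
        for i, j in combinations(range(len(labels)), 2)
    }

constraints = labels_to_pairwise(["cat", "dog", "cat"])
assert constraints == {(0, 1): 0, (0, 2): 1, (1, 2): 0}
```

Because the constraints never mention class identities, a similarity function learned from them can be applied to target data with entirely unseen categories.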
Transductive Zero-Shot Hashing via Coarse-to-Fine Similarity Mining
Zero-shot Hashing (ZSH) is to learn hashing models for novel/target classes
without training data, which is an important and challenging problem. Most
existing ZSH approaches exploit transfer learning via intermediate shared
semantic representations between the seen/source classes and novel/target
classes. However, because these class sets are disjoint, the hash functions
learned from the source dataset are biased when applied directly to the target classes. In this
paper, we study the transductive ZSH, i.e., we have unlabeled data for novel
classes. We put forward a simple yet efficient joint learning approach via
coarse-to-fine similarity mining, which transfers knowledge from source data to
target data. It mainly consists of two building blocks in the proposed deep
architecture: 1) a shared two-stream network, in which the first stream operates
on the source data and the second stream operates on the unlabeled data, to
learn effective common image representations, and 2) a coarse-to-fine
module, which begins with finding the most representative images from target
classes and then further detects similarities among these images, to transfer
the similarities of the source data to the target data in a greedy fashion.
Extensive evaluation results on several benchmark datasets demonstrate that the
proposed hashing method achieves significant improvement over the
state-of-the-art methods.
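The coarse-to-fine module described above can be sketched in two steps: a coarse step selects the most representative target images (here, those with the highest average similarity to the rest), and a fine step greedily keeps only high-confidence pairwise similarities among them. The similarity values and threshold below are illustrative assumptions, not the paper's learned quantities:

```python
def coarse_to_fine(sim, k=2, threshold=0.8):
    """sim: symmetric similarity matrix (list of lists) over target images.
    Coarse step: keep the k images with the highest average similarity
    to all other images. Fine step: among those representatives, keep only
    pairs whose similarity exceeds the threshold."""
    n = len(sim)
    avg = [(sum(sim[i]) - sim[i][i]) / (n - 1) for i in range(n)]
    reps = sorted(range(n), key=lambda i: avg[i], reverse=True)[:k]
    pairs = [(i, j) for i in reps for j in reps
             if i < j and sim[i][j] > threshold]
    return reps, pairs

sim = [[1.0, 0.9, 0.2],
       [0.9, 1.0, 0.3],
       [0.2, 0.3, 1.0]]
reps, pairs = coarse_to_fine(sim)
assert reps == [1, 0]        # images 1 and 0 are most representative
assert pairs == [(0, 1)]     # only their mutual similarity clears 0.8
```

The mined pairs would then serve as supervision for adapting the hash functions to the target classes.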
Unsupervised Domain Adaptation on Reading Comprehension
Reading comprehension (RC) has been studied in a variety of datasets with the
boosted performance brought by deep neural networks. However, the
generalization capability of these models across different domains remains
unclear. To address this issue, we investigate unsupervised
domain adaptation on RC, wherein a model is trained on a labeled source domain
and applied to a target domain with only unlabeled samples. We first
show that even with the powerful BERT contextual representation, the
performance is still unsatisfactory when the model trained on one dataset is
directly applied to another target dataset. To solve this, we provide a novel
conditional adversarial self-training method (CASe). Specifically, our approach
leverages a BERT model fine-tuned on the source dataset along with
confidence filtering to generate reliable pseudo-labeled samples in the target
domain for self-training. On the other hand, it further reduces domain
distribution discrepancy through conditional adversarial learning across
domains. Extensive experiments show our approach achieves comparable accuracy
to supervised models on multiple large-scale benchmark datasets.
Comment: 8 pages, 6 figures, 5 tables, Accepted by AAAI 202
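The pseudo-labeling step of the self-training loop above can be sketched as confidence filtering: target-domain predictions from the source-tuned model are kept as training labels only when the model's top probability clears a threshold. A minimal stand-in (the threshold and the toy probability vectors are assumptions, not CASe's actual settings):

```python
def confident_pseudo_labels(probs, threshold=0.9):
    """probs: per-sample class-probability lists from the source-tuned model.
    Keep (sample index, argmax label) only for samples whose maximum
    probability reaches the threshold; the rest stay unlabeled this round."""
    kept = []
    for i, p in enumerate(probs):
        conf = max(p)
        if conf >= threshold:
            kept.append((i, p.index(conf)))
    return kept

probs = [[0.95, 0.05],   # confident -> kept with label 0
         [0.60, 0.40],   # uncertain -> dropped
         [0.08, 0.92]]   # confident -> kept with label 1
assert confident_pseudo_labels(probs) == [(0, 0), (2, 1)]
```

In the full method, each round of self-training on these filtered samples alternates with conditional adversarial alignment of the source and target feature distributions.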
Adaptive Deep Learning through Visual Domain Localization
A commercial robot, trained by its manufacturer to recognize a predefined
number and type of objects, might be used in many settings that will in general
differ in their illumination conditions, background, type and degree of
clutter, and so on. Recent computer vision works tackle this generalization
issue through domain adaptation methods, taking the visual domain where the
system is trained as source and the domain of deployment as target. All these
approaches assume access to images from all classes of the target during
training, an unrealistic condition in robotics applications. We address this
issue by proposing an algorithm that takes into account the specific needs of
robot vision. Our intuition is that the domain shift experienced in robotics
is mostly local in nature. We exploit this through the learning of maps that
spatially ground the domain and quantify the degree of shift, embedded into an
end-to-end deep domain adaptation architecture. By explicitly localizing the
roots of the domain shift we significantly reduce the number of parameters of
the architecture to tune, we gain the flexibility necessary to deal with a
subset of categories in the target domain at training time, and we provide
clear feedback on the rationale behind any classification decision, which can
be exploited in human-robot interactions. Experiments on two different settings
of the iCub World database confirm the suitability of our method for robot
vision.
Transfer Learning using Representation Learning in Massive Open Online Courses
In a Massive Open Online Course (MOOC), predictive models of student behavior
can support multiple aspects of learning, including instructor feedback and
timely intervention. Ongoing courses, whose student outcomes are not yet
known, must rely on models trained from the historical data of previously
offered courses. It is possible to transfer models, but they often have poor
prediction performance. One reason is that their features inadequately represent
predictive attributes common to both courses. We present an automated
transductive transfer learning approach that addresses this issue. It relies on
problem-agnostic, temporal organization of the MOOC clickstream data, where,
for each student, for multiple courses, a set of specific MOOC event types is
expressed for each time unit. It consists of two alternative transfer methods
based on representation learning with auto-encoders: a passive approach using
transductive principal component analysis and an active approach that uses a
correlation alignment loss term. With these methods, we investigate the
transferability of dropout prediction across similar and dissimilar MOOCs and
compare with known methods. Results show improved model transferability and
suggest that the methods are capable of automatically learning a feature
representation that expresses common predictive characteristics of MOOCs.
Comment: 10 pages, 11 figures, accepted at LAK'1
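The problem-agnostic temporal organization described above can be sketched as a per-student matrix of event-type counts per time unit, which is what the auto-encoders then compress into a transferable representation. The event-type vocabulary and weekly granularity here are illustrative assumptions:

```python
from collections import Counter

# Assumed event-type vocabulary; a real MOOC clickstream has many more types.
EVENT_TYPES = ["play_video", "problem_check", "forum_post"]

def clickstream_matrix(events, num_weeks):
    """events: list of (week, event_type) tuples for one student.
    Returns a num_weeks x len(EVENT_TYPES) matrix of event counts,
    the same layout for every course, so representations transfer."""
    counts = Counter(events)
    return [[counts[(w, e)] for e in EVENT_TYPES] for w in range(num_weeks)]

events = [(0, "play_video"), (0, "play_video"), (1, "problem_check")]
m = clickstream_matrix(events, num_weeks=2)
assert m == [[2, 0, 0],    # week 0: two video plays
             [0, 1, 0]]    # week 1: one problem check
```

Because the matrix shape depends only on the shared event vocabulary and course length, an auto-encoder trained on one course's matrices can embed students from another.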
Recent Advances in Transfer Learning for Cross-Dataset Visual Recognition: A Problem-Oriented Perspective
This paper takes a problem-oriented perspective and presents a comprehensive
review of transfer learning methods, both shallow and deep, for cross-dataset
visual recognition. Specifically, it categorises the cross-dataset recognition
into seventeen problems based on a set of carefully chosen data and label
attributes. Such a problem-oriented taxonomy has allowed us to examine how
different transfer learning approaches tackle each problem and how well each
problem has been researched to date. The comprehensive problem-oriented review
of the advances in transfer learning has not only revealed the challenges in
transfer learning for visual recognition, but also exposed the problems (eight
of the seventeen) that have scarcely been studied. This survey not only
presents an up-to-date technical review for
researchers, but also a systematic approach and a reference for a machine
learning practitioner to categorise a real problem and look up a possible
solution accordingly.
RDPD: Rich Data Helps Poor Data via Imitation
In many situations, we need to build and deploy separate models in related
environments with different data qualities. For example, an environment with
strong observation equipment (e.g., intensive care units) often provides
high-quality multi-modal data, which are acquired from multiple sensory devices
and have rich-feature representations. On the other hand, an environment with
poor observation equipment (e.g., at home) only provides low-quality, uni-modal
data with poor-feature representations. To deploy a competitive model in a
poor-data environment without requiring direct access to multi-modal data
acquired from a rich-data environment, this paper develops and presents a
knowledge distillation (KD) method (RDPD) to enhance a predictive model trained
on poor data using knowledge distilled from a high-complexity model trained on
rich, private data. We evaluated RDPD on three real-world datasets and showed
that its distilled model consistently outperformed all baselines across all
datasets, especially achieving the greatest performance improvement over a
model trained only on low-quality data by 24.56% on PR-AUC and 12.21% on
ROC-AUC, and over that of a state-of-the-art KD model by 5.91% on PR-AUC and
4.44% on ROC-AUC.
Comment: Published in IJCAI 201
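The distillation objective sketched above combines the usual hard-label loss with an imitation term that pulls the poor-data student's outputs toward the rich-data teacher's soft predictions. A toy version of such a combined loss (the equal weighting and cross-entropy form are generic knowledge-distillation assumptions, not RDPD's exact objective):

```python
import math

def cross_entropy(target, pred, eps=1e-9):
    """Cross-entropy between a target distribution and predicted probabilities."""
    return -sum(t * math.log(p + eps) for t, p in zip(target, pred))

def kd_loss(student_probs, teacher_probs, onehot, alpha=0.5):
    """Generic knowledge-distillation loss: alpha weights the hard-label
    term, (1 - alpha) weights imitation of the teacher's soft targets."""
    hard = cross_entropy(onehot, student_probs)
    soft = cross_entropy(teacher_probs, student_probs)
    return alpha * hard + (1 - alpha) * soft

loss = kd_loss(student_probs=[0.7, 0.3],
               teacher_probs=[0.8, 0.2],
               onehot=[1.0, 0.0])
assert loss > 0.0
```

The point of the setup is that the teacher is trained on rich multi-modal data, while the student sees only the poor uni-modal inputs, so the soft term is the only channel through which the rich-data knowledge flows.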
All-Transfer Learning for Deep Neural Networks and its Application to Sepsis Classification
In this article, we propose a transfer learning method for deep neural
networks (DNNs). Deep learning has been widely used in many applications.
However, applying deep learning is problematic when a large amount of training
data are not available. One of the conventional methods for solving this
problem is transfer learning for DNNs. In the field of image recognition,
state-of-the-art transfer learning methods for DNNs re-use parameters trained
on source domain data except for the output layer. However, this method may
result in poor classification performance when the amount of target domain data
is significantly small. To address this problem, we propose a method called
All-Transfer Deep Learning, which enables the transfer of all parameters of a
DNN. With this method, we can compute the relationship between the source and
target labels from the source domain knowledge. We applied our method to actual
two-dimensional electrophoresis image (2-DE image) classification for
determining if an individual suffers from sepsis; this is the first attempt to
apply a classification approach to 2-DE images for proteomics, which has attracted
considerable attention as an extension beyond genomics. The results suggest
that our proposed method outperforms conventional transfer learning methods for
DNNs.
Comment: Long version of article published at ECAI 2016 (9 pages, 13 figures,
8 tables)
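The key step above, transferring the output layer as well, can be sketched as mapping each target label to a distribution over source labels and initializing the target output weights as the corresponding mixture of source output weights. The label names and mixing coefficients below are illustrative assumptions:

```python
def transfer_output_layer(source_weights, label_relation):
    """source_weights: {source_label: output-weight vector}.
    label_relation: {target_label: {source_label: mixing coefficient}},
    expressing each target label as a mixture over source labels.
    Each target row is the coefficient-weighted sum of source rows."""
    target_weights = {}
    for tgt, mix in label_relation.items():
        dim = len(next(iter(source_weights.values())))
        row = [0.0] * dim
        for src, coef in mix.items():
            row = [r + coef * w for r, w in zip(row, source_weights[src])]
        target_weights[tgt] = row
    return target_weights

# Hypothetical source labels and a target label related to them.
src_w = {"healthy": [1.0, 0.0], "infected": [0.0, 1.0]}
relation = {"sepsis": {"infected": 0.9, "healthy": 0.1}}
tgt_w = transfer_output_layer(src_w, relation)
assert tgt_w == {"sepsis": [0.1, 0.9]}
```

This gives the target network a fully initialized output layer, rather than a randomly initialized one, which is what helps when target data is very scarce.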