Transfer Learning for Sequence Tagging with Hierarchical Recurrent Networks
Recent papers have shown that neural networks obtain state-of-the-art
performance on several different sequence tagging tasks. One appealing property
of such systems is their generality, as excellent performance can be achieved
with a unified architecture and without task-specific feature engineering.
However, it is unclear if such systems can be used for tasks without large
amounts of training data. In this paper we explore the problem of transfer
learning for neural sequence taggers, where a source task with plentiful
annotations (e.g., POS tagging on Penn Treebank) is used to improve performance
on a target task with fewer available annotations (e.g., POS tagging for
microblogs). We examine the effects of transfer learning for deep hierarchical
recurrent networks across domains, applications, and languages, and show that
significant improvement can often be obtained. These gains also yield
improvements over the current state of the art on several well-studied tasks.
Comment: Accepted as a conference paper at ICLR 2017. This is an extended
version of the original paper (https://arxiv.org/abs/1603.06270). The
original paper proposes a new architecture, while this version focuses on
transfer learning for a general model class.
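The cross-task transfer described above amounts to parameter sharing: the lower (character- and word-level) layers are shared between source and target taggers, while each task keeps its own output layer. A minimal sketch, in which the layer names, shapes, and tagset sizes are illustrative assumptions rather than the paper's exact architecture:

```python
import copy

def init_tagger(num_labels, hidden=4):
    """A toy tagger: shared encoder parameters plus a task-specific head."""
    return {
        "char_rnn": [0.1] * hidden,      # shared low-level parameters
        "word_rnn": [0.2] * hidden,      # shared high-level parameters
        "output":   [0.0] * num_labels,  # task-specific output layer
    }

def transfer(source, target, shared=("char_rnn", "word_rnn")):
    """Copy the shared layers from a trained source tagger into the target,
    leaving the target's task-specific output layer untouched."""
    for name in shared:
        target[name] = copy.deepcopy(source[name])
    return target

source = init_tagger(num_labels=45)   # e.g., the Penn Treebank POS tagset
target = init_tagger(num_labels=12)   # e.g., a coarse microblog tagset
target = transfer(source, target)

assert target["word_rnn"] == source["word_rnn"]  # shared layers copied over
assert len(target["output"]) == 12               # task head kept separate
```

In the paper's setting the shared layers would then be fine-tuned on the smaller target dataset rather than frozen.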
Taskonomy: Disentangling Task Transfer Learning
Do visual tasks have a relationship, or are they unrelated? For instance,
could having surface normals simplify estimating the depth of an image?
Intuition answers these questions positively, implying the existence of a
structure among visual tasks. Knowing this structure has notable value; it is the
concept underlying transfer learning and provides a principled way for
identifying redundancies across tasks, e.g., to seamlessly reuse supervision
among related tasks or solve many tasks in one system without piling up the
complexity.
We propose a fully computational approach for modeling the structure of the
space of visual tasks. This is done by finding (first- and higher-order)
transfer learning dependencies across a dictionary of twenty-six 2D, 2.5D, 3D,
and semantic tasks in a latent space. The product is a computational taxonomic
map for task transfer learning. We study the consequences of this structure,
e.g., nontrivial emergent relationships, and exploit them to reduce the demand
for labeled data. For example, we show that the total number of labeled
datapoints needed for solving a set of 10 tasks can be reduced by roughly 2/3
(compared to training independently) while keeping the performance nearly the
same. We provide a set of tools for computing and probing this taxonomical
structure including a solver that users can employ to devise efficient
supervision policies for their use cases.
Comment: CVPR 2018 (Oral). See project website and live demos at
http://taskonomy.vision
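The supervision-policy idea above can be illustrated with a toy affinity matrix: given measured first-order transfer gains between tasks, a policy picks, for each target task, the source whose transfer performs best. This is a simplified greedy sketch, not the paper's actual solver, and the task names and gain values are made up:

```python
# affinities[src][tgt]: measured quality of transferring src -> tgt
# (higher is better); values here are illustrative, not from the paper.
affinities = {
    "surface_normals": {"depth": 0.9, "edges": 0.4},
    "autoencoding":    {"depth": 0.5, "edges": 0.7},
}

def best_sources(affinities, targets):
    """For each target task, pick the source task with the highest
    first-order transfer gain (a greedy stand-in for a full policy solver)."""
    policy = {}
    for tgt in targets:
        policy[tgt] = max(
            affinities,
            key=lambda src: affinities[src].get(tgt, float("-inf")),
        )
    return policy

policy = best_sources(affinities, ["depth", "edges"])
assert policy == {"depth": "surface_normals", "edges": "autoencoding"}
```

Higher-order transfers (combining several sources per target) would extend this by scoring source subsets instead of single sources.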
Learning to cluster in order to transfer across domains and tasks
This paper introduces a novel method to perform transfer learning across
domains and tasks, formulating it as a problem of learning to cluster. The key
insight is that, in addition to features, we can transfer similarity
information and this is sufficient to learn a similarity function and
clustering network to perform both domain adaptation and cross-task transfer
learning. We begin by reducing categorical information to pairwise constraints,
which only consider whether two instances belong to the same class or not.
This similarity is category-agnostic and can be learned from data in the source
domain using a similarity network. We then present two novel approaches for
performing transfer learning using this similarity function. First, for
unsupervised domain adaptation, we design a new loss function to regularize
classification with a constrained clustering loss, hence learning a clustering
network with the transferred similarity metric generating the training inputs.
Second, for cross-task learning (i.e., unsupervised clustering with unseen
categories), we propose a framework to reconstruct and estimate the number of
semantic clusters, again using the clustering network. Since the similarity
network is noisy, the key is to use a robust clustering algorithm, and we show
that our formulation is more robust than the alternative constrained and
unconstrained clustering approaches. Using this method, we first show
state-of-the-art results for the challenging cross-task problem on Omniglot and
ImageNet. Our results show that we can reconstruct semantic clusters with high
accuracy. We then evaluate the performance of cross-domain transfer using
images from the Office-31 and SVHN-MNIST tasks and present top accuracy on both
datasets. Our approach does not explicitly address domain discrepancy;
combining it with a domain adaptation loss yields further improvement.
Comment: ICLR 201
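The first step above, reducing categorical labels to pairwise constraints, is simple to sketch: every pair of labeled instances becomes a same-class / different-class bit, which is all the category-agnostic similarity network needs to train on. A minimal version:

```python
from itertools import combinations

def labels_to_pairwise(labels):
    """Reduce categorical labels to category-agnostic pairwise constraints:
    for each index pair (i, j), record 1 if the two instances share a class,
    else 0. The class names themselves are discarded."""
    return {
        (i, j): int(labels[i] == labels[j])
        for i, j in combinations(range(len(labels)), 2)
    }

constraints = labels_to_pairwise(["cat", "dog", "cat"])
assert constraints == {(0, 1): 0, (0, 2): 1, (1, 2): 0}
```

Because the constraints never mention class identities, a similarity function learned from them can be applied to target data with entirely unseen categories.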
Transductive Zero-Shot Hashing via Coarse-to-Fine Similarity Mining
Zero-shot Hashing (ZSH) is to learn hashing models for novel/target classes
without training data, which is an important and challenging problem. Most
existing ZSH approaches exploit transfer learning via intermediate shared
semantic representations between the seen/source classes and novel/target
classes. However, because these class sets are disjoint, the hash functions
learned from the source dataset are biased when applied directly to the target classes. In this
paper, we study the transductive ZSH, i.e., we have unlabeled data for novel
classes. We put forward a simple yet efficient joint learning approach via
coarse-to-fine similarity mining, which transfers knowledge from source data to
target data. It mainly consists of two building blocks in the proposed deep
architecture: 1) a shared two-stream network, in which the first stream operates
on the source data and the second stream operates on the unlabeled data, to
learn effective common image representations, and 2) a coarse-to-fine
module, which begins with finding the most representative images from target
classes and then further detects similarities among these images, to transfer
the similarities of the source data to the target data in a greedy fashion.
Extensive evaluation results on several benchmark datasets demonstrate that the
proposed hashing method achieves significant improvement over the
state-of-the-art methods.
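The coarse-to-fine module described above can be sketched in two steps: a coarse step selects the most representative target images (here, those with the highest average similarity to the rest), and a fine step greedily keeps only high-confidence pairwise similarities among them. The similarity values and threshold below are illustrative assumptions, not the paper's learned quantities:

```python
def coarse_to_fine(sim, k=2, threshold=0.8):
    """sim: symmetric similarity matrix (list of lists) over target images.
    Coarse step: keep the k images with the highest average similarity
    to all other images. Fine step: among those representatives, keep only
    pairs whose similarity exceeds the threshold."""
    n = len(sim)
    avg = [(sum(sim[i]) - sim[i][i]) / (n - 1) for i in range(n)]
    reps = sorted(range(n), key=lambda i: avg[i], reverse=True)[:k]
    pairs = [(i, j) for i in reps for j in reps
             if i < j and sim[i][j] > threshold]
    return reps, pairs

sim = [[1.0, 0.9, 0.2],
       [0.9, 1.0, 0.3],
       [0.2, 0.3, 1.0]]
reps, pairs = coarse_to_fine(sim)
assert reps == [1, 0]        # images 1 and 0 are most representative
assert pairs == [(0, 1)]     # only their mutual similarity clears 0.8
```

The mined pairs would then serve as supervision for adapting the hash functions to the target classes.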
Unsupervised Domain Adaptation on Reading Comprehension
Reading comprehension (RC) has been studied in a variety of datasets with the
boosted performance brought by deep neural networks. However, the
generalization capability of these models across different domains remains
unclear. To address this issue, we investigate unsupervised
domain adaptation on RC, wherein a model is trained on a labeled source domain
and applied to a target domain with only unlabeled samples. We first
show that even with the powerful BERT contextual representation, the
performance is still unsatisfactory when the model trained on one dataset is
directly applied to another target dataset. To solve this, we provide a novel
conditional adversarial self-training method (CASe). Specifically, our approach
leverages a BERT model fine-tuned on the source dataset along with
confidence filtering to generate reliable pseudo-labeled samples in the target
domain for self-training. On the other hand, it further reduces domain
distribution discrepancy through conditional adversarial learning across
domains. Extensive experiments show our approach achieves comparable accuracy
to supervised models on multiple large-scale benchmark datasets.
Comment: 8 pages, 6 figures, 5 tables, Accepted by AAAI 202
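The pseudo-labeling step of the self-training loop above can be sketched as confidence filtering: target-domain predictions from the source-tuned model are kept as training labels only when the model's top probability clears a threshold. A minimal stand-in (the threshold and the toy probability vectors are assumptions, not CASe's actual settings):

```python
def confident_pseudo_labels(probs, threshold=0.9):
    """probs: per-sample class-probability lists from the source-tuned model.
    Keep (sample index, argmax label) only for samples whose maximum
    probability reaches the threshold; the rest stay unlabeled this round."""
    kept = []
    for i, p in enumerate(probs):
        conf = max(p)
        if conf >= threshold:
            kept.append((i, p.index(conf)))
    return kept

probs = [[0.95, 0.05],   # confident -> kept with label 0
         [0.60, 0.40],   # uncertain -> dropped
         [0.08, 0.92]]   # confident -> kept with label 1
assert confident_pseudo_labels(probs) == [(0, 0), (2, 1)]
```

In the full method, each round of self-training on these filtered samples alternates with conditional adversarial alignment of the source and target feature distributions.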
Adaptive Deep Learning through Visual Domain Localization
A commercial robot, trained by its manufacturer to recognize a predefined
number and type of objects, might be used in many settings that will in general
differ in their illumination conditions, background, type and degree of
clutter, and so on. Recent computer vision works tackle this generalization
issue through domain adaptation methods, taking the visual domain where the
system is trained as source and the domain of deployment as target. All these
approaches assume access to images from all classes of the target during
training, an unrealistic condition in robotics applications. We address this
issue by proposing an algorithm that takes into account the specific needs of
robot vision. Our intuition is that the domain shift experienced in robotics
is mostly local in nature. We exploit this through the learning of maps that
spatially ground the domain and quantify the degree of shift, embedded into an
end-to-end deep domain adaptation architecture. By explicitly localizing the
roots of the domain shift we significantly reduce the number of parameters of
the architecture to tune, we gain the flexibility necessary to deal with a
subset of categories in the target domain at training time, and we provide
clear feedback on the rationale behind any classification decision, which can
be exploited in human-robot interactions. Experiments on two different settings
of the iCub World database confirm the suitability of our method for robot
vision.
Transfer Learning using Representation Learning in Massive Open Online Courses
In a Massive Open Online Course (MOOC), predictive models of student behavior
can support multiple aspects of learning, including instructor feedback and
timely intervention. Ongoing courses, whose student outcomes are not yet
known, must rely on models trained from the historical data of previously
offered courses. It is possible to transfer models, but they often have poor
prediction performance. One reason is that their features inadequately represent
predictive attributes common to both courses. We present an automated
transductive transfer learning approach that addresses this issue. It relies on
problem-agnostic, temporal organization of the MOOC clickstream data, where,
for each student, for multiple courses, a set of specific MOOC event types is
expressed for each time unit. It consists of two alternative transfer methods
based on representation learning with auto-encoders: a passive approach using
transductive principal component analysis and an active approach that uses a
correlation alignment loss term. With these methods, we investigate the
transferability of dropout prediction across similar and dissimilar MOOCs and
compare with known methods. Results show improved model transferability and
suggest that the methods are capable of automatically learning a feature
representation that expresses common predictive characteristics of MOOCs.
Comment: 10 pages, 11 figures, accepted at LAK'1
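The problem-agnostic temporal organization described above can be sketched as a per-student matrix of event-type counts per time unit, which is what the auto-encoders then compress into a transferable representation. The event-type vocabulary and weekly granularity here are illustrative assumptions:

```python
from collections import Counter

# Assumed event-type vocabulary; a real MOOC clickstream has many more types.
EVENT_TYPES = ["play_video", "problem_check", "forum_post"]

def clickstream_matrix(events, num_weeks):
    """events: list of (week, event_type) tuples for one student.
    Returns a num_weeks x len(EVENT_TYPES) matrix of event counts,
    the same layout for every course, so representations transfer."""
    counts = Counter(events)
    return [[counts[(w, e)] for e in EVENT_TYPES] for w in range(num_weeks)]

events = [(0, "play_video"), (0, "play_video"), (1, "problem_check")]
m = clickstream_matrix(events, num_weeks=2)
assert m == [[2, 0, 0],    # week 0: two video plays
             [0, 1, 0]]    # week 1: one problem check
```

Because the matrix shape depends only on the shared event vocabulary and course length, an auto-encoder trained on one course's matrices can embed students from another.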
Recent Advances in Transfer Learning for Cross-Dataset Visual Recognition: A Problem-Oriented Perspective
This paper takes a problem-oriented perspective and presents a comprehensive
review of transfer learning methods, both shallow and deep, for cross-dataset
visual recognition. Specifically, it categorises the cross-dataset recognition
into seventeen problems based on a set of carefully chosen data and label
attributes. Such a problem-oriented taxonomy has allowed us to examine how
different transfer learning approaches tackle each problem and how well each
problem has been researched to date. The comprehensive problem-oriented review
of the advances in transfer learning has not only revealed the challenges in
transfer learning for visual recognition, but also exposed the problems (eight
of the seventeen) that have scarcely been studied. This survey not only
presents an up-to-date technical review for
researchers, but also a systematic approach and a reference for a machine
learning practitioner to categorise a real problem and look up a possible
solution accordingly.
RDPD: Rich Data Helps Poor Data via Imitation
In many situations, we need to build and deploy separate models in related
environments with different data qualities. For example, an environment with
strong observation equipment (e.g., intensive care units) often provides
high-quality multi-modal data, which are acquired from multiple sensory devices
and have rich-feature representations. On the other hand, an environment with
poor observation equipment (e.g., at home) only provides low-quality, uni-modal
data with poor-feature representations. To deploy a competitive model in a
poor-data environment without requiring direct access to multi-modal data
acquired from a rich-data environment, this paper develops and presents a
knowledge distillation (KD) method (RDPD) to enhance a predictive model trained
on poor data using knowledge distilled from a high-complexity model trained on
rich, private data. We evaluated RDPD on three real-world datasets and showed
that its distilled model consistently outperformed all baselines across all
datasets, especially achieving the greatest performance improvement over a
model trained only on low-quality data by 24.56% on PR-AUC and 12.21% on
ROC-AUC, and over that of a state-of-the-art KD model by 5.91% on PR-AUC and
4.44% on ROC-AUC.
Comment: Published in IJCAI 201
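The distillation objective sketched above combines the usual hard-label loss with an imitation term that pulls the poor-data student's outputs toward the rich-data teacher's soft predictions. A toy version of such a combined loss (the equal weighting and cross-entropy form are generic knowledge-distillation assumptions, not RDPD's exact objective):

```python
import math

def cross_entropy(target, pred, eps=1e-9):
    """Cross-entropy between a target distribution and predicted probabilities."""
    return -sum(t * math.log(p + eps) for t, p in zip(target, pred))

def kd_loss(student_probs, teacher_probs, onehot, alpha=0.5):
    """Generic knowledge-distillation loss: alpha weights the hard-label
    term, (1 - alpha) weights imitation of the teacher's soft targets."""
    hard = cross_entropy(onehot, student_probs)
    soft = cross_entropy(teacher_probs, student_probs)
    return alpha * hard + (1 - alpha) * soft

loss = kd_loss(student_probs=[0.7, 0.3],
               teacher_probs=[0.8, 0.2],
               onehot=[1.0, 0.0])
assert loss > 0.0
```

The point of the setup is that the teacher is trained on rich multi-modal data, while the student sees only the poor uni-modal inputs, so the soft term is the only channel through which the rich-data knowledge flows.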
All-Transfer Learning for Deep Neural Networks and its Application to Sepsis Classification
In this article, we propose a transfer learning method for deep neural
networks (DNNs). Deep learning has been widely used in many applications.
However, applying deep learning is problematic when a large amount of training
data are not available. One of the conventional methods for solving this
problem is transfer learning for DNNs. In the field of image recognition,
state-of-the-art transfer learning methods for DNNs re-use parameters trained
on source domain data except for the output layer. However, this method may
result in poor classification performance when the amount of target domain data
is significantly small. To address this problem, we propose a method called
All-Transfer Deep Learning, which enables the transfer of all parameters of a
DNN. With this method, we can compute the relationship between the source and
target labels from the source domain knowledge. We applied our method to actual
two-dimensional electrophoresis image (2-DE image) classification for
determining if an individual suffers from sepsis; this is the first attempt to
apply a classification approach to 2-DE images for proteomics, which has attracted
considerable attention as an extension beyond genomics. The results suggest
that our proposed method outperforms conventional transfer learning methods for
DNNs.
Comment: Long version of article published at ECAI 2016 (9 pages, 13 figures,
8 tables)
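The key step above, transferring the output layer as well, can be sketched as mapping each target label to a distribution over source labels and initializing the target output weights as the corresponding mixture of source output weights. The label names and mixing coefficients below are illustrative assumptions:

```python
def transfer_output_layer(source_weights, label_relation):
    """source_weights: {source_label: output-weight vector}.
    label_relation: {target_label: {source_label: mixing coefficient}},
    expressing each target label as a mixture over source labels.
    Each target row is the coefficient-weighted sum of source rows."""
    target_weights = {}
    for tgt, mix in label_relation.items():
        dim = len(next(iter(source_weights.values())))
        row = [0.0] * dim
        for src, coef in mix.items():
            row = [r + coef * w for r, w in zip(row, source_weights[src])]
        target_weights[tgt] = row
    return target_weights

# Hypothetical source labels and a target label related to them.
src_w = {"healthy": [1.0, 0.0], "infected": [0.0, 1.0]}
relation = {"sepsis": {"infected": 0.9, "healthy": 0.1}}
tgt_w = transfer_output_layer(src_w, relation)
assert tgt_w == {"sepsis": [0.1, 0.9]}
```

This gives the target network a fully initialized output layer, rather than a randomly initialized one, which is what helps when target data is very scarce.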