A review of domain adaptation without target labels
Domain adaptation has become a prominent problem setting in machine learning
and related fields. This review asks the question: how can a classifier learn
from a source domain and generalize to a target domain? We present a
categorization of approaches, divided into what we refer to as sample-based,
feature-based and inference-based methods. Sample-based methods focus on
weighting individual observations during training based on their importance to
the target domain. Feature-based methods revolve around mapping, projecting
and representing features such that a source classifier performs well on the
target domain. Inference-based methods incorporate adaptation into the
parameter estimation procedure, for instance through constraints on the
optimization procedure. Additionally, we review a number of conditions that
allow for formulating bounds on the cross-domain generalization error. Our
categorization highlights recurring ideas and raises questions important to
further research.
Comment: 20 pages, 5 figures
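The sample-based idea in the abstract above, weighting source observations by their importance to the target domain, can be illustrated with a small sketch. It assumes covariate shift and a single discrete feature, and it estimates the weight w(x) = p_target(x) / p_source(x) from smoothed counts. The function names and the `smoothing` parameter are hypothetical and do not come from the review.

```python
from collections import Counter


def importance_weights(source_x, target_x, smoothing=1.0):
    """Estimate w(x) = p_target(x) / p_source(x) for each source sample.

    Assumes a discrete feature; probabilities are additively smoothed counts.
    """
    values = set(source_x) | set(target_x)
    src_counts = Counter(source_x)
    tgt_counts = Counter(target_x)
    n_src = len(source_x) + smoothing * len(values)
    n_tgt = len(target_x) + smoothing * len(values)
    ratio = {
        v: ((tgt_counts[v] + smoothing) / n_tgt)
        / ((src_counts[v] + smoothing) / n_src)
        for v in values
    }
    return [ratio[x] for x in source_x]


def weighted_error(predictions, labels, weights):
    """Self-normalized importance-weighted error rate on labelled source data."""
    total = sum(weights)
    wrong = sum(w for p, y, w in zip(predictions, labels, weights) if p != y)
    return wrong / total


# Toy data: value "b" is rare in the source but common in the target.
source_x = ["a", "a", "a", "b"]
target_x = ["a", "b", "b", "b"]
weights = importance_weights(source_x, target_x)

# A classifier that always predicts 0 is wrong only on the "b" sample.
error = weighted_error([0, 0, 0, 0], [0, 0, 0, 1], weights)
```

In this toy case the single misclassified sample carries a large weight, so the weighted error is higher than the unweighted source error of 0.25. The same weights could be passed to a training loss instead of an evaluation metric.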
Data Augmentation with norm-VAE and Selective Pseudo-Labelling for Unsupervised Domain Adaptation
We address the Unsupervised Domain Adaptation (UDA) problem in image classification from a new perspective. In contrast to most existing works, which either align the data distributions or learn domain-invariant features, we directly learn a unified classifier for both the source and target domains in the high-dimensional homogeneous feature space without explicit domain alignment. To this end, we employ the effective Selective Pseudo-Labelling (SPL) technique to take advantage of the unlabelled samples in the target domain. Surprisingly, data distribution discrepancy across the source and target domains can be well handled by a computationally simple classifier (e.g., a shallow Multi-Layer Perceptron) trained in the original feature space. In addition, we propose a novel generative model, norm-AE, to generate synthetic features for the target domain as a data augmentation strategy to enhance the classifier training. Experimental results on several benchmark datasets demonstrate that the pseudo-labelling strategy by itself can lead to performance comparable to many state-of-the-art methods, whilst the use of norm-AE for feature augmentation can further improve the performance in most cases. As a result, our proposed methods (i.e., naiveSPL and norm-AE-SPL) achieve performance comparable with state-of-the-art methods, with average accuracies of 93.4% and 90.4% on the Office-Caltech and ImageCLEF-DA datasets, and competitive performance on the Digits, Office31 and Office-Home datasets, with average accuracies of 97.2%, 87.6% and 68.6% respectively.
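The selective pseudo-labelling loop described above can be sketched roughly as follows. This is not the authors' implementation: a nearest-class-mean classifier stands in for the shallow Multi-Layer Perceptron, the most confident target samples are selected globally with a linearly growing budget, and all function names and the `rounds` parameter are assumptions made for illustration.

```python
import math


def class_means(features, labels):
    """Compute the mean feature vector of each class."""
    sums, counts = {}, {}
    for x, y in zip(features, labels):
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [s / counts[y] for s in sums[y]] for y in sums}


def predict_with_confidence(means, x):
    """Return (label, confidence) from a softmax over negative distances."""
    dists = {y: math.dist(x, m) for y, m in means.items()}
    best = min(dists, key=dists.get)
    z = sum(math.exp(-d) for d in dists.values())
    return best, math.exp(-dists[best]) / z


def selective_pseudo_labelling(src_x, src_y, tgt_x, rounds=3):
    """Retrain on source data plus a growing set of confident pseudo-labels."""
    means = class_means(src_x, src_y)
    for t in range(1, rounds + 1):
        # Pseudo-label every target sample with the current classifier.
        scored = [(predict_with_confidence(means, x), x) for x in tgt_x]
        scored.sort(key=lambda item: item[0][1], reverse=True)
        # Keep only the most confident fraction t / rounds of the target set.
        keep = scored[: math.ceil(len(tgt_x) * t / rounds)]
        train_x = list(src_x) + [x for _, x in keep]
        train_y = list(src_y) + [pred[0] for pred, _ in keep]
        means = class_means(train_x, train_y)
    return [predict_with_confidence(means, x)[0] for x in tgt_x]


# Toy data: the target domain is the source domain shifted by about (1, 1).
src_x = [[0.0, 0.0], [0.2, 0.1], [3.0, 3.0], [3.1, 2.9]]
src_y = [0, 0, 1, 1]
tgt_x = [[1.0, 1.0], [0.8, 1.2], [4.0, 4.0], [4.2, 3.9]]
tgt_labels = selective_pseudo_labelling(src_x, src_y, tgt_x)
```

Growing the selected fraction over rounds means early retraining relies only on the pseudo-labels the classifier is most sure of, which limits the effect of early labelling mistakes.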