A review of domain adaptation without target labels
Domain adaptation has become a prominent problem setting in machine learning
and related fields. This review asks the question: how can a classifier learn
from a source domain and generalize to a target domain? We present a
categorization of approaches, divided into what we refer to as sample-based,
feature-based, and inference-based methods. Sample-based methods focus on
weighting individual observations during training based on their importance to
the target domain. Feature-based methods revolve around mapping, projecting,
and representing features such that a source classifier performs well on the
target domain. Inference-based methods incorporate adaptation into the
parameter estimation procedure, for instance through constraints on the
optimization procedure. Additionally, we review a number of conditions that
allow for formulating bounds on the cross-domain generalization error. Our
categorization highlights recurring ideas and raises questions important to
further research.
Comment: 20 pages, 5 figures
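A minimal sketch of the sample-based idea described above: under covariate shift, source observations are reweighted by the density ratio w(x) = p_T(x) / p_S(x) so that a weighted source average estimates a target expectation. The 1-D Gaussian densities and the quantity f(x) = x² here are illustrative assumptions, chosen only so the ratio is available in closed form.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative covariate-shift setup (an assumption, not from the paper):
# source covariates ~ N(0, 1), target covariates ~ N(1, 1).
mu_s, mu_t, sigma = 0.0, 1.0, 1.0

def gauss_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

x_s = rng.normal(mu_s, sigma, 100_000)   # samples from the source domain
f = lambda x: x ** 2                     # quantity whose target mean we want

# Importance weights w(x) = p_T(x) / p_S(x): reweighting source samples so
# their weighted average estimates the target expectation E_T[f(X)] = 2.
w = gauss_pdf(x_s, mu_t, sigma) / gauss_pdf(x_s, mu_s, sigma)
est_naive = f(x_s).mean()                # unweighted: converges to E_S[f(X)] = 1
est_iw = np.average(f(x_s), weights=w)   # importance-weighted target estimate

print(round(est_naive, 2), round(est_iw, 2))
```

In practice the density ratio is unknown and must itself be estimated, e.g. by a probabilistic classifier that discriminates source from target samples; the closed-form weights above sidestep that step for clarity.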
Weak consistency of the 1-nearest neighbor measure with applications to missing data
When data is partially missing at random, imputation and importance weighting
are often used to estimate moments of the unobserved population. In this paper,
we study 1-nearest neighbor (1NN) importance weighting, which estimates moments
by replacing missing data with the complete data that is the nearest neighbor
in the non-missing covariate space. We define an empirical measure, the 1NN
measure, and show that it is weakly consistent for the measure of the missing
data. The main idea behind this result is that the 1NN measure is performing
inverse probability weighting in the limit. We study applications to missing
data and mitigating the impact of covariate shift in prediction tasks.
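A hedged sketch of the 1NN scheme described above: each missing response is replaced by the response of the complete case whose covariate is nearest, and moments are computed over the completed sample. The data-generating process and missingness mechanism below are hypothetical, chosen so the complete-case bias is visible.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical missing-at-random setup (illustrative, not from the paper):
# y = 2x + noise, with y more likely to be missing when x is large.
n = 5000
x = rng.uniform(0, 1, n)
y = 2 * x + rng.normal(0, 0.1, n)
observed = rng.uniform(0, 1, n) > 0.5 * x   # P(observed | x) = 1 - 0.5 x

x_obs, y_obs = x[observed], y[observed]
x_mis = x[~observed]

# 1NN imputation: each missing y is replaced by the y of the complete case
# whose covariate is nearest in the non-missing covariate space.
nn_idx = np.abs(x_mis[:, None] - x_obs[None, :]).argmin(axis=1)
y_imputed = y_obs[nn_idx]

est_complete_case = y_obs.mean()                     # biased under this MAR mechanism
est_1nn = np.concatenate([y_obs, y_imputed]).mean()  # 1NN-measure estimate of E[y] = 1

print(round(est_complete_case, 2), round(est_1nn, 2))
```

The complete-case mean is pulled toward small x (where y is rarely missing), while the 1NN estimate recovers the population mean; this reflects the paper's point that the 1NN measure performs inverse probability weighting in the limit.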
Bias Reduction via End-to-End Shift Learning: Application to Citizen Science
Citizen science projects are successful at gathering rich datasets for
various applications. However, the data collected by citizen scientists are
often biased; in particular, aligned more with the citizens' preferences
than with scientific objectives. We propose the Shift Compensation Network
(SCN), an end-to-end learning scheme which learns the shift from the scientific
objectives to the biased data while compensating for the shift by re-weighting
the training data. Applied to bird observational data from the citizen science
project eBird, we demonstrate how SCN quantifies the data distribution shift
and outperforms supervised learning models that do not address the data bias.
Compared with competing models in the context of covariate shift, we further
demonstrate the advantage of SCN in both its effectiveness and its capability
of handling massive high-dimensional data.