Effects of sampling skewness of the importance-weighted risk estimator on model selection
Importance-weighting is a popular and well-researched technique for dealing
with sample selection bias and covariate shift. It has desirable
characteristics such as unbiasedness, consistency and low computational
complexity. However, weighting can have a detrimental effect on an estimator as
well. In this work, we empirically show that the sampling distribution of an
importance-weighted estimator can be skewed. For sample selection bias
settings, and for small sample sizes, the importance-weighted risk estimator
produces overestimates for datasets in the body of the sampling distribution,
i.e. the majority of cases, and large underestimates for datasets in the tail
of the sampling distribution. These over- and underestimates of the risk lead
to suboptimal regularization parameters when used for importance-weighted
validation. Comment: Conference paper, 6 pages, 5 figures
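The estimator discussed in this abstract can be illustrated with a minimal sketch. It assumes a toy setting that is not taken from the paper: a standard normal source density, a slightly wider normal target density, and a squared loss `x * x`. The helper names (`gaussian_pdf`, `importance_weighted_risk`, `sample_skewness`, `simulate_estimates`) are hypothetical. The sketch only shows how one might inspect the sampling distribution of the estimator for small samples; it does not reproduce the paper's experiments.

```python
import math
import random
import statistics


def gaussian_pdf(x, mean, std):
    """Density of a normal distribution with the given mean and std."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def importance_weighted_risk(xs, losses, source_pdf, target_pdf):
    """Average of w(x) * loss, with w(x) = target_pdf(x) / source_pdf(x)."""
    weights = [target_pdf(x) / source_pdf(x) for x in xs]
    return sum(w * l for w, l in zip(weights, losses)) / len(xs)


def sample_skewness(values):
    """Third standardized moment of a sample; 0.0 if the sample is constant."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0.0:
        return 0.0
    return statistics.fmean([((v - mean) / std) ** 3 for v in values])


def simulate_estimates(n_samples, n_repeats, seed=0):
    """Recompute the estimator on many small source samples (toy setting)."""
    rng = random.Random(seed)
    source = lambda x: gaussian_pdf(x, 0.0, 1.0)  # assumed source density
    target = lambda x: gaussian_pdf(x, 0.0, 1.2)  # assumed target density
    estimates = []
    for _ in range(n_repeats):
        xs = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
        losses = [x * x for x in xs]  # toy loss, not from the paper
        estimates.append(importance_weighted_risk(xs, losses, source, target))
    return estimates
```

Calling `sample_skewness(simulate_estimates(20, 2000))` gives a rough measure of how asymmetric the estimator's sampling distribution is in this toy setting; the value depends on the assumed densities and sample size.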
On Regularization Parameter Estimation under Covariate Shift
This paper identifies a problem with the usual procedure for
L2-regularization parameter estimation in a domain adaptation setting. In such
a setting, there are differences between the distributions generating the
training data (source domain) and the test data (target domain). The usual
cross-validation procedure requires validation data, which cannot be obtained
from the unlabeled target data. The problem is that if one decides to use
source validation data, the regularization parameter is underestimated. One
possible solution is to scale the source validation data through importance
weighting, but we show that this correction is not sufficient. We conclude the
paper with an empirical analysis of the effect of several importance weight
estimators on the estimation of the regularization parameter. Comment: 6 pages, 2 figures, 2 tables. Accepted to ICPR 201
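The procedure this abstract examines, selecting an L2-regularization parameter on importance-weighted source validation data, might look like the sketch below. It assumes a one-dimensional ridge regression without intercept and a squared validation loss, and it takes the importance weights as given. The function names and the candidate grid are hypothetical. Note that the paper argues this correction is not sufficient; the sketch only shows the mechanics.

```python
def fit_ridge_slope(xs, ys, lam):
    """Closed-form 1-D ridge slope: sum(x*y) / (sum(x*x) + lam).

    Assumes sum(x*x) + lam > 0.
    """
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)


def weighted_validation_risk(slope, xs, ys, weights):
    """Importance-weighted mean squared error on the validation set."""
    total = sum(w * (y - slope * x) ** 2 for x, y, w in zip(xs, ys, weights))
    return total / len(xs)


def select_lambda(train_xs, train_ys, val_xs, val_ys, weights, grid):
    """Return the grid value with the lowest weighted validation risk."""
    def risk_for(lam):
        slope = fit_ridge_slope(train_xs, train_ys, lam)
        return weighted_validation_risk(slope, val_xs, val_ys, weights)
    return min(grid, key=risk_for)
```

With all weights equal to 1 this reduces to ordinary hold-out validation on source data, so the effect of a given weight estimator can be seen by comparing the selected value against that baseline.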
A review of domain adaptation without target labels
Domain adaptation has become a prominent problem setting in machine learning
and related fields. This review asks the question: how can a classifier learn
from a source domain and generalize to a target domain? We present a
categorization of approaches, divided into what we refer to as sample-based,
feature-based and inference-based methods. Sample-based methods focus on
weighting individual observations during training based on their importance to
the target domain. Feature-based methods revolve around mapping, projecting
and representing features such that a source classifier performs well on the
target domain, while inference-based methods incorporate adaptation into the
parameter estimation procedure, for instance through constraints on the
optimization procedure. Additionally, we review a number of conditions that
allow for formulating bounds on the cross-domain generalization error. Our
categorization highlights recurring ideas and raises questions important to
further research. Comment: 20 pages, 5 figures
Target Contrastive Pessimistic Discriminant Analysis
Domain-adaptive classifiers learn from a source domain and aim to generalize
to a target domain. If the classifier's assumptions on the relationship between
domains (e.g. covariate shift) are valid, then it will usually outperform a
non-adaptive source classifier. Unfortunately, it can perform substantially
worse when its assumptions are invalid. Validating these assumptions requires
labeled target samples, which are usually not available. We argue that, in
order to make domain-adaptive classifiers more practical, it is necessary to
focus on robust methods; robust in the sense that the model still achieves a
particular level of performance without making strong assumptions on the
relationship between domains. With this objective in mind, we formulate a
conservative parameter estimator that only deviates from the source classifier
when a lower or equal risk is guaranteed for all possible labellings of the
given target samples. We derive the corresponding estimator for a discriminant
analysis model, and show that its risk is actually strictly smaller than that
of the source classifier. Experiments indicate that our classifier outperforms
state-of-the-art classifiers for geographically biased samples. Comment: 9 pages, no figures, 2 tables. arXiv admin note: substantial text
overlap with arXiv:1706.0808
Diplomacy in action: Latourian Politics and the Intergovernmental Panel on Climate Change
The Intergovernmental Panel on Climate Change (IPCC) reviews scientific literature on climate change in an attempt to make scientific knowledge about climate change accessible to a wide audience that includes policymakers. Documents produced by the IPCC are subject to negotiations in plenary sessions, which can be frustrating for the scientists and government delegations involved, who all have stakes in getting their respective interests met. This paper draws on the work of Bruno Latour in order to formulate a so-called ‘diplomatic’ approach to knowledge assessment in global climate governance. Such an approach, we argue, helps to make climate governance more inclusive by helping to identify values of parties involved with the IPCC plenaries, and allowing those parties to recognize their mutual interests and perspectives on climate change. Drawing on observations during IPCC plenaries, this paper argues that a Latourian form of diplomacy can lead to more inclusive negotiations in climate governance.
Robust importance-weighted cross-validation under sample selection bias
Cross-validation under sample selection bias can, in principle, be done by importance-weighting the empirical risk. However, the importance-weighted risk estimator produces suboptimal hyperparameter estimates in problem settings where large weights arise with high probability. We study its sampling variance as a function of the training data distribution and introduce a control variate to increase its robustness to problematically large weights.
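The abstract does not say which control variate is used. As a rough illustration of the general technique, the sketch below assumes the importance weights themselves serve as the control variate, using the fact that exact weights have expectation 1 under the source distribution. The function name `control_variate_risk` and the plug-in coefficient estimate are assumptions, not the paper's method.

```python
import statistics


def control_variate_risk(weights, losses):
    """Importance-weighted risk corrected with the weights as control variate.

    Computes mean(w * loss) - beta * (mean(w) - 1), where beta is the
    plug-in estimate Cov(w * loss, w) / Var(w). Assumes E[w] = 1.
    """
    weighted_losses = [w * l for w, l in zip(weights, losses)]
    plain = statistics.fmean(weighted_losses)
    var_w = statistics.pvariance(weights)
    if var_w == 0.0:
        return plain  # constant weights: nothing to correct
    mean_w = statistics.fmean(weights)
    cov = statistics.fmean(
        [(wl - plain) * (w - mean_w) for wl, w in zip(weighted_losses, weights)]
    )
    beta = cov / var_w
    return plain - beta * (mean_w - 1.0)
```

When a sample happens to contain a few very large weights, `mean(w)` drifts away from 1 and the correction term pulls the estimate back; when the sample weights already average to 1, the estimate equals the plain importance-weighted risk.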