Effects of sampling skewness of the importance-weighted risk estimator on model selection
Importance-weighting is a popular and well-researched technique for dealing
with sample selection bias and covariate shift. It has desirable
characteristics such as unbiasedness, consistency and low computational
complexity. However, weighting can have a detrimental effect on an estimator as
well. In this work, we empirically show that the sampling distribution of an
importance-weighted estimator can be skewed. For sample selection bias
settings, and for small sample sizes, the importance-weighted risk estimator
produces overestimates for datasets in the body of the sampling distribution,
i.e. the majority of cases, and large underestimates for datasets in the tail
of the sampling distribution. These over- and underestimates of the risk lead
to suboptimal regularization parameters when used for importance-weighted
validation.
Comment: Conference paper, 6 pages, 5 figures
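The importance-weighted risk estimator discussed in this abstract can be sketched as follows. This is a minimal illustration, not the paper's experimental setup: the Gaussian source and target distributions, the loss `x**2`, and the names `importance_weighted_risk`, `sample_skewness`, and `simulate_estimates` are all assumptions made for the example. The direction and size of the skew depend on the setting, so this toy case need not reproduce the over- and underestimates reported above.

```python
import numpy as np


def gaussian_pdf(x, mean, std):
    """Density of a univariate normal distribution."""
    return np.exp(-0.5 * ((x - mean) / std) ** 2) / (std * np.sqrt(2.0 * np.pi))


def importance_weighted_risk(losses, weights):
    """Average of per-sample losses, each scaled by its importance weight."""
    losses = np.asarray(losses, dtype=float)
    weights = np.asarray(weights, dtype=float)
    return float(np.mean(weights * losses))


def sample_skewness(values):
    """Third standardized moment of a sample (0 for a symmetric sample)."""
    values = np.asarray(values, dtype=float)
    centered = values - values.mean()
    return float(np.mean(centered ** 3) / np.mean(centered ** 2) ** 1.5)


def simulate_estimates(n_samples=20, n_repeats=5000, seed=0):
    """Draw many small source datasets and record the risk estimate for each.

    Assumed toy setting: source N(0, 1), target N(1, 1), loss x**2,
    so the true target risk is E[x**2] = 1 + 1 = 2.
    """
    rng = np.random.default_rng(seed)
    estimates = np.empty(n_repeats)
    for r in range(n_repeats):
        x = rng.normal(0.0, 1.0, size=n_samples)  # source sample
        weights = gaussian_pdf(x, 1.0, 1.0) / gaussian_pdf(x, 0.0, 1.0)
        estimates[r] = importance_weighted_risk(x ** 2, weights)
    return estimates


if __name__ == "__main__":
    est = simulate_estimates()
    print("mean of estimates:", est.mean())
    print("median of estimates:", np.median(est))
    print("skewness of estimates:", sample_skewness(est))
```

Comparing the mean and the median of the estimates is a quick way to see whether the sampling distribution is symmetric for a given sample size.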
On Regularization Parameter Estimation under Covariate Shift
This paper identifies a problem with the usual procedure for
L2-regularization parameter estimation in a domain adaptation setting. In such
a setting, there are differences between the distributions generating the
training data (source domain) and the test data (target domain). The usual
cross-validation procedure requires validation data, which cannot be obtained
from the unlabeled target data. The problem is that if one decides to use
source validation data, the regularization parameter is underestimated. One
possible solution is to scale the source validation data through importance
weighting, but we show that this correction is not sufficient. We conclude the
paper with an empirical analysis of the effect of several importance weight
estimators on the estimation of the regularization parameter.
Comment: 6 pages, 2 figures, 2 tables. Accepted to ICPR 201
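The procedure under discussion, choosing an L2-regularization parameter from importance-weighted source validation data, might look roughly like the sketch below. It assumes ridge regression with a squared loss and importance weights that are already given; the helper names `fit_ridge`, `weighted_validation_risk`, and `select_lambda` are hypothetical. The abstract argues that this correction is not sufficient, so the code illustrates the procedure being analysed rather than a recommended fix.

```python
import numpy as np


def fit_ridge(X, y, lam):
    """Closed-form ridge regression: solve (X^T X + lam * I) theta = X^T y."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)


def weighted_validation_risk(X_val, y_val, theta, weights):
    """Importance-weighted mean squared error on source validation data."""
    residuals = X_val @ theta - y_val
    return float(np.mean(weights * residuals ** 2))


def select_lambda(X_tr, y_tr, X_val, y_val, weights, lambdas):
    """Return the candidate lambda with the lowest weighted validation risk."""
    risks = [
        weighted_validation_risk(X_val, y_val, fit_ridge(X_tr, y_tr, lam), weights)
        for lam in lambdas
    ]
    return lambdas[int(np.argmin(risks))], risks
```

Setting all weights to 1 recovers ordinary source validation, which makes it easy to compare the two selections on the same data.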
The Law of Total Odds
The law of total probability may be deployed in binary classification
exercises to estimate the unconditional class probabilities if the class
proportions in the training set are not representative of the population class
proportions. We argue that this is not a conceptually sound approach and
suggest an alternative based on the new law of total odds. We quantify the bias
of the total probability estimator of the unconditional class probabilities and
show that the total odds estimator is unbiased. The sample version of the total
odds estimator is shown to coincide with a maximum-likelihood estimator known
from the literature. The law of total odds can also be used for transforming
the conditional class probabilities if independent estimates of the
unconditional class probabilities of the population are available.
Keywords: Total probability, likelihood ratio, Bayes' formula, binary
classification, relative odds, unbiased estimator, supervised learning, dataset
shift.
Comment: 12 pages, 1 figure, new reference
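The last use mentioned in this abstract, transforming conditional class probabilities when independent estimates of the population class proportions are available, can be illustrated with the standard odds form of Bayes' formula. This is a minimal sketch and not the paper's total odds estimator: it assumes that the class-conditional feature distributions are the same in the training set and in the population, and the function names `odds` and `adjust_posterior` are hypothetical.

```python
def odds(p):
    """Convert a probability strictly between 0 and 1 into odds p / (1 - p)."""
    if not 0.0 < p < 1.0:
        raise ValueError("probability must lie strictly between 0 and 1")
    return p / (1.0 - p)


def adjust_posterior(posterior_train, prior_train, prior_target):
    """Re-express a training-set posterior under a different class prior.

    Posterior odds are rescaled by the ratio of target prior odds to
    training prior odds, then converted back to a probability.
    """
    new_odds = odds(posterior_train) * odds(prior_target) / odds(prior_train)
    return new_odds / (1.0 + new_odds)
```

For instance, a posterior of 0.5 from a balanced training set (prior 0.5) maps to 0.1 when the population prior is 0.1, and the posterior is left unchanged when the two priors agree.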
Discriminative Density-ratio Estimation
Covariate shift is a challenging problem in supervised learning that
results from a discrepancy between the training and test distributions. An
effective approach that has recently drawn considerable attention in the research
community is to reweight the training samples to minimize that discrepancy.
Specifically, many methods are based on developing Density-ratio (DR) estimation
techniques that apply to both regression and classification problems. Although
these methods work well for regression problems, their performance on
classification problems is not satisfactory. The key observation is
that these methods focus on matching the sample marginal distributions without
paying attention to preserving the separation between classes in the reweighted
space. In this paper, we propose a novel method for Discriminative
Density-ratio (DDR) estimation that addresses the aforementioned problem and
aims at estimating the density-ratio of joint distributions in a class-wise
manner. The proposed algorithm is an iterative procedure that alternates
between estimating the class information for the test data and estimating a new
density ratio for each class. To incorporate the estimated class information of
the test data, a soft matching technique is proposed. In addition, we employ an
effective criterion which adopts mutual information as an indicator to stop the
iterative procedure while resulting in a decision boundary that lies in a
sparse region. Experiments on synthetic and benchmark datasets demonstrate the
superiority of the proposed method in terms of both accuracy and robustness
- …
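For orientation, the marginal density-ratio reweighting that the abstract takes as its starting point can be sketched with a domain classifier. This is not the proposed DDR algorithm: it is a generic, swapped-in technique that estimates the ratio of test to training feature densities from a logistic model separating the two samples, and it ignores class labels entirely. The functions `fit_logistic` and `density_ratio_weights`, the learning rate, and the iteration count are assumptions made for the example.

```python
import numpy as np


def fit_logistic(X, z, lr=0.1, n_iter=2000):
    """Fit logistic regression by plain gradient descent; returns (coef, intercept)."""
    n_samples, n_features = X.shape
    coef = np.zeros(n_features)
    intercept = 0.0
    for _ in range(n_iter):
        probs = 1.0 / (1.0 + np.exp(-(X @ coef + intercept)))
        grad = probs - z  # gradient of the log loss with respect to the logits
        coef -= lr * (X.T @ grad) / n_samples
        intercept -= lr * grad.mean()
    return coef, intercept


def density_ratio_weights(X_train, X_test):
    """Estimate p_test(x) / p_train(x) at each training point.

    A classifier is trained to tell test points (label 1) from training
    points (label 0). Its odds exp(logit) estimate P(test | x) / P(train | x),
    which is rescaled by the sample-size ratio to give the density ratio.
    """
    X = np.vstack([X_train, X_test])
    z = np.concatenate([np.zeros(len(X_train)), np.ones(len(X_test))])
    coef, intercept = fit_logistic(X, z)
    logits = X_train @ coef + intercept
    return (len(X_train) / len(X_test)) * np.exp(logits)
```

The resulting weights can be passed to any learner that accepts per-sample weights. A class-wise variant in the spirit of the abstract would estimate one such ratio per class, which requires estimated class information for the test data.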