Correcting for Selection Bias in Learning-to-rank Systems
Click data collected by modern recommendation systems are an important source
of observational data that can be utilized to train learning-to-rank (LTR)
systems. However, these data suffer from a number of biases that can result in
poor performance for LTR systems. Recent methods for bias correction in such
systems mostly focus on position bias, the fact that higher ranked results
(e.g., top search engine results) are more likely to be clicked even if they
are not the most relevant results given a user's query. Less attention has been
paid to correcting for selection bias, which occurs because clicked documents
are reflective of what documents have been shown to the user in the first
place. Here, we propose new counterfactual approaches that adapt Heckman's
two-stage method and account for selection and position bias in LTR systems.
Our empirical evaluation shows that our proposed methods are much more robust
to noise and have better accuracy compared to existing unbiased LTR algorithms,
especially when there is moderate to no position bias.

Comment: This paper appeared in The Web Conference (WWW'20), April 20-24, 2020, Taipei, Taiwan.
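The core idea of a Heckman-style correction can be sketched as follows. This is a hypothetical toy illustration, not the paper's exact method: stage one models whether a document is shown at all (the selection step) and computes the inverse Mills ratio; stage two adds that ratio as an extra regressor when fitting relevance on the shown documents only, absorbing the selection effect. All variable names and the simple linear setup are illustrative assumptions.

```python
import numpy as np
from scipy.stats import norm
from sklearn.linear_model import LogisticRegression, LinearRegression

rng = np.random.default_rng(0)
n, d = 2000, 3
X = rng.normal(size=(n, d))            # query-document features

# Simulated ground truth: both selection (being shown) and relevance
# depend on the features, which is what creates selection bias.
shown = (X @ np.array([1.0, 0.5, 0.0]) + rng.normal(size=n)) > 0
relevance = X @ np.array([0.8, -0.4, 0.3]) + rng.normal(scale=0.5, size=n)

# Stage 1: selection model (a logistic model stands in for the probit).
sel = LogisticRegression().fit(X, shown)
z = sel.decision_function(X)           # linear predictor
mills = norm.pdf(z) / np.clip(norm.cdf(z), 1e-8, None)  # inverse Mills ratio

# Stage 2: fit relevance only on shown documents, with the Mills ratio
# appended as a regressor to correct for non-random selection.
X2 = np.column_stack([X[shown], mills[shown]])
outcome = LinearRegression().fit(X2, relevance[shown])
print(outcome.coef_[:d])               # selection-corrected relevance weights
```

A naive regression fit on the shown documents alone would conflate "likely to be shown" with "relevant"; the Mills-ratio term is what separates the two.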
On the design of an ECOC-compliant genetic algorithm
Genetic Algorithms (GA) have been previously applied to Error-Correcting Output Codes (ECOC) in state-of-the-art works in order to find a suitable coding matrix. Nevertheless, none of the presented techniques directly takes into account the properties of the ECOC matrix. As a result, the search space considered is unnecessarily large. In this paper, a novel genetic strategy to optimize the ECOC coding step is presented. This novel strategy redefines the usual crossover and mutation operators in order to take into account the theoretical properties of the ECOC framework. Thus, it reduces the search space and allows the algorithm to converge faster. In addition, a novel operator that is able to enlarge the code in a smart way is introduced. The novel methodology is tested on several UCI datasets and four challenging computer vision problems. Furthermore, an analysis of the results in terms of performance, code length, and number of Support Vectors shows that the optimization process is able to find very efficient codes with respect to the trade-off between classification performance and the number of classifiers. Finally, per-dichotomizer classification performance shows that the novel proposal obtains similar or even better results while defining a more compact number of dichotomies and SVs compared to state-of-the-art approaches.
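The idea of constraining GA operators to the valid ECOC space can be sketched with a minimal, hypothetical mutation operator: flip a single entry of the coding matrix, but reject any candidate that violates the basic ECOC properties (pairwise-distinct class codewords, and each column defining a genuine binary split). The operator below is an illustration of the principle, not the paper's exact operator definitions.

```python
import numpy as np

rng = np.random.default_rng(1)

def valid_ecoc(M):
    # Rows (class codewords) must be pairwise distinct; each column
    # (dichotomizer) must contain both +1 and -1 to define a binary split.
    rows_distinct = len({tuple(r) for r in M}) == M.shape[0]
    cols_binary = all((c == 1).any() and (c == -1).any() for c in M.T)
    return rows_distinct and cols_binary

def mutate(M, max_tries=100):
    # Flip one random entry; retry until the ECOC properties still hold.
    # Rejecting invalid candidates keeps the search inside the smaller,
    # valid space instead of wasting evaluations on degenerate codes.
    for _ in range(max_tries):
        cand = M.copy()
        i, j = rng.integers(M.shape[0]), rng.integers(M.shape[1])
        cand[i, j] *= -1
        if valid_ecoc(cand):
            return cand
    return M  # no valid mutation found; keep the parent

# 4 classes, 5 dichotomizers, binary {-1, +1} coding.
M = np.array([[ 1,  1, -1, -1,  1],
              [-1,  1,  1, -1, -1],
              [ 1, -1,  1,  1, -1],
              [-1, -1, -1,  1,  1]])
child = mutate(M)
```

A property-aware crossover would apply the same rejection test to recombined matrices, which is what shrinks the effective search space relative to unconstrained operators.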
A review of domain adaptation without target labels
Domain adaptation has become a prominent problem setting in machine learning
and related fields. This review asks the question: how can a classifier learn
from a source domain and generalize to a target domain? We present a
categorization of approaches, divided into what we refer to as sample-based,
feature-based, and inference-based methods. Sample-based methods focus on
weighting individual observations during training based on their importance to
the target domain. Feature-based methods revolve around mapping, projecting,
and representing features such that a source classifier performs well on the
target domain. Inference-based methods incorporate adaptation into the
parameter estimation procedure, for instance through constraints on the
optimization procedure. Additionally, we review a number of conditions that
allow for formulating bounds on the cross-domain generalization error. Our
categorization highlights recurring ideas and raises questions important to
further research.

Comment: 20 pages, 5 figures
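The sample-based idea can be sketched with one common estimator: train a domain classifier to distinguish target from source inputs, use its odds ratio as an estimate of the density ratio p_target(x)/p_source(x), and fit a source classifier with those importance weights. This is a minimal illustration of covariate-shift weighting under assumed synthetic data, one of several weighting schemes a review like this would survey.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)
# Source and target share the labeling rule but differ by a covariate shift.
Xs = rng.normal(loc=0.0, size=(500, 2))
ys = (Xs[:, 0] + Xs[:, 1] > 0).astype(int)
Xt = rng.normal(loc=0.7, size=(500, 2))        # shifted target inputs

# Domain classifier: target (1) vs source (0). Its odds ratio estimates
# the density ratio p_target(x) / p_source(x).
Xd = np.vstack([Xs, Xt])
yd = np.r_[np.zeros(len(Xs)), np.ones(len(Xt))]
dom = LogisticRegression().fit(Xd, yd)
p = dom.predict_proba(Xs)[:, 1]
w = p / np.clip(1 - p, 1e-8, None)             # importance weights on source

# Weighted source classifier: emphasizes source points that look target-like.
clf = LogisticRegression().fit(Xs, ys, sample_weight=w)
yt = (Xt[:, 0] + Xt[:, 1] > 0).astype(int)     # held-out target labels
print(clf.score(Xt, yt))
```

The target labels above are used only for evaluation; the training step sees source labels and unlabeled target inputs, matching the "without target labels" setting of the review.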