48,669 research outputs found
Unbiased Learning to Rank with Unbiased Propensity Estimation
Learning to rank with biased click data is a well-known challenge. A variety
of methods has been explored to debias click data for learning to rank such as
click models, result interleaving and, more recently, the unbiased
learning-to-rank framework based on inverse propensity weighting. Despite their
differences, most existing studies separate the estimation of click bias
(namely the \textit{propensity model}) from the learning of ranking algorithms.
To estimate click propensities, they either conduct online result
randomization, which can negatively affect the user experience, or offline
parameter estimation, which has special requirements for click data and is
optimized for objectives (e.g. click likelihood) that are not directly related
to the ranking performance of the system. In this work, we address those
problems by unifying the learning of propensity models and ranking models. We
find that the problem of estimating a propensity model from click data is a
dual problem of unbiased learning to rank. Based on this observation, we
propose a Dual Learning Algorithm (DLA) that jointly learns an unbiased ranker
and an \textit{unbiased propensity model}. DLA is an automatic unbiased
learning-to-rank framework as it directly learns unbiased ranking models from
biased click data without any preprocessing. It can adapt to the change of bias
distributions and is applicable to online learning. Our empirical experiments
with synthetic and real-world data show that the models trained with DLA
significantly outperformed the unbiased learning-to-rank algorithms based on
result randomization and the models trained with relevance signals extracted by
click models
Regression and Learning to Rank Aggregation for User Engagement Evaluation
User engagement refers to the amount of interaction an instance (e.g., tweet,
news, and forum post) achieves. Ranking the items in social media websites
based on the amount of user participation in them, can be used in different
applications, such as recommender systems. In this paper, we consider a tweet
containing a rating for a movie as an instance and focus on ranking the
instances of each user based on their engagement, i.e., the total number of
retweets and favorites it will gain.
For this task, we define several features which can be extracted from the
meta-data of each tweet. The features are partitioned into three categories:
user-based, movie-based, and tweet-based. We show that in order to obtain good
results, features from all categories should be considered. We exploit
regression and learning to rank methods to rank the tweets and propose to
aggregate the results of regression and learning to rank methods to achieve
better performance. We have run our experiments on an extended version of
MovieTweeting dataset provided by ACM RecSys Challenge 2014. The results show
that learning to rank approach outperforms most of the regression models and
the combination can improve the performance significantly.Comment: In Proceedings of the 2014 ACM Recommender Systems Challenge,
RecSysChallenge '1
Solving the G-problems in less than 500 iterations: Improved efficient constrained optimization by surrogate modeling and adaptive parameter control
Constrained optimization of high-dimensional numerical problems plays an
important role in many scientific and industrial applications. Function
evaluations in many industrial applications are severely limited and no
analytical information about objective function and constraint functions is
available. For such expensive black-box optimization tasks, the constraint
optimization algorithm COBRA was proposed, making use of RBF surrogate modeling
for both the objective and the constraint functions. COBRA has shown remarkable
success in solving reliably complex benchmark problems in less than 500
function evaluations. Unfortunately, COBRA requires careful adjustment of
parameters in order to do so.
In this work we present a new self-adjusting algorithm SACOBRA, which is
based on COBRA and capable to achieve high-quality results with very few
function evaluations and no parameter tuning. It is shown with the help of
performance profiles on a set of benchmark problems (G-problems, MOPTA08) that
SACOBRA consistently outperforms any COBRA algorithm with fixed parameter
setting. We analyze the importance of the several new elements in SACOBRA and
find that each element of SACOBRA plays a role to boost up the overall
optimization performance. We discuss the reasons behind and get in this way a
better understanding of high-quality RBF surrogate modeling
Selection Bias in News Coverage: Learning it, Fighting it
News entities must select and filter the coverage they broadcast through
their respective channels since the set of world events is too large to be
treated exhaustively. The subjective nature of this filtering induces biases
due to, among other things, resource constraints, editorial guidelines,
ideological affinities, or even the fragmented nature of the information at a
journalist's disposal. The magnitude and direction of these biases are,
however, widely unknown. The absence of ground truth, the sheer size of the
event space, or the lack of an exhaustive set of absolute features to measure
make it difficult to observe the bias directly, to characterize the leaning's
nature and to factor it out to ensure a neutral coverage of the news. In this
work, we introduce a methodology to capture the latent structure of media's
decision process on a large scale. Our contribution is multi-fold. First, we
show media coverage to be predictable using personalization techniques, and
evaluate our approach on a large set of events collected from the GDELT
database. We then show that a personalized and parametrized approach not only
exhibits higher accuracy in coverage prediction, but also provides an
interpretable representation of the selection bias. Last, we propose a method
able to select a set of sources by leveraging the latent representation. These
selected sources provide a more diverse and egalitarian coverage, all while
retaining the most actively covered events
- …