229,602 research outputs found
Learning to Rank based on Analogical Reasoning
Object ranking or "learning to rank" is an important problem in the realm of
preference learning. On the basis of training data in the form of a set of
rankings of objects represented as feature vectors, the goal is to learn a
ranking function that predicts a linear order of any new set of objects. In
this paper, we propose a new approach to object ranking based on principles of
analogical reasoning. More specifically, our inference pattern is formalized in
terms of so-called analogical proportions and can be summarized as follows:
Given objects , if object is known to be preferred to , and
relates to as relates to , then is (supposedly) preferred to
. Our method applies this pattern as a main building block and combines it
with ideas and techniques from instance-based learning and rank aggregation.
Based on first experimental results for data sets from various domains (sports,
education, tourism, etc.), we conclude that our approach is highly competitive.
It appears to be specifically interesting in situations in which the objects
are coming from different subdomains, and which hence require a kind of
knowledge transfer.Comment: Thirty-Second AAAI Conference on Artificial Intelligence (AAAI-18), 8
page
Factorizing LambdaMART for cold start recommendations
Recommendation systems often rely on point-wise loss metrics such as the mean
squared error. However, in real recommendation settings only few items are
presented to a user. This observation has recently encouraged the use of
rank-based metrics. LambdaMART is the state-of-the-art algorithm in learning to
rank which relies on such a metric. Despite its success it does not have a
principled regularization mechanism relying in empirical approaches to control
model complexity leaving it thus prone to overfitting.
Motivated by the fact that very often the users' and items' descriptions as
well as the preference behavior can be well summarized by a small number of
hidden factors, we propose a novel algorithm, LambdaMART Matrix Factorization
(LambdaMART-MF), that learns a low rank latent representation of users and
items using gradient boosted trees. The algorithm factorizes lambdaMART by
defining relevance scores as the inner product of the learned representations
of the users and items. The low rank is essentially a model complexity
controller; on top of it we propose additional regularizers to constraint the
learned latent representations that reflect the user and item manifolds as
these are defined by their original feature based descriptors and the
preference behavior. Finally we also propose to use a weighted variant of NDCG
to reduce the penalty for similar items with large rating discrepancy.
We experiment on two very different recommendation datasets, meta-mining and
movies-users, and evaluate the performance of LambdaMART-MF, with and without
regularization, in the cold start setting as well as in the simpler matrix
completion setting. In both cases it outperforms in a significant manner
current state of the art algorithms
Learning from User Interactions with Rankings: A Unification of the Field
Ranking systems form the basis for online search engines and recommendation
services. They process large collections of items, for instance web pages or
e-commerce products, and present the user with a small ordered selection. The
goal of a ranking system is to help a user find the items they are looking for
with the least amount of effort. Thus the rankings they produce should place
the most relevant or preferred items at the top of the ranking. Learning to
rank is a field within machine learning that covers methods which optimize
ranking systems w.r.t. this goal. Traditional supervised learning to rank
methods utilize expert-judgements to evaluate and learn, however, in many
situations such judgements are impossible or infeasible to obtain. As a
solution, methods have been introduced that perform learning to rank based on
user clicks instead. The difficulty with clicks is that they are not only
affected by user preferences, but also by what rankings were displayed.
Therefore, these methods have to prevent being biased by other factors than
user preference. This thesis concerns learning to rank methods based on user
clicks and specifically aims to unify the different families of these methods.
As a whole, the second part of this thesis proposes a framework that bridges
many gaps between areas of online, counterfactual, and supervised learning to
rank. It has taken approaches, previously considered independent, and unified
them into a single methodology for widely applicable and effective learning to
rank from user clicks.Comment: PhD Thesis of Harrie Oosterhuis defended at the University of
Amsterdam on November 27th 202
How to Query Human Feedback Efficiently in RL?
Reinforcement Learning with Human Feedback (RLHF) is a paradigm in which an
RL agent learns to optimize a task using pair-wise preference-based feedback
over trajectories, rather than explicit reward signals. While RLHF has
demonstrated practical success in fine-tuning language models, existing
empirical work does not address the challenge of how to efficiently sample
trajectory pairs for querying human feedback. In this study, we propose an
efficient sampling approach to acquiring exploratory trajectories that enable
accurate learning of hidden reward functions before collecting any human
feedback. Theoretical analysis demonstrates that our algorithm requires less
human feedback for learning the optimal policy under preference-based models
with linear parameterization and unknown transitions, compared to the existing
literature. Specifically, our framework can incorporate linear and low-rank
MDPs. Additionally, we investigate RLHF with action-based comparison feedback
and introduce an efficient querying algorithm tailored to this scenario
Learning to Order Things
There are many applications in which it is desirable to order rather than
classify instances. Here we consider the problem of learning how to order
instances given feedback in the form of preference judgments, i.e., statements
to the effect that one instance should be ranked ahead of another. We outline a
two-stage approach in which one first learns by conventional means a binary
preference function indicating whether it is advisable to rank one instance
before another. Here we consider an on-line algorithm for learning preference
functions that is based on Freund and Schapire's 'Hedge' algorithm. In the
second stage, new instances are ordered so as to maximize agreement with the
learned preference function. We show that the problem of finding the ordering
that agrees best with a learned preference function is NP-complete.
Nevertheless, we describe simple greedy algorithms that are guaranteed to find
a good approximation. Finally, we show how metasearch can be formulated as an
ordering problem, and present experimental results on learning a combination of
'search experts', each of which is a domain-specific query expansion strategy
for a web search engine
- …