Preference-based learning to rank

Author: C. Cortes
C. Hoare
C. Kenyon-Mathieu
C. Rudin
D. Ariely
E. L. Lehmann
J. A. Hanley
K. Crammer
K. J. Arrow
M. H. Montague
M.-F. Balcan
Mehryar Mohri
N. Ailon
N. Alon
Nir Ailon
S. Agarwal
S. Clémençon
T. Joachims
W. W. Cohen
Y. Freund
Publication venue: 'Springer Science and Business Media LLC'

Learning to Rank based on Analogical Reasoning

Author: Fahandar Mohsen Ahmadi
Hüllermeier Eyke
Publication date: 28/11/2017

Object ranking or "learning to rank" is an important problem in the realm of preference learning. On the basis of training data in the form of a set of rankings of objects represented as feature vectors, the goal is to learn a ranking function that predicts a linear order of any new set of objects. In this paper, we propose a new approach to object ranking based on principles of analogical reasoning. More specifically, our inference pattern is formalized in terms of so-called analogical proportions and can be summarized as follows: Given objects A, B, C, D, if object A is known to be preferred to B, and C relates to D as A relates to B, then C is (supposedly) preferred to D. Our method applies this pattern as a main building block and combines it with ideas and techniques from instance-based learning and rank aggregation. Based on first experimental results for data sets from various domains (sports, education, tourism, etc.), we conclude that our approach is highly competitive. It appears to be specifically interesting in situations in which the objects are coming from different subdomains, and which hence require a kind of knowledge transfer. Comment: Thirty-Second AAAI Conference on Artificial Intelligence (AAAI-18), 8 pages
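The inference pattern in this abstract can be illustrated with a minimal sketch. It assumes objects are feature vectors with values in [0, 1] and measures "A relates to B as C relates to D" with a feature-wise arithmetic proportion; that choice, and the names `analogy_degree` and `predict_preference`, are illustrative assumptions rather than the authors' exact formulation.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def analogy_degree(a: Vector, b: Vector, c: Vector, d: Vector) -> float:
    """Degree in [0, 1] to which 'a relates to b as c relates to d'.

    Assumption: features lie in [0, 1] and the proportion is arithmetic,
    i.e. it holds fully when (a - b) equals (c - d) on every feature.
    """
    total = 0.0
    for ai, bi, ci, di in zip(a, b, c, d):
        total += max(0.0, 1.0 - abs((ai - bi) - (ci - di)))
    return total / len(a)


def predict_preference(known_pairs: List[Tuple[Vector, Vector]],
                       c: Vector, d: Vector) -> bool:
    """Return True if c is predicted to be preferred to d.

    known_pairs holds (A, B) with A known to be preferred to B. The
    strongest analogy supporting 'c over d' is compared with the
    strongest analogy supporting 'd over c'.
    """
    support_cd = max(analogy_degree(a, b, c, d) for a, b in known_pairs)
    support_dc = max(analogy_degree(a, b, d, c) for a, b in known_pairs)
    return support_cd >= support_dc


if __name__ == "__main__":
    known = [((0.9, 0.8), (0.2, 0.3))]  # A preferred to B
    print(predict_preference(known, (0.7, 0.9), (0.1, 0.2)))
```

Taking the single strongest analogy is the simplest aggregation; the abstract mentions instance-based learning and rank aggregation, which would combine many such votes into a full ranking.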

arXiv.org e-Print Archive

Association for the Advancement of Artificial Intelligence: AAAI Publications

Factorizing LambdaMART for cold start recommendations

Author: Alexandros Kalousis
CJ Burges
D Cai
J Fürnkranz
JH Friedman
Jun Wang
M Hilario
N Srebro
Phong Nguyen
Publication date: 04/11/2015

Recommendation systems often rely on point-wise loss metrics such as the mean squared error. However, in real recommendation settings only a few items are presented to a user. This observation has recently encouraged the use of rank-based metrics. LambdaMART is the state-of-the-art algorithm in learning to rank which relies on such a metric. Despite its success, it does not have a principled regularization mechanism; it relies instead on empirical approaches to control model complexity, which leaves it prone to overfitting. Motivated by the fact that very often the users' and items' descriptions as well as the preference behavior can be well summarized by a small number of hidden factors, we propose a novel algorithm, LambdaMART Matrix Factorization (LambdaMART-MF), that learns a low-rank latent representation of users and items using gradient boosted trees. The algorithm factorizes LambdaMART by defining relevance scores as the inner product of the learned representations of the users and items. The low rank is essentially a model complexity controller; on top of it we propose additional regularizers to constrain the learned latent representations that reflect the user and item manifolds as these are defined by their original feature-based descriptors and the preference behavior. Finally, we also propose to use a weighted variant of NDCG to reduce the penalty for similar items with large rating discrepancy. We experiment on two very different recommendation datasets, meta-mining and movies-users, and evaluate the performance of LambdaMART-MF, with and without regularization, in the cold start setting as well as in the simpler matrix completion setting. In both cases it significantly outperforms current state-of-the-art algorithms.
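As a rough illustration of two ingredients named in this abstract, the sketch below scores items by the inner product of a user vector and item vectors, then evaluates the induced ranking with standard NDCG@k. The latent vectors are supplied by hand here (the paper learns them with gradient boosted trees), the paper's weighted NDCG variant is not reproduced, and all function names are hypothetical.

```python
import math
from typing import List, Sequence


def relevance_scores(user_vec: Sequence[float],
                     item_vecs: List[Sequence[float]]) -> List[float]:
    """Score each item as the inner product of user and item vectors."""
    return [sum(u * v for u, v in zip(user_vec, item)) for item in item_vecs]


def dcg_at_k(relevances: Sequence[float], k: int) -> float:
    """Discounted cumulative gain with the common (2^rel - 1) gain."""
    return sum((2 ** rel - 1) / math.log2(rank + 2)
               for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(scores: Sequence[float],
              true_relevances: Sequence[float], k: int) -> float:
    """NDCG@k of the ranking obtained by sorting items by score."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    ranked = [true_relevances[i] for i in order]
    ideal_dcg = dcg_at_k(sorted(true_relevances, reverse=True), k)
    return dcg_at_k(ranked, k) / ideal_dcg if ideal_dcg > 0 else 0.0


if __name__ == "__main__":
    user = [1.0, 0.5]                                # assumed latent user vector
    items = [[0.2, 0.1], [0.9, 0.4], [0.5, 0.5]]     # assumed latent item vectors
    scores = relevance_scores(user, items)
    print(scores, ndcg_at_k(scores, [0, 2, 1], k=3))
```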

arXiv.org e-Print Archive

Crossref

Hes-so: ArODES Open Archive (University of Applied Sciences and Arts Western Switzerland / Haute école spécialisée de Suisse occidentale / FH Westschweiz)

Archive ouverte UNIGE

Learning from User Interactions with Rankings: A Unification of the Field

Author: Oosterhuis Harrie
Publication date: 01/01/2020

Ranking systems form the basis for online search engines and recommendation services. They process large collections of items, for instance web pages or e-commerce products, and present the user with a small ordered selection. The goal of a ranking system is to help a user find the items they are looking for with the least amount of effort. Thus the rankings they produce should place the most relevant or preferred items at the top of the ranking. Learning to rank is a field within machine learning that covers methods which optimize ranking systems w.r.t. this goal. Traditional supervised learning to rank methods utilize expert judgements to evaluate and learn; however, in many situations such judgements are impossible or infeasible to obtain. As a solution, methods have been introduced that perform learning to rank based on user clicks instead. The difficulty with clicks is that they are affected not only by user preferences, but also by what rankings were displayed. Therefore, these methods have to avoid being biased by factors other than user preference. This thesis concerns learning to rank methods based on user clicks and specifically aims to unify the different families of these methods. As a whole, the second part of this thesis proposes a framework that bridges many gaps between areas of online, counterfactual, and supervised learning to rank. It has taken approaches, previously considered independent, and unified them into a single methodology for widely applicable and effective learning to rank from user clicks. Comment: PhD Thesis of Harrie Oosterhuis defended at the University of Amsterdam on November 27th 202
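The bias described in this abstract, where clicks depend on the displayed ranking as well as on preference, is often handled in counterfactual learning to rank by inverse propensity scoring. The sketch below is a generic illustration of that idea, not the thesis's own method; it assumes a position-based click model with known examination probabilities per rank, and the names `ips_relevance_estimates`, `click_logs` and `propensities` are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def ips_relevance_estimates(
    click_logs: List[List[Tuple[str, int]]],
    propensities: Sequence[float],
) -> Dict[str, float]:
    """Estimate per-document relevance from clicks, corrected for position.

    click_logs: one list per displayed ranking, holding (doc_id, clicked)
        pairs in display order, with clicked in {0, 1}.
    propensities: assumed probability that a user examines each rank.
    Each click is weighted by 1 / propensity of the rank it occurred at,
    so clicks at rarely examined positions count for more.
    """
    totals: Dict[str, float] = {}
    impressions: Dict[str, int] = {}
    for ranking in click_logs:
        for rank, (doc, clicked) in enumerate(ranking):
            totals[doc] = totals.get(doc, 0.0) + clicked / propensities[rank]
            impressions[doc] = impressions.get(doc, 0) + 1
    return {doc: totals[doc] / impressions[doc] for doc in totals}


if __name__ == "__main__":
    logs = [[("a", 1), ("b", 1)], [("a", 1), ("b", 0)]]
    # Rank 2 is assumed to be examined only half of the time.
    print(ips_relevance_estimates(logs, propensities=[1.0, 0.5]))
```

In this toy log, document "b" has half the raw click rate of "a" but sits at a position examined half as often, so the corrected estimates coincide.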

arXiv.org e-Print Archive

International Migration, Integration and Social Cohesion online publications

UvA-DARE

How to Query Human Feedback Efficiently in RL?

Author: Lee Jason D.
Sun Wen
Uehara Masatoshi
Zhan Wenhao
Publication date: 29/05/2023

Reinforcement Learning with Human Feedback (RLHF) is a paradigm in which an RL agent learns to optimize a task using pair-wise preference-based feedback over trajectories, rather than explicit reward signals. While RLHF has demonstrated practical success in fine-tuning language models, existing empirical work does not address the challenge of how to efficiently sample trajectory pairs for querying human feedback. In this study, we propose an efficient sampling approach to acquiring exploratory trajectories that enable accurate learning of hidden reward functions before collecting any human feedback. Theoretical analysis demonstrates that our algorithm requires less human feedback for learning the optimal policy under preference-based models with linear parameterization and unknown transitions, compared to the existing literature. Specifically, our framework can incorporate linear and low-rank MDPs. Additionally, we investigate RLHF with action-based comparison feedback and introduce an efficient querying algorithm tailored to this scenario.
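To make "pair-wise preference-based feedback" with a linearly parameterized reward concrete, here is a minimal sketch that assumes a Bradley-Terry style model: the probability that one trajectory is preferred is a sigmoid of the difference in linear rewards. The model choice, the trajectory feature vectors, and the names `preference_probability` and `fit_reward` are assumptions for illustration; the paper's sampling algorithm is not reproduced.

```python
import math
from typing import List, Sequence, Tuple


def preference_probability(theta: Sequence[float],
                           phi_a: Sequence[float],
                           phi_b: Sequence[float]) -> float:
    """P(trajectory a preferred to b) under an assumed Bradley-Terry model.

    The hidden reward of a trajectory is theta . phi(trajectory).
    """
    margin = sum(t * (x - y) for t, x, y in zip(theta, phi_a, phi_b))
    return 1.0 / (1.0 + math.exp(-margin))


def fit_reward(comparisons: List[Tuple[Sequence[float], Sequence[float]]],
               dim: int, lr: float = 0.5, steps: int = 200) -> List[float]:
    """Fit theta by gradient ascent on the preference log-likelihood.

    comparisons holds (phi_winner, phi_loser) feature pairs.
    """
    theta = [0.0] * dim
    for _ in range(steps):
        grad = [0.0] * dim
        for phi_win, phi_lose in comparisons:
            p = preference_probability(theta, phi_win, phi_lose)
            for i in range(dim):
                grad[i] += (1.0 - p) * (phi_win[i] - phi_lose[i])
        theta = [t + lr * g / len(comparisons) for t, g in zip(theta, grad)]
    return theta


if __name__ == "__main__":
    data = [([1.0, 0.0], [0.0, 1.0]), ([0.8, 0.1], [0.2, 0.9])]
    theta = fit_reward(data, dim=2)
    print(theta, preference_probability(theta, [1.0, 0.0], [0.0, 1.0]))
```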

arXiv.org e-Print Archive

Learning to Order Things

Author: Cohen W. W.
Schapire R. E.
Singer Y.
Publication venue: 'AI Access Foundation'
Publication date: 26/05/2011

There are many applications in which it is desirable to order rather than classify instances. Here we consider the problem of learning how to order instances given feedback in the form of preference judgments, i.e., statements to the effect that one instance should be ranked ahead of another. We outline a two-stage approach in which one first learns by conventional means a binary preference function indicating whether it is advisable to rank one instance before another. Here we consider an on-line algorithm for learning preference functions that is based on Freund and Schapire's 'Hedge' algorithm. In the second stage, new instances are ordered so as to maximize agreement with the learned preference function. We show that the problem of finding the ordering that agrees best with a learned preference function is NP-complete. Nevertheless, we describe simple greedy algorithms that are guaranteed to find a good approximation. Finally, we show how metasearch can be formulated as an ordering problem, and present experimental results on learning a combination of 'search experts', each of which is a domain-specific query expansion strategy for a web search engine.
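The second stage described in this abstract, ordering instances to agree with a learned preference function, can be sketched with a simple greedy procedure. The version below repeatedly picks the item whose outgoing preferences most exceed its incoming ones; it is written in the spirit of the greedy algorithms the abstract mentions, but the details, the toy preference function, and the name `greedy_order` are assumptions for illustration.

```python
from typing import Callable, Dict, Hashable, List, Sequence

PrefFn = Callable[[Hashable, Hashable], float]


def greedy_order(items: Sequence[Hashable], pref: PrefFn) -> List[Hashable]:
    """Greedily order items to agree with a pairwise preference function.

    pref(u, v) in [0, 1] says how strongly u should be ranked ahead of v.
    Each item's potential is its total outgoing preference minus its total
    incoming preference among the items not yet placed.
    """
    remaining = list(items)
    potential: Dict[Hashable, float] = {
        v: sum(pref(v, u) - pref(u, v) for u in remaining if u != v)
        for v in remaining
    }
    ordering: List[Hashable] = []
    while remaining:
        best = max(remaining, key=lambda v: potential[v])
        ordering.append(best)
        remaining.remove(best)
        # Remove the placed item's contribution from the other potentials.
        for v in remaining:
            potential[v] += pref(best, v) - pref(v, best)
    return ordering


if __name__ == "__main__":
    scores = {"x": 3, "y": 1, "z": 2}  # toy stand-in for a learned function

    def pref(u, v):
        return 1.0 if scores[u] > scores[v] else 0.0

    print(greedy_order(["y", "z", "x"], pref))
```

Because the exact problem is NP-complete according to the abstract, a greedy pass like this trades optimality for a runtime that is quadratic in the number of items.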

arXiv.org e-Print Archive

Crossref