13 research outputs found

    Ranking a set of objects: a graph based least-square approach

    Get PDF
    We consider the problem of ranking N objects starting from a set of noisy pairwise comparisons provided by a crowd of equally reliable workers. We assume that objects are endowed with intrinsic qualities and that the probability with which one object is preferred to another depends only on the difference between the qualities of the two competitors. We propose a class of non-adaptive ranking algorithms that rely on a least-squares optimization criterion for the estimation of qualities. Such algorithms are shown to be asymptotically optimal (i.e., they require O((N/ϵ^2) log(N/δ)) comparisons to be (ϵ, δ)-PAC). Numerical results show that our schemes are also very efficient in many non-asymptotic scenarios, exhibiting performance similar to the maximum-likelihood algorithm. Moreover, we show how they can be extended to adaptive schemes, and we test them on real-world datasets.
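The least-squares idea above can be illustrated with a minimal sketch: each pairwise comparison is treated as a noisy observation of a quality difference, and the qualities are recovered by solving one linear least-squares problem over the comparison graph. This is an assumed simplification of the paper's estimator (the function name and the way difference observations are produced are illustrative, not the authors' exact construction).

```python
import numpy as np

# Hypothetical sketch of graph-based least-squares quality estimation.
# Each comparison (i, j, y_ij) carries an empirical estimate y_ij of the
# quality difference q_i - q_j, e.g. obtained by inverting the preference
# link function applied to observed win frequencies.

def estimate_qualities(n, comparisons):
    """Solve min_q sum over edges of (q_i - q_j - y_ij)^2, fixing sum(q) = 0."""
    rows, rhs = [], []
    for i, j, y in comparisons:
        r = np.zeros(n)
        r[i], r[j] = 1.0, -1.0  # edge of the comparison graph
        rows.append(r)
        rhs.append(y)
    # Differences only identify q up to a shift: anchor with sum(q) = 0.
    rows.append(np.ones(n))
    rhs.append(0.0)
    A, b = np.array(rows), np.array(rhs)
    q, *_ = np.linalg.lstsq(A, b, rcond=None)
    return q

# Toy example: true qualities 1 > 0 > -1, noiseless difference observations.
comps = [(0, 1, 1.0), (1, 2, 1.0), (0, 2, 2.0)]
q = estimate_qualities(3, comps)
ranking = np.argsort(-q)  # best object first
```

On noisy data the same solve averages the errors across the graph, which is what makes the non-adaptive scheme cheap: one sparse least-squares problem instead of an iterative likelihood maximization.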

    Borda Regret Minimization for Generalized Linear Dueling Bandits

    Full text link
    Dueling bandits are widely used to model preferential feedback prevalent in many applications such as recommendation systems and ranking. In this paper, we study the Borda regret minimization problem for dueling bandits, which aims to identify the item with the highest Borda score while minimizing the cumulative regret. We propose a rich class of generalized linear dueling bandit models, which cover many existing models. We first prove a regret lower bound of order Ω(d^{2/3} T^{2/3}) for the Borda regret minimization problem, where d is the dimension of the contextual vectors and T is the time horizon. To attain this lower bound, we propose an explore-then-commit type algorithm for the stochastic setting, which has a nearly matching regret upper bound Õ(d^{2/3} T^{2/3}). We also propose an EXP3-type algorithm for the adversarial linear setting, where the underlying model parameter can change at each round. Our algorithm achieves an Õ(d^{2/3} T^{2/3}) regret, which is also optimal. Empirical evaluations on both synthetic data and a simulated real-world environment are conducted to corroborate our theoretical analysis.
    Comment: 33 pages, 5 figures. This version includes new results for dueling bandits in the adversarial setting.
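The explore-then-commit recipe for Borda regret can be sketched in its plain K-armed form: the Borda score of item i is its average probability of beating a uniformly random opponent, so the algorithm first duels random pairs to estimate these scores, then commits to the empirical winner. This is a hedged illustration of the principle only; the paper's algorithm works in the generalized linear contextual model with d-dimensional features, and all names here are illustrative.

```python
import random

# Illustrative K-armed explore-then-commit for Borda regret (assumed
# simplification; the paper handles generalized linear contextual models).
# Borda score B(i) = average over j != i of P(i beats j).

def explore_then_commit(K, T, duel, n_explore):
    wins = [[0, 0] for _ in range(K)]  # [wins, duels played] per item
    history = []
    for _ in range(n_explore):
        i, j = random.sample(range(K), 2)  # uniform random pair
        out = duel(i, j)                   # 1 if i beats j, else 0
        wins[i][0] += out
        wins[i][1] += 1
        wins[j][0] += 1 - out
        wins[j][1] += 1
        history.append((i, j))
    # Empirical Borda estimate: fraction of duels won.
    borda_hat = [w / max(n, 1) for w, n in wins]
    best = max(range(K), key=lambda a: borda_hat[a])
    for _ in range(n_explore, T):
        duel(best, best)                   # commit: play the chosen item
        history.append((best, best))
    return best, history

# Deterministic toy preferences: item i beats item j whenever i < j,
# so item 0 has the highest Borda score.
random.seed(7)
best, hist = explore_then_commit(3, T=50, duel=lambda i, j: 1 if i < j else 0,
                                 n_explore=30)
```

Balancing the two phases is what produces the T^{2/3} rate: exploration cost grows linearly in n_explore, while the commit-phase regret shrinks with the estimation error, and the optimal trade-off sets n_explore on the order of T^{2/3}.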

    Versatile Dueling Bandits: Best-of-both-World Analyses for Online Learning from Preferences

    Get PDF
    We study the K-armed dueling bandit problem for both stochastic and adversarial environments, where the goal of the learner is to aggregate information through the relative preferences of pairs of decision points queried in an online sequential manner. We first propose a novel reduction from any (general) dueling bandit problem to multi-armed bandits which, despite its simplicity, allows us to improve many existing results in dueling bandits. In particular, we give the first best-of-both-worlds result for the dueling bandit regret minimization problem -- a unified framework that is guaranteed to perform optimally for both stochastic and adversarial preferences simultaneously. Moreover, our algorithm is also the first to achieve an optimal O(Σ_{i=1}^K (log T)/Δ_i) regret bound against the Condorcet-winner benchmark, which scales optimally both in terms of the arm size K and the instance-specific suboptimality gaps {Δ_i}_{i=1}^K. This resolves the long-standing problem of designing an instance-wise, gap-dependent, order-optimal regret algorithm for dueling bandits (with matching lower bounds up to small constant factors). We further justify the robustness of our proposed algorithm by proving its optimal regret rate under adversarially corrupted preferences; this outperforms the existing state-of-the-art corrupted dueling results by a large margin. In summary, we believe our reduction idea will find broader scope in solving a diverse class of dueling bandit settings, which are otherwise studied separately from multi-armed bandits, often with more complex solutions and worse guarantees. The efficacy of our proposed algorithms is empirically corroborated against existing dueling bandit methods.
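The reduction idea can be sketched as follows: run a standard adversarial multi-armed bandit learner (EXP3 in this sketch), draw both arms of each duel from its distribution, and feed the observed preference back as an importance-weighted relative reward (winner 1, loser 0). This is a hedged illustration of the reduction principle under assumed details, not the paper's exact construction; the class name and update rule are hypothetical.

```python
import math
import random

# Illustrative dueling-to-MAB reduction: one EXP3 learner supplies BOTH
# arms of each duel, and the preference outcome acts as the reward signal.

class EXP3Duel:
    def __init__(self, K, eta):
        self.K, self.eta = K, eta
        self.logw = [0.0] * K  # log-weights for numerical stability

    def _probs(self):
        m = max(self.logw)
        w = [math.exp(x - m) for x in self.logw]
        s = sum(w)
        return [x / s for x in w]

    def select_pair(self):
        p = self._probs()
        i = random.choices(range(self.K), weights=p)[0]
        j = random.choices(range(self.K), weights=p)[0]
        return i, j, p

    def update(self, i, j, i_wins, p):
        # Relative reward: winner gets 1, loser 0, importance-weighted
        # by the sampling probability as in standard EXP3.
        r_i, r_j = (1.0, 0.0) if i_wins else (0.0, 1.0)
        self.logw[i] += self.eta * r_i / p[i]
        self.logw[j] += self.eta * r_j / p[j]

# Toy preferences: arm i beats arm j whenever i < j, so arm 0 is the
# Condorcet winner; the learner's weight should concentrate on it.
random.seed(0)
learner = EXP3Duel(3, eta=0.1)
for _ in range(500):
    i, j, p = learner.select_pair()
    if i != j:
        learner.update(i, j, i_wins=(i < j), p=p)
best = max(range(3), key=lambda a: learner.logw[a])
```

The appeal of the reduction is visible even in this sketch: all the machinery is an off-the-shelf MAB learner, with the dueling structure entering only through how pairs are drawn and how the binary preference is converted into rewards.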