
    Bandit Online Optimization Over the Permutahedron

    The permutahedron is the convex polytope whose vertex set consists of the vectors $(\pi(1),\dots,\pi(n))$ for all permutations (bijections) $\pi$ over $\{1,\dots,n\}$. We study a bandit game in which, at each step $t$, an adversary chooses a hidden weight vector $s_t$, and a player chooses a vertex $\pi_t$ of the permutahedron and suffers an observed loss of $\sum_{i=1}^n \pi_t(i) s_t(i)$. A previous algorithm, CombBand of Cesa-Bianchi et al. (2009), guarantees a regret of $O(n\sqrt{T\log n})$ for a time horizon of $T$. Unfortunately, CombBand requires at each step approximating an $n$-by-$n$ matrix permanent to increasingly fine accuracy as $T$ grows, resulting in a total running time that is superlinear in $T$ and making it impractical for large time horizons. We provide an algorithm with regret $O(n^{3/2}\sqrt{T})$ and total time complexity $O(n^3 T)$. The ideas combine CombBand with a recent algorithm by Ailon (2013) for online optimization over the permutahedron in the full-information setting. The technical core is a bound on the variance of the "pseudo loss" of the Plackett-Luce noisy sorting process. The bound is obtained by establishing positive semi-definiteness of a family of 3-by-3 matrices generated from rational functions of exponentials of 3 parameters.
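    To make the setting concrete, here is a minimal sketch (not the paper's algorithm) of the Plackett-Luce noisy sorting process mentioned above, played against the permutahedron loss. The function names and the exponential parameterization of the scores are illustrative assumptions of this sketch.

```python
import numpy as np

def plackett_luce_sample(theta, rng):
    """Draw a permutation from the Plackett-Luce model with scores theta:
    positions are filled one at a time, and each remaining item is chosen
    with probability proportional to exp(theta[item])."""
    remaining = list(range(len(theta)))
    order = []                                   # order[pos] = item placed at pos
    while remaining:
        w = np.exp([theta[i] for i in remaining])
        j = rng.choice(len(remaining), p=w / w.sum())
        order.append(remaining.pop(j))
    return order

def permutahedron_loss(order, s):
    """Loss sum_i pi(i) * s(i), where pi(i) is the 1-based position that
    item i receives in `order` and s is the hidden weight vector."""
    pi = np.empty(len(order))
    for pos, item in enumerate(order, start=1):
        pi[item] = pos
    return float(pi @ np.asarray(s, dtype=float))

rng = np.random.default_rng(0)
print(permutahedron_loss(plackett_luce_sample([0.3, -0.1, 0.7], rng),
                         [1.0, 0.5, 0.2]))
```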

    Online Ranking: Discrete Choice, Spearman Correlation and Other Feedback

    Given a set $V$ of $n$ objects, an online ranking system outputs at each time step a full ranking of the set, observes feedback of some form, and suffers a loss. We study the setting in which the (adversarial) feedback is an element of $V$, and the loss is the position (0th, 1st, 2nd, ...) of that item in the outputted ranking. More generally, we study a setting in which the feedback is a subset $U$ of at most $k$ elements of $V$, and the loss is the sum of the positions of those elements. We present an algorithm with expected regret $O(n^{3/2}\sqrt{Tk})$ over a time horizon of $T$ steps with respect to the best single ranking in hindsight. This improves on previous algorithms and analyses by a factor of either $\Omega(\sqrt{k})$ or $\Omega(\sqrt{\log n})$, or by improving the running time from quadratic to $O(n\log n)$ per round. We also prove a matching lower bound. Our techniques also imply an improved regret bound for online rank aggregation over the Spearman correlation measure, and for other, more complex ranking loss functions.
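    A minimal sketch of the loss model under subset feedback (names illustrative; the paper's contribution is the regret-efficient algorithm, not this computation):

```python
def ranking_loss(ranking, feedback):
    """Sum of the 0-based positions in `ranking` of the items revealed in
    `feedback` (a subset of at most k objects)."""
    position = {item: i for i, item in enumerate(ranking)}
    return sum(position[u] for u in feedback)

# Outputting the ranking [c, a, b] against feedback {a, b} costs 1 + 2 = 3.
assert ranking_loss(["c", "a", "b"], {"a", "b"}) == 3
```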

    Optimal algorithms for group distributionally robust optimization and beyond

    Distributionally robust optimization (DRO) can improve the robustness and fairness of learning methods. In this paper, we devise stochastic algorithms for a class of DRO problems including group DRO, subpopulation fairness, and empirical conditional value at risk (CVaR) optimization. Our new algorithms achieve faster convergence rates than existing algorithms for multiple DRO settings. We also provide a new information-theoretic lower bound implying that our bounds are tight for group DRO. Empirically, our algorithms also outperform known methods.
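    For concreteness, the sketch below states one standard formulation of two of the objectives named above, the group-DRO objective and empirical CVaR; the function names are illustrative, and the paper's contribution is how to optimize such objectives quickly, which is not shown.

```python
import numpy as np

def group_dro_objective(per_group_losses):
    """Worst-case (over groups) average loss -- the quantity group DRO
    minimizes over model parameters."""
    return max(float(np.mean(losses)) for losses in per_group_losses.values())

def empirical_cvar(losses, alpha):
    """Empirical conditional value at risk at level alpha: the average of
    the worst alpha-fraction of the losses."""
    losses = np.sort(np.asarray(losses, dtype=float))[::-1]  # descending
    k = max(1, int(np.ceil(alpha * len(losses))))
    return float(losses[:k].mean())

print(group_dro_objective({"A": [0.2, 0.4], "B": [0.9, 0.1]}))  # 0.5
print(empirical_cvar([0.2, 0.4, 0.9, 0.1], alpha=0.5))          # 0.65
```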

    Simultaneously Learning Stochastic and Adversarial Bandits under the Position-Based Model

    Online learning to rank (OLTR) interactively learns to choose lists of items from a large collection based on certain click models that describe users' click behaviors. Most recent works on this problem focus on the stochastic environment, where item attractiveness is assumed to be invariant during the learning process. In many real-world scenarios, however, the environment could be dynamic or even arbitrarily changing. This work studies the OLTR problem in both stochastic and adversarial environments under the position-based model (PBM). We propose a method based on the follow-the-regularized-leader (FTRL) framework with Tsallis entropy and develop a new self-bounding constraint designed especially for the PBM. We prove that the proposed algorithm simultaneously achieves $O(\log T)$ regret in the stochastic environment and $O(m\sqrt{nT})$ regret in the adversarial environment, where $T$ is the number of rounds, $n$ is the number of items, and $m$ is the number of positions. We also provide a lower bound of order $\Omega(m\sqrt{nT})$ for the adversarial PBM, which matches our upper bound and improves over the state-of-the-art lower bound. Experiments show that our algorithm can simultaneously learn in both stochastic and adversarial environments and is competitive with existing methods designed for a single environment.
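    The following sketch shows the generic FTRL step with a 1/2-Tsallis entropy regularizer on the probability simplex, the building block named above. It is a sketch only: the per-position handling of the PBM and the paper's self-bounding constraint are not reproduced, and all names are illustrative.

```python
import numpy as np

def tsallis_ftrl_step(cum_loss, eta, iters=60):
    """One FTRL step with the 1/2-Tsallis entropy regularizer:
        x = argmin_{x in simplex} <L, x> - (2/eta) * sum_i sqrt(x_i).
    The first-order conditions give x_i = (eta * (L_i + mu))^{-2}, with the
    normalizing shift mu found here by bisection."""
    L = np.asarray(cum_loss, dtype=float)
    f = lambda mu: np.sum((eta * (L + mu)) ** -2.0) - 1.0
    lo = -L.min() + 1e-9          # keeps every L_i + mu strictly positive
    hi = lo + 1.0
    while f(hi) > 0:              # grow the bracket until f changes sign
        hi += hi - lo
    for _ in range(iters):        # f is decreasing in mu; bisect to its root
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if f(mid) > 0 else (lo, mid)
    return (eta * (L + 0.5 * (lo + hi))) ** -2.0

print(tsallis_ftrl_step(np.array([0.0, 1.0, 3.0]), eta=1.0))  # sums to ~1
```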

    Universal Algorithms: Beyond the Simplex

    The bulk of universal algorithms in the online convex optimisation literature are variants of the Hedge (exponential weights) algorithm on the simplex. While these algorithms extend to polytope domains by assigning weights to the vertices, this process is computationally infeasible for many important classes of polytopes, where the number $V$ of vertices depends exponentially on the dimension $d$. In this paper we show that the Subgradient algorithm is universal, meaning it has $O(\sqrt{N})$ regret in the antagonistic setting and $O(1)$ pseudo-regret in the i.i.d. setting, with two main advantages over Hedge: (1) the update step is more efficient, as the action vectors have length only $d$ rather than $V$; and (2) Subgradient gives better performance if the cost vectors satisfy Euclidean rather than sup-norm bounds. This paper extends the authors' recent results for Subgradient on the simplex. We also prove the same $O(\sqrt{N})$ and $O(1)$ bounds when the domain is the unit ball. To the authors' knowledge, this is the first instance of these bounds on a domain other than a polytope.
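    As a sketch of the Subgradient method in the sense discussed above, here is online projected subgradient descent on the unit ball with a simple $1/\sqrt{t}$ step size; the paper's universal tuning and analysis are not reproduced, and the names are illustrative.

```python
import numpy as np

def projected_subgradient(grads, d, radius=1.0):
    """Online projected subgradient descent on the Euclidean ball of radius
    `radius` in R^d: play x_t, observe (sub)gradient g_t, then update
    x_{t+1} = Proj(x_t - g_t / sqrt(t))."""
    x = np.zeros(d)
    for t, g in enumerate(grads, start=1):
        yield x                                  # the point played at round t
        x = x - np.asarray(g, dtype=float) / np.sqrt(t)
        norm = np.linalg.norm(x)
        if norm > radius:                        # Euclidean projection onto the ball
            x *= radius / norm

plays = list(projected_subgradient([[1.0, 0.0], [0.0, -1.0]], d=2))
```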