Bandit Online Optimization Over the Permutahedron
The permutahedron is the convex polytope with vertex set consisting of the
vectors $(\pi(1), \dots, \pi(n))$ for all permutations (bijections) $\pi$ over
$\{1, \dots, n\}$. We study a bandit game in which, at each step $t$, an
adversary chooses a hidden weight vector $s_t$, a player chooses a
vertex $\pi_t$ of the permutahedron and suffers an observed loss of
$\sum_{i=1}^{n} \pi_t(i)\, s_t(i)$.
A previous algorithm, CombBand, of Cesa-Bianchi et al. (2009) guarantees a
regret of $O(n\sqrt{T \log n})$ for a time horizon of $T$. Unfortunately,
CombBand requires at each step an $n$-by-$n$ matrix permanent approximation to
within improved accuracy as $T$ grows, resulting in a total running time that
is superlinear in $T$, making it impractical for large time horizons.
We provide an algorithm of regret $O(n^{3/2}\sqrt{T})$ with total time
complexity $O(n^3 T)$. The ideas are a combination of CombBand and a recent
algorithm by Ailon (2013) for online optimization over the permutahedron in the
full information setting. The technical core is a bound on the variance of the
Plackett-Luce noisy sorting process's "pseudo loss". The bound is obtained by
establishing positive semi-definiteness of a family of 3-by-3 matrices
generated from rational functions of exponentials of 3 parameters.
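The Plackett-Luce noisy sorting process referenced above builds a permutation one position at a time, choosing each next item with probability proportional to its weight among the items not yet placed. A minimal sampler illustrating that process (an illustration only, not the paper's bandit algorithm; the function name and `rng` interface are our own):

```python
import random

def plackett_luce_sample(weights, rng=random):
    """Draw a permutation from the Plackett-Luce distribution.

    Positions are filled left to right; each remaining item is chosen
    with probability proportional to its weight among the items that
    have not been placed yet.
    """
    remaining = list(range(len(weights)))
    order = []
    while remaining:
        total = sum(weights[i] for i in remaining)
        r = rng.random() * total
        acc = 0.0
        for idx, i in enumerate(remaining):
            acc += weights[i]
            if r <= acc:
                order.append(remaining.pop(idx))
                break
    return order
```

Items with larger weights tend to land in earlier positions, which is the sense in which the process is a noisy sort.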
Online Ranking: Discrete Choice, Spearman Correlation and Other Feedback
Given a set $V$ of $n$ objects, an online ranking system outputs at each time
step a full ranking of the set, observes a feedback of some form and suffers a
loss. We study the setting in which the (adversarial) feedback is an element in
$V$, and the loss is the position (0th, 1st, 2nd...) of the item in the
outputted ranking. More generally, we study a setting in which the feedback is
a subset of at most $k$ elements in $V$, and the loss is the sum of the
positions of those elements.
We present an algorithm whose expected regret over a time horizon of $T$
steps, with respect to the best single ranking in hindsight, improves on
previous algorithms and analyses either by a factor of $\sqrt{\log n}$, by a
factor of $\sqrt{k}$, or by reducing the per-round running time from quadratic
to $O(n \log n)$. We also prove a matching lower bound. Our techniques also
imply an improved regret bound for online rank aggregation under the Spearman
correlation measure, and for other more complex ranking loss functions.
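The loss described above — the sum of the 0-based positions at which the feedback items appear in the outputted ranking — can be computed directly. A small sketch (the function name is our own):

```python
def ranking_loss(ranking, feedback):
    """Sum of the 0-based positions of the feedback items in the ranking.

    For a single-element feedback this is just the position of that item,
    matching the discrete-choice loss described above.
    """
    position = {item: p for p, item in enumerate(ranking)}
    return sum(position[item] for item in feedback)
```

For example, `ranking_loss(["a", "b", "c"], {"a"})` is 0 (the item is ranked first), while `ranking_loss(["a", "b", "c"], {"b", "c"})` is 1 + 2 = 3.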
Optimal algorithms for group distributionally robust optimization and beyond
Distributionally robust optimization (DRO) can improve the robustness and
fairness of learning methods. In this paper, we devise stochastic algorithms
for a class of DRO problems including group DRO, subpopulation fairness, and
empirical conditional value at risk (CVaR) optimization. Our new algorithms
achieve faster convergence rates than existing algorithms for multiple DRO
settings. We also provide a new information-theoretic lower bound that implies
our bounds are tight for group DRO. Empirically, too, our algorithms outperform
known methods.
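Group DRO protects the worst-off group by pitting the learner against an adversary over group weights; a standard stochastic primal-dual scheme alternates a gradient-descent step on the model with an exponentiated-gradient ascent step on the weights. A toy sketch for a scalar least-squares model (an illustration of the general scheme only, not the paper's algorithm; all names and step sizes are our own):

```python
import math

def group_dro_step(theta, groups, lr_theta=0.1, lr_q=0.5, q=None):
    """One primal-dual step for group DRO with a scalar model theta * x.

    groups: list of (x, y) sample lists, one list per group.
    q: adversarial group-weight vector on the simplex (uniform if None).
    The adversary upweights high-loss groups (exponentiated gradient);
    the learner descends the gradient of the q-weighted loss.
    """
    if q is None:
        q = [1.0 / len(groups)] * len(groups)
    losses, grads = [], []
    for samples in groups:
        # average squared loss and its gradient in theta for this group
        l = sum((theta * x - y) ** 2 for x, y in samples) / len(samples)
        g = sum(2 * (theta * x - y) * x for x, y in samples) / len(samples)
        losses.append(l)
        grads.append(g)
    # adversary: exponentiated-gradient ascent, then renormalise
    q = [qi * math.exp(lr_q * li) for qi, li in zip(q, losses)]
    z = sum(q)
    q = [qi / z for qi in q]
    # learner: descend the q-weighted gradient
    theta -= lr_theta * sum(qi * gi for qi, gi in zip(q, grads))
    return theta, q
```

Iterating this step drives the model toward a solution that balances the group losses rather than only the average loss.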
Simultaneously Learning Stochastic and Adversarial Bandits under the Position-Based Model
Online learning to rank (OLTR) interactively learns to choose lists of items
from a large collection based on certain click models that describe users'
click behaviors. Most recent works for this problem focus on the stochastic
environment where the item attractiveness is assumed to be invariant during the
learning process. In many real-world scenarios, however, the environment could
be dynamic or even arbitrarily changing. This work studies the OLTR problem in
both stochastic and adversarial environments under the position-based model
(PBM). We propose a method based on the follow-the-regularized-leader (FTRL)
framework with Tsallis entropy and develop a new self-bounding constraint
especially designed for PBM. We prove the proposed algorithm simultaneously
achieves $O(\log T)$ regret in the stochastic environment and
$\tilde{O}(\sqrt{T})$ regret in the adversarial environment, where $T$ is the
number of rounds, $N$ is the number of items and $K$ is the number of
positions. We also provide a lower bound for adversarial PBM that matches our
upper bound and improves over the state-of-the-art lower bound. Experiments
show that our algorithm can learn simultaneously in both stochastic and
adversarial environments and is competitive with existing methods designed
for a single environment.
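The FTRL-with-Tsallis-entropy framework mentioned above plays, at each round, the distribution minimising the estimated cumulative loss plus a $1/2$-Tsallis regulariser over the simplex; the minimiser has the closed form $w_i \propto (\eta(\hat{L}_i - x))^{-2}$ with the normalising constant $x$ found numerically. A sketch of that weight computation for the plain multi-armed case (not the paper's PBM-specific update; the function name, bisection bracket and iteration count are our own):

```python
def tsallis_weights(losses, eta=0.1, iters=60):
    """Playing distribution of FTRL with 1/2-Tsallis entropy.

    Solves sum_i 4 / (eta * (losses[i] - x))**2 = 1 for x < min(losses)
    by bisection, then returns the normalised weights.
    """
    n = len(losses)
    m = min(losses)
    # At lo each weight is at most 1/n, so the sum is at most 1;
    # as x -> m from below the sum blows up, so the root is bracketed.
    lo, hi = m - 2.0 * n ** 0.5 / eta, m
    for _ in range(iters):
        x = (lo + hi) / 2.0
        s = sum(4.0 / (eta * (l - x)) ** 2 for l in losses)
        if s > 1.0:
            hi = x
        else:
            lo = x
    x = (lo + hi) / 2.0
    w = [4.0 / (eta * (l - x)) ** 2 for l in losses]
    z = sum(w)
    return [wi / z for wi in w]
```

Arms with smaller cumulative loss receive larger weight; this inverse-square form is what underlies the best-of-both-worlds behaviour of Tsallis-entropy FTRL.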
Universal Algorithms: Beyond the Simplex
The bulk of universal algorithms in the online convex optimisation literature
are variants of the Hedge (exponential weights) algorithm on the simplex. While
these algorithms extend to polytope domains by assigning weights to the
vertices, this process is computationally infeasible for many important
classes of polytopes where the number of vertices grows exponentially with the
dimension $d$. In this paper we show the Subgradient algorithm is universal,
meaning it attains $O(\sqrt{T})$ regret in the antagonistic setting and the
corresponding optimal pseudo-regret in the i.i.d. setting, with two main
advantages over Hedge: (1) the update step is more efficient, as the action
vectors have length only $d$ rather than the (exponentially large) number of
vertices; and (2) Subgradient gives better performance if the cost vectors
satisfy Euclidean rather than sup-norm bounds. This paper extends the authors'
recent results for Subgradient on the simplex. We also prove the same
antagonistic and i.i.d. bounds when the domain is the unit ball. To the
authors' knowledge this is the first instance of these bounds on a domain
other than a polytope.
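On the unit Euclidean ball the Subgradient update is especially cheap: take a gradient step, then rescale back onto the ball if the iterate has left it. A minimal sketch (the function name and step-size argument are our own):

```python
def subgradient_step(x, grad, eta):
    """One step of online Subgradient descent over the unit Euclidean ball:
    a gradient step followed by Euclidean projection onto the ball."""
    y = [xi - eta * gi for xi, gi in zip(x, grad)]
    norm = sum(v * v for v in y) ** 0.5
    if norm > 1.0:  # outside the ball: projection is just rescaling
        y = [v / norm for v in y]
    return y
```

Each update touches only $d$ coordinates, which is the efficiency advantage over vertex-weight methods described above.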