
    Bandit Online Optimization Over the Permutahedron

    The permutahedron is the convex polytope whose vertex set consists of the vectors $(\pi(1),\dots,\pi(n))$ for all permutations (bijections) $\pi$ over $\{1,\dots,n\}$. We study a bandit game in which, at each step $t$, an adversary chooses a hidden weight vector $s_t$, and a player chooses a vertex $\pi_t$ of the permutahedron and suffers an observed loss of $\sum_{i=1}^n \pi_t(i) s_t(i)$. A previous algorithm, CombBand of Cesa-Bianchi et al. (2009), guarantees a regret of $O(n\sqrt{T\log n})$ for a time horizon of $T$. Unfortunately, CombBand requires at each step approximating an $n$-by-$n$ matrix permanent to increasingly fine accuracy as $T$ grows, resulting in a total running time that is superlinear in $T$ and making it impractical for large time horizons. We provide an algorithm with regret $O(n^{3/2}\sqrt{T})$ and total time complexity $O(n^3 T)$. The ideas combine CombBand with a recent algorithm by Ailon (2013) for online optimization over the permutahedron in the full-information setting. The technical core is a bound on the variance of the "pseudo loss" of the Plackett-Luce noisy sorting process. The bound is obtained by establishing positive semi-definiteness of a family of 3-by-3 matrices generated from rational functions of exponentials of 3 parameters.
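    To make the setting concrete, here is a minimal sketch (not the paper's algorithm) of the Plackett-Luce noisy sorting process mentioned above, played against the permutahedron loss. The function names and the exponential parameterization of the scores are illustrative assumptions of this sketch.

```python
import numpy as np

def plackett_luce_sample(theta, rng):
    """Draw a permutation from the Plackett-Luce model with scores theta:
    positions are filled one at a time, and each remaining item is chosen
    with probability proportional to exp(theta[item])."""
    remaining = list(range(len(theta)))
    order = []                                   # order[pos] = item placed at pos
    while remaining:
        w = np.exp([theta[i] for i in remaining])
        j = rng.choice(len(remaining), p=w / w.sum())
        order.append(remaining.pop(j))
    return order

def permutahedron_loss(order, s):
    """Loss sum_i pi(i) * s(i), where pi(i) is the 1-based position that
    item i receives in `order` and s is the hidden weight vector."""
    pi = np.empty(len(order))
    for pos, item in enumerate(order, start=1):
        pi[item] = pos
    return float(pi @ np.asarray(s, dtype=float))

rng = np.random.default_rng(0)
print(permutahedron_loss(plackett_luce_sample([0.3, -0.1, 0.7], rng),
                         [1.0, 0.5, 0.2]))
```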

    Online Ranking: Discrete Choice, Spearman Correlation and Other Feedback

    Given a set $V$ of $n$ objects, an online ranking system outputs at each time step a full ranking of the set, observes feedback of some form, and suffers a loss. We study the setting in which the (adversarial) feedback is an element of $V$, and the loss is the position (0th, 1st, 2nd, ...) of that item in the outputted ranking. More generally, we study a setting in which the feedback is a subset $U$ of at most $k$ elements of $V$, and the loss is the sum of the positions of those elements. We present an algorithm with expected regret $O(n^{3/2}\sqrt{Tk})$ over a time horizon of $T$ steps with respect to the best single ranking in hindsight. This improves on previous algorithms and analyses by a factor of either $\Omega(\sqrt{k})$ or $\Omega(\sqrt{\log n})$, or by improving the running time from quadratic to $O(n\log n)$ per round. We also prove a matching lower bound. Our techniques also imply an improved regret bound for online rank aggregation over the Spearman correlation measure, and for other, more complex ranking loss functions.
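    A minimal sketch of the loss model under subset feedback (names illustrative; the paper's contribution is the regret-efficient algorithm, not this computation):

```python
def ranking_loss(ranking, feedback):
    """Sum of the 0-based positions in `ranking` of the items revealed in
    `feedback` (a subset of at most k objects)."""
    position = {item: i for i, item in enumerate(ranking)}
    return sum(position[u] for u in feedback)

# Outputting the ranking [c, a, b] against feedback {a, b} costs 1 + 2 = 3.
assert ranking_loss(["c", "a", "b"], {"a", "b"}) == 3
```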

    Optimal algorithms for group distributionally robust optimization and beyond

    Distributionally robust optimization (DRO) can improve the robustness and fairness of learning methods. In this paper, we devise stochastic algorithms for a class of DRO problems including group DRO, subpopulation fairness, and empirical conditional value at risk (CVaR) optimization. Our new algorithms achieve faster convergence rates than existing algorithms for multiple DRO settings. We also provide a new information-theoretic lower bound implying that our bounds are tight for group DRO. Empirically, our algorithms also outperform known methods.
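    For concreteness, the sketch below states one standard formulation of two of the objectives named above, the group-DRO objective and empirical CVaR; the function names are illustrative, and the paper's contribution is how to optimize such objectives quickly, which is not shown.

```python
import numpy as np

def group_dro_objective(per_group_losses):
    """Worst-case (over groups) average loss -- the quantity group DRO
    minimizes over model parameters."""
    return max(float(np.mean(losses)) for losses in per_group_losses.values())

def empirical_cvar(losses, alpha):
    """Empirical conditional value at risk at level alpha: the average of
    the worst alpha-fraction of the losses."""
    losses = np.sort(np.asarray(losses, dtype=float))[::-1]  # descending
    k = max(1, int(np.ceil(alpha * len(losses))))
    return float(losses[:k].mean())

print(group_dro_objective({"A": [0.2, 0.4], "B": [0.9, 0.1]}))  # 0.5
print(empirical_cvar([0.2, 0.4, 0.9, 0.1], alpha=0.5))          # 0.65
```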

    Simultaneously Learning Stochastic and Adversarial Bandits under the Position-Based Model

    Online learning to rank (OLTR) interactively learns to choose lists of items from a large collection based on certain click models that describe users' click behaviors. Most recent works on this problem focus on the stochastic environment, where item attractiveness is assumed to be invariant during the learning process. In many real-world scenarios, however, the environment could be dynamic or even arbitrarily changing. This work studies the OLTR problem in both stochastic and adversarial environments under the position-based model (PBM). We propose a method based on the follow-the-regularized-leader (FTRL) framework with Tsallis entropy and develop a new self-bounding constraint designed especially for the PBM. We prove that the proposed algorithm simultaneously achieves $O(\log T)$ regret in the stochastic environment and $O(m\sqrt{nT})$ regret in the adversarial environment, where $T$ is the number of rounds, $n$ is the number of items, and $m$ is the number of positions. We also provide a lower bound of order $\Omega(m\sqrt{nT})$ for the adversarial PBM, which matches our upper bound and improves over the state-of-the-art lower bound. Experiments show that our algorithm can simultaneously learn in both stochastic and adversarial environments and is competitive with existing methods designed for a single environment.
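    The following sketch shows the generic FTRL step with a 1/2-Tsallis entropy regularizer on the probability simplex, the building block named above. It is a sketch only: the per-position handling of the PBM and the paper's self-bounding constraint are not reproduced, and all names are illustrative.

```python
import numpy as np

def tsallis_ftrl_step(cum_loss, eta, iters=60):
    """One FTRL step with the 1/2-Tsallis entropy regularizer:
        x = argmin_{x in simplex} <L, x> - (2/eta) * sum_i sqrt(x_i).
    The first-order conditions give x_i = (eta * (L_i + mu))^{-2}, with the
    normalizing shift mu found here by bisection."""
    L = np.asarray(cum_loss, dtype=float)
    f = lambda mu: np.sum((eta * (L + mu)) ** -2.0) - 1.0
    lo = -L.min() + 1e-9          # keeps every L_i + mu strictly positive
    hi = lo + 1.0
    while f(hi) > 0:              # grow the bracket until f changes sign
        hi += hi - lo
    for _ in range(iters):        # f is decreasing in mu; bisect to its root
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if f(mid) > 0 else (lo, mid)
    return (eta * (L + 0.5 * (lo + hi))) ** -2.0

print(tsallis_ftrl_step(np.array([0.0, 1.0, 3.0]), eta=1.0))  # sums to ~1
```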

    Universal Algorithms: Beyond the Simplex

    The bulk of universal algorithms in the online convex optimisation literature are variants of the Hedge (exponential weights) algorithm on the simplex. While these algorithms extend to polytope domains by assigning weights to the vertices, this process is computationally infeasible for many important classes of polytopes, where the number $V$ of vertices depends exponentially on the dimension $d$. In this paper we show that the Subgradient algorithm is universal, meaning it has $O(\sqrt{N})$ regret in the antagonistic setting and $O(1)$ pseudo-regret in the i.i.d. setting, with two main advantages over Hedge: (1) the update step is more efficient, as the action vectors have length only $d$ rather than $V$; and (2) Subgradient gives better performance if the cost vectors satisfy Euclidean rather than sup-norm bounds. This paper extends the authors' recent results for Subgradient on the simplex. We also prove the same $O(\sqrt{N})$ and $O(1)$ bounds when the domain is the unit ball. To the authors' knowledge, this is the first instance of these bounds on a domain other than a polytope.
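    As a sketch of the Subgradient method in the sense discussed above, here is online projected subgradient descent on the unit ball with a simple $1/\sqrt{t}$ step size; the paper's universal tuning and analysis are not reproduced, and the names are illustrative.

```python
import numpy as np

def projected_subgradient(grads, d, radius=1.0):
    """Online projected subgradient descent on the Euclidean ball of radius
    `radius` in R^d: play x_t, observe (sub)gradient g_t, then update
    x_{t+1} = Proj(x_t - g_t / sqrt(t))."""
    x = np.zeros(d)
    for t, g in enumerate(grads, start=1):
        yield x                                  # the point played at round t
        x = x - np.asarray(g, dtype=float) / np.sqrt(t)
        norm = np.linalg.norm(x)
        if norm > radius:                        # Euclidean projection onto the ball
            x *= radius / norm

plays = list(projected_subgradient([[1.0, 0.0], [0.0, -1.0]], d=2))
```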