
    Online Learning in Case of Unbounded Losses Using the Follow Perturbed Leader Algorithm

    In this paper the sequential prediction problem with expert advice is considered for the case where the losses suffered by the experts at each step cannot be bounded in advance. We present a modification of the Kalai-Vempala algorithm of following the perturbed leader, in which the weights depend on the past losses of the experts. New notions of the volume and the scaled fluctuation of a game are introduced. We present a probabilistic algorithm that is protected from unrestrictedly large one-step losses. This algorithm has optimal performance when the scaled fluctuations of the experts' one-step losses tend to zero. Comment: 31 pages, 3 figures
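    As an illustration of the follow-the-perturbed-leader idea described above, here is a minimal FTPL loop for prediction with expert advice. It is a generic sketch: the perturbation scale eta and the random loss sequence are placeholder assumptions, and the paper's adaptive weighting by the volume and scaled fluctuation of the game is not reproduced.

```python
# Minimal follow-the-perturbed-leader (FTPL) sketch for prediction with
# expert advice. Generic illustration only: the paper's variant additionally
# adapts the perturbation weights to the past losses of the experts.
import numpy as np

def ftpl_play(cum_losses, rng, eta=1.0):
    """Pick the expert minimizing the perturbed cumulative loss."""
    noise = rng.exponential(scale=1.0 / eta, size=len(cum_losses))
    return int(np.argmin(cum_losses - noise))

rng = np.random.default_rng(0)
n_experts, T = 5, 1000
cum_losses = np.zeros(n_experts)
learner_loss = 0.0
for t in range(T):
    i = ftpl_play(cum_losses, rng)
    losses = rng.random(n_experts)        # stand-in for the adversary's loss vector
    learner_loss += losses[i]
    cum_losses += losses
print(learner_loss, cum_losses.min())     # learner's total loss vs. best expert
```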

    First-order regret bounds for combinatorial semi-bandits

    We consider the problem of online combinatorial optimization under semi-bandit feedback, where a learner has to repeatedly pick actions from a combinatorial decision set in order to minimize the total losses associated with its decisions. After making each decision, the learner observes the losses associated with its action, but not the other losses. For this problem, there are several learning algorithms that guarantee that the learner's expected regret grows as $\widetilde{O}(\sqrt{T})$ with the number of rounds $T$. In this paper, we propose an algorithm that improves this scaling to $\widetilde{O}(\sqrt{L_T^*})$, where $L_T^*$ is the total loss of the best action. Our algorithm is among the first to achieve such guarantees in a partial-feedback scheme, and the first one to do so in a combinatorial setting. Comment: To appear at COLT 201
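    To make the semi-bandit feedback model concrete, the sketch below runs an FPL-style learner that picks the m of d arms with the smallest perturbed cumulative loss estimates and observes only the losses of the chosen arms; inclusion probabilities are approximated by geometric resampling to form importance-weighted estimates. The parameters d, m, eta and the resampling cap are illustrative assumptions, and this is not the paper's tuned first-order algorithm.

```python
# Semi-bandit sketch: FPL choice of an m-subset plus geometric resampling
# to estimate 1 / P(arm chosen), giving importance-weighted loss estimates.
import numpy as np

def select(cum_est, m, rng, eta):
    noise = rng.exponential(1.0 / eta, size=cum_est.shape)
    return np.argsort(cum_est - noise)[:m]        # m smallest perturbed losses

def geometric_resample(cum_est, m, arm, rng, eta, cap=100):
    for k in range(1, cap + 1):
        if arm in select(cum_est, m, rng, eta):   # redraw until arm reappears
            return k                              # k estimates 1 / P(arm chosen)
    return cap

rng = np.random.default_rng(1)
d, m, T, eta = 8, 3, 500, 0.1
cum_est = np.zeros(d)
for t in range(T):
    chosen = select(cum_est, m, rng, eta)
    losses = rng.random(d)                        # adversary's losses; only chosen arms are observed
    est = np.zeros(d)
    for arm in chosen:                            # semi-bandit feedback
        k = geometric_resample(cum_est, m, arm, rng, eta)
        est[arm] = k * losses[arm]                # importance-weighted loss estimate
    cum_est += est
```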

    Analysis of Perturbation Techniques in Online Learning

    The most commonly used regularization technique in machine learning is to directly add a penalty function to the optimization objective. For example, $L_2$ regularization is universally applied to a wide range of models including linear regression and neural networks. The alternative regularization technique, which has become essential in modern applications of machine learning, is implicit regularization by injecting random noise into the training data. In fact, this idea of using random perturbations as a regularizer was behind one of the first algorithms for online learning, where a learner chooses actions iteratively on a data sequence that may be designed adversarially to thwart the learning process. One such classical algorithm is known as Follow The Perturbed Leader (FTPL). This dissertation presents new interpretations of FTPL. In the first part, we show that FTPL is equivalent to playing the gradients of a stochastically smoothed potential function in the dual space. In the second part, we show that FTPL is an extension of a differentially private mechanism that has inherent stability guarantees. These perspectives lead to novel frameworks for FTPL regret analysis, which not only prove strong performance guarantees but also help characterize the optimal choice of noise distributions. Furthermore, they extend to the partial information setting where the learner observes only part of the input data. PhD dissertation, Computer Science & Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies. https://deepblue.lib.umich.edu/bitstream/2027.42/143968/1/chansool_1.pd
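    The "gradient of a stochastically smoothed potential" view mentioned in the abstract can be checked numerically in a simple special case. With i.i.d. Gumbel perturbations the FTPL play probabilities have a closed form, the softmax of the negative cumulative losses, which equals the gradient of the smoothed potential Phi(L) = E[max_i(-L_i + Z_i)]. The sketch below compares a Monte Carlo estimate of the FTPL probabilities with that gradient; the loss vector and sample count are arbitrary assumptions, and the dissertation treats general noise distributions.

```python
# Check that FTPL with Gumbel noise plays expert i with probability equal to
# softmax(-L)_i, i.e. the gradient of the smoothed max potential.
import numpy as np

rng = np.random.default_rng(2)
L = np.array([3.0, 2.5, 4.0, 1.0])                  # cumulative losses of 4 experts

samples = 200_000
Z = rng.gumbel(size=(samples, L.size))              # i.i.d. Gumbel perturbations
plays = np.argmin(L[None, :] - Z, axis=1)           # FTPL: argmax of (-L + Z)
p_ftpl = np.bincount(plays, minlength=L.size) / samples

p_softmax = np.exp(-L) / np.exp(-L).sum()           # gradient of the smoothed potential
print(np.round(p_ftpl, 3), np.round(p_softmax, 3))  # the two should nearly agree
```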

    Nonstochastic bandits: Countable decision set, unbounded costs and reactive environments

    The nonstochastic multi-armed bandit problem, first studied by Auer, Cesa-Bianchi, Freund, and Schapire in 1995, is a game of repeatedly choosing one decision from a set of decisions (“experts”), under partial observation: in each round t, only the cost of the decision played is observable. A regret minimization algorithm plays this game while achieving sublinear regret relative to each decision. It is known that an adversary controlling the costs of the decisions can force on the player a regret growing as $t^{1/2}$ in the time $t$. In this work, we propose the first algorithm for a countably infinite set of decisions that achieves a regret upper bounded by $O(t^{1/2+\varepsilon})$, i.e. arbitrarily close to the optimal order. To this aim, we build on the “follow the perturbed leader” principle, which dates back to work by Hannan in 1957. Our results hold against an adaptive adversary, for both the expected and the high-probability regret of the learner w.r.t. each decision. In the second part of the paper, we consider reactive problem settings, that is, situations where the learner’s decisions impact the future behaviour of the adversary, and a strong strategy can draw benefit from well-chosen past actions. We present a variant of our regret minimization algorithm which still has regret of order at most $t^{1/2+\varepsilon}$ relative to such strong strategies, and even sublinear regret not exceeding $O(t^{4/5})$ w.r.t. the hypothetical (without external interference) performance of a strong strategy. We show how to combine the regret minimizer with a universal class of experts, given by the countable set of programs on some fixed universal Turing machine. This defines a universal learner with sublinear regret relative to any computable strategy
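    One standard device for running follow-the-perturbed-leader over a countably infinite decision set is sketched below: only a slowly growing prefix of the decisions is active at round t, and each active decision carries a complexity penalty derived from a summable prior. The activation schedule, prior, and penalty form are illustrative assumptions; the loop is written with full-information feedback for brevity (the paper's bandit setting additionally requires estimating costs from the single observed cost); and no claim is made that this matches the paper's exact construction or its $t^{1/2+\varepsilon}$ tuning.

```python
# FPL over a countably infinite expert set: activate a growing prefix of the
# experts and penalize each by -log(prior) before the perturbed-leader choice.
import numpy as np

rng = np.random.default_rng(3)
T = 2000
cum_loss = {}                                   # sparse cumulative losses
total = 0.0

def active_count(t):
    return int(np.sqrt(t)) + 1                  # assumed slowly growing pool

for t in range(1, T + 1):
    n_t = active_count(t)
    idx = np.arange(n_t)
    seen = np.array([cum_loss.get(i, 0.0) for i in idx])
    prior = 1.0 / ((idx + 1) * (idx + 2))       # summable prior over experts
    penalty = -np.log(prior)                    # complexity penalty per expert
    noise = rng.exponential(1.0, size=n_t)
    choice = int(np.argmin(seen + penalty - noise))
    round_losses = rng.random(n_t)              # stand-in environment
    total += float(round_losses[choice])
    for i in idx:
        cum_loss[i] = cum_loss.get(i, 0.0) + float(round_losses[i])
```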

    On Adaptivity in Information-constrained Online Learning

    We study how to adapt to smoothly-varying ('easy') environments in well-known online learning problems where acquiring information is expensive. For the problem of label efficient prediction, which is a budgeted version of prediction with expert advice, we present an online algorithm whose regret depends optimally on the number of labels allowed and $Q^*$ (the quadratic variation of the losses of the best action in hindsight), along with a parameter-free counterpart whose regret depends optimally on $Q$ (the quadratic variation of the losses of all the actions). These quantities can be significantly smaller than $T$ (the total time horizon), yielding an improvement over existing, variation-independent results for the problem. We then extend our analysis to handle label efficient prediction with bandit feedback, i.e., label efficient bandits. Our work builds upon the framework of optimistic online mirror descent, and leverages second order corrections along with a carefully designed hybrid regularizer that encodes the constrained information structure of the problem. We then consider revealing action partial monitoring games -- a version of label efficient prediction with additive information costs, which in general are known to lie in the hard class of games having minimax regret of order $T^{2/3}$. We provide a strategy with an $\mathcal{O}((Q^*T)^{1/3})$ bound for revealing action games, along with one with an $\mathcal{O}((QT)^{1/3})$ bound for the full class of hard partial monitoring games, both being strict improvements over current bounds. Comment: 34th AAAI Conference on Artificial Intelligence (AAAI 2020). Short version at 11th Optimization for Machine Learning workshop (OPT 2019)
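    The information constraint in label efficient prediction can be illustrated with the classical budgeted forecaster: the learner sees the loss vector only when it spends a query, which happens with a probability matched to the budget, and updates exponential weights with importance-weighted estimates. This sketch fixes the feedback model only; the paper's algorithm, optimistic online mirror descent with second-order corrections and a hybrid regularizer, is not reproduced, and the step size and budget below are assumed values.

```python
# Classical label-efficient exponential-weights forecaster: query the losses
# with probability eps = budget / T and update with importance-weighted estimates.
import numpy as np

rng = np.random.default_rng(4)
K, T, budget = 10, 5000, 500
eps = budget / T                              # expected query rate
eta = np.sqrt(np.log(K) / budget)             # assumed step size
w = np.zeros(K)                               # negative scaled cumulative loss estimates
total = 0.0

for t in range(T):
    p = np.exp(w - w.max()); p /= p.sum()     # exponential-weights distribution
    action = rng.choice(K, p=p)
    losses = rng.random(K)                    # adversarial losses, hidden by default
    total += losses[action]
    if rng.random() < eps:                    # spend one query: losses revealed
        est = losses / eps                    # importance-weighted estimate
        w -= eta * est
print(total)
```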