Boundary Crossing Probabilities for General Exponential Families
We consider parametric exponential families of dimension $K$ on the real
line. We study a variant of \textit{boundary crossing probabilities} coming
from the multi-armed bandit literature, in the case when the real-valued
distributions form an exponential family of dimension $K$. Formally, our result
is a concentration inequality that bounds the probability that
$\mathcal{B}^{\psi}(\hat\theta_n, \theta^\star) \geq f(t/n)/n$, where
$\theta^\star$ is the parameter of an unknown target distribution, $\hat\theta_n$ is the empirical parameter estimate built from $n$ observations,
$\psi$ is the log-partition function of the exponential family and
$\mathcal{B}^{\psi}$ is the corresponding Bregman divergence. From the
perspective of stochastic multi-armed bandits, we pay special attention to the
case when the boundary function $f$ is logarithmic, as it enables the analysis of
the regret of the state-of-the-art KL-UCB and KL-UCB+ strategies, whose
analysis was left open in such generality. Indeed, previous results only hold
for the case when $K = 1$, while we provide results for arbitrary finite
dimension $K$, thus considerably extending the existing results. Perhaps
surprisingly, we highlight that the proof techniques to achieve these strong
results already existed three decades ago in the work of T. L. Lai, and were
apparently forgotten in the bandit community. We provide a modern rewriting of
these beautiful techniques that we believe are useful beyond the application to
stochastic multi-armed bandits.
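To make the central quantity concrete, here is a minimal sketch (names and helper functions are ours, not the paper's) of the Bregman divergence of a log-partition function for the one-dimensional ($K = 1$) Bernoulli family; the classical identity $\mathrm{KL}(p, q) = \mathcal{B}^{\psi}(\theta_q, \theta_p)$ between Kullback-Leibler divergence and Bregman divergence in the natural parameters provides a sanity check.

```python
import math

def log_partition(theta):
    # Log-partition psi of the Bernoulli family in natural parameter theta:
    # psi(theta) = log(1 + exp(theta)).
    return math.log1p(math.exp(theta))

def bregman(theta_hat, theta_star, psi=log_partition, eps=1e-6):
    # Bregman divergence B_psi(theta_hat, theta_star)
    #   = psi(theta_hat) - psi(theta_star) - psi'(theta_star) (theta_hat - theta_star),
    # with psi' approximated by a central finite difference.
    dpsi = (psi(theta_star + eps) - psi(theta_star - eps)) / (2 * eps)
    return psi(theta_hat) - psi(theta_star) - dpsi * (theta_hat - theta_star)

def kl_bernoulli(p, q):
    # KL divergence between Bernoulli(p) and Bernoulli(q), 0 < p, q < 1.
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))
```

For instance, with $p = 0.3$ and $q = 0.6$, `bregman` evaluated at the natural parameters $\theta = \log(p/(1-p))$ matches `kl_bernoulli(0.3, 0.6)` up to the finite-difference error.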
Sequential change-point detection: Laplace concentration of scan statistics and non-asymptotic delay bounds
We consider change-point detection in a fully sequential setup, when observations are received one by one and one must raise an alarm as early as possible after any change. We assume that both the change points and the distributions before and after the change are unknown. We consider the class of piecewise-constant mean processes with sub-Gaussian noise, and we target a detection strategy that is uniformly good on this class (this constrains the false alarm rate and detection delay). We introduce a novel tuning of the GLR test that takes here a simple form involving scan statistics, based on a novel sharp concentration inequality using an extension of the Laplace method for scan statistics that holds doubly-uniformly in time. This also considerably simplifies the implementation of the test and its analysis. We provide (perhaps surprisingly) the first fully non-asymptotic analysis of the detection delay of this test that matches the known existing asymptotic orders, with fully explicit numerical constants. Then, we extend this analysis to allow some changes that are not detectable by any uniformly-good strategy (the numbers of observations before and after the change are too small for it to be detected by any such algorithm), and provide the first robust, finite-time analysis of the detection delay.
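As a rough illustration of the scan-statistic form of a GLR test for a change in mean, here is a minimal sketch; the quadratic score and the logarithmic threshold below are generic textbook choices for sub-Gaussian noise, not the paper's tuning, and all names are ours.

```python
import math

def scan_statistic(x, sigma=1.0):
    # Largest generalized-likelihood-ratio score over all candidate change
    # points s in a sequence x of length t, for a change in mean with
    # sigma-sub-Gaussian noise:
    #   max_s  s (t - s) / (2 sigma^2 t) * (mean(x[:s]) - mean(x[s:]))^2
    t = len(x)
    best = 0.0
    for s in range(1, t):
        m1 = sum(x[:s]) / s
        m2 = sum(x[s:]) / (t - s)
        score = s * (t - s) / (2 * sigma**2 * t) * (m1 - m2) ** 2
        best = max(best, score)
    return best

def detect_change(stream, threshold):
    # Raise an alarm at the first time t at which the scan statistic
    # exceeds the (possibly time-dependent) threshold; None if never.
    for t in range(2, len(stream) + 1):
        if scan_statistic(stream[:t]) > threshold(t):
            return t
    return None
```

On a noise-free stream whose mean jumps from 0 to 5 at index 20, this detector fires one step after the change with a threshold such as `2 * log(t) + 4`, while it never fires on a constant stream.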
Concentration inequalities for sampling without replacement
Concentration inequalities quantify the deviation of a random variable from a
fixed value. In spite of numerous applications, such as opinion surveys or
ecological counting procedures, few concentration results are known for the
setting of sampling without replacement from a finite population. Until now,
the best general concentration inequality has been a Hoeffding inequality due
to Serfling [Ann. Statist. 2 (1974) 39-48]. In this paper, we first improve on
the fundamental result of Serfling [Ann. Statist. 2 (1974) 39-48], and further
extend it to obtain a Bernstein concentration bound for sampling without
replacement. We then derive an empirical version of our bound that does not
require the variance to be known to the user. Comment: Published at http://dx.doi.org/10.3150/14-BEJ605 in the Bernoulli (http://isi.cbs.nl/bernoulli/) by the International Statistical Institute/Bernoulli Society (http://isi.cbs.nl/BS/bshome.htm).
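To make the setting concrete, here is a small Monte Carlo sketch checking the classical Hoeffding-type bound of Serfling (1974) for sampling without replacement from a bounded finite population; the population and sample sizes below are arbitrary illustrations, and the function names are ours.

```python
import math
import random

def serfling_bound(n, N, eps, a=0.0, b=1.0):
    # Serfling's Hoeffding-type tail bound for the mean of n draws without
    # replacement from a population of N values in [a, b]:
    #   P(mean - mu >= eps) <= exp(-2 n eps^2 / ((1 - (n-1)/N) (b-a)^2))
    return math.exp(-2 * n * eps**2 / ((1 - (n - 1) / N) * (b - a) ** 2))

def empirical_tail(population, n, eps, trials=20000, seed=0):
    # Monte Carlo estimate of P(sample mean - population mean >= eps)
    # under sampling without replacement.
    rng = random.Random(seed)
    mu = sum(population) / len(population)
    hits = 0
    for _ in range(trials):
        sample = rng.sample(population, n)
        if sum(sample) / n - mu >= eps:
            hits += 1
    return hits / trials
```

For a balanced 0/1 population of size 100 and samples of size 20, the empirical tail at deviation 0.15 sits comfortably below the bound, as expected for a (non-tight) concentration inequality.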
Risk-aware linear bandits with convex loss
In decision-making problems such as the multi-armed bandit, an agent learns
sequentially by optimizing a certain feedback. While the mean reward criterion
has been extensively studied, other measures that reflect an aversion to
adverse outcomes, such as mean-variance or conditional value-at-risk (CVaR),
can be of interest for critical applications (healthcare, agriculture).
Algorithms have been proposed for such risk-aware measures under bandit
feedback without contextual information. In this work, we study contextual
bandits where such risk measures can be elicited as linear functions of the
contexts through the minimization of a convex loss. A typical example that fits
within this framework is the expectile measure, which is obtained as the
solution of an asymmetric least-square problem. Using the method of mixtures
for supermartingales, we derive confidence sequences for the estimation of such
risk measures. We then propose an optimistic UCB algorithm to learn optimal
risk-aware actions, with regret guarantees similar to those of generalized
linear bandits. This approach requires solving a convex problem at each round
of the algorithm, which we can relax by allowing only an approximate solution
obtained by online gradient descent, at the cost of slightly higher regret. We
conclude by evaluating the resulting algorithms in numerical experiments.
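The expectile example from the abstract can be made concrete: the $\tau$-expectile of a sample minimizes an asymmetric least-squares loss, and its first-order condition is monotone in $m$, so it can be found by bisection. A minimal sketch (function names are ours):

```python
def expectile(xs, tau, tol=1e-9):
    # The tau-expectile minimizes the asymmetric least-squares loss
    #   sum_i |tau - 1{x_i <= m}| (x_i - m)^2
    # and is the unique root of the first-order condition
    #   g(m) = tau * sum((x - m)+) - (1 - tau) * sum((m - x)+) = 0,
    # which is decreasing in m, so bisection applies.
    def g(m):
        return (tau * sum(max(x - m, 0.0) for x in xs)
                - (1 - tau) * sum(max(m - x, 0.0) for x in xs))
    lo, hi = min(xs), max(xs)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if g(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2
```

As a sanity check, the 0.5-expectile is the sample mean, and the expectile is increasing in $\tau$, interpolating between risk-seeking and risk-averse summaries of the sample.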
Basic Concentration Properties of Real-Valued Distributions
In this note we introduce and discuss a few tools for the study of concentration inequalities on the real line. After recalling versions of the Chernoff method, we move to concentration inequalities for predictable processes. We especially focus on bounds that handle sums of real-valued random variables where the number of summands is itself a random stopping time, and target fully explicit and empirical bounds. We then discuss some other important tools, such as the Laplace method and the transportation lemma.
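The Chernoff method mentioned above bounds a tail probability by optimizing an exponential moment: for the mean of $n$ i.i.d. centered $\sigma$-sub-Gaussian variables, optimizing $\exp(-\lambda n\varepsilon + n\lambda^2\sigma^2/2)$ over $\lambda > 0$ yields $\exp(-n\varepsilon^2/(2\sigma^2))$. A minimal numerical check (names and sample sizes are ours):

```python
import math
import random

def chernoff_subgaussian(n, eps, sigma=1.0):
    # Optimized Chernoff bound for the mean of n i.i.d. centered
    # sigma-sub-Gaussian variables, taking lam = eps / sigma^2:
    #   P(mean >= eps) <= exp(-n eps^2 / (2 sigma^2))
    return math.exp(-n * eps**2 / (2 * sigma**2))

def empirical_tail_gaussian(n, eps, trials=20000, seed=1):
    # Monte Carlo estimate of P(mean of n standard normals >= eps);
    # standard normals are 1-sub-Gaussian, so the bound above applies.
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        m = sum(rng.gauss(0.0, 1.0) for _ in range(n)) / n
        if m >= eps:
            hits += 1
    return hits / trials
```

For $n = 25$ and $\varepsilon = 0.5$, the bound evaluates to about $e^{-3.125} \approx 0.044$, while the simulated tail is roughly an order of magnitude smaller, illustrating that the Chernoff bound is valid but not tight.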
Practical Open-Loop Optimistic Planning
We consider the problem of online planning in a Markov Decision Process when given only access to a generative model, restricted to open-loop policies (i.e., sequences of actions) and under a budget constraint. In this setting, the Open-Loop Optimistic Planning (OLOP) algorithm enjoys good theoretical guarantees but is overly conservative in practice, as we show in numerical experiments. We propose a modified version of the algorithm with tighter upper-confidence bounds, KL-OLOP, that leads to better practical performance while retaining the sample complexity bound. Finally, we propose an efficient implementation that significantly improves the time complexity of both algorithms.
Selecting Near-Optimal Approximate State Representations in Reinforcement Learning
We consider a reinforcement learning setting introduced in (Maillard et al.,
NIPS 2011) where the learner does not have explicit access to the states of the
underlying Markov decision process (MDP). Instead, she has access to several
models that map histories of past interactions to states. Here we improve over
known regret bounds in this setting, and more importantly generalize to the
case where the models given to the learner do not contain a true model
resulting in an MDP representation, but only approximations of it. We also give
improved error bounds for state aggregation.
Brownian Motions and Scrambled Wavelets for Least-Squares Regression
We consider ordinary (non-penalized) least-squares regression where the regression function is chosen in a randomly generated subspace GP \subset S of finite dimension P, where S is a function space of infinite dimension, e.g. L2([0, 1]^d). GP is defined as the span of P random features that are linear combinations of the basis functions of S weighted by random Gaussian i.i.d. coefficients. We characterize the so-called kernel space K \subset S of the resulting Gaussian process and derive approximation error bounds of order O(||f||^2_K log(P)/P) for functions f \in K approximated in GP. We apply this result to derive excess risk bounds for the least-squares estimate in various spaces. For illustration, we consider regression using the so-called scrambled wavelets (i.e. random linear combinations of wavelets of L2([0, 1]^d)) and derive an excess risk rate O(||f*||_K (log N)/sqrt(N)), which is arbitrarily close to the minimax optimal rate (up to a logarithmic factor) for target functions f* in K = H^s([0, 1]^d), a Sobolev space of smoothness order s > d/2. We describe an efficient implementation using lazy expansions with numerical complexity Õ(2^d N^{3/2} log N + N^{5/2}), where d is the dimension of the input data and N is the number of data points.
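The construction of GP can be sketched in a few lines: draw P random Gaussian weight vectors over a fixed basis, and fit ordinary least squares in the resulting feature space. The sketch below uses a cosine basis of L2([0, 1]) as a stand-in for wavelets, a plain normal-equations solver, and names of our own choosing; it is an illustration of the random-subspace idea, not the paper's implementation.

```python
import math
import random

def random_feature_design(xs, P, num_basis=8, seed=0):
    # Design matrix of P random features: each feature is a linear
    # combination of cosine basis functions cos(pi j x), j = 0..num_basis-1,
    # with i.i.d. Gaussian weights (cosines stand in for wavelets here).
    rng = random.Random(seed)
    W = [[rng.gauss(0.0, 1.0) for _ in range(num_basis)] for _ in range(P)]
    def feature(p, x):
        return sum(W[p][j] * math.cos(math.pi * j * x) for j in range(num_basis))
    return [[feature(p, x) for p in range(P)] for x in xs]

def solve_least_squares(A, y, ridge=1e-10):
    # Solve the normal equations (A^T A + ridge*I) c = A^T y by Gaussian
    # elimination with partial pivoting (tiny ridge for numerical safety).
    P = len(A[0])
    M = [[sum(A[i][p] * A[i][q] for i in range(len(A))) for q in range(P)]
         for p in range(P)]
    b = [sum(A[i][p] * y[i] for i in range(len(A))) for p in range(P)]
    for p in range(P):
        M[p][p] += ridge
    for k in range(P):
        piv = max(range(k, P), key=lambda r: abs(M[r][k]))
        M[k], M[piv] = M[piv], M[k]
        b[k], b[piv] = b[piv], b[k]
        for r in range(k + 1, P):
            f = M[r][k] / M[k][k]
            for c in range(k, P):
                M[r][c] -= f * M[k][c]
            b[r] -= f * b[k]
    coef = [0.0] * P
    for k in range(P - 1, -1, -1):
        coef[k] = (b[k] - sum(M[k][c] * coef[c] for c in range(k + 1, P))) / M[k][k]
    return coef
```

When P equals the number of basis functions, the random subspace almost surely spans the whole basis space, so a target that lies in the basis span (e.g. cos(pi x)) is recovered essentially exactly from noiseless samples; for P smaller than the basis size, only an approximation is possible, which is where the O(||f||^2_K log(P)/P) approximation bound enters.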