    Thompson Sampling: An Asymptotically Optimal Finite Time Analysis

    The question of the optimality of Thompson Sampling for solving the stochastic multi-armed bandit problem had been open since 1933. In this paper we answer it positively for the case of Bernoulli rewards by providing the first finite-time analysis that matches the asymptotic rate given in the Lai and Robbins lower bound for the cumulative regret. The proof is accompanied by a numerical comparison with other optimal policies, experiments that have been lacking in the literature until now for the Bernoulli case. Comment: 15 pages, 2 figures, submitted to ALT (Algorithmic Learning Theory)
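
The setting in this abstract can be illustrated with a minimal sketch of Thompson Sampling for Bernoulli rewards. The uniform Beta(1, 1) prior, the function name `thompson_sampling`, and the simulated arm means are assumptions made for illustration; the abstract does not specify them.

```python
import random


def thompson_sampling(true_means, horizon, seed=0):
    """Simulate Thompson Sampling on Bernoulli arms (illustrative sketch).

    Assumes a Beta(1, 1) prior on every arm; this choice is not stated
    in the abstract. Returns the pull counts and the cumulative
    (pseudo-)regret against the best arm.
    """
    rng = random.Random(seed)
    k = len(true_means)
    successes = [0] * k
    failures = [0] * k
    best = max(true_means)
    regret = 0.0
    for _ in range(horizon):
        # Draw one sample from each arm's Beta posterior.
        samples = [rng.betavariate(successes[a] + 1, failures[a] + 1)
                   for a in range(k)]
        # Play the arm whose posterior sample is largest.
        arm = max(range(k), key=samples.__getitem__)
        reward = 1 if rng.random() < true_means[arm] else 0
        successes[arm] += reward
        failures[arm] += 1 - reward
        regret += best - true_means[arm]
    pulls = [s + f for s, f in zip(successes, failures)]
    return pulls, regret
```

Calling `thompson_sampling([0.2, 0.8], 2000)` returns how often each arm was played and the accumulated regret, which can be plotted against the horizon to compare with the logarithmic Lai and Robbins rate.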

    Spectral Sparsification and Regret Minimization Beyond Matrix Multiplicative Updates

    In this paper, we provide a novel construction of the linear-sized spectral sparsifiers of Batson, Spielman and Srivastava [BSS14]. While previous constructions required Ω(n^4) running time [BSS14, Zou12], our sparsification routine can be implemented in almost-quadratic running time O(n^(2+ε)). The fundamental conceptual novelty of our work is the leveraging of a strong connection between sparsification and a regret minimization problem over density matrices. This connection was known to provide an interpretation of the randomized sparsifiers of Spielman and Srivastava [SS11] via the application of matrix multiplicative weight updates (MWU) [CHS11, Vis14]. In this paper, we explain how matrix MWU naturally arises as an instance of the Follow-the-Regularized-Leader framework and generalize this approach to yield a larger class of updates. This new class allows us to accelerate the construction of linear-sized spectral sparsifiers, and gives novel insights on the motivation behind Batson, Spielman and Srivastava [BSS14].
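
The link between matrix MWU and Follow-the-Regularized-Leader mentioned in this abstract can be written out as a short worked equation. The notation is assumed for illustration and is not taken from the paper: $L_s$ are symmetric loss matrices, $\eta > 0$ is a step size, and the regularizer is the negative von Neumann entropy.

```latex
% FTRL over density matrices with an entropic regularizer (assumed notation)
X_{t+1}
  = \operatorname*{arg\,min}_{X \succeq 0,\ \operatorname{Tr} X = 1}
    \Big\{ \eta \sum_{s=1}^{t} \langle L_s, X \rangle
           + \operatorname{Tr}(X \log X) \Big\}
  = \frac{\exp\!\big(-\eta \sum_{s=1}^{t} L_s\big)}
         {\operatorname{Tr} \exp\!\big(-\eta \sum_{s=1}^{t} L_s\big)}
```

The closed form on the right is the matrix multiplicative weights iterate; swapping the entropy for another regularizer gives a different update rule, which is the kind of generalization the abstract refers to.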

    An efficient algorithm for learning with semi-bandit feedback

    We consider the problem of online combinatorial optimization under semi-bandit feedback. The goal of the learner is to sequentially select its actions from a combinatorial decision set so as to minimize its cumulative loss. We propose a learning algorithm for this problem based on combining the Follow-the-Perturbed-Leader (FPL) prediction method with a novel loss estimation procedure called Geometric Resampling (GR). Contrary to previous solutions, the resulting algorithm can be efficiently implemented for any decision set where efficient offline combinatorial optimization is possible at all. Assuming that the elements of the decision set can be described with d-dimensional binary vectors with at most m non-zero entries, we show that the expected regret of our algorithm after T rounds is O(m sqrt(dT log d)). As a side result, we also improve the best known regret bounds for FPL in the full information setting to O(m^(3/2) sqrt(T log d)), gaining a factor of sqrt(d/m) over previous bounds for this algorithm. Comment: submitted to ALT 201
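
The combination of FPL with Geometric Resampling can be sketched as follows, under several assumptions that the abstract does not state: the decision set is "choose m of d items" (so offline optimization is a sort), the perturbations are i.i.d. exponential, and the resampling loop is capped at `cap` draws. The idea illustrated is that the number of fresh draws needed until a component reappears is geometric with mean 1/q, where q is the probability of picking that component, so it can stand in for the importance weight 1/q without computing q. All function names are hypothetical.

```python
import random


def fpl_action(cum_loss_est, m, eta, rng):
    """Pick the m items with the smallest perturbed estimated loss."""
    d = len(cum_loss_est)
    # Exponential perturbations are an assumption of this sketch.
    perturbed = [eta * cum_loss_est[i] - rng.expovariate(1.0)
                 for i in range(d)]
    return set(sorted(range(d), key=perturbed.__getitem__)[:m])


def geometric_resampling(i, cum_loss_est, m, eta, rng, cap):
    """Count fresh FPL draws until item i is selected again (capped)."""
    for k in range(1, cap + 1):
        if i in fpl_action(cum_loss_est, m, eta, rng):
            return k
    return cap


def run_fpl_gr(loss_means, m, horizon, eta=0.05, cap=200, seed=0):
    """Run the sketch on Bernoulli losses; return total loss and estimates."""
    rng = random.Random(seed)
    est = [0.0] * len(loss_means)
    total = 0.0
    for _ in range(horizon):
        action = fpl_action(est, m, eta, rng)
        for i in action:  # semi-bandit: only chosen components are observed
            loss = 1.0 if rng.random() < loss_means[i] else 0.0
            total += loss
            if loss:  # a zero loss contributes nothing to the estimate
                est[i] += loss * geometric_resampling(i, est, m, eta, rng, cap)
    return total, est
```

Because each resampling step only calls the offline optimizer again, the sketch never needs the selection probabilities explicitly, which is what makes the approach practical for large decision sets.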

    PAC-Bayesian Bounds for Randomized Empirical Risk Minimizers

    The aim of this paper is to generalize the PAC-Bayesian theorems proved by Catoni in the classification setting to more general problems of statistical inference. We show how to control the deviations of the risk of randomized estimators. Particular attention is paid to randomized estimators drawn in a small neighborhood of classical estimators, whose study leads to control of the risk of the latter. These results make it possible to bound the risk of very general estimation procedures, as well as to perform model selection.

    Upper-Confidence-Bound Algorithms for Active Learning in Multi-Armed Bandits

    In this paper, we study the problem of estimating the mean values of all the arms uniformly well in the multi-armed bandit setting. If the variances of the arms were known, one could design an optimal sampling strategy by pulling the arms proportionally to their variances. However, since the distributions are not known in advance, we need to design adaptive sampling strategies to select an arm at each round based on the previously observed samples. We describe two strategies based on pulling the arms proportionally to an upper bound on their variances and derive regret bounds for these strategies. We show that the performance of these allocation strategies depends not only on the variances of the arms but also on the full shape of their distributions.
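
The allocation idea in this abstract can be sketched in two parts: the variance-proportional allocation for known variances, and an adaptive rule that pulls the arm with the largest ratio of variance upper bound to pull count. The bonus term `c / sqrt(n)` and all function names are hypothetical placeholders; the paper's actual confidence bounds are not given in the abstract.

```python
import random
import statistics


def proportional_allocation(variances, budget):
    """Split the budget across arms in proportion to known variances."""
    total = sum(variances)
    return [budget * v / total for v in variances]


def adaptive_allocation(samplers, budget, c=1.0, seed=0):
    """Adaptive sketch: track an upper bound on each arm's variance.

    `samplers` is a list of functions taking a random.Random and
    returning one sample. The bonus c / sqrt(n) is an assumed form
    of the upper confidence term, chosen only for illustration.
    """
    rng = random.Random(seed)
    # Two initial pulls per arm so that a sample variance exists.
    obs = [[draw(rng), draw(rng)] for draw in samplers]

    def index(k):
        n = len(obs[k])
        upper_var = statistics.variance(obs[k]) + c / n ** 0.5
        return upper_var / n

    for _ in range(budget - 2 * len(samplers)):
        k = max(range(len(samplers)), key=index)
        obs[k].append(samplers[k](rng))
    return [len(o) for o in obs]
```

For example, `proportional_allocation([1.0, 3.0], 100)` assigns three times as many pulls to the noisier arm, and `adaptive_allocation` approaches a similar split without knowing the variances in advance.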

    Sequential decision making with vector outcomes

    We study a multi-round optimization setting in which, in each round, a player may select one of several actions, and each action produces an outcome vector that is not observable to the player until the round ends. The final payoff for the player is computed by applying some known function f to the sum of all outcome vectors (e.g., the minimum of all coordinates of the sum). We show that standard notions of performance measure (such as comparison to the best single action) used in related expert and bandit settings (in which the payoff in each round is scalar) are not useful in our vector setting. Instead, we propose a different performance measure, and design algorithms that have vanishing regret with respect to our new measure.

    An expression signature of the angiogenic response in gastrointestinal neuroendocrine tumours: correlation with tumour phenotype and survival outcomes.

    BACKGROUND: Gastroenteropancreatic neuroendocrine tumours (GEP-NETs) are heterogeneous with respect to biological behaviour and prognosis. As angiogenesis is a renowned pathogenic hallmark as well as a therapeutic target, we aimed to investigate the prognostic and clinico-pathological role of tissue markers of hypoxia and angiogenesis in GEP-NETs. METHODS: Tissue microarray (TMA) blocks were constructed with 86 tumours diagnosed from 1988 to 2010. Tissue microarray sections were immunostained for hypoxia inducible factor 1α (Hif-1α), vascular endothelial growth factor-A (VEGF-A), carbonic anhydrase IX (Ca-IX) and somatostatin receptors (SSTR) 1–5, Ki-67 and CD31. Biomarker expression was correlated with clinico-pathological variables and tested for survival prediction using Kaplan–Meier and Cox regression methods. RESULTS: Eighty-six consecutive cases were included: 51% male, median age 51 (range 16–82), 68% presenting with a pancreatic primary, 95% well differentiated, 51% metastatic. Higher grading (P=0.03), advanced stage (P<0.001), high Hif-1α and low SSTR-2 expression (P=0.03) predicted for shorter overall survival (OS) on univariate analyses. Stage, SSTR-2 and Hif-1α expression were confirmed as multivariate predictors of OS. Median OS for patients with SSTR-2+/Hif-1α- tumours was not reached after median follow up of 8.8 years, whereas SSTR-2-/Hif-1α+ GEP-NETs had a median survival of only 4.2 years (P=0.006). CONCLUSION: We have identified a coherent expression signature by immunohistochemistry that can be used for patient stratification and to optimise treatment decisions in GEP-NETs independently from stage and grading. Tumours with preserved SSTR-2 and low Hif-1α expression have an indolent phenotype and may be offered less aggressive management and less stringent follow up.

    Aggregation by exponential weighting, sharp PAC-Bayesian bounds and sparsity

    We study the problem of aggregation under the squared loss in the model of regression with deterministic design. We obtain sharp PAC-Bayesian risk bounds for aggregates defined via exponential weights, under general assumptions on the distribution of errors and on the functions to aggregate. We then apply these results to derive sparsity oracle inequalities.
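
Aggregation by exponential weighting can be sketched for a finite dictionary of candidate predictors evaluated at the design points. The temperature parameter `beta`, the uniform default prior, and the function names are assumptions for illustration; the abstract does not fix them.

```python
import math


def exponential_weights(y, predictions, beta, prior=None):
    """Weight each candidate by exp(-squared error / beta) times its prior.

    `predictions[j][i]` is candidate j's value at design point i.
    `beta` is an assumed temperature parameter; smaller values
    concentrate the weights on the best-fitting candidates.
    """
    m = len(predictions)
    prior = prior or [1.0 / m] * m
    sq_err = [sum((yi - pi) ** 2 for yi, pi in zip(y, p))
              for p in predictions]
    lowest = min(sq_err)  # shift exponents for numerical stability
    raw = [pr * math.exp(-(s - lowest) / beta)
           for pr, s in zip(prior, sq_err)]
    z = sum(raw)
    return [r / z for r in raw]


def aggregate(predictions, weights):
    """Return the weighted average of the candidates at each design point."""
    n = len(predictions[0])
    return [sum(w * p[i] for w, p in zip(weights, predictions))
            for i in range(n)]
```

A sparsity-favouring prior can be passed through the `prior` argument, which is one way to connect this kind of aggregate to the sparsity oracle inequalities mentioned in the abstract.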