612 research outputs found

    Analysis of Different Types of Regret in Continuous Noisy Optimization

    The performance measure of an algorithm is a crucial part of its analysis. Performance can be determined by studying the convergence rate of the algorithm in question. It is necessary to study some (hopefully convergent) sequence that measures how "good" the approximated optimum is compared to the real optimum. The concept of Regret is widely used in the bandit literature for assessing the performance of an algorithm. The same concept is also used in the framework of optimization algorithms, sometimes under other names or without a specific name. The numerical evaluation of the convergence rate of noisy algorithms often involves approximations of regrets. We discuss here two types of approximations of Simple Regret used in practice for the evaluation of algorithms for noisy optimization. We use specific algorithms of different nature and the noisy sphere function to show the following results. The approximation of Simple Regret, termed here Approximate Simple Regret, used in some optimization testbeds, fails to estimate the Simple Regret convergence rate. We also discuss a recent approximation of Simple Regret, which we term Robust Simple Regret, and show its advantages and disadvantages. Comment: Genetic and Evolutionary Computation Conference 2016, Jul 2016, Denver, United States. 201

    Parallel Gaussian Process Optimization with Upper Confidence Bound and Pure Exploration

    In this paper, we consider the challenge of maximizing an unknown function f for which evaluations are noisy and are acquired with high cost. An iterative procedure uses the previous measures to actively select the next estimation of f which is predicted to be the most useful. We focus on the case where the function can be evaluated in parallel with batches of fixed size and analyze the benefit compared to the purely sequential procedure in terms of cumulative regret. We introduce the Gaussian Process Upper Confidence Bound and Pure Exploration algorithm (GP-UCB-PE) which combines the UCB strategy and Pure Exploration in the same batch of evaluations along the parallel iterations. We prove theoretical upper bounds on the regret with batches of size K for this procedure which show the improvement of the order of $\sqrt{K}$ for fixed iteration cost over purely sequential versions. Moreover, the multiplicative constants involved have the property of being dimension-free. We also confirm empirically the efficiency of GP-UCB-PE on real and synthetic problems compared to state-of-the-art competitors.

    Efficient approximate thompson sampling for search query recommendation

    Query suggestions have been a valuable feature for e-commerce sites in helping shoppers refine their search intent. In this paper, we develop an algorithm that helps e-commerce sites like eBay mingle the output of different recommendation algorithms. Our algorithm is based on “Thompson Sampling”, a technique designed for solving multi-arm bandit problems where the best results are not known in advance but instead are tried out to gather feedback. Our approach is to treat query suggestions as a competition among data resources: we have many query suggestion candidates competing for limited space on the search results page. An “arm” is played when a query suggestion candidate is chosen for display, and our goal is to maximize the expected reward (user clicks on a suggestion). Our experiments have shown promising results in using the click-based user feedback to drive success by enhancing the quality of query suggestions.
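
    The arm-and-click framing above can be sketched with textbook Beta-Bernoulli Thompson Sampling. This is a minimal illustration, not the approximate variant the paper develops: the uniform Beta(1, 1) prior, the function names `thompson_select` and `simulate`, and the click-through rates in the usage note are all assumptions made for the example.

```python
import random


def thompson_select(clicks, skips, rng):
    """Draw a click rate per arm from Beta(clicks+1, skips+1); return the argmax."""
    draws = [rng.betavariate(c + 1, s + 1) for c, s in zip(clicks, skips)]
    return max(range(len(draws)), key=draws.__getitem__)


def simulate(true_ctr, rounds, seed=0):
    """Show one suggestion per round and update that arm's click counts.

    true_ctr holds hypothetical click-through rates, used only to
    generate synthetic feedback for the illustration.
    """
    rng = random.Random(seed)
    clicks = [0] * len(true_ctr)
    skips = [0] * len(true_ctr)
    for _ in range(rounds):
        arm = thompson_select(clicks, skips, rng)
        if rng.random() < true_ctr[arm]:
            clicks[arm] += 1   # user clicked the suggestion
        else:
            skips[arm] += 1    # suggestion shown but ignored
    return clicks, skips
```

    Calling `simulate([0.02, 0.05, 0.30], 5000)` returns per-arm click and skip counts; under these assumed rates the third candidate should end up displayed far more often than the other two.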

    Single File Diffusion enhancement in a fluctuating modulated 1D channel

    We show that the diffusion of a single file of particles moving in a fluctuating modulated 1D channel is enhanced with respect to the one in a bald pipe. This effect, induced by the fluctuations of the modulation, is favored by the incommensurability between the channel potential modulation and the moving file periodicity. This phenomenon could be of importance in order to optimize the critical current in superconductors, in particular in the case where mobile vortices move in 1D channels designed by adapted patterns of pinning sites. Comment: 4 pages, 4 figure

    On the Prior Sensitivity of Thompson Sampling

    The empirically successful Thompson Sampling algorithm for stochastic bandits has drawn much interest in understanding its theoretical properties. One important benefit of the algorithm is that it allows domain knowledge to be conveniently encoded as a prior distribution to balance exploration and exploitation more effectively. While it is generally believed that the algorithm's regret is low (high) when the prior is good (bad), little is known about the exact dependence. In this paper, we fully characterize the algorithm's worst-case dependence of regret on the choice of prior, focusing on a special yet representative case. These results also provide insights into the general sensitivity of the algorithm to the choice of priors. In particular, with $p$ being the prior probability mass of the true reward-generating model, we prove $O(\sqrt{T/p})$ and $O(\sqrt{(1-p)T})$ regret upper bounds for the bad- and good-prior cases, respectively, as well as matching lower bounds. Our proofs rely on the discovery of a fundamental property of Thompson Sampling and make heavy use of martingale theory, both of which appear novel in the literature, to the best of our knowledge. Comment: Appears in the 27th International Conference on Algorithmic Learning Theory (ALT), 201

    Bayesian Best-Arm Identification for Selecting Influenza Mitigation Strategies

    Pandemic influenza has the epidemic potential to kill millions of people. While various preventive measures exist (i.a., vaccination and school closures), deciding on strategies that lead to their most effective and efficient use remains challenging. To this end, individual-based epidemiological models are essential to assist decision makers in determining the best strategy to curb epidemic spread. However, individual-based models are computationally intensive and it is therefore pivotal to identify the optimal strategy using a minimal amount of model evaluations. Additionally, as epidemiological modeling experiments need to be planned, a computational budget needs to be specified a priori. Consequently, we present a new sampling technique to optimize the evaluation of preventive strategies using fixed budget best-arm identification algorithms. We use epidemiological modeling theory to derive knowledge about the reward distribution which we exploit using Bayesian best-arm identification algorithms (i.e., Top-two Thompson sampling and BayesGap). We evaluate these algorithms in a realistic experimental setting and demonstrate that it is possible to identify the optimal strategy using only a limited number of model evaluations, i.e., 2-to-3 times faster compared to the uniform sampling method, the predominant technique used for epidemiological decision making in the literature. Finally, we contribute and evaluate a statistic for Top-two Thompson sampling to inform the decision makers about the confidence of an arm recommendation
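
    A fixed-budget run of Top-two Thompson sampling can be sketched as follows. This is only an illustration under simplifying assumptions: rewards are taken to be Bernoulli with Beta(1, 1) priors, whereas the paper derives its reward distribution from epidemiological modeling theory. The names `top_two_select` and `run_fixed_budget`, the parameter `beta=0.5`, and the `max_resample` cap are hypothetical choices for the example.

```python
import random


def sample_best(wins, losses, rng):
    """Draw one posterior sample per arm and return the index of the largest."""
    draws = [rng.betavariate(w + 1, l + 1) for w, l in zip(wins, losses)]
    return max(range(len(draws)), key=draws.__getitem__)


def top_two_select(wins, losses, rng, beta=0.5, max_resample=100):
    """Play the sampled leader with probability beta, otherwise a challenger.

    The challenger is found by re-sampling until a different arm wins.
    max_resample is a practical cap (an assumption of this sketch) so the
    loop always terminates; if it is hit, the leader is played instead.
    """
    leader = sample_best(wins, losses, rng)
    if rng.random() < beta:
        return leader
    for _ in range(max_resample):
        challenger = sample_best(wins, losses, rng)
        if challenger != leader:
            return challenger
    return leader


def run_fixed_budget(true_means, budget, seed=0):
    """Spend exactly `budget` evaluations, then recommend one arm."""
    rng = random.Random(seed)
    wins = [0] * len(true_means)
    losses = [0] * len(true_means)
    for _ in range(budget):
        arm = top_two_select(wins, losses, rng)
        if rng.random() < true_means[arm]:
            wins[arm] += 1
        else:
            losses[arm] += 1
    # Recommend the arm with the highest posterior mean.
    post_mean = [(w + 1) / (w + l + 2) for w, l in zip(wins, losses)]
    best = max(range(len(post_mean)), key=post_mean.__getitem__)
    return best, wins, losses
```

    Here each "arm" stands for one preventive strategy and each pull for one model evaluation; `run_fixed_budget([0.2, 0.5, 0.8], 2000)` returns the recommended arm together with the per-arm outcome counts.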

    Structural basis for membrane attack complex inhibition by CD59

    CD59 is an abundant immuno-regulatory receptor that protects human cells from damage during complement activation. Here we show how the receptor binds complement proteins C8 and C9 at the membrane to prevent insertion and polymerization of membrane attack complex (MAC) pores. We present cryo-electron microscopy structures of two inhibited MAC precursors known as C5b8 and C5b9. We discover that in both complexes, CD59 binds the pore-forming β-hairpins of C8 to form an intermolecular β-sheet that prevents membrane perforation. While bound to C8, CD59 deflects the cascading C9 β-hairpins, rerouting their trajectory into the membrane. Preventing insertion of C9 restricts structural transitions of subsequent monomers and indirectly halts MAC polymerization. We combine our structural data with cellular assays and molecular dynamics simulations to explain how the membrane environment impacts the dual roles of CD59 in controlling pore formation of MAC, and as a target of bacterial virulence factors which hijack CD59 to lyse human cells

    Fast Reinforcement Learning with Large Action Sets Using Error-Correcting Output Codes for MDP Factorization

    The use of Reinforcement Learning in real-world scenarios is strongly limited by issues of scale. Most RL algorithms are unable to deal with problems composed of hundreds or sometimes even dozens of possible actions, and therefore cannot be applied to many real-world problems. We consider the RL problem in the supervised classification framework where the optimal policy is obtained through a multiclass classifier, the set of classes being the set of actions of the problem. We introduce error-correcting output codes (ECOCs) in this setting and propose two new methods for reducing complexity when using rollout-based approaches. The first method consists in using an ECOC-based classifier as the multiclass classifier, reducing the learning complexity from $O(A^2)$ to $O(A \log A)$. We then propose a novel method that profits from the ECOC's coding dictionary to split the initial MDP into $O(\log A)$ separate two-action MDPs. This second method reduces learning complexity even further, from $O(A^2)$ to $O(\log A)$, thus rendering problems with large action sets tractable. We finish by experimentally demonstrating the advantages of our approach on a set of benchmark problems, both in speed and performance.
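
    The idea of a coding dictionary that turns one A-way choice into about log(A) binary choices can be sketched as below. This is a simplified illustration, not the paper's construction: it uses the shortest possible binary code, which has no error-correcting redundancy, whereas a real ECOC would use longer codewords. The names `build_codebook` and `decode` are hypothetical.

```python
def build_codebook(num_actions):
    """Assign each action a distinct binary codeword of ceil(log2(A)) bits.

    Column b of the codebook defines one two-class subproblem:
    "is bit b of the chosen action 0 or 1?".
    """
    n_bits = max(1, (num_actions - 1).bit_length())
    return [[(a >> b) & 1 for b in range(n_bits)] for a in range(num_actions)]


def decode(bits, codebook):
    """Map predicted bits to the action whose codeword is nearest in Hamming distance."""
    def hamming(code):
        return sum(x != y for x, y in zip(code, bits))
    return min(range(len(codebook)), key=lambda a: hamming(codebook[a]))
```

    With 1000 actions, `build_codebook(1000)` yields 10-bit codewords, so ten binary decisions identify an action; `decode` then recovers the action from the ten predicted bits.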