Analysis of Different Types of Regret in Continuous Noisy Optimization
The performance measure of an algorithm is a crucial part of its analysis.
The performance can be assessed by studying the convergence rate of the
algorithm in question. This requires some (hopefully convergent)
sequence that measures how "good" the approximated optimum is compared to
the real optimum. The concept of Regret is widely used in the bandit literature
for assessing the performance of an algorithm. The same concept is also used in
the framework of optimization algorithms, sometimes under other names or
without a specific name. The numerical evaluation of the convergence rate of
noisy algorithms often involves approximations of regret. We discuss here two
types of approximations of Simple Regret used in practice for the evaluation of
algorithms for noisy optimization. We use specific algorithms of different
nature and the noisy sphere function to show the following results. The
approximation of Simple Regret, termed here Approximate Simple Regret, used in
some optimization testbeds, fails to estimate the Simple Regret convergence
rate. We also discuss a recent new approximation of Simple Regret, that we term
Robust Simple Regret, and show its advantages and disadvantages.
Comment: Genetic and Evolutionary Computation Conference 2016, Jul 2016,
Denver, United States. 201
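To make the quantity discussed in the abstract above concrete, the following is a minimal sketch of Simple Regret on a noisy sphere function. The additive Gaussian noise model, the optimum at the origin, and the names `sphere`, `noisy_sphere` and `simple_regret` are assumptions made for illustration. It does not implement the Approximate or Robust Simple Regret the paper studies.

```python
import random

def sphere(x):
    """Noise-free sphere function; its minimum value 0 is reached at the origin."""
    return sum(xi * xi for xi in x)

def noisy_sphere(x, rng, sigma=1.0):
    """One noisy evaluation (assumption: additive zero-mean Gaussian noise)."""
    return sphere(x) + rng.gauss(0.0, sigma)

def simple_regret(recommendation):
    """Gap between the noise-free value at the recommended point and the optimum."""
    optimum_value = 0.0  # known for the sphere; unknown in a real problem
    return sphere(recommendation) - optimum_value

rng = random.Random(0)
recommendation = [1.0, 2.0]  # hypothetical point returned by some optimizer

# Averaging many noisy evaluations gives an estimate of the noise-free value.
n_evals = 10_000
estimate = sum(noisy_sphere(recommendation, rng) for _ in range(n_evals)) / n_evals

print(simple_regret(recommendation), estimate)
```

In practice the noise-free value is not observable, which is why testbeds fall back on approximations such as the ones the abstract compares.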
Parallel Gaussian Process Optimization with Upper Confidence Bound and Pure Exploration
In this paper, we consider the challenge of maximizing an unknown function f
for which evaluations are noisy and are acquired with high cost. An iterative
procedure uses the previous measures to actively select the next estimation of
f which is predicted to be the most useful. We focus on the case where the
function can be evaluated in parallel with batches of fixed size and analyze
the benefit compared to the purely sequential procedure in terms of cumulative
regret. We introduce the Gaussian Process Upper Confidence Bound and Pure
Exploration algorithm (GP-UCB-PE) which combines the UCB strategy and Pure
Exploration in the same batch of evaluations along the parallel iterations. We
prove theoretical upper bounds on the regret with batches of size K for this
procedure which show the improvement of the order of √K for fixed
iteration cost over purely sequential versions. Moreover, the multiplicative
constants involved have the property of being dimension-free. We also confirm
empirically the efficiency of GP-UCB-PE on real and synthetic problems compared
to state-of-the-art competitors.
Efficient approximate Thompson sampling for search query recommendation
Query suggestions have been a valuable feature for e-commerce sites in helping shoppers refine their search intent. In this paper, we develop an algorithm that helps e-commerce sites like eBay mingle the output of different recommendation algorithms. Our algorithm is based on “Thompson Sampling”, a technique designed for solving multi-arm bandit problems where the best results are not known in advance but instead are tried out to gather feedback. Our approach is to treat query suggestions as a competition among data resources: we have many query suggestion candidates competing for limited space on the search results page. An “arm” is played when a query suggestion candidate is chosen for display, and our goal is to maximize the expected reward (user clicks on a suggestion). Our experiments have shown promising results in using the click-based user feedback to drive success by enhancing the quality of query suggestions.
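The arm-and-click framing in the abstract above can be sketched with textbook Beta-Bernoulli Thompson Sampling. This is a minimal illustration, not the approximate method the paper develops. The click rates, the uniform Beta(1, 1) prior, and the names `thompson_select` and `run_bandit` are assumptions.

```python
import random

def thompson_select(successes, failures, rng):
    """Sample a click rate from each arm's Beta posterior; play the largest sample."""
    samples = [rng.betavariate(s + 1, f + 1) for s, f in zip(successes, failures)]
    return max(range(len(samples)), key=samples.__getitem__)

def run_bandit(click_rates, rounds, seed=0):
    """Simulate showing one suggestion per round and observing click / no click."""
    rng = random.Random(seed)
    successes = [0] * len(click_rates)
    failures = [0] * len(click_rates)
    for _ in range(rounds):
        arm = thompson_select(successes, failures, rng)
        if rng.random() < click_rates[arm]:  # simulated user click
            successes[arm] += 1
        else:
            failures[arm] += 1
    return successes, failures

# Hypothetical click-through rates for three suggestion candidates.
successes, failures = run_bandit([0.02, 0.05, 0.20], rounds=5000)
pulls = [s + f for s, f in zip(successes, failures)]
print(pulls)
```

As click feedback accumulates, the posterior for the best candidate concentrates and that candidate is displayed in most rounds, while weaker candidates are still tried occasionally.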
Layer-by-Layer Assembly and UV Photoreduction of Graphene-Polyoxometalate Composite Films for Electronics
Single File Diffusion enhancement in a fluctuating modulated 1D channel
We show that the diffusion of a single file of particles moving in a
fluctuating modulated 1D channel is enhanced with respect to the one in a bald
pipe. This effect, induced by the fluctuations of the modulation, is favored by
the incommensurability between the channel potential modulation and the moving
file periodicity. This phenomenon could be of importance in order to optimize
the critical current in superconductors, in particular in the case where mobile
vortices move in 1D channels designed by adapted patterns of pinning sites.
Comment: 4 pages, 4 figures
On the Prior Sensitivity of Thompson Sampling
The empirically successful Thompson Sampling algorithm for stochastic bandits
has drawn much interest in understanding its theoretical properties. One
important benefit of the algorithm is that it allows domain knowledge to be
conveniently encoded as a prior distribution to balance exploration and
exploitation more effectively. While it is generally believed that the
algorithm's regret is low (high) when the prior is good (bad), little is known
about the exact dependence. In this paper, we fully characterize the
algorithm's worst-case dependence of regret on the choice of prior, focusing on
a special yet representative case. These results also provide insights into the
general sensitivity of the algorithm to the choice of priors. In particular,
with p being the prior probability mass of the true reward-generating model,
we prove O(√(T/p)) and O(√((1-p)T)) regret upper bounds for the
bad- and good-prior cases, respectively, as well as matching lower
bounds. Our proofs rely on the discovery of a fundamental property of Thompson
Sampling and make heavy use of martingale theory, both of which appear novel in
the literature, to the best of our knowledge.
Comment: Appears in the 27th International Conference on Algorithmic Learning
Theory (ALT), 201
Bayesian Best-Arm Identification for Selecting Influenza Mitigation Strategies
Pandemic influenza has the epidemic potential to kill millions of people.
While various preventive measures exist (i.a., vaccination and school
closures), deciding on strategies that lead to their most effective and
efficient use remains challenging. To this end, individual-based
epidemiological models are essential to assist decision makers in determining
the best strategy to curb epidemic spread. However, individual-based models are
computationally intensive and it is therefore pivotal to identify the optimal
strategy using a minimal amount of model evaluations. Additionally, as
epidemiological modeling experiments need to be planned, a computational budget
needs to be specified a priori. Consequently, we present a new sampling
technique to optimize the evaluation of preventive strategies using fixed
budget best-arm identification algorithms. We use epidemiological modeling
theory to derive knowledge about the reward distribution which we exploit using
Bayesian best-arm identification algorithms (i.e., Top-two Thompson sampling
and BayesGap). We evaluate these algorithms in a realistic experimental setting
and demonstrate that it is possible to identify the optimal strategy using only
a limited number of model evaluations, i.e., 2-to-3 times faster compared to
the uniform sampling method, the predominant technique used for epidemiological
decision making in the literature. Finally, we contribute and evaluate a
statistic for Top-two Thompson sampling to inform the decision makers about the
confidence of an arm recommendation.
Structural basis for membrane attack complex inhibition by CD59
CD59 is an abundant immuno-regulatory receptor that protects human cells from damage during complement activation. Here we show how the receptor binds complement proteins C8 and C9 at the membrane to prevent insertion and polymerization of membrane attack complex (MAC) pores. We present cryo-electron microscopy structures of two inhibited MAC precursors known as C5b8 and C5b9. We discover that in both complexes, CD59 binds the pore-forming β-hairpins of C8 to form an intermolecular β-sheet that prevents membrane perforation. While bound to C8, CD59 deflects the cascading C9 β-hairpins, rerouting their trajectory into the membrane. Preventing insertion of C9 restricts structural transitions of subsequent monomers and indirectly halts MAC polymerization. We combine our structural data with cellular assays and molecular dynamics simulations to explain how the membrane environment impacts the dual roles of CD59 in controlling pore formation of MAC, and as a target of bacterial virulence factors which hijack CD59 to lyse human cells.
Fast Reinforcement Learning with Large Action Sets Using Error-Correcting Output Codes for MDP Factorization
The use of Reinforcement Learning in real-world scenarios is strongly limited by issues of scale. Most RL algorithms are unable to deal with problems composed of hundreds or sometimes even dozens of possible actions, and therefore cannot be applied to many real-world problems. We consider the RL problem in the supervised classification framework where the optimal policy is obtained through a multiclass classifier, the set of classes being the set of actions of the problem. We introduce error-correcting output codes (ECOCs) in this setting and propose two new methods for reducing complexity when using rollout-based approaches. The first method consists in using an ECOC-based classifier as the multiclass classifier, reducing the learning complexity from O(A^2) to O(A log(A)). We then propose a novel method that profits from the ECOC's coding dictionary to split the initial MDP into O(log(A)) separate two-action MDPs. This second method reduces learning complexity even further, from O(A^2) to O(log(A)), thus rendering problems with large action sets tractable. We finish by experimentally demonstrating the advantages of our approach on a set of benchmark problems, both in speed and performance.
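The O(log(A)) figure in the abstract above comes from describing each of the A actions by a short binary codeword, so that one binary decision per bit identifies an action. The sketch below illustrates only that coding idea. The functions `build_codebook` and `decode` are hypothetical names, and the minimal-length code used here has no redundancy, whereas a real ECOC adds extra bits so that some bit errors can be corrected.

```python
def build_codebook(num_actions):
    """Give every action a distinct binary codeword of length ceil(log2(A))."""
    bits = max(1, (num_actions - 1).bit_length())
    return [[(a >> b) & 1 for b in range(bits)] for a in range(num_actions)]

def decode(bit_votes, codebook):
    """Return the action whose codeword is nearest in Hamming distance."""
    def hamming(code):
        return sum(c != v for c, v in zip(code, bit_votes))
    return min(range(len(codebook)), key=lambda a: hamming(codebook[a]))

codebook = build_codebook(1000)
code_length = len(codebook[0])  # number of binary sub-problems to learn

# Stand-in for the outputs of the per-bit binary policies (assumption:
# here they reproduce the codeword of action 421 without error).
bit_votes = list(codebook[421])
chosen_action = decode(bit_votes, codebook)
print(code_length, chosen_action)
```

With 1000 actions, only ten binary decisions are needed in this sketch, which is the kind of saving that makes large action sets tractable.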