
    Linear Convergence of Comparison-based Step-size Adaptive Randomized Search via Stability of Markov Chains

    In this paper, we consider comparison-based adaptive stochastic algorithms for solving numerical optimisation problems. We consider a specific subclass of algorithms that we call comparison-based step-size adaptive randomized search (CB-SARS), where the state variables at a given iteration are a vector of the search space and a positive parameter, the step-size, typically controlling the overall standard deviation of the underlying search distribution. We investigate the linear convergence of CB-SARS on scaling-invariant objective functions. Scaling-invariant functions preserve the ordering of points with respect to their function value when the points are scaled with the same positive parameter (the scaling is done w.r.t. a fixed reference point). This class of functions includes norms composed with strictly increasing functions as well as many non-quasi-convex and non-continuous functions. On scaling-invariant functions, we show the existence of a homogeneous Markov chain, as a consequence of natural invariance properties of CB-SARS (essentially scale-invariance and invariance to strictly increasing transformations of the objective function). We then derive sufficient conditions for global linear convergence of CB-SARS, expressed in terms of different stability conditions of the normalised homogeneous Markov chain (irreducibility, positivity, Harris recurrence, geometric ergodicity), and thus define a general methodology for proving global linear convergence of CB-SARS algorithms on scaling-invariant functions. As a by-product, we provide a connection between comparison-based adaptive stochastic algorithms and Markov chain Monte Carlo algorithms.
    Comment: SIAM Journal on Optimization, Society for Industrial and Applied Mathematics, 201
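    To make the CB-SARS setting concrete, here is a minimal, illustrative sketch (the names and step-size factors are assumptions, not the paper's notation): a (1+1)-type comparison-based update of the state (x, sigma), together with a numerical check that, on a scaling-invariant function, the normalized state Z_t = X_t / sigma_t evolves identically from any rescaled initial state (x0, sigma0) -> (c x0, c sigma0). This scale-invariance is what makes the normalized chain homogeneous.

```python
import numpy as np

def step(x, sigma, f, rng):
    # Comparison-based (1+1)-type update: accept iff not worse,
    # multiply the step-size by a constant factor either way.
    y = x + sigma * rng.standard_normal(x.shape)
    if f(y) <= f(x):
        return y, sigma * 1.5
    return x, sigma * 1.5 ** -0.25

f = lambda x: np.linalg.norm(x)          # scaling-invariant w.r.t. 0

def normalized_chain(x0, sigma0, seed, steps=5):
    rng = np.random.default_rng(seed)
    x, s = x0, sigma0
    zs = []
    for _ in range(steps):
        x, s = step(x, s, f, rng)
        zs.append(x / s)                 # normalized state Z_t = X_t / sigma_t
    return zs

# Scale-invariance of the algorithm plus scaling-invariance of f:
# starting from (x0, s0) or (c*x0, c*s0) yields the same Z-chain.
z1 = normalized_chain(np.ones(3), 1.0, seed=42)
z2 = normalized_chain(10 * np.ones(3), 10.0, seed=42)
print(np.allclose(z1, z2))               # True
```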

    Linear Convergence on Positively Homogeneous Functions of a Comparison Based Step-Size Adaptive Randomized Search: the (1+1) ES with Generalized One-fifth Success Rule

    In the context of unconstrained numerical optimization, this paper investigates the global linear convergence of a simple probabilistic derivative-free optimization (DFO) algorithm. The algorithm samples a candidate solution from a standard multivariate normal distribution scaled by a step-size and centered in the current solution. This solution is accepted if it has a better objective function value than the current one. Crucial to the algorithm is the adaptation of the step-size, which is done in order to maintain a certain probability of success. The algorithm, already proposed in the 60's, is a generalization of the well-known Rechenberg's (1+1) Evolution Strategy (ES) with one-fifth success rule, which was also proposed by Devroye under the name compound random search and by Schumer and Steiglitz under the name step-size adaptive random search. In addition to being derivative-free, the algorithm is function-value-free: it exploits the objective function only through comparisons. It belongs to the class of comparison-based step-size adaptive randomized search (CB-SARS). For the convergence analysis, we follow the methodology developed in a companion paper for investigating linear convergence of CB-SARS: by exploiting invariance properties of the algorithm, we turn the study of global linear convergence on scaling-invariant functions into the study of the stability of an underlying normalized Markov chain (MC). We hence prove global linear convergence by studying the stability (irreducibility, recurrence, positivity, geometric ergodicity) of the normalized MC associated to the (1+1)-ES. More precisely, we prove that, starting from any initial solution and any step-size, linear convergence occurs with probability one and in expectation. Our proof holds on unimodal functions that are the composite of strictly increasing functions with positively homogeneous functions of degree α (assumed also to be continuously differentiable). This function class includes composites of norm functions but also non-quasi-convex functions; because of the composition with a strictly increasing function, it also includes non-continuous functions. We find that a sufficient condition for global linear convergence is that the step-size increases on linear functions, a condition typically satisfied for standard parameter choices. While the algorithm was introduced more than 40 years ago, we provide here the first proof of global linear convergence for the (1+1)-ES with generalized one-fifth success rule and the first proof of linear convergence for a CB-SARS on such a class of functions, which includes non-quasi-convex and non-continuous functions. Our proof also holds on functions where linear convergence of some CB-SARS was previously proven, namely convex-quadratic functions (including the well-known sphere function).
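    A compact sketch of the algorithm under study, with illustrative constants (the function g below and all parameter values are assumptions for the demo): the generalized one-fifth success rule multiplies the step-size by c on success and by c^(-p/(1-p)) on failure, so the step-size is stationary exactly when the empirical success rate equals the target p (p = 1/5 recovers the classical rule).

```python
import numpy as np

def one_plus_one_es(f, x0, sigma0=1.0, p_target=0.2, c=1.5,
                    iters=1500, seed=1):
    """(1+1)-ES with a generalized one-fifth success rule (sketch):
    on success multiply sigma by c, on failure by c**(-p/(1-p)),
    so sigma is stationary when the success rate equals p_target."""
    rng = np.random.default_rng(seed)
    x, sigma = np.asarray(x0, float), sigma0
    decay = c ** (-p_target / (1 - p_target))
    hist = []
    for _ in range(iters):
        y = x + sigma * rng.standard_normal(x.size)
        if f(y) < f(x):                   # uses comparisons only
            x, sigma = y, sigma * c
        else:
            sigma *= decay
        hist.append(np.linalg.norm(x))
    return x, sigma, hist

# Positively homogeneous function of degree 2 composed with a
# strictly increasing map (log1p): in the analyzed function class.
g = lambda x: np.log1p(np.dot(x, x))
x, sigma, hist = one_plus_one_es(g, np.full(10, 5.0))
print(hist[0], hist[-1])                  # geometric decrease of ||x||
```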

    Habilitation thesis (Thèse d'habilitation à diriger des recherches) "Analysis of Comparison-based Stochastic Continuous Black-Box Optimization Algorithms"

    This manuscript presents a large part of my research since the end of my PhD. Most of my work is related to numerical (also referred to as continuous) optimization, with the exception of one contribution made during my postdoc in Zurich introducing a new stochastic algorithm to simulate chemical or biochemical systems [23]. The optimization algorithms at the core of my work are adaptive derivative-free stochastic (or randomized) optimization methods. The algorithms are tailored to tackle difficult numerical optimization problems in a so-called black-box context, where the objective function to be optimized is seen as a black-box. For a given input solution, the black-box returns solely the objective function value; no gradient or higher-order derivatives are assumed. The optimization algorithm can use the information returned by the black-box, i.e. the history of function values associated to the queried search points, but no other knowledge that could be within the black-box (parameters describing the class of functions the function belongs to, ...). This black-box context is very natural in industrial settings, where the function to be optimized can be given by an executable file for which the source code is not provided. It is also natural in situations where the function is given by a large simulation code from which it is hard to extract any information useful for the optimization.
    This context is also called derivative-free optimization (DFO) in the mathematical optimization community. Well-known DFO methods are the Nelder-Mead algorithm [79, 77], pattern search methods [54, 90, 6] or, more recently, the NEW Unconstrained Optimization Algorithm (NEWUOA) developed by Powell [82, 81].
    In this context, I have been focusing on DFO methods in the literal sense. However, the methods my research is centered on have a large stochastic component and originate from the community of bio-inspired algorithms, mainly composed of computer scientists and engineers. The methods were introduced at the end of the 70's. A parallel with Darwin's theory of the evolution of species, based on blind variation and natural selection, was recognized and served as a source of inspiration for those methods. Nowadays this field of bio-inspired methods is referred to as evolutionary computation (EC), and a generic term for the methods is evolutionary algorithms. The probably most famous examples of bio-inspired methods are genetic algorithms (GAs); however, GAs are known today not to be competitive for numerical optimization. Evolution Strategies (ES), introduced at the end of the 70's [83], have emerged as the main sub-branch of EC devoted to continuous optimization. One important feature of ES is that they are comparison-based algorithms. The presently most advanced ES algorithm, the Covariance Matrix Adaptation Evolution Strategy (CMA-ES) [50], is a variable metric method recognized as the state-of-the-art method for stochastic numerical optimization. It is used in many applications in industry and academia.
    For historical reasons, the developments and work on Evolution Strategies are mainly carried out in the EC field, where practice and effectiveness are definitely as (or more) important as having a theorem proven about an algorithm. However, ES algorithms are simply adaptive stochastic iterative methods, and they need to be studied from a mathematical perspective, as does any other iterative method in optimization or any other domain, in order to understand the methods better and to convince a broader class of people of their soundness. Questions like their convergence and speed of convergence, central in optimization, need to be addressed. My research is encompassed within this general context: I am particularly interested in the mathematical aspects of adaptive stochastic methods like ES (and of course CMA-ES) or, more generally, adaptive stochastic optimization algorithms. Evolution strategies have this attractive facet that, while introduced in a bio-inspired and engineering context, they turn out to be methods with deep theoretical foundations related to invariance, information geometry and stochastic approximation, and strongly connected to Markov chain Monte Carlo (MCMC) algorithms. Those foundations and connections are relatively new and, to a small (for some topics) or large (for others) extent, partly related to some of my contributions. They will be explained within the manuscript. I particularly care that the theory I am working on relates to practical algorithms or has an impact on (new) algorithm designs, and I attempt to illustrate this within the manuscript.
    While optimization is the central theme of my research, I have been tackling various aspects of optimization. Although most of my work is devoted to single-objective optimization, I have also been working on multi-objective optimization, where the goal is to optimize simultaneously several conflicting objectives and where, instead of a single solution, one searches for a set of solutions, the so-called Pareto set composed of the best compromises. In the field of single-objective optimization, I have been tackling diverse contexts like noisy optimization, where for a given point in the search space we do not observe one deterministic value but a distribution of possible function values; large-scale optimization, where one is interested in tackling problems with on the order of 10^4 (medium large-scale) to 10^6 variables (large-scale); and, to a smaller extent, constrained optimization. In addition to investigating theoretical questions, I have also been working on designing new algorithms, which calls for theory complemented with numerical simulations. Last, I have tackled some applications, mainly in the context of the PhD of Mohamed Jebalia, with an application in chromatography, and of the PhD of Zyed Bouzarkouna (financed by the French Institute of Petroleum) on the placement of oil wells.
    Furthermore, a non-negligible part of my research in the past years has been devoted to the benchmarking of algorithms. Benchmarking complements theory, as it is difficult to assess theoretically the performance of algorithms on all the typical functions one is interested in. The main motivation has then been to improve the standards of how benchmarking is done. Those contributions were made along with the development of the Comparing COntinuous Optimizers (COCO) platform.
    My work is articulated around three main complementary axes, namely theory, algorithm design and applications. An overview of the contributions presented within this habilitation, organized along those axes, is given in Figure 3.1.
    This manuscript describes the essential part of my scientific work since the end of my PhD. My work is centered on so-called "black-box" numerical optimization, with the exception of one article written during my postdoctoral stay at ETH Zurich, which introduces a new stochastic optimization algorithm to simulate systems in chemistry or biochemistry [23]. The optimization algorithms at the heart of my work are adaptive, derivative-free, stochastic algorithms. They are particularly suited to the optimization of difficult problems in contexts where the function is accessible only through a "black-box" returning zero-order information, i.e. the only information available and usable by the algorithm are the pairs (point of the search space, associated objective function value). This context is very common in industry, where the optimization problems encountered rely on numerical simulation codes for which, often, only an executable is available. The "derivative-free" aspect is also very common, because computing a gradient (which presupposes that the underlying function is differentiable) on numerical simulation codes, for instance using an adjoint method or automatic differentiation, can be costly in development time. It is moreover usual for the formulation of an optimization problem to change in the course of its resolution; adapting the gradient computation code can then prove very cumbersome, which can motivate the use of a black-box optimization method.
    This black-box optimization context is also called derivative-free optimization in the mathematical programming community, with the associated acronym DFO. The methods qualified as DFO are generally deterministic. The best-known DFO methods at present are the simplex or Nelder-Mead algorithm [79, 77], pattern search algorithms [54, 90, 6] and the NEWUOA algorithm (NEW Unconstrained Optimization Algorithm) developed by Powell [82, 81]; the latter is currently considered the state-of-the-art deterministic DFO algorithm.
    My work thus deals with DFO methods in the literal sense of the term. However, the methods I have been interested in have a large stochastic component and were developed in the community of bio-inspired algorithms, which is essentially composed of engineers and computer scientists. The first algorithms were introduced in the 70's. A parallel between Darwin's theory of the evolution of species and optimization originally served as a source of inspiration for their development. Nowadays, this field of bio-inspired methods is also called Evolutionary Computation (EC), and a generic term for the algorithms is evolutionary algorithm (EA). For many researchers in this field (of whom I am one), the bio-inspired aspect is no longer present, and the development of the algorithms is motivated solely by mathematical and numerical considerations.
    Among evolutionary algorithms, genetic algorithms (GAs) are probably still the most famous outside the EC community. However, GAs are not competitive algorithms for numerical optimization; this fact has been recognized for more than a decade. Evolution Strategies (ES), introduced at the end of the 70's [83], have established themselves as the evolutionary algorithms for numerical optimization. The most advanced ES algorithm at present is the Covariance Matrix Adaptation Evolution Strategy (CMA-ES) [50]. The algorithm adapts a Gaussian vector (parameterized by a mean vector and a covariance matrix) which encodes the underlying metric. On convex quadratic functions, this metric learns second-order information, that is, the covariance matrix becomes proportional to the inverse of the Hessian matrix; CMA-ES can thus be seen as the stochastic counterpart of a quasi-Newton method. An essential particularity of CMA-ES, and of ES in general, is that they use only comparisons for their different updates. More precisely, we have seen that ESs are derivative-free optimization algorithms, but they use only a "degraded" part of the information the black-box provides them, namely the mere outcome of comparing candidate solutions: given two solutions x1 and x2, is f(x1) larger or smaller than f(x2)? As a consequence, they optimize in the same way a function f: R^n → R or any function g ∘ f where g: f(R^n) → R is a strictly increasing function: they are invariant to composition on the left by a strictly increasing monotone function. The CMA-ES algorithm is recognized as the state-of-the-art method for stochastic numerical optimization; it is used in numerous applications in industry and in the academic world.
    For historical reasons, ES algorithms were developed in the EC community, where the design of an algorithm is most of the time decoupled from the concern of proving a convergence theorem about the method, and rests essentially on the use of simplified approximate mathematical models and on numerical simulations on test functions. Although this decoupling between practical design and theory can be seen as a drawback, it has the advantage that the development of a method is not restricted (or bridled) by a technical constraint tied to a mathematical proof. This allowed an algorithm like CMA-ES to come into existence well before some of its theoretical foundations were understood and well before a convergence proof could be established. On the other hand, it also implies that theoretical studies, of convergence for instance, turn out to be relatively complicated.
    My research is situated in this general context: I am particularly interested in the mathematical study of adaptive stochastic algorithms like the ES algorithms (in particular CMA-ES) and in establishing convergence proofs. These algorithms have an attractive particularity: although introduced in a context where practical performance is more important than theoretical proofs, they turn out to have deep mathematical foundations linked in particular to the notions of invariance and information geometry. Moreover, they fit into the more general framework of stochastic approximation algorithms and are strongly connected to Markov chain Monte Carlo (MCMC) methods. These last two points provide powerful mathematical tools for establishing (linear) convergence proofs. The understanding of these foundations and connections is partly related to my work, as will be illustrated in this manuscript.
    I have tackled several facets of numerical optimization. Although the essential part of my work deals with single-objective optimization, i.e. minimizing f: X ⊆ R^n → R, I have also worked in multi-objective optimization, i.e. minimizing a vector-valued function f: X ⊆ R^n → R^k. In that case, the notion of optimum is replaced by that of the Pareto set composed of the best possible compromises. My contributions concern the study of hypervolume-based algorithms, which quantify the quality of a set of solutions by computing the volume enclosed between the solutions and a reference point. Algorithms using the hypervolume are currently the state-of-the-art algorithms. We were able to establish theoretical characterizations of the set of optimal solutions in the hypervolume sense. In single-objective optimization, I have worked on noisy optimization, where, given a point of the search space, one observes a distribution of objective function values; on large-scale optimization, where one is interested in optimizing problems with on the order of 10^4 to 10^6 variables; and on constrained optimization.
    My work is articulated around three main axes: theory / new algorithms / applications (see Figure 3.1). These three axes are complementary and coupled: for instance, the design of new algorithms relies on establishing theoretical convergence bounds and is then complemented by numerical simulations, as illustrated in Chapter 6. Moreover, the development of algorithms for large-scale optimization relies on the connection between CMA-ES and information geometry (see Chapter 4). Another example of complementarity is the following: the applications tackled, notably the optimization of the placement of oil wells, motivated the introduction of new CMA-ES variants (see Chapter 9).
    Furthermore, a non-negligible part of my work deals with the benchmarking of algorithms. The main motivation is to improve the methodologies for testing and comparing numerical optimization algorithms. This work has been accompanied by the development of a platform, Comparing COntinuous Optimizers (COCO), and now has an impact on the design of new algorithms but also on the testing of theoretical hypotheses.
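    As a small illustration of the comparison-based invariance described above (a toy (1+1)-ES stands in for the ES family here; CMA-ES itself is not implemented): composing the objective on the left with a strictly increasing g leaves every comparison, and hence the whole trajectory, unchanged.

```python
import numpy as np

def es_run(f, seed=3, iters=200):
    """Tiny (1+1)-ES; the trajectory depends on f only through
    the comparisons f(y) < f(x)."""
    rng = np.random.default_rng(seed)
    x, sigma = np.full(4, 3.0), 1.0
    for _ in range(iters):
        y = x + sigma * rng.standard_normal(4)
        if f(y) < f(x):
            x, sigma = y, sigma * 1.5
        else:
            sigma *= 1.5 ** -0.25
    return x

f = lambda x: np.dot(x, x)
g_of_f = lambda x: np.exp(f(x)) - 1            # strictly increasing g(u) = e^u - 1
print(np.allclose(es_run(f), es_run(g_of_f)))  # True: identical iterates
```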

    Markovian Analysis of Evolution Strategies (Analyse Markovienne des Stratégies d'Evolution)

    In this dissertation, an analysis of Evolution Strategies (ESs) using the theory of Markov chains is conducted. Proofs of divergence or convergence of these algorithms are obtained, and tools to achieve such proofs are developed. ESs are so-called "black-box" stochastic optimization algorithms, i.e. the information on the function to be optimized is limited to the values it associates to points; in particular, gradients are unavailable. Proofs of convergence or divergence of these algorithms can be obtained through the analysis of the Markov chains underlying these algorithms. The proofs of log-linear convergence and of divergence obtained in this thesis in the context of a linear function, with or without constraint, are essential components of proofs of convergence of ESs on wide classes of functions.
    This dissertation first gives an introduction to Markov chain theory, then a state of the art on ESs and on black-box continuous optimization, and presents already established links between ESs and Markov chains. The contributions of this thesis are then presented:
    o General mathematical tools that can be applied to a wider range of problems are developed. These tools make it easy to prove specific Markov chain properties (irreducibility, aperiodicity and the fact that compact sets are small sets for the Markov chain) for the Markov chains studied. Obtaining these properties without these tools is an ad hoc, tedious and technical process that can be of very high difficulty.
    o Then, different ESs are analyzed on different problems. We study a (1,λ)-ES using cumulative step-size adaptation on a linear function and prove the log-linear divergence of the step-size; we also study the variation of the logarithm of the step-size, from which we establish a necessary condition for the stability of the algorithm with respect to the dimension of the search space. Then we study an ES with constant step-size and with cumulative step-size adaptation on a linear function with a linear constraint, using resampling to handle unfeasible solutions. We prove that with constant step-size the algorithm diverges, while with cumulative step-size adaptation, depending on the parameters of the problem and of the ES, the algorithm converges or diverges log-linearly; we then investigate how the convergence or divergence rate depends on the parameters of the problem and of the ES. Finally, we study an ES with constant step-size and with a sampling distribution that can be non-Gaussian on a linear function with a linear constraint. We give sufficient conditions on the sampling distribution for the algorithm to diverge. We also show that different covariance matrices for the sampling distribution correspond to a change of norm of the search space, and that this implies that adapting the covariance matrix of the sampling distribution may allow an ES with cumulative step-size adaptation to successfully diverge on a linear function with any linear constraint.
    Finally, these results are summarized and discussed, and perspectives for future work are explored.
    This thesis contains proofs of convergence or divergence of optimization algorithms called Evolution Strategies (ESs), as well as the development of mathematical tools enabling these proofs. ESs are so-called "black-box" stochastic optimization algorithms, i.e. the information on the optimized function reduces to the values it associates to points; in particular, the gradient of the function is unknown. Proofs of convergence or divergence of these algorithms can be obtained via the analysis of the Markov chains underlying these algorithms. The convergence and divergence proofs obtained in this thesis establish the asymptotic behaviour of ESs in the optimization of a linear function, with or without constraint, which is a key case for convergence proofs of ESs on wide classes of functions.
    This thesis first presents an introduction to Markov chains, then a state of the art on ESs and their context among black-box continuous optimization algorithms, as well as the links established between ESs and Markov chains. The contributions of this thesis are then presented:
    o First, general mathematical tools applicable to other problems are developed. The use of these tools makes it easy to establish certain properties (namely irreducibility, aperiodicity and the fact that compact sets are small sets for the Markov chain) of the Markov chains studied. Without these tools, establishing these properties was an ad hoc and technical process that could prove very difficult.
    o Then, different ESs are analyzed on different problems. A (1,λ)-ES using cumulative step-size adaptation is studied for the optimization of a linear function. It is shown that for λ > 2 the algorithm diverges log-linearly, optimizing the function successfully. The divergence rate of the algorithm is given explicitly, which can be used to compute an optimal value of λ for the linear function. Moreover, the variance of the step-size of the algorithm is computed, from which a condition on the adaptation of the cumulation parameter with the dimension of the problem is deduced in order to obtain stability of the algorithm. Then, a (1,λ)-ES with constant step-size and a (1,λ)-ES with cumulative step-size adaptation are studied for the optimization of a linear function with a linear constraint. With constant step-size, the algorithm solves the problem while diverging slowly. Under a few simple conditions, this result also holds when the algorithm uses non-Gaussian distributions to generate new solutions. When the step-size is adapted with cumulative step-size adaptation, the success of the algorithm depends on the angle between the gradients of the constraint and of the optimized function: if this angle is too small, the algorithm converges prematurely; otherwise, it diverges log-linearly.
    Finally, the results are summarized and discussed, and perspectives on future work are presented.
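    A hedged sketch of the first setting analyzed: a (1,λ)-ES with cumulative step-size adaptation on the linear function f(x) = x[0]. The parameter values below are common defaults, not the thesis's exact setting. For λ > 2 the average drift of log(sigma) is positive, i.e. the step-size diverges log-linearly, as the thesis proves.

```python
import numpy as np

def one_comma_lambda_csa(n=10, lam=6, iters=3000, seed=0):
    """Sketch of a (1,lambda)-ES with cumulative step-size adaptation
    on the linear function f(x) = x[0] (to be minimized)."""
    rng = np.random.default_rng(seed)
    x, sigma = np.zeros(n), 1.0
    c = 4.0 / (n + 4)                      # cumulation factor (default-like)
    d = 1.0                                # damping (illustrative)
    p = np.zeros(n)                        # evolution path
    chin = np.sqrt(n) * (1 - 1/(4*n) + 1/(21*n**2))  # E||N(0,I)||
    logs = []
    for _ in range(iters):
        zs = rng.standard_normal((lam, n))
        fvals = (x + sigma * zs)[:, 0]     # f(x) = x[0]
        z = zs[np.argmin(fvals)]           # comparison-based selection
        x = x + sigma * z
        p = (1 - c) * p + np.sqrt(c * (2 - c)) * z
        sigma *= np.exp((c / d) * (np.linalg.norm(p) / chin - 1))
        logs.append(np.log(sigma))
    return np.array(logs)

logs = one_comma_lambda_csa()
# Log-linear divergence: log(sigma_t) grows roughly linearly in t.
print(logs[-1] / len(logs))                # positive average drift
```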

    Diffusion Asymptotics for Sequential Experiments

    We propose a new diffusion-asymptotic analysis for sequentially randomized experiments, including those that arise in solving multi-armed bandit problems. In an experiment with n time steps, we let the mean reward gaps between actions scale to the order 1/√n so as to preserve the difficulty of the learning task as n grows. In this regime, we show that the behavior of a class of sequentially randomized Markov experiments converges to a diffusion limit, given as the solution of a stochastic differential equation. The diffusion limit thus enables us to derive a refined, instance-specific characterization of the stochastic dynamics of adaptive experiments. As an application of this framework, we use the diffusion limit to obtain several new insights on the regret and belief evolution of Thompson sampling. We show that a version of Thompson sampling with an asymptotically uninformative prior variance achieves nearly optimal instance-specific regret scaling when the reward gaps are relatively large. We also demonstrate that, in this regime, the posterior beliefs underlying Thompson sampling are highly unstable over time.
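    An illustrative simulation of the diffusion scaling (not the paper's code; all names and parameter choices are assumptions): a two-armed Gaussian bandit whose mean gap is gap_scale/√n, run with conjugate Gaussian Thompson sampling. With gaps of order 1/√n, cumulative regret is of order √n, so regret/√n is the natural diffusion-scale statistic.

```python
import numpy as np

def thompson_two_arms(n=10_000, gap_scale=2.0, prior_var=1.0, seed=7):
    """Two-armed Gaussian bandit with mean gap gap_scale/sqrt(n),
    unit observation noise, and Thompson sampling with conjugate
    N(0, prior_var) priors (illustrative parameters)."""
    rng = np.random.default_rng(seed)
    mu = np.array([0.0, gap_scale / np.sqrt(n)])   # diffusion scaling
    post_mean = np.zeros(2)
    post_prec = np.full(2, 1.0 / prior_var)
    pulls, regret = np.zeros(2), 0.0
    for _ in range(n):
        samples = post_mean + rng.standard_normal(2) / np.sqrt(post_prec)
        a = int(np.argmax(samples))
        r = mu[a] + rng.standard_normal()
        post_prec[a] += 1.0                         # conjugate update,
        post_mean[a] += (r - post_mean[a]) / post_prec[a]  # unit noise var
        pulls[a] += 1
        regret += mu.max() - mu[a]
    return regret / np.sqrt(n), pulls               # diffusion-scale regret

print(thompson_two_arms())
```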

    Control Theory: A Mathematical Perspective on Cyber-Physical Systems

    Control theory is an interdisciplinary field located at the crossroads of pure and applied mathematics, systems engineering, and the sciences. Recently, the field has been facing new challenges motivated by application domains that involve networks of systems; examples are interacting robots, networks of autonomous cars, and the smart grid. In order to address the new challenges posed by these application disciplines, the special focus of this workshop was on the currently very active field of Cyber-Physical Systems, which forms the underlying basis for many network control applications. A series of lectures in this workshop was devoted to giving an overview of current theoretical developments in Cyber-Physical Systems, emphasizing in particular the mathematical aspects of the field. Special focus was on the dynamics and control of networks of systems, distributed optimization and formation control, the fundamentals of nonlinear interconnected systems, and open problems in control.

    Scaling-invariant functions versus positively homogeneous functions

    Scaling-invariant functions preserve the order of points when the points are scaled by the same positive scalar (with respect to a unique reference point). Composites of strictly monotonic functions with positively homogeneous functions are scaling-invariant with respect to zero. We prove in this paper that the reverse is true for large classes of scaling-invariant functions. Specifically, we give necessary and sufficient conditions for scaling-invariant functions to be composites of a strictly monotonic function with a positively homogeneous function. We also study the sublevel sets of scaling-invariant functions, generalizing well-known properties of positively homogeneous functions.
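    A small numerical illustration of the definitions, with hypothetical test functions: a composite of a strictly increasing function (arctan) with a positively homogeneous function (the 1-norm) passes an empirical scaling-invariance check, while a function mixing homogeneity degrees fails it.

```python
import numpy as np

rng = np.random.default_rng(0)

def is_scaling_invariant(f, trials=10_000, dim=3, rhos=(0.5, 2.0, 7.3)):
    """Empirically check f(x) <= f(y)  <=>  f(rho*x) <= f(rho*y)
    for rho > 0 (scaling-invariance w.r.t. the reference point 0)."""
    for _ in range(trials):
        x, y = rng.standard_normal(dim), rng.standard_normal(dim)
        for rho in rhos:
            if (f(x) <= f(y)) != (f(rho * x) <= f(rho * y)):
                return False
    return True

# g o p with g strictly increasing, p positively homogeneous (degree 1):
f1 = lambda x: np.arctan(np.linalg.norm(x, 1))   # scaling-invariant
f2 = lambda x: np.linalg.norm(x) + x[0] ** 2     # mixes degrees 1 and 2
print(is_scaling_invariant(f1), is_scaling_invariant(f2))  # True False
```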

    Constructive Approximation and Learning by Greedy Algorithms

    This thesis develops several kernel-based greedy algorithms for different machine learning problems and analyzes their theoretical and empirical properties. Greedy approaches have been extensively used in the past for tackling problems in combinatorial optimization where finding even a feasible solution can be a computationally hard problem (i.e., not solvable in polynomial time). A key feature of greedy algorithms is that a solution is constructed recursively from the smallest constituent parts. In each step of the constructive process, a component is added to the partial solution from the previous step and, thus, the size of the optimization problem is reduced. The selected components are given by optimization problems that are simpler and easier to solve than the original problem. As such schemes are typically fast at constructing a solution, they can be very effective on complex optimization problems where finding an optimal/good solution has a high computational cost. Moreover, greedy solutions are rather intuitive, and the schemes themselves are simple to design and easy to implement. There is a large class of problems for which greedy schemes generate an optimal solution or a good approximation of the optimum.
    In the first part of the thesis, we develop two deterministic greedy algorithms for optimization problems in which a solution is given by a set of functions mapping an instance space to the space of reals. The first of the two approaches facilitates data understanding through interactive visualization by providing means for experts to incorporate their domain knowledge into otherwise static kernel principal component analysis. This is achieved by greedily constructing embedding directions that maximize the variance at data points (unexplained by the previously constructed embedding directions) while adhering to specified domain knowledge constraints. The second deterministic greedy approach is a supervised feature construction method capable of addressing the problem of kernel choice. The goal of the approach is to construct a feature representation for which a set of linear hypotheses is of sufficient capacity: large enough to contain a satisfactory solution to the considered problem and small enough to allow good generalization from a small number of training examples. The approach mimics functional gradient descent and constructs features by fitting squared error residuals. We show that the constructive process is consistent and provide conditions under which it converges to the optimal solution.
    In the second part of the thesis, we investigate two problems for which deterministic greedy schemes can fail to find an optimal solution or a good approximation of the optimum. This happens as a result of making a sequence of choices which take into account only the immediate reward, without considering the consequences for future decisions. To address this shortcoming of deterministic greedy schemes, we propose two efficient randomized greedy algorithms which are guaranteed to find effective solutions to the corresponding problems. In the first of the two approaches, we provide a means to scale kernel methods to problems with millions of instances. An approach frequently used in practice for this type of problem is the Nyström method for low-rank approximation of kernel matrices. A crucial step in this method is the choice of landmarks, which determine the quality of the approximation. We tackle this problem with a randomized greedy algorithm based on the K-means++ cluster seeding scheme and provide a theoretical and empirical study of its effectiveness. In the second problem, for which a deterministic strategy can fail to find a good solution, the goal is to find a set of objects from a structured space that are likely to exhibit an unknown target property. This discrete optimization problem is of significant interest to cyclic discovery processes such as de novo drug design. We propose to address it with an adaptive Metropolis-Hastings approach that samples candidates from the posterior distribution of structures conditioned on them having the target property. The proposed constructive scheme defines a consistent random process, and our empirical evaluation demonstrates its effectiveness across several different application domains.
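    A sketch of the Nyström idea with K-means++-style landmark seeding, under illustrative choices (RBF kernel, pseudo-inverse of the landmark block); the thesis's actual algorithm and analysis may differ in the details.

```python
import numpy as np

def rbf(X, Y, gamma=0.5):
    # Gaussian (RBF) kernel matrix between row sets X and Y.
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def kmeanspp_landmarks(X, m, rng):
    """K-means++ seeding: pick landmarks with probability proportional
    to squared distance from the already chosen ones."""
    idx = [rng.integers(len(X))]
    for _ in range(m - 1):
        d2 = ((X[:, None, :] - X[idx][None, :, :]) ** 2).sum(-1).min(1)
        idx.append(rng.choice(len(X), p=d2 / d2.sum()))
    return X[idx]

def nystrom(X, m=50, gamma=0.5, seed=0):
    """Rank-m Nystrom approximation K ~ C W^+ C^T of the kernel matrix."""
    rng = np.random.default_rng(seed)
    L = kmeanspp_landmarks(X, m, rng)
    C = rbf(X, L, gamma)                   # n x m cross-kernel
    W = rbf(L, L, gamma)                   # m x m landmark kernel
    return C @ np.linalg.pinv(W) @ C.T

rng = np.random.default_rng(1)
X = rng.standard_normal((500, 5))
K, K_hat = rbf(X, X), nystrom(X)
print(np.linalg.norm(K - K_hat) / np.linalg.norm(K))  # relative error
```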

    Bridging the Gap between Constant Step Size Stochastic Gradient Descent and Markov Chains

    We consider the minimization of an objective function given access to unbiased estimates of its gradient through stochastic gradient descent (SGD) with constant step-size. While the detailed analysis was only performed for quadratic functions, we provide an explicit asymptotic expansion of the moments of the averaged SGD iterates that outlines the dependence on initial conditions, the effect of noise and the step-size, as well as the lack of convergence in the general (non-quadratic) case. For this analysis, we bring tools from Markov chain theory into the analysis of stochastic gradient descent. We then show that Richardson-Romberg extrapolation may be used to get closer to the global optimum, and we show empirical improvements of the new extrapolation scheme.
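    A toy illustration of the extrapolation idea (an assumed objective and parameters, not the paper's experiments): for a non-quadratic objective, the averaged constant step-size SGD iterate carries a bias that grows with the step-size gamma, and the Richardson-Romberg combination 2*avg(gamma) - avg(2*gamma) cancels the first-order term of that bias.

```python
import numpy as np

def averaged_sgd(gamma, iters=500_000, p=0.7, seed=0):
    """Constant step-size SGD on f(x) = E[softplus(x) - xi * x],
    xi ~ Bernoulli(p); unbiased gradient: sigmoid(x) - xi.
    Returns the Polyak average of the iterates (after burn-in)."""
    rng = np.random.default_rng(seed)
    x, avg, burn = 0.0, 0.0, iters // 10
    for t in range(iters):
        g = 1.0 / (1.0 + np.exp(-x)) - (rng.random() < p)
        x -= gamma * g
        if t >= burn:
            avg += (x - avg) / (t - burn + 1)
    return avg

x_star = np.log(0.7 / 0.3)                 # minimizer: sigmoid(x*) = p
a1 = averaged_sgd(0.5)
a2 = averaged_sgd(1.0, seed=1)
rr = 2 * a1 - a2                           # Richardson-Romberg extrapolation
# Errors shrink from 2*gamma to gamma to the extrapolated estimate.
print(abs(a2 - x_star), abs(a1 - x_star), abs(rr - x_star))
```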