8,557 research outputs found

    A Wide Range No-Regret Theorem

    Get PDF
    In a sequential decision problem at any stage a decision maker, based on the history, takes a decision and receives a payoff which depends also on the realized state of nature. A strategy, f, is said to be as good as an alternative strategy g at a sequence of states, if in the long run f does, on average, at least as well as g does. It is shown that for any distribution, P, over the alternative strategies there is a strategy f which is, at any sequence of states, as good as P-almost any alternative g.No-regret, Approachability, large spaces

    Fighting Bandits with a New Kind of Smoothness

    Full text link
    We define a novel family of algorithms for the adversarial multi-armed bandit problem, and provide a simple analysis technique based on convex smoothing. We prove two main results. First, we show that regularization via the \emph{Tsallis entropy}, which includes EXP3 as a special case, achieves the Θ(TN)\Theta(\sqrt{TN}) minimax regret. Second, we show that a wide class of perturbation methods achieve a near-optimal regret as low as O(TNlog⁥N)O(\sqrt{TN \log N}) if the perturbation distribution has a bounded hazard rate. For example, the Gumbel, Weibull, Frechet, Pareto, and Gamma distributions all satisfy this key property.Comment: In Proceedings of NIPS, 201

    Adversarially Robust Optimization with Gaussian Processes

    Get PDF
    In this paper, we consider the problem of Gaussian process (GP) optimization with an added robustness requirement: The returned point may be perturbed by an adversary, and we require the function value to remain as high as possible even after this perturbation. This problem is motivated by settings in which the underlying functions during optimization and implementation stages are different, or when one is interested in finding an entire region of good inputs rather than only a single point. We show that standard GP optimization algorithms do not exhibit the desired robustness properties, and provide a novel confidence-bound based algorithm StableOpt for this purpose. We rigorously establish the required number of samples for StableOpt to find a near-optimal point, and we complement this guarantee with an algorithm-independent lower bound. We experimentally demonstrate several potential applications of interest using real-world data sets, and we show that StableOpt consistently succeeds in finding a stable maximizer where several baseline methods fail.Comment: Corrected typo

    Deterministic Sequencing of Exploration and Exploitation for Multi-Armed Bandit Problems

    Full text link
    In the Multi-Armed Bandit (MAB) problem, there is a given set of arms with unknown reward models. At each time, a player selects one arm to play, aiming to maximize the total expected reward over a horizon of length T. An approach based on a Deterministic Sequencing of Exploration and Exploitation (DSEE) is developed for constructing sequential arm selection policies. It is shown that for all light-tailed reward distributions, DSEE achieves the optimal logarithmic order of the regret, where regret is defined as the total expected reward loss against the ideal case with known reward models. For heavy-tailed reward distributions, DSEE achieves O(T^1/p) regret when the moments of the reward distributions exist up to the pth order for 1<p<=2 and O(T^1/(1+p/2)) for p>2. With the knowledge of an upperbound on a finite moment of the heavy-tailed reward distributions, DSEE offers the optimal logarithmic regret order. The proposed DSEE approach complements existing work on MAB by providing corresponding results for general reward distributions. Furthermore, with a clearly defined tunable parameter-the cardinality of the exploration sequence, the DSEE approach is easily extendable to variations of MAB, including MAB with various objectives, decentralized MAB with multiple players and incomplete reward observations under collisions, MAB with unknown Markov dynamics, and combinatorial MAB with dependent arms that often arise in network optimization problems such as the shortest path, the minimum spanning, and the dominating set problems under unknown random weights.Comment: 22 pages, 2 figure

    Stochastic Subgradient Algorithms for Strongly Convex Optimization over Distributed Networks

    Full text link
    We study diffusion and consensus based optimization of a sum of unknown convex objective functions over distributed networks. The only access to these functions is through stochastic gradient oracles, each of which is only available at a different node, and a limited number of gradient oracle calls is allowed at each node. In this framework, we introduce a convex optimization algorithm based on the stochastic gradient descent (SGD) updates. Particularly, we use a carefully designed time-dependent weighted averaging of the SGD iterates, which yields a convergence rate of O(NNT)O\left(\frac{N\sqrt{N}}{T}\right) after TT gradient updates for each node on a network of NN nodes. We then show that after TT gradient oracle calls, the average SGD iterate achieves a mean square deviation (MSD) of O(NT)O\left(\frac{\sqrt{N}}{T}\right). This rate of convergence is optimal as it matches the performance lower bound up to constant terms. Similar to the SGD algorithm, the computational complexity of the proposed algorithm also scales linearly with the dimensionality of the data. Furthermore, the communication load of the proposed method is the same as the communication load of the SGD algorithm. Thus, the proposed algorithm is highly efficient in terms of complexity and communication load. We illustrate the merits of the algorithm with respect to the state-of-art methods over benchmark real life data sets and widely studied network topologies
    • 

    corecore