
    A simple parameter-free and adaptive approach to optimization under a minimal local smoothness assumption

    We study the problem of optimizing a function under a \emph{budgeted number of evaluations}. We only assume that the function is \emph{locally} smooth around one of its global optima. The difficulty of optimization is measured in terms of 1) the amount of \emph{noise} $b$ in the function evaluations and 2) the local smoothness $d$ of the function. A smaller $d$ results in a smaller optimization error. We introduce a new, simple, and parameter-free approach. First, for all values of $b$ and $d$, this approach recovers at least the state-of-the-art regret guarantees. Second, it obtains these results while being \textit{agnostic} to the values of both $b$ and $d$. This yields the first algorithm that naturally adapts to an \textit{unknown} range of noise $b$, and it leads to significant improvements in moderate- and low-noise regimes. Third, our approach obtains a remarkable improvement over the state-of-the-art SOO algorithm when the noise is very low, including the case of optimization under deterministic feedback ($b=0$). There, under our minimal local smoothness assumption, this improvement is of exponential magnitude and holds for a class of functions that covers the vast majority of functions that practitioners optimize ($d=0$). We show that our algorithmic improvement is borne out in experiments, with empirically faster convergence on common benchmarks.
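    The flavor of such budgeted hierarchical optimizers can be illustrated with a toy sketch. This is a generic greedy partition search, not the paper's parameter-free algorithm (which hedges over many depths and noise levels at once); the function `f`, the budget, and the unit interval are assumptions for illustration.

```python
def greedy_partition_search(f, budget, lo=0.0, hi=1.0):
    """Toy budgeted optimizer on [lo, hi]: probe the midpoint of each
    half of the current cell and descend into the better half.
    Illustrative only; NOT the paper's algorithm."""
    best_x = 0.5 * (lo + hi)
    best_y = f(best_x)                # possibly noisy evaluation
    evals = 1
    while evals + 2 <= budget:
        mid = 0.5 * (lo + hi)
        xl, xr = 0.5 * (lo + mid), 0.5 * (mid + hi)
        yl, yr = f(xl), f(xr)
        evals += 2
        if yl >= yr:                  # keep the better-scoring half
            hi, cand = mid, (xl, yl)
        else:
            lo, cand = mid, (xr, yr)
        if cand[1] > best_y:
            best_x, best_y = cand
    return best_x, best_y

# Deterministic feedback (b = 0): locate the optimum of a smooth bump.
print(greedy_partition_search(lambda x: -(x - 0.3) ** 2, budget=100))
```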

    Deterministic Policy Optimization by Combining Pathwise and Score Function Estimators for Discrete Action Spaces

    Policy optimization methods have shown great promise in solving complex reinforcement and imitation learning tasks. While model-free methods are broadly applicable, they often require many samples to optimize complex policies. Model-based methods greatly improve sample efficiency but at the cost of poor generalization, requiring a carefully handcrafted model of the system dynamics for each task. Recently, hybrid methods have been successful in trading off applicability for improved sample complexity. However, these have been limited to continuous action spaces. In this work, we present a new hybrid method based on an approximation of the dynamics as an expectation over the next state under the current policy. This relaxation allows us to derive a novel hybrid policy gradient estimator, combining score function and pathwise derivative estimators, that is applicable to discrete action spaces. We show significant gains in sample complexity, ranging between $1.7$ and $25\times$, when learning parameterized policies on Cart Pole, Acrobot, Mountain Car, and Hand Mass. Our method is applicable to both discrete and continuous action spaces, whereas competing pathwise methods are limited to the latter.
    Comment: In AAAI 2018 proceedings
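    The relaxation can be made concrete with a minimal sketch: a softmax policy over discrete actions, a hypothetical deterministic dynamics model `T(s, a)`, and a reward `r` on the relaxed next state. The pathwise term is approximated by finite differences here (an autodiff framework would compute it exactly), and the paper's additional score-function term is omitted.

```python
import numpy as np

def softmax(logits):
    z = np.exp(logits - logits.max())
    return z / z.sum()

def relaxed_next_state(theta, s, T, n_actions):
    """Approximate the transition by its expectation over actions:
    s_bar = sum_a pi_theta(a|s) * T(s, a), which is smooth in theta,
    so a pathwise derivative exists despite discrete actions."""
    pi = softmax(theta @ s)           # theta: (n_actions, dim_s)
    return sum(pi[a] * T(s, a) for a in range(n_actions))

def pathwise_gradient(theta, s, T, r, eps=1e-5):
    """Finite-difference stand-in for d r(s_bar) / d theta; autodiff
    would compute this exactly. The paper's hybrid estimator adds a
    score-function term on top of a pathwise term like this one."""
    n_actions = theta.shape[0]
    base = r(relaxed_next_state(theta, s, T, n_actions))
    grad = np.zeros_like(theta)
    for idx in np.ndindex(*theta.shape):
        t = theta.copy()
        t[idx] += eps
        grad[idx] = (r(relaxed_next_state(t, s, T, n_actions)) - base) / eps
    return grad
```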

    Hybridization of multi-objective deterministic particle swarm with derivative-free local searches

    The paper presents a multi-objective, derivative-free, and deterministic global/local hybrid algorithm for the efficient and effective solution of simulation-based design optimization (SBDO) problems. The objective is to show how the hybridization of two multi-objective derivative-free global and local algorithms achieves better performance than the separate use of the two algorithms in solving specific SBDO problems for hull-form design. The proposed method belongs to the class of memetic algorithms, where the global exploration capability of multi-objective deterministic particle swarm optimization is enriched by exploiting the local search accuracy of a derivative-free multi-objective line-search method. To the authors' best knowledge, studies of memetic, multi-objective, deterministic, derivative-free, evolutionary algorithms for the effective and efficient solution of SBDO problems in hull-form design are still limited. The proposed formulation manages global and local searches based on the hypervolume metric. The hybridization scheme uses two parameters to control the activation of the local search and the number of function calls allotted to the local algorithm. The most promising values of these parameters were identified using forty analytical tests representative of the SBDO problem of interest. The resulting hybrid algorithm was finally applied to two SBDO problems for hull-form design. For both the analytical tests and the SBDO problems, the hybrid method achieves better performance than its global and local counterparts.
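    For two objectives, the hypervolume metric that drives the global/local switching can be sketched as follows; the stagnation-based activation rule below is a hypothetical stand-in for the paper's two hybridization parameters, not its exact trigger.

```python
def hypervolume_2d(front, ref):
    """Hypervolume of a two-objective minimization front with respect
    to reference point `ref`: the area the front dominates. Standard
    sweep over points sorted by the first objective."""
    pts = sorted(p for p in front if p[0] <= ref[0] and p[1] <= ref[1])
    hv, prev_f2 = 0.0, ref[1]
    for f1, f2 in pts:
        if f2 < prev_f2:              # non-dominated along the sweep
            hv += (ref[0] - f1) * (prev_f2 - f2)
            prev_f2 = f2
    return hv

def local_search_due(hv_history, patience=5, tol=0.01):
    """Hypothetical activation rule: trigger the derivative-free local
    search once hypervolume growth stalls for `patience` iterations."""
    if len(hv_history) <= patience:
        return False
    return hv_history[-1] < hv_history[-1 - patience] * (1.0 + tol)
```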

    A Stochastic Interpretation of Stochastic Mirror Descent: Risk-Sensitive Optimality

    Stochastic mirror descent (SMD) is a fairly new family of algorithms that has recently found a wide range of applications in optimization, machine learning, and control. It can be considered a generalization of the classical stochastic gradient algorithm (SGD), where instead of updating the weight vector along the negative direction of the stochastic gradient, the update is performed in a "mirror domain" defined by the gradient of a (strictly convex) potential function. This potential function, and the mirror domain it yields, provides considerable flexibility in the algorithm compared to SGD. While many properties of SMD have already been obtained in the literature, in this paper we exhibit a new interpretation of SMD, namely that it is a risk-sensitive optimal estimator when the unknown weight vector and additive noise are non-Gaussian and belong to the exponential family of distributions. The analysis also suggests a modified version of SMD, which we refer to as symmetric SMD (SSMD). The proofs rely on some simple properties of Bregman divergence, which allow us to extend results from quadratics and Gaussians to certain convex functions and exponential families in a rather seamless way.
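    For reference, the SMD update with strictly convex potential $\psi$, step size $\eta$, and instantaneous loss $\ell_t$ takes the gradient step in the mirror domain:

```latex
% Stochastic mirror descent: the gradient step is taken in the mirror
% domain defined by the gradient of the potential \psi.
\nabla\psi(w_{t+1}) = \nabla\psi(w_t) - \eta\,\nabla\ell_t(w_t),
\qquad
w_{t+1} = (\nabla\psi)^{-1}\!\bigl(\nabla\psi(w_t) - \eta\,\nabla\ell_t(w_t)\bigr)
% With \psi(w) = \tfrac{1}{2}\|w\|_2^2, \nabla\psi is the identity and
% the update reduces to SGD: w_{t+1} = w_t - \eta\,\nabla\ell_t(w_t).
```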

    Trajectory-Based Off-Policy Deep Reinforcement Learning

    Policy gradient methods are powerful reinforcement learning algorithms and have been demonstrated to solve many complex tasks. However, these methods are also data-inefficient, afflicted with high-variance gradient estimates, and frequently get stuck in local optima. This work addresses these weaknesses by combining recent improvements in the reuse of off-policy data and exploration in parameter space with deterministic behavioral policies. The resulting objective is amenable to standard neural network optimization strategies like stochastic gradient descent or stochastic gradient Hamiltonian Monte Carlo. Incorporation of previous rollouts via importance sampling greatly improves data efficiency, whilst stochastic optimization schemes facilitate the escape from local optima. We evaluate the proposed approach on a series of continuous control benchmark tasks. The results show that the proposed algorithm is able to successfully and reliably learn solutions using fewer system interactions than standard policy gradient methods.
    Comment: Includes appendix. Accepted for ICML 201
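    The trajectory-level reuse can be sketched as a self-normalized importance-sampling objective over a Gaussian search distribution in parameter space. The Gaussian form, the names, and the per-trajectory weighting are assumptions for illustration; the paper's estimator and exploration scheme differ in detail.

```python
import numpy as np

def gaussian_logpdf(x, mu, sigma):
    """Log-density of a diagonal Gaussian, summed over dimensions."""
    return np.sum(-0.5 * ((x - mu) / sigma) ** 2
                  - np.log(sigma) - 0.5 * np.log(2.0 * np.pi))

def reweighted_objective(mu_new, sigma_new, rollouts):
    """Self-normalized importance-sampling estimate of the expected
    return under a new Gaussian distribution over policy parameters.
    Each rollout came from a deterministic policy whose parameters
    `theta` were drawn from an older distribution (mu_old, sigma_old),
    so a single weight per trajectory suffices."""
    weights, returns = [], []
    for theta, mu_old, sigma_old, ret in rollouts:
        logw = (gaussian_logpdf(theta, mu_new, sigma_new)
                - gaussian_logpdf(theta, mu_old, sigma_old))
        weights.append(np.exp(logw))
        returns.append(ret)
    weights = np.asarray(weights)
    return float(np.dot(weights, returns) / max(weights.sum(), 1e-12))
```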

    VPE: Variational Policy Embedding for Transfer Reinforcement Learning

    Reinforcement learning methods are capable of solving complex problems, but the resulting policies might perform poorly in environments that are even slightly different. In robotics especially, training and deployment conditions often vary, and data collection is expensive, making retraining undesirable. Simulation training allows for feasible training times but suffers from a reality gap when policies are applied in real-world settings. This raises the need for efficient adaptation of policies acting in new environments. We consider this a problem of transferring knowledge within a family of similar Markov decision processes. For this purpose we assume that Q-functions are generated by some low-dimensional latent variable. Given such a Q-function, we can find a master policy that can adapt given different values of this latent variable. Our method learns both the generative mapping and an approximate posterior of the latent variables, enabling identification of policies for new tasks by searching only in the latent space rather than in the space of all policies. The low-dimensional space and the master policy found by our method enable policies to quickly adapt to new environments. We demonstrate the method both on a pendulum swing-up task in simulation and on simulation-to-real transfer for a pushing task.
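    A minimal sketch of the adaptation step, assuming a learned latent-conditioned `master_policy(s, z)` and an `evaluate(policy)` routine that returns an episodic return; a simple cross-entropy-style search stands in for the paper's variational posterior inference.

```python
import numpy as np

def adapt_in_latent_space(evaluate, master_policy, dim_z=3,
                          iters=20, pop=32, elite_frac=0.25):
    """Identify a new task by searching only the low-dimensional latent
    space: sample latent codes z, score the induced policies, and refit
    a Gaussian to the elites (cross-entropy-style search)."""
    rng = np.random.default_rng(0)
    mu, sigma = np.zeros(dim_z), np.ones(dim_z)
    n_elite = max(1, int(pop * elite_frac))
    for _ in range(iters):
        zs = mu + sigma * rng.standard_normal((pop, dim_z))
        scores = np.array([evaluate(lambda s, z=z: master_policy(s, z))
                           for z in zs])
        elites = zs[np.argsort(scores)[-n_elite:]]
        mu = elites.mean(axis=0)
        sigma = elites.std(axis=0) + 1e-3   # keep exploring
    return mu                               # latent code for the new task
```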