A simple parameter-free and adaptive approach to optimization under a minimal local smoothness assumption
We study the problem of optimizing a function under a \emph{budgeted number
of evaluations}. We only assume that the function is \emph{locally} smooth
around one of its global optima. The difficulty of optimization is measured in
terms of 1) the amount of \emph{noise} $b$ of the function evaluation and 2)
the local smoothness, $d$, of the function. A smaller $d$ results in a smaller
optimization error. We propose a new, simple, and parameter-free approach.
First, for all values of $b$ and $d$, this approach recovers at least the
state-of-the-art regret guarantees. Second, our approach additionally obtains
these results while being \textit{agnostic} to the values of both $b$ and $d$.
This leads to the first algorithm that naturally adapts to an \textit{unknown}
range of noise and leads to significant improvements in a moderate and
low-noise regime. Third, our approach also obtains a remarkable improvement
over the state-of-the-art SOO algorithm when the noise is very low, which
includes the case of optimization under deterministic feedback ($b=0$). There,
under our minimal local smoothness assumption, this improvement is of
exponential magnitude and holds for a class of functions that covers the vast
majority of functions that practitioners optimize ($d=0$). We show that our
algorithmic improvement is borne out in experiments as we empirically show
faster convergence on common benchmarks.
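For concreteness (the notation here is ours, not the abstract's): with a budget of $n$ evaluations, the optimization error above is typically measured by the simple regret
$$r_n = \sup_{x \in \mathcal{X}} f(x) - f(x_n),$$
where $x_n$ is the point returned once the budget is exhausted; the guarantees above bound how fast $r_n$ shrinks as a function of the noise $b$ and the smoothness $d$.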
Deterministic Policy Optimization by Combining Pathwise and Score Function Estimators for Discrete Action Spaces
Policy optimization methods have shown great promise in solving complex
reinforcement and imitation learning tasks. While model-free methods are
broadly applicable, they often require many samples to optimize complex
policies. Model-based methods greatly improve sample efficiency, but at the cost
of poor generalization, requiring a carefully handcrafted model of the system
dynamics for each task. Recently, hybrid methods have been successful in
trading off applicability for improved sample complexity. However, these have
been limited to continuous action spaces. In this work, we present a new hybrid
method based on an approximation of the dynamics as an expectation over the
next state under the current policy. This relaxation allows us to derive a
novel hybrid policy gradient estimator, combining score function and pathwise
derivative estimators, that is applicable to discrete action spaces. We show
significant gains in sample complexity when learning parameterized policies on
Cart Pole, Acrobot, Mountain Car, and Hand Mass. Our method is applicable to
both discrete and continuous action spaces, whereas competing pathwise methods
are limited to the latter.
Comment: In AAAI 2018 proceedings
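To make the relaxation concrete, here is a minimal sketch (our illustration, not the paper's code) of the pathwise half of the estimator: the discrete next state is replaced by its expectation under the current policy, so gradients flow through the action probabilities. The toy dynamics and reward functions and the softmax policy are assumptions for illustration.

    import torch

    def expected_next_state(probs, state, dynamics, n_actions):
        # E_{a ~ pi}[f(s, a)]: a convex combination of per-action next states,
        # differentiable in the action probabilities.
        nexts = torch.stack([dynamics(state, a) for a in range(n_actions)])
        return (probs.unsqueeze(-1) * nexts).sum(dim=0)

    # Toy usage: softmax policy over 3 actions on a 1-D state.
    theta = torch.zeros(3, requires_grad=True)   # policy logits
    state = torch.tensor([0.5])
    dynamics = lambda s, a: s + 0.1 * (a - 1)    # deterministic toy model
    reward = lambda s: -(s ** 2).sum()           # drive the state to the origin

    probs = torch.softmax(theta, dim=0)
    loss = -reward(expected_next_state(probs, state, dynamics, 3))
    loss.backward()                              # pathwise gradient w.r.t. theta

In the full method, this pathwise term is combined with a score-function term so that stochastic rewards and longer horizons can be handled as well.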
Hybridization of multi-objective deterministic particle swarm with derivative-free local searches
The paper presents a multi-objective derivative-free and deterministic global/local hybrid algorithm for the efficient and effective solution of simulation-based design optimization (SBDO) problems. The objective is to show how the hybridization of two multi-objective derivative-free global and local algorithms achieves better performance than the separate use of the two algorithms in solving specific SBDO problems for hull-form design. The proposed method belongs to the class of memetic algorithms, where the global exploration capability of multi-objective deterministic particle swarm optimization is enriched by exploiting the local search accuracy of a derivative-free multi-objective line-search method. To the authors' best knowledge, studies of memetic, multi-objective, deterministic, derivative-free, and evolutionary algorithms for an effective and efficient solution of SBDO for hull-form design are still limited. The proposed formulation manages global and local searches based on the hypervolume metric. The hybridization scheme uses two parameters to control the local search activation and the number of function calls used by the local algorithm. The most promising values of these parameters were identified using forty analytical tests representative of the SBDO problem of interest. The resulting hybrid algorithm was finally applied to two SBDO problems for hull-form design. For both the analytical tests and the SBDO problems, the hybrid method achieves better performance than its global and local counterparts.
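As an illustration of the memetic global/local pattern only (a simplified, single-objective sketch under our own assumptions, not the paper's deterministic multi-objective algorithm), a swarm step can be alternated with a derivative-free coordinate poll around the incumbent, with a budget parameter playing the role of the local-search controls mentioned above.

    import numpy as np

    def hybrid_optimize(f, lb, ub, n_particles=20, iters=50, local_budget=5):
        lb, ub = np.asarray(lb, float), np.asarray(ub, float)
        dim = lb.size
        x = np.random.uniform(lb, ub, (n_particles, dim))
        v = np.zeros_like(x)
        pbest, pbest_f = x.copy(), np.array([f(p) for p in x])
        for _ in range(iters):
            g = pbest[pbest_f.argmin()]                  # global (swarm) step
            r1, r2 = np.random.rand(2)
            v = 0.7 * v + 1.5 * r1 * (pbest - x) + 1.5 * r2 * (g - x)
            x = np.clip(x + v, lb, ub)
            fx = np.array([f(p) for p in x])
            better = fx < pbest_f
            pbest[better], pbest_f[better] = x[better], fx[better]
            best_i = pbest_f.argmin()                    # local refinement
            best, step = pbest[best_i].copy(), 0.05 * (ub - lb)
            for _ in range(local_budget):                # derivative-free poll
                for d in range(dim):
                    for s in (+1.0, -1.0):
                        cand = best.copy()
                        cand[d] = np.clip(cand[d] + s * step[d], lb[d], ub[d])
                        if f(cand) < f(best):
                            best = cand
                step *= 0.5
            if f(best) < pbest_f[best_i]:
                pbest[best_i], pbest_f[best_i] = best, f(best)
        return pbest[pbest_f.argmin()]

In the multi-objective setting of the paper, the local-search activation and budget are instead driven by the hypervolume metric over the current non-dominated front.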
A Stochastic Interpretation of Stochastic Mirror Descent: Risk-Sensitive Optimality
Stochastic mirror descent (SMD) is a fairly new family of algorithms that has
recently found a wide range of applications in optimization, machine learning,
and control. It can be considered a generalization of the classical stochastic
gradient descent algorithm (SGD), where instead of updating the weight vector along the
negative direction of the stochastic gradient, the update is performed in a
"mirror domain" defined by the gradient of a (strictly convex) potential
function. This potential function, and the mirror domain it yields, provides
considerable flexibility in the algorithm compared to SGD. While many
properties of SMD have already been obtained in the literature, in this paper
we exhibit a new interpretation of SMD, namely that it is a risk-sensitive
optimal estimator when the unknown weight vector and additive noise are
non-Gaussian and belong to the exponential family of distributions. The
analysis also suggests a modified version of SMD, which we refer to as
symmetric SMD (SSMD). The proofs rely on some simple properties of Bregman
divergence, which allow us to extend results from quadratics and Gaussians to
certain convex functions and exponential families in a rather seamless way.
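As a standard concrete instance (our example, not taken from the paper): with the negative-entropy potential $\psi(w) = \sum_i w_i \log w_i$ on the simplex, the mirror-domain update $\nabla\psi(w_{i+1}) = \nabla\psi(w_i) - \eta\, \nabla L(w_i, x_i)$ becomes a multiplicative, exponentiated-gradient update.

    import numpy as np

    def smd_negative_entropy(grad_fn, w0, lr=0.1, steps=100):
        # Stochastic mirror descent with the negative-entropy potential:
        # the mirror map is (1 + log w), so the additive update in the
        # mirror domain is a multiplicative update on the weights.
        w = np.asarray(w0, float).copy()
        for _ in range(steps):
            g = grad_fn(w)              # stochastic gradient at w
            w = w * np.exp(-lr * g)     # update in the mirror (log) domain
            w /= w.sum()                # re-normalize onto the simplex
        return w

Choosing the quadratic potential $\psi(w) = \tfrac{1}{2}\|w\|^2$ instead recovers plain SGD, which is the sense in which SMD generalizes it.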
Trajectory-Based Off-Policy Deep Reinforcement Learning
Policy gradient methods are powerful reinforcement learning algorithms and
have been demonstrated to solve many complex tasks. However, these methods are
also data-inefficient, afflicted with high variance gradient estimates, and
frequently get stuck in local optima. This work addresses these weaknesses by
combining recent improvements in the reuse of off-policy data and exploration
in parameter space with deterministic behavioral policies. The resulting
objective is amenable to standard neural network optimization strategies like
stochastic gradient descent or stochastic gradient Hamiltonian Monte Carlo.
Incorporation of previous rollouts via importance sampling greatly improves
data-efficiency, whilst stochastic optimization schemes facilitate the escape
from local optima. We evaluate the proposed approach on a series of continuous
control benchmark tasks. The results show that the proposed algorithm is able
to successfully and reliably learn solutions using fewer system interactions
than standard policy gradient methods.
Comment: Includes appendix. Accepted for ICML 2019
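The importance-sampling reuse of rollouts can be sketched as follows (our simplified, self-normalized version; logp_new and logp_old are hypothetical per-step log-density callables for the current and behavioral policies):

    import numpy as np

    def off_policy_return_estimate(trajectories, logp_new, logp_old):
        # Re-weight each stored trajectory's return by its trajectory-level
        # importance ratio prod_t pi_new(a_t|s_t) / pi_old(a_t|s_t),
        # computed in log space for numerical stability.
        logws, rets = [], []
        for traj in trajectories:                    # traj: [(s, a, r), ...]
            logws.append(sum(logp_new(s, a) - logp_old(s, a)
                             for s, a, _ in traj))
            rets.append(sum(r for _, _, r in traj))
        w = np.exp(np.array(logws) - max(logws))     # shift to avoid overflow
        return float((w * np.array(rets)).sum() / w.sum())  # self-normalized

Maximizing such a re-weighted objective in the policy parameters is then a standard neural-network optimization problem, amenable to SGD or stochastic gradient HMC as the abstract notes.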
VPE: Variational Policy Embedding for Transfer Reinforcement Learning
Reinforcement Learning methods are capable of solving complex problems, but
resulting policies might perform poorly in environments that are even slightly
different. In robotics especially, training and deployment conditions often
vary and data collection is expensive, making retraining undesirable.
Simulation training allows for feasible training times, but on the other hand
suffers from a reality gap when applied in real-world settings. This raises the
need for efficient adaptation of policies acting in new environments. We
consider this as a problem of transferring knowledge within a family of similar
Markov decision processes.
For this purpose we assume that Q-functions are generated by some
low-dimensional latent variable. Given such a Q-function, we can find a master
policy that can adapt given different values of this latent variable. Our
method learns both the generative mapping and an approximate posterior of the
latent variables, enabling identification of policies for new tasks by
searching only in the latent space, rather than the space of all policies. The
low-dimensional latent space and the master policy found by our method enable
policies to quickly adapt to new environments. We demonstrate the method on a
pendulum swing-up task in simulation and on simulation-to-real transfer for a
pushing task.
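The adaptation step described above can be sketched as a search over the latent variable alone, with the generative mapping and master policy held fixed. Below is a minimal cross-entropy-method loop (our illustration; evaluate, the episodic return of the master policy conditioned on a latent z in the new environment, is a hypothetical callable):

    import numpy as np

    def adapt_latent(evaluate, z_dim, iters=20, pop=64, n_elite=8):
        # Search the low-dimensional latent space for the z that makes the
        # fixed master policy perform best in the new environment.
        mu, sigma = np.zeros(z_dim), np.ones(z_dim)
        for _ in range(iters):
            zs = mu + sigma * np.random.randn(pop, z_dim)  # candidate latents
            returns = np.array([evaluate(z) for z in zs])
            elite = zs[np.argsort(returns)[-n_elite:]]     # keep the best z's
            mu, sigma = elite.mean(axis=0), elite.std(axis=0) + 1e-3
        return mu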