Closed Form for Some Gaussian Convolutions
The convolution of a function with an isotropic Gaussian appears in many
contexts such as differential equations, computer vision, signal processing,
and numerical optimization. Although this convolution does not always have a
closed-form expression, there are important families of functions for which a
closed form exists. This article investigates some of these cases.
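For a concrete illustration (not taken from the article itself), convolving f(x) = x^2 with a one-dimensional Gaussian kernel already admits a closed form:

```latex
% Gaussian convolution of f(x) = x^2 in 1D; the closed form follows from
% E[(x - sigma*Z)^2] = x^2 + sigma^2 for Z ~ N(0,1).
(f * g_\sigma)(x)
  = \int_{\mathbb{R}} f(x - t)\,\frac{1}{\sqrt{2\pi}\,\sigma}\,e^{-t^{2}/(2\sigma^{2})}\,dt
  = \mathbb{E}\big[(x - \sigma Z)^{2}\big]
  = x^{2} + \sigma^{2},
  \qquad Z \sim \mathcal{N}(0,1).
```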
Linear interpolation gives better gradients than Gaussian smoothing in derivative-free optimization
In this paper, we consider derivative free optimization problems, where the
objective function is smooth but is computed with some amount of noise, the
function evaluations are expensive and no derivative information is available.
We are motivated by policy optimization problems in reinforcement learning that
have recently become popular [Choromanski et al. 2018; Fazel et al. 2018;
Salimans et al. 2016], and that can be formulated as derivative free
optimization problems with the aforementioned characteristics. In each of these
works some approximation of the gradient is constructed and a (stochastic)
gradient method is applied. In [Salimans et al. 2016] the gradient information
is aggregated along Gaussian directions, while in [Choromanski et al. 2018] it
is computed along orthogonal directions. We provide a convergence rate analysis
for a first-order line search method, similar to the ones used in the
literature, and derive the conditions on the gradient approximations that
ensure this convergence. We then demonstrate via rigorous analysis of the
variance and by numerical comparisons on reinforcement learning tasks that the
Gaussian sampling method used in [Salimans et al. 2016] is significantly
inferior to the orthogonal sampling used in [Choromanski et al. 2018] as well as
more general interpolation methods.
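As a rough illustration of the two estimator families compared above (a minimal sketch with hypothetical helper names and defaults, not the authors' implementations), a Monte Carlo Gaussian-smoothing estimator can be contrasted with a finite-difference estimator along orthogonal directions:

```python
import numpy as np

def gaussian_smoothing_grad(f, x, sigma=0.1, n_samples=100, rng=None):
    """Monte Carlo estimate of the Gaussian-smoothed gradient:
    E[(f(x + sigma*u) - f(x)) / sigma * u] with u ~ N(0, I)."""
    rng = rng or np.random.default_rng()
    fx, g = f(x), np.zeros_like(x)
    for _ in range(n_samples):
        u = rng.standard_normal(x.shape)
        g += (f(x + sigma * u) - fx) / sigma * u
    return g / n_samples

def orthogonal_fd_grad(f, x, h=1e-2, rng=None):
    """Finite-difference gradient along a random orthonormal basis
    (one function evaluation per direction, plus f(x))."""
    rng = rng or np.random.default_rng()
    q, _ = np.linalg.qr(rng.standard_normal((x.size, x.size)))  # orthonormal columns
    fx = f(x)
    coeffs = np.array([(f(x + h * q[:, i]) - fx) / h for i in range(x.size)])
    return q @ coeffs

# toy check on a smooth quadratic: both estimators should approximate 2*x
f = lambda z: float(z @ z)
x = np.ones(5)
print(gaussian_smoothing_grad(f, x))
print(orthogonal_fd_grad(f, x))
```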
AdaDGS: An adaptive black-box optimization method with a nonlocal directional Gaussian smoothing gradient
The local gradient points to the direction of the steepest slope in an
infinitesimal neighborhood. An optimizer guided by the local gradient is often
trapped in local optima when the loss landscape is multi-modal. A directional
Gaussian smoothing (DGS) approach was recently proposed in (Zhang et al., 2020)
and used to define a truly nonlocal gradient, referred to as the DGS gradient,
for high-dimensional black-box optimization. Promising results show that
replacing the traditional local gradient with the DGS gradient can
significantly improve the performance of gradient-based methods in optimizing
highly multi-modal loss functions. However, the optimal performance of the DGS
gradient may rely on careful fine-tuning of two important hyper-parameters, i.e., the
smoothing radius and the learning rate. In this paper, we present a simple yet
efficient adaptive approach for optimization with the DGS gradient, which
removes the need for hyper-parameter fine-tuning. Since the DGS
gradient generally points to a good search direction, we perform a line search
along the DGS direction to determine the step size at each iteration. The
learned step size in turn informs us of the scale of the function landscape in
the surrounding area, based on which we adjust the smoothing radius accordingly
for the next iteration. We present experimental results on high-dimensional
benchmark functions, an airfoil design problem and a game content generation
problem. The AdaDGS method shows superior performance over several
state-of-the-art black-box optimization methods.
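A minimal sketch of the adaptive idea described above, assuming a dgs_gradient(f, x, radius) helper that returns the nonlocal DGS gradient (the helper signature, shrink factor, and floor value are assumptions, not the authors' code):

```python
import numpy as np

def adadgs_step(f, x, radius, dgs_gradient, shrink=0.5, max_tries=20):
    """One adaptive step: back-tracking line search along the DGS direction,
    then reuse the accepted step size to rescale the smoothing radius."""
    g = dgs_gradient(f, x, radius)            # nonlocal gradient surrogate (assumed helper)
    d = -g / (np.linalg.norm(g) + 1e-12)      # normalized descent direction
    t, fx = radius, f(x)                      # start the search at the current radius scale
    for _ in range(max_tries):                # shrink until a decrease is found
        if f(x + t * d) < fx:
            break
        t *= shrink
    new_radius = max(t, 1e-6)                 # learned step size informs the next radius
    return x + t * d, new_radius
```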
Derivative-free optimization methods
In many optimization problems arising from scientific, engineering and
artificial intelligence applications, objective and constraint functions are
available only as the output of a black-box or simulation oracle that does not
provide derivative information. Such settings necessitate the use of methods
for derivative-free, or zeroth-order, optimization. We provide a review and
perspectives on developments in these methods, with an emphasis on highlighting
recent developments and on unifying treatment of such problems in the
non-linear optimization and machine learning literature. We categorize methods
based on assumed properties of the black-box functions, as well as features of
the methods. We first overview the primary setting of deterministic methods
applied to unconstrained, non-convex optimization problems where the objective
function is defined by a deterministic black-box oracle. We then discuss
developments in randomized methods, methods that assume some additional
structure about the objective (including convexity, separability and general
non-smooth compositions), methods for problems where the output of the
black-box oracle is stochastic, and methods for handling different types of
constraints.
A Novel Evolution Strategy with Directional Gaussian Smoothing for Blackbox Optimization
We propose an improved evolution strategy (ES) using a novel nonlocal
gradient operator for high-dimensional black-box optimization. Standard ES
methods with $d$-dimensional Gaussian smoothing suffer from the curse of
dimensionality due to the high variance of Monte Carlo (MC) based gradient
estimators. To control the variance, Gaussian smoothing is usually limited in a
small region, so existing ES methods lack nonlocal exploration ability required
for escaping from local minima. We develop a nonlocal gradient operator with
directional Gaussian smoothing (DGS) to address this challenge. The DGS
conducts 1D nonlocal explorations along $d$ orthogonal directions in
$\mathbb{R}^d$, each of which defines a nonlocal directional derivative as a 1D
integral. We then use Gauss-Hermite quadrature, instead of MC sampling, to
estimate the 1D integrals to ensure high accuracy (i.e., small variance).
Our method enables effective nonlocal exploration to facilitate the global
search in high-dimensional optimization. We demonstrate the superior
performance of our method in three sets of examples, including benchmark
functions for global optimization, and real-world science and engineering
applications.
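A hedged sketch of how such a DGS gradient estimate can be assembled with Gauss-Hermite quadrature (using NumPy's hermgauss nodes; the function name, default radius, and number of quadrature points are illustrative assumptions, not the authors' implementation). This could also serve as the dgs_gradient helper assumed in the AdaDGS sketch above:

```python
import numpy as np
from numpy.polynomial.hermite import hermgauss

def dgs_gradient(f, x, radius=1.0, n_quad=7, directions=None, rng=None):
    """Directional-Gaussian-smoothing gradient: a Gauss-Hermite estimate of the
    smoothed 1D directional derivative along each orthonormal direction,
    summed back into a full gradient surrogate."""
    rng = rng or np.random.default_rng()
    d = x.size
    if directions is None:                       # random orthonormal frame
        directions, _ = np.linalg.qr(rng.standard_normal((d, d)))
    nodes, weights = hermgauss(n_quad)           # nodes/weights for weight e^{-t^2}
    grad = np.zeros(d)
    for i in range(d):
        xi = directions[:, i]
        vals = np.array([f(x + np.sqrt(2.0) * radius * t * xi) for t in nodes])
        deriv = np.sum(weights * np.sqrt(2.0) * nodes * vals) / (np.sqrt(np.pi) * radius)
        grad += deriv * xi
    return grad
```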
An adaptive stochastic gradient-free approach for high-dimensional blackbox optimization
In this work, we propose a novel adaptive stochastic gradient-free (ASGF)
approach for solving high-dimensional nonconvex optimization problems based on
function evaluations. We employ a directional Gaussian smoothing of the target
function that generates a surrogate of the gradient and assists in avoiding bad
local optima by utilizing nonlocal information of the loss landscape. Applying
a deterministic quadrature scheme results in a massively scalable technique
that is sample-efficient and achieves spectral accuracy. At each step we
randomly generate the search directions while primarily following the surrogate
of the smoothed gradient. This enables exploitation of the gradient direction
while maintaining sufficient space exploration, and accelerates convergence
towards the global extrema. In addition, we make use of a local approximation
of the Lipschitz constant in order to adaptively adjust the values of all
hyperparameters, thus removing the careful fine-tuning of current algorithms
that is often necessary to be successful when applied to a large class of
learning tasks. As such, the ASGF strategy offers significant improvements when
solving high-dimensional nonconvex optimization problems when compared to other
gradient-free methods (including the so-called "evolutionary strategies") as
well as iterative approaches that rely on the gradient information of the
objective function. We illustrate the improved performance of this method by
providing several comparative numerical studies on benchmark global
optimization problems and reinforcement learning tasks.
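One simple way such a local Lipschitz estimate could be formed from already-evaluated points (a generic heuristic sketch, not the paper's actual formula) is the largest pairwise difference quotient:

```python
import numpy as np

def local_lipschitz(points, values):
    """Crude local Lipschitz estimate over a batch of evaluated points:
    max over pairs of |f(p_i) - f(p_j)| / ||p_i - p_j||."""
    L = 0.0
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            dist = np.linalg.norm(points[i] - points[j])
            if dist > 1e-12:
                L = max(L, abs(values[i] - values[j]) / dist)
    return L

# hyperparameters such as the step size can then be scaled like 1/L, so that
# flatter regions of the landscape permit larger moves.
```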
Derivative-Free Optimization of Noisy Functions via Quasi-Newton Methods
This paper presents a finite difference quasi-Newton method for the
minimization of noisy functions. The method takes advantage of the scalability
and power of BFGS updating, and employs an adaptive procedure for choosing the
differencing interval $h$ based on the noise estimation techniques of Hamming
(2012) and Moré and Wild (2011). This noise estimation procedure and the
selection of $h$ are inexpensive but not always accurate, and to prevent
failures the algorithm incorporates a recovery mechanism that takes appropriate
action in the case when the line search procedure is unable to produce an
acceptable point. A novel convergence analysis is presented that considers the
effect of a noisy line search procedure. Numerical experiments comparing the
method to a function interpolating trust region method are presented.
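For context, a generic noise-aware forward-difference rule (a standard heuristic that balances truncation error against noise error, not necessarily the exact interval selection used in the paper) looks like:

```python
import numpy as np

def forward_diff_grad(f, x, noise_level, curvature_bound=1.0):
    """Forward-difference gradient with a noise-aware differencing interval.
    Balancing truncation error (~ mu*h/2) against noise error (~ 2*eps_f/h)
    gives h = 2*sqrt(eps_f/mu); a generic rule of thumb only."""
    h = 2.0 * np.sqrt(noise_level / curvature_bound)
    fx = f(x)
    g = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = h
        g[i] = (f(x + e) - fx) / h
    return g
```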
Hessian Estimation via Stein's Identity in Black-Box Problems
When the available information is a noisy zeroth-order (ZO) oracle, stochastic
approximation methods are popular for estimating the root of the multivariate
gradient equation. Inspired by Stein's identity, this work establishes a
novel Hessian approximation scheme. We compare it with second-order
simultaneous perturbation stochastic approximation (2SPSA). While both achieve
almost sure convergence at the same rate, 2SPSA requires four ZO
queries per iteration, while ours requires three. Moreover, 2SPSA requires two
statistically independent perturbations and two differencing stepsizes, while
ours requires generating one perturbation vector only and tuning one
differencing stepsize only. Besides, the weighting mechanism for the Hessian
estimate is generalized and the smoothness restriction on the loss function is
relaxed compared to 2SPSA. Finally, we present numerical support for the
reduced per-iteration ZO query complexity.
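The second-order Stein identity underlying such Hessian schemes (stated here in its standard Gaussian-smoothing form; the paper's exact estimator may differ) is:

```latex
% Second-order Stein identity for u ~ N(0, I_d): the smoothed Hessian is an
% expectation of zeroth-order evaluations weighted by (u u^T - I)/sigma^2.
\mathbb{E}_{u \sim \mathcal{N}(0, I_d)}\!\left[ f(x + \sigma u)\,
  \frac{u u^{\top} - I_d}{\sigma^{2}} \right]
  = \nabla^{2} f_{\sigma}(x),
  \qquad
  f_{\sigma}(x) := \mathbb{E}_{u}\!\left[ f(x + \sigma u) \right].
```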
Recent Theoretical Advances in Non-Convex Optimization
Motivated by recent increased interest in optimization algorithms for
non-convex optimization in application to training deep neural networks and
other optimization problems in data analysis, we give an overview of recent
theoretical results on global performance guarantees of optimization algorithms
for non-convex optimization. We start with classical arguments showing that
general non-convex problems could not be solved efficiently in a reasonable
time. Then we give a list of problems that can be solved efficiently to find
the global minimizer by exploiting the structure of the problem as much as it
is possible. Another way to deal with non-convexity is to relax the goal from
finding the global minimum to finding a stationary point or a local minimum.
For this setting, we first present known results for the convergence rates of
deterministic first-order methods, which are then followed by a general
theoretical analysis of optimal stochastic and randomized gradient schemes, and
an overview of the stochastic first-order methods. After that, we discuss quite
general classes of non-convex problems, such as minimization of
$\alpha$-weakly-quasi-convex functions and functions that satisfy the
Polyak-Łojasiewicz condition, which still allow obtaining theoretical
convergence guarantees of first-order methods. Then we consider higher-order
and zeroth-order/derivative-free methods and their convergence rates for
non-convex optimization problems.
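For reference, the Polyak-Łojasiewicz condition mentioned above, with constant $\mu > 0$ and minimum value $f^{*}$, reads:

```latex
% PL condition: gradient dominance with constant mu > 0
\frac{1}{2}\,\lVert \nabla f(x) \rVert^{2} \;\ge\; \mu\,\bigl(f(x) - f^{*}\bigr)
\quad \text{for all } x,
```

under which gradient descent attains a linear convergence rate even without convexity.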
A stochastic subspace approach to gradient-free optimization in high dimensions
We present a stochastic descent algorithm for unconstrained optimization that
is particularly efficient when the objective function is slow to evaluate and
gradients are not easily obtained, as in some PDE-constrained optimization and
machine learning problems. The algorithm maps the gradient onto a
low-dimensional random subspace of dimension $\ell$ at each iteration, similar
to coordinate descent but without restricting directional derivatives to be
along the axes. Without requiring a full gradient, this mapping can be
performed by computing $\ell$ directional derivatives (e.g., via forward-mode
automatic differentiation). We give proofs for convergence in expectation under
various convexity assumptions as well as probabilistic convergence results
under strong convexity. Our method extends the well-known Gaussian smoothing
technique to descent in subspaces of dimension greater than one, opening the
doors to new analysis of Gaussian smoothing when more than one directional
derivative is used at each iteration. We also provide a finite-dimensional
variant of a special case of the Johnson-Lindenstrauss lemma. Experimentally,
we show that our method compares favorably to coordinate descent, Gaussian
smoothing, gradient descent and BFGS (when gradients are calculated via
forward-mode automatic differentiation) on problems from the machine learning
and shape optimization literature.
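A minimal sketch of the subspace idea (the function name, finite-difference stand-in for forward-mode automatic differentiation, and default values are assumptions, not the authors' code):

```python
import numpy as np

def subspace_gradient(f, x, ell=3, h=1e-5, rng=None):
    """Estimate the projection of grad f onto a random ell-dimensional subspace:
    forward differences along an orthonormal basis of the subspace, lifted back
    to the full space."""
    rng = rng or np.random.default_rng()
    q, _ = np.linalg.qr(rng.standard_normal((x.size, ell)))   # reduced QR: d x ell basis
    fx = f(x)
    coeffs = np.array([(f(x + h * q[:, i]) - fx) / h for i in range(ell)])
    return q @ coeffs

# a descent iteration then moves along -subspace_gradient(f, x) with a chosen step size
```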