145 research outputs found
Exploiting higher order smoothness in derivative-free optimization and continuous bandits
We study the problem of zero-order optimization of a strongly convex function. The goal is to find the minimizer of the function by a sequential exploration of its values, under measurement noise. We study the impact of higher order smoothness properties of the function on the optimization error and on the cumulative regret. To solve this problem we consider a randomized approximation of the projected gradient descent algorithm. The gradient is estimated by a randomized procedure involving two function evaluations and a smoothing kernel. We derive upper bounds for this algorithm both in the constrained and unconstrained settings and prove minimax lower bounds for any sequential search method. Our results imply that the zero-order algorithm is nearly optimal in terms of sample complexity and the problem parameters. Based on this algorithm, we also propose an estimator of the minimum value of the function achieving almost sharp oracle behavior. We compare our results with the state-of-the-art, highlighting a number of key improvements.
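The two-point, kernel-smoothed gradient estimator described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's exact construction: the weighting kernel K(r) = 3r (valid for smoothness order two), the step size, and the test objective are all assumptions made for the example.

```python
import numpy as np

def two_point_gradient(f, x, h, rng):
    # Random direction, uniform on the unit sphere.
    zeta = rng.standard_normal(x.size)
    zeta /= np.linalg.norm(zeta)
    r = rng.uniform(-1.0, 1.0)
    # K(r) = 3r: an illustrative weighting kernel for smoothness beta = 2;
    # higher smoothness orders would use higher-degree kernels.
    k = 3.0 * r
    delta = f(x + h * r * zeta) - f(x - h * r * zeta)
    # Two function evaluations yield an (approximately unbiased)
    # gradient estimate of dimension-scaled magnitude.
    return x.size / (2.0 * h) * delta * k * zeta

def zo_projected_gd(f, x0, n_steps, h, eta, proj, seed=0):
    """Zeroth-order projected gradient descent using the estimator above."""
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    for _ in range(n_steps):
        x = proj(x - eta * two_point_gradient(f, x, h, rng))
    return x
```

For a strongly convex quadratic, a small constant step size already drives the iterate close to the minimizer; the theory in the abstract concerns the optimal choice of h and eta under measurement noise.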
Small Errors in Random Zeroth Order Optimization are Imaginary
The vast majority of zeroth order optimization methods try to imitate first
order methods via some smooth approximation of the gradient. Here, the smaller
the smoothing parameter, the smaller the gradient approximation error. We show
that for the majority of zeroth order methods this smoothing parameter cannot,
however, be chosen arbitrarily small, as numerical cancellation errors will
dominate. As such, theoretical and numerical performance can differ
significantly. Using classical tools from numerical differentiation, we
propose a new smoothed approximation of the gradient that can be integrated
into general zeroth order algorithmic frameworks. Since the proposed smoothed
approximation does not suffer from cancellation errors, the smoothing parameter
(and hence the approximation error) can be made arbitrarily small. Sublinear
convergence rates for algorithms based on our smoothed approximation are
proved. Numerical experiments are also presented to demonstrate the superiority
of algorithms based on the proposed approximation.
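The cancellation issue, and the complex-step remedy the title alludes to, can be illustrated with a short sketch. The directional estimator below is an illustrative reconstruction under that idea, not necessarily the paper's exact scheme.

```python
import numpy as np

def central_diff(f, x, h):
    # Classical central difference: subtracting two nearly equal function
    # values causes catastrophic cancellation when h is tiny.
    return (f(x + h) - f(x - h)) / (2.0 * h)

def complex_step(f, x, h):
    # Complex-step derivative: Im f(x + ih) / h involves no subtraction,
    # so h can be made arbitrarily small (requires f analytic).
    return np.imag(f(x + 1j * h)) / h

def complex_step_gradient(f, x, h, rng):
    # Randomized directional variant for zeroth-order settings: a single
    # complex query gives a gradient estimate that is unbiased as h -> 0.
    u = rng.standard_normal(x.size)
    u /= np.linalg.norm(u)
    return x.size / h * np.imag(f(x + 1j * h * u)) * u
```

With h = 1e-20, the central difference of exp at 1 collapses (1 + 1e-20 rounds to 1 in double precision, so the difference is exactly zero), while the complex step still recovers e to machine precision.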
Distributed Zero-Order Optimization under Adversarial Noise
We study the problem of distributed zero-order optimization for a class of strongly convex functions. They are formed by the average of local objectives, associated with different nodes in a prescribed network. We propose a distributed zero-order projected gradient descent algorithm to solve the problem. Exchange of information within the network is permitted only between neighbouring nodes. An important feature of our procedure is that it can query only function values, subject to a general noise model that does not require zero-mean or independent errors. We derive upper bounds for the average cumulative regret and optimization error of the algorithm, which highlight the role played by a network connectivity parameter, the number of variables, the noise level, the strong convexity parameter, and the smoothness properties of the local objectives. The bounds indicate some key improvements of our method over the state-of-the-art, both in the distributed and standard zero-order optimization settings. We also comment on lower bounds and observe that the dependence on certain problem parameters in our bounds is nearly optimal.
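A minimal sketch of such a distributed zeroth-order projected scheme follows. The two-point estimator, the doubly stochastic mixing matrix W encoding the network, and the quadratic local objectives used in the example are all illustrative assumptions, not the paper's exact algorithm or noise model.

```python
import numpy as np

def distributed_zo_gd(local_fs, W, x0, n_steps, h, eta, proj, seed=0):
    """Each node queries only its own objective's values and mixes its
    iterate with its neighbours' through the doubly stochastic matrix W."""
    rng = np.random.default_rng(seed)
    n = len(local_fs)
    X = np.tile(np.asarray(x0, dtype=float), (n, 1))  # one iterate per node
    for _ in range(n_steps):
        G = np.zeros_like(X)
        for i, f in enumerate(local_fs):
            u = rng.standard_normal(X.shape[1])
            u /= np.linalg.norm(u)
            # Two-point estimate of node i's local gradient.
            G[i] = X.shape[1] / (2.0 * h) * (f(X[i] + h * u) - f(X[i] - h * u)) * u
        # Gossip/consensus averaging followed by a local descent step,
        # then projection onto the feasible set.
        X = proj(W @ X - eta * G)
    return X
```

On a three-node path graph with Metropolis weights and local objectives ||x - c_i||^2, all nodes drift toward the global minimizer, the average of the c_i, while W keeps information flowing only between neighbours.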
- …