145 research outputs found

Exploiting higher order smoothness in derivative-free optimization and continuous bandits

Authors: Akhavan A; Pontil M; Tsybakov AB
Publication venue: 'American College of Medical Physics (ACMP)'
Publication date: 06/12/2020

We study the problem of zero-order optimization of a strongly convex function. The goal is to find the minimizer of the function by a sequential exploration of its values, under measurement noise. We study the impact of higher order smoothness properties of the function on the optimization error and on the cumulative regret. To solve this problem we consider a randomized approximation of the projected gradient descent algorithm. The gradient is estimated by a randomized procedure involving two function evaluations and a smoothing kernel. We derive upper bounds for this algorithm both in the constrained and unconstrained settings and prove minimax lower bounds for any sequential search method. Our results imply that the zero-order algorithm is nearly optimal in terms of sample complexity and the problem parameters. Based on this algorithm, we also propose an estimator of the minimum value of the function achieving almost sharp oracle behavior. We compare our results with the state-of-the-art, highlighting a number of key improvements.

UCL Discovery
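
The procedure described in the abstract above (a gradient estimated from two noisy function values and a smoothing kernel, plugged into projected gradient descent) can be sketched roughly as follows. This is a minimal illustration, not the authors' reference implementation. The kernel K(r) = 3r, the step size 1/(alpha*t), the smoothing schedule h_t = t^(-1/4), the ball-shaped constraint set, and the helper names (sample_unit_sphere, project_ball, make_noisy_quadratic) are all assumptions made for this example. With r uniform on [-1, 1] and the direction uniform on the unit sphere, the choice K(r) = 3r makes the estimate unbiased for linear functions.

```python
import math
import random


def sample_unit_sphere(d, rng):
    """Draw a direction uniformly from the unit sphere in R^d."""
    while True:
        g = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(v * v for v in g))
        if norm > 0.0:
            return [v / norm for v in g]


def project_ball(x, radius):
    """Euclidean projection onto the ball of the given radius (assumed constraint set)."""
    norm = math.sqrt(sum(v * v for v in x))
    if norm <= radius:
        return list(x)
    return [v * radius / norm for v in x]


def two_point_gradient(oracle, x, h, rng):
    """Gradient estimate from two noisy function values.

    Assumed kernel: K(r) = 3r, with r drawn uniformly from [-1, 1].
    """
    d = len(x)
    zeta = sample_unit_sphere(d, rng)
    r = rng.uniform(-1.0, 1.0)
    y_plus = oracle([xi + h * r * zi for xi, zi in zip(x, zeta)])
    y_minus = oracle([xi - h * r * zi for xi, zi in zip(x, zeta)])
    scale = d / (2.0 * h) * (y_plus - y_minus) * (3.0 * r)
    return [scale * zi for zi in zeta]


def zero_order_pgd(oracle, x0, steps, alpha, radius, rng):
    """Projected gradient descent driven by the two-point estimate."""
    x = list(x0)
    for t in range(1, steps + 1):
        eta = 1.0 / (alpha * t)  # assumed step size for an alpha-strongly convex objective
        h = t ** -0.25           # assumed shrinking smoothing parameter
        g = two_point_gradient(oracle, x, h, rng)
        x = project_ball([xi - eta * gi for xi, gi in zip(x, g)], radius)
    return x


def make_noisy_quadratic(center, sigma, rng):
    """Hypothetical test objective: ||x - center||^2 observed with Gaussian noise."""
    def oracle(x):
        value = sum((xi - ci) ** 2 for xi, ci in zip(x, center))
        return value + rng.gauss(0.0, sigma)
    return oracle


if __name__ == "__main__":
    rng = random.Random(0)
    oracle = make_noisy_quadratic([0.3, -0.2], sigma=0.1, rng=rng)
    print(zero_order_pgd(oracle, [0.0, 0.0], steps=5000, alpha=2.0, radius=1.0, rng=rng))
```

On this toy quadratic the final iterate should land close to the assumed minimizer (0.3, -0.2). The exact values depend on the random seed.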

Exploiting Higher Order Smoothness in Derivative-free Optimization and Continuous Bandits

Authors: Akhavan Arya; Pontil Massimiliano; Tsybakov Alexandre B.
Publication date: 14/06/2020

arXiv.org e-Print Archive

Small Errors in Random Zeroth Order Optimization are Imaginary

Authors: Jongeneel Wouter; Kuhn Daniel; Yue Man-Chung
Publication date: 11/03/2021

The vast majority of zeroth order optimization methods try to imitate first order methods via some smooth approximation of the gradient. Here, the smaller the smoothing parameter, the smaller the gradient approximation error. We show that for the majority of zeroth order methods this smoothing parameter cannot, however, be chosen arbitrarily small, as numerical cancellation errors will dominate. As such, theoretical and numerical performance could differ significantly. Using classical tools from numerical differentiation, we propose a new smoothed approximation of the gradient that can be integrated into general zeroth order algorithmic frameworks. Since the proposed smoothed approximation does not suffer from cancellation errors, the smoothing parameter (and hence the approximation error) can be made arbitrarily small. Sublinear convergence rates for algorithms based on our smoothed approximation are proved. Numerical experiments are also presented to demonstrate the superiority of algorithms based on the proposed approximation.

arXiv.org e-Print Archive

Infoscience - École polytechnique fédérale de Lausanne
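
The cancellation problem mentioned in the abstract above can be illustrated with a one-dimensional toy comparison. A central finite difference subtracts two nearly equal function values, so it breaks down once the step is smaller than machine precision. The classical complex-step derivative, Im f(x + ih) / h, involves no subtraction. The title hints at complex arithmetic, but whether the paper's smoothed estimator takes exactly this form is an assumption here. The test function and the helper names are hypothetical.

```python
import cmath
import math


def f(z):
    """Hypothetical analytic test function, valid for real and complex arguments."""
    return cmath.exp(z) * cmath.sin(z)


def exact_derivative(x):
    """Closed-form derivative of exp(x) * sin(x) for real x."""
    return math.exp(x) * (math.sin(x) + math.cos(x))


def central_difference(func, x, h):
    """Central finite difference; subtracts two nearly equal values."""
    return (func(x + h) - func(x - h)).real / (2.0 * h)


def complex_step(func, x, h):
    """Complex-step derivative; no subtraction, hence no cancellation."""
    return func(complex(x, h)).imag / h


if __name__ == "__main__":
    x = 1.0
    for h in (1e-4, 1e-8, 1e-12, 1e-20):
        err_fd = abs(central_difference(f, x, h) - exact_derivative(x))
        err_cs = abs(complex_step(f, x, h) - exact_derivative(x))
        print(f"h={h:.0e}  finite-difference error={err_fd:.3e}  complex-step error={err_cs:.3e}")
```

For a step as small as 1e-20 the two finite-difference evaluation points coincide in double precision, so that estimate collapses to zero, while the complex-step estimate stays accurate to rounding error.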

Distributed Zero-Order Optimization under Adversarial Noise

Authors: Akhavan A; Pontil M; Tsybakov AB
Publication date: 01/01/2021

We study the problem of distributed zero-order optimization for a class of strongly convex functions. They are formed by the average of local objectives, associated with different nodes in a prescribed network. We propose a distributed zero-order projected gradient descent algorithm to solve the problem. Exchange of information within the network is permitted only between neighbouring nodes. An important feature of our procedure is that it can query only function values, subject to a general noise model that does not require zero mean or independent errors. We derive upper bounds for the average cumulative regret and optimization error of the algorithm which highlight the role played by a network connectivity parameter, the number of variables, the noise level, the strong convexity parameter, and smoothness properties of the local objectives. The bounds indicate some key improvements of our method over the state-of-the-art, both in the distributed and standard zero-order optimization settings. We also comment on lower bounds and observe that the dependence on certain function parameters in the bound is nearly optimal.

UCL Discovery
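
A rough sketch of the kind of scheme described in the abstract above: each node holds a local objective, averages its iterate with those of its neighbours, and then takes a projected step along a gradient estimate built from function values only. This is an illustration under stated assumptions, not the authors' algorithm. The order of the averaging and gradient steps, the ring network with its mixing weights, the kernel K(r) = 3r, the step-size and smoothing schedules, the fixed non-zero-mean error of +bias and -bias on the two queries, and all helper names are assumptions made for this example.

```python
import math
import random


def unit_vector(d, rng):
    """Draw a direction uniformly from the unit sphere in R^d."""
    while True:
        g = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(v * v for v in g))
        if norm > 0.0:
            return [v / norm for v in g]


def project_ball(x, radius):
    """Euclidean projection onto the ball of the given radius."""
    norm = math.sqrt(sum(v * v for v in x))
    if norm <= radius:
        return list(x)
    return [v * radius / norm for v in x]


def local_gradient(f, x, h, rng, bias):
    """Two-point estimate from values corrupted by a non-zero-mean error."""
    d = len(x)
    zeta = unit_vector(d, rng)
    r = rng.uniform(-1.0, 1.0)
    y_plus = f([xi + h * r * zi for xi, zi in zip(x, zeta)]) + bias
    y_minus = f([xi - h * r * zi for xi, zi in zip(x, zeta)]) - bias
    scale = d / (2.0 * h) * (y_plus - y_minus) * (3.0 * r)
    return [scale * zi for zi in zeta]


def distributed_zero_order(objectives, weights, x0, steps, alpha, radius, rng, bias=0.05):
    """Each node mixes neighbours' iterates, then takes a projected zero-order step."""
    n, d = len(objectives), len(x0)
    xs = [list(x0) for _ in range(n)]
    for t in range(1, steps + 1):
        eta = 1.0 / (alpha * t)  # assumed step size
        h = t ** -0.25           # assumed smoothing parameter
        grads = [local_gradient(objectives[i], xs[i], h, rng, bias) for i in range(n)]
        new_xs = []
        for i in range(n):
            # weights[i][j] is non-zero only for neighbours j of node i (and i itself)
            mixed = [sum(weights[i][j] * xs[j][k] for j in range(n)) for k in range(d)]
            step = [m - eta * g for m, g in zip(mixed, grads[i])]
            new_xs.append(project_ball(step, radius))
        xs = new_xs
    return xs


def quadratic(center):
    """Hypothetical local objective ||x - center||^2."""
    def f(x):
        return sum((xi - ci) ** 2 for xi, ci in zip(x, center))
    return f


if __name__ == "__main__":
    rng = random.Random(1)
    centers = [[0.4, 0.0], [0.0, 0.4], [-0.2, 0.2], [0.2, -0.2]]
    ring = [[0.5, 0.25, 0.0, 0.25], [0.25, 0.5, 0.25, 0.0],
            [0.0, 0.25, 0.5, 0.25], [0.25, 0.0, 0.25, 0.5]]
    objectives = [quadratic(c) for c in centers]
    print(distributed_zero_order(objectives, ring, [0.0, 0.0], 4000, 2.0, 1.0, rng))
```

In this toy setup the average of the local objectives is minimized at the mean of the centers, (0.1, 0.1), and every node's iterate should end up near that point. Because the random direction and the scalar r are drawn independently of the query errors, the fixed error contributes no systematic bias to the estimate, only extra variance.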