Accelerated Descent Along a Random Direction with a Non-Euclidean Prox-Structure
We consider smooth convex optimization problems for which the full gradient is unavailable for numerical solution. In 2011, Yu. E. Nesterov proposed accelerated gradient-free methods for such problems. Since only unconstrained problems were considered, the Euclidean prox-structure was used. However, if it is known in advance that, for example, the solution is sparse, or more precisely that the distances from the starting point to the solution in the 1-norm and in the 2-norm are close, then it is more advantageous to choose not the Euclidean prox-structure associated with the 2-norm but the prox-structure associated with the 1-norm. A full justification of this claim is given in the paper. We propose an accelerated random-direction descent method with a non-Euclidean prox-structure for unconstrained optimization problems (we later plan to extend the approach to an accelerated gradient-free method). Convergence-rate estimates for the method are obtained, and the difficulties of carrying the described approach over to constrained optimization problems are discussed
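The random-direction descent the abstract describes can be sketched with a simple finite-difference estimate of the directional derivative. This is a hedged illustration with plain Euclidean steps; function and parameter names are ours, and the paper's accelerated, non-Euclidean version is more involved:

```python
import numpy as np

def random_direction_step(f, x, h=1e-4, step=0.1, rng=None):
    """One descent step along a random direction.

    The directional derivative of f at x along a random unit vector e
    is estimated by a finite difference, and x moves against it.
    Illustrative sketch only: plain (non-accelerated) Euclidean version.
    """
    rng = rng or np.random.default_rng()
    e = rng.standard_normal(x.size)
    e /= np.linalg.norm(e)              # uniform direction on the unit sphere
    deriv = (f(x + h * e) - f(x)) / h   # approximates <grad f(x), e>
    return x - step * deriv * e

# Minimize f(x) = ||x||^2 starting from x0 = (1, ..., 1).
f = lambda v: float(v @ v)
x = np.ones(5)
rng = np.random.default_rng(0)
for _ in range(500):
    x = random_direction_step(f, x, rng=rng)
```

Only function values of `f` are used, which is the defining feature of the gradient-free setting.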
Gradient-Free Methods for Saddle-Point Problem
In this paper, we generalize the approach of Gasnikov et al. (2017), which
allows (stochastic) convex optimization problems to be solved with an inexact
gradient-free oracle, to the convex-concave saddle-point problem. The proposed
approach performs at least as well as the best existing approaches, and for a
special set-up (simplex-type constraints and closeness of the Lipschitz
constants in the 1- and 2-norms) it reduces the required number of oracle
calls (function evaluations). Our method uses a stochastic approximation of
the gradient via finite differences. In this case, the function must be
defined not only on the optimization set itself but also in a certain
neighbourhood of it. In the second part of the paper, we analyze the case
when such an assumption cannot be made, propose a general approach to
modifying the method to handle this problem, and apply this approach to
particular cases of some classical sets.
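The finite-difference gradient approximation, and why the function must be defined slightly outside the feasible set, can be illustrated with a coordinate-wise estimator. This is a sketch with names of our choosing; the paper's estimator is a stochastic variant:

```python
import numpy as np

def fd_gradient(f, x, tau=1e-6):
    """Coordinate-wise finite-difference gradient estimate.

    Note that f is queried at the shifted points x + tau * e_i, which
    may lie slightly OUTSIDE a feasible set containing x -- this is the
    neighbourhood requirement discussed above. Sketch only; the paper
    uses a stochastic rather than coordinate-wise approximation.
    """
    fx = f(x)
    g = np.empty_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = tau
        g[i] = (f(x + e) - fx) / tau
    return g

# On f(x) = ||x||^2 the estimate is close to the true gradient 2x.
g = fd_gradient(lambda v: float(v @ v), np.array([1.0, 2.0]))
```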
Cooperative Multi-Agent Reinforcement Learning with Partial Observations
In this paper, we propose a distributed zeroth-order policy optimization
method for Multi-Agent Reinforcement Learning (MARL). Existing MARL algorithms
often assume that every agent can observe the states and actions of all the
other agents in the network. This can be impractical in large-scale problems,
where sharing the state and action information with multi-hop neighbors may
incur significant communication overhead. The advantage of the proposed
zeroth-order policy optimization method is that it allows the agents to compute
the local policy gradients needed to update their local policy functions using
local estimates of the global accumulated rewards that depend on partial state
and action information only and can be obtained using consensus. Specifically,
to calculate the local policy gradients, we develop a new distributed
zeroth-order policy gradient estimator that relies on one-point residual
feedback. Compared to existing zeroth-order estimators that also rely on
one-point feedback, it significantly reduces the variance of the policy
gradient estimates, thereby improving the learning performance. We show
that the proposed distributed zeroth-order policy optimization method with
constant stepsize converges to a neighborhood of the global optimal policy that
depends on the number of consensus steps used to calculate the local estimates
of the global accumulated rewards. Moreover, we provide numerical experiments
that demonstrate that our new zeroth-order policy gradient estimator is more
sample-efficient than other existing one-point estimators.
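A single-agent sketch of the one-point residual-feedback idea: each iteration pays for one function query and reuses the previous query's value as a baseline, which is what shrinks the variance relative to plain one-point feedback. All names and step sizes here are illustrative, not the paper's distributed MARL estimator:

```python
import numpy as np

def zo_residual_feedback(f, x0, delta=1e-2, step=1e-3, iters=2000, seed=0):
    """Zeroth-order descent with one-point residual feedback.

    Gradient estimate: g_t = (d / delta) * (y_t - y_{t-1}) * u_t,
    where y_t = f(x_t + delta * u_t) is the ONE query made at step t
    and y_{t-1} is reused from the previous step as a baseline.
    (Plain one-point feedback would use y_t alone, with much higher
    variance.) Single-agent illustration only.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float).copy()
    d = x.size
    y_prev = f(x)              # one extra query to warm-start the baseline
    for _ in range(iters):
        u = rng.standard_normal(d)
        u /= np.linalg.norm(u)           # perturbation on the unit sphere
        y = f(x + delta * u)             # the single query this iteration
        x -= step * (d / delta) * (y - y_prev) * u
        y_prev = y
    return x

# Drives f(x) = ||x||^2 toward its minimum with one query per step.
x = zo_residual_feedback(lambda v: float(v @ v), np.ones(3))
```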
Small Errors in Random Zeroth Order Optimization are Imaginary
The vast majority of zeroth order optimization methods try to imitate first
order methods via some smooth approximation of the gradient. Here, the smaller
the smoothing parameter, the smaller the gradient approximation error. We show
that for the majority of zeroth order methods this smoothing parameter
cannot, however, be chosen arbitrarily small, as numerical cancellation errors
will dominate. As such, theoretical and numerical performance can differ
significantly. Using classical tools from numerical differentiation we will
propose a new smoothed approximation of the gradient that can be integrated
into general zeroth order algorithmic frameworks. Since the proposed smoothed
approximation does not suffer from cancellation errors, the smoothing parameter
(and hence the approximation error) can be made arbitrarily small. Sublinear
convergence rates for algorithms based on our smoothed approximation are
proved. Numerical experiments are also presented to demonstrate the superiority
of algorithms based on the proposed approximation.
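The cancellation problem, and the complex-step remedy the title alludes to, can be seen on a one-line example: the forward difference subtracts two nearly equal numbers, while the classical complex-step derivative Im f(x + ih)/h involves no subtraction at all. This is the textbook illustration, not necessarily the paper's exact construction:

```python
import numpy as np

def forward_diff(f, x, h):
    return (f(x + h) - f(x)) / h        # subtractive cancellation as h -> 0

def complex_step(f, x, h):
    return np.imag(f(x + 1j * h)) / h   # no subtraction, so no cancellation

# Derivative of exp at 0 is exactly 1.
err_fd = abs(forward_diff(np.exp, 0.0, 1e-12) - 1.0)   # dominated by rounding
err_cs = abs(complex_step(np.exp, 0.0, 1e-12) - 1.0)   # machine-precision level
```

With h = 1e-12 the forward difference loses most of its significant digits, while the complex step stays accurate, so its smoothing parameter can indeed be made arbitrarily small.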