Accelerated Descent Along a Random Direction with a Non-Euclidean Prox-Structure
We consider smooth convex optimization problems for which the full gradient is unavailable for numerical solution. In 2011, Yu. E. Nesterov proposed accelerated gradient-free methods for such problems. Since only unconstrained problems were considered, the Euclidean prox-structure was used. However, if it is known in advance that, for example, the solution is sparse, or more precisely that the distances from the starting point to the solution in the 1-norm and in the 2-norm are close, then it is more advantageous to choose not the Euclidean prox-structure associated with the 2-norm but the prox-structure associated with the 1-norm. A full justification of this claim is given in the paper. We propose an accelerated random-direction descent method with a non-Euclidean prox-structure for unconstrained optimization problems (we later plan to extend the approach to an accelerated gradient-free method). Convergence-rate estimates for the method are obtained, and the difficulties of carrying the described approach over to constrained optimization problems are discussed
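The random-direction descent the abstract describes can be sketched with a simple finite-difference estimate of the directional derivative. This is a hedged illustration with plain Euclidean steps; function and parameter names are ours, and the paper's accelerated, non-Euclidean version is more involved:

```python
import numpy as np

def random_direction_step(f, x, h=1e-4, step=0.1, rng=None):
    """One descent step along a random direction.

    The directional derivative of f at x along a random unit vector e
    is estimated by a finite difference, and x moves against it.
    Illustrative sketch only: plain (non-accelerated) Euclidean version.
    """
    rng = rng or np.random.default_rng()
    e = rng.standard_normal(x.size)
    e /= np.linalg.norm(e)              # uniform direction on the unit sphere
    deriv = (f(x + h * e) - f(x)) / h   # approximates <grad f(x), e>
    return x - step * deriv * e

# Minimize f(x) = ||x||^2 starting from x0 = (1, ..., 1).
f = lambda v: float(v @ v)
x = np.ones(5)
rng = np.random.default_rng(0)
for _ in range(500):
    x = random_direction_step(f, x, rng=rng)
```

Only function values of `f` are used, which is the defining feature of the gradient-free setting.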
Gradient-Free Methods for Saddle-Point Problem
In this paper, we generalize the approach of Gasnikov et al. (2017), which
allows (stochastic) convex optimization problems to be solved with an inexact
gradient-free oracle, to the convex-concave saddle-point problem. The proposed
approach performs at least as well as the best existing approaches, and for a
special set-up (simplex-type constraints and closeness of the Lipschitz
constants in the 1- and 2-norms) it reduces the required number of oracle
calls (function evaluations). Our method uses a stochastic approximation of
the gradient via finite differences. In this case, the function must be
defined not only on the optimization set itself but also in a certain
neighbourhood of it. In the second part of the paper, we analyze the case
when such an assumption cannot be made, propose a general approach to
modifying the method to handle this problem, and apply this approach to
particular cases of some classical sets.
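The finite-difference gradient approximation, and why the function must be defined slightly outside the feasible set, can be illustrated with a coordinate-wise estimator. This is a sketch with names of our choosing; the paper's estimator is a stochastic variant:

```python
import numpy as np

def fd_gradient(f, x, tau=1e-6):
    """Coordinate-wise finite-difference gradient estimate.

    Note that f is queried at the shifted points x + tau * e_i, which
    may lie slightly OUTSIDE a feasible set containing x -- this is the
    neighbourhood requirement discussed above. Sketch only; the paper
    uses a stochastic rather than coordinate-wise approximation.
    """
    fx = f(x)
    g = np.empty_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = tau
        g[i] = (f(x + e) - fx) / tau
    return g

# On f(x) = ||x||^2 the estimate is close to the true gradient 2x.
g = fd_gradient(lambda v: float(v @ v), np.array([1.0, 2.0]))
```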
Cooperative Multi-Agent Reinforcement Learning with Partial Observations
In this paper, we propose a distributed zeroth-order policy optimization
method for Multi-Agent Reinforcement Learning (MARL). Existing MARL algorithms
often assume that every agent can observe the states and actions of all the
other agents in the network. This can be impractical in large-scale problems,
where sharing the state and action information with multi-hop neighbors may
incur significant communication overhead. The advantage of the proposed
zeroth-order policy optimization method is that it allows the agents to compute
the local policy gradients needed to update their local policy functions using
local estimates of the global accumulated rewards that depend on partial state
and action information only and can be obtained using consensus. Specifically,
to calculate the local policy gradients, we develop a new distributed
zeroth-order policy gradient estimator that relies on one-point residual
feedback. Compared to existing zeroth-order estimators that also rely on
one-point feedback, it significantly reduces the variance of the policy
gradient estimates, thereby improving the learning performance. We show
that the proposed distributed zeroth-order policy optimization method with
constant stepsize converges to a neighborhood of the global optimal policy that
depends on the number of consensus steps used to calculate the local estimates
of the global accumulated rewards. Moreover, we provide numerical experiments
that demonstrate that our new zeroth-order policy gradient estimator is more
sample-efficient than other existing one-point estimators.
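A single-agent sketch of the one-point residual-feedback idea: each iteration pays for one function query and reuses the previous query's value as a baseline, which is what shrinks the variance relative to plain one-point feedback. All names and step sizes here are illustrative, not the paper's distributed MARL estimator:

```python
import numpy as np

def zo_residual_feedback(f, x0, delta=1e-2, step=1e-3, iters=2000, seed=0):
    """Zeroth-order descent with one-point residual feedback.

    Gradient estimate: g_t = (d / delta) * (y_t - y_{t-1}) * u_t,
    where y_t = f(x_t + delta * u_t) is the ONE query made at step t
    and y_{t-1} is reused from the previous step as a baseline.
    (Plain one-point feedback would use y_t alone, with much higher
    variance.) Single-agent illustration only.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float).copy()
    d = x.size
    y_prev = f(x)              # one extra query to warm-start the baseline
    for _ in range(iters):
        u = rng.standard_normal(d)
        u /= np.linalg.norm(u)           # perturbation on the unit sphere
        y = f(x + delta * u)             # the single query this iteration
        x -= step * (d / delta) * (y - y_prev) * u
        y_prev = y
    return x

# Drives f(x) = ||x||^2 toward its minimum with one query per step.
x = zo_residual_feedback(lambda v: float(v @ v), np.ones(3))
```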
Small Errors in Random Zeroth Order Optimization are Imaginary
The vast majority of zeroth order optimization methods try to imitate first
order methods via some smooth approximation of the gradient. Here, the smaller
the smoothing parameter, the smaller the gradient approximation error. We show
that for the majority of zeroth order methods this smoothing parameter
cannot, however, be chosen arbitrarily small, as numerical cancellation errors
will dominate. As such, theoretical and numerical performance can differ
significantly. Using classical tools from numerical differentiation we will
propose a new smoothed approximation of the gradient that can be integrated
into general zeroth order algorithmic frameworks. Since the proposed smoothed
approximation does not suffer from cancellation errors, the smoothing parameter
(and hence the approximation error) can be made arbitrarily small. Sublinear
convergence rates for algorithms based on our smoothed approximation are
proved. Numerical experiments are also presented to demonstrate the superiority
of algorithms based on the proposed approximation.
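The cancellation problem, and the complex-step remedy the title alludes to, can be seen on a one-line example: the forward difference subtracts two nearly equal numbers, while the classical complex-step derivative Im f(x + ih)/h involves no subtraction at all. This is the textbook illustration, not necessarily the paper's exact construction:

```python
import numpy as np

def forward_diff(f, x, h):
    return (f(x + h) - f(x)) / h        # subtractive cancellation as h -> 0

def complex_step(f, x, h):
    return np.imag(f(x + 1j * h)) / h   # no subtraction, so no cancellation

# Derivative of exp at 0 is exactly 1.
err_fd = abs(forward_diff(np.exp, 0.0, 1e-12) - 1.0)   # dominated by rounding
err_cs = abs(complex_step(np.exp, 0.0, 1e-12) - 1.0)   # machine-precision level
```

With h = 1e-12 the forward difference loses most of its significant digits, while the complex step stays accurate, so its smoothing parameter can indeed be made arbitrarily small.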