Provably Efficient Generalized Lagrangian Policy Optimization for Safe Multi-Agent Reinforcement Learning

Author: Ding Dongsheng
Jovanović Mihailo R.
Wang Zhaoran
Wei Xiaohan
Yang Zhuoran
Publication date: 31/05/2023

We examine online safe multi-agent reinforcement learning using constrained Markov games in which agents compete by maximizing their expected total rewards under a constraint on expected total utilities. Our focus is confined to an episodic two-player zero-sum constrained Markov game with independent transition functions that are unknown to agents, adversarial reward functions, and stochastic utility functions. For such a Markov game, we employ an approach based on the occupancy measure to formulate it as an online constrained saddle-point problem with an explicit constraint. We extend the Lagrange multiplier method in constrained optimization to handle the constraint by creating a generalized Lagrangian with minimax decision primal variables and a dual variable. Next, we develop an upper confidence reinforcement learning algorithm to solve this Lagrangian problem while balancing exploration and exploitation. Our algorithm updates the minimax decision primal variables via online mirror descent and the dual variable via a projected gradient step, and we prove that it enjoys a sublinear rate $O((|X|+|Y|) L \sqrt{T(|A|+|B|)})$ for both regret and constraint violation after playing $T$ episodes of the game. Here, $L$ is the horizon of each episode, and $(|X|,|A|)$ and $(|Y|,|B|)$ are the state/action space sizes of the min-player and the max-player, respectively. To the best of our knowledge, we provide the first provably efficient online safe reinforcement learning algorithm in constrained Markov games.

Comment: 59 pages, a full version of the main paper in the 5th Annual Conference on Learning for Dynamics and Control
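The two update rules named in this abstract (online mirror descent for the primal variables, a projected gradient step for the dual variable) can be illustrated on a much simpler problem. The sketch below is not the paper's algorithm: it assumes a single decision-maker choosing a distribution `p` over a few actions, with known `reward` and `utility` vectors, no Markov game, no opponent, and no confidence bounds. All function names, the step size `eta`, and the cap `lam_max` are hypothetical choices made for illustration.

```python
import math

def omd_entropic_step(p, grad, eta):
    """One entropic mirror descent step on the probability simplex.

    Moves p against `grad` (a descent direction) via multiplicative weights.
    """
    w = [pi * math.exp(-eta * g) for pi, g in zip(p, grad)]
    total = sum(w)
    return [wi / total for wi in w]

def dual_projected_step(lam, violation, eta, lam_max):
    """Projected gradient step on the dual variable, clipped to [0, lam_max]."""
    return min(max(lam + eta * violation, 0.0), lam_max)

def primal_dual_toy(reward, utility, threshold, steps=500, eta=0.1, lam_max=10.0):
    """Toy problem (an assumption): maximize reward.p subject to utility.p >= threshold."""
    n = len(reward)
    p = [1.0 / n] * n
    lam = 0.0
    for _ in range(steps):
        # Lagrangian: reward.p + lam * (utility.p - threshold).
        # The primal player ascends it, so the descent direction is its negative gradient.
        grad = [-(r + lam * u) for r, u in zip(reward, utility)]
        p = omd_entropic_step(p, grad, eta)
        achieved = sum(pi * u for pi, u in zip(p, utility))
        # The dual variable grows while the constraint is violated, shrinks otherwise.
        lam = dual_projected_step(lam, threshold - achieved, eta, lam_max)
    return p, lam
```

Calling `primal_dual_toy([1.0, 0.0], [0.0, 1.0], 0.5)` returns a distribution over two actions together with the final multiplier. In this toy the reward favors the first action while the constraint asks for enough mass on the second, which is the tension the dual variable is meant to mediate.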

arXiv.org e-Print Archive

Dual Averaging Method for Online Graph-structured Sparsity

Author: Bahmani Sohail
Bottou Léon
Chen Feng
Chen Lin
Duchi John
Gao Xiand
Hegde Chinmay
Johnson David S
Kingma Diederik P
Langford John
Qian Jing
Xiao Lin
Zhou Pan
Publication venue: Association for Computing Machinery (ACM)
Publication date: 25/05/2019

Online learning algorithms update models using one sample per iteration, which makes them efficient for processing large-scale datasets and useful for detecting malicious events on the fly for social benefit, such as disease outbreaks and traffic congestion. However, existing algorithms for graph-structured models focus on the offline setting and the least-squares loss and cannot handle the online setting, while methods designed for the online setting cannot be directly applied to complex (usually non-convex) graph-structured sparsity models. To address these limitations, we propose a new algorithm for graph-structured sparsity-constrained problems in the online setting, which we call GraphDA. The key step in GraphDA is to project both the averaged gradient (in the dual space) and the primal variables (in the primal space) onto lower-dimensional subspaces, thus capturing the graph-structured sparsity effectively. Furthermore, the objective functions are assumed only to be convex in general, so that different losses can be handled in online learning settings. To the best of our knowledge, GraphDA is the first online learning algorithm for graph-structure constrained optimization problems. To validate our method, we conduct extensive experiments on both benchmark graph and real-world graph datasets. Our experimental results show that, compared to other baseline methods, GraphDA not only improves classification performance but also captures graph-structured features more effectively, and hence offers stronger interpretability.

Comment: 11 pages, 14 figures
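The idea of projecting the averaged gradient onto a low-dimensional subspace can be sketched as follows. This is not GraphDA itself: the graph-structured projection is replaced here by plain top-k magnitude selection, which ignores the graph entirely, and the logistic loss, the `sqrt(t) / gamma` scaling, and every name in the code are assumptions made for illustration. Because the primal iterate is built only from the selected coordinates, it is already k-sparse, so the separate primal projection described in the abstract is a no-op in this simplified form.

```python
import math

def top_k_support(v, k):
    """Indices of the k largest-magnitude entries (stand-in for a graph projection)."""
    order = sorted(range(len(v)), key=lambda j: abs(v[j]), reverse=True)
    return set(order[:k])

def sparse_dual_averaging(samples, dim, k, gamma=1.0):
    """Dual averaging with a top-k projection of the averaged gradient.

    samples: iterable of (x, y) with x a list of floats and y in {-1, +1}.
    """
    w = [0.0] * dim
    grad_sum = [0.0] * dim
    for t, (x, y) in enumerate(samples, start=1):
        margin = y * sum(wi * xi for wi, xi in zip(w, x))
        # Gradient of log(1 + exp(-margin)) w.r.t. w is coef * x;
        # the margin is capped only to keep exp() from overflowing.
        coef = -y / (1.0 + math.exp(min(margin, 50.0)))
        grad_sum = [gs + coef * xi for gs, xi in zip(grad_sum, x)]
        grad_avg = [gs / t for gs in grad_sum]
        # Dual-space projection: keep only the k strongest coordinates.
        support = top_k_support(grad_avg, k)
        scale = math.sqrt(t) / gamma
        w = [-scale * grad_avg[j] if j in support else 0.0 for j in range(dim)]
    return w
```

Each sample is visited once, so the loop matches the one-sample-per-iteration setting the abstract describes. A faithful version would swap `top_k_support` for a projection that respects connectivity in the underlying graph.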

arXiv.org e-Print Archive

Crossref

Scipedia

Recursive Aggregation of Estimators by Mirror Descent Algorithm with Averaging

Author: Juditsky Anatoli
Nazin Alexander
Tsybakov Alexandre
Vayatis Nicolas
Publication date: 07/03/2006

We consider a recursive algorithm to construct an aggregated estimator from a finite number of base decision rules in the classification problem. The estimator approximately minimizes a convex risk functional under the $\ell_1$-constraint. It is defined by a stochastic version of the mirror descent algorithm (i.e., of the method which performs gradient descent in the dual space) with an additional averaging. The main result of the paper is an upper bound for the expected accuracy of the proposed estimator. This bound is of the order $\sqrt{(\log M)/t}$ with an explicit and small constant factor, where $M$ is the dimension of the problem and $t$ stands for the sample size. A similar bound is proved for a more general setting that covers, in particular, the regression model with squared loss.

Comment: 29 pages; mai 200
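A minimal sketch of the scheme this abstract describes (accumulate stochastic gradients in the dual space, map back to the weight simplex, and average the iterates) might look like the following. It is an illustration under stated assumptions rather than the paper's exact procedure: the entropic (softmax) mirror map, the temperature schedule `beta0 * sqrt(i + 1)`, and all function names are hypothetical choices, and the constants behind the stated bound are not reproduced.

```python
import math

def softmax_neg(zeta, beta):
    """Map a dual vector to the simplex with weights proportional to exp(-zeta_j / beta)."""
    lo = min(zeta)  # shift so every exponent is <= 0 (numerical stability)
    w = [math.exp(-(z - lo) / beta) for z in zeta]
    total = sum(w)
    return [wi / total for wi in w]

def mirror_descent_averaging(samples, base_rules, loss_deriv, beta0=1.0):
    """Aggregate base decision rules by mirror descent with averaging.

    samples: iterable of (x, y) with y in {-1, +1}.
    base_rules: list of functions h_j(x) returning a real-valued score.
    loss_deriv: derivative of a convex margin loss phi, evaluated at y * f(x).
    Returns the running average of the simplex iterates.
    """
    m = len(base_rules)
    zeta = [0.0] * m           # accumulated stochastic gradients (dual variable)
    theta = [1.0 / m] * m      # current weights on the simplex
    avg = [0.0] * m
    for i, (x, y) in enumerate(samples, start=1):
        # Running mean of the iterates theta_0, ..., theta_{i-1}.
        avg = [a + (th - a) / i for a, th in zip(avg, theta)]
        preds = [h(x) for h in base_rules]
        f = sum(th * p for th, p in zip(theta, preds))
        g = loss_deriv(y * f)
        # Stochastic gradient of phi(y * f) w.r.t. theta_j is g * y * h_j(x).
        zeta = [z + g * y * p for z, p in zip(zeta, preds)]
        theta = softmax_neg(zeta, beta0 * math.sqrt(i + 1))
    return avg
```

With the hinge loss, `loss_deriv` would return `-1.0` when the margin is below 1 and `0.0` otherwise. The returned vector is a convex combination of simplex points, so it remains a valid set of aggregation weights over the `M` base rules.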

arXiv.org e-Print Archive

Hal - Université Grenoble Alpes

Hal-Diderot