Provably Efficient Generalized Lagrangian Policy Optimization for Safe Multi-Agent Reinforcement Learning
We examine online safe multi-agent reinforcement learning using constrained
Markov games in which agents compete by maximizing their expected total rewards
under a constraint on expected total utilities. Our focus is confined to an
episodic two-player zero-sum constrained Markov game with independent
transition functions that are unknown to agents, adversarial reward functions,
and stochastic utility functions. For such a Markov game, we employ an approach
based on the occupancy measure to formulate it as an online constrained
saddle-point problem with an explicit constraint. We extend the Lagrange
multiplier method in constrained optimization to handle the constraint by
creating a generalized Lagrangian with minimax decision primal variables and a
dual variable. Next, we develop an upper confidence reinforcement learning
algorithm to solve this Lagrangian problem while balancing exploration and
exploitation. Our algorithm updates the minimax decision primal variables via
online mirror descent and the dual variable via projected gradient step and we
prove that it achieves a sublinear rate for both regret and constraint
violation as the number of played episodes grows; the rate also depends on the
horizon of each episode and on the state/action space sizes of the min-player
and the max-player. To the best of our knowledge, we provide the first provably
efficient online safe reinforcement learning algorithm in constrained Markov
games.

Comment: 59 pages; a full version of the main paper in the 5th Annual
Conference on Learning for Dynamics and Control
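The primal-dual scheme described above can be illustrated with a minimal sketch: an entropic (exponentiated-gradient) mirror descent step for the two players' simplex-valued primal variables, and a projected gradient step for the nonnegative dual variable. All names, sign conventions, and the linear Lagrangian form below are simplifying assumptions for illustration, not the paper's actual occupancy-measure formulation.

```python
import numpy as np

def mirror_descent_step(x, grad, eta):
    # Entropic mirror descent (exponentiated gradient) step on the simplex.
    w = x * np.exp(-eta * grad)
    return w / w.sum()

def primal_dual_update(x, y, lam, r, u, c, eta, alpha, lam_max):
    """One illustrative round of generalized-Lagrangian updates (sketch only).

    x, y     : min-/max-player simplex variables (stand-ins for occupancy measures)
    lam      : nonnegative Lagrange multiplier (dual variable)
    r, u     : observed reward and utility vectors for this episode (hypothetical)
    c        : threshold on the expected total utility
    lam_max  : projection bound for the dual variable
    """
    # Sketch Lagrangian: L(x, y, lam) = <x, r> + lam * (<x, u> - c).
    gx = -(r + lam * u)          # min-player descends on -L w.r.t. x
    gy = r                       # max-player's (sign-flipped) gradient term
    x_new = mirror_descent_step(x, gx, eta)
    y_new = mirror_descent_step(y, gy, eta)
    # Projected gradient step for the dual variable, clipped to [0, lam_max].
    lam_new = float(np.clip(lam - alpha * (x @ u - c), 0.0, lam_max))
    return x_new, y_new, lam_new
```

Balancing exploration and exploitation via upper confidence bonuses is omitted here; the sketch only shows how the two primal updates and the dual update interleave.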
Dual Averaging Method for Online Graph-structured Sparsity
Online learning algorithms update models using one sample per iteration; they
are thus efficient at processing large-scale datasets and useful for detecting
malicious events of social concern, such as disease outbreaks and traffic
congestion, on the fly. However, existing algorithms for graph-structured
models focus on the offline setting and the least-squares loss, making them
unsuitable for the online setting, while methods designed for the online
setting cannot be directly applied to the complex (usually non-convex)
graph-structured sparsity model. To address
these limitations, in this paper we propose a new algorithm for
graph-structured sparsity constraint problems under online setting, which we
call \textsc{GraphDA}. The key part in \textsc{GraphDA} is to project both
averaging gradient (in dual space) and primal variables (in primal space) onto
lower dimensional subspaces, thus capturing the graph-structured sparsity
effectively. Furthermore, the objective functions are only assumed to be
convex, allowing the method to handle a variety of losses in online learning
settings. To the
best of our knowledge, \textsc{GraphDA} is the first online learning algorithm
for graph-structure constrained optimization problems. To validate our method,
we conduct extensive experiments on both benchmark graph and real-world graph
datasets. Our experimental results show that, compared to baseline methods,
\textsc{GraphDA} not only improves classification performance, but also
captures graph-structured features more effectively, and hence offers stronger
interpretability.

Comment: 11 pages, 14 figures
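The key idea above, projecting both the averaged gradient (dual space) and the primal variable (primal space) onto low-dimensional subspaces, can be sketched as follows. The actual \textsc{GraphDA} algorithm uses graph-structured projections; the `project_support` function here is a hypothetical stand-in that simply keeps the k largest-magnitude coordinates, and all parameter names are illustrative.

```python
import numpy as np

def project_support(v, k):
    # Hypothetical stand-in for a graph-structured projection:
    # keep the k largest-magnitude coordinates, zero out the rest.
    out = np.zeros_like(v)
    idx = np.argsort(np.abs(v))[-k:]
    out[idx] = v[idx]
    return out

def dual_averaging_step(grad_sum, grad, t, gamma, k):
    """One dual-averaging iteration with sparsity projections (sketch).

    grad_sum : running sum of observed gradients (dual space)
    grad     : gradient computed from the current sample
    t        : iteration counter (1-based)
    gamma    : regularization scale for the dual-averaging step
    """
    grad_sum = grad_sum + grad
    # Project the averaged gradient in the dual space ...
    g_proj = project_support(grad_sum / t, k)
    # ... take the regularized dual-averaging primal step ...
    x = -(np.sqrt(t) / gamma) * g_proj
    # ... and project the primal variable as well.
    x = project_support(x, k)
    return grad_sum, x
```

Because each iteration touches one sample and the projections enforce a small support, every primal iterate is itself sparse, which is the source of the interpretability claim.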
Recursive Aggregation of Estimators by Mirror Descent Algorithm with Averaging
We consider a recursive algorithm to construct an aggregated estimator from a
finite number of base decision rules in the classification problem. The
estimator approximately minimizes a convex risk functional under the
l1-constraint. It is defined by a stochastic version of the mirror descent
algorithm (i.e., of the method which performs gradient descent in the dual
space) with an additional averaging. The main result of the paper is an upper
bound for the expected accuracy of the proposed estimator. The bound has an
explicit and small constant factor and depends on the dimension of the problem
and on the sample size. A similar bound is proved for a more general setting
that covers, in particular, the regression model with squared loss.

Comment: 29 pages
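The procedure above, stochastic mirror descent over the simplex of aggregation weights with averaging of the iterates, can be sketched in a few lines. The entropic mirror map yields exponentiated-gradient updates; the squared loss used in the usage note and all variable names are illustrative assumptions, not the paper's exact setting.

```python
import numpy as np

def aggregate_by_mirror_descent(F, y, eta, loss_grad):
    """Stochastic mirror descent with averaging over the simplex (sketch).

    F         : (t, M) array, F[i, j] = prediction of base rule j on sample i
    y         : length-t array of targets, processed one sample at a time
    eta       : step size
    loss_grad : derivative of the convex loss w.r.t. the aggregated prediction
    """
    t, M = F.shape
    theta = np.full(M, 1.0 / M)     # uniform initial weights (l1-constrained)
    avg = np.zeros(M)
    for i in range(t):
        pred = F[i] @ theta
        g = loss_grad(pred, y[i]) * F[i]   # stochastic gradient w.r.t. theta
        w = theta * np.exp(-eta * g)       # entropic mirror descent step
        theta = w / w.sum()                # ... which stays on the simplex
        avg += theta
    return avg / t                         # averaged iterate (the aggregate)
```

For example, with the squared loss one would pass `loss_grad=lambda p, yi: p - yi`; the returned weight vector is a convex combination of the base decision rules, i.e., an aggregate satisfying the l1-constraint.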