
    Provably Efficient Model-Free Algorithm for MDPs with Peak Constraints

    In the optimization of dynamic systems, the variables typically have constraints. Such problems can be modeled as a Constrained Markov Decision Process (CMDP). This paper considers peak constraints, where the agent chooses a policy that maximizes the long-term average reward while satisfying the constraints at every time step. We propose a model-free algorithm that converts the CMDP into an unconstrained problem and solves it with a Q-learning based approach. We extend the concept of probably approximately correct (PAC) learning to define a criterion of an $\epsilon$-optimal policy. The proposed algorithm is proved to achieve an $\epsilon$-optimal policy with probability at least $1-p$ when the number of episodes $K \geq \Omega\left(\frac{I^2 H^6 S A \ell}{\epsilon^2}\right)$, where $S$ and $A$ are the numbers of states and actions, respectively, $H$ is the number of steps per episode, $I$ is the number of constraint functions, and $\ell = \log\left(\frac{SAT}{p}\right)$. We note that this is the first PAC-style analysis for CMDPs with peak constraints where the transition probabilities are not known a priori. We demonstrate the proposed algorithm on an energy harvesting problem, where it outperforms the state of the art and performs close to the theoretical upper bound of the studied optimization problem.
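
    The abstract leaves the conversion implicit, but one simple way to realize "converts the CMDP into an unconstrained problem" for peak constraints is to fold any violation into the reward as a dominating penalty and then run ordinary episodic Q-learning. A minimal sketch of that reading, where the toy MDP, penalty size, and learning rate are illustrative assumptions rather than the paper's actual construction:

```python
import numpy as np

# Toy episodic CMDP with peak constraints: I constraint functions g_i(s, a)
# that must stay <= 0 at every step. All sizes and constants are assumed.
rng = np.random.default_rng(0)
S, A, H, I = 5, 3, 8, 2
P = rng.dirichlet(np.ones(S), size=(S, A))   # P[s, a] is a distribution over next states
R = rng.uniform(0, 1, size=(S, A))           # reward r(s, a) in [0, 1]
G = rng.uniform(-1, 1, size=(I, S, A))       # peak-constraint values g_i(s, a)
PENALTY = -2.0 * H                           # dominates any feasible return of at most H

def converted_reward(s, a):
    """Unconstrained surrogate reward: the original reward, or a large
    penalty whenever any peak constraint is violated at (s, a)."""
    return PENALTY if np.any(G[:, s, a] > 0) else R[s, a]

Q = np.zeros((H, S, A))
for k in range(2000):                        # episodes
    s, eps = 0, max(0.05, 1.0 - k / 1500)    # decaying epsilon-greedy exploration
    for h in range(H):
        a = rng.integers(A) if rng.random() < eps else int(np.argmax(Q[h, s]))
        s_next = rng.choice(S, p=P[s, a])
        future = np.max(Q[h + 1, s_next]) if h + 1 < H else 0.0
        Q[h, s, a] += 0.1 * (converted_reward(s, a) + future - Q[h, s, a])
        s = s_next
```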

    Sample-based Search Methods for Bayes-Adaptive Planning

    A fundamental issue for control is acting in the face of uncertainty about the environment. Amongst other things, this induces a trade-off between exploration and exploitation. A model-based Bayesian agent optimizes its return by maintaining a posterior distribution over possible environments and considering all possible future paths. This optimization is equivalent to solving a Markov Decision Process (MDP) whose hyperstate comprises the agent's beliefs about the environment as well as its current state in that environment. The corresponding process is called a Bayes-Adaptive MDP (BAMDP). Even for MDPs with only a few states, it is generally intractable to solve the corresponding BAMDP exactly. Various heuristics have been devised, but those that are computationally tractable often perform poorly, whereas those that perform well are typically so expensive as to be applicable only to small domains with limited structure. Here, we develop new tractable methods for planning in BAMDPs based on recent advances in the solution of large MDPs and general partially observable MDPs. Our algorithms are sample-based, plan online in a way that is focused on the current belief, and, critically, avoid expensive belief updates during simulations. In discrete domains, we use Monte-Carlo tree search to search forward in an aggressive manner. The derived algorithm can scale to large MDPs and provably converges to the Bayes-optimal solution asymptotically. We then consider a more general class of simulation-based methods in which approximation methods can be employed to allow value function estimates to generalize between hyperstates during search. This allows us to tackle continuous domains. We validate our approach empirically in standard domains by comparison with existing approximations. Finally, we explore Bayes-adaptive planning in environments that are modelled by rich, non-parametric probabilistic models. We demonstrate that a fully Bayesian agent can be advantageous in the exploration of complex and even infinite, structured domains.
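
    The key computational trick described here, avoiding belief updates during simulations, can be illustrated with root sampling: each simulation draws one complete MDP from the posterior and then searches inside it with Monte-Carlo tree search. A minimal sketch in that spirit, where the Dirichlet posterior, toy reward, and UCB constant are assumptions for illustration, not the authors' code:

```python
import numpy as np

rng = np.random.default_rng(1)
S, A, H = 4, 2, 10
counts = np.ones((S, A, S))                  # Dirichlet posterior over unknown transitions

def simulate(Q, N, P_sampled, s, h):
    """One rollout inside a single MDP drawn at the root; crucially, no
    belief updates are performed during the simulation itself."""
    if h == H:
        return 0.0
    ucb = np.sqrt(np.log(N[h, s].sum() + 1.0) / (N[h, s] + 1.0))   # exploration bonus
    a = int(np.argmax(Q[h, s] + ucb))
    s_next = rng.choice(S, p=P_sampled[s, a])
    ret = float(s_next == S - 1) + simulate(Q, N, P_sampled, s_next, h + 1)  # toy reward
    N[h, s, a] += 1.0
    Q[h, s, a] += (ret - Q[h, s, a]) / N[h, s, a]                  # running mean of returns
    return ret

def plan(s0, n_sims=500):
    Q, N = np.zeros((H, S, A)), np.zeros((H, S, A))
    for _ in range(n_sims):
        # Root sampling: draw one complete transition model from the posterior
        # and keep it fixed for the whole simulated trajectory.
        P_sampled = np.array([[rng.dirichlet(counts[s, a]) for a in range(A)]
                              for s in range(S)])
        simulate(Q, N, P_sampled, s0, 0)
    return int(np.argmax(Q[0, s0]))          # best root action under the search values

print(plan(0))
```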

    A Tutorial on Bayesian Optimization of Expensive Cost Functions, with Application to Active User Modeling and Hierarchical Reinforcement Learning

    We present a tutorial on Bayesian optimization, a method for finding the maximum of expensive cost functions. Bayesian optimization employs the Bayesian technique of setting a prior over the objective function and combining it with evidence to obtain a posterior function. This permits a utility-based selection of the next observation to make on the objective function, which must take into account both exploration (sampling from areas of high uncertainty) and exploitation (sampling areas likely to offer improvement over the current best observation). We also present two detailed extensions of Bayesian optimization, with experiments in active user modelling with preferences and in hierarchical reinforcement learning, and a discussion of the pros and cons of Bayesian optimization based on our experiences.
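
    The loop the tutorial describes, fitting a posterior and scoring candidates with a utility that balances exploration against exploitation, is short enough to sketch. Below is a minimal example using a Gaussian process with expected improvement; the toy objective, kernel choice, and grid-based inner maximization are illustrative assumptions:

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def objective(x):                            # stand-in for the expensive black box
    return -np.sin(3 * x) - x**2 + 0.7 * x

def expected_improvement(X_cand, gp, y_best, xi=0.01):
    # EI trades off exploitation (high posterior mean) against exploration (high std).
    mu, sigma = gp.predict(X_cand, return_std=True)
    sigma = np.maximum(sigma, 1e-12)         # guard against zero predictive std
    z = (mu - y_best - xi) / sigma
    return (mu - y_best - xi) * norm.cdf(z) + sigma * norm.pdf(z)

X = np.array([[-0.9], [1.1]])                # two initial observations
y = objective(X).ravel()
grid = np.linspace(-2, 2, 400).reshape(-1, 1)
for _ in range(15):                          # fit posterior, maximize EI, evaluate, repeat
    gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True).fit(X, y)
    x_next = grid[np.argmax(expected_improvement(grid, gp, y.max()))]
    X = np.vstack([X, [x_next]])
    y = np.append(y, objective(x_next))
print(X[np.argmax(y), 0], y.max())           # best input found and its value
```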

    Provably Learning Nash Policies in Constrained Markov Potential Games

    Multi-agent reinforcement learning (MARL) addresses sequential decision-making problems with multiple agents, where each agent optimizes its own objective. In many real-world instances, the agents may not only want to optimize their objectives but also ensure safe behavior. For example, in traffic routing, each car (agent) aims to reach its destination quickly (objective) while avoiding collisions (safety). Constrained Markov Games (CMGs) are a natural formalism for safe MARL problems, though generally intractable. In this work, we introduce and study Constrained Markov Potential Games (CMPGs), an important class of CMGs. We first show that a Nash policy for CMPGs can be found via constrained optimization. One tempting approach is to solve it by Lagrangian-based primal-dual methods. However, as we show, in contrast to the single-agent setting, CMPGs do not satisfy strong duality, rendering such approaches inapplicable and potentially unsafe. To solve the CMPG problem, we propose our algorithm Coordinate-Ascent for CMPGs (CA-CMPG), which provably converges to a Nash policy in tabular, finite-horizon CMPGs. Furthermore, we provide the first sample complexity bounds for learning Nash policies in unknown CMPGs, which, under additional assumptions, guarantee safe exploration.
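
    As a stripped-down intuition for why coordinate ascent suits potential structure (and not as the authors' CA-CMPG algorithm, which operates on tabular, finite-horizon CMPGs), consider a stateless toy: agents take turns best-responding under their own constraints, and every accepted switch strictly increases the shared potential, so the process terminates at a point where no agent can feasibly improve. All quantities below are illustrative assumptions:

```python
import numpy as np

# Stateless toy "constrained potential game": a shared potential phi over joint
# actions, plus per-agent constraint costs that must stay within a budget.
rng = np.random.default_rng(2)
n_agents, n_actions = 3, 4
phi = rng.uniform(0, 1, size=(n_actions,) * n_agents)            # common potential
cost = rng.uniform(0, 1, size=(n_agents,) + (n_actions,) * n_agents)
budget = 0.7

joint = [0] * n_agents
for _ in range(50):                          # coordinate ascent: one agent moves at a time
    changed = False
    for i in range(n_agents):
        best_a, best_val = joint[i], -np.inf
        for a in range(n_actions):           # feasible best response of agent i
            trial = tuple(joint[:i] + [a] + joint[i + 1:])
            if cost[(i,) + trial] <= budget and phi[trial] > best_val:
                best_a, best_val = a, phi[trial]
        changed |= best_a != joint[i]
        joint[i] = best_a
    if not changed:                          # no agent can feasibly improve the potential:
        break                                # a constrained equilibrium of the toy game
print(joint, phi[tuple(joint)])
```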

    Safe Model-Based Multi-Agent Mean-Field Reinforcement Learning

    Many applications, e.g., in shared mobility, require coordinating a large number of agents. Mean-field reinforcement learning addresses the resulting scalability challenge by optimizing the policy of a representative agent. In this paper, we address an important generalization where there exist global constraints on the distribution of agents (e.g., requiring capacity constraints or minimum coverage requirements to be met). We propose Safe-$\text{M}^3$-UCRL, the first model-based algorithm that attains safe policies even in the case of unknown transition dynamics. As a key ingredient, it uses epistemic uncertainty in the transition model within a log-barrier approach to ensure pessimistic constraint satisfaction with high probability. We showcase Safe-$\text{M}^3$-UCRL on the vehicle repositioning problem faced by many shared mobility operators and evaluate its performance through simulations built on Shenzhen taxi trajectory data. Our algorithm effectively meets the demand in critical areas while ensuring service accessibility in regions with low demand.
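
    The log-barrier ingredient can be sketched in one dimension: tighten the constraint by an epistemic-uncertainty margin, then place a log-barrier on the pessimistic slack so the optimizer cannot approach violation even under model error. The toy reward, constraint model, and constants below are assumptions for illustration, not the Safe-$\text{M}^3$-UCRL mean-field setup:

```python
import numpy as np

# 1-D sketch: maximize reward(theta) subject to c(theta) <= d, where c is only
# known through a learned model with epistemic std sigma(theta). Every function
# and constant here is an illustrative assumption.
def reward(theta):  return -(theta - 2.0) ** 2       # unconstrained optimum at theta = 2
def c_mean(theta):  return theta                     # model's mean constraint value
def c_std(theta):   return 0.1 * (1.0 + abs(theta))  # epistemic uncertainty of the model
d, beta, eta = 1.5, 2.0, 0.1                         # threshold, pessimism, barrier weight

def barrier_objective(theta):
    # Pessimistic constraint value: mean plus beta standard deviations, so the
    # barrier keeps even a plausible worst-case model strictly feasible.
    slack = d - (c_mean(theta) + beta * c_std(theta))
    return reward(theta) + eta * np.log(slack) if slack > 0 else -np.inf

# Crude grid search stands in for the paper's policy optimization step.
grid = np.linspace(-1.0, 3.0, 2000)
theta_star = grid[np.argmax([barrier_objective(t) for t in grid])]
print(theta_star, c_mean(theta_star) + beta * c_std(theta_star))  # pessimistic value <= d
```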