Interpreting Primal-Dual Algorithms for Constrained Multiagent Reinforcement Learning
Constrained multiagent reinforcement learning (C-MARL) is gaining importance
as MARL algorithms find new applications in real-world systems ranging from
energy systems to drone swarms. Most C-MARL algorithms use a primal-dual
approach to enforce constraints through a penalty function added to the reward.
In this paper, we study the structural effects of this penalty term on the MARL
problem. First, we show that the standard practice of using the constraint
function as the penalty leads to a weak notion of safety. However, by making
simple modifications to the penalty term, we can enforce meaningful
probabilistic (chance and conditional value at risk) constraints. Second, we
quantify the effect of the penalty term on the value function, uncovering an
improved value estimation procedure. We use these insights to propose a
constrained multiagent advantage actor-critic (C-MAA2C) algorithm. Simulations
in a simple constrained multiagent environment affirm that our reinterpretation
of the primal-dual method in terms of probabilistic constraints is effective,
and that our proposed value estimate accelerates convergence to a safe joint
policy.

Comment: 19 pages, 8 figures. Presented at L4DC 202
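The penalty reinterpretation described above can be sketched in a few lines. In a minimal, hypothetical single-agent sketch (not the paper's C-MAA2C algorithm; all names are illustrative), the standard practice subtracts lam * cost from the reward, while swapping in an indicator 1[cost > 0] makes the dual variable price the probability of violation, i.e. a chance constraint:

```python
def penalized_reward(r, cost, lam, mode="standard"):
    """Penalty-modified reward for one primal-dual step.

    mode="standard": subtract lam * cost, the common practice; the dual
    variable then only bounds the expected cumulative cost.
    mode="chance":   subtract lam * 1[cost > 0], so the dual variable
    prices the probability of a violation (a chance constraint).
    """
    if mode == "standard":
        return r - lam * cost
    return r - lam * float(cost > 0.0)


def dual_update(lam, avg_cost, budget, lr=0.05):
    """Projected dual ascent: raise lam while the constraint is violated
    on average, and keep lam non-negative."""
    return max(0.0, lam + lr * (avg_cost - budget))
```

In a training loop, `dual_update` would be called on the episode's average constraint cost; the learning rate and budget here are placeholder values.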
Enhancement of Distribution System State Estimation Using Pruned Physics-Aware Neural Networks
Realizing complete observability in the three-phase distribution system
remains a challenge that hinders the implementation of classic state estimation
algorithms. In this paper, a new method, called the pruned physics-aware neural
network (P2N2), is developed to improve the voltage estimation accuracy in the
distribution system. The method relies on the physical grid topology, which is
used to design the connections between different hidden layers of a neural
network model. To verify the proposed method, a numerical simulation based on
one year of smart-meter load-consumption data for three-phase power flow is
developed to generate the measurement and voltage state data. The IEEE 123-node
system is selected as the test network to benchmark the proposed algorithm
against the classic weighted least squares (WLS) estimator. Numerical results
show that P2N2 outperforms WLS in terms of data redundancy and estimation
accuracy.
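The topology-pruning idea can be illustrated with a minimal NumPy sketch, assuming a toy 4-bus radial feeder: a hidden layer's dense weight matrix is masked by the bus adjacency, so each hidden unit mixes only measurements from electrically adjacent buses. The layer shape, mask, and names are illustrative assumptions, not the paper's P2N2 architecture:

```python
import numpy as np

def masked_forward(x, W, b, mask):
    """One hidden layer whose connections are pruned by grid topology:
    mask[i, j] = 1 only where bus j is electrically adjacent to bus i,
    so each hidden unit mixes only neighborhood measurements."""
    return np.tanh((W * mask) @ x + b)

# Toy 4-bus radial feeder 0-1-2-3; adjacency (with self-loops) as mask.
adj = np.array([[1, 1, 0, 0],
                [1, 1, 1, 0],
                [0, 1, 1, 1],
                [0, 0, 1, 1]], dtype=float)

rng = np.random.default_rng(0)
W = rng.normal(scale=0.5, size=(4, 4))  # dense weights before pruning
b = np.zeros(4)
x = rng.normal(size=4)                  # e.g. per-bus measurements
h = masked_forward(x, W, b, adj)        # hidden state respecting topology
```

Because of the mask, perturbing the measurement at bus 3 leaves the hidden units for buses 0 and 1 untouched, which is exactly the physics-aware sparsity the pruning encodes.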
Non-Stationary Policy Learning for Multi-Timescale Multi-Agent Reinforcement Learning
In multi-timescale multi-agent reinforcement learning (MARL), agents interact
across different timescales. In general, policies for time-dependent behaviors,
such as those induced by multiple timescales, are non-stationary. Learning
non-stationary policies is challenging and typically requires sophisticated or
inefficient algorithms. Motivated by the prevalence of this control problem in
real-world complex systems, we introduce a simple framework for learning
non-stationary policies for multi-timescale MARL. Our approach uses available
information about agent timescales to define a periodic time encoding.
Specifically, we theoretically demonstrate that the effects of non-stationarity
introduced by multiple timescales can be learned by a periodic multi-agent
policy. To learn such policies, we propose a policy gradient algorithm that
parameterizes the actor and critic with phase-functioned neural networks, which
provide an inductive bias for periodicity. The framework's ability to
effectively learn multi-timescale policies is validated on a gridworld and
building energy management environment.

Comment: Accepted at IEEE CDC'23. 7 pages, 6 figures
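The periodic time encoding can be sketched as follows, under the assumption that each agent acts at a fixed integer period; the function name and the number of harmonics `k` are illustrative choices, and this shows only the encoding, not the paper's phase-functioned actor-critic networks:

```python
import numpy as np

def periodic_time_encoding(t, timescales, k=2):
    """Sin/cos features at each agent's action period.

    timescales: per-agent periods (agent i acts every timescales[i] steps).
    The whole encoding repeats with period lcm(timescales), so a stationary
    policy conditioned on it can express the time-dependent behavior that
    multiple timescales would otherwise make non-stationary.
    """
    feats = []
    for T in timescales:
        phase = 2.0 * np.pi * t / T
        for j in range(1, k + 1):  # k harmonics per timescale
            feats.extend([np.sin(j * phase), np.cos(j * phase)])
    return np.array(feats)
```

For two agents with periods 2 and 3, the encoding repeats every lcm(2, 3) = 6 steps, so conditioning a stationary policy on it suffices to capture the induced periodicity.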