36 research outputs found

    Interpreting Primal-Dual Algorithms for Constrained Multiagent Reinforcement Learning

    Constrained multiagent reinforcement learning (C-MARL) is gaining importance as MARL algorithms find new applications in real-world systems ranging from energy systems to drone swarms. Most C-MARL algorithms use a primal-dual approach to enforce constraints through a penalty function added to the reward. In this paper, we study the structural effects of this penalty term on the MARL problem. First, we show that the standard practice of using the constraint function as the penalty leads to a weak notion of safety. However, by making simple modifications to the penalty term, we can enforce meaningful probabilistic (chance and conditional value at risk) constraints. Second, we quantify the effect of the penalty term on the value function, uncovering an improved value estimation procedure. We use these insights to propose a constrained multiagent advantage actor critic (C-MAA2C) algorithm. Simulations in a simple constrained multiagent environment affirm that our reinterpretation of the primal-dual method in terms of probabilistic constraints is effective, and that our proposed value estimate accelerates convergence to a safe joint policy. Comment: 19 pages, 8 figures. Presented at L4DC 202
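The primal-dual scheme the abstract critiques can be sketched as follows. This is a minimal illustration, not the paper's C-MAA2C algorithm: the reward is penalized by the constraint cost scaled by a dual variable, which is then updated by dual ascent. All names and values here are illustrative assumptions.

```python
def primal_dual_step(reward, constraint_cost, lam, threshold, lr_dual=0.01):
    """One primal-dual update (illustrative, not the paper's algorithm).

    'Standard practice' per the abstract: the constraint cost itself serves
    as the penalty, weighted by the dual variable lam. A chance-constraint
    variant would instead penalize an indicator of violation, e.g.
    float(constraint_cost > 0).
    """
    # Primal side: shaped reward used for the policy update.
    penalized_reward = reward - lam * constraint_cost
    # Dual side: ascend lam when expected cost exceeds the allowed threshold.
    lam = max(0.0, lam + lr_dual * (constraint_cost - threshold))
    return penalized_reward, lam

# Example: a cost above the threshold pushes the dual variable upward.
r, lam = primal_dual_step(reward=1.0, constraint_cost=0.5, lam=0.2, threshold=0.1)
```

Swapping the raw cost for an indicator of violation is the kind of "simple modification" the abstract describes for obtaining probabilistic constraints.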

    Enhancement of Distribution System State Estimation Using Pruned Physics-Aware Neural Networks

    Realizing complete observability in the three-phase distribution system remains a challenge that hinders the implementation of classic state estimation algorithms. In this paper, a new method, called the pruned physics-aware neural network (P2N2), is developed to improve the voltage estimation accuracy in the distribution system. The method relies on the physical grid topology, which is used to design the connections between different hidden layers of a neural network model. To verify the proposed method, a numerical simulation based on one-year smart meter data of load consumptions for three-phase power flow is developed to generate the measurement and voltage state data. The IEEE 123-node system is selected as the test network to benchmark the proposed algorithm against the classic weighted least squares (WLS). Numerical results show that P2N2 outperforms WLS in terms of data redundancy and estimation accuracy.
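The topology-based pruning idea can be sketched on a toy feeder: a bus adjacency matrix acts as a binary mask that zeroes out hidden-layer weights between non-adjacent buses. This is a hedged sketch of the general technique, not the paper's P2N2 architecture, and the 4-bus feeder below is an assumed example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 4-bus radial feeder; adjacency (with self-loops) says which
# hidden-layer connections the physical topology allows.
adjacency = np.array([
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 1, 1, 1],
    [0, 0, 1, 1],
], dtype=float)

W = rng.normal(size=(4, 4))   # dense hidden-layer weights
W_pruned = W * adjacency      # prune connections absent from the grid topology

def layer(x, W, b=0.0):
    """One physics-aware hidden layer: only adjacent buses interact."""
    return np.tanh(W @ x + b)

x = rng.normal(size=4)        # e.g., per-bus measurement features
h = layer(x, W_pruned)
```

Because the mask follows the grid, each hidden unit only mixes information from electrically neighboring buses, which is the inductive bias the abstract attributes to the physical topology.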

    Non-Stationary Policy Learning for Multi-Timescale Multi-Agent Reinforcement Learning

    In multi-timescale multi-agent reinforcement learning (MARL), agents interact across different timescales. In general, policies for time-dependent behaviors, such as those induced by multiple timescales, are non-stationary. Learning non-stationary policies is challenging and typically requires sophisticated or inefficient algorithms. Motivated by the prevalence of this control problem in real-world complex systems, we introduce a simple framework for learning non-stationary policies for multi-timescale MARL. Our approach uses available information about agent timescales to define a periodic time encoding. In detail, we theoretically demonstrate that the effects of non-stationarity introduced by multiple timescales can be learned by a periodic multi-agent policy. To learn such policies, we propose a policy gradient algorithm that parameterizes the actor and critic with phase-functioned neural networks, which provide an inductive bias for periodicity. The framework's ability to effectively learn multi-timescale policies is validated on a gridworld and building energy management environment. Comment: Accepted at IEEE CDC'23. 7 pages, 6 figures
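A periodic time encoding of the kind the abstract describes can be built from sine/cosine features of the known agent period. This is a minimal sketch of the general idea; the function name, feature count, and 24-step period are assumptions, and the paper's phase-functioned networks go further by blending network weights as a function of phase.

```python
import numpy as np

def periodic_time_encoding(t, period, n_freq=2):
    """Sin/cos features of timestep t at multiples of the base frequency
    2*pi/period. A policy conditioned on these features is periodic in t,
    so it can represent time-dependent (non-stationary) behavior with a
    stationary mapping from (observation, encoding) to action."""
    phase = 2.0 * np.pi * t / period
    feats = []
    for k in range(1, n_freq + 1):
        feats += [np.sin(k * phase), np.cos(k * phase)]
    return np.array(feats)

# The encoding repeats every `period` steps, e.g. a 24-step daily cycle.
e0 = periodic_time_encoding(0, period=24)
e24 = periodic_time_encoding(24, period=24)
```

Feeding such an encoding to both actor and critic gives them the periodic inductive bias without making the networks themselves time-varying.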