
    Privacy-Engineered Value Decomposition Networks for Cooperative Multi-Agent Reinforcement Learning

    In cooperative multi-agent reinforcement learning (Co-MARL), a team of agents must jointly optimize the team's long-term rewards to learn a designated task. Optimizing rewards as a team often requires inter-agent communication and data sharing, leading to potential privacy implications. We assume privacy considerations prohibit the agents from sharing their environment interaction data. Accordingly, we propose Privacy-Engineered Value Decomposition Networks (PE-VDN), a Co-MARL algorithm that models multi-agent coordination while provably safeguarding the confidentiality of the agents' environment interaction data. We develop PE-VDN by integrating three privacy-engineering techniques that redesign the data flows of VDN, an existing Co-MARL algorithm that consolidates the agents' environment interaction data to train a central controller modeling multi-agent coordination. First, we design a distributed computation scheme that eliminates Vanilla VDN's dependency on sharing environment interaction data. Second, we use a privacy-preserving multi-party computation protocol to guarantee that the data flows of the distributed computation scheme do not pose new privacy risks. Finally, we enforce differential privacy to preempt inference threats against the agents' training data (their past environment interactions) when they take actions based on their neural-network predictions. We implement PE-VDN in the StarCraft Multi-Agent Challenge (SMAC) and show that it achieves 80% of Vanilla VDN's win rate while maintaining differential-privacy levels that provide meaningful privacy guarantees. The results demonstrate that PE-VDN can safeguard the confidentiality of agents' environment interaction data without sacrificing multi-agent coordination.
    Comment: Paper accepted at the 62nd IEEE Conference on Decision and Control.
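    As a rough illustration of two of these ingredients, here is a minimal Python sketch (ours, not from the paper): Vanilla VDN's additive value decomposition, and differentially private action selection via the exponential mechanism, implemented with the Gumbel-max trick. The sensitivity bound and noise calibration are assumptions made for illustration; PE-VDN's actual mechanism and privacy accounting are defined in the paper.

```python
import numpy as np

def vdn_joint_q(per_agent_q):
    """Vanilla VDN: the team's Q-value is the sum of per-agent Q-values."""
    return float(np.sum(per_agent_q))

def dp_action(q_values, epsilon, sensitivity=1.0, rng=None):
    """Epsilon-DP action selection via the exponential mechanism.

    Adding Gumbel noise with scale 2*sensitivity/epsilon before the argmax
    samples actions with probability proportional to
    exp(epsilon * Q / (2 * sensitivity)), limiting how much any single past
    interaction in the training data can sway the chosen action.
    `sensitivity` (how much one training sample can shift a Q-value) is a
    hypothetical bound here.
    """
    rng = rng or np.random.default_rng()
    q = np.asarray(q_values, dtype=float)
    noisy = q + rng.gumbel(loc=0.0, scale=2.0 * sensitivity / epsilon, size=q.shape)
    return int(np.argmax(noisy))

# Example: one agent with four actions and privacy budget epsilon = 0.5.
print(dp_action([1.2, 0.7, 1.1, 0.3], epsilon=0.5))
```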

    Autonomy and Intelligence in the Computing Continuum: Challenges, Enablers, and Future Directions for Orchestration

    Future AI applications require performance, reliability, and privacy that existing, cloud-dependent system architectures cannot provide. In this article, we study orchestration in the device-edge-cloud continuum and focus on AI for edge, that is, the AI methods used in resource orchestration. We claim that to support the constantly growing requirements of intelligent applications in the device-edge-cloud computing continuum, resource orchestration needs to embrace edge AI and emphasize local autonomy and intelligence. To justify the claim, we provide a general definition of continuum orchestration and examine how well current and emerging orchestration paradigms suit the computing continuum. We describe major emerging research themes that may affect future orchestration and provide an early vision of an orchestration paradigm that embraces those themes. Finally, we survey current key edge AI methods and consider how they may contribute to fulfilling the vision of future continuum orchestration.
    Comment: 50 pages, 8 figures (revised content in all sections, added figures and new section)

    Reasoning Under Uncertainty in Cyber-Physical Systems: Toward Efficient and Secure Operation

    The increased sensing, processing, communication, and control capabilities introduced by cyber-physical systems bring many potential improvements to the operation of society's systems, but they also raise the question of how to ensure their efficient and secure operation. This dissertation investigates three questions related to decision-making under uncertainty in cyber-physical systems.

    First, in the context of power systems and electricity markets, how can one design algorithms that guide self-interested agents to a socially optimal and physically feasible outcome, given that agents possess only localized information of the system and can react only to local signals? The proposed algorithms, investigated in the context of two distinct models, are iterative and involve the exchange of messages between agents. The first model consists of a network of interconnected power systems controlled by a collection of system operators. Each system operator possesses knowledge of its own localized region and aims to prescribe the cost-minimizing set of net injections for its buses. Using relative voltage angles as messages, the system operators iteratively communicate to reach a social-cost-minimizing and physically feasible set of injections for the whole network. The second model consists of a market operator and market participants (distribution, generation, and transmission companies). Using locational marginal pricing, the market operator guides the market participants to a competitive equilibrium which, under an assumption on the positivity of prices, is shown to be a globally optimal solution to the non-convex social-welfare maximization problem. Common to both algorithms is a quadratic power flow approximation that preserves important non-linearities (power losses) while maintaining desirable mathematical properties that permit convergence under natural conditions.

    Second, when a system is under attack by a malicious agent, what models are appropriate for performing real-time, scalable threat assessment and response selection given only partial information about the attacker's intent and capabilities? The proposed model, termed the dynamic security model, is based on a type of attack graph, termed a condition dependency graph, and describes how an attacker can infiltrate a cyber network. By embedding a state space on the graph, the model quantifies the attacker's progression. Considering multiple attacker types, corresponding to attack strategies, allows one to model the defender's uncertainty about the attacker's true strategy and intent. Using noisy security alerts, the defender maintains a belief over both the capabilities and progression of the attacker (via a security state) and its strategy (attacker type); see the sketch below. An online, tree-based search method, termed the online defense algorithm, is developed that exploits the model's structure, permitting scalable computation of defense policies.

    Finally, in partially observable sequential decision-making environments, specifically partially observable Markov decision processes (POMDPs), under what conditions do optimal policies possess desirable structure? Motivated by the dynamic security model, we investigate settings where the underlying state space is partially ordered (i.e., settings where one cannot always say whether one state is better or worse than another). The contribution lies in deriving natural conditions on the problem's parameters such that optimal policies are monotone in the belief for a class of two-action POMDPs. The extension to the partially ordered setting requires defining a new stochastic order, termed the generalized monotone likelihood ratio, and a corresponding class of order-preserving matrices, termed generalized totally positive of order 2.

    Ph.D. dissertation, Electrical Engineering: Systems, University of Michigan, Horace H. Rackham School of Graduate Studies.
    https://deepblue.lib.umich.edu/bitstream/2027.42/144026/1/miehling_1.pd
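    To make the belief-maintenance step concrete, below is a minimal, generic Bayesian filter over a joint belief on (security state, attacker type), driven by noisy alerts. The array shapes, names, and independence assumptions are ours for illustration; in the dynamic security model, the transitions and alert likelihoods are induced by the condition dependency graph and the attacker types.

```python
import numpy as np

def belief_step(belief, transition, alert, alert_likelihood):
    """One predict-correct step of the defender's belief.

    belief:           P(state, type), shape (S, T), sums to 1
    transition:       P(next_state | state, type), shape (S, S, T)
    alert:            index of the observed (noisy) security alert
    alert_likelihood: P(alert | state, type), shape (A, S, T)
    """
    # Predict: an attacker of each type progresses through the security states.
    predicted = np.einsum('st,sut->ut', belief, transition)
    # Correct: reweight by the likelihood of the observed alert.
    posterior = predicted * alert_likelihood[alert]
    return posterior / posterior.sum()
```

    Marginalizing the result over states (`posterior.sum(axis=0)`) yields the defender's updated belief over attacker types, which is what the online defense algorithm's tree search would branch on.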

    Formal Methods for Autonomous Systems

    Formal methods refer to rigorous, mathematical approaches to system development and have played a key role in establishing the correctness of safety-critical systems. The main building blocks of formal methods are models and specifications, which are analogous to behaviors and requirements in system design and give us the means to verify and synthesize system behaviors with formal guarantees. This monograph surveys the current state of the art in applications of formal methods to the autonomous systems domain. We consider correct-by-construction synthesis under various formulations, including closed-system, reactive, and probabilistic settings. Beyond synthesizing systems in known environments, we address uncertainty and bound the behavior of systems that employ learning using formal methods. Further, we examine the synthesis of systems with monitoring, a mitigation technique that ensures that once a system deviates from expected behavior, it knows a way of returning to normalcy. We also show how to overcome some limitations of formal methods themselves with learning. We conclude with future directions for formal methods in reinforcement learning, uncertainty, privacy, explainability of formal methods, and regulation and certification.
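    To give a flavor of the monitoring idea in code, here is a self-contained Python sketch (our example, not from the monograph) of a runtime monitor for a bounded-response property over an event trace; the property, the event names, and the MAX_WAIT bound are all illustrative.

```python
MAX_WAIT = 3  # illustrative bound: a request must be granted within 3 events

class ResponseMonitor:
    """Flags a violation of: every 'request' is followed by a 'grant'
    within MAX_WAIT subsequent events."""

    def __init__(self):
        self.waiting = None  # events elapsed since an unanswered request

    def step(self, event):
        """Consume one event; return False once the property is violated."""
        if self.waiting is not None:
            if event == "grant":
                self.waiting = None
            else:
                self.waiting += 1
                if self.waiting > MAX_WAIT:
                    return False  # deviation detected: trigger mitigation
        if event == "request":
            self.waiting = 0
        return True

trace = ["idle", "request", "idle", "grant", "request", "idle", "idle", "idle", "idle"]
monitor = ResponseMonitor()
print([monitor.step(e) for e in trace])  # last entry is False: grant overdue
```

    In a monitored system of the kind the monograph describes, the point at which `step` returns False is where a mitigation strategy would take over to steer the system back toward expected behavior.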