159 research outputs found

    Model-based Dynamic Shielding for Safe and Efficient Multi-Agent Reinforcement Learning

    Full text link
    Multi-Agent Reinforcement Learning (MARL) discovers policies that maximize reward but lacks safety guarantees during the learning and deployment phases. Although shielding with Linear Temporal Logic (LTL) is a promising formal method for ensuring safety in single-agent Reinforcement Learning (RL), it results in conservative behaviors when scaled to multi-agent scenarios, and synthesizing shields for complex multi-agent environments is computationally challenging. This work introduces Model-based Dynamic Shielding (MBDS) to support MARL algorithm design. Our algorithm synthesizes distributive shields, which are reactive systems running in parallel with each MARL agent, to monitor and rectify unsafe behaviors. The shields can dynamically split, merge, and recompute based on agents' states. This design enables efficient synthesis of shields for monitoring agents in complex environments without coordination overhead. We also propose an algorithm to synthesize shields without prior knowledge of the dynamics model: it obtains an approximate world model by interacting with the environment during the early stage of exploration, so MBDS enjoys formal safety guarantees with high probability. We demonstrate in simulations that our framework surpasses existing baselines in terms of safety guarantees and learning performance.
    Comment: Accepted in AAMAS 202
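
A minimal sketch of the core shielding idea described above, assuming a toy grid world where "unsafe" means moving into an already-occupied cell; the `Shield` class and its `correct` method are illustrative names, not the paper's API:

```python
# Sketch of a per-agent reactive shield: it runs alongside the learner,
# checks each proposed action against a safety condition, and rectifies
# unsafe actions with a safe fallback. All names here are illustrative.

def step(pos, action):
    """Apply a grid move; actions are (dx, dy) offsets."""
    return (pos[0] + action[0], pos[1] + action[1])

class Shield:
    """Reactive monitor that overrides an agent's action if it is unsafe."""
    def __init__(self, safe_action=(0, 0)):
        self.safe_action = safe_action  # fallback: stay in place

    def correct(self, pos, action, occupied_next):
        """Return the proposed action if safe, else the safe fallback."""
        nxt = step(pos, action)
        if nxt in occupied_next:      # proposed move would collide
            return self.safe_action   # rectify the unsafe behavior
        return action                 # proposed move passes the monitor
```

Because the shield only intervenes when the proposed action fails the check, the learner keeps full freedom elsewhere, which is how shielding avoids being uniformly conservative.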

    Risk-aware Safe Control for Decentralized Multi-agent Systems via Dynamic Responsibility Allocation

    Full text link
    Decentralized control schemes are increasingly favored in domains involving multi-agent systems due to their computational efficiency and general applicability to large-scale systems. However, in the absence of an explicit global coordinator, it is hard for distributed agents to determine how to interact with others efficiently. In this paper, we present a risk-aware decentralized control framework that provides guidance on how much relative responsibility share (a percentage) an individual agent should take to avoid collisions with others while moving efficiently, without direct communication. We propose a novel Control Barrier Function (CBF)-inspired risk measurement to characterize the aggregate risk agents face from potential collisions under motion uncertainty. We use this measurement to allocate responsibility shares among agents dynamically and develop risk-aware decentralized safe controllers. In this way, we leverage the flexibility of robots with lower risk to improve the motion flexibility of those with higher risk, thus achieving improved collective safety. We demonstrate the validity and efficiency of our proposed approach through two examples: ramp merging in autonomous driving and a multi-agent position-swapping game.
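
The pipeline above can be sketched in a toy form: a CBF-style barrier h ≥ 0 encodes pairwise separation, risk grows as the barrier value shrinks, and pairwise responsibility shares are the two risks normalized to sum to one. The specific formulas below are illustrative assumptions, not the paper's:

```python
import math

def barrier(p_i, p_j, d_min=1.0):
    """CBF-style barrier: h >= 0 iff agents keep at least d_min separation."""
    dx, dy = p_i[0] - p_j[0], p_i[1] - p_j[1]
    return math.hypot(dx, dy) - d_min

def risk(h, eps=1e-6):
    """Risk rises as the barrier value approaches zero (near-violation)."""
    return 1.0 / max(h, eps)

def responsibility_shares(risk_i, risk_j):
    """Allocate the shared avoidance burden in proportion to risk."""
    total = risk_i + risk_j
    return risk_i / total, risk_j / total
```

Normalizing per pair keeps the allocation decentralized: each agent only needs its own and its neighbor's risk estimates, with no global coordinator.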

    Nano: Nested Human-in-the-Loop Reward Learning for Few-shot Language Model Control

    Full text link
    Pretrained language models have demonstrated extraordinary capabilities in language generation. However, real-world tasks often require controlling the distribution of generated text in order to mitigate bias, promote fairness, and achieve personalization. Existing techniques for controlling the distribution of generated text only work with quantified distributions, which require pre-defined categories, proportions of the distribution, or an existing corpus following the desired distributions. However, many important distributions, such as personal preferences, are unquantified. In this work, we tackle the problem of generating text following arbitrary distributions (quantified and unquantified) by proposing Nano, a few-shot human-in-the-loop training algorithm that continuously learns from human feedback. Nano achieves state-of-the-art results on single-topic/attribute as well as quantified distribution control compared to previous works. We also show that Nano is able to learn unquantified distributions, achieves personalization, and captures differences between individuals' personal preferences with high sample efficiency.
    Comment: Accepted to ACL Findings 202
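
A highly simplified sketch of a human-in-the-loop preference loop of the kind described above (this is not Nano's training code; the reward-table update and softmax sampling are stand-in assumptions): human accept/reject labels nudge per-candidate rewards, and generation then resamples in proportion to the updated rewards.

```python
import math
import random

def update_rewards(rewards, feedback, lr=0.5):
    """Nudge per-candidate rewards toward human labels (+1 keep, -1 reject)."""
    return {text: r + lr * feedback.get(text, 0.0) for text, r in rewards.items()}

def sample(rewards, rng):
    """Sample a candidate with probability proportional to exp(reward)."""
    texts = list(rewards)
    weights = [math.exp(rewards[t]) for t in texts]
    return rng.choices(texts, weights=weights, k=1)[0]
```

Iterating these two steps is what lets a few rounds of feedback shift the output distribution toward an unquantified target such as a personal preference.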

    MultiZoo & MultiBench: A Standardized Toolkit for Multimodal Deep Learning

    Full text link
    Learning multimodal representations involves integrating information from multiple heterogeneous sources of data. To accelerate progress towards understudied modalities and tasks while ensuring real-world robustness, we release MultiZoo, a public toolkit consisting of standardized implementations of > 20 core multimodal algorithms, and MultiBench, a large-scale benchmark spanning 15 datasets, 10 modalities, 20 prediction tasks, and 6 research areas. Together, these provide an automated end-to-end machine learning pipeline that simplifies and standardizes data loading, experimental setup, and model evaluation. To enable holistic evaluation, we offer a comprehensive methodology to assess (1) generalization, (2) time and space complexity, and (3) modality robustness. MultiBench paves the way towards a better understanding of the capabilities and limitations of multimodal models, while ensuring ease of use, accessibility, and reproducibility. Our toolkits are publicly available, will be regularly updated, and welcome inputs from the community.
    Comment: JMLR Open Source Software 2023, Code available at https://github.com/pliang279/MultiBenc
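
To make the "core multimodal algorithm" notion concrete, here is a generic late-fusion sketch of the kind such a toolkit standardizes; the function below is illustrative and is NOT the MultiZoo/MultiBench API:

```python
# Late fusion: each modality produces its own prediction scores, and the
# multimodal prediction is their (optionally weighted) average.

def late_fusion(unimodal_preds, weights=None):
    """Combine per-modality score vectors into one fused score vector."""
    if weights is None:
        weights = [1.0] * len(unimodal_preds)
    total = sum(weights)
    return [
        sum(w * p[i] for w, p in zip(weights, unimodal_preds)) / total
        for i in range(len(unimodal_preds[0]))
    ]
```

Standardizing even simple fusion operators like this one is what makes cross-modality and cross-dataset comparisons reproducible.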

    Early Stage Embodied and Operational Analysis for Guiding Sustainable Architectural Design Decisions

    No full text
    Buildings account for a significant portion of global energy consumption and greenhouse gas emissions. Simulating building performance in the early design stage allows architects and engineers to adjust design decisions to reduce embodied carbon and energy consumption. Life-cycle assessment (LCA) is one of the most comprehensive methodologies for evaluating the environmental impact of architectural production and operation. This thesis aims to address the challenges involved in applying LCA to architectural design in the early design stage. By conducting a literature review of the status quo of architectural LCA and identifying the gaps in existing research and tools, this thesis continues the development of a novel workflow in Grasshopper that calculates greenhouse gas (GHG) emissions and costs from both the embodied and operational phases. The workflow addresses early-stage uncertainty through randomized inputs with a Monte Carlo approach and implements surrogate models to accelerate each iteration. The author's contribution to the workflow includes improving its robustness and accuracy by redesigning the simulation model to generate more accurate training data and transitioning to a new machine-learning algorithm. The results of the study provide insights into design decisions that can reduce embodied and operational carbon. A parallel case study was conducted to assess the trade-offs between embodied and operational carbon with regard to construction material selection. Finally, the thesis proposes possible future research directions. (S.M. thesis)
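
The Monte Carlo step described above can be sketched as follows, assuming a cheap surrogate in place of a full simulation; the linear surrogate, input ranges, and coefficients below are purely illustrative stand-ins, not values from the thesis:

```python
import random

def surrogate_ghg(floor_area, window_ratio):
    """Hypothetical surrogate model for total GHG (kgCO2e) of a design."""
    return 500.0 * floor_area + 2000.0 * window_ratio

def monte_carlo_ghg(n, rng):
    """Propagate early-stage input uncertainty: sample n random designs
    and evaluate each with the fast surrogate instead of a simulation."""
    samples = []
    for _ in range(n):
        area = rng.uniform(80.0, 120.0)   # uncertain floor area (m^2)
        ratio = rng.uniform(0.2, 0.6)     # uncertain window-to-wall ratio
        samples.append(surrogate_ghg(area, ratio))
    return sum(samples) / n
```

Because each surrogate evaluation is cheap, thousands of samples remain tractable in an interactive design loop, which is the point of replacing the full simulation during early-stage exploration.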

    Safe Interactive Autonomy for Multi-Agent Systems

    No full text
    It is envisioned that in the near future autonomous systems, such as multi-agent robotic systems, will co-exist with humans; e.g., autonomous vehicles will share roads with human drivers. These safety-critical scenarios require formally provable safety guarantees so that the robots will never collide with humans or with each other. Providing such guarantees in the real world is challenging due to stochastic environments and inaccurate models of heterogeneous agents, including robots and humans. My PhD research investigates decision-making algorithm design for provably correct safety guarantees in mixed multi-agent systems.

    Tackling Safe and Efficient Multi-Agent Reinforcement Learning via Dynamic Shielding (Student Abstract)

    No full text
    Multi-agent Reinforcement Learning (MARL) has been increasingly used in safety-critical applications but has no safety guarantees, especially during training. In this paper, we propose dynamic shielding, a novel decentralized MARL framework that ensures safety in both the training and deployment phases. Our framework leverages shields, reactive systems running in parallel with the reinforcement learning algorithm to monitor and correct agents' behavior. In our algorithm, shields dynamically split and merge according to the environment state in order to maintain decentralization and avoid conservative behaviors while enjoying formal safety guarantees. We demonstrate the effectiveness of MARL with dynamic shielding in a mobile navigation scenario.
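
The split/merge mechanic above can be sketched as proximity-based grouping: one shield per connected group of nearby agents, recomputed from current positions each step, so distant agents keep independent shields. The grouping rule and names below are illustrative assumptions, not the paper's construction:

```python
# Partition agents into shield groups with a small union-find: agents
# within `radius` of each other (transitively) share one shield; the
# partition is recomputed each step, which realizes split and merge.

def shield_groups(positions, radius=2.0):
    """Return sorted groups of agent indices that must share a shield."""
    n = len(positions)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            dx = positions[i][0] - positions[j][0]
            dy = positions[i][1] - positions[j][1]
            if dx * dx + dy * dy <= radius * radius:
                parent[find(i)] = find(j)  # merge the two groups

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())
```

As agents move apart, a merged group naturally falls back into singleton groups on the next recomputation, which is the "split" half of the mechanic.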

    Risk-Aware Decentralized Safe Control via Dynamic Responsibility Allocation (Student Abstract)

    No full text
    In this work, we present a novel risk-aware decentralized Control Barrier Function (CBF)-based controller for multi-agent systems. The proposed decentralized controller is composed from pairwise agent responsibility shares (percentages), calculated from the risk each individual agent faces in a multi-agent interaction environment. With our proposed CBF-inspired risk evaluation framework, the responsibility portions between pairwise agents are dynamically updated based on the relative risk they face. Our method allows agents with lower risk to enjoy a higher level of freedom in the form of a wider action space, while agents exposed to higher risk are constrained more tightly and are therefore forced to proceed with caution.
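
The link between responsibility share and action-space width can be illustrated with a toy rule (the formulas are assumptions for illustration, not the paper's): a higher share tightens the admissible speed interval, and desired actions are clipped into it.

```python
def action_bounds(v_max, responsibility):
    """Higher responsibility share -> tighter (more cautious) speed bound."""
    assert 0.0 <= responsibility <= 1.0
    return (0.0, v_max * (1.0 - responsibility))

def constrain(action, bounds):
    """Clip a desired speed into the agent's admissible interval."""
    lo, hi = bounds
    return min(max(action, lo), hi)
```

An agent with zero responsibility keeps its full action space, while a high-responsibility agent is forced toward cautious motion, matching the behavior described above.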