Model-based Dynamic Shielding for Safe and Efficient Multi-Agent Reinforcement Learning
Multi-Agent Reinforcement Learning (MARL) discovers policies that maximize
reward but lack safety guarantees during the learning and deployment phases.
Although shielding with Linear Temporal Logic (LTL) is a promising formal
method for ensuring safety in single-agent Reinforcement Learning (RL), it
results in conservative behaviors when scaled to multi-agent scenarios, and
synthesizing shields for complex multi-agent environments is computationally
challenging. This work introduces Model-based Dynamic Shielding (MBDS) to
support MARL algorithm design. Our algorithm synthesizes
distributive shields, which are reactive systems running in parallel with each
MARL agent, to monitor and rectify unsafe behaviors. The shields can
dynamically split, merge, and recompute based on agents' states. This design
enables efficient synthesis of shields to monitor agents in complex
environments without coordination overheads. We also propose an algorithm to
synthesize shields without prior knowledge of the dynamics model. The proposed
algorithm obtains an approximate world model by interacting with the
environment during the early stage of exploration, so that MBDS enjoys formal
safety guarantees with high probability. We demonstrate in simulations that our
framework can surpass existing baselines in terms of safety guarantees and
learning performance.
Comment: Accepted in AAMAS 202
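As a simplified illustration of the shielding idea described above (not the paper's actual MBDS implementation), a shield sits between an agent and the environment and overrides any proposed action that would violate a safety specification. All names here (GreedyAgent, is_safe, safe_fallback, the 1-D position spec) are illustrative assumptions:

```python
class GreedyAgent:
    """Toy agent that always proposes moving right (+1)."""
    def act(self, state):
        return +1

def is_safe(state, action):
    # Toy safety spec: the agent's position must stay within [0, 5].
    return 0 <= state + action <= 5

def safe_fallback(state):
    # Staying put is trivially safe under this spec.
    return 0

def shielded_action(agent, state):
    """Monitor the agent's proposed action and rectify it if unsafe."""
    action = agent.act(state)
    if not is_safe(state, action):    # the shield predicts a violation
        action = safe_fallback(state)
    return action
```

In MBDS, each agent runs such a monitor in parallel, and the shields additionally split, merge, and recompute as agents move through the environment.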
Risk-aware Safe Control for Decentralized Multi-agent Systems via Dynamic Responsibility Allocation
Decentralized control schemes are increasingly favored in various domains
that involve multi-agent systems due to the need for computational efficiency
as well as general applicability to large-scale systems. However, in the
absence of an explicit global coordinator, it is hard for distributed agents to
determine how to efficiently interact with others. In this paper, we present a
risk-aware decentralized control framework that provides guidance on how much
relative responsibility share (a percentage) an individual agent should take to
avoid collisions with others while moving efficiently without direct
communications. We propose a novel Control Barrier Function (CBF)-inspired risk
measurement to characterize the aggregate risk agents face from potential
collisions under motion uncertainty. We use this measurement to allocate
responsibility shares among agents dynamically and develop risk-aware
decentralized safe controllers. In this way, we leverage the flexibility of
lower-risk robots to improve motion flexibility for higher-risk robots, thus
achieving improved collective safety. We
demonstrate the validity and efficiency of our proposed approach through two
examples: ramp merging in autonomous driving and a multi-agent
position-swapping game.
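A simplified 1-D sketch of the responsibility-sharing idea (not the paper's controller; the risk-to-share mapping and the linear barrier are assumptions): for two agents on a line, the pairwise CBF condition v_j - v_i >= -gamma * h can be split between the agents in proportion to shares computed from their relative risk, with the lower-risk agent absorbing the larger share.

```python
def responsibility_shares(risk_i, risk_j):
    """Lower-risk agents take a larger share of the joint safety constraint."""
    total = risk_i + risk_j
    return risk_j / total, risk_i / total   # (alpha_i, alpha_j); sums to 1

def velocity_bound(x_i, x_j, d_min, gamma, alpha_i):
    """Max velocity of agent i toward agent j (1-D, with x_i < x_j).

    Enforcing v_i <= alpha_i * gamma * h covers agent i's share of the
    pairwise CBF condition v_j - v_i >= -gamma * h, keeping h >= 0.
    """
    h = (x_j - x_i) - d_min   # barrier value: positive while safe
    return alpha_i * gamma * h
```

Note how the bound shrinks as the gap closes, and how a higher-risk agent (smaller alpha) retains a wider admissible action range.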
Nano: Nested Human-in-the-Loop Reward Learning for Few-shot Language Model Control
Pretrained language models have demonstrated extraordinary capabilities in
language generation. However, real-world tasks often require controlling the
distribution of generated text in order to mitigate bias, promote fairness, and
achieve personalization. Existing techniques for controlling the distribution
of generated text only work with quantified distributions, which require
pre-defined categories, proportions of the distribution, or an existing corpus
following the desired distributions. However, many important distributions,
such as personal preferences, are unquantified. In this work, we tackle the
problem of generating text following arbitrary distributions (quantified and
unquantified) by proposing Nano, a few-shot human-in-the-loop training
algorithm that continuously learns from human feedback. Nano achieves
state-of-the-art results on single topic/attribute as well as quantified
distribution control compared to previous works. We also show that Nano is able
to learn unquantified distributions, achieves personalization, and captures
differences between different individuals' personal preferences with high
sample efficiency.
Comment: Accepted to ACL Findings 202
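A minimal sketch of the human-in-the-loop reward-learning loop (assumed mechanics, not Nano's actual training procedure): a reward model is repeatedly updated from binary human feedback on generated samples, here reduced to a single logistic-regression step over bag-of-features text.

```python
import math

def update_reward(weights, features, label, lr=0.5):
    """One gradient step on a binary human label (1 = preferred, 0 = not).

    `weights` maps a text feature to its learned reward contribution;
    the update nudges the model toward the human judgment.
    """
    score = sum(weights.get(f, 0.0) for f in features)
    p = 1.0 / (1.0 + math.exp(-score))        # current preference estimate
    for f in features:
        weights[f] = weights.get(f, 0.0) + lr * (label - p)
    return weights

def reward(weights, features):
    """Score a candidate generation under the learned reward model."""
    return sum(weights.get(f, 0.0) for f in features)
```

In a full system, candidate generations would be sampled from the language model and reweighted by this learned reward, with the loop repeating as more feedback arrives.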
MultiZoo & MultiBench: A Standardized Toolkit for Multimodal Deep Learning
Learning multimodal representations involves integrating information from
multiple heterogeneous sources of data. In order to accelerate progress towards
understudied modalities and tasks while ensuring real-world robustness, we
release MultiZoo, a public toolkit consisting of standardized implementations
of > 20 core multimodal algorithms and MultiBench, a large-scale benchmark
spanning 15 datasets, 10 modalities, 20 prediction tasks, and 6 research areas.
Together, these provide an automated end-to-end machine learning pipeline that
simplifies and standardizes data loading, experimental setup, and model
evaluation. To enable holistic evaluation, we offer a comprehensive methodology
to assess (1) generalization, (2) time and space complexity, and (3) modality
robustness. MultiBench paves the way towards a better understanding of the
capabilities and limitations of multimodal models, while ensuring ease of use,
accessibility, and reproducibility. Our toolkits are publicly available, will
be regularly updated, and welcome inputs from the community.
Comment: JMLR Open Source Software 2023, Code available at
https://github.com/pliang279/MultiBenc
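As a generic illustration of the kind of pipeline such toolkits standardize (not MultiZoo's actual API, which the repository documents), a late-fusion model encodes each modality independently and fuses the features before prediction:

```python
def late_fusion_predict(encoders, fusion_head, inputs):
    """Encode each modality separately, concatenate, then predict."""
    features = [enc(x) for enc, x in zip(encoders, inputs)]
    fused = [v for feat in features for v in feat]   # concatenation fusion
    return fusion_head(fused)
```

Standardizing this encode-fuse-predict structure is what lets a benchmark swap datasets, modalities, and fusion strategies without rewriting the training loop.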
Early Stage Embodied and Operational Analysis for Guiding Sustainable Architectural Design Decisions
Buildings account for a significant portion of global energy consumption and greenhouse gas emissions. Simulating building performance in the early design stage allows architects and engineers to adjust design decisions to reduce embodied carbon and energy consumption. Life-cycle assessment (LCA) is one of the most comprehensive methodologies for evaluating the environmental impact of architectural production and operation. This thesis addresses the challenges of applying LCA to architectural design in the early design stage. Building on a literature review of the status quo of architectural LCA and the gaps in existing research and tools, it extends a novel Grasshopper workflow that calculates greenhouse gas (GHG) emissions and costs for both the embodied and operational phases. The workflow addresses early-stage uncertainty by sampling random inputs in a Monte Carlo approach and uses surrogate models to accelerate each iteration. The author's contributions include improving the workflow's robustness and accuracy by redesigning the simulation model to generate more accurate training data and by transitioning to a new machine-learning algorithm. The results provide insights into design decisions that can reduce embodied and operational carbon. A parallel case study assesses the trade-offs between embodied and operational carbon with respect to construction material selection. Finally, the thesis proposes possible directions for future research.
S.M.
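The Monte Carlo treatment of early-stage uncertainty can be sketched as follows (a simplified stand-in: the input ranges, the linear surrogate, and all parameter names are assumptions, not the thesis's trained model):

```python
import random

def monte_carlo_ghg(surrogate, input_ranges, n_samples=1000, seed=0):
    """Propagate uncertain design inputs through a (fast) surrogate model."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_samples):
        # Sample each uncertain input uniformly from its early-stage range.
        x = {name: rng.uniform(lo, hi) for name, (lo, hi) in input_ranges.items()}
        results.append(surrogate(x))
    mean = sum(results) / len(results)
    return mean, min(results), max(results)

def toy_surrogate(x):
    """Illustrative linear stand-in: embodied + operational GHG (kgCO2e)."""
    embodied = x["floor_area_m2"] * x["embodied_intensity"]
    operational = x["floor_area_m2"] * x["annual_eui"] * x["years"]
    return embodied + operational
```

The surrogate is the key to making each Monte Carlo iteration cheap: it replaces a full building-performance simulation with a fast learned approximation, so thousands of samples become tractable.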
Safe Interactive Autonomy for Multi-Agent Systems
It is envisioned that in the near future, autonomous systems such as multi-agent systems will co-exist with humans, e.g., autonomous vehicles will share roads with human drivers. These safety-critical scenarios require formally provable safety guarantees so that robots never collide with humans or with each other. Providing such guarantees in the real world is challenging due to stochastic environments and inaccurate models of heterogeneous agents, including robots and humans. My PhD research investigates decision-making algorithm design for provably correct safety guarantees in mixed multi-agent systems.
Tackling Safe and Efficient Multi-Agent Reinforcement Learning via Dynamic Shielding (Student Abstract)
Multi-agent Reinforcement Learning (MARL) is increasingly used in safety-critical applications yet offers no safety guarantees, especially during training. In this paper, we propose dynamic shielding, a novel decentralized MARL framework that ensures safety in both the training and deployment phases. Our framework leverages shields, reactive systems running in parallel with the reinforcement learning algorithm to monitor and correct agents' behavior. In our algorithm, shields dynamically split and merge according to the environment state in order to maintain decentralization and avoid conservative behaviors while enjoying formal safety guarantees. We demonstrate the effectiveness of MARL with dynamic shielding in a mobile navigation scenario.
Risk-Aware Decentralized Safe Control via Dynamic Responsibility Allocation (Student Abstract)
In this work, we present a novel risk-aware decentralized Control Barrier Function (CBF)-based controller for multi-agent systems. The decentralized controller is composed from pairwise agent responsibility shares (percentages), calculated from the risk each individual agent faces in a multi-agent interaction environment. With our CBF-inspired risk evaluation framework, the responsibility portions between pairwise agents are dynamically updated based on the relative risk they face. Our method allows agents with lower risk to enjoy a higher level of freedom in the form of a wider action space, while agents exposed to higher risk are constrained to tighter action spaces and are therefore forced to proceed with caution.