115 research outputs found

    Multi-Agent Chance-Constrained Stochastic Shortest Path with Application to Risk-Aware Intelligent Intersection

    In transportation networks, where traffic lights have traditionally been used for vehicle coordination, intersections act as natural bottlenecks. A formidable challenge for existing automated intersections lies in detecting and reasoning about uncertainty from the operating environment and human-driven vehicles. In this paper, we propose a risk-aware intelligent intersection system for autonomous vehicles (AVs) as well as human-driven vehicles (HVs). We cast the problem as a novel class of Multi-agent Chance-Constrained Stochastic Shortest Path (MCC-SSP) problems and devise an exact Integer Linear Programming (ILP) formulation that is scalable in the number of agents' interaction points (e.g., potential collision points at the intersection). In particular, when the number of agents within an interaction point is small, which is often the case in intersections, the ILP has a polynomial number of variables and constraints. To further improve the running time, we show that the collision risk computation can be performed offline. Additionally, a trajectory optimization workflow is provided to generate risk-aware trajectories for any given intersection. The proposed framework is implemented in the CARLA simulator and evaluated under a fully autonomous intersection with AVs only, as well as in a hybrid setup with a signalized intersection for HVs and an intelligent scheme for AVs. As verified via simulations, the featured approach improves the intersection's efficiency by up to 200% while also conforming to the specified tunable risk threshold.

    Weakly Coupled Deep Q-Networks

    We propose weakly coupled deep Q-networks (WCDQN), a novel deep reinforcement learning algorithm that enhances performance in a class of structured problems called weakly coupled Markov decision processes (WCMDP). WCMDPs consist of multiple independent subproblems connected by an action space constraint, which is a structural property that frequently emerges in practice. Despite this appealing structure, WCMDPs quickly become intractable as the number of subproblems grows. WCDQN employs a single network to train multiple DQN "subagents", one for each subproblem, and then combines their solutions to establish an upper bound on the optimal action value. This guides the main DQN agent towards optimality. We show that the tabular version, weakly coupled Q-learning (WCQL), converges almost surely to the optimal action value. Numerical experiments show faster convergence compared to DQN and related techniques in settings with as many as 10 subproblems, $3^{10}$ total actions, and a continuous state space. Comment: To appear in proceedings of the 37th Conference on Neural Information Processing Systems (NeurIPS 2023).
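
    The abstract says that an upper bound on the optimal action value guides the main agent. One way this could look in the tabular case is sketched below. It is an illustration only, not the authors' algorithm: the function name `bounded_q_update`, the dictionary-based tables, and the choice to clip the updated value at the bound are all assumptions, and the bound itself is taken as given rather than computed from subagents.

```python
# Hypothetical sketch: a tabular Q-learning step whose result is capped by a
# precomputed upper bound. How the bound is obtained from the subproblems is
# NOT shown here; `upper` is simply assumed to be available.

def bounded_q_update(q, upper, s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
    """Apply one Q-learning update to q[(s, a)] and clip it at upper[(s, a)]."""
    # Standard one-step bootstrap target; unseen entries default to 0.0.
    target = r + gamma * max(q.get((s_next, b), 0.0) for b in actions)
    old = q.get((s, a), 0.0)
    updated = old + alpha * (target - old)
    # Assumed guidance step: never let the estimate exceed the upper bound.
    q[(s, a)] = min(updated, upper[(s, a)])
    return q[(s, a)]


if __name__ == "__main__":
    q_table = {}
    bounds = {(0, 0): 0.05}
    value = bounded_q_update(q_table, bounds, s=0, a=0, r=1.0, s_next=1, actions=[0, 1])
    print(value)
```

    Clipping is the simplest way to use such a bound; whether the published method uses it in exactly this form is not stated in the abstract.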

    Approximate Shielding of Atari Agents for Safe Exploration

    Balancing exploration and conservatism in the constrained setting is an important problem if we are to use reinforcement learning for meaningful tasks in the real world. In this paper, we propose a principled algorithm for safe exploration based on the concept of shielding. Previous approaches to shielding assume access to a safety-relevant abstraction of the environment or a high-fidelity simulator. Instead, our work is based on latent shielding, another approach that leverages world models to verify policy roll-outs in the latent space of a learned dynamics model. Our novel algorithm builds on this previous work, using safety critics and other additional features to improve the stability and farsightedness of the algorithm. We demonstrate the effectiveness of our approach by running experiments on a small set of Atari games with state-dependent safety labels. We present preliminary results that show our approximate shielding algorithm effectively reduces the rate of safety violations, and in some cases improves the speed of convergence and quality of the final agent. Comment: Accepted for presentation at the ALA workshop as part of AAMAS 202

    LEARNING TO ACT WITH ROBUSTNESS

    Reinforcement Learning (RL) is learning to act in different situations to maximize a numerical reward signal. The most common way to formalize RL is as optimal control in an incompletely known Markov Decision Process (MDP). Traditional approaches to solving RL problems build on two common assumptions: i) exploration is allowed for the purpose of learning the MDP model, and ii) optimizing for the expected objective is sufficient. These assumptions comfortably hold for many simulated domains like games (e.g. Atari, Go), but are not sufficient for many real-world problems. Consider, for example, the domain of precision medicine for personalized treatment. Adopting a medical treatment for the sole purpose of learning its impact is prohibitive. It is also not permissible to embrace a specific treatment procedure by considering only the expected outcome, ignoring the potential for worst-case undesirable effects. Therefore, applying RL to real-world problems brings additional challenges. In this thesis, we assume that exploration is impossible because of the sensitivity of actions in the domain. We therefore adopt a Batch RL framework, which operates on a fixed, logged dataset without interacting with the environment. We also accept the need to find solutions that work well in both average and worst-case situations; we label such solutions as robust. We consider the robust MDP (RMDP) framework for handling these challenges. RMDPs provide the foundations for quantifying uncertainty about the model by using so-called ambiguity sets. An ambiguity set represents the set of plausible transition probabilities and is usually constructed as a multi-dimensional confidence region. Ambiguity sets determine the trade-off between robustness and average-case performance of an RMDP. This thesis presents a novel approach to optimizing the shape of ambiguity sets constructed with the weighted $L_1$-norm.
We derive new high-confidence sampling bounds for weighted $L_1$ ambiguity sets and describe how to compute near-optimal weights from coarse estimates of value functions. Experimental results on a diverse set of benchmarks show that optimized ambiguity sets provide significantly tighter robustness guarantees. In addition to reshaping the ambiguity sets, it is also desirable to optimize the size and position of the sets for further improvement in performance. In this regard, this thesis presents a method for constructing ambiguity sets that can achieve less conservative solutions with the same worst-case guarantees by 1) leveraging a Bayesian prior, and 2) relaxing the requirement that the set is a confidence interval. Our theoretical analysis establishes the safety of the proposed method, and the empirical results demonstrate its practical promise. In addition to optimizing ambiguity sets for RMDPs, this thesis also proposes a new paradigm for incorporating robustness into the constrained-MDP framework. We apply robustness to both the rewards and the constraint costs, because robustness is at least as important for the constraint costs as for the rewards. We derive the required gradient update rules and propose a policy-gradient algorithm. The performance of the proposed algorithm is evaluated on several problem domains. Parallel to robust MDPs, a slightly different perspective on handling model uncertainty is to compute soft-robust solutions using a risk measure (e.g. Value-at-Risk or Conditional Value-at-Risk). In high-stakes domains, it is important to quantify and manage the risk that arises from inherently stochastic transitions between different states of the model. Most prior work on robust RL and risk-averse RL addresses the inherent transition uncertainty and the model uncertainty independently. This thesis proposes a unified Risk-Averse Soft-Robust (RASR) framework that quantifies both model and transition uncertainties together.
We show that the RASR objective can be solved efficiently when formulated using the entropic risk measure. We also report theoretical analysis and empirical evidence on several problem domains. The methods presented in this thesis can potentially be applied in many practical applications of artificial intelligence, such as agriculture, healthcare, and robotics. They help broaden our understanding of how to compute robust solutions in safety-critical domains. Having robust and more realistic solutions to sensitive practical problems can inspire widespread adoption of AI to solve challenging real-world problems, potentially leading toward the pinnacle of the age of automation.
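
The entropic risk measure named in the abstract has a standard closed form. The sketch below estimates it from samples. It assumes a reward (higher is better) convention, ERM_alpha(X) = -(1/alpha) * log E[exp(-alpha * X)] with alpha > 0, and equally weighted samples. The function name `entropic_risk` is illustrative and not taken from the thesis, and the code does not implement the RASR framework itself.

```python
import math


def entropic_risk(returns, alpha):
    """Sample estimate of -(1/alpha) * log(mean(exp(-alpha * x))), alpha > 0."""
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    if not returns:
        raise ValueError("returns must be non-empty")
    exponents = [-alpha * x for x in returns]
    # Log-sum-exp shift keeps exp() from overflowing for large exponents.
    shift = max(exponents)
    mean_exp = sum(math.exp(e - shift) for e in exponents) / len(exponents)
    return -(shift + math.log(mean_exp)) / alpha


if __name__ == "__main__":
    samples = [0.0, 10.0]
    # A larger alpha weights bad outcomes more heavily, so the value drops
    # further below the sample mean of 5.0.
    print(entropic_risk(samples, 0.1), entropic_risk(samples, 1.0))
```

Under this convention the measure never exceeds the sample mean, and it approaches the mean as alpha shrinks toward zero, which is why alpha can be read as a risk-aversion parameter.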