2,837 research outputs found

    Model-based Dynamic Shielding for Safe and Efficient Multi-Agent Reinforcement Learning

    Multi-Agent Reinforcement Learning (MARL) discovers policies that maximize reward but do not have safety guarantees during the learning and deployment phases. Although shielding with Linear Temporal Logic (LTL) is a promising formal method to ensure safety in single-agent Reinforcement Learning (RL), it results in conservative behaviors when scaling to multi-agent scenarios. Additionally, it poses computational challenges for synthesizing shields in complex multi-agent environments. This work introduces Model-based Dynamic Shielding (MBDS) to support MARL algorithm design. Our algorithm synthesizes distributive shields, which are reactive systems running in parallel with each MARL agent, to monitor and rectify unsafe behaviors. The shields can dynamically split, merge, and recompute based on agents' states. This design enables efficient synthesis of shields to monitor agents in complex environments without coordination overheads. We also propose an algorithm to synthesize shields without prior knowledge of the dynamics model. The proposed algorithm obtains an approximate world model by interacting with the environment during the early stage of exploration, making our MBDS enjoy formal safety guarantees with high probability. We demonstrate in simulations that our framework can surpass existing baselines in terms of safety guarantees and learning performance. Comment: Accepted in AAMAS 202

    Failing with Grace: Learning Neural Network Controllers that are Boundedly Unsafe

    In this work, we consider the problem of learning a feed-forward neural network (NN) controller to safely steer an arbitrarily shaped planar robot in a compact and obstacle-occluded workspace. Unlike existing methods that depend strongly on the density of data points close to the boundary of the safe state space to train NN controllers with closed-loop safety guarantees, we propose an approach that lifts such assumptions on the data that are hard to satisfy in practice and instead allows for graceful safety violations, i.e., of a bounded magnitude that can be spatially controlled. To do so, we employ reachability analysis methods to encapsulate safety constraints in the training process. Specifically, to obtain a computationally efficient over-approximation of the forward reachable set of the closed-loop system, we partition the robot's state space into cells and adaptively subdivide the cells that contain states which may escape the safe set under the trained control law. For this purpose, we first design appropriate under- and over-approximations of the robot's footprint to adaptively subdivide the configuration space into cells. Then, using the overlap between each cell's forward reachable set and the set of infeasible robot configurations as a measure for safety violations, we introduce penalty terms into the loss function that penalize this overlap in the training process. As a result, our method can learn a safe vector field for the closed-loop system and, at the same time, provide numerical worst-case bounds on safety violation over the whole configuration space, defined by the overlap between the over-approximation of the forward reachable set of the closed-loop system and the set of unsafe states. Moreover, it can control the tradeoff between computational complexity and tightness of these bounds. Finally, we provide a simulation study that verifies the efficacy of the proposed scheme.

    How to Certify Machine Learning Based Safety-critical Systems? A Systematic Literature Review

    Context: Machine Learning (ML) has been at the heart of many innovations over the past years. However, including it in so-called 'safety-critical' systems such as automotive or aeronautic has proven to be very challenging, since the shift in paradigm that ML brings completely changes traditional certification approaches. Objective: This paper aims to elucidate challenges related to the certification of ML-based safety-critical systems, as well as the solutions that are proposed in the literature to tackle them, answering the question 'How to Certify Machine Learning Based Safety-critical Systems?'. Method: We conduct a Systematic Literature Review (SLR) of research papers published between 2015 and 2020, covering topics related to the certification of ML systems. In total, we identified 217 papers covering topics considered to be the main pillars of ML certification: Robustness, Uncertainty, Explainability, Verification, Safe Reinforcement Learning, and Direct Certification. We analyzed the main trends and problems of each sub-field and provided summaries of the papers extracted. Results: The SLR results highlighted the enthusiasm of the community for this subject, as well as the lack of diversity in terms of datasets and type of models. It also emphasized the need to further develop connections between academia and industries to deepen the domain study. Finally, it also illustrated the necessity to build connections between the above-mentioned main pillars, which are for now mainly studied separately. Conclusion: We highlighted current efforts deployed to enable the certification of ML-based software systems, and discussed some future research directions. Comment: 60 pages (92 pages with references and complements), submitted to a journal (Automated Software Engineering). Changes: Emphasizing the difference between the traditional software engineering and ML approaches. Adding Related Works, Threats to Validity and Complementary Materials. Adding a table listing paper references for each section/subsection.

    Tracking Foodborne Pathogens from Farm to Table: Data Needs to Evaluate Control Options

    Food safety policymakers and scientists came together at a conference in January 1995 to evaluate data available for analyzing control of foodborne microbial pathogens. This proceedings starts with data regarding human illnesses associated with foodborne pathogens and moves backwards in the food chain to examine pathogen data in the processing sector and at the farm level. Of special concern is the inability to link pathogen data throughout the food chain. Analytical tools to evaluate the impact of changing production and consumption practices on foodborne disease risks and their economic consequences are presented. The available data are examined to see how well they meet current analytical needs to support policy analysis. The policymaker roundtable highlights the tradeoffs involved in funding databases, the economic evaluation of USDA's Hazard Analysis Critical Control Point (HACCP) proposal and other food safety policy issues, and the necessity of a multidisciplinary approach toward improving food safety databases. Keywords: food safety, cost benefit analysis, foodborne disease risk, foodborne pathogens, Hazard Analysis Critical Control Point (HACCP), probabilistic scenario analysis, fault-tree analysis, Food Consumption/Nutrition/Food Safety

    Bowtie models as preventive models in maritime safety

    This work arose from a proposal by Dr. Rodrigo de Larrucea, who has just published an ambitious book on maritime safety. As he himself says, the subject "far exceeds the author's capabilities", which is all the more true in my case. One can nevertheless aspire to make a modest contribution to the study and dissemination of maritime safety culture, which only appears in the news when isolated disasters occur. In any case, the professor suggested that I focus on Bowtie Models, which integrate the tree of causes and the tree of consequences (Fault Tree Analysis, FTA, and Event Tree Analysis, ETA). Other methodologies and approaches certainly exist (his book summarizes several), but the conceptual simplicity of the bowtie model and the possibility of generalizing and integrating its results made it a good choice. Thus, after a phase of reflection and information gathering, I decided to present a very general bowtie model that accommodates the main causes of accidents (environmental factors, human error, and mechanical failure), allowing also for combinations of causes. Exploiting this model, however, runs into the great difficulty of assigning a probability of occurrence, a number between 0 and 1, to each branch. The probabilities of occurrence are normally small and therefore difficult to estimate. Every accident is different, major catastrophes are few, and each accident is already studied exhaustively (the more serious, the more exhaustively). Another factor that complicates estimating the probability of failure is the constant evolution of the maritime world from the technical, training, legal, and even generational points of view, since each generation of seafarers is different. Efforts are therefore focused on increasing safety, although always with an eye on costs. Thus, I have presented a bowtie model for its didactic and graphical value, but without going into numerical details, which, if need be, I will refine and internalize in the practice of the profession. In this work I have also tried not to stay entirely on the side of theory (as is well known, if everything is done right, everything turns out perfectly, etc.), but to present in some detail two well-known maritime accidents: the oil tanker Exxon Valdez in 1989 and the ferry Estonia in 1994, among others mentioned. These are somewhat old cases, but they contributed to raising the safety culture to the level we enjoy today, at least in Western countries. For safety, as Rodrigo de Larrucea says, "is an attitude and is never fortuitous; it is always the result of a determined will, a sincere effort, intelligent direction, and careful execution. Without a doubt, it always represents the best alternative."
This work was initially inspired by the book of my tutor, Jaime Rodrigo de Larrucea, which presents the state of the art of all maritime aspects related to safety. Evidently, since it covers all the topics, it cannot go into depth on each one. It was my opportunity to go deeper into the Bowtie Model, but in the end I have also covered a wide variety of topics. Later, when I began to study the topics, I realized that people in the maritime world usually do not have a deep understanding of statistics. Everybody is concerned about safety, but few nautical students take a probabilistic approach to accidents. For this, it is extremely important to define the population to be studied: in our case, the SOLAS ships. Also, during my time in Riga, I was concerned with a wide range of accidents, some of them studied during the courses in Barcelona. I have seen that it is difficult to model accidents mathematically, since each one has different characteristics and angles, and surely no two are alike.
Finally, it was agreed that I should concentrate on the Bowtie Model, which is not very complex from a statistical point of view. It is simply a fault tree of events combined with a tree of effects. I present some examples in Chapter 2. The difficulty I point out is estimating the probabilities of occurrence of unusual events. We concentrated on major accidents, those that may cause victims or heavy losses. Then, for the sake of generality, in Chapter 4 I have divided the causes into four broad classes: natural hazards, human factor, mechanical failure, and attacks (piracy and terrorism). The last class perhaps should not be included alongside the others, since acts of terrorism and piracy are not accidents, but since there is an important code dedicated to preventing security threats, ISPS, it is an example of the design of barriers to prevent an undesired event (although it mainly gives guidelines to be followed by States, port terminals, and shipping companies). I have presented a detailed study of the Estonia tragedy, showing how a mechanical failure triggered the failure of the ferry, by its nature a delicate ship, although other factors such as poor maintenance and heavy seas also played a part. In the next chapter, certain characteristics of error chains are analyzed. Finally, the conclusions are drawn, offering a fairly optimistic view of the safety (and security) culture in the Western world, which may nevertheless not easily permeate the entire world because of the associated costs.
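As a rough illustration of the bowtie structure described in this abstract, the following sketch combines a fault-tree OR gate on the cause side with a one-barrier event tree on the consequence side. It is not taken from the thesis: the function names, the single barrier, the independence assumption between causes, and all numeric probabilities are hypothetical and chosen only to show the arithmetic.

```python
from math import prod


def or_gate(cause_probabilities):
    """Fault-tree OR gate: probability that at least one cause occurs.

    Assumes the causes are statistically independent (a simplification).
    """
    return 1.0 - prod(1.0 - p for p in cause_probabilities)


def event_tree(p_top_event, p_barrier_fails):
    """Event-tree side with a single mitigation barrier after the top event."""
    return {
        "contained": p_top_event * (1.0 - p_barrier_fails),
        "escalated": p_top_event * p_barrier_fails,
    }


# Hypothetical per-voyage probabilities, for illustration only.
causes = {"environmental": 1e-3, "human_error": 5e-3, "mechanical_failure": 2e-3}

p_top = or_gate(causes.values())
outcomes = event_tree(p_top, p_barrier_fails=0.1)

print(f"top event: {p_top:.6f}")
for name, p in outcomes.items():
    print(f"{name}: {p:.6f}")
```

The sketch also makes the abstract's caveat concrete: every result depends directly on the branch probabilities, which are small and hard to estimate for rare accidents.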