214 research outputs found

    Safe Deep Reinforcement Learning: Enhancing the Reliability of Intelligent Systems

    In the last few years, the impressive success of deep reinforcement learning (DRL) agents in a wide variety of applications has led to the adoption of these systems in safety-critical contexts (e.g., autonomous driving, robotics, and medical applications), where expensive hardware and human safety can be involved. In such contexts, an intelligent learning agent must adhere to certain requirements that go beyond the simple accomplishment of the task and typically include constraints on the agent's behavior. Against this background, this thesis proposes a set of training and validation methodologies that constitute a unified pipeline to generate safe and reliable DRL agents. In the first part of this dissertation, we focus on the problem of constrained DRL, leaving the challenging problem of the formal verification of deep neural networks for the second part of this work. For humans, the help of a mentor while growing up is crucial for learning effective problem-solving strategies, whereas a learning process driven only by trial and error usually leads to unsafe and inefficient solutions. Similarly, a pure end-to-end deep reinforcement learning approach often results in suboptimal policies, which typically translates into unpredictable, and thus unreliable, behaviors. Following this intuition, we propose to impose a set of constraints on the DRL loop to guide the training process. These requirements, which typically encode domain expert knowledge, can be seen as suggestions that the agent should follow but may sometimes ignore when doing so helps maximize the reward signal. A foundational requirement for our work is finding a proper strategy to define and formally encode these constraints (which we refer to as rules). In this thesis, we propose to exploit a formal language inherited from the software engineering community: scenario-based programming (SBP).
    For the actual training, we rely on the constrained reinforcement learning paradigm, proposing an extended version of the Lagrangian PPO algorithm. Returning to the parallel with human beings: before being authorized to perform safety-critical operations, a person must obtain a certification (e.g., a license to drive a car or a degree to perform medical operations). In the second part of this dissertation, we apply this concept in a deep reinforcement learning context, where the intelligent agents are controlled by artificial neural networks. In particular, we propose to perform a model selection phase after training to find models that formally respect some given safety requirements before deployment. However, deep neural networks (DNNs) have long been considered unpredictable black boxes and thus unsuitable for safety-critical contexts. Against this background, we build upon the emerging field of formal verification for neural networks to extend state-of-the-art approaches to robotic decision-making contexts. We propose "ProVe", a verification tool for decision-making DNNs that quantifies the probability of violating the specified requirements. In the last chapter of this thesis, we provide a complete case study on a popular robotic problem: "mapless navigation". Here, we show a concrete example of the application of our pipeline, from the definition of the requirements through training to the final formal verification phase, to obtain a provably safe and effective agent.

    Threshold Bipower Variation and the Impact of Jumps on Volatility Forecasting

    This study reconsiders the role of jumps for volatility forecasting by showing that jumps have a positive and mostly significant impact on future volatility. This result becomes apparent once volatility is separated into its continuous and discontinuous component using estimators which are not only consistent, but also scarcely plagued by small-sample bias. To this purpose, we introduce the concept of threshold bipower variation, which is based on the joint use of bipower variation and threshold estimation. We show that its generalization (threshold multipower variation) admits a feasible central limit theorem in the presence of jumps and provides less biased estimates, with respect to the standard multipower variation, of the continuous quadratic variation in finite samples. We further provide a new test for jump detection which has substantially more power than tests based on multipower variation. Empirical analysis (on the S&P500 index, individual stocks and US bond yields) shows that the proposed techniques improve significantly the accuracy of volatility forecasts especially in periods following the occurrence of a jump. Keywords: volatility estimation, jump detection, volatility forecasting, threshold estimation, financial markets

    Volatility forecasting: the jumps do matter

    This study reconsiders the role of jumps for volatility forecasting by showing that jumps have a positive and mostly significant impact on future volatility. This result becomes apparent once volatility is correctly separated into its continuous and discontinuous component. To this purpose, we introduce the concept of threshold multipower variation (TMPV), which is based on the joint use of bipower variation and threshold estimation. With respect to alternative methods, our TMPV estimator provides less biased and more robust estimates of the continuous quadratic variation and jumps. This technique also provides a new test for jump detection which has substantially more power than traditional tests. We use this separation to forecast volatility by employing a heterogeneous autoregressive (HAR) model which is suitable to parsimoniously model long memory in realized volatility time series. Empirical analysis shows that the proposed techniques significantly improve the accuracy of volatility forecasts for the S&P500 index, single stocks and US bond yields, especially in periods following the occurrence of a jump. Keywords: volatility forecasting, jumps, bipower variation, threshold estimation, stock, bond

    Volatility Forecasting: The Jumps Do Matter

    This study reconsiders the role of jumps for volatility forecasting by showing that jumps have a positive and mostly significant impact on future volatility. This result becomes apparent once volatility is correctly separated into its continuous and discontinuous component. To this purpose, we introduce the concept of threshold multipower variation (TMPV), which is based on the joint use of bipower variation and threshold estimation. With respect to alternative methods, our TMPV estimator provides less biased and more robust estimates of the continuous quadratic variation and jumps. This technique also provides a new test for jump detection which has substantially more power than traditional tests. We use this separation to forecast volatility by employing a heterogeneous autoregressive (HAR) model which is suitable to parsimoniously model long memory in realized volatility time series. Empirical analysis shows that the proposed techniques significantly improve the accuracy of volatility forecasts for the S&P500 index, single stocks and US bond yields, especially in periods following the occurrence of a jump. Keywords: volatility forecasting, jumps, bipower variation, threshold estimation, stock, bond

    Farm size and farmers environmental-friendly practices in livestock farming

    Agriculture is among the major contributors to climate change, accounting for 24 percent of global CO2 emissions. Within the agricultural sector, livestock has a major role in greenhouse gas emissions. However, animal husbandry also affects the environment through nitrogen leaching to water tables from manure and slurry spread or stored on the soil. Both impacts can be diminished by appropriate practices concerning the storage of effluents and the way they are spread on the soil. We investigate to what extent farmers adopt such practices and, more importantly, which farm and farmers' characteristics are more conducive to their adoption. In particular, given the predominance of small farms in Italian agriculture, we assess the effect of farm size on the adoption of appropriate practices. To this purpose, we estimate ordered and binomial probit models of the adoption of virtuous practices from data of the 2010 Agricultural Census in Piedmont (Italy). The results suggest that, in general, larger farms are more likely to adopt virtuous practices, but the effect of farm size is nevertheless rather weak. Technical and cost issues linked to physical conditions (location in hills and mountains) are apparently a relevant impediment to these practices.

    The #DNN-Verification Problem: Counting Unsafe Inputs for Deep Neural Networks

    Deep Neural Networks are increasingly adopted in critical tasks that require a high level of safety, e.g., autonomous driving. While state-of-the-art verifiers can be employed to check whether a DNN is unsafe w.r.t. some given property (i.e., whether there is at least one unsafe input configuration), their yes/no output is not informative enough for other purposes, such as shielding, model selection, or training improvements. In this paper, we introduce the #DNN-Verification problem, which involves counting the number of input configurations of a DNN that result in a violation of a particular safety property. We analyze the complexity of this problem and propose a novel approach that returns the exact count of violations. Due to the #P-completeness of the problem, we also propose a randomized, approximate method that provides a provable probabilistic bound of the correct count while significantly reducing computational requirements. We present experimental results on a set of safety-critical benchmarks that demonstrate the effectiveness of our approximate method and evaluate the tightness of the bound. Comment: Accepted in the International Joint Conference on Artificial Intelligence (IJCAI), 2023. [Marzari and Corsi contributed equally

    Constrained Reinforcement Learning and Formal Verification for Safe Colonoscopy Navigation

    The field of robotic Flexible Endoscopes (FEs) has progressed significantly, offering a promising solution to reduce patient discomfort. However, the limited autonomy of most robotic FEs results in non-intuitive and challenging manoeuvres, constraining their application in clinical settings. While previous studies have employed lumen tracking for autonomous navigation, they fail to adapt to the presence of obstructions and sharp turns when the endoscope faces the colon wall. In this work, we propose a Deep Reinforcement Learning (DRL)-based navigation strategy that eliminates the need for lumen tracking. However, the use of DRL methods poses safety risks as they do not account for potential hazards associated with the actions taken. To ensure safety, we exploit a Constrained Reinforcement Learning (CRL) method to restrict the policy in a predefined safety regime. Moreover, we present a model selection strategy that utilises Formal Verification (FV) to choose a policy that is entirely safe before deployment. We validate our approach in a virtual colonoscopy environment and report that out of the 300 trained policies, we could identify three policies that are entirely safe. Our work demonstrates that CRL, combined with model selection through FV, can improve the robustness and safety of robotic behaviour in surgical applications. Comment: Accepted in the IEEE International Conference on Intelligent Robots and Systems (IROS), 2023. [Corsi, Marzari and Pore contributed equally

    An upper-limit on the linear polarization fraction of the GW170817 radio continuum

    We present late-time radio observations of GW170817, the first binary neutron star merger discovered through gravitational waves by the advanced LIGO and Virgo detectors. Our observations, carried out with the Karl G. Jansky Very Large Array, were optimized to detect polarized radio emission, and thus to constrain the linear polarization fraction of GW170817. At an epoch of ~244 days after the merger, we rule out linearly polarized emission above a fraction of ~12% at a frequency of 2.8 GHz (99% confidence). Within the structured jet scenario (a.k.a. successful jet plus cocoon system) for GW170817, the derived upper-limit on the radio continuum linear polarization fraction strongly constrains the magnetic field configuration in the shocked ejecta. We show that our results for GW170817 are compatible with the low level of linear polarization found in afterglows of cosmological long gamma-ray bursts. Finally, we discuss our findings in the context of future expectations for the study of radio counterparts of binary neutron star mergers identified by ground-based gravitational-wave detectors. Comment: 5 pages, 2 figures, 1 tabl

    Safe and Efficient Reinforcement Learning for Environmental Monitoring

    This paper discusses the challenges of applying reinforcement learning (RL) techniques to real-world environmental monitoring problems and proposes innovative solutions to overcome them. In particular, we focus on safety, a fundamental problem in RL that arises when it is applied to domains involving humans or hazardous uncertain situations. We propose to use deep neural networks, formal verification, and online refinement of domain knowledge to improve the transparency and efficiency of the learning process, as well as the quality of the final policies. We present two case studies, specifically (i) autonomous water monitoring and (ii) smart control of air quality indoors. In particular, we discuss the challenges and solutions to these problems, addressing crucial issues such as anomaly detection and prevention, real-time control, and online learning. We believe that the proposed techniques can be used to overcome some limitations of RL, providing safe and efficient solutions to complex and urgent problems.