152 research outputs found

    Safety-guided deep reinforcement learning via online gaussian process estimation

    An important facet of reinforcement learning (RL) is how the agent explores the environment. Traditional exploration strategies typically focus on efficiency and ignore safety. However, for practical applications, ensuring the agent's safety during exploration is crucial, since performing an unsafe action or reaching an unsafe state could result in irreversible damage to the agent. The main challenge of safe exploration is that characterizing the unsafe states and actions is difficult for large continuous state or action spaces and unknown environments. In this paper, we propose a novel approach that incorporates estimates of safety to guide exploration and policy search in deep reinforcement learning. Using a cost function to capture trajectory-based safety, our key idea is to formulate the state-action value function of this safety cost as a candidate Lyapunov function and to extend control-theoretic results to approximate its derivative using online Gaussian Process (GP) estimation. We show how to use these statistical models to guide the agent in unknown environments to obtain high-performance control policies with provable stability certificates. Accepted manuscript.

    DRIBO: Robust Deep Reinforcement Learning via Multi-View Information Bottleneck

    Deep reinforcement learning (DRL) agents are often sensitive to visual changes that were unseen in their training environments. To address this problem, we leverage the sequential nature of RL to learn robust representations that encode only task-relevant information from observations based on the unsupervised multi-view setting. Specifically, we introduce an auxiliary objective based on the multi-view information bottleneck (MIB) principle, which quantifies the amount of task-irrelevant information and encourages learning representations that are both predictive of the future and less sensitive to task-irrelevant distractions. This enables us to train high-performance policies that are robust to visual distractions and can generalize to unseen environments. We demonstrate that our approach can achieve SOTA performance on diverse visual control tasks on the DeepMind Control Suite, even when the background is replaced with natural videos. In addition, we show that our approach outperforms well-established baselines for generalization to unseen environments on the Procgen benchmark. Our code is open-sourced and available at https://github.com/JmfanBU/DRIBO. Comment: 27 pages.

    Adversarial Training and Provable Robustness: A Tale of Two Objectives

    We propose a principled framework that combines adversarial training and provable robustness verification for training certifiably robust neural networks. We formulate the training problem as a joint optimization problem with both empirical and provable robustness objectives and develop a novel gradient-descent technique that can eliminate bias in stochastic multi-gradients. We provide both a theoretical analysis of the convergence of the proposed technique and an experimental comparison with state-of-the-art methods. Results on MNIST and CIFAR-10 show that our method can consistently match or outperform prior approaches for provable ℓ∞ robustness. Notably, we achieve 6.60% verified test error on MNIST at epsilon = 0.3, and 66.57% on CIFAR-10 with epsilon = 8/255. Comment: Accepted at AAAI 202

    Brain MR Image Segmentation Based on an Adaptive Combination of Global and Local Fuzzy Energy

    This paper presents a novel fuzzy algorithm for segmentation of brain MR images and simultaneous estimation of intensity inhomogeneity. The proposed algorithm defines an objective function that includes a local fuzzy energy and a global fuzzy energy. Based on the assumption that the local image intensities belonging to each tissue follow Gaussian distributions with different means, we derive the local fuzzy energy using maximum a posteriori probability (MAP) estimation and Bayes' rule. The global fuzzy energy is defined by measuring the distance between the original image and the corresponding inhomogeneity-free image. We combine the global fuzzy energy with the local fuzzy energy using an adaptive weight function whose value varies with the local contrast of the image. This combination enables the proposed algorithm to address intensity inhomogeneity and to improve the accuracy of segmentation and its robustness to initialization. In addition, the proposed algorithm incorporates neighborhood spatial information into the membership function to reduce the impact of noise. Experimental results on synthetic and real images validate the performance of the proposed algorithm.

    DualMix: Unleashing the Potential of Data Augmentation for Online Class-Incremental Learning

    Online Class-Incremental (OCI) learning has sparked new approaches to expanding a previously trained model's knowledge from sequentially arriving data streams with new classes. Unfortunately, OCI learning can suffer from catastrophic forgetting (CF), as the decision boundaries for old classes can become inaccurate when perturbed by new ones. The existing literature has applied data augmentation (DA) to alleviate forgetting, but the role of DA in OCI has not been well understood so far. In this paper, we theoretically show that augmented samples with lower correlation to the original data are more effective in preventing forgetting. However, aggressive augmentation may also reduce the consistency between data and corresponding labels, which motivates us to exploit proper DA to boost OCI performance and prevent the CF problem. We propose the Enhanced Mixup (EnMix) method, which mixes the augmented samples and their labels simultaneously and is shown to enhance sample diversity while maintaining strong consistency with the corresponding labels. Further, to solve the class imbalance problem, we design an Adaptive Mixup (AdpMix) method to calibrate the decision boundaries by mixing samples from both old and new classes and dynamically adjusting the label mixing ratio. Our approach is demonstrated to be effective on several benchmark datasets through extensive experiments, and it is shown to be compatible with other replay-based techniques. Comment: 10 pages, 7 figures and 3 tables.
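
The abstract above builds on mixing samples together with their labels. As a rough illustration only, the following sketch shows the generic mixup operation (a convex combination of two inputs and their one-hot labels). It is not the EnMix or AdpMix method from the paper; the function name `mixup`, the Beta(1, 1) draw for the ratio `lam`, and the toy data are all assumptions made for this example.

```python
import numpy as np


def mixup(x_a, y_a, x_b, y_b, lam):
    """Convexly combine two samples and their one-hot labels with ratio lam."""
    x = lam * x_a + (1.0 - lam) * x_b
    y = lam * y_a + (1.0 - lam) * y_b
    return x, y


# Toy data (hypothetical): one "old class" sample and one "new class" sample.
rng = np.random.default_rng(0)
x_old, x_new = rng.normal(size=4), rng.normal(size=4)
y_old = np.array([1.0, 0.0])
y_new = np.array([0.0, 1.0])

# Assumption: the mixing ratio is drawn from Beta(1, 1), a common mixup default.
lam = rng.beta(1.0, 1.0)
x_mix, y_mix = mixup(x_old, y_old, x_new, y_new, lam)
```

Because the labels are mixed with the same ratio as the inputs, the mixed label stays a valid probability vector. An adaptive scheme such as the one the abstract describes would presumably choose the label ratio differently, but the source gives no formula for it.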

    Iron Porphyrin Carbenes As Catalytic Intermediates: Structures, Mössbauer and NMR Spectroscopic Properties, and Bonding

    Iron porphyrin carbenes (IPCs) are thought to be intermediates involved in the metabolism of various xenobiotics by cytochrome P450, as well as in chemical reactions catalyzed by metalloporphyrins and engineered P450s. While early work proposed IPCs to contain Fe(II), more recent work invokes a double-bond description of the iron–carbon bond, similar to that found in Fe(IV) porphyrin oxenes. Reported herein is the first quantum chemical investigation of IPC Mössbauer and NMR spectroscopic properties, as well as their electronic structures, together with comparisons to ferrous heme proteins and an Fe(IV) oxene model. The results provide the first accurate predictions of the experimental spectroscopic observables as well as the first theoretical explanation of their electrophilic nature, as deduced from experiment. The preferred resonance structure is Fe(II)←{:C(X)Y}⁰ and not Fe(IV){C(X)Y}²⁻, a result that will facilitate research on IPC reactivities in various chemical and biochemical systems.

    POLAR: A Polynomial Arithmetic Framework for Verifying Neural-Network Controlled Systems

    We propose POLAR, a polynomial arithmetic framework that leverages polynomial overapproximations with interval remainders for bounded-time reachability analysis of neural network-controlled systems (NNCSs). Compared with existing arithmetic approaches that use standard Taylor models, our framework uses a novel approach to iteratively overapproximate the neuron output ranges layer-by-layer with a combination of Bernstein polynomial interpolation for continuous activation functions and Taylor model arithmetic for the other operations. This approach can overcome the main drawback in the standard Taylor model arithmetic, i.e., its inability to handle functions that cannot be well approximated by Taylor polynomials, and significantly improve the accuracy and efficiency of reachable states computation for NNCSs. To further tighten the overapproximation, our method keeps the Taylor model remainders symbolic under the linear mappings when estimating the output range of a neural network. We show that POLAR can be seamlessly integrated with existing Taylor model flowpipe construction techniques, and demonstrate that POLAR significantly outperforms the current state-of-the-art techniques on a suite of benchmarks.

    Divide and Slide: Layer-Wise Refinement for Output Range Analysis of Deep Neural Networks

    In this article, we present a layer-wise refinement method for neural network output range analysis. While approaches such as nonlinear programming (NLP) can directly model the high nonlinearity brought by neural networks in output range analysis, they are known to be difficult to solve in general. We propose to use a convex polygonal relaxation (overapproximation) of the activation functions to cope with the nonlinearity. This allows us to encode the relaxed problem into a mixed-integer linear program (MILP), and control the tightness of the relaxation by adjusting the number of segments in the polygon. Starting with a segment number of 1 for each neuron, which coincides with a linear programming (LP) relaxation, our approach selects neurons layer by layer to iteratively refine this relaxation. To tackle the increase in the number of integer variables with tighter refinement, we bridge the propagation-based method and the programming-based method by dividing and sliding the layer-wise constraints. Specifically, given a sliding number s, for the neurons in layer l, we only encode the constraints of the layers between l - s and l. We show that our overall framework is sound and provides a valid overapproximation. Experiments on deep neural networks demonstrate significant improvement in output range analysis precision using our approach compared to the state of the art.
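
To make the sliding idea above concrete, here is a minimal sketch under stated assumptions. `encoded_layers` lists which layers would have their constraints encoded for a given layer l and sliding number s; treating both ends as inclusive and clipping at layer 1 is an assumption, since the abstract only says "between l - s and l". The two interval helpers illustrate a generic propagation-based bound for one affine-plus-ReLU layer. They are standard interval arithmetic, not the paper's MILP encoding, and all names and the toy weights are hypothetical.

```python
import numpy as np


def encoded_layers(l, s, first=1):
    """Layers whose constraints are encoded for layer l with sliding number s.

    Assumption: both ends are inclusive and the window is clipped at `first`.
    """
    return list(range(max(first, l - s), l + 1))


def interval_affine(lo, hi, W, b):
    """Propagate the box [lo, hi] through x -> W @ x + b with interval arithmetic."""
    W_pos, W_neg = np.maximum(W, 0.0), np.minimum(W, 0.0)
    return W_pos @ lo + W_neg @ hi + b, W_pos @ hi + W_neg @ lo + b


def interval_relu(lo, hi):
    """ReLU is monotone, so it can be applied to both bounds directly."""
    return np.maximum(lo, 0.0), np.maximum(hi, 0.0)


# Toy layer (hypothetical weights) and an input box.
W = np.array([[1.0, -1.0], [0.5, 2.0]])
b = np.array([0.0, -1.0])
lo, hi = np.array([-1.0, 0.0]), np.array([1.0, 1.0])

pre_lo, pre_hi = interval_affine(lo, hi, W, b)
post_lo, post_hi = interval_relu(pre_lo, pre_hi)
```

With these helpers, `encoded_layers(5, 2)` gives `[3, 4, 5]`, so only three layers would enter the program for layer 5, while bounds for earlier layers could come from a cheaper propagation step such as the interval one shown here.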

    ReachNN: Reachability Analysis of Neural-Network Controlled Systems

    Applying neural networks as controllers in dynamical systems has shown great promise. However, it is critical yet challenging to verify the safety of such control systems with neural-network controllers in the loop. Previous methods for verifying neural-network controlled systems are limited to a few specific activation functions. In this work, we propose a new reachability analysis approach based on Bernstein polynomials that can verify neural-network controlled systems with a more general form of activation functions, i.e., as long as they ensure that the neural networks are Lipschitz continuous. Specifically, we consider abstracting feedforward neural networks with Bernstein polynomials for a small subset of inputs. To quantify the error introduced by abstraction, we provide both a theoretical error bound estimation based on the theory of Bernstein polynomials and a more practical sampling-based error bound estimation, following a tight Lipschitz constant estimation approach based on forward reachability analysis. Compared with previous methods, our approach addresses a much broader set of neural networks, including heterogeneous neural networks that contain multiple types of activation functions. Experimental results on a variety of benchmarks show the effectiveness of our approach.
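
As a small illustration of the Bernstein-polynomial abstraction mentioned above, the sketch below approximates a one-dimensional function on [0, 1] by its degree-n Bernstein polynomial and measures the largest deviation on a sample grid. This is the textbook construction, not the paper's tool. The target function `f` (a scaled tanh standing in for a tiny network), the degree 20, and the grid size are assumptions for the example, and the sampled maximum is only an estimate: a sound bound would also need a Lipschitz-based correction for the gaps between samples, as the abstract indicates.

```python
from math import comb

import numpy as np


def bernstein_approx(f, n):
    """Return the degree-n Bernstein polynomial of f on [0, 1] as a callable."""
    coeffs = np.array([f(k / n) for k in range(n + 1)], dtype=float)

    def B(x):
        x = np.asarray(x, dtype=float)
        # Bernstein basis: C(n, k) * x^k * (1 - x)^(n - k), for k = 0..n.
        basis = np.array(
            [comb(n, k) * x**k * (1.0 - x) ** (n - k) for k in range(n + 1)]
        )
        return coeffs @ basis

    return B


def f(x):
    """Hypothetical stand-in for a small Lipschitz-continuous network."""
    return np.tanh(4.0 * x - 2.0)


B = bernstein_approx(f, 20)
xs = np.linspace(0.0, 1.0, 201)
sampled_error = np.max(np.abs(f(xs) - B(xs)))  # estimate, not a sound bound
```

Raising the degree tightens the approximation at the cost of more coefficients, which is the trade-off any polynomial abstraction of this kind has to manage.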

    ARCH-COMP22 category report: Artificial intelligence and neural network control systems (AINNCS) for continuous and hybrid systems plants

    This report presents the results of a friendly competition for the formal verification of continuous and hybrid systems with artificial intelligence (AI) components. Specifically, it considers machine learning (ML) components in cyber-physical systems (CPS), such as feedforward neural networks used as feedback controllers in closed-loop systems. This class of systems is classically known as intelligent control systems or, in more modern and specific terms, neural network control systems (NNCS). We more broadly refer to this category as AI and NNCS (AINNCS). The friendly competition took place as part of the workshop Applied Verification for Continuous and Hybrid Systems (ARCH) in 2022. In the fourth edition of this AINNCS category at ARCH-COMP, four tools were applied to solve 10 different benchmark problems. There are two new participants, CORA and POLAR, and two previous participants, JuliaReach and NNV. The goal of this report is to be a snapshot of the current landscape of tools and the types of benchmarks for which these tools are suited. The results of this iteration significantly outperform those of any previous year, demonstrating the continuous advancement of this community in the past decade.