30 research outputs found

    Enhancing Safe Exploration Using Safety State Augmentation

    Safe exploration is a challenging and important problem in model-free reinforcement learning (RL). Often the safety cost is sparse and unknown, which unavoidably leads to constraint violations, a phenomenon that should ideally be avoided in safety-critical applications. We tackle this problem by augmenting the state-space with a safety state, which is nonnegative if and only if the constraint is satisfied. The value of this state also serves as a distance toward constraint violation, while its initial value indicates the available safety budget. This idea allows us to derive policies for scheduling the safety budget during training. We call our approach Simmer (Safe policy IMproveMEnt for RL) to reflect the careful nature of these schedules. We apply this idea to two safe RL problems: RL with constraints imposed on an average cost, and RL with constraints imposed on a cost with probability one. Our experiments suggest that “simmering” a safe algorithm can improve safety during training for both settings. We further show that Simmer can stabilize training and improve the performance of safe RL with average constraints.
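The safety-state idea in the abstract above can be illustrated with a minimal sketch. The abstract does not give the update rule or the form of the schedules, so everything here is an assumption made for illustration: the class `SafetyState`, the plain "subtract the incurred cost" update, and the linear `linear_budget_schedule` helper are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class SafetyState:
    """Tracks the remaining safety budget along one episode (hypothetical helper)."""

    budget: float  # initial value of the safety state, i.e. the available budget

    def __post_init__(self) -> None:
        self.value = self.budget

    def step(self, cost: float) -> float:
        # Assumed update: subtract the incurred cost from the remaining budget.
        self.value -= cost
        return self.value

    def augment(self, observation: list[float]) -> list[float]:
        # Append the safety state so that a policy can condition on it.
        return observation + [self.value]

    @property
    def satisfied(self) -> bool:
        # Nonnegative safety state <=> accumulated cost is within the budget.
        return self.value >= 0.0


def linear_budget_schedule(start: float, end: float, n_stages: int) -> list[float]:
    """One possible budget schedule: move linearly from start to end (assumption)."""
    if n_stages == 1:
        return [end]
    step = (end - start) / (n_stages - 1)
    return [start + i * step for i in range(n_stages)]
```

In this sketch the safety state doubles as a distance to violation: it shrinks as cost accumulates and turns negative exactly when the budget is exceeded. A training loop could create a fresh `SafetyState` per episode using the budget of the current schedule stage.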

    Diagnosing and Preventing Instabilities in Recurrent Video Processing

    Recurrent models are a popular choice for video enhancement tasks such as video denoising or super-resolution. In this work, we focus on their stability as dynamical systems and show that they tend to fail catastrophically at inference time on long video sequences. To address this issue, we (1) introduce a diagnostic tool that produces input sequences optimized to trigger instabilities, which can be interpreted as visualizations of temporal receptive fields, and (2) propose two approaches to enforce the stability of a model during training: constraining the spectral norm or constraining the stable rank of its convolutional layers. We then introduce Stable Rank Normalization for Convolutional layers (SRN-C), a new algorithm that enforces these constraints. Our experimental results suggest that SRN-C successfully enforces stability in recurrent video processing models without a significant performance loss.
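The two quantities constrained in the abstract above, spectral norm and stable rank, can be computed for a plain weight matrix as in the sketch below. This is not SRN-C: a dense 2-D matrix stands in for a convolutional layer as a simplifying assumption, and the function names are hypothetical. The stable rank is taken in its standard form, the squared Frobenius norm divided by the squared spectral norm.

```python
import numpy as np


def spectral_norm(weight: np.ndarray) -> float:
    """Largest singular value of a 2-D weight matrix."""
    return float(np.linalg.norm(weight, ord=2))


def stable_rank(weight: np.ndarray) -> float:
    """Squared Frobenius norm divided by squared spectral norm."""
    fro_sq = float(np.sum(weight ** 2))
    return fro_sq / spectral_norm(weight) ** 2


def clip_spectral_norm(weight: np.ndarray, max_norm: float = 1.0) -> np.ndarray:
    """Rescale the matrix so that its spectral norm does not exceed max_norm."""
    sigma = spectral_norm(weight)
    if sigma <= max_norm:
        return weight
    return weight * (max_norm / sigma)
```

Keeping the spectral norm of a recurrent linear map at or below one makes that map non-expansive, which is the intuition behind using it as a stability constraint. Note that the rescaling above leaves the stable rank unchanged, so constraining the stable rank needs a different operation.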

    Reinforcement Learning in Presence of Discrete Markovian Context Evolution

    We consider a context-dependent Reinforcement Learning (RL) setting, which is characterized by: a) an unknown finite number of not directly observable contexts; b) abrupt (discontinuous) context changes occurring during an episode; and c) Markovian context evolution. We argue that this challenging case is often met in applications and we tackle it using a Bayesian model-based approach and variational inference. We adapt a sticky Hierarchical Dirichlet Process (HDP) prior for model learning, which is arguably best-suited for infinite Markov chain modeling. We then derive a context distillation procedure, which identifies and removes spurious contexts in an unsupervised fashion. We argue that the combination of these two components allows inferring the number of contexts from data, thus dealing with the context cardinality assumption. We then find the representation of the optimal policy enabling efficient policy learning using off-the-shelf RL algorithms. Finally, we demonstrate empirically (using gym environments cart-pole swing-up, drone, intersection) that our approach succeeds where state-of-the-art methods of other frameworks fail and elaborate on the reasons for such failures.

    On the existence of block-diagonal solutions to Lyapunov and H∞ Riccati inequalities

    In this note, we describe sufficient conditions under which block-diagonal solutions to Lyapunov and H∞ Riccati inequalities exist. In order to derive our results, we define a new type of comparison systems, which are positive and are computed using the state-space matrices of the original (possibly nonpositive) systems. Computing the comparison system involves only the calculation of H∞ norms of its subsystems. We show that the stability of this comparison system implies the existence of block-diagonal solutions to Lyapunov and Riccati inequalities. Furthermore, our proof is constructive and the overall framework allows the computation of block-diagonal solutions to these matrix inequalities with linear algebra and linear programming. Numerical examples illustrate our theoretical results.

    Block factor-width-two matrices and their applications to semidefinite and sum-of-squares optimization

    Semidefinite and sum-of-squares (SOS) optimization are fundamental computational tools in many areas, including linear and nonlinear systems theory. However, the scale of problems that can be addressed reliably and efficiently is still limited. In this paper, we introduce a new notion of block factor-width-two matrices and build a new hierarchy of inner and outer approximations of the cone of positive semidefinite (PSD) matrices. This notion is a block extension of the standard factor-width-two matrices, and allows for an improved inner-approximation of the PSD cone. In the context of SOS optimization, this leads to a block extension of the scaled diagonally dominant sum-of-squares (SDSOS) polynomials. By varying a matrix partition, the notion of block factor-width-two matrices can balance a trade-off between the computational scalability and solution quality for solving semidefinite and SOS optimization problems. Numerical experiments on a range of large-scale instances confirm our theoretical findings.

    Block-diagonal solutions to Lyapunov inequalities and generalisations of diagonal dominance

    Diagonally dominant matrices have many applications in systems and control theory. Linear dynamical systems with scaled diagonally dominant drift matrices, which include stable positive systems, allow for scalable stability analysis. For example, it is known that Lyapunov inequalities for this class of systems admit diagonal solutions. In this paper, we present an extension of scaled diagonal dominance to block-partitioned matrices. We show that our definition describes matrices admitting block-diagonal solutions to Lyapunov inequalities and that these solutions can be computed using linear algebraic tools. We also show how in some cases the Lyapunov inequalities can be decoupled into a set of lower-dimensional linear matrix inequalities, thus leading to improved scalability. We conclude by illustrating some advantages and limitations of our results with numerical examples.
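The known diagonal case mentioned in the abstract above can be sketched for a stable positive system, that is, one whose drift matrix A is Metzler and Hurwitz. The sketch relies on a standard fact from positive systems theory: if v > 0 and w > 0 satisfy A v < 0 and Aᵀ w < 0, then P = diag(wᵢ / vᵢ) satisfies Aᵀ P + P A < 0. It covers only this diagonal special case, not the block extension the paper proposes, and the function name and example matrix are hypothetical.

```python
import numpy as np


def diagonal_lyapunov_solution(A: np.ndarray) -> np.ndarray:
    """Return a diagonal P > 0 with A^T P + P A < 0 for a stable Metzler matrix A."""
    n = A.shape[0]
    off_diagonal = A - np.diag(np.diag(A))
    if np.any(off_diagonal < 0):
        raise ValueError("A must be Metzler (nonnegative off-diagonal entries)")
    if np.max(np.linalg.eigvals(A).real) >= 0:
        raise ValueError("A must be Hurwitz (eigenvalues in the open left half-plane)")
    ones = np.ones(n)
    v = np.linalg.solve(A, -ones)    # A v = -1 < 0, and v > 0 because -inv(A) >= 0
    w = np.linalg.solve(A.T, -ones)  # A^T w = -1 < 0, and w > 0 for the same reason
    return np.diag(w / v)


# Illustrative example matrix (an assumption, not taken from the paper).
A = np.array([[-2.0, 1.0], [0.5, -3.0]])
P = diagonal_lyapunov_solution(A)
Q = A.T @ P + P @ A
print(np.linalg.eigvalsh(Q))  # every eigenvalue should be negative
```

Only two linear solves are needed, with no semidefinite program, which is what makes stability analysis scalable for this class of systems.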

    Toggling a Genetic Switch Using Reinforcement Learning

    In this paper, we consider the problem of optimal exogenous control of gene regulatory networks. Our approach consists in adapting an established reinforcement learning algorithm called fitted Q iteration. This algorithm infers the control law directly from the measurements of the system's response to external control inputs without the use of a mathematical model of the system. The measurement data set can either be collected from wet-lab experiments or artificially created by computer simulations of dynamical models of the system. The algorithm is applicable to a wide range of biological systems due to its ability to deal with nonlinear and stochastic system dynamics. To illustrate the application of the algorithm to a gene regulatory network, the regulation of the toggle switch system is considered. The control objective of this problem is to drive the concentrations of two specific proteins to a target region in the state space.
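The batch structure of fitted Q iteration described in the abstract above can be sketched on a toy problem. Everything specific here is an assumption for illustration: a five-cell chain replaces the toggle switch, the `simulate` function replaces wet-lab or simulated measurements, and ordinary least squares on one-hot features replaces whatever regressor the paper uses.

```python
import numpy as np

N_STATES, TARGET, GAMMA = 5, 2, 0.9
MOVES = (-1, +1)  # action index 0 moves left, action index 1 moves right


def simulate(state: int, action: int) -> tuple[int, float]:
    """Toy stand-in for the measured system: a 1-D chain with a target cell."""
    next_state = min(max(state + MOVES[action], 0), N_STATES - 1)
    reward = 1.0 if next_state == TARGET else 0.0
    return next_state, reward


def features(state: int, action: int) -> np.ndarray:
    """One-hot encoding of the (state, action) pair."""
    phi = np.zeros(N_STATES * len(MOVES))
    phi[state * len(MOVES) + action] = 1.0
    return phi


# Batch of one-step transitions (s, a, r, s'): the only data the algorithm sees.
batch = []
for s in range(N_STATES):
    for a in range(len(MOVES)):
        s_next, r = simulate(s, a)
        batch.append((s, a, r, s_next))

X = np.array([features(s, a) for s, a, _, _ in batch])
theta = np.zeros(X.shape[1])  # parameters of Q_0, which is identically zero

for _ in range(50):
    q_table = theta.reshape(N_STATES, len(MOVES))
    # Regression targets: reward plus discounted best value at the next state.
    targets = np.array([r + GAMMA * q_table[s2].max() for _, _, r, s2 in batch])
    # Fit the next Q-function to the targets by least squares.
    theta = np.linalg.lstsq(X, targets, rcond=None)[0]

# Greedy control law: the best action index in each state.
policy = theta.reshape(N_STATES, len(MOVES)).argmax(axis=1)
```

The loop never calls `simulate` again after the batch is collected, which mirrors the model-free character of the method: the control law is inferred from recorded transitions alone. In this toy chain the resulting greedy policy moves toward the target cell from either side.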