    Chance-Constrained Trajectory Optimization for Safe Exploration and Learning of Nonlinear Systems

    Learning-based control algorithms require data collection with abundant supervision for training. Safe exploration algorithms ensure the safety of this data collection process even when only partial knowledge is available. We present a new approach for optimal motion planning with safe exploration that integrates chance-constrained stochastic optimal control with dynamics learning and feedback control. We derive an iterative convex optimization algorithm that solves an Information-cost Stochastic Nonlinear Optimal Control problem (Info-SNOC). The optimization objective encodes both optimal performance and exploration for learning, and safety is incorporated as distributionally robust chance constraints. The dynamics are predicted from a robust regression model that is learned from data. The Info-SNOC algorithm is used to compute a sub-optimal pool of safe motion plans that aid in exploration for learning unknown residual dynamics under safety constraints. A stable feedback controller is used to execute the motion plan and collect data for model learning. We prove the safety of rollouts from our exploration method and the reduction in uncertainty over epochs, thereby guaranteeing the consistency of our learning method. We validate the effectiveness of Info-SNOC by designing and implementing a pool of safe trajectories for a planar robot. We demonstrate that our approach has a higher success rate in ensuring safety than a deterministic trajectory optimization approach. Comment: Submitted to RA-L 2020, review-
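The abstract does not state the exact form of its distributionally robust chance constraints. As an illustration only, a common way to make a linear safety constraint a^T x <= b hold with probability at least 1 - eps for every distribution of x with a given mean and covariance is the Cantelli-type tightening a^T mu + sqrt((1 - eps)/eps) * sqrt(a^T Sigma a) <= b. The minimal sketch below assumes that linear form; the function name `dr_chance_constraint_holds` is hypothetical and this is not presented as the Info-SNOC formulation.

```python
import numpy as np


def dr_chance_constraint_holds(a, b, mean, cov, eps):
    """Check P(a^T x <= b) >= 1 - eps for ALL distributions of x with the
    given mean and covariance, using the Cantelli-type deterministic surrogate
    a^T mu + sqrt((1 - eps) / eps) * sqrt(a^T Sigma a) <= b.
    Illustrative sketch: assumes a linear constraint on the state x."""
    if not 0.0 < eps < 1.0:
        raise ValueError("eps must lie strictly between 0 and 1")
    a = np.asarray(a, dtype=float)
    mean = np.asarray(mean, dtype=float)
    cov = np.asarray(cov, dtype=float)
    # Safety margin grows with the state uncertainty along direction a.
    margin = np.sqrt((1.0 - eps) / eps) * np.sqrt(a @ cov @ a)
    return bool(a @ mean + margin <= b)
```

For example, with a = [1, 0], b = 2, zero mean and eps = 0.1, a covariance of 0.1·I passes (margin about 0.95) while the identity covariance fails (margin 3). The tightening is conservative because it must cover every distribution with those two moments, not just a Gaussian.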

    Data-Efficient Reinforcement Learning with Probabilistic Model Predictive Control

    Trial-and-error-based reinforcement learning (RL) has seen rapid advancements in recent times, especially with the advent of deep neural networks. However, the majority of autonomous RL algorithms require a large number of interactions with the environment. A large number of interactions may be impractical in many real-world applications, such as robotics, and many practical systems have to obey limitations in the form of state space or control constraints. To reduce the number of system interactions while simultaneously handling constraints, we propose a model-based RL framework based on probabilistic Model Predictive Control (MPC). In particular, we propose to learn a probabilistic transition model using Gaussian Processes (GPs) to incorporate model uncertainty into long-term predictions, thereby reducing the impact of model errors. We then use MPC to find a control sequence that minimises the expected long-term cost. We provide theoretical guarantees for first-order optimality in the GP-based transition models with deterministic approximate inference for long-term planning. We demonstrate that our approach not only achieves state-of-the-art data efficiency but also provides a principled approach to RL in constrained environments. Comment: Accepted at AISTATS 2018
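The GP transition model gives, for each query input, a predictive mean and a predictive variance, and the variance is what carries model uncertainty into planning. The sketch below computes standard one-step GP regression posteriors with a squared-exponential kernel. The kernel choice, hyperparameter values and function names are assumptions for illustration; the multi-step deterministic approximate inference used for long-term planning in the paper is not shown.

```python
import numpy as np


def rbf_kernel(X1, X2, lengthscale=1.0, variance=1.0):
    # Squared-exponential kernel k(x, x') = s^2 * exp(-||x - x'||^2 / (2 l^2)).
    sq = np.sum(X1**2, 1)[:, None] + np.sum(X2**2, 1)[None, :] - 2.0 * X1 @ X2.T
    return variance * np.exp(-0.5 * np.maximum(sq, 0.0) / lengthscale**2)


def gp_predict(X, y, Xs, noise=1e-2, lengthscale=1.0, variance=1.0):
    """Posterior mean and variance of a zero-mean GP at test inputs Xs,
    given training inputs X (n x d) and targets y (n,)."""
    K = rbf_kernel(X, X, lengthscale, variance) + noise * np.eye(len(X))
    L = np.linalg.cholesky(K)  # K = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))  # K^{-1} y
    Ks = rbf_kernel(X, Xs, lengthscale, variance)
    mean = Ks.T @ alpha
    v = np.linalg.solve(L, Ks)
    var = variance - np.sum(v**2, axis=0)  # prior variance minus explained part
    return mean, var
```

Trained on noisy-free samples of sin(x) on [-2, 2], the predictive variance is small inside the data and returns to the prior variance far outside it, which is the behaviour an uncertainty-aware planner exploits.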

    Sparse Wide-Area Control of Power Systems using Data-driven Reinforcement Learning

    In this paper, we present an online wide-area oscillation damping control (WAC) design for uncertain models of power systems using ideas from reinforcement learning. We assume that the exact small-signal model of the power system at the onset of a contingency is not known to the operator, and we use the nominal model and online measurements of the generator states and control inputs to rapidly converge to a state-feedback controller that minimizes a given quadratic energy cost. However, unlike conventional linear quadratic regulators (LQR), we intend our controller to be sparse, so that its implementation reduces the communication costs. We therefore employ the gradient support pursuit (GraSP) optimization algorithm to impose sparsity constraints on the control gain matrix during learning. The sparse controller is thereafter implemented using distributed communication. Using the IEEE 39-bus power system model with 1149 unknown parameters, we demonstrate that the proposed learning method provides reliable LQR performance, while the controller matched to the nominal model becomes unstable for severely uncertain systems. Comment: Submitted to IEEE ACC 2019. 8 pages, 4 figures
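GraSP enforces an s-sparse gain matrix by alternating a support-identification step and a pruning step. The sketch below shows only those two combinatorial steps on a gain matrix K, following the generic GraSP recipe (merge the current support with the 2s largest-magnitude gradient entries, then keep the s largest entries). The restricted minimization of the LQR cost between these steps, and how the paper computes the gradient from measurements, are not shown; the function names are hypothetical.

```python
import numpy as np


def grasp_candidate_support(K, grad, s):
    """Flat indices of the current support of K united with the
    2s largest-magnitude entries of the cost gradient."""
    g = np.abs(np.ravel(grad))
    top = np.argsort(g)[-2 * s:] if s > 0 else np.array([], dtype=int)
    current = np.flatnonzero(np.ravel(K))
    return np.union1d(top, current)  # sorted, unique indices


def prune_to_sparsity(K, s):
    """Keep the s largest-magnitude entries of the gain matrix K, zero the rest.
    Each zeroed entry removes one feedback (communication) link."""
    K = np.asarray(K, dtype=float)
    flat = K.ravel()
    keep = np.argsort(np.abs(flat))[-s:] if s > 0 else np.array([], dtype=int)
    out = np.zeros_like(flat)
    out[keep] = flat[keep]
    return out.reshape(K.shape)
```

The `s > 0` guard matters because a slice such as `[-0:]` would select every index instead of none.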

    Design of interpolative sigma delta modulators via a semi-infinite programming approach

    This paper considers the design of interpolative sigma delta modulators (SDMs). The design problem is formulated as two different optimization problems. The first optimization problem determines the denominator coefficients. Its objective is to minimize the energy of the error function in the passband of the loop filter, where the error function reflects the noise output transfer function and the ripple of the input output transfer function. Its constraint is the specification of the error function defined in the frequency domain. The second optimization problem determines the numerator coefficients; its cost function is the stopband ripple energy of the loop filter, minimized subject to the stability condition of the noise output and input output transfer functions. Both optimization problems are quadratic semi-infinite programming (SIP) problems. By employing our recently proposed dual parameterization method for solving them, globally optimal solutions that satisfy the corresponding continuous constraints are guaranteed, if such solutions exist. The advantages of this formulation are the guaranteed stability of the noise output and input output transfer functions, its applicability to the design of rational IIR filters without imposing specific filter structures such as the Laguerre and Butterworth structures, and the avoidance of iteratively designing the numerator and denominator coefficients, since the convergence of such iterative designs is not guaranteed. Our simulation results show that the proposed design yields a significant improvement in the signal-to-noise ratio (SNR) compared to existing designs.
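A constraint is "semi-infinite" here because it must hold at every frequency in a continuous band. The sketch below evaluates a rational transfer function H(e^{jw}) = sum_k num[k] e^{-jwk} / sum_k den[k] e^{-jwk} and measures how far its magnitude exceeds a bound on a finite frequency grid. This grid discretization is a simpler, different technique from the authors' dual parameterization method: it only checks the constraint at the grid points. The coefficient ordering and function names are assumptions.

```python
import numpy as np


def freq_response(num, den, w):
    """H(e^{jw}) for coefficients ordered by increasing power of z^{-1}."""
    zinv = np.exp(-1j * np.asarray(w, dtype=float))
    # np.polyval expects the highest power first, so reverse the coefficients.
    N = np.polyval(np.asarray(num)[::-1], zinv)
    D = np.polyval(np.asarray(den)[::-1], zinv)
    return N / D


def max_violation(num, den, w_grid, bound):
    """Largest amount by which |H| exceeds `bound` on the grid;
    a value <= 0 means the discretized constraint is satisfied."""
    return float(np.max(np.abs(freq_response(num, den, w_grid)) - bound))
```

For H(z) = 1 / (1 - 0.5 z^{-1}) on a grid over [0, pi], the peak magnitude is 2 at w = 0, so a bound of 2 is met exactly and a bound of 1.5 is violated by 0.5.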

    Approximating a similarity matrix by a latent class model: A reappraisal of additive fuzzy clustering

    Let Q be a given n×n symmetric matrix of similarities whose elements are nonnegative and lie between 0 and 1. Fuzzy clustering results in a fuzzy assignment of individuals to K clusters. In additive fuzzy clustering, the n×K fuzzy membership matrix P is found by least-squares approximation of the off-diagonal elements of Q by inner products of rows of P. By contrast, kernelized fuzzy c-means is not a least-squares method and requires an additional fuzziness parameter. The aim is to popularize additive fuzzy clustering by interpreting it as a latent class model, whereby the elements of Q are modeled as the probability that two individuals share the same class on the basis of the assignment probability matrix P. Two new algorithms are provided: a brute-force genetic algorithm (differential evolution) and an iterative row-wise quadratic programming algorithm, of which the latter is more effective. Simulations showed that (1) the method usually has a unique solution, except in special cases, (2) both algorithms reached this solution from random restarts and (3) the number of clusters can be well estimated by AIC. Additive fuzzy clustering is computationally efficient and combines attractive features of both the vector model and the cluster model.
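Under the latent class reading, row i of P holds the probabilities that individual i belongs to each of the K classes, so the inner product of rows i and j is the probability that the two individuals share a class. The minimal sketch below evaluates the least-squares criterion over the off-diagonal elements of Q. The function name is hypothetical, and neither the differential evolution nor the row-wise quadratic programming algorithm from the abstract is shown.

```python
import numpy as np


def additive_fuzzy_loss(Q, P):
    """Sum over i != j of (q_ij - p_i . p_j)^2, where each row of P is a
    probability vector (nonnegative, summing to 1)."""
    Q = np.asarray(Q, dtype=float)
    P = np.asarray(P, dtype=float)
    if np.any(P < 0) or not np.allclose(P.sum(axis=1), 1.0):
        raise ValueError("rows of P must be probability vectors")
    fitted = P @ P.T  # entry (i, j): probability that i and j share a class
    off_diag = ~np.eye(len(Q), dtype=bool)  # the diagonal of Q is not fitted
    return float(np.sum((Q - fitted)[off_diag] ** 2))
```

A crisp partition reproduces a block 0/1 similarity matrix with zero loss, while two individuals each split 50/50 between two classes can only reach a fitted similarity of 0.5.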