Chance-Constrained Trajectory Optimization for Safe Exploration and Learning of Nonlinear Systems
Learning-based control algorithms require data collection with abundant
supervision for training. Safe exploration algorithms ensure the safety of this
data collection process even when only partial knowledge is available. We
present a new approach for optimal motion planning with safe exploration that
integrates chance-constrained stochastic optimal control with dynamics learning
and feedback control. We derive an iterative convex optimization algorithm that
solves an \underline{Info}rmation-cost \underline{S}tochastic
\underline{N}onlinear \underline{O}ptimal \underline{C}ontrol problem
(Info-SNOC). The optimization objective encodes both optimal performance and
exploration for learning, and the safety is incorporated as distributionally
robust chance constraints. The dynamics are predicted from a robust regression
model that is learned from data. The Info-SNOC algorithm is used to compute a
sub-optimal pool of safe motion plans that aid in exploration for learning
unknown residual dynamics under safety constraints. A stable feedback
controller is used to execute the motion plan and collect data for model
learning. We prove the safety of rollout from our exploration method and
reduction in uncertainty over epochs, thereby guaranteeing the consistency of
our learning method. We validate the effectiveness of Info-SNOC by designing
and implementing a pool of safe trajectories for a planar robot. We demonstrate
that our approach has a higher success rate in ensuring safety when compared to a
deterministic trajectory optimization approach.
Comment: Submitted to RA-L 2020, review-
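As a sketch of the chance-constraint tightening this abstract refers to: a linear chance constraint P(a^T x <= b) >= 1 - eps on a Gaussian state x ~ N(mu, Sigma) can be replaced by a deterministic convex surrogate. This is a minimal illustration, not the paper's Info-SNOC formulation; the names `a`, `mu`, `Sigma`, `eps` are illustrative, and the Gaussian quantile is used here, whereas a distributionally robust version (as in the paper) would typically use a larger, distribution-free multiplier such as sqrt((1 - eps) / eps).

```python
import numpy as np
from scipy.stats import norm

def tightened_lhs(a, mu, Sigma, eps):
    """Deterministic surrogate for P(a^T x <= b) >= 1 - eps, x ~ N(mu, Sigma):
    the constraint holds iff a^T mu + z * sqrt(a^T Sigma a) <= b,
    with z the (1 - eps) Gaussian quantile."""
    z = norm.ppf(1.0 - eps)
    return a @ mu + z * np.sqrt(a @ Sigma @ a)

# Monte Carlo sanity check of the tightening at eps = 0.05
rng = np.random.default_rng(0)
a, mu, Sigma = np.array([1.0, 0.0]), np.zeros(2), np.eye(2)
b = tightened_lhs(a, mu, Sigma, 0.05)          # smallest feasible b
x = rng.multivariate_normal(mu, Sigma, size=200_000)
frac_satisfied = np.mean(x @ a <= b)            # should be close to 0.95
```

Because the surrogate is convex in the decision variables (here folded into `mu`), it can be embedded in the iterative convex optimization the abstract describes.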
Data-Efficient Reinforcement Learning with Probabilistic Model Predictive Control
Trial-and-error based reinforcement learning (RL) has seen rapid advancements
in recent times, especially with the advent of deep neural networks. However,
the majority of autonomous RL algorithms require a large number of interactions
with the environment. A large number of interactions may be impractical in many
real-world applications, such as robotics, and many practical systems have to
obey limitations in the form of state space or control constraints. To reduce
the number of system interactions while simultaneously handling constraints, we
propose a model-based RL framework based on probabilistic Model Predictive
Control (MPC). In particular, we propose to learn a probabilistic transition
model using Gaussian Processes (GPs) to incorporate model uncertainty into
long-term predictions, thereby, reducing the impact of model errors. We then
use MPC to find a control sequence that minimises the expected long-term cost.
We provide theoretical guarantees for first-order optimality in the GP-based
transition models with deterministic approximate inference for long-term
planning. We demonstrate that our approach not only achieves
state-of-the-art data efficiency, but is also a principled way to apply RL in
constrained environments.
Comment: Accepted at AISTATS 2018
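The probabilistic transition model the abstract describes can be sketched with a plain Gaussian process regression: posterior mean and variance of the next state given observed transitions. This is a minimal, generic GP sketch (squared-exponential kernel, fixed hyperparameters), not the paper's full MPC pipeline; the function names and hyperparameter values are illustrative.

```python
import numpy as np

def rbf_kernel(A, B, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel between the rows of A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return variance * np.exp(-0.5 * d2 / lengthscale**2)

def gp_predict(X, y, X_star, noise=1e-2):
    """GP posterior mean and variance at X_star, given data (X, y).

    The variance is what lets MPC propagate model uncertainty into
    long-term predictions instead of trusting a point estimate."""
    K = rbf_kernel(X, X) + noise * np.eye(len(X))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    K_s = rbf_kernel(X_star, X)
    mean = K_s @ alpha
    v = np.linalg.solve(L, K_s.T)
    var = rbf_kernel(X_star, X_star).diagonal() - (v**2).sum(0)
    return mean, var

# Toy "dynamics": learn y = sin(x) from 30 samples, query one new input
X = np.linspace(-3.0, 3.0, 30).reshape(-1, 1)
y = np.sin(X).ravel()
mean, var = gp_predict(X, y, np.array([[0.5]]))
```

In the model-based RL loop, `X` would hold state-action pairs and `y` the observed state differences, with one GP per output dimension.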
Sparse Wide-Area Control of Power Systems using Data-driven Reinforcement Learning
In this paper we present an online wide-area oscillation damping control
(WAC) design for uncertain models of power systems using ideas from
reinforcement learning. We assume that the exact small-signal model of the
power system at the onset of a contingency is not known to the operator and use
the nominal model and online measurements of the generator states and control
inputs to rapidly converge to a state-feedback controller that minimizes a
given quadratic energy cost. However, unlike conventional linear quadratic
regulators (LQR), we intend our controller to be sparse, so its implementation
reduces the communication costs. We, therefore, employ the gradient support
pursuit (GraSP) optimization algorithm to impose sparsity constraints on the
control gain matrix during learning. The sparse controller is thereafter
implemented using distributed communication. Using the IEEE 39-bus power system
model with 1149 unknown parameters, it is demonstrated that the proposed
learning method provides reliable LQR performance while the controller matched
to the nominal model becomes unstable for severely uncertain systems.
Comment: Submitted to IEEE ACC 2019. 8 pages, 4 figures
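The core idea of sparsity-constrained LQR can be sketched in two pieces: compute a quadratic-cost feedback gain, then project it onto a sparse support. The hard-thresholding step below is a crude stand-in for GraSP's sparsity projection, not the paper's learning algorithm, and the plant matrices are toy values.

```python
import numpy as np

def lqr_gain(A, B, Q, R, iters=500):
    """Discrete-time LQR gain via fixed-point (value) iteration
    on the Riccati equation: returns K with u = -K x."""
    P = Q.copy()
    for _ in range(iters):
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ (A - B @ K)
    return K

def hard_threshold(K, s):
    """Keep the s largest-magnitude entries of K, zeroing the rest
    (a simple proxy for GraSP's support identification)."""
    flat = np.abs(K).ravel()
    if s >= flat.size:
        return K
    cutoff = np.sort(flat)[-s]
    return np.where(np.abs(K) >= cutoff, K, 0.0)

# Toy 2-state system: dense gain, then a 2-entry sparse version
A = np.array([[0.9, 0.1], [0.0, 0.8]])
B, Q, R = np.eye(2), np.eye(2), np.eye(2)
K = lqr_gain(A, B, Q, R)
K_sparse = hard_threshold(K, 2)   # fewer nonzeros = fewer comm links
```

In the wide-area control setting, each zeroed entry of the gain matrix removes one generator-to-controller communication link, which is the cost the sparsity constraint targets.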
Design of interpolative sigma delta modulators via a semi-infinite programming approach
This paper considers the design of interpolative sigma delta modulators (SDMs). The design problem is formulated as two optimization problems. The first determines the denominator coefficients: the objective is to minimize the energy of the error function in the passband of the loop filter, where the error function reflects the noise output transfer function and the ripple of the input-output transfer function, and the constraint is the specification of the error function defined in the frequency domain. The second determines the numerator coefficients: the cost function minimizes the stopband ripple energy of the loop filter subject to the stability condition of the noise output and input-output transfer functions. These two problems are quadratic semi-infinite programming (SIP) problems. By employing our recently proposed dual parameterization method, global optimal solutions that satisfy the corresponding continuous constraints are guaranteed if solutions exist. The advantages of this formulation are the guaranteed stability of the noise output and input-output transfer functions, the applicability to rational IIR filters without imposing specific filter structures such as Laguerre or Butterworth structures, and the avoidance of iterating between the numerator and denominator designs, whose convergence is not guaranteed. Our simulation results show that the proposed design yields a significant improvement in the signal-to-noise ratio (SNR) compared to existing designs.
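The structure of a quadratic SIP filter-design problem can be illustrated by discretizing the continuous frequency constraint on a dense grid and solving the resulting finite quadratic program. This is a toy zero-phase FIR example solved by grid discretization, not the paper's dual parameterization method or its SDM loop-filter design; all band edges, ripple bounds, and the filter order are illustrative.

```python
import numpy as np
from scipy.optimize import minimize

# Zero-phase FIR response: H(w) = sum_k h[k] * cos(k w), k = 0..M
M = 10
wp, ws = 0.3 * np.pi, 0.5 * np.pi            # passband / stopband edges

def basis(w):
    return np.cos(np.outer(w, np.arange(M + 1)))

Cp = basis(np.linspace(0.0, wp, 40))          # passband constraint grid
Cs = basis(np.linspace(ws, np.pi, 60))        # stopband objective grid
G = Cs.T @ Cs                                 # stopband energy = h^T G h

# Continuous constraint 0.9 <= H(w) <= 1.1 on the passband,
# enforced at the grid points (the discretized SIP constraint)
cons = [
    {"type": "ineq", "fun": lambda h: 1.1 - Cp @ h},
    {"type": "ineq", "fun": lambda h: Cp @ h - 0.9},
]
h0 = np.zeros(M + 1)
h0[0] = 1.0                                   # feasible start: H(w) == 1
res = minimize(lambda h: h @ G @ h, h0, method="SLSQP", constraints=cons)
```

A true SIP method (such as the dual parameterization the abstract refers to) handles the constraint over the whole continuum rather than a finite grid, which is what yields the stated global-optimality guarantee.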
Approximating a similarity matrix by a latent class model: A reappraisal of additive fuzzy clustering
Let Q be a given n×n square symmetric matrix of nonnegative elements between 0 and 1 (similarities). Fuzzy clustering results in a fuzzy assignment of individuals to K clusters. In additive fuzzy clustering, the n×K fuzzy membership matrix P is found by least-squares approximation of the off-diagonal elements of Q by inner products of rows of P. By contrast, kernelized fuzzy c-means is not least-squares based and requires an additional fuzziness parameter. The aim is to popularize additive fuzzy clustering by interpreting it as a latent class model, whereby the elements of Q are modeled as the probability that two individuals share the same class on the basis of the assignment probability matrix P. Two new algorithms are provided: a brute-force genetic algorithm (differential evolution) and an iterative row-wise quadratic programming algorithm, of which the latter is the more effective. Simulations showed that (1) the method usually has a unique solution, except in special cases, (2) both algorithms reached this solution from random restarts, and (3) the number of clusters can be well estimated by AIC. Additive fuzzy clustering is computationally efficient and combines attractive features of both the vector model and the cluster model.
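The least-squares fit the abstract describes, approximating the off-diagonal of Q by P P^T with row-stochastic P, can be sketched with projected gradient descent. This is a simpler stand-in for the paper's row-wise quadratic programming algorithm, with illustrative step size and iteration count.

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of v onto the probability simplex."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1.0
    idx = np.arange(1, len(v) + 1)
    rho = np.nonzero(u - css / idx > 0)[0][-1]
    return np.maximum(v - css[rho] / (rho + 1.0), 0.0)

def fit_memberships(Q, K, steps=5000, lr=0.01, seed=0):
    """Minimize sum_{i != j} (p_i . p_j - q_ij)^2 over row-stochastic P
    by projected gradient descent."""
    rng = np.random.default_rng(seed)
    n = Q.shape[0]
    P = rng.dirichlet(np.ones(K), size=n)     # random row-stochastic start
    mask = ~np.eye(n, dtype=bool)             # fit off-diagonal entries only
    for _ in range(steps):
        E = (P @ P.T - Q) * mask              # masked residual (symmetric)
        P = P - lr * 4.0 * E @ P              # gradient of ||E||_F^2 w.r.t. P
        P = np.apply_along_axis(project_simplex, 1, P)
    return P

# Toy similarity matrix from two crisp clusters of three individuals each
labels = np.array([0, 0, 0, 1, 1, 1])
Q = (labels[:, None] == labels[None, :]).astype(float)
P = fit_memberships(Q, K=2)
```

Each row of the recovered P is a probability distribution over the K latent classes, matching the latent class reading of the model: q_ij is fit by the probability that individuals i and j draw the same class.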