Learning an Approximate Model Predictive Controller with Guarantees
A supervised learning framework is proposed to approximate a model predictive controller (MPC) with reduced computational complexity and with guarantees on stability and constraint satisfaction. The framework applies to a wide class of nonlinear systems, and any standard supervised learning technique (e.g., neural networks) can be employed to approximate the MPC from samples. To obtain closed-loop guarantees for the learned MPC, a robust MPC design is combined with statistical learning bounds: the MPC design ensures robustness to inaccurate inputs within given bounds, and Hoeffding's inequality is used to validate that the learned MPC satisfies these bounds with high confidence. The result is a closed-loop statistical guarantee on stability and constraint satisfaction for the learned MPC. The proposed learning-based MPC framework is illustrated on a nonlinear benchmark problem, for which we learn a neural network controller with guarantees.
Comment: 6 pages, 3 figures, to appear in IEEE Control Systems Letters
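The validation step is, in essence, a one-sided Hoeffding bound on the probability that the learned controller leaves the robustness tube of the MPC design. A minimal sketch of that check, assuming i.i.d. validation states and hypothetical arrays u_mpc / u_nn of exact and learned inputs (the error radius eta would come from the robust MPC design, not from this code):

```python
import numpy as np

def hoeffding_validation(u_mpc, u_nn, eta, delta=0.01):
    """Upper-bound the probability that the learned controller violates the
    robust MPC input-error bound eta, with confidence 1 - delta."""
    n = len(u_mpc)
    # Empirical violation rate: fraction of i.i.d. validation samples where
    # the learned input leaves the robustness tube of the MPC design.
    violations = np.linalg.norm(u_nn - u_mpc, axis=1) > eta
    p_hat = violations.mean()
    # Hoeffding for [0,1]-bounded indicators: P(p_true >= p_hat + t) <=
    # exp(-2 n t^2), so t = sqrt(ln(1/delta) / (2 n)) gives confidence 1-delta.
    t = np.sqrt(np.log(1.0 / delta) / (2.0 * n))
    return p_hat + t
```

If the returned bound is below the violation level tolerated by the robust design, the closed-loop guarantee holds with confidence 1 - delta; otherwise more training data or a richer function class is needed.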
Unconstrained receding-horizon control of nonlinear systems
It is well known that unconstrained infinite-horizon optimal control may be used to construct a stabilizing controller for a nonlinear system. We show that similar stabilization results may be achieved using unconstrained finite-horizon optimal control. The key idea is to approximate the tail of the infinite-horizon cost-to-go using, as terminal cost, an appropriate control Lyapunov function (CLF). Roughly speaking, the terminal CLF should provide an (incremental) upper bound on the cost. In this fashion, important stability characteristics may be retained without the use of terminal constraints such as those employed by a number of other researchers. The absence of constraints allows a significant speedup in computation. Furthermore, it is shown that in order to guarantee stability, it suffices to satisfy an improvement property, thereby relaxing the requirement that truly optimal trajectories be found. We provide a complete analysis of the stability and region of attraction/operation properties of receding-horizon control strategies that utilize finite-horizon approximations in the proposed class. It is shown that the guaranteed region of operation contains that of the CLF controller and may be made as large as desired by increasing the optimization horizon (restricted, of course, to the infinite-horizon domain). Moreover, it is easily seen that both the CLF and infinite-horizon optimal control approaches are limiting cases of our receding-horizon strategy. The key results are illustrated using a familiar example, the inverted pendulum, where significant improvements in guaranteed region of operation and cost are noted.
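A minimal sketch of the scheme on the pendulum example: an unconstrained finite-horizon problem whose terminal cost x^T P x stands in for the tail of the cost-to-go. The Euler discretization and the Q, R, P values below are illustrative assumptions; in the paper's setting, P must come from a genuine CLF that (incrementally) upper-bounds the cost-to-go:

```python
import numpy as np
from scipy.optimize import minimize

# Illustrative inverted-pendulum parameters (assumptions, not the paper's).
g, l, m, dt = 9.81, 1.0, 1.0, 0.05

def step(x, u):
    """Euler-discretized pendulum: x = (angle, angular rate)."""
    theta, omega = x
    return np.array([theta + dt * omega,
                     omega + dt * ((g / l) * np.sin(theta) + u / (m * l**2))])

Q, R = np.diag([10.0, 1.0]), 0.1
P = np.diag([50.0, 10.0])  # placeholder terminal CLF weight

def horizon_cost(u_seq, x0, N):
    x, J = x0, 0.0
    for k in range(N):
        J += x @ Q @ x + R * u_seq[k] ** 2  # stage cost
        x = step(x, u_seq[k])
    return J + x @ P @ x  # terminal CLF replaces the infinite-horizon tail

def rh_controller(x0, N=15):
    """Solve the unconstrained finite-horizon problem, apply the first input."""
    res = minimize(horizon_cost, np.zeros(N), args=(x0, N), method="BFGS")
    return res.x[0]
```

Because there are no terminal constraints, each receding-horizon step is an unconstrained smooth optimization, which is where the computational speedup comes from; increasing N enlarges the guaranteed region of operation, consistent with the analysis above.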
Stochastic Model Predictive Control via Fixed Structure Policies
In this work, the model predictive control problem is extended to include not only open-loop control sequences but also state-feedback control laws, by directly optimizing the parameters of a control policy. Additionally, continuous cost functions are developed that allow the control policy to be trained to make discrete decisions, which is typically done with model-free learning algorithms. This general control policy encompasses a wide class of functions and allows the optimization to occur both online and offline, while adding robustness to unmodelled dynamics and outside disturbances. General formulations for nonlinear discrete-time dynamics and abstract cost functions are given for both deterministic and stochastic problems. Analytical solutions are derived for linear cases and compared to existing theory, such as the classical linear quadratic regulator. It is shown that, under suitable assumptions, there exists a finite horizon over which a constant linear state-feedback control law will stabilize a nonlinear system around the origin. Several control policy architectures are used to regulate the cart-pole system in deterministic and stochastic settings, and neural network-based policies are trained to analyze and intercept bodies following stochastic projectile motion.
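A minimal sketch of the fixed-structure idea, not the paper's algorithm: rather than optimizing an open-loop input sequence, optimize the parameters K of a feedback law u = -Kx over Monte-Carlo rollouts of the stochastic dynamics. The linear system, weights, and derivative-free optimizer below are placeholder assumptions; for this linear-quadratic instance the result can be checked against the infinite-horizon LQR gain, mirroring the paper's comparison to classical theory:

```python
import numpy as np
from scipy.optimize import minimize

# Illustrative double-integrator dynamics and quadratic weights (assumptions).
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
Q, R = np.eye(2), np.array([[0.01]])

def rollout_cost(k_flat, x0, N=50, n_mc=20, noise=0.01):
    """Monte-Carlo estimate of the closed-loop cost under u = -K x.
    A fixed seed (common random numbers) keeps the objective
    deterministic across optimizer queries."""
    rng = np.random.default_rng(0)
    K = k_flat.reshape(1, 2)
    total = 0.0
    for _ in range(n_mc):
        x = x0.copy()
        for _ in range(N):
            u = -K @ x
            total += x @ Q @ x + u @ R @ u
            x = A @ x + B @ u + noise * rng.standard_normal(2)
    return total / n_mc

x0 = np.array([1.0, 0.0])
res = minimize(rollout_cost, np.zeros(2), args=(x0,), method="Nelder-Mead")
K_opt = res.x.reshape(1, 2)  # compare against the infinite-horizon LQR gain
```

The same recipe extends to richer parameterizations (e.g., a neural network policy) and to continuous surrogates for discrete decisions, since the optimizer only ever sees the scalar rollout cost.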
Learning a Structured Neural Network Policy for a Hopping Task
In this work we present a method for learning a reactive policy for a simple dynamic locomotion task involving hard impacts and switching contacts, where we assume the contact location and contact timing to be unknown. To learn such a policy, we use optimal control to optimize a local controller for a fixed environment and fixed contacts. We learn the contact-rich dynamics of our underactuated system along these trajectories in a sample-efficient manner. We then use the optimized policies to learn a reactive policy in the form of a neural network. Using a new neural network architecture, we preserve more information from the local policy and make the network's output interpretable: it can be read directly as desired trajectories, feedforward commands, and feedback gains. Extensive simulations demonstrate the robustness of the approach to changing environments, outperforming model-free policy-gradient methods on the same tasks in simulation. Finally, we show that the learned policy can be robustly transferred to a real robot.
Comment: IEEE Robotics and Automation Letters 201
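A minimal sketch of such a structured policy head in PyTorch: instead of emitting a raw command, the network predicts a desired state, a feedforward term, and a gain matrix, recombined as u = u_ff + K (x_des - x), which is what makes each output interpretable. The layer sizes and single hidden layer are illustrative assumptions, not the paper's exact architecture:

```python
import torch
import torch.nn as nn

class StructuredPolicy(nn.Module):
    """Structured policy head: u = u_ff + K (x_des - x)."""
    def __init__(self, x_dim, u_dim, hidden=64):
        super().__init__()
        self.backbone = nn.Sequential(nn.Linear(x_dim, hidden), nn.Tanh())
        self.x_des = nn.Linear(hidden, x_dim)          # desired state
        self.u_ff = nn.Linear(hidden, u_dim)           # feedforward command
        self.gains = nn.Linear(hidden, u_dim * x_dim)  # feedback gain matrix

    def forward(self, x):
        h = self.backbone(x)
        K = self.gains(h).view(-1, self.u_ff.out_features,
                               self.x_des.out_features)
        err = (self.x_des(h) - x).unsqueeze(-1)        # tracking error
        return self.u_ff(h) + (K @ err).squeeze(-1)    # u = u_ff + K (x_des - x)
```

Imitating the optimized local controllers then regresses all three heads jointly, and after training the predicted trajectories, feedforward terms, and gains can be inspected separately, which is the interpretability property the abstract describes.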