Safe Controller Optimization for Quadrotors with Gaussian Processes
One of the most fundamental problems when designing controllers for dynamic
systems is the tuning of the controller parameters. Typically, a model of the
system is used to obtain an initial controller, but ultimately the controller
parameters must be tuned manually on the real system to achieve the best
performance. To avoid this manual tuning step, methods from machine learning,
such as Bayesian optimization, have been used. However, as these methods
evaluate different controller parameters on the real system, safety-critical
system failures may happen. In this paper, we overcome this problem by
applying, for the first time, a recently developed safe optimization algorithm,
SafeOpt, to the problem of automatic controller parameter tuning. Given an
initial, low-performance controller, SafeOpt automatically optimizes the
parameters of a control law while guaranteeing safety. It models the underlying
performance measure as a Gaussian process and only explores new controller
parameters whose performance lies above a safe performance threshold with high
probability. Experimental results on a quadrotor vehicle indicate that the
proposed method enables fast, automatic, and safe optimization of controller
parameters without human intervention.
Comment: IEEE International Conference on Robotics and Automation, 2016. 6 pages, 4 figures. A video of the experiments can be found at http://tiny.cc/icra16_video and a Python implementation of the algorithm is available at https://github.com/befelix/SafeOp
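The safe-exploration rule this abstract describes, evaluating only those parameters whose GP lower confidence bound clears the safety threshold, can be sketched in a few lines of NumPy. The toy performance function, threshold, and constants below are invented for illustration; this is a minimal sketch of the idea, not the released SafeOpt package linked above.

```python
import numpy as np

def rbf(a, b, lengthscale=0.2, variance=1.0):
    """Squared-exponential kernel between two sets of 1-D points."""
    d = a[:, None] - b[None, :]
    return variance * np.exp(-0.5 * (d / lengthscale) ** 2)

def gp_posterior(X, y, Xs, noise=1e-4):
    """GP posterior mean and standard deviation at test points Xs."""
    K = rbf(X, X) + noise * np.eye(len(X))
    L = np.linalg.cholesky(K)
    Ks = rbf(X, Xs)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    mu = Ks.T @ alpha
    v = np.linalg.solve(L, Ks)
    var = np.diag(rbf(Xs, Xs)) - np.sum(v ** 2, axis=0)
    return mu, np.sqrt(np.maximum(var, 0.0))

# Hypothetical performance measure; in the paper this is measured on the quadrotor.
def performance(theta):
    return -(theta - 0.6) ** 2 + 0.3 + 0.01 * np.random.randn()

candidates = np.linspace(0.0, 1.0, 200)  # candidate controller gains
J_min = 0.0   # safe performance threshold
beta = 2.0    # confidence-interval scaling

# Start from a known-safe, low-performance parameter.
X = np.array([0.3]); y = np.array([performance(0.3)])
for _ in range(20):
    mu, sd = gp_posterior(X, y, candidates)
    safe = mu - beta * sd > J_min            # safe set: lower bound above threshold
    if not safe.any():
        break
    # Among safe candidates, evaluate the most uncertain one (simple exploration rule).
    idx = np.where(safe)[0][np.argmax(sd[safe])]
    theta = candidates[idx]
    X = np.append(X, theta); y = np.append(y, performance(theta))
```

Starting from the safe seed parameter, the safe set grows outward as measurements shrink the GP's uncertainty, which is the qualitative behavior the abstract describes.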
No-Regret Bayesian Optimization with Unknown Hyperparameters
Bayesian optimization (BO) based on Gaussian process models is a powerful
paradigm to optimize black-box functions that are expensive to evaluate. While
several BO algorithms provably converge to the global optimum of the unknown
function, they assume that the hyperparameters of the kernel are known in
advance. This is not the case in practice and misspecification often causes
these algorithms to converge to poor local optima. In this paper, we present
the first BO algorithm that is provably no-regret and converges to the optimum
without knowledge of the hyperparameters. During optimization we slowly adapt
the hyperparameters of stationary kernels and thereby expand the associated
function class over time, so that the BO algorithm considers more complex
function candidates. Based on the theoretical insights, we propose several
practical algorithms that achieve the empirical sample efficiency of BO with
online hyperparameter estimation, but retain theoretical convergence
guarantees. We evaluate our method on several benchmark problems.
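The mechanism described here, slowly expanding the function class by adapting stationary-kernel hyperparameters while running a standard acquisition rule, can be illustrated with a GP-UCB loop whose lengthscale shrinks on a fixed schedule. The objective, schedule, and constants below are illustrative assumptions, not the paper's algorithm or its proven schedule.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def f(x):  # hypothetical expensive black-box objective
    return np.sin(3 * x) + 0.5 * np.cos(7 * x)

grid = np.linspace(0, 1, 300).reshape(-1, 1)
X = np.array([[0.5]]); y = f(X).ravel()

lengthscale = 1.0   # deliberately start too smooth (misspecified)
shrink = 0.9        # each round, shrinking the lengthscale enlarges the function class
beta = 2.0

for t in range(25):
    gp = GaussianProcessRegressor(kernel=RBF(length_scale=lengthscale),
                                  alpha=1e-4, optimizer=None)  # hypers fixed this round
    gp.fit(X, y)
    mu, sd = gp.predict(grid, return_std=True)
    x_next = grid[np.argmax(mu + beta * sd)]   # GP-UCB acquisition
    X = np.vstack([X, [x_next]]); y = np.append(y, f(x_next))
    lengthscale *= shrink
```

The key design choice mirrors the abstract: rather than trusting a possibly misspecified lengthscale, the loop starts with an overly smooth model and gradually admits more complex function candidates, so a too-restrictive initial guess cannot trap the search in a poor local optimum.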
Projected Off-Policy Q-Learning (POP-QL) for Stabilizing Offline Reinforcement Learning
A key problem in off-policy Reinforcement Learning (RL) is the mismatch, or
distribution shift, between the dataset and the distribution over states and
actions visited by the learned policy. This problem is exacerbated in the fully
offline setting. The main approach to correct this shift has been through
importance sampling, which leads to high-variance gradients. Other approaches,
such as conservatism or behavior-regularization, regularize the policy at the
cost of performance. In this paper, we propose a new approach for stable
off-policy Q-Learning. Our method, Projected Off-Policy Q-Learning (POP-QL), is
a novel actor-critic algorithm that simultaneously reweights off-policy samples
and constrains the policy to prevent divergence and reduce value-approximation
error. In our experiments, POP-QL not only shows competitive performance on
standard benchmarks, but also outperforms competing methods in tasks where the data-collection policy is significantly sub-optimal.
Comment: 10 pages
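The abstract names two ingredients: reweighting off-policy samples and constraining the policy. The tabular toy below shows the simplest versions of both, plain importance ratios and mixing the greedy policy with the behavior policy; it illustrates those two ideas on invented dynamics and is not the POP-QL projection itself.

```python
import numpy as np

n_states, n_actions, gamma = 5, 2, 0.9
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # transition kernel
R = rng.uniform(size=(n_states, n_actions))                       # reward table

# Collect a fixed offline dataset under a uniform data-collection policy.
behavior = np.full((n_states, n_actions), 1.0 / n_actions)
dataset, s = [], 0
for _ in range(2000):
    a = rng.choice(n_actions, p=behavior[s])
    s2 = rng.choice(n_states, p=P[s, a])
    dataset.append((s, a, R[s, a], s2))
    s = s2

Q = np.zeros((n_states, n_actions))
eps = 0.2  # policy constraint: stay eps-close to the behavior policy
for _ in range(50):
    greedy = np.eye(n_actions)[Q.argmax(axis=1)]
    pi = (1 - eps) * greedy + eps * behavior      # constrained policy
    for (s, a, r, s2) in dataset:
        w = pi[s, a] / behavior[s, a]             # off-policy sample reweighting
        target = r + gamma * pi[s2] @ Q[s2]
        Q[s, a] += 0.05 * w * (target - Q[s, a])
```

As the abstract notes, raw importance ratios like `w` are exactly what causes high variance in deep off-policy settings; POP-QL's contribution is choosing the reweighting and the policy constraint jointly, rather than using these naive versions.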
Safe and Robust Learning Control with Gaussian Processes
This paper introduces a learning-based robust control algorithm that provides robust stability and performance guarantees during learning. The approach uses Gaussian process (GP) regression based on data gathered during operation to update an initial model of the system and to gradually decrease the uncertainty related to this model. Embedding this data-based update scheme in a robust control framework guarantees stability during the learning process. Traditional robust control approaches have not previously considered online adaptation of the model and its uncertainty; as a result, their controllers do not improve performance during operation. Typical machine learning algorithms that achieve similar high-performance behavior by adapting the model and controller online do not provide the guarantees presented in this paper. In particular, this paper considers a stabilization task, linearizes the nonlinear, GP-based model around a desired operating point, and solves a convex optimization problem to obtain a linear robust controller. The resulting performance improvements due to the learning-based controller are demonstrated in experiments on a quadrotor vehicle.
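A compact sketch of the pipeline this abstract outlines: fit a GP to the residual between a nominal model and observed dynamics, linearize the combined model around the operating point, and synthesize a linear controller. For brevity the toy below substitutes a discrete-time LQR solve for the paper's convex robust-synthesis step, and the scalar plant, prior model, and constants are invented for illustration.

```python
import numpy as np
from scipy.linalg import solve_discrete_are
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

# True (unknown) scalar dynamics and a crude nominal model.
def true_step(x, u):
    return 0.9 * x + 0.5 * u + 0.1 * np.sin(x)   # hypothetical plant

def prior_step(x, u):
    return 0.9 * x + 0.5 * u                     # initial model

# Fit a GP to the model residual using data gathered during operation.
rng = np.random.default_rng(1)
D = rng.uniform(-1, 1, size=(50, 2))             # columns: state, input
resid = np.array([true_step(x, u) - prior_step(x, u) for x, u in D])
gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.5), alpha=1e-6)
gp.fit(D, resid)

# Linearize (prior + GP mean) around the operating point (x*, u*) = (0, 0)
# by finite differences.
def model_step(x, u):
    return prior_step(x, u) + gp.predict([[x, u]])[0]

h = 1e-4
A = np.array([[(model_step(h, 0) - model_step(-h, 0)) / (2 * h)]])
B = np.array([[(model_step(0, h) - model_step(0, -h)) / (2 * h)]])

# LQR stands in for the paper's convex robust-control synthesis.
Qc, Rc = np.eye(1), np.eye(1)
Pm = solve_discrete_are(A, B, Qc, Rc)
K = np.linalg.solve(Rc + B.T @ Pm @ B, B.T @ Pm @ A)
print("linearized A, B:", A.ravel(), B.ravel(), "gain K:", K.ravel())
```

In the paper, the GP provides not just a mean correction but an uncertainty bound, and the synthesis step guarantees stability for all models within that bound; the LQR gain here only captures the model-update-then-linearize-then-synthesize structure.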