39 research outputs found
Online Optimization with Memory and Competitive Control
This paper presents competitive algorithms for a novel class of online optimization problems with memory. We consider a setting where the learner seeks to minimize the sum of a hitting cost and a switching cost that depends on the previous p decisions. This setting generalizes Smoothed Online Convex Optimization. The proposed approach, Optimistic Regularized Online Balanced Descent, achieves a constant, dimension-free competitive ratio. Further, we show a connection between online optimization with memory and online control with adversarial disturbances. This connection, in turn, leads to a new constant-competitive policy for a rich class of online control problems.
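The objective described in this abstract can be illustrated with a minimal sketch. The abstract does not give the form of the switching cost, so the quadratic form below, the scalar `weights`, and the function names are all assumptions chosen for illustration. With p = 1 and a weight of 1 it reduces to a squared movement cost, the kind of switching penalty used in Smoothed Online Convex Optimization.

```python
from typing import Callable, List, Sequence

Vector = Sequence[float]


def switching_cost(y_t: Vector, history: List[Vector], weights: Sequence[float]) -> float:
    """Assumed form: 0.5 * || y_t - sum_i weights[i] * y_{t-1-i} ||^2.

    `history` holds the previous p decisions, most recent first.
    """
    predicted = [
        sum(w * y[k] for w, y in zip(weights, history)) for k in range(len(y_t))
    ]
    return 0.5 * sum((a - b) ** 2 for a, b in zip(y_t, predicted))


def total_cost(
    decisions: Sequence[Vector],
    hitting_costs: Sequence[Callable[[Vector], float]],
    weights: Sequence[float],
    initial_history: List[Vector],
) -> float:
    """Sum of hitting cost plus memory-dependent switching cost over all rounds."""
    p = len(weights)
    history = list(initial_history)  # length p, most recent decision first
    total = 0.0
    for y_t, f_t in zip(decisions, hitting_costs):
        total += f_t(y_t) + switching_cost(y_t, history, weights)
        history = [y_t] + history[: p - 1]  # slide the memory window
    return total


# Hypothetical example: memory p = 1, hitting cost pulls toward y = 1.
f = lambda y: (y[0] - 1.0) ** 2
example_total = total_cost([[1.0], [1.0]], [f, f], [1.0], [[0.0]])
```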
Meta-Learning-Based Robust Adaptive Flight Control Under Uncertain Wind Conditions
Realtime model learning proves challenging for complex dynamical systems,
such as drones flying in variable wind conditions. Machine learning techniques
such as deep neural networks have high representation power but are often too
slow to update onboard. On the other hand, adaptive control relies on simple
linear parameter models that can update as fast as the feedback control loop. We
propose an online composite adaptation method that treats outputs from a deep
neural network as a set of basis functions capable of representing different
wind conditions. To help with training, meta-learning techniques are used to
optimize the network output useful for adaptation. We validate our approach by
flying a drone in an open air wind tunnel under varying wind conditions and
along challenging trajectories. We compare the results with other adaptive
controllers that use different basis function sets and show improvements in
tracking and prediction errors. Comment: This article presents preliminary results and will be update
Neural-Swarm: Decentralized Close-Proximity Multirotor Control Using Learned Interactions
In this paper, we present Neural-Swarm, a nonlinear decentralized stable controller for close-proximity flight of multirotor swarms. Close-proximity control is challenging due to the complex aerodynamic interaction effects between multirotors, such as downwash from higher vehicles to lower ones. Conventional methods often fail to properly capture these interaction effects, resulting in controllers that must maintain large safety distances between vehicles, and thus are not capable of close-proximity flight. Our approach combines a nominal dynamics model with a regularized permutation-invariant Deep Neural Network (DNN) that accurately learns the high-order multi-vehicle interactions. We design a stable nonlinear tracking controller using the learned model. Experimental results demonstrate that the proposed controller significantly outperforms a baseline nonlinear tracking controller with up to four times smaller worst-case height tracking errors. We also empirically demonstrate the ability of our learned model to generalize to larger swarm sizes.
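A permutation-invariant model of the kind mentioned above can be sketched with a sum-pooling ("deep sets") structure: encode each neighbor independently, sum the encodings, then decode. The abstract does not specify the architecture, so the functions `phi` and `rho` below are hand-written stand-ins for learned networks, and all names are hypothetical.

```python
import math
from typing import List, Sequence


def phi(rel_state: Sequence[float]) -> List[float]:
    """Stand-in for a learned per-neighbor encoder (assumption, not the paper's network)."""
    return [math.tanh(x) for x in rel_state] + [math.tanh(sum(rel_state))]


def rho(pooled: Sequence[float]) -> float:
    """Stand-in for a learned decoder that maps the pooled encoding to a scalar force."""
    return sum(0.5 * v for v in pooled)


def interaction_force(neighbors: Sequence[Sequence[float]]) -> float:
    """Sum-pool the per-neighbor encodings so the output ignores neighbor ordering."""
    if not neighbors:
        return 0.0
    encoded = [phi(n) for n in neighbors]
    pooled = [sum(column) for column in zip(*encoded)]
    return rho(pooled)


# Hypothetical relative positions of three neighbors.
neighbors = [[0.1, -0.2, 0.5], [0.3, 0.0, -0.4], [-0.2, 0.2, 0.1]]
force_a = interaction_force(neighbors)
force_b = interaction_force(list(reversed(neighbors)))
```

Because the pooling is a sum, the same function accepts any number of neighbors, which is one plausible reason such a structure can be evaluated on swarms larger than those seen in training.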
Chance-Constrained Trajectory Optimization for Safe Exploration and Learning of Nonlinear Systems
Learning-based control algorithms require data collection with abundant
supervision for training. Safe exploration algorithms ensure the safety of this
data collection process even when only partial knowledge is available. We
present a new approach for optimal motion planning with safe exploration that
integrates chance-constrained stochastic optimal control with dynamics learning
and feedback control. We derive an iterative convex optimization algorithm that
solves an Information-cost Stochastic
Nonlinear Optimal Control problem
(Info-SNOC). The optimization objective encodes both optimal performance and
exploration for learning, and the safety is incorporated as distributionally
robust chance constraints. The dynamics are predicted from a robust regression
model that is learned from data. The Info-SNOC algorithm is used to compute a
sub-optimal pool of safe motion plans that aid in exploration for learning
unknown residual dynamics under safety constraints. A stable feedback
controller is used to execute the motion plan and collect data for model
learning. We prove the safety of rollout from our exploration method and
reduction in uncertainty over epochs, thereby guaranteeing the consistency of
our learning method. We validate the effectiveness of Info-SNOC by designing
and implementing a pool of safe trajectories for a planar robot. We demonstrate
that our approach has a higher success rate in ensuring safety when compared to a
deterministic trajectory optimization approach. Comment: Submitted to RA-L 2020, review-
Safe Deep Policy Adaptation
A critical goal of autonomy and artificial intelligence is enabling
autonomous robots to rapidly adapt in dynamic and uncertain environments.
Classic adaptive control and safe control provide stability and safety
guarantees but are limited to specific system classes. In contrast, policy
adaptation based on reinforcement learning (RL) offers versatility and
generalizability but presents safety and robustness challenges. We propose
SafeDPA, a novel RL and control framework that simultaneously tackles the
problems of policy adaptation and safe reinforcement learning. SafeDPA jointly
learns an adaptive policy and dynamics models in simulation, predicts environment
configurations, and fine-tunes dynamics models with few-shot real-world data. A
safety filter based on the Control Barrier Function (CBF) on top of the RL
policy is introduced to ensure safety during real-world deployment. We provide
theoretical safety guarantees of SafeDPA and show the robustness of SafeDPA
against learning errors and extra perturbations. Comprehensive experiments on
(1) classic control problems (Inverted Pendulum), (2) simulation benchmarks
(Safety Gym), and (3) a real-world agile robotics platform (RC Car) demonstrate
that SafeDPA outperforms state-of-the-art baselines in both safety and task
performance. In particular, SafeDPA demonstrates notable
generalizability, achieving a 300% increase in safety rate compared to the
baselines, under unseen disturbances in real-world experiments. Comment: 8 pages, 7 figure
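The idea of a CBF safety filter placed on top of a learned policy can be illustrated on a toy problem. This is a minimal sketch, not SafeDPA itself: it assumes a one-dimensional single integrator (x' = u), a hypothetical safe set x <= x_max with barrier h(x) = x_max - x, and a constant stand-in for the RL action. For this system the CBF condition h' >= -alpha * h becomes u <= alpha * (x_max - x), so the minimally invasive filter is a simple clip.

```python
def cbf_filter(x: float, u_nominal: float, x_max: float, alpha: float) -> float:
    """Return the action closest to u_nominal that satisfies u <= alpha * (x_max - x)."""
    u_bound = alpha * (x_max - x)
    return min(u_nominal, u_bound)


def simulate(x0: float, u_nominal: float, x_max: float, alpha: float,
             dt: float, steps: int) -> list:
    """Euler-integrate x' = u with the filter applied at every step."""
    xs = [x0]
    x = x0
    for _ in range(steps):
        u = cbf_filter(x, u_nominal, x_max, alpha)
        x = x + dt * u
        xs.append(x)
    return xs


# The nominal policy always pushes toward the boundary; the filter slows it down.
trajectory = simulate(x0=0.0, u_nominal=1.0, x_max=1.0, alpha=2.0, dt=0.01, steps=500)
```

In this discretized toy setting the state stays inside the safe set as long as alpha * dt <= 1; higher-dimensional systems typically solve a small quadratic program instead of a clip.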
Neural-Swarm2: Planning and Control of Heterogeneous Multirotor Swarms using Learned Interactions
We present Neural-Swarm2, a learning-based method for motion planning and control that allows heterogeneous multirotors in a swarm to safely fly in close proximity. Such operation for drones is challenging due to complex aerodynamic interaction forces, such as downwash generated by nearby drones and ground effect. Conventional planning and control methods fail to capture these interaction forces, resulting in a sparse swarm configuration during flight. Our approach combines a physics-based nominal dynamics model with learned Deep Neural Networks (DNNs) with strong Lipschitz properties. We develop two techniques to accurately predict the aerodynamic interactions between heterogeneous multirotors: i) spectral normalization for stability and generalization guarantees on unseen data and ii) heterogeneous deep sets for supporting any number of heterogeneous neighbors in a permutation-invariant manner without reducing expressiveness. The learned residual dynamics benefit both the proposed interaction-aware multi-robot motion planning and the nonlinear tracking control designs because the learned interaction forces reduce the modelling errors. Experimental results demonstrate that Neural-Swarm2 is able to generalize to larger swarms beyond training cases and significantly outperforms a baseline nonlinear tracking controller with up to a threefold reduction in worst-case tracking errors.
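Spectral normalization, as mentioned above, bounds a layer's Lipschitz constant by rescaling its weight matrix according to its largest singular value. The sketch below is a generic pure-Python illustration using power iteration; the function names, the `target` bound, and the choice to rescale only when the bound is exceeded are assumptions, not details taken from the paper.

```python
import math
import random
from typing import List

Matrix = List[List[float]]


def matvec(M: Matrix, v: List[float]) -> List[float]:
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def transpose(M: Matrix) -> Matrix:
    return [list(col) for col in zip(*M)]


def norm(v: List[float]) -> float:
    return math.sqrt(sum(x * x for x in v))


def spectral_norm(W: Matrix, iters: int = 100, seed: int = 0) -> float:
    """Estimate the largest singular value of W by power iteration on W^T W."""
    rng = random.Random(seed)
    Wt = transpose(W)
    v = [rng.gauss(0.0, 1.0) for _ in W[0]]  # random start avoids unlucky orthogonality
    for _ in range(iters):
        v = matvec(Wt, matvec(W, v))
        n = norm(v)
        if n == 0.0:
            return 0.0
        v = [x / n for x in v]
    return norm(matvec(W, v))


def spectrally_normalize(W: Matrix, target: float = 1.0) -> Matrix:
    """Rescale W so its spectral norm is at most `target` (assumed bound)."""
    sigma = spectral_norm(W)
    if sigma <= target:
        return [row[:] for row in W]
    scale = target / sigma
    return [[scale * w for w in row] for row in W]


W = [[3.0, 0.0], [0.0, 1.0]]
W_normalized = spectrally_normalize(W)
```

Since a linear layer's Lipschitz constant equals its spectral norm and common activations such as ReLU are 1-Lipschitz, bounding each layer this way bounds the Lipschitz constant of the whole network by the product of the per-layer bounds.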
Car-following method based on inverse reinforcement learning for autonomous vehicle decision-making
Despite many achievements in the field of autonomous driving, some problems remain to be solved. One of those problems is the difficulty of designing a car-following decision-making system for complex traffic conditions. In recent years, reinforcement learning has shown potential in solving sequential decision optimization problems. In this article, we establish the reward function R for each driver's data using an inverse reinforcement learning algorithm, visualize R, and then analyze driving characteristics and following strategies. Finally, we show the efficiency of the proposed method by simulation in a highway environment.