6 research outputs found

    Modeling and adaptive control of indoor unmanned aerial vehicles

    Thesis (S.M.)--Massachusetts Institute of Technology, Dept. of Aeronautics and Astronautics, 2009. Cataloged from PDF version of thesis. Includes bibliographical references (p. 91-94).

    The operation of unmanned aerial vehicles (UAVs) in constrained indoor environments presents many unique challenges in control and planning. This thesis investigates modeling, adaptive control, and trajectory optimization methods as applied to indoor autonomous flight vehicles in both a theoretical and experimental context. Three types of small-scale UAVs, including a custom-built three-wing tailsitter, are combined with a motion capture system and ground computer network to form a testbed capable of indoor autonomous flight.

    An L1 adaptive output feedback control design process is presented in which control parameters are systematically determined based on intuitive desired performance and robustness metrics set by the designer. Flight test results using a quadrotor helicopter demonstrate that designer specifications correspond to the expected physical responses. Multi-input multi-output (MIMO) L1 adaptive control is applied to a three-wing tailsitter. An inner-loop body rate adaptation structure is used to bypass the non-linearities of the closed-loop system, producing an adaptive architecture that is invariant to the choice of baseline controller. Simulations and flight experiments confirm that the MIMO adaptive augmentation effectively recovers nominal reference performance of the vehicle in the presence of substantial physical actuator failures.

    A method for developing a low-fidelity model of propeller-driven UAVs is presented and compared to data collected from flight hardware. The method is used to derive a model of a fixed-wing aerobatic aircraft, which is then used by a Gauss pseudospectral optimization tool to find dynamically feasible trajectories for specified flight maneuvers. Several trajectories are generated and implemented on flight hardware to experimentally validate both the modeling and trajectory generation methods.

    by Bernard Michini. S.M.
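
    The abstract does not reproduce the model equations, so the following is only a minimal sketch of what a low-fidelity model of a propeller-driven UAV typically looks like (a quadratic thrust map plus point-mass dynamics). Every constant and name below is a hypothetical placeholder, not a value from the thesis.

        import numpy as np

        # Hypothetical low-fidelity vertical-axis model of a propeller-driven UAV:
        # thrust quadratic in rotor speed, flat-plate drag, point-mass dynamics.
        # All constants are illustrative placeholders, not values from the thesis.
        K_THRUST = 1.2e-5   # N/(rad/s)^2, placeholder thrust constant
        MASS = 0.45         # kg, placeholder vehicle mass
        DRAG = 0.08         # N/(m/s)^2, placeholder drag coefficient
        G = 9.81            # m/s^2

        def step(z, vz, omega, dt=0.01):
            """Euler-integrate altitude and climb rate for one time step."""
            thrust = K_THRUST * omega ** 2
            accel = thrust / MASS - G - np.sign(vz) * DRAG * vz ** 2 / MASS
            return z + vz * dt, vz + accel * dt

        # Sanity check: rotor speed at which thrust exactly balances weight.
        omega_hover = np.sqrt(MASS * G / K_THRUST)

    A model in this form is simple enough to identify from a handful of flight-hardware measurements, which is what makes it usable inside a trajectory optimizer.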

    Bayesian nonparametric reward learning from demonstration

    Thesis: Ph.D., Massachusetts Institute of Technology, Department of Aeronautics and Astronautics, 2013. Cataloged from PDF version of thesis. Includes bibliographical references (pages 123-132).

    Learning from demonstration provides an attractive solution to the problem of teaching autonomous systems how to perform complex tasks. Demonstration opens autonomy development to non-experts and is an intuitive means of communication for humans, who naturally use demonstration to teach others. This thesis focuses on a specific form of learning from demonstration, namely inverse reinforcement learning, whereby the reward of the demonstrator is inferred. Formally, inverse reinforcement learning (IRL) is the task of learning the reward function of a Markov Decision Process (MDP) given knowledge of the transition function and a set of observed demonstrations. While reward learning is a promising method of inferring a rich and transferable representation of the demonstrator's intents, current algorithms suffer from intractability and inefficiency in large, real-world domains. This thesis presents a reward learning framework that infers multiple reward functions from a single, unsegmented demonstration, provides several key approximations which enable scalability to large real-world domains, and generalizes to fully continuous demonstration domains without the need for discretization of the state space, none of which are handled by previous methods.

    In the thesis, modifications are proposed to an existing Bayesian IRL algorithm to improve its efficiency and tractability in situations where the state space is large and the demonstrations span only a small portion of it. A modified algorithm is presented, and simulation results show substantially faster convergence while maintaining the solution quality of the original method. Even with the proposed efficiency improvements, a key limitation of Bayesian IRL (and most current IRL methods) is the assumption that the demonstrator is maximizing a single reward function. This presents problems when dealing with unsegmented demonstrations containing multiple distinct tasks, common in robot learning from demonstration (e.g. in large tasks that may require multiple subtasks to complete).

    A key contribution of this thesis is the development of a method that learns multiple reward functions from a single demonstration. The proposed method, termed Bayesian nonparametric inverse reinforcement learning (BNIRL), uses a Bayesian nonparametric mixture model to automatically partition the data and find a set of simple reward functions corresponding to each partition. The simple rewards are interpreted intuitively as subgoals, which can be used to predict actions or analyze which states are important to the demonstrator. Simulation results demonstrate the ability of BNIRL to handle cyclic tasks that break existing algorithms due to the existence of multiple subgoal rewards in the demonstration. The BNIRL algorithm is easily parallelized, and several approximations to the demonstrator likelihood function are offered to further improve computational tractability in large domains.

    Since BNIRL is only applicable to discrete domains, the Bayesian nonparametric reward learning framework is extended to general continuous demonstration domains using Gaussian process reward representations. The resulting algorithm, termed Gaussian process subgoal reward learning (GPSRL), is the only learning from demonstration method that is able to learn multiple reward functions from unsegmented demonstration in general continuous domains. GPSRL does not require discretization of the continuous state space and focuses computation efficiently around the demonstration itself. Learned subgoal rewards are cast as Markov decision process options to enable execution of the learned behaviors by the robotic system and provide a principled basis for future learning and skill refinement. Experiments conducted in the MIT RAVEN indoor test facility demonstrate the ability of both BNIRL and GPSRL to learn challenging maneuvers from demonstration on a quadrotor helicopter and a remote-controlled car.

    by Bernard J. Michini. Ph.D.
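
    The abstract describes a Bayesian nonparametric mixture over subgoals; the sketch below illustrates the kind of Chinese-restaurant-process (CRP) Gibbs step such a partitioning implies, under assumptions of mine rather than the thesis text. In particular, action_loglik is a hypothetical stub standing in for the demonstrator likelihood (in BNIRL this would come from an MDP Q-function), and ALPHA is an illustrative concentration value.

        import numpy as np

        # Minimal CRP-style Gibbs step for subgoal partitioning: each observed
        # demonstration point is reassigned to an existing subgoal partition or
        # to a new one seeded at the observation itself.
        ALPHA = 1.0  # CRP concentration parameter (illustrative value)

        def action_loglik(obs, subgoal):
            # Placeholder likelihood: closer to the subgoal looks more likely.
            return -float(np.linalg.norm(np.asarray(obs) - np.asarray(subgoal)))

        def gibbs_step(observations, assignments, subgoals, rng):
            """One Gibbs sweep over partition assignments (in place)."""
            for i, obs in enumerate(observations):
                others = np.delete(assignments, i)
                counts = np.bincount(others, minlength=len(subgoals)).astype(float)
                log_prior = np.log(np.append(counts, ALPHA) + 1e-12)
                candidates = subgoals + [obs]
                log_post = log_prior + np.array(
                    [action_loglik(obs, g) for g in candidates])
                p = np.exp(log_post - log_post.max())
                assignments[i] = rng.choice(len(candidates), p=p / p.sum())
                if assignments[i] == len(subgoals):   # opened a new partition
                    subgoals.append(obs)
            return assignments, subgoals

        # Toy usage: three 2-D points, started in a single partition.
        rng = np.random.default_rng(0)
        demo = [np.array([0.0, 0.0]), np.array([0.1, 0.0]), np.array([5.0, 5.0])]
        z, goals = gibbs_step(demo, np.zeros(3, dtype=int), [demo[0]], rng)

    Because each observation is resampled independently given the others, sweeps like this parallelize naturally, which matches the abstract's note that BNIRL is easily parallelized.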

    Improving the efficiency of Bayesian inverse reinforcement learning

    Inverse reinforcement learning (IRL) is the task of learning the reward function of a Markov Decision Process (MDP) given knowledge of the transition function and a set of expert demonstrations. While many IRL algorithms exist, Bayesian IRL [1] provides a general and principled method of reward learning by casting the problem in the Bayesian inference framework. However, the algorithm as originally presented suffers from several inefficiencies that prohibit its use for even moderate problem sizes. This paper proposes modifications to the original Bayesian IRL algorithm to improve its efficiency and tractability in situations where the state space is large and the expert demonstrations span only a small portion of it. The key insight is that the inference task should be focused on states that are similar to those encountered by the expert, as opposed to making the naive assumption that the expert demonstrations contain enough information to accurately infer the reward function over the entire state space. A modified algorithm is presented, and experimental results show substantially faster convergence while maintaining the solution quality of the original method.

    United States. Office of Naval Research (Science of Autonomy Program Contract N000140910625)
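
    The key insight above lends itself to a short sketch: restrict the expensive per-state work to states near the demonstrations, and score reward candidates with the standard softmax demonstration likelihood. This is an illustration under my own assumptions, not the paper's code; similarity is a hypothetical stand-in for whatever state metric is used.

        import numpy as np

        def relevant_states(n_states, demo_states, similarity, threshold=0.5):
            """Indices of states similar enough to some demonstrated state."""
            return np.array([s for s in range(n_states)
                             if any(similarity(s, d) >= threshold
                                    for d in demo_states)])

        def demo_loglik(Q, demos, beta=1.0):
            """Softmax action likelihood, summed over demonstrated pairs only.

            Q: (n_states, n_actions) action-value array for a reward sample;
            demos: iterable of (state, action) pairs from the expert.
            """
            ll = 0.0
            for s, a in demos:
                ll += beta * Q[s, a] - np.log(np.exp(beta * Q[s]).sum())
            return ll

    Inside an MCMC loop over candidate rewards, only the Q-values at the relevant states need to be (re)computed per sample, which is where the speedup over inference across the entire state space comes from.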

    Lightweight infrared sensing for relative navigation of quadrotors

    A lightweight solution for estimating position and velocity relative to a known marker is presented. The marker consists of three infrared (IR) LEDs in a fixed pattern. Using an IR camera with a 100 Hz update rate, the range and bearing to the marker are calculated. This information is then fused with inertial sensor information to produce state estimates at 1 kHz using a sigma point Kalman filter. The computation takes place on a 14 gram custom autopilot, yielding a lightweight system for generating high-rate relative state information. The estimation scheme is compared to data recorded with a motion capture system.

    National Science Foundation (U.S.) (Graduate Research Fellowship under Grant No. 0645960)
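
    As a hedged illustration of the camera-geometry half of such a system (not the paper's implementation): with a known LED baseline and pinhole intrinsics, the apparent pixel separation of the outer LEDs gives range by similar triangles, and the pattern centroid's offset from the principal point gives bearing. All intrinsics below are hypothetical placeholders.

        import numpy as np

        FOCAL_PX = 1280.0                      # focal length in pixels (placeholder)
        BASELINE_M = 0.20                      # distance between outer LEDs, m (placeholder)
        CENTER_PX = np.array([512.0, 384.0])   # principal point (placeholder)

        def range_and_bearing(led_px):
            """led_px: (3, 2) array of detected LED pixel coordinates."""
            sep = np.linalg.norm(led_px[2] - led_px[0])   # outer-LED separation
            rng_m = FOCAL_PX * BASELINE_M / sep           # similar triangles
            offset = led_px.mean(axis=0) - CENTER_PX
            bearing = np.arctan2(offset, FOCAL_PX)        # [azimuth, elevation], rad
            return rng_m, bearing

    In the paper, measurements of this kind arrive at 100 Hz and are fused with inertial data in a sigma point (unscented) Kalman filter to produce 1 kHz state estimates; the sketch covers only the camera geometry.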

    L1 Adaptive Control for Indoor Autonomous Vehicles: Design Process and Flight Testing

    Adaptive control techniques have the potential to address many of the special performance and robustness requirements of flight control for unmanned aerial vehicles. L1 adaptive control offers potential benefits in terms of performance and robustness. An L1 adaptive output feedback control design process is presented here in which control parameters are systematically determined based on intuitive desired performance and robustness metrics set by the designer. Flight test results verify the process for an indoor autonomous quadrotor helicopter, demonstrating that designer specifications correspond to the expected physical responses. In flight tests comparing it with the baseline linear controller, the augmented adaptive system shows definite performance and robustness improvements, confirming the potential of L1 adaptive control as a useful tool for autonomous aircraft.

    United States. Air Force Office of Scientific Research (Grant FA9550-08-1-0086)
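
    The abstract does not reproduce the control law, so the following is only a minimal sketch of the L1 output feedback architecture it refers to: a first-order reference model M(s) = m/(s+m), a low-pass control filter C(s) = w/(s+w), and a simple gradient adaptation law, all Euler-discretized. The real design uses projection-based adaptation and the paper's systematic performance/robustness tuning; every gain here is a placeholder.

        # Minimal Euler-discretized L1 output feedback loop (illustrative only).
        m, w, gamma, dt = 5.0, 20.0, 1000.0, 0.001   # placeholder parameters

        def l1_step(y, y_hat, sigma_hat, u, r):
            """One control step: returns updated (y_hat, sigma_hat, u)."""
            y_tilde = y_hat - y                      # output prediction error
            sigma_hat += -gamma * y_tilde * dt       # adaptation law
            u += w * ((r - sigma_hat) - u) * dt      # C(s) filters (r - sigma_hat)
            y_hat += (-m * y_hat + m * (u + sigma_hat)) * dt   # output predictor
            return y_hat, sigma_hat, u

    The low-pass filter is what distinguishes the L1 structure from classical model reference adaptive control: adaptation can run fast while the filter bandwidth w caps how aggressively the estimate feeds back into the control signal, which is the performance/robustness trade the design process tunes.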

    Automated Battery Swap and Recharge to Enable Persistent UAV Missions

    This paper introduces a hardware platform for automated battery changing and charging for multiple UAV agents. The automated station holds a buffer of 8 batteries in a novel dual-drum structure that enables a "hot" battery swap, thus allowing the vehicle to remain powered on throughout the battery changing process. Each drum consists of four battery bays, each of which is connected to a smart charger for proper battery maintenance and charging. The hot-swap capability, in combination with local recharging and a large 8-battery capacity, allows this platform to refuel multiple UAVs for long-duration and persistent missions with minimal delays and no vehicle shutdowns. Experimental results from the RAVEN indoor flight test facility are presented that demonstrate the capability and robustness of the battery change/charge station in the context of a multi-agent, persistent mission where surveillance is continuously required over a specified region.

    Boeing Scientific Research Laboratories
    United States. Air Force Office of Scientific Research (FA9550-09-1-0522)
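
    As a hypothetical illustration of the bookkeeping the dual-drum architecture implies (2 drums x 4 bays, each bay on a charger), a station controller might track per-bay charge and hand out the fullest pack for each hot swap. This is my sketch of the concept, not the paper's software.

        from dataclasses import dataclass, field

        @dataclass
        class Bay:
            drum: int
            slot: int
            charge: float = 1.0          # 0.0 (empty) .. 1.0 (full)

        @dataclass
        class SwapStation:
            # 2 drums x 4 bays, matching the dual-drum layout in the paper.
            bays: list = field(default_factory=lambda: [
                Bay(d, s) for d in range(2) for s in range(4)])

            def next_swap_bay(self):
                """Select the fullest battery for the next hot swap."""
                return max(self.bays, key=lambda b: b.charge)

            def stow_depleted(self, bay, charge):
                """Depleted pack goes back in the bay and starts recharging."""
                bay.charge = charge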