
    Adapt-to-learn policy transfer in reinforcement learning and deep model reference adaptive control

    Adaptation and learning from exploration are central to biological learning: humans and animals do not learn every task in isolation; rather, they quickly adapt learned behaviors across similar tasks and acquire new skills when presented with new situations. Inspired by this, adaptation has been an important direction of research in control, in the form of adaptive controllers. However, adaptive controllers such as the Model Reference Adaptive Controller are mainly model-based; they do not rely on exploration but instead make informed decisions by exploiting the model's structure. Such controllers are therefore characterized by high sample efficiency and stability guarantees, which makes them suitable for safety-critical systems. On the other hand, we have learning-based optimal control algorithms such as reinforcement learning. Reinforcement learning is a trial-and-error method in which an agent explores the environment by taking random actions and increasing the likelihood of those actions that result in a higher return. These exploration techniques are expected to fail many times before finding an optimal policy; they are therefore highly sample-expensive, lack stability guarantees, and are thus unsuitable for safety-critical systems. This thesis presents control algorithms for robotics in which the best of both worlds, ``Adaptation'' and ``Learning from exploration'', are brought together in new algorithms that can outperform their conventional counterparts. In this effort, we first present an Adapt-to-Learn policy transfer algorithm, in which we use control-theoretic ideas of adaptation to transfer a policy between two related but different tasks using the policy gradient method of reinforcement learning. Efficient and robust policy transfer remains a key challenge in reinforcement learning.
Policy transfer through warm initialization, imitation, or interaction over a large set of agents with randomized instances has commonly been applied to solve a variety of Reinforcement Learning (RL) tasks. However, this is far from how behavior transfer happens in the biological world. Here we seek to answer the question: will learning to combine an adaptation reward with the environmental reward lead to a more efficient transfer of policies between domains? We introduce a principled mechanism that can ``Adapt-to-Learn'', that is, adapt the source policy to learn to solve a target task with significant transition differences and uncertainties. Through theory and experiments, we show that our method leads to a significantly reduced sample complexity when transferring policies between tasks. In the second part of this thesis, information-enabled learning-based adaptive controllers are presented: the ``Gaussian Process adaptive controller using Model Reference Generative Network'' (GP-MRGeN) and the ``Deep Model Reference Adaptive Controller'' (DMRAC). Model Reference Adaptive Control (MRAC) is a widely studied adaptive control methodology that aims to ensure that a nonlinear plant with significant model uncertainty behaves like a chosen reference model. MRAC methods adapt to changes by representing the system uncertainties as weighted combinations of known nonlinear functions and using a weight update law that moves the network weights in the direction that minimizes the instantaneous tracking error. However, most MRAC adaptive controllers use a shallow network and only instantaneous data for adaptation, restricting their representation capability and limiting their performance under fast-changing uncertainties and faults in the system. In this thesis, we propose a Gaussian process based adaptive controller called GP-MRGeN.
We present a new approach to the online supervised training of GP models using a new architecture termed the Model Reference Generative Network (MRGeN). Our architecture is loosely inspired by the recent success of generative neural network models, and our contributions ensure that the inclusion of such a model in closed-loop control does not affect the stability properties. By using a generative network, the GP-MRGeN controller is capable of achieving higher adaptation rates without losing the robustness properties of the controller, making it suitable for mitigating faults in fast-evolving systems. Further, in this thesis, we present a new neuroadaptive architecture: Deep Neural Network-based Model Reference Adaptive Control. This architecture utilizes deep neural network representations for modeling significant nonlinearities while marrying them with the boundedness guarantees that characterize MRAC-based controllers. We demonstrate through simulations and analysis that DMRAC can subsume previously studied learning-based MRAC methods, such as concurrent learning and GP-MRAC, making DMRAC a powerful architecture for high-performance control of nonlinear systems with long-term learning properties. Theoretical proofs of the controller's generalization over unseen data points and of the boundedness of the tracking error are also presented. Experiments with a quadrotor vehicle demonstrate the controller's performance in achieving reference-model tracking in the presence of significant matched uncertainties. To achieve these results, a software and communication architecture was designed to ensure online, real-time inference of the deep network on a high-bandwidth, computation-limited platform. These results demonstrate the efficacy of deep networks for high-bandwidth closed-loop attitude control of unstable and nonlinear robots operating in adverse situations.
We expect that this work will benefit other closed-loop deep-learning control architectures for robotics.
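As a loose illustration of the MRAC weight-update law described above (a minimal scalar sketch of my own, not the thesis's DMRAC; the plant, regressor, and gains are invented), the adaptive weights are driven by the product of the regressor and the instantaneous tracking error:

```python
import numpy as np

# Minimal scalar MRAC sketch: plant x' = a*x + u + W_true.phi(x),
# reference model x_m' = a_m*x_m + r. The controller knows a and a_m
# but not W_true, and estimates W_hat online.
W_true = np.array([1.5, -0.8])                 # unknown "true" weights
phi = lambda x: np.array([x, np.tanh(x)])      # known regressor functions

a, a_m = 1.0, -2.0         # unstable plant pole, stable reference-model pole
gamma = 20.0               # adaptation gain
dt, steps = 1e-3, 20000
x = x_m = 0.0
W_hat = np.zeros(2)
err = []
for k in range(steps):
    r = np.sin(0.5 * k * dt)                   # reference command
    e = x - x_m                                # instantaneous tracking error
    u = (a_m - a) * x + r - W_hat @ phi(x)     # cancel dynamics + est. uncertainty
    W_hat += dt * gamma * phi(x) * e           # MRAC weight update law
    x += dt * (a * x + u + W_true @ phi(x))
    x_m += dt * (a_m * x_m + r)
    err.append(abs(e))
```

With the Lyapunov function V = e²/2 + ‖W_true − Ŵ‖²/(2γ), this update yields V̇ = a_m e² ≤ 0, so the tracking error stays bounded and decays.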

    Certification Considerations for Adaptive Systems

    Advanced capabilities planned for the next generation of aircraft, including those that will operate within the Next Generation Air Transportation System (NextGen), will necessarily include complex new algorithms and non-traditional software elements. These aircraft will likely incorporate adaptive control algorithms that provide enhanced safety, autonomy, and robustness during adverse conditions. Unmanned aircraft will operate alongside manned aircraft in the National Airspace System (NAS), with intelligent software performing the high-level decision-making functions normally performed by human pilots. Even human-piloted aircraft will necessarily include more autonomy. However, there are serious barriers to the deployment of new capabilities, especially those based upon software, including adaptive control (AC) and artificial intelligence (AI) algorithms. Current civil aviation certification processes are based on the idea that the correct behavior of a system must be completely specified and verified prior to operation. This report by Rockwell Collins and SIFT documents our comprehensive study of the state of the art in intelligent and adaptive algorithms for the civil aviation domain, categorizing the approaches used and identifying gaps and challenges associated with the certification of each approach.

    Investigations into implementation of an iterative feedback tuning algorithm into microcontroller

    The implementation of an Iterative Feedback Tuning (IFT) and a Myopic Unfalsified Control (MUC) algorithm on a microcontroller is investigated in this dissertation. The motivation for this research stems from the successful results obtained in applying the IFT algorithm to various physical systems since the method originated in 1995 with Hjalmarsson [4]. The Motorola DSP56F807C microcontroller is selected for the investigations because its characteristics match the requirements of the IFT algorithm: speed of program execution, large memory, built-in ADC & DAC, and the availability of a C compiler are the key qualifying parameters. The Analog Devices ARM7024 microcontroller was chosen as an alternative for cases where the DSP56F807C is not available. Myopic Unfalsified Control is noted to be similar to IFT in that it also employs a ‘myopic’ gradient-based steepest-descent approach to parameter optimization. It is easier to implement, since its algorithm is less complex than IFT's, meaning that a successful implementation of the IFT algorithm on a microcontroller would also permit the implementation of MUC.
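As a hedged sketch of the ``myopic'' gradient-based steepest-descent tuning that IFT and MUC share: in real IFT the cost gradient is estimated from dedicated closed-loop gradient experiments, whereas here a finite-difference stand-in is used, and the plant, controller structure, and gains are invented for illustration:

```python
import numpy as np

def closed_loop_cost(kp, n=200):
    # Invented discrete first-order plant y[t+1] = 0.9*y[t] + 0.1*u[t]
    # under a proportional controller; cost is the mean squared tracking error.
    y, cost = 0.0, 0.0
    for t in range(n):
        ref = 1.0
        u = kp * (ref - y)
        y = 0.9 * y + 0.1 * u
        cost += (ref - y) ** 2
    return cost / n

kp, step = 0.5, 0.5
for i in range(30):
    eps = 1e-4
    # Finite-difference gradient (stand-in for IFT's gradient experiments)
    grad = (closed_loop_cost(kp + eps) - closed_loop_cost(kp - eps)) / (2 * eps)
    kp -= step * grad   # myopic steepest-descent parameter update
```

Each iteration runs closed-loop experiments at the current gain and nudges the parameter down the estimated cost gradient, which is the pattern a microcontroller implementation must reproduce in real time.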

    Adaptive Multi-objective Optimizing Flight Controller

    The problem of synthesizing online optimal flight controllers in the presence of multiple objectives is considered. A hybrid adaptive-optimal control architecture is presented, which is suitable for implementation on systems with fast, nonlinear, and uncertain dynamics subject to constraints. The problem is cast as an adaptive Multi-Objective Optimization (MO-Op) flight control problem, wherein a control policy is sought that attempts to optimize over multiple, sometimes conflicting, objectives. A solution strategy utilizing Gaussian Process (GP)-based adaptive-optimal control is presented, in which the system uncertainties are learned with an online-updated budgeted GP. The mean of the GP is used to feedback-linearize the system, and reference-model-shaping Model Predictive Control (MPC) is utilized for optimization. To make the MO-Op problem online-realizable, a relaxation strategy that poses some objectives as adaptively updated soft constraints is proposed. The strategy is validated on a nonlinear roll dynamics model with simulated state-dependent flexible-rigid mode interaction. To demonstrate a low probability of failure in the presence of stochastic uncertainty and state constraints, we can take advantage of chance-constrained programming in Model Predictive Control. Results for the single-objective case of chance-constrained MPC are also shown to reflect the low probability of constraint violation in safety-critical systems such as aircraft. Optimizing the system over multiple objectives is only one application of the adaptive-optimal controller. Another application we considered using the adaptive-optimal controller setup is an architecture capable of adapting to the dynamics of different aerospace platforms.
This architecture brings together three key elements: MPC-based reference command shaping; Gaussian Process (GP)-based Bayesian nonparametric Model Reference Adaptive Control (MRAC), both of which were also used in the previous application; and online GP clustering over nonstationary (time-varying) GPs. The salient feature of our architecture is that not only can it detect changes, it also uses online GP clustering to enable the controller to utilize past learning of similar models to significantly reduce learning transients. Stability of the architecture is argued theoretically and performance is validated empirically.
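A minimal sketch of the feedback-linearization step described above, assuming an invented 1-D plant x' = f(x) + u in which the unknown drift f is learned by GP regression and cancelled with the GP mean (the kernel hyperparameters, gains, and plant are my choices, not the thesis's budgeted GP):

```python
import numpy as np

# Invented unknown drift; the controller only sees training samples of it.
f_true = lambda x: np.sin(2 * x) + 0.5 * x
X = np.linspace(-2, 2, 25)          # training inputs
Y = f_true(X)                       # noiseless training targets

# RBF-kernel GP regression (lengthscale 0.3, small jitter for conditioning)
k = lambda a, b: np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / 0.3 ** 2)
K = k(X, X) + 1e-6 * np.eye(len(X))
alpha = np.linalg.solve(K, Y)
f_hat = lambda xq: k(np.atleast_1d(xq), X) @ alpha   # GP posterior mean

# Feedback linearization: cancel f via the GP mean, then apply a simple
# stabilizing pseudo-control v toward x_ref.
dt, x, x_ref = 1e-2, 1.5, 0.0
for _ in range(500):
    v = -3.0 * (x - x_ref)
    u = -f_hat(x)[0] + v            # u = -f_hat(x) + v
    x += dt * (f_true(x) + u)
```

The closed loop behaves like x' ≈ −3(x − x_ref) plus the small GP residual, which is the role the GP mean plays before MPC shapes the reference command.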

    Data-Driven Architecture to Increase Resilience In Multi-Agent Coordinated Missions

    The rise in the use of Multi-Agent Systems (MASs) in unpredictable and changing environments has created the need for intelligent algorithms to increase their autonomy, safety, and performance in the event of disturbances and threats. MASs are attractive for their flexibility, which also makes them prone to threats that may result from hardware failures (actuators, sensors, onboard computer, power source) and abnormal operational conditions (weather, GPS-denied location, cyber-attacks). This dissertation presents research on a bio-inspired approach for resilience augmentation in MASs in the presence of disturbances and threats such as communication-link and stealthy zero-dynamics attacks. An adaptive bio-inspired architecture is developed for distributed consensus algorithms to increase fault tolerance in a network of multiple high-order nonlinear systems under directed fixed topologies. In analogy with natural organisms' ability to recognize and remember specific pathogens to generate immunity, the immunity-based architecture consists of a Distributed Model-Reference Adaptive Control (DMRAC) with an Artificial Immune System (AIS) adaptation law integrated within a consensus protocol. Feedback linearization is used to transform the high-order nonlinear model into four decoupled linear subsystems. A stability proof of the adaptation law is conducted using Lyapunov methods and Jordan decomposition. The DMRAC is proven to be stable in the presence of external time-varying bounded disturbances, and the tracking error trajectories are shown to be bounded. The effectiveness of the proposed architecture is examined through numerical simulations. The proposed controller successfully ensures that consensus is achieved among all agents while the adaptive law simultaneously rejects the disturbances in each agent and its neighbors. The architecture also includes a health management system to detect faulty agents within the global network.
Further numerical simulations show that the Global Health Monitoring (GHM) module effectively detects faults within the network.
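For reference, a plain first-order consensus protocol on a fixed directed topology (a directed ring here) looks as follows; this is a textbook sketch of the consensus layer, not the dissertation's DMRAC/AIS law:

```python
import numpy as np

# Directed ring: agent i listens to agent i-1; row i of A lists i's in-neighbors.
# A directed ring contains a spanning tree, so consensus is reachable.
A = np.array([[0, 0, 0, 1],
              [1, 0, 0, 0],
              [0, 1, 0, 0],
              [0, 0, 1, 0]], dtype=float)
x = np.array([1.0, -2.0, 0.5, 3.0])   # initial agent states
dt = 0.05
for _ in range(2000):
    # Euler step of x_i' = sum_j a_ij * (x_j - x_i)
    x = x + dt * (A @ x - A.sum(axis=1) * x)
```

Because the ring is weight-balanced (each in-degree equals the out-degree), the agents converge to the average of their initial states.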

    Indirect adaptive higher-order sliding-mode control using the certainty-equivalence principle

    Since the late 1950s, great efforts have been made to develop control algorithms capable of compensating for uncertainties and disturbances in feedback loops. Adaptive controllers that adjust their parameters continuously have been used from the beginning to solve this task; this continuous adaptation allows a constant performance to be maintained even under changing conditions. A different idea is pursued by variable-structure systems, in particular by so-called sliding-mode control, where a very fast switching control signal is employed to compensate for disturbances and model uncertainties, achieving considerable performance even without special prior knowledge of the disturbances. This thesis deals with combining these two rather different approaches while preserving the advantages of each method. The design of a sliding-mode controller normally does not demand sophisticated knowledge of the disturbance, while the controller's robustness against state-dependent uncertainties may be poor. On the other hand, adaptive controllers are well suited to compensating for parametric uncertainties, while unstructured influences may result in degraded performance. Hence, the objective of this work is to design sliding-mode controllers that use as much information about the uncertainty as possible and exploit this knowledge in the design. An important point is that the design procedure is based on a rigorous proof of stability of the combined approach; only recent results on Lyapunov theory in the field of sliding mode made this analysis possible. It is shown that the Lyapunov function of the nominal sliding-mode controller has a direct impact on the adaptation law.
Therefore, this Lyapunov function has to meet certain conditions in order to allow a proper implementation of the proposed algorithms. The main contributions of this thesis are sliding-mode controllers extended by an adaptive part using the certainty-equivalence principle. It is shown that the combination of both approaches results in a novel controller design that is able to solve a control task even in the presence of different classes of uncertainties. In addition to the theoretical analysis, the advantages of the proposed method are demonstrated on a selection of simulation examples and on a laboratory test bench. The experiments show that the proposed control algorithm delivers better performance with regard to chattering and robustness compared to classical sliding-mode controllers.
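A minimal sketch of such a certainty-equivalence combination for an invented first-order plant x' = θᵀφ(x) + u + d(t): the switching term handles the bounded unstructured disturbance d, while the adaptive part compensates the state-dependent parametric uncertainty θᵀφ(x). The plant, regressor, and gains are my choices for illustration:

```python
import numpy as np

theta_true = np.array([1.0, -2.0])             # unknown plant parameters
phi = lambda x: np.array([x, x * abs(x)])      # known state-dependent regressor
d = lambda t: 0.3 * np.sin(3 * t)              # bounded unmodelled disturbance

k, gamma, dt = 1.0, 5.0, 1e-3                  # switching gain k > sup|d|
x, theta_hat = 1.0, np.zeros(2)
for i in range(10000):
    s = x                                      # sliding variable (1st-order plant)
    u = -theta_hat @ phi(x) - k * np.sign(s)   # certainty-equivalence + switching
    theta_hat += dt * gamma * phi(x) * s       # adaptive law from Lyapunov design
    x += dt * (theta_true @ phi(x) + u + d(i * dt))
```

With V = s²/2 + ‖θ − θ̂‖²/(2γ), this update gives V̇ ≤ −(k − sup|d|)|s|, driving the sliding variable to zero despite both classes of uncertainty.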

    Adaptive Augmentation of Non-Minimum Phase Flexible Aerospace Systems

    This work demonstrates the efficacy of direct adaptive augmentation on a robotic flexible system as an analogue of a large flexible aerospace structure such as a launch vehicle or aircraft. To that end, a robot was constructed as a control-system testbed. This robot, named “Penny,” contains the command and data acquisition capabilities necessary to influence and record system state data, including the flex states of its flexible structures. The robot was tested in two configurations: one with a vertically cantilevered flexible beam, and one with a flexible inverted pendulum (a flexible cart-pole system). The physical system was then characterized so that linear analysis and control design could be performed, resulting in linear and nonlinear models for each testing configuration. The linear models were used to design linear controllers that regulate the nominal plant's dynamical states. These controllers were then augmented with direct adaptive output regulation and disturbance accommodation. To accomplish this, sensor blending was used to shape the output so that the non-minimum phase open-loop plant appears minimum phase to the controller. It was subsequently shown that augmenting linear controllers with direct adaptive output regulation and disturbance accommodation was effective in enhancing system performance and mitigating oscillation in the flexible structures through the system's own actuation effort.
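As a toy illustration of the sensor-blending idea (the numerator polynomials are hypothetical; the testbed's actual transfer functions are not given here): blending two sensor outputs that share the same poles moves the transmission zero of the blended output into the left half-plane:

```python
import numpy as np

# Numerator polynomials (highest power of s first), sharing a common denominator.
n1 = np.array([1.0, -1.0])   # sensor 1: zero at s = +1 (non-minimum phase)
n2 = np.array([0.0, 1.0])    # sensor 2: no finite zero (constant numerator)

alpha = 0.4
n_blend = alpha * n1 + (1 - alpha) * n2   # blended output y_b = a*y1 + (1-a)*y2
zeros = np.roots(n_blend)                 # transmission zeros of the blend
```

With α = 0.4 the blended zero sits at s = −0.5, so a controller designed on y_b sees a minimum-phase output even though sensor 1 alone does not provide one.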