
    Adapt-to-learn policy transfer in reinforcement learning and deep model reference adaptive control

    Adaptation and learning from exploration are central to biological learning: humans and animals do not learn every task in isolation; rather, they quickly adapt learned behaviors to similar tasks and acquire new skills when faced with new situations. Inspired by this, adaptation has long been an important direction of research in control, in the form of adaptive controllers. However, adaptive controllers such as the Model Reference Adaptive Controller (MRAC) are mainly model-based: they do not rely on exploration but instead make informed decisions by exploiting the structure of the model. Such controllers are therefore characterized by high sample efficiency and stability guarantees, which makes them suitable for safety-critical systems. On the other hand, we have learning-based optimal control algorithms such as Reinforcement Learning (RL). Reinforcement learning is a trial-and-error method in which an agent explores the environment by taking random actions and increasing the likelihood of those actions that yield higher returns. Such exploration is expected to fail many times before an optimal policy is found; RL methods are therefore sample-expensive and lack stability guarantees, making them unsuitable for safety-critical systems. This thesis presents control algorithms for robotics that bring together the best of both worlds, "Adaptation" and "Learning from exploration", to propose new algorithms that can outperform their conventional counterparts. We first present an Adapt-to-Learn policy transfer algorithm, in which control-theoretic ideas of adaptation are used to transfer a policy between two related but different tasks via the policy gradient method of reinforcement learning. Efficient and robust policy transfer remains a key challenge in RL. Policy transfer through warm initialization, imitation, or interaction with a large set of agents over randomized instances has commonly been applied to a variety of RL tasks, but this is far from how behavior transfer happens in the biological world. Here we seek to answer the question: will learning to combine an adaptation reward with the environmental reward lead to more efficient transfer of policies between domains? We introduce a principled mechanism that can "Adapt-to-Learn", that is, adapt the source policy in order to learn to solve a target task with significant transition differences and uncertainties. Through theory and experiments, we show that our method significantly reduces the sample complexity of transferring policies between tasks. In the second part of this thesis, information-enabled learning-based adaptive controllers are presented: the Gaussian Process adaptive controller using a Model Reference Generative Network (GP-MRGeN) and the Deep Model Reference Adaptive Controller (DMRAC). Model reference adaptive control (MRAC) is a widely studied adaptive control methodology that aims to make a nonlinear plant with significant model uncertainty behave like a chosen reference model. MRAC methods adapt to changes by representing the system uncertainties as weighted combinations of known nonlinear functions and using a weight update law that moves the network weights in the direction that minimizes the instantaneous tracking error.
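    The abstract describes the MRAC weight update law only in words. As a sketch of the standard form this usually refers to in the MRAC literature (textbook notation, stated here as an assumption rather than quoted from the thesis): with matched uncertainty approximated as $W^{\top}\phi(x)$, reference model $\dot{x}_{rm} = A_{rm}x_{rm} + B_{rm}r$, and tracking error $e = x - x_{rm}$, the adaptive weights evolve as

```latex
% Standard MRAC adaptive law (textbook form, an assumption, not quoted from the thesis):
% Gamma > 0 is the adaptation gain and P solves the Lyapunov equation for A_rm.
\dot{\hat{W}} = -\Gamma\,\phi(x)\,e^{\top} P B,
\qquad A_{rm}^{\top}P + P A_{rm} + Q = 0, \quad Q = Q^{\top} \succ 0
```

    which moves $\hat{W}$ along the negative gradient of the instantaneous tracking-error cost, i.e. "in the direction of minimizing the instantaneous tracking error" as stated above.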
However, most MRAC adaptive controllers use a shallow network and only instantaneous data for adaptation, which restricts their representation capability and limits their performance under fast-changing uncertainties and faults in the system. In this thesis, we propose a Gaussian process based adaptive controller called GP-MRGeN. We present a new approach to the online supervised training of GP models using a new architecture termed the Model Reference Generative Network (MRGeN). Our architecture is loosely inspired by the recent success of generative neural network models, and our contributions ensure that including such a model in closed-loop control does not affect the stability properties. By using a generative network, the GP-MRGeN controller achieves higher adaptation rates without losing the robustness properties of the controller, making it suitable for mitigating faults in fast-evolving systems. Further, this thesis presents a new neuroadaptive architecture: Deep Neural Network based Model Reference Adaptive Control (DMRAC). This architecture uses deep neural network representations to model significant nonlinearities while retaining the boundedness guarantees that characterize MRAC-based controllers. We demonstrate through simulations and analysis that DMRAC can subsume previously studied learning-based MRAC methods, such as concurrent learning and GP-MRAC, making DMRAC a powerful architecture for high-performance control of nonlinear systems with long-term learning properties. Theoretical proofs of the controller's generalization over unseen data points and of the boundedness of the tracking error are also presented. Experiments with a quadrotor vehicle demonstrate the controller's ability to track a reference model in the presence of significant matched uncertainties. A software and communication architecture is designed to ensure online real-time inference of the deep network on a high-bandwidth, computation-limited platform. These results demonstrate the efficacy of deep networks for high-bandwidth closed-loop attitude control of unstable and nonlinear robots operating in adverse situations. We expect this work to benefit other closed-loop deep-learning control architectures for robotics
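    As a rough, hedged illustration of the DMRAC idea summarized above (deep features combined with an adaptively updated output layer), and not the thesis' actual implementation, a minimal sketch might look as follows; every class, name, dimension, and the buffering scheme is a hypothetical placeholder.

```python
import numpy as np

# Hedged sketch of a DMRAC-style controller: a deep network supplies features
# phi(x); only the outer-layer weights W_hat are adapted online with an
# MRAC-like law, while the inner layers are refit on buffered data.
# All names and details are illustrative assumptions, not the thesis' code.

class DeepMRACSketch:
    def __init__(self, feature_net, n_features, n_inputs, gamma=1.0):
        self.feature_net = feature_net            # callable: state x -> phi(x)
        self.W_hat = np.zeros((n_features, n_inputs))
        self.gamma = gamma                        # adaptation gain
        self.buffer = []                          # recorded data for inner-layer refits

    def adaptive_input(self, x):
        # nu_ad = W_hat^T phi(x): estimate of the matched uncertainty to cancel
        return self.W_hat.T @ self.feature_net(x)

    def adapt(self, x, e, P, B, dt):
        # MRAC-style outer-layer update: W_hat_dot = -gamma * phi(x) (e^T P B)
        phi = self.feature_net(x)
        self.W_hat += -self.gamma * np.outer(phi, e @ P @ B) * dt
        self.buffer.append((x, self.adaptive_input(x)))   # refit targets (placeholder)

    def refit_features(self, train_fn):
        # Periodic supervised update of the inner layers on the recorded buffer;
        # train_fn is a placeholder for an SGD routine.
        if self.buffer:
            train_fn(self.feature_net, self.buffer)
            self.buffer.clear()
```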

    Neural Controller Utilizing Genetic Algorithm Technique For Dynamic Systems

    This research presents a method for learning multilayer Neural Networks (NNs) using Genetic Algorithm (GA) techniques. The evolutionary techniques based on GAs are studied and employed for the Model Reference Adaptive Control (MRAC) scheme of different plants
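    As a hedged illustration of the general idea (not the algorithm from this work), a minimal genetic algorithm that evolves the flattened weight vector of a neural controller against a user-supplied cost, e.g. the tracking error with respect to a reference model, could look like the sketch below; population size, mutation scale, and selection scheme are arbitrary illustrative choices.

```python
import numpy as np

# Minimal sketch of GA-based neural network training: a population of flat
# weight vectors is evolved to minimize a user-supplied cost function.
# Illustrative only; not taken from the cited research.

rng = np.random.default_rng(0)

def evolve_weights(cost, n_weights, pop_size=40, generations=100,
                   mutation_std=0.1, elite=4):
    pop = rng.normal(0.0, 1.0, size=(pop_size, n_weights))
    for _ in range(generations):
        fitness = np.array([cost(w) for w in pop])           # lower is better
        order = np.argsort(fitness)
        parents = pop[order[:pop_size // 2]]                  # truncation selection
        children = []
        while len(children) < pop_size - elite:
            a, b = parents[rng.integers(len(parents), size=2)]
            mask = rng.random(n_weights) < 0.5                # uniform crossover
            child = np.where(mask, a, b)
            child = child + rng.normal(0.0, mutation_std, n_weights)  # mutation
            children.append(child)
        pop = np.vstack([pop[order[:elite]], children])       # keep elites
    return min(pop, key=cost)
```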

    Neurocontrol using dynamic learning


    Certification Considerations for Adaptive Systems

    Advanced capabilities planned for the next generation of aircraft, including those that will operate within the Next Generation Air Transportation System (NextGen), will necessarily include complex new algorithms and non-traditional software elements. These aircraft will likely incorporate adaptive control algorithms that will provide enhanced safety, autonomy, and robustness during adverse conditions. Unmanned aircraft will operate alongside manned aircraft in the National Airspace (NAS), with intelligent software performing the high-level decision-making functions normally performed by human pilots. Even human-piloted aircraft will necessarily include more autonomy. However, there are serious barriers to the deployment of new capabilities, especially for those based upon software including adaptive control (AC) and artificial intelligence (AI) algorithms. Current civil aviation certification processes are based on the idea that the correct behavior of a system must be completely specified and verified prior to operation. This report by Rockwell Collins and SIFT documents our comprehensive study of the state of the art in intelligent and adaptive algorithms for the civil aviation domain, categorizing the approaches used and identifying gaps and challenges associated with certification of each approach

    Adaptive nonlinear control using fuzzy logic and neural networks

    The problem of adaptive nonlinear control, i.e. the control of nonlinear dynamic systems with unknown parameters, is considered. Current techniques usually assume that either the control system is linearizable or the type of nonlinearity is known, which results in poor control quality for many practical problems; moreover, the control system design becomes too complex for a practicing engineer. The objective of this thesis is to provide a practical, systematic approach to the identification and control of nonlinear systems with unknown parameters when an explicit linear parametrization is either unknown or impossible. Fuzzy logic (FL) and neural networks (NNs) have proven to be tools for universal approximation and are therefore considered. However, FL requires expert knowledge, and there is a lack of systematic procedures for designing NNs for control. A hybrid technique, called the fuzzy logic adaptive network (FLAN), which combines the structure of an FL controller with the learning capabilities of NNs, is developed. FLAN is designed to be capable of both structure learning and parameter learning. A gradient descent based technique is used for parameter learning in FLAN, and it is tested through a variety of simulated experiments in identification and control of nonlinear systems. The results indicate the success of FLAN in terms of accuracy of estimation, speed of convergence, insensitivity to a range of initial learning rates, and robustness against sudden changes in the input as well as noise in the training data. The performance of FLAN is also compared with techniques based on FL and NNs, as well as several hybrid techniques
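    As a sketch of the kind of parameter learning described above (and not the FLAN structure-learning algorithm itself), a gradient-descent update of the centers, widths, and constant consequents of a small single-input fuzzy model with Gaussian membership functions might look as follows; all symbols and values are assumptions for illustration.

```python
import numpy as np

# Sketch of gradient-descent parameter learning for a small single-input fuzzy
# model with Gaussian membership functions and zero-order consequents.
# Illustrates the style of update used in hybrid neuro-fuzzy schemes; not FLAN.

def fuzzy_output(x, m, s, c):
    w = np.exp(-0.5 * ((x - m) / s) ** 2)    # rule firing strengths
    return (w @ c) / w.sum(), w

def train_step(x, y_target, m, s, c, lr=0.05):
    y, w = fuzzy_output(x, m, s, c)
    err = y - y_target                        # instantaneous output error
    wn = w / w.sum()                          # normalized firing strengths
    grad_c = err * wn                         # gradient of 0.5*err^2 w.r.t. consequents
    grad_m = err * (c - y) * wn * (x - m) / s**2          # w.r.t. centers
    grad_s = err * (c - y) * wn * (x - m) ** 2 / s**3     # w.r.t. widths
    return m - lr * grad_m, s - lr * grad_s, c - lr * grad_c, err
```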

    Knowledge-Based Control for Robot Arm


    Development of New Adaptive Control Strategies for a Two-Link Flexible Manipulator

    Manipulators with thin and lightweight arms or links are called Flexible-Link Manipulators (FLMs). FLMs offer several advantages over rigid-link manipulators, such as high-speed operation, lower energy consumption, and increased payload carrying capacity, and they find applications where manipulators must operate in large workspaces, such as assembly of free-flying space structures, hazardous material handling from a safe distance, and detection of flaws in large structures like airplanes and submarines. However, designing a feedback control system for a flexible-link manipulator is challenging because the system is non-minimum phase, underactuated, and non-collocated. Further difficulties are encountered when such manipulators handle unknown payloads. The overall deflection of the flexible manipulator is governed by the different vibration modes (excited at different frequencies) present along the length of the link. Due to changes in payload, flexible modes at higher frequencies are excited, giving rise to uncertainties in the dynamics of the FLM. To achieve effective tip trajectory tracking while quickly suppressing tip deflections when the FLM carries varying payloads, adaptive control is necessary instead of a fixed-gain controller in order to cope with the changing dynamics of the manipulator. Considerable research has been directed in the past to designing adaptive controllers based on either a linear identified model of the FLM or error-signal-driven supervised learning, e.g. neural networks, fuzzy logic, and hybrid neuro-fuzzy schemes. However, since the dynamics of the FLM are nonlinear, there is scope for exploiting nonlinear modeling approaches to design adaptive controllers. The objective of the thesis is to design advanced adaptive control strategies for a two-link flexible manipulator (TLFM) to control tip trajectory tracking and deflections while handling unknown payloads. To achieve tip trajectory control while quickly suppressing tip deflection under unknown payloads, a direct adaptive control (DAC) is proposed first. The proposed DAC uses a Lyapunov based nonlinear adaptive control scheme that ensures overall system stability for the control of the TLFM; for the developed control laws, the stability proof of the closed-loop system is also presented. The design of this DAC involves choosing a control law with tunable TLFM parameters and then developing an adaptation law using the closed-loop error dynamics. The performance of the developed controller is then compared with that of a fuzzy learning based adaptive controller (FLAC). The FLAC consists of three major components: a fuzzy logic controller, a reference model, and a learning mechanism. The learning mechanism automatically adjusts the rule base of the fuzzy controller so that the closed loop performs according to a user-defined reference model containing information about the desired behavior of the controlled system. Although the proposed DAC shows better performance than the FLAC, it suffers from the complexity of formulating a multivariable regressor vector for the TLFM. Also, the adaptive mechanisms for parameter updates of both the DAC and the FLAC depend on feedback-error-based supervised learning. Hence, a reinforcement learning (RL) technique is employed to derive an adaptive controller for the TLFM. The new reinforcement learning based adaptive control (RLAC) has the advantage that it attains optimal control adaptively online.
The performance of the RLAC is also compared with that of the DAC and the FLAC. In the past, most indirect adaptive controls for an FLM have been based on a linear identified model; however, the considered TLFM dynamics are highly nonlinear. Hence, a new Self-Tuning Control based on a nonlinear autoregressive moving average with exogenous input (NARMAX) model (NMSTC) is proposed. The proposed adaptive controller uses a multivariable Proportional Integral Derivative (PID) self-tuning control strategy in which the PID parameters are adapted online using a NARMAX model of the TLFM. The performance of the proposed NMSTC is compared with that of the RLAC. The proposed NMSTC law suffers from over-parameterization of the controller. To overcome this, a new nonlinear adaptive model predictive control based on the NARMAX model of the TLFM (NMPC) is developed next. In the proposed NMPC, the current control action is obtained by solving a finite-horizon open-loop optimal control problem online, at each sampling instant, using the predicted future behavior of the TLFM model. NMPC is based on minimizing a set of predicted system errors computed from available input-output data, with constraints placed on the projected control signals, resulting in an optimal control sequence. The performance of the proposed NMPC is also compared with that of the NMSTC. The performance of all the developed algorithms is assessed by numerical simulation in the MATLAB/SIMULINK environment and validated through experimental studies using a physical TLFM set-up available in the Advanced Control and Robotics Research Laboratory, National Institute of Technology Rourkela. The comparative assessment of the developed adaptive controllers shows that the proposed NMPC exhibits superior performance in terms of accurate tip position tracking (steady state error ≈ 0.01°) while suppressing tip deflections (maximum amplitude of tip deflection ≈ 0.1 mm) when the manipulator handles a variation in payload (increased payload of 0.3 kg). The adaptive control strategies proposed in this thesis can be applied to the control of complex flexible space shuttle systems, long-reach manipulators for hazardous waste management from a safe distance, and damping of oscillations in similar vibration systems
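    The receding-horizon mechanism described for the NMPC can be sketched as below. This is a generic illustration, not the controller from the thesis: the one-step predictor stands in for the identified NARMAX model of the TLFM, and the cost weights, bounds, and horizon length are placeholders.

```python
import numpy as np
from scipy.optimize import minimize

# Sketch of a receding-horizon (NMPC-style) step: at each sampling instant a
# finite-horizon input sequence is optimized against a predicted model, and
# only the first input is applied. Illustrative placeholders throughout.

def nmpc_step(predict, y_ref, y_hist, u_hist, horizon=10, u_bound=1.0):
    def cost(u_seq):
        yh, uh = list(y_hist), list(u_hist)
        J = 0.0
        for k in range(horizon):
            uh.append(float(u_seq[k]))
            y_next = predict(yh, uh)                 # NARMAX-type one-step prediction
            J += (y_ref[k] - y_next) ** 2 + 1e-3 * u_seq[k] ** 2
            yh.append(y_next)
        return J

    u0 = np.zeros(horizon)
    res = minimize(cost, u0, method="L-BFGS-B",
                   bounds=[(-u_bound, u_bound)] * horizon)
    return res.x[0]                                  # apply only the first input
```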

    Data-driven Adaptive Stabilizer for Unknown Nonlinear Dynamic MIMO Systems Using a Cognition-based Framework

    This thesis focuses on a cognitive stabilizer concept, an adaptive discrete control method based on a cognition-based framework. The aim of the cognitive stabilizer is to autonomously stabilize a specific class of unknown nonlinear multi-input multi-output (MIMO) systems. The cognitive stabilizer is able to gain useful local knowledge of the unknown system and can autonomously define suitable control inputs to stabilize it. The development of different kinds of adaptive, data-driven, and model-free controllers shows a clear tendency towards research on control methods with high autonomy. Here the term autonomy describes the fact that the control approach and the related programming are organized such that the algorithm can handle the feedback design autonomously, without instructions from outside the algorithm. Typical methods covered by this definition are adaptive, data-driven, and model-free control methods; this thesis reviews their state of the art, with the main focus on the autonomy of the realized approaches. It can be concluded that existing methods still leave some open points on the way to highly autonomous control. To address these open points, a framework similar to modeling approaches for human cognition processes [Cac98] can be introduced in the engineering context; it is denoted the cognition-based framework. Since stabilization is the most basic control task, a cognition-based framework for stabilization is established in this thesis. It is assumed that the mathematical model of the system to be controlled is unknown, that the system is fully controllable, and that the state vector can be measured. The cognitive stabilizer is realized within the cognition-based framework by its four main modules: (1) "perception and interpretation", using a system identifier for online identification of the local system dynamics and multi-step-ahead prediction; (2) "expert knowledge", relating to the stability criterion that guarantees the stability of the considered motion of the controlled system; (3) "planning", which generates a suitable control input sequence according to certain cost functions; and (4) "execution", which generates the optimal control input in a corresponding feedback form. Each module can be realized using different methods. In this thesis, "perception and interpretation" is realized using neural networks, Gaussian process regression, or a combined identifier. "Expert knowledge" consists of a data-driven quadratic stability criterion, the quadratic Lyapunov stability criterion with a certain Lyapunov function, and a uniform stability criterion. The modules "planning" and "execution" are realized together with an exhaustive grid search method or direct input optimization using an inverse model. The whole cognitive stabilizer is realized through autonomous communication among the modules. The cognitive stabilizer is tested using numerical examples and experimental results in this thesis. A pendulum system and the Lorenz system are considered as simulation examples; both are benchmark examples for nonlinear dynamic control design. The cognitive stabilizer is also experimentally implemented and evaluated on a three-tank system.
All the numerical examples and experimental results demonstrate the successful application of the proposed methods.
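    A hedged skeleton of the four-module loop described above (perception and interpretation, expert knowledge, planning, execution) is given below. The identifier object, the stability test, and the candidate-input grid are placeholders, not the identifiers and criteria developed in the thesis.

```python
import itertools
import numpy as np

# Skeleton of the four-module loop: (1) perception and interpretation,
# (2) expert knowledge, (3) planning, (4) execution. All components passed in
# are placeholders standing in for the thesis' concrete realizations.

def cognitive_stabilizer_step(x, identifier, is_stabilizing, u_grid):
    # (1) perception & interpretation: update the local model from measured data
    identifier.update(x)

    best_u, best_norm = None, np.inf
    # (3) planning: exhaustive grid search over candidate input vectors
    for u in itertools.product(*u_grid):
        u = np.asarray(u, dtype=float)
        x_pred = identifier.predict(x, u)            # multi-step-ahead prediction
        # (2) expert knowledge: keep only inputs that pass the stability criterion
        if is_stabilizing(x, x_pred) and np.linalg.norm(x_pred) < best_norm:
            best_u, best_norm = u, np.linalg.norm(x_pred)

    # (4) execution: apply the selected input (zero input if no candidate passes)
    return best_u if best_u is not None else np.zeros(len(u_grid))
```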

    Development of neural units with higher-order synaptic operations and their applications to logic circuits and control problems

    Neural networks play an important role in the execution of goal-oriented paradigms. They offer flexibility, adaptability, and versatility, so that a variety of approaches may be used to meet a specific goal, depending upon the circumstances and the requirements of the design specifications. The development of higher-order neural units with higher-order synaptic operations opens a new window onto complex problems such as control of aerospace vehicles, pattern recognition, and image processing. The neural models described in this thesis consider the behavior of a single neuron as the basic computing unit in neural information processing operations. Each computing unit in the network is based on the concept of an idealized neuron in the central nervous system (CNS). Recent mathematical models and architectures for neuro-control systems have generated considerable theoretical and industrial interest, and advances in static and dynamic neural networks have had a profound impact on the field of neuro-control. Neural networks consisting of several layers of neurons with linear synaptic operations have been used extensively in applications such as pattern recognition, system identification, and control of complex systems such as flexible structures and intelligent robotic systems. The conventional linear neural model is a highly simplified model of the biological neuron; using it, many neural morphologies, usually referred to as multilayer feedforward neural networks (MFNNs), have been reported in the literature. The performance of the neurons is greatly affected when a layer of neurons is implemented for system identification, pattern recognition, and control problems. Through simulation studies of the XOR logic, it was concluded that neurons with a linear synaptic operation are limited to linearly separable forms of pattern distribution, although they can perform a variety of complex mathematical operations when implemented in the form of a network structure. Such networks suffer from limitations in computational efficiency and learning capability, and these models ignore many salient features of biological neurons, such as time delays, cross- and self-correlations, and feedback paths, which are otherwise very important in neural activity. In this thesis, an effort is made to develop new mathematical models of neurons that belong to the class of higher-order neural units (HONUs) with higher-order synaptic operations, such as quadratic and cubic synaptic operations. The advantage of this type of neural unit is improved performance, but it comes at the cost of an exponential increase in the number of parameters, which slows the training process. In this context, a novel method of representing the weight parameters without sacrificing neural performance is introduced. A generalized representation of the higher-order synaptic operation for these neural structures is proposed, and it is shown that many existing neural structures can be derived from this generalized representation. In 1943, McCulloch and Pitts modeled the stimulus-response behavior of the primitive neuron using threshold logic, and it has since become common practice to implement logic circuits using neural structures. In this research, logic circuits such as OR, AND, and XOR were realized using the proposed neural structures.
These neural structures were also implemented as neuro-controllers for control problems such as satellite attitude control and model reference adaptive control. A comparative study of the performance of these neural structures against that of conventional linear controllers is presented. The simulation results obtained in this research apply only to the simplified models presented in the simulation studies
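    The claim that a single linear-synapse neuron cannot solve XOR while a higher-order unit can is easy to make concrete. The sketch below shows a single neural unit with a quadratic synaptic operation realizing XOR using hand-chosen weights; the weight values are one valid choice for illustration and are not taken from the thesis.

```python
import numpy as np

# A single neural unit with a quadratic synaptic operation can realize XOR,
# which a single linear-synapse neuron cannot. The hand-chosen weights below
# implement s = x1 + x2 - 2*x1*x2, followed by a hard threshold.

def quadratic_unit(x, w_lin, w_quad, bias=0.0):
    # Higher-order synaptic operation: linear term plus quadratic (pairwise) term
    s = w_lin @ x + x @ w_quad @ x + bias
    return 1 if s >= 0.5 else 0               # hard threshold activation

w_lin = np.array([1.0, 1.0])
w_quad = np.array([[0.0, -1.0],
                   [-1.0, 0.0]])              # symmetric: contributes -2*x1*x2

for x in ([0, 0], [0, 1], [1, 0], [1, 1]):
    x = np.array(x, dtype=float)
    print(x, quadratic_unit(x, w_lin, w_quad))   # prints 0, 1, 1, 0
```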