
    Computation Approaches for Continuous Reinforcement Learning Problems

    Optimisation theory is at the heart of any control process, where we seek to control the behaviour of a system through a set of actions. Linear control problems have been extensively studied, and optimal control laws have been identified. But the world around us is highly non-linear and unpredictable. For these dynamic systems, which lack the convenient mathematical properties of their linear counterparts, classic control theory breaks down and other methods have to be employed. Nature, however, thrives by optimising non-linear and highly complex systems. Evolutionary Computing (EC) methods exploit nature's way by imitating the evolutionary process, avoiding the need to solve the control problem analytically. Reinforcement Learning (RL), on the other hand, regards the optimal control problem as a sequential one. At every discrete time step an action is applied, and the transition of the system to a new state is accompanied by a single numerical value, the "reward", which designates the quality of the control action. Even though the feedback is limited to a single real number, the introduction of the Temporal Difference method made it possible to obtain accurate predictions of the value functions. This paved the way to optimising complex structures, such as Neural Networks, which are used to approximate the value functions. In this thesis we investigate the solution of continuous Reinforcement Learning control problems by EC methodologies. The reward accumulated over an episode suffices as information to formulate the required measure, fitness, with which to optimise a population of candidate solutions. In particular, we explore the limits of applicability of a specific branch of EC, Genetic Programming (GP). In GP the evolving population is comprised of individuals that translate directly into mathematical functions, which can serve as control laws. The major contribution of this thesis is the proposed unification of these disparate Artificial Intelligence paradigms: the information provided by the system is exploited on a step-by-step basis by the RL part of the proposed scheme and on an episodic basis by the GP part. This makes it possible to augment the function set of the GP scheme with adaptable Neural Networks. To achieve stable behaviour of the RL part of the system, a modification of the Actor-Critic algorithm was implemented. Finally, we successfully apply the GP method to multi-action control problems, extending the spectrum of problems that this method has been shown to solve. We also investigate the capability of GP on problems from the food industry, which likewise exhibit non-linearity and lack a definite model describing their behaviour.
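
    The episodic-fitness idea above is easy to make concrete. Below is a minimal sketch, under assumptions (a toy double-integrator plant and random linear "individuals" standing in for GP expression trees; all helper names are invented), of scoring a population of candidate control laws by accumulated episode reward:

```python
# Illustrative sketch only: the toy plant, the random linear "individuals",
# and all helper names are assumptions standing in for GP expression trees.
import random

def make_random_policy():
    """A stand-in for a GP individual: a random linear law u = a*x + b*v."""
    a, b = random.uniform(-2, 2), random.uniform(-2, 2)
    return lambda x, v: a * x + b * v

def episodic_fitness(policy, steps=200, dt=0.05):
    """Accumulated reward over one episode on a toy double-integrator task."""
    x, v, total = 1.0, 0.0, 0.0
    for _ in range(steps):
        u = max(-1.0, min(1.0, policy(x, v)))    # saturated control action
        v += u * dt                               # double-integrator dynamics
        x += v * dt
        total += -(x * x + 0.1 * u * u)           # step reward, summed as fitness
    return total

population = [make_random_policy() for _ in range(50)]
fitnesses = [episodic_fitness(p) for p in population]
print(f"best fitness in generation 0: {max(fitnesses):.3f}")
```

    In a full GP scheme the policies would be expression trees subject to crossover and mutation, with selection driven by exactly this kind of episodic fitness.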

    Maximum Power Point Tracker Controller for Solar Photovoltaic Based on Reinforcement Learning Agent with a Digital Twin

    Photovoltaic (PV) energy, representing a renewable source of energy, plays a key role in the reduction of greenhouse gas emissions and the achievement of a sustainable mix of energy generation. To achieve the maximum solar energy harvest, PV power systems require the implementation of Maximum Power Point Tracking (MPPT). Traditional MPPT controllers, such as P&O, are easy to implement, but they are inherently slow and oscillate around the MPP, losing efficiency. This work presents a Reinforcement Learning (RL)-based control to increase the speed and efficiency of the controller. Deep Deterministic Policy Gradient (DDPG), the selected RL algorithm, works with continuous action and state spaces to achieve a stable output at the MPP. A Digital Twin (DT) enables training in simulation, which accelerates the process and allows it to proceed independently of weather conditions. In addition, we use the maximum power achieved in the DT to adjust the reward function, making the training more efficient. The RL control is compared with a traditional P&O controller to validate the speed and efficiency increase in both simulations and real implementations. The results show an improvement of 10.45% in total power output and a settling time 24.54 times faster in simulations. Moreover, real-time tests show an improvement of 51.45% in total power output and a settling time of 0.25 s for the DDPG controller compared with 4.26 s for the P&O.
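
    A hedged sketch of the reward-shaping idea: the digital twin's reported maximum power normalises the measured PV power, so the agent is rewarded for closing the gap to the true MPP while being penalised for oscillating the duty cycle. The function name and the penalty weight are assumptions, not the authors' exact formulation:

```python
# Hedged sketch: names and the penalty weight are assumptions, not the
# paper's exact reward function.
def mppt_reward(p_measured: float, p_mpp_twin: float,
                duty_change: float, smooth_weight: float = 0.1) -> float:
    """Fraction of the achievable power reported by the digital twin,
    minus an oscillation penalty on the change in duty cycle (action)."""
    power_ratio = p_measured / max(p_mpp_twin, 1e-9)  # guard against divide-by-zero
    return power_ratio - smooth_weight * abs(duty_change)

# Example: 182 W harvested while the twin reports 200 W achievable.
print(mppt_reward(182.0, 200.0, duty_change=0.02))  # ~0.908
```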

    Walking Motion Generation and Neuro-Fuzzy Control with Push Recovery for Humanoid Robot

    Push recovery is an essential requirement for a humanoid robot that must safely perform tasks within a real dynamic environment. In this environment, the robot is susceptible to external disturbances that are in some cases inevitable, requiring push recovery strategies to avoid falls and damage to humans and the environment. In this paper, a novel push recovery approach to counteract disturbances from any direction and in any walking phase is developed. It presents a walking pattern generator that can be modified according to the push recovery strategy. The result is a humanoid robot that can maintain its balance in the presence of strong disturbances, taking into account their magnitude and determining the best push recovery strategy. Push recovery experiments with different disturbance directions have been performed using a 20 DOF Darwin-OP robot. The adaptability and low computational cost of the whole scheme allow its incorporation into an embedded system.
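
    For illustration only: a common way to choose among the classic ankle, hip, and step recovery strategies from the humanoid literature is by disturbance magnitude, in the spirit of the scheme above. The thresholds and the helper name below are assumptions, not taken from the paper:

```python
# Illustrative only: thresholds and helper name are assumptions; magnitudes
# are normalised to the robot's capturability limits.
def select_push_recovery_strategy(disturbance_magnitude: float,
                                  ankle_limit: float = 0.3,
                                  hip_limit: float = 0.7) -> str:
    """Return the mildest strategy able to absorb the push."""
    if disturbance_magnitude <= ankle_limit:
        return "ankle"   # small push: ankle torque alone restores balance
    if disturbance_magnitude <= hip_limit:
        return "hip"     # moderate push: add upper-body angular momentum
    return "step"        # strong push: modify the walking pattern and step

for m in (0.1, 0.5, 0.9):
    print(m, "->", select_push_recovery_strategy(m))
```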

    Development and Implementation of Novel Intelligent Motor Control for Performance Enhancement of PMSM Drive in Electrified Vehicle Application

    The demand for electrified vehicles has grown significantly over the last decade, causing a shift in the automotive industry from traditional gasoline vehicles to electric vehicles (EVs). With the growing evolution of EVs, high power density and high efficiency of electric powertrains (e-drives) are of the utmost importance for achieving an extended driving range. However, achieving an extended driving range with enhanced e-drive performance remains a bottleneck. The control algorithm of an e-drive plays a vital role in its performance and reliability over time. Artificial intelligence (AI)- and machine learning (ML)-based intelligent control methods have proven their continued success in fault determination and analysis of motor-drive systems. Considering the potential of intelligent control, this thesis investigates the legacy space vector modulation (SVM) strategy for wide-bandgap (WBG) inverters and the conventional current PI controller for permanent magnet synchronous motor (PMSM) control, with the aim of reducing switching loss and computation time and enhancing transient performance relative to state-of-the-art e-drive systems. The thesis converges on AI- and ML-based control for e-drives, enhancing performance by reducing switching loss with an ANN-based modulation technique for a GaN-based inverter and improving the transient performance of the PMSM with an ML-based parameter-independent controller.
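
    As a rough sketch of the ANN-based modulation idea (untrained placeholder weights; the network size and all names are assumptions), a small MLP can map the reference voltage vector in the alpha-beta frame to three-phase duty cycles, replacing the piecewise sector logic of classic SVM:

```python
# Sketch with untrained placeholder weights; network size and names are
# assumptions. In practice the MLP would be trained offline against
# loss-optimised switching targets.
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.standard_normal((16, 2)) * 0.5, np.zeros(16)   # hidden layer
W2, b2 = rng.standard_normal((3, 16)) * 0.5, np.zeros(3)    # duty-cycle outputs

def ann_modulator(v_alpha: float, v_beta: float) -> np.ndarray:
    """Forward pass: reference voltage vector in, per-phase duty cycles out."""
    h = np.tanh(W1 @ np.array([v_alpha, v_beta]) + b1)
    return 1.0 / (1.0 + np.exp(-(W2 @ h + b2)))  # sigmoid keeps duties in [0, 1]

print(ann_modulator(0.5, -0.2))  # three duty cycles for one switching period
```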

    Self-Learning Longitudinal Control for On-Road Vehicles

    Advanced Driver Assistance Systems are an important selling point for passenger cars, but they incur high development costs. In particular, the parametrisation of longitudinal control, an important building block of driver assistance systems, requires much time and money to strike the right balance between occupant comfort and control performance. Reinforcement Learning appears to be a promising approach for automating this. So far, however, this class of algorithms has mainly been applied to simulated tasks that take place under ideal conditions and allow practically unlimited training time. Among the greatest challenges for applying Reinforcement Learning in a real vehicle are trajectory-tracking control and incomplete state information due to only partially observed dynamics. Moreover, an algorithm deployed on a real system has to reach a result within minutes, and the control objective may change arbitrarily at run time, which poses an additional difficulty for Reinforcement Learning methods. This work presents two algorithms that require little computing power and overcome these hurdles. On the one hand, a model-free Reinforcement Learning approach is proposed, based on the actor-critic architecture, that uses a special structure in the state-action value function so that it can be applied to partially observed systems. To learn a feedforward control, a controller is proposed that relies on a projection and on training-data manipulation. On the other hand, a model-based algorithm built on policy search is proposed, complemented by an automated design method for an inversion-based feedforward controller. The proposed algorithms are compared in a series of scenarios in which they learn online, i.e. while driving and in closed loop, in a real vehicle. Although the algorithms respond somewhat differently to different boundary conditions, both learn robustly and quickly and are able to adapt to different operating points, such as speeds and gears, even when disturbances act during training. To the best of the author's knowledge, this is the first successful application of a Reinforcement Learning algorithm that learns online in a real vehicle.
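
    A minimal sketch of the kind of actor-critic update named above, on a toy velocity-tracking task; the linear features, gains, and plant model are illustrative assumptions, not the dissertation's algorithm:

```python
# Hedged sketch: one-step actor-critic with linear function approximation on
# a toy longitudinal (velocity-tracking) task. All numbers are assumptions.
import random

alpha_critic, alpha_actor, gamma = 0.05, 0.01, 0.95
w = [0.0, 0.0]       # critic weights on features (error, 1)
theta = 0.0          # actor gain: u = theta * error + exploration noise

def features(err):
    return [err, 1.0]

def value(err):
    return sum(wi * fi for wi, fi in zip(w, features(err)))

v_ref, v, dt = 20.0, 0.0, 0.1
for step in range(2000):
    err = v_ref - v
    noise = random.gauss(0.0, 0.2)
    u = theta * err + noise                    # exploratory acceleration command
    v = v + (u - 0.05 * v) * dt                # toy longitudinal dynamics with drag
    next_err = v_ref - v
    r = -abs(next_err)                         # reward: small tracking error
    td = r + gamma * value(next_err) - value(err)   # temporal-difference error
    w = [wi + alpha_critic * td * fi for wi, fi in zip(w, features(err))]
    theta += alpha_actor * td * noise * err    # Gaussian-policy likelihood-ratio step

print(f"final speed {v:.2f} m/s vs. reference {v_ref} m/s, learned gain {theta:.3f}")
```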

    Hybrid Modeling Approaches Integrating Physics-Based Models with Machine Learning for Predictive Control of Biological and Chemical Processes

    Recently, there has been growing interest in data-based modeling as the amount of available data has increased tremendously. One such method is the Dynamic Mode Decomposition with Control (DMDc) technique, which builds temporally local linear models from data, but its limited domain of applicability (DA) hinders its use for prediction. To overcome this challenge, we proposed an algorithm that utilizes multiple "local" training datasets, and it was applied successfully to hydraulic fracturing. Although data-based modeling offers simplicity and ease of construction, it lacks the robustness and parametric interpretability of first-principles modeling. To balance the advantages and disadvantages of data-based and first-principles models, hybrid modeling was proposed using artificial neural networks (ANNs). Since then, Machine Learning (ML) has advanced to the point where deep neural networks (DNNs) with more than three layers can be trained to approximate any function accurately. In this work, we proposed a deep hybrid modeling (DHM) framework that integrates first principles with DNNs and successfully applied it to two complex processes: hydraulic fracturing and a full-scale fermentation reactor. Similarly, Universal Differential Equations (UDEs) were proposed in ML, in which DNNs are embedded within ODEs and solved using ODE solvers. We utilized UDEs to successfully build a DHM from simulation and experimental data for batch production of β-carotene. One limitation of DHM is that its DA is determined by the DNN within it, and its accuracy is high only within that DA; it is therefore important to account for the DA when designing a model-based controller. To this end, we proposed a Control Lyapunov-Barrier Function (CLBF)-MPC to stabilize the closed-loop system and ensure that it stays within the DA of the DHM. Theoretical guarantees were provided for the CLBF-MPC controller, and it was successfully implemented on a CSTR. The idea of integrating physics with ML extends to Reinforcement Learning (RL): for cases where model-based controller design is not possible, we proposed a model-free Deep RL (DRL) controller that utilizes prior knowledge in its reward function to speed up learning. This DRL controller was successfully applied to hydraulic fracturing, wherein Nolte's law was included in the reward function for fast convergence.
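
    The hybrid-modeling idea lends itself to a compact sketch: known first-principles dynamics plus a small neural correction term, integrated here with explicit Euler as a stand-in for the ODE solvers used with UDEs. The physics term (first-order decay), network size, and weights below are assumptions:

```python
# Minimal self-contained sketch: dx/dt = known physics + learned residual.
# The decay constant, network size, and (untrained) weights are assumptions.
import numpy as np

rng = np.random.default_rng(1)
W1, b1 = rng.standard_normal((8, 1)) * 0.1, np.zeros(8)
W2, b2 = rng.standard_normal((1, 8)) * 0.1, np.zeros(1)

def nn_residual(x: np.ndarray) -> np.ndarray:
    """Data-driven correction for unmodelled kinetics (untrained placeholder)."""
    return W2 @ np.tanh(W1 @ x + b1) + b2

def hybrid_rhs(x: np.ndarray, k: float = 0.5) -> np.ndarray:
    """Hybrid right-hand side: first-order decay plus neural residual."""
    return -k * x + nn_residual(x)

x, dt = np.array([1.0]), 0.01
for _ in range(500):                 # 5 s of simulated time
    x = x + dt * hybrid_rhs(x)       # explicit Euler step
print("state after 5 s:", x)
```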
