
    Computation Approaches for Continuous Reinforcement Learning Problems

    Optimisation theory is at the heart of any control process, in which we seek to steer the behaviour of a system through a set of actions. Linear control problems have been studied extensively and optimal control laws for them have been identified, but the world around us is highly non-linear and unpredictable. For such dynamic systems, which lack the convenient mathematical properties of their linear counterparts, classical control theory breaks down and other methods must be employed. Nature, however, thrives by optimising non-linear and highly complicated systems. Evolutionary Computing (EC) methods exploit nature's way by imitating the evolutionary process, thereby avoiding the need to solve the control problem analytically. Reinforcement Learning (RL), on the other hand, treats the optimal control problem as a sequential one: at every discrete time step an action is applied, and the transition of the system to a new state is accompanied by a single numerical value, the "reward", which designates the quality of the control action. Even though the feedback is limited to a single real number, the introduction of the Temporal Difference method made it possible to obtain accurate predictions of the value functions. This paved the way to optimising complex structures, such as the Neural Networks used to approximate those value functions. In this thesis we investigate the solution of continuous Reinforcement Learning control problems by EC methodologies. For such problems, the reward accumulated over an episode suffices as the information needed to formulate the required measure, the fitness, with which a population of candidate solutions is optimised. In particular, we explore the limits of applicability of a specific branch of EC, Genetic Programming (GP). In the GP case the evolving population consists of individuals that translate directly into mathematical functions, each of which can serve as a control law. The major contribution of this thesis is the proposed unification of these disparate Artificial Intelligence paradigms: the information provided by the system is exploited on a step-by-step basis by the RL part of the proposed scheme and on an episodic basis by the GP part. This makes it possible to augment the function set of the GP scheme with adaptable Neural Networks. In the quest for stable behaviour of the RL part of the system, a modification of the Actor-Critic algorithm has been implemented. Finally, we successfully apply the GP method to multi-action control problems, extending the spectrum of problems that this method has been shown to solve, and we investigate the capability of GP on problems from the food industry, which likewise exhibit non-linearity and lack a definite model describing their behaviour.
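
    The two feedback granularities combined here can be made concrete with a minimal Python sketch (the Gym-style reset()/step() environment interface and all names are illustrative assumptions, not details from the thesis): the TD(0) rule consumes the reward step by step, while a GP individual's fitness is simply the reward accumulated over a whole episode.

```python
def td0_update(V, s, r, s_next, alpha=0.1, gamma=0.99):
    # Step-wise Temporal Difference (TD(0)) update of a tabular value
    # function: V(s) <- V(s) + alpha * (r + gamma * V(s') - V(s)).
    V[s] += alpha * (r + gamma * V[s_next] - V[s])

def episode_fitness(control_law, env, max_steps=500):
    # Fitness of one GP individual: the reward accumulated over a whole
    # episode. `control_law` is the mathematical function encoded by the
    # GP tree; `env` is assumed (hypothetically) to expose a Gym-style
    # reset()/step() interface returning (state, reward, done).
    state = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        action = control_law(state)
        state, reward, done = env.step(action)
        total_reward += reward
        if done:
            break
    return total_reward
```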

    Prediction and control in human neuromusculoskeletal models

    Computational neuromusculoskeletal modelling enables the generation and testing of hypotheses about human movement on a large scale, in silico. Humanoid models, which increasingly aim to replicate the full complexity of the human nervous and musculoskeletal systems, are built on extensive prior knowledge, extracted from anatomical imaging, kinematic and kinetic measurement, and codified as model description. Where inverse dynamic analysis is applied, its basis is in Newton's laws of motion, and in solving for muscular redundancy it is necessary to invoke knowledge of central nervous motor strategy. This epistemological approach contrasts strongly with the models of machine learning, which are generally over-parameterised and largely data-driven. Even as spectacular performance has been delivered by the application of these models in a number of discrete domains of artificial intelligence, work towards general human-level intelligence has faltered, leading many to wonder if the data-driven approach is fundamentally limited, and spurring efforts to combine machine learning with knowledge-based modelling. Through a series of five studies, this thesis explores the combination of neuromusculoskeletal modelling with machine learning in order to enhance the core tasks of prediction and control. Several principles for the development of clinically useful artificially intelligent systems emerge: stability, computational efficiency and incorporation of prior knowledge. The first study concerns the use of neural network function approximators for the prediction of internal forces during human movement, an important task with many clinical applications, but one for which the standard tools of modelling are slow and cumbersome. By training on a large dataset of motions and their corresponding forces, state-of-the-art performance is demonstrated, with many-fold increases in inference speed enabling the deployment of trained models for use in a real time biofeedback system. Neural networks trained in this way, to imitate some optimal controller, encode a mapping from high-level movement descriptors to actuator commands, and may thus be deployed in simulation as policies to control the actions of humanoid models. Unfortunately, the high complexity of realistic simulation makes stable control a challenging task, beyond the capabilities of such naively trained models. The objective of the second study was to improve performance and stability of policy-based controllers for humanoid models in simulation. A novel technique was developed, borrowing from established unsupervised adversarial methods in computer vision. This technique enabled significant gains in performance relative to a neural network baseline, without the need for additional access to the optimal controller. For the third study, increases in the capabilities of these policy-based controllers were sought. Reinforcement learning is widely considered the most powerful means of optimising such policies, but it is computationally inefficient, and this inefficiency limits its clinical utility. To mitigate this problem, a novel framework, making use of domain-specific knowledge present in motion data, and in an inverse model of the biomechanical system, was developed. Training on simple desktop hardware, this framework enabled rapid initialisation of humanoid models that were able to move naturally through a 3-dimensional simulated environment, with 900-fold improvements in sample efficiency relative to a related technique based on pure reinforcement learning. After training with subject-specific anatomical parameters and motion data, learned policies represent personalised models of motor control that may be further interrogated to test hypotheses about movement. For the fourth study, subject-specific controllers were taken and used as the substrate for transfer learning, by removing kinematic constraints and optimising with respect to the magnitude of the medial knee joint reaction force, an important biomechanical variable in osteoarthritis of the knee. Models learned new kinematic strategies for the reduction of this biomarker, which were subsequently validated by their use, in the real world, to construct subject-specific routines for real time gait retraining. Six out of eight subjects were able to reduce medial knee joint loading by pursuing the personalised kinematic targets found in simulation. Personalisation of assistive devices, such as limb prostheses, is another area of growing interest, and one for which computational frameworks promise cost-effective solutions. Reinforcement learning provides powerful techniques for this task, but the expansion of the scope of optimisation to include previously static elements of a prosthesis is problematic for its complexity and resulting sample inefficiency. The fifth and final study demonstrates a new algorithm that leverages the methods described in the previous studies, and additional techniques for variance control, to surmount this problem, improving sample efficiency and, simultaneously, through the use of prior knowledge encoded in motion data, providing a rational means of determining optimality in the prosthesis. Trained models were able to jointly optimise motor control and prosthesis design to enable improved performance in a walking task, and optimised designs were robust to both random seed and reward specification. This algorithm could be used to speed the design and production of real personalised prostheses, representing a potent realisation of the potential benefits of combined reinforcement learning and realistic neuromusculoskeletal modelling.
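
    As a rough illustration of the imitation training described in the first study, here is a minimal PyTorch sketch of regressing actuator commands from high-level movement descriptors; the layer sizes, feature dimensions and variable names are assumptions made for illustration, not details taken from the thesis.

```python
import torch
import torch.nn as nn

# Hypothetical dimensions: 32 movement descriptors in, 12 actuator commands out.
policy = nn.Sequential(
    nn.Linear(32, 128), nn.ReLU(),
    nn.Linear(128, 128), nn.ReLU(),
    nn.Linear(128, 12),
)
optimiser = torch.optim.Adam(policy.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

def train_step(descriptors, expert_commands):
    # Fit the network to mimic the optimal controller: regress the
    # controller's commands from the corresponding movement descriptors.
    optimiser.zero_grad()
    loss = loss_fn(policy(descriptors), expert_commands)
    loss.backward()
    optimiser.step()
    return loss.item()
```

    Once trained, such a network can be queried far faster than the optimisation it imitates, which is what makes the real time biofeedback use described above feasible.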

    Surrogate model for real time signal control: theories and applications

    Traffic signal controls play a vital role in urban road traffic networks. Compared with fixed-time signal control, which is based solely on historical data, real time signal control is flexible and responsive to varying traffic conditions, and hence promises better performance and robustness in managing traffic congestion. Real time signal control can be divided into model-based and model-free approaches. The former requires a traffic model (analytical or simulation-based) in the generation, optimisation and evaluation of signal control plans, which means that its efficacy in real-world deployment depends on the validity and accuracy of the underlying traffic model. Model-free real time signal control, on the other hand, is constructed from expert experience and empirical observations. Most existing model-free real time signal controls, however, follow learning-based or rule-based approaches, and either lack interpretability or are non-optimised. This thesis proposes a surrogate-based real time signal control and optimisation framework that can determine signal decisions in a centralised manner without the use of any traffic model. Surrogate models offer analytical and efficient approximations of complex models or black-box processes by fitting their input-output structures with appropriate mathematical tools. Current research on surrogate-based optimisation is limited to strategic and off-line optimisation, which only approximates the relationship between decisions and outputs under highly specific conditions based on particular traffic simulation models, and has yet to be attempted for real time optimisation. This thesis proposes a framework for surrogate-based real time signal control by constructing a response surface that encompasses (1) traffic states, (2) control parameters, and (3) network performance indicators at the same time. A series of comprehensive evaluations is conducted to assess the effectiveness, robustness and computational efficiency of the surrogate-based real time signal control. In the numerical test, the Kriging model is selected to approximate the traffic dynamics of the test network. The results show that this Kriging-based real time signal control can increase the total throughput by 5.3% and reduce the average delay by 8.1% compared with the fixed-time baseline signal plan. In addition, the optimisation time can be reduced by more than 99% if the simulation model is replaced by a Kriging model. The proposed signal controller is further investigated via multi-scenario analyses involving different levels of information availability, network saturation and traffic uncertainty, which show the robustness and reliability of the controller. Moreover, the influence of the baseline signal on the Kriging-based signal control can be eliminated by a series of off-line updates. By virtue of the model-free nature and the adaptive learning capability of the surrogate model, the Kriging-based real time signal control can adapt to systematic network changes (such as seasonal variations in traffic demand), updating the response surface according to feedback from the actual traffic environment. The test results show that the adaptive Kriging-based real time signal control maintains signal control performance in response to systematic network changes better than either fixed-time signal control or non-adaptive Kriging-based signal control.
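
    A minimal sketch of the surrogate idea, assuming scikit-learn's Gaussian process regressor stands in for the Kriging model and using randomly generated placeholder data (illustrative only): the response surface is fitted jointly over traffic-state features and control parameters, then queried cheaply to rank candidate signal plans for the current traffic state.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel

# Placeholder training data (illustrative, not real traffic data):
# 3 traffic-state features + 3 control parameters per sample,
# with the observed average delay as the output.
X = np.random.rand(200, 6)
y = np.random.rand(200)

surrogate = GaussianProcessRegressor(kernel=ConstantKernel() * RBF())
surrogate.fit(X, y)

def best_plan(state, candidate_plans):
    # Score every candidate plan under the current traffic state on the
    # surrogate instead of on an expensive simulation model, and return
    # the plan with the lowest predicted delay.
    queries = np.hstack([np.tile(state, (len(candidate_plans), 1)),
                         candidate_plans])
    predicted_delay = surrogate.predict(queries)
    return candidate_plans[int(np.argmin(predicted_delay))]
```

    Refitting the regressor on observations fed back from the live network would correspond to the off-line updates and adaptive behaviour described above.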

    Advances in Reinforcement Learning

    Reinforcement Learning (RL) is a very dynamic area in terms of both theory and application. This book brings together many different aspects of current research in the several fields associated with RL, a body of work that has been growing rapidly and producing a wide variety of learning algorithms for different applications. Comprising 24 chapters, it covers a very broad range of topics in RL and their application to autonomous systems. A set of chapters provides a general overview of RL, while the remaining chapters focus mostly on applications of RL paradigms: Game Theory, Multi-Agent Theory, Robotics, Networking Technologies, Vehicular Navigation, Medicine and Industrial Logistics.

    Adaptive Railway Traffic Control using Approximate Dynamic Programming

    Railway networks around the world have become challenging to operate in recent decades, with a mixture of track layouts carrying several different classes of trains at varying operational speeds. This complexity has come about as a result of the sustained increase in passenger numbers, and in many countries railways are now more popular than ever before as a means of commuting to cities. To address these operational challenges, governments and railway undertakings are encouraging the development of intelligent and digital transport systems that regulate and optimise train operations in real time, increasing capacity and customer satisfaction through improved usage of existing railway infrastructure. Accordingly, this thesis presents an adaptive railway traffic control system for real-time operations based on a data-based approximate dynamic programming (ADP) approach with integrated reinforcement learning (RL). By assessing requirements and opportunities, the controller aims to reduce delays arising from trains that entered a control area behind schedule, by re-scheduling control plans in real time at critical locations in a timely manner. The present data-based approach depends on an approximation to the value function of dynamic programming after optimisation from a specified state, which is estimated dynamically from operational experience using RL techniques. By using this approximation, ADP avoids extensive explicit evaluation of performance and so reduces the computational burden substantially. In this thesis, formulations of the approximation function and variants of the RL techniques used to estimate it are explored. Evaluation of this controller shows considerable reductions in delays by comparison with current industry practices.
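
    A minimal sketch of the ADP mechanism described above, under the assumption of a linear value-function approximation over hand-crafted state features (the feature dimension, cost signal and learning constants are illustrative, not from the thesis): operational experience updates the weights with a TD-style rule, and re-scheduling greedily picks the candidate plan whose resulting state the approximation values best.

```python
import numpy as np

class ADPController:
    # Approximates the value function as V(s) ~ w . phi(s) and refines
    # the weights from experience, avoiding the explicit evaluation of
    # long-run performance for every candidate schedule.
    def __init__(self, n_features=8, alpha=0.05, gamma=0.98):
        self.w = np.zeros(n_features)
        self.alpha, self.gamma = alpha, gamma

    def value(self, phi):
        return self.w @ phi

    def td_update(self, phi_s, cost, phi_s_next):
        # One-step bootstrapped target: the observed delay cost plus the
        # discounted approximate value of the next state.
        target = cost + self.gamma * self.value(phi_s_next)
        self.w += self.alpha * (target - self.value(phi_s)) * phi_s

    def choose_plan(self, candidate_plans, features_after):
        # Greedy re-scheduling at a critical location: pick the plan
        # whose resulting state has the lowest approximated future delay.
        scores = [self.value(phi) for phi in features_after]
        return candidate_plans[int(np.argmin(scores))]
```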

    Brief Survey on Attack Detection Methods for Cyber-Physical Systems


    Reinforcement Learning For The Control Of Large-Scale Power Systems


    Humanoid Robots

    For many years, human beings have tried in every possible way to recreate the complex mechanisms that form the human body. This task is extremely complicated, and the results are not yet fully satisfactory. However, with increasing technological advances grounded in theoretical and experimental research, we have managed, to some extent, to copy or imitate certain systems of the human body. This research aims not only to create humanoid robots, most of them autonomous systems, but also to provide deeper knowledge of the systems that form the human body, with a view to possible applications in rehabilitation technology for human beings, bringing together studies related not only to Robotics but also to Biomechanics, Biomimetics and Cybernetics, among other areas. This book presents a series of studies inspired by this ideal, carried out by researchers worldwide, that analyse and discuss diverse subjects related to humanoid robots. The contributions explore aspects of robotic hands, learning, language, vision and locomotion.