347 research outputs found

    A Survey of Offline and Online Learning-Based Algorithms for Multirotor UAVs

    Multirotor UAVs are used for a wide spectrum of civilian and public-domain applications. Navigation controllers with different attributes and onboard sensor suites enable autonomous or semi-autonomous, safe flight and operation under nominal and adverse conditions and external disturbances, even in uncertain and dynamically changing environments. During the last decade, given the faster-than-exponential increase in available computational power, different learning-based algorithms have been derived, implemented, and tested to navigate and control multirotor UAVs, among other systems. Learning algorithms have been, and are, used to derive data-driven models, identify parameters, track objects, develop navigation controllers, and learn the environment in which multirotors operate, and learning combined with model-based control techniques has proven beneficial for multirotors. This survey summarizes research published since 2015, dividing algorithms, techniques, and methodologies into offline and online learning categories, and further classifying them into machine learning, deep learning, and reinforcement learning sub-categories. A particular focus of the survey is on online learning algorithms applied to multirotors, with the aim of registering which learning techniques are implementable in hard or near-hard real time, and of understanding what information is learned, why, how, and how fast. The outcome of the survey offers a clear understanding of the recent state of the art and of the kinds of learning-based algorithms that may be implemented, tested, and executed in real time.
    Comment: 26 pages, 6 figures, 4 tables, survey paper
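
    To make the online-learning category concrete, the short sketch below uses recursive least squares (RLS), a classic online estimator that updates at every sample and is therefore a natural candidate for hard real-time use, to identify the parameters of a simplified vertical-axis multirotor model from streaming data. The model structure, parameter values, and noise levels are illustrative assumptions and are not taken from the surveyed papers.

        import numpy as np

        # Recursive least squares (RLS): a lightweight online learner that updates its
        # estimate at every sample, which is why it is a natural candidate for hard
        # real-time use on board a multirotor.
        class RLS:
            def __init__(self, n_params, forgetting=0.99):
                self.theta = np.zeros(n_params)        # parameter estimate
                self.P = np.eye(n_params) * 1e3        # estimate covariance
                self.lam = forgetting                  # forgetting factor

            def update(self, phi, y):
                phi = np.asarray(phi, dtype=float)
                P_phi = self.P @ phi
                gain = P_phi / (self.lam + phi @ P_phi)
                self.theta += gain * (y - phi @ self.theta)
                self.P = (self.P - np.outer(gain, P_phi)) / self.lam
                return self.theta

        # Hypothetical vertical-axis model: accel = (k_t/m) * u - (c_d/m) * v - g,
        # with theta = [k_t/m, c_d/m] estimated from streaming (u, v, accel) data.
        rng = np.random.default_rng(0)
        kt_m_true, cd_m_true, g = 18.0, 0.35, 9.81
        rls = RLS(n_params=2)
        for _ in range(500):
            u = rng.uniform(0.3, 0.8)                  # normalized thrust command
            v = rng.uniform(-2.0, 2.0)                 # vertical velocity (m/s)
            accel = kt_m_true * u - cd_m_true * v - g + rng.normal(0.0, 0.05)
            rls.update([u, -v], accel + g)             # known gravity term moved to the output side
        print(rls.theta)                               # approaches [18.0, 0.35]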

    Self-Learning Longitudinal Control for On-Road Vehicles

    Reinforcement Learning is a promising tool for automating controller tuning. However, significant extensions are required to enable fast and robust learning in real-world applications. This work proposes several additions to the state of the art and demonstrates their capability in a series of real-world experiments.
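
    As a rough illustration of how a learning loop can automate controller tuning, the sketch below tunes the gains of a PI speed controller for a point-mass vehicle model with a simple gradient-free, episodic search. The vehicle model, actuator limits, and search settings are assumptions for illustration and do not reproduce the paper's method.

        import numpy as np

        def episode_cost(kp, ki, target=20.0, dt=0.1, steps=300):
            """Simulate a point-mass vehicle with a PI speed controller and return
            the accumulated squared tracking error (lower is better)."""
            v, integ, cost = 0.0, 0.0, 0.0
            for _ in range(steps):
                err = target - v
                integ += err * dt
                accel = np.clip(kp * err + ki * integ, -3.0, 3.0)   # actuator limits
                v += (accel - 0.05 * v) * dt                        # simple drag term
                cost += err ** 2 * dt
            return cost

        # Gradient-free, episodic search over the gains: keep a random perturbation
        # only when it lowers the episode cost.
        rng = np.random.default_rng(1)
        gains = np.array([0.1, 0.0])
        best = episode_cost(*gains)
        for _ in range(200):
            cand = np.maximum(gains + rng.normal(0.0, 0.05, size=2), 0.0)
            cost = episode_cost(*cand)
            if cost < best:
                gains, best = cand, cost
        print(gains, best)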

    Reinforcement Q-learning for Model-Free Optimal Control: Real-Time Implementation and Challenges

    Traditional feedback control methods are often model-based, and the mathematical system model needs to be identified before or during control. A reinforcement learning method called Q-learning can be used for model-free state-feedback control: in theory, the optimal adaptive control is learned online without a system model. This data-driven learning is based on the system's output or state measurements and its control inputs. Theoretical results are promising, but real-time applications are not widespread, since real-time implementation can be difficult because of, for example, hardware restrictions and stability issues. This research aimed to determine whether a set of existing Q-learning algorithms is capable of learning the optimal control in real-time applications. Both batch offline and adaptive online algorithms were chosen for the study. The selected Q-learning algorithms were implemented on a marginally stable linear system and an unstable nonlinear system using the Quanser QUBE™-Servo 2 experiment with inertia-disk and inverted-pendulum attachments. The results learned from the real-time system were compared to the theoretical Q-learning results obtained with a simulated system model. The results showed that the chosen algorithms solve the Linear Quadratic Regulator (LQR) problem with the theoretical linear system model, and that the algorithm chosen for the nonlinear system approximated the Hamilton-Jacobi-Bellman equation solution with the theoretical model when the inverted pendulum was balanced upright. The results also showed that some challenges caused by the real-time system can be avoided by a proper selection of control noise; these challenges include constrained input voltage and measurement disturbances such as small measurement noise and quantization due to limited measurement resolution. In the best, but rare, adaptive test cases, a near-optimal policy was learned online for the linear real-time system; however, learning was reliable only with some of the batch learning methods. Lastly, some improvements were proposed to make Q-learning more suitable for real-time applications.
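
    The LQR-oriented batch Q-learning idea can be sketched compactly: for a linear plant the Q-function is quadratic in the state-input pair, so its kernel matrix H can be fitted from measured transitions by least squares and the greedy gain read off from H, without ever using the system matrices. The toy plant, noise levels, and iteration counts below are illustrative assumptions, not the thesis's QUBE-Servo setup.

        import numpy as np

        # Batch policy-iteration Q-learning for the LQR problem: fit the quadratic
        # Q-function kernel H from measured transitions, then read off the improved
        # gain u = -K x, without ever using the system matrices (A, B) below.
        # The toy plant is open-loop stable so the initial policy K = 0 is admissible.
        A = np.array([[0.95, 0.10], [0.00, 0.90]])      # unknown to the learner
        B = np.array([[0.0], [0.1]])
        Q, R = np.eye(2), np.array([[0.1]])
        n, m = 2, 1
        rng = np.random.default_rng(0)

        K = np.zeros((m, n))
        for it in range(10):
            rows, targets = [], []
            x = rng.normal(0.0, 1.0, size=n)
            for _ in range(400):
                u = -K @ x + rng.normal(0.0, 1.0, size=m)   # probing noise for identifiability
                x_next = A @ x + B @ u
                u_next = -K @ x_next                        # next action under the current policy
                z = np.concatenate([x, u])
                z_next = np.concatenate([x_next, u_next])
                # Bellman identity: z'Hz - z_next'H z_next = x'Qx + u'Ru
                rows.append(np.kron(z, z) - np.kron(z_next, z_next))
                targets.append(x @ Q @ x + u @ R @ u)
                x = x_next
            h, *_ = np.linalg.lstsq(np.array(rows), np.array(targets), rcond=None)
            H = 0.5 * (h.reshape(n + m, n + m) + h.reshape(n + m, n + m).T)
            K = np.linalg.solve(H[n:, n:], H[n:, :n])       # greedy policy improvement

        print("learned gain:", K)
        # For reference, the model-based LQR gain could be obtained from the discrete
        # algebraic Riccati equation (e.g. scipy.linalg.solve_discrete_are) and
        # compared with the learned K above.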

    Automated Reinforcement Learning: An Overview

    Reinforcement Learning (RL) and, recently, Deep Reinforcement Learning are popular methods for solving sequential decision-making problems modeled as Markov Decision Processes (MDPs). RL modeling of a problem and the selection of algorithms and hyper-parameters require careful consideration, as different configurations may yield completely different performance. These considerations are mainly the task of RL experts; however, RL is progressively becoming popular in other fields where researchers and system designers are not RL experts. In addition, many modeling decisions, such as defining the state and action spaces, the batch size and frequency of batch updates, and the number of timesteps, are typically made manually. For these reasons, automating the different components of the RL framework is of great importance, and it has attracted much attention in recent years. Automated RL provides a framework in which the different components of RL, including MDP modeling, algorithm selection, and hyper-parameter optimization, are modeled and defined automatically. In this article, we explore the literature and present recent work that can be used in automated RL. Moreover, we discuss the challenges, open questions, and research directions in AutoRL.
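
    A minimal example of the hyper-parameter-optimization component of AutoRL is sketched below: random search over a Q-learning agent's learning rate, exploration rate, and discount factor, evaluated on a toy chain MDP. The environment, search space, and budget are assumptions chosen only to keep the sketch self-contained.

        import numpy as np

        # Random search over Q-learning hyper-parameters (learning rate, exploration
        # rate, discount factor), evaluated on a toy 6-state chain MDP.
        N_STATES, GOAL = 6, 5

        def run_q_learning(alpha, eps, gamma, episodes=200, seed=0):
            rng = np.random.default_rng(seed)
            q = np.zeros((N_STATES, 2))                # actions: 0 = left, 1 = right
            for _ in range(episodes):
                s = 0
                for _ in range(50):
                    a = rng.integers(2) if rng.random() < eps else int(np.argmax(q[s]))
                    s_next = min(s + 1, GOAL) if a == 1 else max(s - 1, 0)
                    r = 1.0 if s_next == GOAL else -0.01
                    q[s, a] += alpha * (r + gamma * np.max(q[s_next]) - q[s, a])
                    s = s_next
                    if s == GOAL:
                        break
            # Evaluate the greedy policy from the start state.
            s, ret = 0, 0.0
            for _ in range(50):
                s = min(s + 1, GOAL) if np.argmax(q[s]) == 1 else max(s - 1, 0)
                ret += 1.0 if s == GOAL else -0.01
                if s == GOAL:
                    break
            return ret

        search_rng = np.random.default_rng(42)
        best_cfg, best_ret = None, -np.inf
        for _ in range(30):                            # tiny search budget
            cfg = dict(alpha=search_rng.uniform(0.01, 0.5),
                       eps=search_rng.uniform(0.05, 0.4),
                       gamma=search_rng.uniform(0.90, 0.999))
            ret = run_q_learning(**cfg)
            if ret > best_ret:
                best_cfg, best_ret = cfg, ret
        print(best_cfg, best_ret)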

    Event sampled optimal adaptive regulation of linear and a class of nonlinear systems

    In networked control systems (NCS), in which a communication network is used to close the feedback loop, the transmission of feedback signals and the execution of the controller are currently carried out at periodic sampling instants; this scheme requires significant computational power and network bandwidth. In contrast, event-based aperiodic sampling and control, introduced more recently, appears to relieve the computational burden and the high utilization of network resources. Therefore, in this dissertation, a suite of novel event-sampled adaptive regulation schemes is designed for uncertain linear and nonlinear systems, in both the discrete- and continuous-time domains. Event-sampled Q-learning and adaptive/neuro-dynamic programming (ADP) schemes without value and policy iterations are utilized for the linear and nonlinear systems, respectively, in both time domains. Neural networks (NN) are employed as approximators for the nonlinear systems, and hence the universal approximation property of NNs in the event-sampled framework is introduced. The tuning of the parameters and the NN weights is carried out aperiodically at the event-sampled instants, leading to further savings in computation compared to traditional NN-based control. The adaptive regulator, when applied to a linear NCS with time-varying network delays and packet losses, shows a 30% and 56% reduction in computation and network bandwidth usage, respectively; in the case of a nonlinear NCS with the event-sampled ADP-based regulator, reductions of 27% and 66% are observed compared to periodically sampled schemes. The sampling and transmission instants are determined through adaptive event-sampling conditions derived using Lyapunov techniques, by viewing the closed-loop event-sampled linear and nonlinear systems as switched and/or impulsive dynamical systems.
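
    The basic event-sampled mechanism can be illustrated in a few lines of code: the state is transmitted and the feedback control recomputed only when the gap between the current state and the last transmitted state exceeds a threshold, with the input held constant in between. The plant, gain, and static triggering rule below are assumptions for illustration; the dissertation instead derives adaptive, Lyapunov-based triggering conditions and learns the controller with Q-learning/ADP.

        import numpy as np

        # Event-sampled state-feedback regulation: the state is transmitted and the
        # control recomputed only when the gap between the current state and the last
        # transmitted state exceeds a (here static, state-relative) threshold; between
        # events the input is held constant (zero-order hold).
        A = np.array([[1.00, 0.05], [0.00, 0.98]])
        B = np.array([[0.0], [0.05]])
        K = np.array([[4.0, 2.5]])                     # assumed stabilizing feedback gain
        sigma = 0.3                                    # relative triggering threshold

        x = np.array([1.0, 0.0])
        x_event = x.copy()                             # last transmitted state
        u = -K @ x_event
        events = 0
        for _ in range(400):
            if np.linalg.norm(x - x_event) > sigma * np.linalg.norm(x):
                x_event = x.copy()                     # event: transmit state, update control
                u = -K @ x_event
                events += 1
            x = A @ x + B.flatten() * u
        print(f"{events} control updates in 400 steps, final state norm {np.linalg.norm(x):.4f}")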

    Scalable and data efficient deep reinforcement learning methods for healthcare applications

    Artificial-intelligence-driven medical devices have created the potential for significant breakthroughs in healthcare technology. Healthcare applications of reinforcement learning are still sparse, as the medical domain is very complex and decision making requires domain expertise. The high volumes of data generated by medical devices, a key input for delivering on the promise of AI, suffer from both noise and a lack of ground truth, and the cost of data increases as it is cleaned and annotated. Unlike other data sets, medical data annotation, which is critical for accurate ground truth, requires medical domain expertise for a high-quality patient outcome. While accurate recommendation of decisions is vital in this context, making them in near real time on devices with computational resource constraints requires building efficient, compact representations of models such as deep neural networks. While deeper and wider neural networks are designed for complex healthcare applications, model compression can be an effective way to deploy networks on medical devices, which often have hardware and speed constraints. Most state-of-the-art model compression techniques require a resource-centric manual process that explores a large model-architecture space to find a trade-off between model size and accuracy. Recently, reinforcement learning (RL) approaches have been proposed to automate this hand-crafted process; however, most RL model-compression algorithms are model-free, which requires longer training time while making no assumptions about the model. In contrast, model-based (MB) approaches are data-driven and converge faster, but are sensitive to bias in the model. In this work, we report on the use of reinforcement learning to mimic the decision-making process of annotators for medical events, in order to automate annotation and labelling. The reinforcement agent learns to annotate alarm data based on annotations made by an expert, and our method shows promising results on medical alarm data sets. We trained deep Q-network and advantage actor-critic agents using data from monitoring devices annotated by an expert; initial results from these RL agents learning the expert-annotated behavior are encouraging. The advantage actor-critic agent performs better at learning the sparse events in a given state, thereby choosing correct actions more often than the deep Q-network agent. To the best of our knowledge, this is the first reinforcement learning application for the automation of medical event annotation, which has far-reaching practical use. In addition, a data-driven model-based algorithm is developed that integrates seamlessly with model-free RL approaches for automating deep neural network model compression. We evaluate our algorithm on a variety of imaging data, from dermoscopy to X-ray, on different popular, public model architectures. Compared to model-free RL approaches, our approach achieves faster convergence, exhibits better generalization across different data sets, and preserves comparable model performance. The new RL methods developed in this work, for both false-alarm detection and model compression, are generic and can be applied to any domain where sequential decision making is partially random and practically controlled by the decision maker.
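
    A stripped-down stand-in for the annotation idea is sketched below: each alarm is reduced to a discretized feature bucket, an agent picks an annotation action, and it is rewarded when its choice matches the expert's label. The synthetic expert rule, the features, and the tabular bandit-style update (in place of a deep Q-network or actor-critic agent) are all illustrative assumptions, not the thesis's data or models.

        import numpy as np

        # Each alarm is reduced to a discretized feature bucket (the "state"); the
        # agent chooses an annotation action and is rewarded when it agrees with the
        # expert's label. A tabular, bandit-style update stands in for the deep
        # Q-network / actor-critic agents described above.
        rng = np.random.default_rng(0)
        N_BUCKETS, N_ACTIONS = 10, 2                   # actions: 0 = false alarm, 1 = true alarm

        def expert_label(bucket):
            # Hypothetical noisy expert rule: higher feature values mean a real alarm.
            return int(rng.random() < 0.1 + 0.08 * bucket)

        q = np.zeros((N_BUCKETS, N_ACTIONS))
        alpha, eps = 0.1, 0.1
        for _ in range(5000):
            s = rng.integers(N_BUCKETS)                # incoming alarm's feature bucket
            a = rng.integers(N_ACTIONS) if rng.random() < eps else int(np.argmax(q[s]))
            r = 1.0 if a == expert_label(s) else -1.0  # agreement with the expert annotation
            q[s, a] += alpha * (r - q[s, a])           # one-step (bandit) update, no next state
        print("learned annotation per bucket:", np.argmax(q, axis=1))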