
    Learning Unmanned Aerial Vehicle Control for Autonomous Target Following

    While deep reinforcement learning (RL) methods have achieved unprecedented successes in a range of challenging problems, their applicability has been mainly limited to simulation or game domains due to the high sample complexity of the trial-and-error learning process. However, real-world robotic applications often need a data-efficient learning process with safety-critical constraints. In this paper, we consider the challenging problem of learning unmanned aerial vehicle (UAV) control for tracking a moving target. To acquire a strategy that combines perception and control, we represent the policy by a convolutional neural network. We develop a hierarchical approach that combines a model-free policy gradient method with a conventional feedback proportional-integral-derivative (PID) controller to enable stable learning without catastrophic failure. The neural network is trained by a combination of supervised learning from raw images and reinforcement learning from games of self-play. We show that the proposed approach can efficiently learn a target-following policy in a simulator, and that the learned behavior can be successfully transferred to the DJI quadrotor platform for real-world UAV control.
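
    As a rough, hypothetical sketch of the hierarchical scheme described above (the blending rule, gains, and the policy_net interface are illustrative assumptions, not the paper's exact design), the PID term can act as a stabilizing fallback while the learned policy is still unreliable:

        class PID:
            """Conventional feedback PID controller used as the stabilizing layer."""
            def __init__(self, kp, ki, kd, dt):
                self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
                self.integral = 0.0
                self.prev_err = 0.0

            def step(self, err):
                self.integral += err * self.dt
                deriv = (err - self.prev_err) / self.dt
                self.prev_err = err
                return self.kp * err + self.ki * self.integral + self.kd * deriv

        def hierarchical_command(policy_net, image, tracking_err, pid, alpha=0.3):
            # policy_net is assumed to map raw pixels to a velocity command;
            # alpha weights the learned command against the PID correction, so a
            # small alpha early in training prevents catastrophic exploration.
            learned_cmd = policy_net(image)
            stabilizing_cmd = pid.step(tracking_err)
            return alpha * learned_cmd + (1.0 - alpha) * stabilizing_cmd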

    Autonomic management of virtualized resources in cloud computing

    The last five years have witnessed a rapid growth of cloud computing in business, governmental, and educational IT deployment. The success of cloud services depends critically on the effective management of virtualized resources. A key requirement of cloud management is the ability to dynamically match resource allocations to actual demands. To this end, we aim to design and implement a cloud resource management mechanism that manages underlying complexity, automates resource provisioning, and controls client-perceived quality of service (QoS) while still achieving resource efficiency. The design of automatic resource management centers on two questions: when to adjust resource allocations and by how much. In a cloud, applications have different definitions of capacity, and cloud dynamics make it difficult to determine a static resource-to-performance relationship. In this dissertation, we propose a generic metric that measures application capacity, design model-independent and adaptive approaches to manage resources, and build a cloud management system scalable to a cluster of machines.

    To understand web system capacity, we propose a productivity index (PI), defined as the ratio of yield to cost, to measure system processing capability online. PI is a generic concept that can be applied at different levels to monitor system progress and identify whether more capacity is needed. We applied the concept of PI to the problem of overload prevention in multi-tier websites. The overload predictor built on the PI metric shows more accurate and responsive overload prevention than conventional approaches. To address the lack of an accurate server model, we propose a model-independent, fuzzy-control-based approach for CPU allocation. For adaptive and stable control performance, we embed the controller with self-tuning output amplification and flexible rule selection. We then build a QoS provisioning framework that supports multi-objective QoS control and service differentiation. Experiments on a virtual cluster with two service classes show the effectiveness of our approach in both performance and power control.

    To address the complex interplay between resources and process delays in fine-grained multi-resource allocation, we treat capacity management as a decision-making problem and employ reinforcement learning (RL) to optimize the process. The optimization depends on trial-and-error interactions with the cloud system. To improve initial management performance, we propose a model-based RL algorithm: a neural-network-based environment model, learned from previous management history, generates simulated resource allocations for the RL agent. Experimental results on heterogeneous applications show that our approach makes efficient use of limited interactions and finds near-optimal resource configurations within 7 steps. Finally, we present a distributed reinforcement learning approach to cluster-wide cloud resource management. We decompose the cluster-wide resource allocation problem into sub-problems concerning individual VM resource configurations; the cluster-wide allocation is optimized when individual VMs meet their SLAs at high resource utilization. For scalability, we develop an efficient reinforcement learning approach with a continuous state space. For adaptability, we use VM low-level runtime statistics to accommodate workload dynamics. Prototyped in the iBalloon system, the distributed learning approach successfully manages 128 VMs on a 16-node closely correlated cluster.
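
    A minimal sketch of the productivity-index idea described above (the window size, threshold, and the particular yield/cost instantiation are assumptions for illustration):

        from collections import deque

        class ProductivityIndex:
            """Online PI = yield / cost over a sliding window of recent requests."""
            def __init__(self, window=1000):
                self.samples = deque(maxlen=window)

            def record(self, met_sla):
                # yield counts requests served within their SLA; cost counts all
                # admitted requests (one illustrative reading of yield/cost)
                self.samples.append(1 if met_sla else 0)

            def value(self):
                return sum(self.samples) / len(self.samples) if self.samples else 1.0

        def overload_predicted(pi, threshold=0.95):
            # a falling PI signals that the tier needs more capacity
            # (or admission control) before overload sets in
            return pi.value() < threshold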

    Advanced control techniques for modern inertia based inverters

    "In this research three artificial intelligence (AI)-based techniques are proposed to regulate the voltage and frequency of a grid-connected inverter. The increase in the penetration of renewable energy sources (RESs) into the power grid has led to an increase in the penetration of fast-responding, inertia-less power converters. The growing share of these power electronic converters changes the nature of the conventional grid, in which the kinetic inertia stored in the rotating parts of large generators plays a vital role. The concept of a virtual inertia control scheme is proposed to make the behavior of grid-connected inverters more similar to that of synchronous generators, by mimicking the mechanical behavior of a synchronous generator. Conventional control techniques fail to perform optimally in nonlinear, uncertain, and inaccurately modeled power grids. Besides, the decoupled-control assumption in conventional virtual synchronous generators (VSGs) makes them non-optimal in resistive grids. The neural network predictive controller, heuristic dynamic programming, and dual heuristic dynamic programming techniques are presented in this research to overcome the drawbacks of conventional VSGs. The nonlinear characteristics of neural networks and online training enable the proposed methods to perform as robust and optimal controllers. Simulation and experimental laboratory prototype results are provided to demonstrate the effectiveness of the proposed techniques" --Abstract, page iv
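
    A minimal sketch of the virtual-inertia idea underlying VSGs (parameter values and the torque-balance form below are illustrative assumptions, not this work's controllers): the inverter's frequency is produced by integrating an emulated swing equation, so it responds to power imbalance like a synchronous machine.

        import math

        def vsg_step(omega, p_ref, p_meas, J=0.2, D=15.0,
                     omega_grid=2 * math.pi * 60, dt=1e-4):
            """One integration step of the emulated swing equation:
            J * domega/dt = T_m - T_e - D * (omega - omega_grid),
            where J is the virtual inertia and D the virtual damping."""
            t_m = p_ref / omega              # virtual mechanical torque
            t_e = p_meas / omega             # electrical torque from measured power
            domega = (t_m - t_e - D * (omega - omega_grid)) / J
            return omega + domega * dt       # inverter frequency reference (rad/s)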

    Overcoming the reality gap: Imitation and reinforcement learning algorithms for bipedal robot locomotion problems

    This thesis introduces a comprehensive robot training framework that utilizes artificial learning techniques to optimize robot performance in complex tasks. Motivated by recent impressive achievements in machine learning, particularly in games and virtual scenarios, the project aims to explore the potential of these techniques for improving robot capabilities beyond traditional human programming, despite the limitations imposed by the reality gap. The case study selected for this investigation is bipedal locomotion, as it allows for elucidating the key challenges and advantages of using artificial learning methods for robot learning. The thesis identifies four primary challenges in this context: the variability of results obtained from artificial learning algorithms, the high cost and risk associated with conducting experiments on real robots, the reality gap between simulation and real-world behavior, and the need to adapt human motion patterns to robotic systems.
    The proposed approach consists of three main modules to address these challenges: Non-linear Control Approaches, Imitation Learning, and Reinforcement Learning. The Non-linear Control module establishes a foundation by modeling robots and employing well-established control techniques. The Imitation Learning module generates initial policies from reference motion-capture data or preliminary policy results to create feasible, human-like gait patterns. The Reinforcement Learning module complements the process by iteratively improving parametric policies, primarily through simulation, with real-world performance as the ultimate goal. The thesis emphasizes the modularity of the approach, allowing individual modules to be implemented separately or in combination to determine the most effective strategy for different robot training scenarios. By combining established control techniques, imitation learning, and reinforcement learning, the framework seeks to unlock the potential for robots to achieve optimized performance in complex tasks, contributing to the advancement of artificial intelligence in robotics, not only in virtual systems but also in real ones.
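
    A hedged skeleton of the imitation-then-reinforcement pipeline described above (the network interface, losses, and the env protocol are assumptions; the thesis's actual algorithms may differ):

        import torch
        import torch.nn as nn

        def pretrain_by_imitation(policy, states, expert_actions, epochs=100):
            # behavior cloning: regress the policy onto reference
            # (motion-capture) actions to get a feasible initial gait
            opt = torch.optim.Adam(policy.parameters(), lr=1e-3)
            for _ in range(epochs):
                loss = nn.functional.mse_loss(policy(states), expert_actions)
                opt.zero_grad(); loss.backward(); opt.step()
            return policy

        def refine_by_reinforcement(policy, env, episodes=1000, gamma=0.99):
            # REINFORCE-style refinement of the imitation-initialized policy;
            # env is assumed to expose reset() -> state and step(a) -> (state, reward, done)
            opt = torch.optim.Adam(policy.parameters(), lr=1e-4)
            for _ in range(episodes):
                state, done, logps, rewards = env.reset(), False, [], []
                while not done:
                    dist = torch.distributions.Normal(policy(state), 0.1)
                    action = dist.sample()
                    logps.append(dist.log_prob(action).sum())
                    state, reward, done = env.step(action)
                    rewards.append(reward)
                returns, g = [], 0.0
                for r in reversed(rewards):      # discounted returns, back to front
                    g = r + gamma * g
                    returns.insert(0, g)
                loss = -(torch.stack(logps) * torch.tensor(returns)).sum()
                opt.zero_grad(); loss.backward(); opt.step()
            return policy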

    A Bayesian optimization framework for the automatic tuning of MPC-based shared controllers

    This paper presents a Bayesian optimization framework for the automatic tuning of shared controllers that are defined as a Model Predictive Control (MPC) problem. The proposed framework includes the design of performance metrics as well as the representation of user inputs for simulation-based optimization. The framework is applied to the optimization of a shared controller for an Image Guided Therapy robot. VR-based user experiments confirm that the automatically tuned MPC shared controller outperforms a hand-tuned baseline version, and demonstrate its generalization ability.
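
    A hedged sketch of such a tuning loop, using scikit-optimize's Gaussian-process optimizer; the weight names, ranges, and simulate_task() are illustrative assumptions, not the paper's actual metrics or parameter set:

        from skopt import gp_minimize
        from skopt.space import Real

        def objective(weights):
            q_tracking, r_effort = weights
            # run the MPC shared controller in simulation against recorded user
            # inputs and return a scalar cost (lower is better), e.g. tracking
            # error plus a penalty for fighting the user's commands;
            # simulate_task() is a hypothetical stand-in for that simulation
            return simulate_task(q_tracking=q_tracking, r_effort=r_effort)

        search_space = [
            Real(0.1, 100.0, prior="log-uniform", name="q_tracking"),
            Real(0.01, 10.0, prior="log-uniform", name="r_effort"),
        ]

        result = gp_minimize(objective, search_space, n_calls=50, random_state=0)
        print("best MPC weights:", result.x, "expected cost:", result.fun)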

    Intelligent Learning Control System Design Based on Adaptive Dynamic Programming

    Adaptive dynamic programming (ADP) is a powerful neural-network-based control technique that has been investigated, designed, and tested in a wide range of applications for solving optimal control problems in complex systems. Good performance from an ADP controller usually requires long training periods, because data usage efficiency is low: samples are discarded once used. Experience replay is a powerful technique with the potential to accelerate the training process of learning and control. However, its existing design cannot be used directly in model-free ADP design, because it relies on forward temporal difference (TD) information (e.g., the state-action pair) between the current time step and a future time step, and therefore needs a model network to predict future information. Uniform random sampling, commonly used for experience replay, is also not an efficient way to learn. Prioritized experience replay (PER) presents important transitions more frequently and has proven efficient in the learning process. To avoid long training periods for the ADP controller, the first goal of this thesis is to eliminate the model network or system identifier. Specifically, the experience tuple is designed with one-step-backward state-action information, so the TD error can be computed from the previous and current time steps. The proposed approach is tested in two case studies, cart-pole and triple-link pendulum balancing, and reduces the average number of trials required to succeed by 26.5% for the cart-pole and 43% for the triple-link task. The second goal of this thesis is to integrate the efficient learning capability of PER into ADP. A detailed theoretical analysis is presented to verify the stability of the proposed control technique. The proposed approach reduces the average number of trials required to succeed, compared to a traditional ADP controller, by 60.56% for the cart-pole and 56.89% for the triple-link balancing task. The final goal of this thesis is to validate the ADP controller in the smart grid: to improve the current control performance of a virtual synchronous machine (VSM) under sudden load changes and a single line-to-ground fault, and to reduce harmonics in shunt active filters (SAFs) under different loading conditions. The ADP controller produced the fastest response time, low overshoot, and, in general, the best performance in comparison to the traditional current controller. In SAFs, the ADP controller reduced the total harmonic distortion (THD) of the source current by an average of 18.41% compared to a traditional current controller alone.
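
    A minimal sketch of a prioritized replay buffer as described above; the one-step-backward tuple layout follows the text, while the capacity, alpha exponent, and priority rule are standard PER choices assumed here for illustration:

        import numpy as np

        class PrioritizedReplay:
            """Replay transitions with probability proportional to |TD error|^alpha."""
            def __init__(self, capacity=10000, alpha=0.6, eps=1e-3):
                self.capacity, self.alpha, self.eps = capacity, alpha, eps
                self.buffer, self.priorities = [], []

            def add(self, prev_state_action, curr_state_action, reward, td_error):
                # one-step-backward tuple: no model network is needed, since the
                # TD error comes from the previous and current time steps
                if len(self.buffer) >= self.capacity:
                    self.buffer.pop(0)
                    self.priorities.pop(0)
                self.buffer.append((prev_state_action, curr_state_action, reward))
                self.priorities.append((abs(td_error) + self.eps) ** self.alpha)

            def sample(self, batch_size=32):
                p = np.asarray(self.priorities)
                p = p / p.sum()
                idx = np.random.choice(len(self.buffer), size=batch_size, p=p)
                return [self.buffer[i] for i in idx], idx

            def update_priorities(self, idx, td_errors):
                for i, e in zip(idx, td_errors):
                    self.priorities[i] = (abs(e) + self.eps) ** self.alpha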

    Beyond Basins of Attraction: Quantifying Robustness of Natural Dynamics

    Properly designing a system to exhibit favorable natural dynamics can greatly simplify designing or learning the control policy. However, it is still unclear what constitutes favorable natural dynamics and how to quantify their effect. Most studies of simple walking and running models have focused on the basins of attraction of passive limit cycles and the notion of self-stability. We instead emphasize the importance of stepping beyond basins of attraction. We show an approach based on viability theory to quantify robust sets in state-action space. These sets are valid for the family of all robust control policies, which allows us to quantify the robustness inherent to the natural dynamics before designing the control policy or specifying a control objective. We illustrate our formulation using spring-mass models, simple low-dimensional models of running systems. We then show an example application by optimizing the robustness of a simulated planar monoped, using a gradient-free optimization scheme. Both case studies result in a nonlinear effective stiffness providing more robustness.
    Comment: 15 pages. This work has been accepted to IEEE Transactions on Robotics (2019).
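
    As a toy illustration of the grid-based intuition behind such robust sets (the 1-D dynamics, bounds, and resolution below are placeholders, not the paper's spring-mass model), a robust viable set can be approximated by discarding states for which no single action keeps every disturbed successor inside the current set, iterating to a fixed point:

        import numpy as np

        states = np.linspace(-1.0, 1.0, 101)
        actions = np.linspace(-1.0, 1.0, 21)
        disturbances = np.linspace(-0.3, 0.3, 5)

        def step(x, u, w, dt=0.1):
            return x + dt * (x + u + w)        # placeholder unstable 1-D dynamics

        viable = np.abs(states) <= 1.0         # start from the state constraint set

        for _ in range(100):                   # iterate until the set stops shrinking
            new_viable = np.zeros_like(viable)
            for i, x in enumerate(states):
                if not viable[i]:
                    continue
                for u in actions:
                    # robust: the same action must work for every disturbance
                    successors = [step(x, u, w) for w in disturbances]
                    nearest = [int(np.abs(states - s).argmin()) for s in successors]
                    if all(abs(s) <= 1.0 and viable[j]
                           for s, j in zip(successors, nearest)):
                        new_viable[i] = True
                        break
            if np.array_equal(new_viable, viable):
                break
            viable = new_viable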