
    Shaping in Practice: Training Wheels to Learn Fast Hopping Directly in Hardware

    Learning robot controllers instead of designing them can greatly reduce the required engineering effort, while also emphasizing robustness. Despite considerable progress in simulation, applying learning directly in hardware is still challenging, in part due to the need to explore potentially unstable parameters. We explore the concept of shaping the reward landscape with training wheels: temporary modifications of the physical hardware that facilitate learning. We demonstrate the concept with a robot leg mounted on a boom learning to hop fast. This proof of concept embodies typical challenges such as instability and contact, while being simple enough to empirically map out and visualize the reward landscape. Based on our results we propose three criteria for designing effective training wheels for learning in robotics. A video synopsis can be found at https://youtu.be/6iH5E3LrYh8.
    Comment: Accepted to the IEEE International Conference on Robotics and Automation (ICRA) 2018, 6 pages, 6 figures.
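
    The paper's training wheels are physical modifications of the hardware, not software. Still, their effect on the reward landscape can be illustrated in code: a minimal, hypothetical sketch (names and numbers invented, not from the paper) in which an assistance term softens the fall penalty of a toy hopping reward and is annealed away as learning progresses.

        def hopping_reward(forward_speed, fell_over, assist=0.0):
            """Toy hopping reward: reward forward speed, penalize falls.

            `assist` in [0, 1] plays the role of a training wheel: it softens
            the catastrophic fall penalty so that early exploration near
            unstable parameters is less punishing. (Hypothetical illustration;
            in the paper the training wheels are physical, not a reward term.)
            """
            fall_penalty = -10.0 * (1.0 - assist)  # assist=1 removes the cliff
            return forward_speed + (fall_penalty if fell_over else 0.0)

        # Anneal the training wheel away over the course of learning.
        for episode in range(5):
            assist = max(0.0, 1.0 - episode / 3.0)  # fully assisted -> removed
            print(episode, hopping_reward(1.2, fell_over=True, assist=assist))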

    Approaches for Future Internet architecture design and Quality of Experience (QoE) Control

    Researching a Future Internet capable of overcoming the limitations of the current Internet is a strategic investment. To that end, this paper presents concepts that can help provide guidelines for overcoming these limitations. In the authors' vision, a key Future Internet goal is to allow applications to transparently, efficiently, and flexibly exploit the available network resources with the aim of matching users' expectations. Such expectations could be expressed in terms of a properly defined Quality of Experience (QoE). Accordingly, this paper presents several approaches for addressing the QoE provisioning problem.
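
    The paper itself stays at the architectural level, but "a properly defined QoE" can be made concrete. One common illustration from the broader QoE literature (not from this paper) is the IQX hypothesis, which relates QoE exponentially to a QoS impairment; the coefficients below are invented for illustration only.

        import math

        def qoe_iqx(impairment, alpha=3.5, beta=0.8, gamma=1.0):
            """IQX hypothesis from the QoE literature:
            QoE = alpha * exp(-beta * impairment) + gamma,
            mapping a QoS impairment (e.g. packet loss in %) to a 1-5
            mean opinion score. Coefficients here are illustrative only.
            """
            return alpha * math.exp(-beta * impairment) + gamma

        for loss in (0.0, 0.5, 1.0, 2.0, 5.0):
            print(f"{loss:.1f}% loss -> MOS {qoe_iqx(loss):.2f}")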

    From Parameter Tuning to Dynamic Heuristic Selection

    The balance between exploration and exploitation plays a crucial role in solving combinatorial optimization problems. This balance is reached through two general techniques: using an appropriate problem solver and setting its parameters properly. Both problems have been widely studied in the past, and research on them continues today. The latest studies in the field of automated machine learning propose merging both problems, solving them at design time and later strengthening the results at runtime. To the best of our knowledge, no generalized approach for solving the parameter setting problem in heuristic solvers has yet been proposed, and consequently the concept of merging heuristic selection and parameter control has not been introduced. In this thesis, we propose an approach for generic parameter control in meta-heuristics by means of reinforcement learning (RL). Going a step further, we suggest a technique for merging the heuristic selection and parameter control problems and solving them at runtime using an RL-based hyper-heuristic. The evaluation of the proposed parameter control technique on the symmetric traveling salesman problem (TSP) demonstrated its applicability: it reaches the performance of the underlying meta-heuristic tuned online and used in isolation. Our approach provides results on par with the best underlying heuristics with tuned parameters.
    Contents:
    1 Introduction (1.1 Motivation; 1.2 Research objective; 1.3 Solution overview)
    2 Background and Related Work Analysis (2.1 Optimization Problems and their Solvers; 2.2 Heuristic Solvers for Optimization Problems; 2.3 Setting Algorithm Parameters; 2.4 Combined Algorithm Selection and Hyper-Parameter Tuning Problem; 2.5 Conclusion on Background and Related Work Analysis)
    3 Online Selection Hyper-Heuristic with Generic Parameter Control (3.1 Combined Parameter Control and Algorithm Selection Problem; 3.2 Search Space Structure; 3.3 Parameter Prediction Process; 3.4 Low-Level Heuristics; 3.5 Conclusion of Concept)
    4 Implementation Details (4.2 Search Space; 4.3 Prediction Process; 4.4 Low-Level Heuristics; 4.5 Conclusion)
    5 Evaluation (5.1 Optimization Problem; 5.2 Environment Setup; 5.3 Meta-heuristics Tuning; 5.4 Concept Evaluation; 5.5 Analysis of HH-PC Settings; 5.6 Conclusion)
    6 Conclusion
    7 Future Work (7.1 Prediction Process; 7.2 Search Space; 7.3 Evaluations and Benchmarks)
    Bibliography
    A Evaluation Results (A.1 Results in Figures; A.2 Results in Numbers)
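
    As a rough sketch of the thesis's central idea, the snippet below treats each (low-level heuristic, parameter value) pair as one action of an epsilon-greedy RL agent, rewarded by the improvement each application yields. Everything here (heuristic names, parameter meanings, the stand-in objective) is hypothetical; the thesis's actual search-space structure and prediction process are more elaborate.

        import random

        # Hypothetical sketch: merge heuristic selection and parameter control
        # by letting an epsilon-greedy agent pick (heuristic, parameter) pairs.
        HEURISTICS = {
            "2-opt": [0.1, 0.3],  # made-up parameter: perturbation strength
            "swap":  [1, 2],      # made-up parameter: number of swaps
        }
        ACTIONS = [(h, p) for h, ps in HEURISTICS.items() for p in ps]
        q = {a: 0.0 for a in ACTIONS}  # value estimate per action
        n = {a: 0 for a in ACTIONS}    # times each action was taken

        def apply_heuristic(name, param, cost):
            """Stand-in for applying a low-level heuristic to a TSP tour."""
            gain = random.random() * (param if name == "2-opt" else 0.05 * param)
            return cost - gain

        cost, eps = 100.0, 0.2
        for step in range(200):
            a = random.choice(ACTIONS) if random.random() < eps \
                else max(ACTIONS, key=q.get)
            new_cost = apply_heuristic(a[0], a[1], cost)
            reward = cost - new_cost        # improvement as reward
            n[a] += 1
            q[a] += (reward - q[a]) / n[a]  # incremental mean of the reward
            cost = new_cost
        print("best (heuristic, parameter):", max(ACTIONS, key=q.get))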

    Data-driven Economic NMPC using Reinforcement Learning

    Reinforcement Learning (RL) is a powerful tool for performing data-driven optimal control without relying on a model of the system. However, RL struggles to provide hard guarantees on the behavior of the resulting control scheme. In contrast, Nonlinear Model Predictive Control (NMPC) and Economic NMPC (ENMPC) are standard tools for the closed-loop optimal control of complex systems with constraints and limitations, and they benefit from a rich theory for assessing their closed-loop behavior. Unfortunately, the performance of (E)NMPC hinges on the quality of the model underlying the control scheme. In this paper, we show that an (E)NMPC scheme can be tuned to deliver the optimal policy of the real system even when using a wrong model. This result also holds for real systems with stochastic dynamics. It entails that ENMPC can be used as a new type of function approximator within RL. Furthermore, we investigate our results in the context of ENMPC and formally connect them to the concept of dissipativity, which is central to ENMPC stability. Finally, we detail how these results can be used to deploy classic RL tools for tuning (E)NMPC schemes. We apply these tools to both a classical linear MPC setting and a standard nonlinear example from the ENMPC literature.
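
    The paper parameterizes the (E)NMPC scheme and adjusts its parameters with RL so that the closed-loop policy becomes optimal for the real system despite the model error. The snippet below is a drastically simplified, hypothetical instance of that idea, not the paper's full scheme: a scalar one-step MPC with a deliberately wrong model and a single learnable terminal weight, updated by Q-learning. All dynamics, weights, and step sizes are invented for illustration.

        import numpy as np

        a_model, b_model = 0.9, 0.5        # deliberately wrong MPC model
        a_true,  b_true  = 1.0, 0.4        # "real" system dynamics
        R, gamma, alpha = 0.1, 0.95, 1e-3  # input weight, discount, step size

        def mpc_action(x, theta):
            # argmin_u  R*u**2 + theta*(a_model*x + b_model*u)**2  (closed form)
            return -theta * b_model * a_model * x / (R + theta * b_model**2)

        def q_value(x, u, theta):
            # MPC action-value: input cost plus parameterized terminal cost
            return R * u**2 + theta * (a_model * x + b_model * u)**2

        theta, x = 1.0, 2.0
        rng = np.random.default_rng(0)
        for step in range(2000):
            u = mpc_action(x, theta)
            stage_cost = x**2 + R * u**2   # cost actually incurred
            x_next = a_true * x + b_true * u + 0.01 * rng.standard_normal()
            v_next = q_value(x_next, mpc_action(x_next, theta), theta)
            td_error = stage_cost + gamma * v_next - q_value(x, u, theta)
            theta += alpha * td_error * (a_model * x + b_model * u)**2  # dQ/dtheta
            x = x_next
        print(f"learned terminal weight: theta = {theta:.3f}")

    Despite the mismatched model, the Q-learning update reshapes the MPC cost until the scheme's value function is consistent with the costs observed on the true system, which is the mechanism the paper develops in full generality.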