272 research outputs found

    Off-Policy Temporal Difference Learning For Robotics And Autonomous Systems

    Reinforcement learning (RL) is a rapidly advancing field with applications in autonomous vehicles, medicine, finance, and other domains. In particular, off-policy temporal difference (TD) learning, a family of RL techniques, has been widely used in a variety of autonomous tasks. However, significant challenges must be overcome before it can be applied successfully to real-world problems. In this thesis, we address several major challenges in off-policy TD learning. In the first part of the thesis, we introduce an efficient method for learning complex stand-up motions of humanoid robots by Q-learning. Standing up after falling is an essential ability for humanoid robots, yet flexible stand-up motions for various fallen positions are difficult to learn due to the complexity of the task. We reduce the sample complexity of learning by applying a clustering method and exploiting the bilateral symmetry of humanoid robots. The learned policy is demonstrated both in simulation and on a physical robot. The greedy update of Q-learning, however, often causes overoptimism and instability. In the second part of the thesis, we propose a novel Bayesian approach to Q-learning, called ADFQ, which mitigates these issues by providing a principled way of updating Q-values based on the uncertainty of the Q-belief distributions. The algorithm converges to Q-learning as the uncertainty approaches zero, and its low computational complexity allows it to be extended with a neural network. Both ADFQ and its neural network extension outperform comparable algorithms, reducing estimation bias and converging faster to optimal Q-values. In the last part of the thesis, we apply off-policy TD methods to the active information acquisition problem, in which an autonomous agent is tasked with acquiring information about targets of interest. Off-policy TD learning addresses two classical challenges in this problem: dependence on a system model and the difficulty of computing information-theoretic cost functions over a long planning horizon. In particular, we introduce a method for learning a unified policy for in-sight tracking, navigation, and exploration. The policy shows robust behavior when tracking agile and anomalous targets with a partially known target model.
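
    For concreteness, here is a minimal tabular sketch of the two update rules contrasted in the abstract. The greedy Q-learning step is standard; the uncertainty-softened target is only an illustrative stand-in for ADFQ's belief-based update (the function names, the Gaussian-belief parameterization, and the temperature heuristic are assumptions, not the thesis's derivation), chosen so that it reduces to the hard max as the uncertainty approaches zero, the limiting behavior the abstract describes.

```python
import numpy as np

def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99):
    """Standard tabular Q-learning step. The hard max over next-state
    values is the greedy update blamed for overoptimism above:
    estimation noise gets maximized over, biasing targets upward."""
    target = r + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (target - Q[s, a])
    return Q

def uncertainty_softened_target(mean_next, var_next):
    """Illustrative uncertainty-aware TD target (an assumption, not
    ADFQ's actual derivation). Each next action carries a Gaussian
    belief (mean, variance); the max is softened by a temperature tied
    to the belief spread, so the target collapses to the plain
    Q-learning max as the uncertainty approaches zero."""
    tau = float(np.sqrt(var_next).mean()) + 1e-8
    z = (mean_next - mean_next.max()) / tau  # shift for a stable softmax
    w = np.exp(z)
    w /= w.sum()
    return float(w @ mean_next)

# Two candidate next actions with nearly equal means but very different
# confidence: the softened target discounts the noisy near-maximum.
means = np.array([1.0, 1.05])
variances = np.array([0.01, 4.0])
print(uncertainty_softened_target(means, variances))  # ~1.03, between mean and max
```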

    Centralized learning and planning : for cognitive robots operating in human domains

    Activity Report 2020 : Automatic Control Lund University

    Design of large polyphase filters in the Quadratic Residue Number System
