    Fitted Q-iteration by advantage weighted regression

    Recently, fitted Q-iteration (FQI) based methods have become more popular due to their increased sample efficiency, more stable learning process, and the higher quality of the resulting policy. However, these methods remain hard to use for continuous action spaces, which frequently occur in real-world tasks, e.g., in robotics and other technical applications. The greedy action selection commonly used for the policy improvement step is particularly problematic, as it is expensive for continuous actions, can cause an unstable learning process, introduces an optimization bias, and results in highly non-smooth policies unsuitable for real-world systems. In this paper, we show that by using a soft-greedy action selection, the policy improvement step used in FQI can be simplified to an inexpensive advantage weighted regression. With this result, we are able to derive a new, computationally efficient FQI algorithm that can even deal with high-dimensional action spaces.
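
    The policy improvement step described above can be made concrete in a few lines: given a batch of sampled state-action pairs and their advantage estimates, the policy is fit by a regression in which each sample is weighted by its exponentiated advantage. Below is a minimal numpy sketch under simplifying assumptions (a linear policy, a zero value baseline, random placeholder data, and a fixed temperature beta); it illustrates advantage weighted regression, not the authors' implementation.

    ```python
    import numpy as np

    # Placeholder batch: states S, sampled actions A_samp, critic estimates Q.
    rng = np.random.default_rng(0)
    n, state_dim, act_dim = 512, 4, 2
    S = rng.normal(size=(n, state_dim))
    A_samp = rng.normal(size=(n, act_dim))
    Q = rng.normal(size=n)          # Q(s, a) from the fitted critic
    V = np.zeros(n)                 # value baseline (assumed zero here)
    advantage = Q - V

    beta = 1.0                                        # soft-greedy temperature
    w = np.exp((advantage - advantage.max()) / beta)  # stabilised exponential weights

    # Advantage weighted regression: closed-form weighted least-squares fit
    # of a linear policy a = s @ W.
    Sw = S * w[:, None]
    W = np.linalg.solve(S.T @ Sw + 1e-6 * np.eye(state_dim), Sw.T @ A_samp)

    def policy(s):
        return s @ W  # mean action of the regressed policy

    print(policy(S[0]))
    ```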

    Hierarchical relative entropy policy search

    Many real-world problems are inherently hierarchically structured. The use of this structure in an agent’s policy may well be the key to improved scalability and higher performance. However, such hierarchical structures cannot be exploited by current policy search algorithms. We concentrate on a basic, but highly relevant, hierarchy: the ‘mixed option’ policy. Here, a gating network first decides which of the options to execute and, subsequently, the option-policy determines the action. In this paper, we reformulate learning a hierarchical policy as a latent variable estimation problem and subsequently extend Relative Entropy Policy Search (REPS) to the latent variable case. We show that our Hierarchical REPS can learn versatile solutions while improving both learning speed and the quality of the found policy in comparison to the non-hierarchical approach.
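
    To make the latent-variable view concrete, the sketch below runs a toy EM-style update for a ‘mixed option’ policy: the E-step computes option responsibilities (the latent variable), and the M-step performs reward-weighted maximum-likelihood updates of the gate and the option policies. The scalar actions, linear-Gaussian options, and fixed temperature eta (instead of solving the REPS dual problem) are simplifying assumptions, not the paper's actual algorithm.

    ```python
    import numpy as np

    # Placeholder data: contexts, sampled actions, and their rewards.
    rng = np.random.default_rng(1)
    n_samples, n_options, ctx_dim = 200, 3, 2
    contexts = rng.normal(size=(n_samples, ctx_dim))
    actions = rng.normal(size=n_samples)
    rewards = rng.normal(size=n_samples)

    eta = 1.0                                    # fixed REPS-style temperature
    w = np.exp((rewards - rewards.max()) / eta)  # per-sample reward weights

    # Each option is a linear-Gaussian policy over the context; the gate is
    # a categorical distribution over options.
    means = rng.normal(size=(n_options, ctx_dim))
    log_gate = np.full(n_options, -np.log(n_options))

    for _ in range(10):
        # E-step: responsibility of each option for each sample.
        pred = contexts @ means.T                # (n_samples, n_options)
        log_r = log_gate - 0.5 * (actions[:, None] - pred) ** 2
        log_r -= log_r.max(axis=1, keepdims=True)
        resp = np.exp(log_r)
        resp /= resp.sum(axis=1, keepdims=True)

        # M-step: reward-weighted ML updates of gate and option policies.
        wr = resp * w[:, None]
        log_gate = np.log(wr.sum(axis=0) / wr.sum())
        for o in range(n_options):
            Wo = contexts * wr[:, o:o + 1]
            means[o] = np.linalg.solve(contexts.T @ Wo + 1e-6 * np.eye(ctx_dim),
                                       Wo.T @ actions)
    ```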

    Non-parametric policy search with limited information loss

    Learning complex control policies from non-linear and redundant sensory input is an important challenge for reinforcement learning algorithms. Non-parametric methods that approximate value functions or transition models can address this problem by adapting to the complexity of the dataset. Yet, many current non-parametric approaches rely on unstable greedy maximization of approximate value functions, which might lead to poor convergence or oscillations in the policy update. A more robust policy update can be obtained by limiting the information loss between successive state-action distributions. In this paper, we develop a policy search algorithm with policy updates that are both robust and non-parametric. Our method can learn non-parametric control policies for infinite horizon continuous Markov decision processes with non-linear and redundant sensory representations. We investigate how approximations of the kernel function can reduce the time requirements of the demanding non-parametric computations. In our experiments, we show the strong performance of the proposed method and how it can be approximated efficiently. Finally, we show that our algorithm can learn an underpowered swing-up task on a real robot directly from image data.
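
    The information-loss limit is typically enforced by solving the REPS dual problem for a temperature eta such that the KL divergence between successive sample distributions stays at the bound epsilon. The following is a minimal sketch of that single step, assuming plain Monte-Carlo advantage estimates and omitting the kernel machinery of the actual method.

    ```python
    import numpy as np
    from scipy.optimize import minimize

    rng = np.random.default_rng(2)
    returns = rng.normal(size=300)   # advantage estimates of sampled state-actions
    epsilon = 0.5                    # bound on the KL between successive distributions

    def dual(eta):
        # REPS dual g(eta); its minimiser yields weights whose KL is at most epsilon.
        eta = max(eta[0], 1e-6)
        shifted = returns - returns.max()
        return eta * epsilon + returns.max() + eta * np.log(np.mean(np.exp(shifted / eta)))

    eta = max(minimize(dual, x0=[1.0], method="Nelder-Mead").x[0], 1e-6)
    w = np.exp((returns - returns.max()) / eta)
    w /= w.sum()

    # Empirical KL of the reweighted samples w.r.t. the uniform old distribution;
    # at the dual optimum it sits close to the bound epsilon.
    kl = np.sum(w * np.log(w * len(w)))
    print(f"eta = {eta:.3f}, empirical KL = {kl:.3f} (bound {epsilon})")
    ```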

    Learning of non-parametric control policies with high-dimensional state features

    Learning complex control policies from high-dimensional sensory input is a challenge for reinforcement learning algorithms. Kernel methods that approximate value functions or transition models can address this problem. Yet, many current approaches rely on unstable greedy maximization. In this paper, we develop a policy search algorithm that integrates robust policy updates and kernel embeddings. Our method can learn non-parametric control policies for infinite horizon continuous MDPs with high-dimensional sensory representations. We show that our method outperforms related approaches, and that our algorithm can learn an underpowered swing-up task directly from high-dimensional image data.
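
    Given such robust update weights, a non-parametric policy can then be fit by weighted kernel ridge regression on the high-dimensional state features. The sketch below illustrates this generic building block with random placeholder data; the RBF kernel, bandwidth, and regularizer are arbitrary assumptions rather than the paper's exact choices.

    ```python
    import numpy as np

    rng = np.random.default_rng(3)
    n, feat_dim = 150, 50                # e.g. flattened image features
    X = rng.normal(size=(n, feat_dim))   # high-dimensional state representations
    a = rng.normal(size=n)               # sampled actions
    w = rng.random(n); w /= w.sum()      # update weights (e.g. from a KL-bounded step)

    def rbf(A, B, bw=5.0):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2 * bw ** 2))

    # Weighted kernel ridge regression: the policy mean mu(x) = k(x, X) @ alpha
    # is non-parametric, so its complexity adapts to the data set.
    K = rbf(X, X)
    alpha = np.linalg.solve(K + 1e-2 * np.diag(1.0 / (w * n)), a)

    def policy_mean(x_new):
        return rbf(np.atleast_2d(x_new), X) @ alpha

    print(policy_mean(X[0]))
    ```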

    State-regularized policy search for linearized dynamical systems

    Trajectory-Centric Reinforcement Learning and Trajectory Optimization methods optimize a sequence of feedback controllers by taking advantage of local approximations of model dynamics and cost functions. Stability of the policy update is a major issue for these methods, rendering them hard to apply to highly nonlinear systems. Recent approaches combine classical Stochastic Optimal Control methods with information-theoretic bounds to control the step-size of the policy update and can even be used to train nonlinear deep control policies. These methods bound the relative entropy between the new and the old policy to ensure a stable policy update. However, despite the bound in policy space, the state distributions of two consecutive policies can still differ significantly, rendering the local approximate models invalid. To alleviate this issue, we propose enforcing a relative entropy constraint not only on the policy update, but also on the update of the state distribution around which the dynamics and cost are approximated. We present a derivation of the closed-form policy update and show that our approach outperforms related methods on two nonlinear and highly dynamic simulated systems.
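
    The proposed state regularization can be pictured as a second step-size check: besides the relative entropy bound on the policy, an update is only trusted if the state distribution stays within a KL bound of the old one, so that the local approximations remain valid. A small illustrative sketch with Gaussian state marginals and made-up numbers follows; it shows the constraint being checked, not the paper's closed-form update.

    ```python
    import numpy as np

    def kl_gauss(mu0, S0, mu1, S1):
        """KL( N(mu0, S0) || N(mu1, S1) ) between multivariate Gaussians."""
        d = len(mu0)
        S1_inv = np.linalg.inv(S1)
        diff = mu1 - mu0
        return 0.5 * (np.trace(S1_inv @ S0) + diff @ S1_inv @ diff - d
                      + np.log(np.linalg.det(S1) / np.linalg.det(S0)))

    # Hypothetical Gaussian state marginals before and after a policy update.
    mu_old, S_old = np.zeros(2), np.eye(2)
    mu_new, S_new = np.array([0.3, -0.1]), 1.2 * np.eye(2)

    eps_state = 0.05   # bound on how far the state distribution may move
    kl_state = kl_gauss(mu_new, S_new, mu_old, S_old)

    # The policy-space bound is assumed to hold already; this additional check
    # keeps the dynamics and cost approximations around the old states valid.
    print(f"KL(state_new || state_old) = {kl_state:.3f}, "
          f"within bound: {kl_state <= eps_state}")
    ```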

    Probabilistic prioritization of movement primitives

    Movement prioritization is a common approach to combining controllers for different tasks on redundant robots, where each task is assigned a priority. The priorities of the tasks are often hand-tuned or the result of an optimization, but seldom learned from data. This paper combines Bayesian task prioritization with probabilistic movement primitives to prioritize full motion sequences that are learned from demonstrations. Probabilistic movement primitives (ProMPs) can encode distributions of movements over full motion sequences and provide control laws to exactly follow these distributions. The probabilistic formulation allows for a natural application of Bayesian task prioritization. We extend the ProMP controllers with an additional feedback component that accounts for inaccuracies in following the distribution and allows for a more robust prioritization of primitives. We demonstrate how the task priorities can be obtained from imitation learning and how different primitives can be combined to solve even unseen task combinations. Due to the prioritization, our approach can efficiently learn a combination of tasks without requiring individual models per task combination. Further, our approach can adapt an existing primitive library by prioritizing additional controllers, for example, to implement obstacle avoidance. Hence, the need to retrain the whole library is avoided in many cases. We evaluate our approach on reaching movements under constraints with redundant simulated planar robots and two physical robot platforms, the humanoid robot “iCub” and a KUKA LWR robot arm.
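
    The Bayesian flavor of the prioritization can be illustrated by fusing the Gaussian predictions of two primitives: multiplying the Gaussians weights each task by its precision, so the more certain task dominates each dimension. A minimal sketch with hypothetical per-dimension means and variances, not the full ProMP controller:

    ```python
    import numpy as np

    # Two hypothetical primitives, each giving a Gaussian over the desired
    # joint position at the current time step (mean and variance from its ProMP).
    mu_reach, var_reach = np.array([0.5, 0.2]), np.array([0.05, 0.20])
    mu_avoid, var_avoid = np.array([0.3, 0.6]), np.array([0.50, 0.02])

    # Bayesian prioritization as a product of Gaussians: the precision
    # (inverse variance) acts as a soft, state-dependent task priority.
    precision = 1.0 / var_reach + 1.0 / var_avoid
    var_comb = 1.0 / precision
    mu_comb = var_comb * (mu_reach / var_reach + mu_avoid / var_avoid)

    print("combined mean:", mu_comb)     # dominated per dimension by the
    print("combined var :", var_comb)    # more certain (higher-priority) task
    ```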

    Probabilistic approach to physical object disentangling

    Physically disentangling entangled objects from each other is a problem encountered in waste segregation or in any task that requires disassembly of structures. Often there are no object models, and, especially with cluttered, irregularly shaped objects, the robot cannot create a model of the scene due to occlusion. One of our key insights is that, based on previous sensory input, we are only interested in moving an object out of the entanglement, around obstacles. That is, we only need to know where the robot can successfully move in order to plan the disentangling. Due to this uncertainty, we integrate information about blocked movements into a probability map. The map defines the probability of the robot successfully moving to a specific configuration. Using the failure probability of a sequence of movements as the cost, we can then plan and execute disentangling iteratively. Since our approach circumvents only previously encountered obstacles, new movements will yield information about unknown obstacles that block movement, until the robot has learned to circumvent all obstacles and disentangling succeeds. In the experiments, we use a special probabilistic version of the Rapidly exploring Random Tree (RRT) algorithm for planning and demonstrate successful disentanglement of objects both in 2-D and 3-D simulation and on a KUKA LBR 7-DOF robot. Moreover, our approach outperforms baseline methods.
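
    A rough sketch of such a probabilistic RRT variant follows: the tree grows in configuration space, each extension succeeds with the probability given by the learned map (replaced here by a hand-coded stand-in), and every node tracks the success probability of the whole movement sequence from the root. This is an illustration of the idea, not the authors' planner.

    ```python
    import numpy as np

    rng = np.random.default_rng(4)

    def p_success(q):
        # Stand-in for the learned probability map: the chance that moving
        # to configuration q succeeds, estimated from blocked movements.
        return float(np.clip(1.0 - 0.4 * np.linalg.norm(q - 0.5), 0.05, 1.0))

    # Grow a tree in [0, 1]^2; log_p[i] is the log-probability that the whole
    # movement sequence from the root to node i succeeds.
    nodes, parent, log_p = [np.zeros(2)], [0], [0.0]
    goal, step = np.array([1.0, 1.0]), 0.15

    for _ in range(500):
        q_rand = rng.random(2)
        i_near = min(range(len(nodes)), key=lambda i: np.linalg.norm(nodes[i] - q_rand))
        d = q_rand - nodes[i_near]
        q_new = nodes[i_near] + step * d / (np.linalg.norm(d) + 1e-9)
        ps = p_success(q_new)
        if rng.random() < ps:              # simulate whether the move succeeds
            nodes.append(q_new)
            parent.append(i_near)
            log_p.append(log_p[i_near] + np.log(ps))
            if np.linalg.norm(q_new - goal) < step:
                break

    i_best = min(range(len(nodes)), key=lambda i: np.linalg.norm(nodes[i] - goal))
    print(f"closest node {nodes[i_best]}, sequence success prob {np.exp(log_p[i_best]):.3f}")
    ```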