Search CORE

4,650 research outputs found

Hierarchical Decomposition of Nonlinear Dynamics and Control for System Identification and Policy Distillation

Author: Abdulsamad Hany
Peters Jan
Publication date: 01/01/2020

The control of nonlinear dynamical systems remains a major challenge for autonomous agents. Current trends in reinforcement learning (RL) focus on complex representations of dynamics and policies, which have yielded impressive results in solving a variety of hard control tasks. However, this new sophistication and these extremely over-parameterized models have come at the cost of an overall reduction in our ability to interpret the resulting policies. In this paper, we take inspiration from the control community and apply the principles of hybrid switching systems in order to break down complex dynamics into simpler components. We exploit the rich representational power of probabilistic graphical models and derive an expectation-maximization (EM) algorithm for learning a sequence model to capture the temporal structure of the data and automatically decompose nonlinear dynamics into stochastic switching linear dynamical systems. Moreover, we show how this framework of switching models enables extracting hierarchies of Markovian and auto-regressive locally linear controllers from nonlinear experts in an imitation learning scenario. Comment: 2nd Annual Conference on Learning for Dynamics and Control

arXiv.org e-Print Archive

MPG.PuRe

f-Divergence constrained policy improvement

Author: Belousov Boris
Peters Jan
Publication date: 04/04/2018

To ensure stability of learning, state-of-the-art generalized policy iteration algorithms augment the policy improvement step with a trust region constraint bounding the information loss. The size of the trust region is commonly determined by the Kullback-Leibler (KL) divergence, which not only captures the notion of distance well but also yields closed-form solutions. In this paper, we consider a more general class of f-divergences and derive the corresponding policy update rules. The generic solution is expressed through the derivative of the convex conjugate function to f and includes the KL solution as a special case. Within the class of f-divergences, we further focus on a one-parameter family of $\alpha$-divergences to study effects of the choice of divergence on policy improvement. Previously known as well as new policy updates emerge for different values of $\alpha$. We show that every type of policy update comes with a compatible policy evaluation resulting from the chosen f-divergence. Interestingly, the mean-squared Bellman error minimization is closely related to policy evaluation with the Pearson $\chi^2$-divergence penalty, while the KL divergence results in the soft-max policy update and a log-sum-exp critic. We carry out asymptotic analysis of the solutions for different values of $\alpha$ and demonstrate the effects of using different divergence functions on a multi-armed bandit problem and on common standard reinforcement learning problems.
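As an illustration of the KL special case mentioned in this abstract, the following is a minimal sketch for a multi-armed bandit, not the authors' implementation. It assumes the standard KL-regularized objective (expected reward minus `eta` times the KL divergence to a strictly positive prior policy), whose maximizer is the soft-max update and whose optimal value is a log-sum-exp of the rewards. The function names, the temperature `eta`, and the example rewards are hypothetical.

```python
import math


def kl_policy_update(prior, rewards, eta):
    """Soft-max update: new policy proportional to prior * exp(reward / eta).

    Assumes every prior probability is strictly positive and eta > 0.
    """
    logits = [math.log(p) + r / eta for p, r in zip(prior, rewards)]
    shift = max(logits)  # subtract the maximum for numerical stability
    weights = [math.exp(l - shift) for l in logits]
    total = sum(weights)
    return [w / total for w in weights]


def log_sum_exp_value(prior, rewards, eta):
    """Log-sum-exp critic: eta * log(sum_a prior[a] * exp(rewards[a] / eta))."""
    shift = max(rewards)  # factor out the largest reward for stability
    inner = sum(p * math.exp((r - shift) / eta) for p, r in zip(prior, rewards))
    return shift + eta * math.log(inner)


# Hypothetical four-armed bandit with a uniform prior policy.
prior = [0.25, 0.25, 0.25, 0.25]
rewards = [1.0, 0.5, 0.0, -0.5]
policy = kl_policy_update(prior, rewards, eta=0.5)
value = log_sum_exp_value(prior, rewards, eta=0.5)
```

A smaller `eta` makes the updated policy greedier and moves the value toward the maximum reward, while a larger `eta` keeps the policy close to the prior and moves the value toward the prior-weighted mean reward.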

arXiv.org e-Print Archive

TUbiblio

Intrinsic Motivation and Mental Replay enable Efficient Online Adaptation in Stochastic Recurrent Networks

Author: Peters Jan
Rueckert Elmar
Tanneberg Daniel
Publication venue: Elsevier BV
Publication date: 23/10/2018

Autonomous robots need to interact with unknown, unstructured and changing environments, constantly facing novel challenges. Continuous online adaptation for lifelong learning and sample-efficient mechanisms for adapting to changes in the environment, the constraints, the tasks, or the robot itself are therefore crucial. In this work, we propose a novel framework for probabilistic online motion planning with online adaptation based on a bio-inspired stochastic recurrent neural network. By using learning signals which mimic the intrinsic motivation signal cognitive dissonance, in addition to a mental replay strategy to intensify experiences, the stochastic recurrent network can learn from few physical interactions and adapt to novel environments in seconds. We evaluate our online planning and adaptation framework on an anthropomorphic KUKA LWR arm. The rapid online adaptation is shown by learning unknown workspace constraints sample-efficiently from few physical interactions while following given waypoints. Comment: accepted in Neural Networks

arXiv.org e-Print Archive

TUbiblio

MPG.PuRe

CRC 1114 - Report Membrane Deformation by N-BAR Proteins: Extraction of membrane geometry and protein diffusion characteristics from MD simulations

Author: Gräser Carsten
Klein Rupert
Peters Jan Henning
Publication date: 01/12/2017

We describe simulations of proteins and artificial pseudo-molecules interacting with and shaping lipid bilayer membranes. We extract protein diffusion parameters, membrane deformation profiles and the elastic properties of the membrane models used, in preparation for calculations based on a large-scale continuum model.

arXiv.org e-Print Archive

Repository: Freie Universität Berlin (FU), Math Department (fu_mi_publications)

Convergence & Competition: United Ways and Community Foundations - A National Inquiry

Author: Jan Masaoka
Jeanne Bell Peters
Nancy Ragey
Publication venue: CompassPoint Nonprofit Services
Publication date: 08/08/2005

This U.S. report summarizes key findings of the research that was commissioned to support the active dialogue among leaders of United Ways and other community foundations about their respective roles in community philanthropy and what the options for strategic co-existence -- if not full-fledged cooperation -- will look like in the coming years

IssueLab

Interactive television or enhanced television?: the Dutch users interest in applications of ITV via set-top boxes

Author: Dijk Jan van
Heuvelman Ard
Peters Oscar
Publication date: 01/01/2003

This paper is both an analysis of the phenomenon of interactive television, with background concepts of interactivity and television, and a report of an empirical investigation among Dutch users of set-top-box ITV. In the analytic part a distinction is made between levels of interactivity in the applications of ITV. Activities labelled as selection, customisation, transaction and reaction reveal low levels of interactivity. They may be called ‘enhanced television’. They are extensions of existing television programmes that keep their linear character. Activities called production and conversation have the potential of higher interactivity. They may lead to ‘real’ interactive television as the user input makes a difference to programmes. It is suggested that so-called hybrid ITV (TV combined with telephone and email reply channels) and (broadband) Internet ITV offer better opportunities for high interactivity than set-top-box ITV. The empirical investigation shows that the demand of subscribers to set-top-box ITV in the Netherlands matches supply. They favour the less interactive applications of selection and reaction. Other striking results are that young subscribers appreciate interactive applications more than older ones and that those with a low level of education prefer these applications more than highly educated subscribers. No significant gender differences were found.

University of Twente Research Information