
    Deep Reinforcement Learning for Swarm Systems

    Recently, deep reinforcement learning (RL) methods have been applied successfully to multi-agent scenarios. Typically, these methods rely on a concatenation of agent states to represent the information content required for decentralized decision making. However, concatenation scales poorly to swarm systems with a large number of homogeneous agents, as it does not exploit the fundamental properties inherent to these systems: (i) the agents in the swarm are interchangeable and (ii) the exact number of agents in the swarm is irrelevant. Therefore, we propose a new state representation for deep multi-agent RL based on mean embeddings of distributions. We treat the agents as samples of a distribution and use the empirical mean embedding as input for a decentralized policy. We define different feature spaces of the mean embedding using histograms, radial basis functions, and a neural network learned end-to-end. We evaluate the representation on two well-known problems from the swarm literature (rendezvous and pursuit evasion), in both a globally and a locally observable setup. For the local setup, we furthermore introduce simple communication protocols. Of all approaches, the mean embedding representation using neural network features enables the richest information exchange between neighboring agents, facilitating the development of more complex collective strategies. Comment: 31 pages, 12 figures, version 3 (published in JMLR Volume 20)
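
    A minimal numpy sketch of the idea, using a radial-basis-function feature space (function names and parameters are illustrative; the paper also learns the feature map end-to-end as a neural network):

        import numpy as np

        def rbf_mean_embedding(neighbor_states, centers, bandwidth=0.5):
            # neighbor_states: (n_agents, d) observed agent states, treated as
            # samples from a distribution over the swarm.
            # centers: (n_features, d) RBF centers defining the feature space.
            diff = neighbor_states[:, None, :] - centers[None, :, :]
            sq_dist = np.sum(diff ** 2, axis=-1)             # (n_agents, n_features)
            features = np.exp(-sq_dist / (2.0 * bandwidth ** 2))
            # Empirical mean embedding: averaging over agents makes the policy
            # input permutation invariant and independent of the swarm size.
            return features.mean(axis=0)                     # (n_features,)

        centers = np.random.uniform(-1.0, 1.0, size=(16, 2))
        print(rbf_mean_embedding(np.random.randn(10, 2), centers).shape)  # (16,)
        print(rbf_mean_embedding(np.random.randn(50, 2), centers).shape)  # (16,)

    Ten agents and fifty agents yield the same fixed-size policy input, which is exactly the property that concatenation lacks.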

    Guided Deep Reinforcement Learning for Swarm Systems

    In this paper, we investigate how to learn to control a group of cooperative agents with limited sensing capabilities, such as robot swarms. The agents have only very basic sensor capabilities, yet as a group they can accomplish sophisticated tasks, such as distributed assembly or search and rescue. Learning a policy for a group of agents is difficult due to distributed partial observability of the state. Here, we follow a guided approach where a critic has central access to the global state during learning, which simplifies the policy evaluation problem from a reinforcement learning point of view. For example, we can obtain the positions of all robots in the swarm from a camera image of the scene. This camera image is available only to the critic, not to the control policies of the robots. We follow an actor-critic approach, where the actors base their decisions only on locally sensed information; the critic, in contrast, is learned from the true global state. Our algorithm uses deep reinforcement learning to approximate both the Q-function and the policy. The performance of the algorithm is evaluated on two tasks with simple simulated 2D agents: 1) finding and maintaining a certain distance to each other and 2) locating a target. Comment: 15 pages, 8 figures, accepted at the AAMAS 2017 Autonomous Robots and Multirobot Systems (ARMS) Workshop
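
    In schematic form, the asymmetric setup can be written as follows (one common way to state such a guided objective; the paper's exact update rules may differ):

        % Critic: trained on the true global state s (e.g., from the camera image)
        \min_{\phi}\;
          \mathbb{E}\Big[\big(Q_\phi(s_t, a_t) - r_t - \gamma\, Q_\phi(s_{t+1}, a_{t+1})\big)^2\Big]

        % Actors: each agent i acts only on its local observation o_i, but the
        % policy gradient is guided by the centrally informed critic
        \max_{\theta}\;
          \mathbb{E}\big[\, Q_\phi\big(s,\, \pi_\theta(o_1), \dots, \pi_\theta(o_N)\big) \big]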

    Variational inference for policy search in changing situations

    Many policy search algorithms minimize the Kullback-Leibler (KL) divergence to a certain target distribution in order to fit their policy. The commonly used KL-divergence forces the resulting policy to be 'reward-attracted': the policy tries to reproduce all positively rewarded experience while negative experience is neglected. However, the KL-divergence is not symmetric, and we can also minimize the reversed KL-divergence, which is typically used in variational inference. The policy then becomes 'cost-averse': it tries to avoid reproducing any negatively rewarded experience while maximizing exploration. Due to this cost-averseness, Variational Inference for Policy Search (VIP) has several interesting properties. It requires neither a kernel bandwidth nor an exploration rate; such settings are determined automatically by the inference. The algorithm matches the performance of state-of-the-art methods while being applicable to learning in multiple situations simultaneously. We concentrate on using VIP for policy search in robotics. We apply our algorithm to learn dynamic counterbalancing of different kinds of pushes with human-like 2-link and 4-link robots.
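
    The two fitting directions can be made explicit (standard identities, spelled out here for clarity):

        % 'Reward-attracted' direction: the expectation is under the target p,
        % so the policy must cover all positively rewarded experience (moment matching)
        \mathrm{KL}(p \,\|\, \pi) \;=\; \mathbb{E}_{p}\big[\log p(a) - \log \pi(a)\big]

        % Reversed, 'cost-averse' direction used in variational inference: the
        % expectation is under pi itself, so the policy is penalized for visiting
        % low-probability (negatively rewarded) regions of p (mode seeking)
        \mathrm{KL}(\pi \,\|\, p) \;=\; \mathbb{E}_{\pi}\big[\log \pi(a) - \log p(a)\big]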

    Pressure Calculation in Polar and Charged Systems using Ewald Summation: Results for the Extended Simple Point Charge Model of Water

    Ewald summation and physically equivalent methods such as particle-mesh Ewald, kubic-harmonic expansions, or Lekner sums are commonly used to calculate long-range electrostatic interactions in computer simulations of polar and charged substances. We investigate the calculation of pressures in such systems. We find that the virial and thermodynamic pressures differ because of the explicit volume dependence of the effective, resummed Ewald potential. The thermodynamic pressure, obtained from the volume derivative of the Helmholtz free energy, can be expressed easily for both ionic and rigid molecular systems. For a system of rigid molecules, only the electrostatic energy and the forces at the atom positions are required, both of which are readily available in molecular dynamics codes. We then calculate the virial and thermodynamic pressures for the extended simple point charge (SPC/E) water model at standard conditions. We find that the thermodynamic pressure exhibits considerably less system-size dependence than the virial pressure. From an analysis of the cross-correlation between the virial and thermodynamic pressures, we conclude that the thermodynamic pressure should be used to drive volume fluctuations in constant-pressure simulations. Comment: RevTeX, 19 pages, 2 EPS figures; in press: Journal of Chemical Physics, 15 August 1998
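
    Schematically, the distinction is the following (textbook statistical mechanics, with the volume derivative taken at fixed scaled coordinates; the paper works out the explicit term for the resummed Ewald potential):

        % Standard virial pressure from particle positions r_i and forces f_i
        P_{\mathrm{virial}} \;=\; \frac{N k_B T}{V}
          + \frac{1}{3V}\Big\langle \sum_i \mathbf{r}_i \cdot \mathbf{f}_i \Big\rangle

        % Thermodynamic pressure from the Helmholtz free energy A; the extra term
        % stems from the explicit volume dependence of the Ewald potential
        P_{\mathrm{thermo}} \;=\; -\Big(\frac{\partial A}{\partial V}\Big)_{N,T}
          \;=\; P_{\mathrm{virial}}
          \;-\; \Big\langle \frac{\partial U_{\mathrm{Ewald}}}{\partial V}\Big|_{\text{explicit}} \Big\rangle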

    Fitted Q-iteration by advantage weighted regression

    Recently, fitted Q-iteration (FQI) based methods have become more popular due to their increased sample efficiency, a more stable learning process, and the higher quality of the resulting policy. However, these methods remain hard to use for continuous action spaces, which frequently occur in real-world tasks, e.g., in robotics and other technical applications. The greedy action selection commonly used for the policy improvement step is particularly problematic, as it is expensive for continuous actions, can cause an unstable learning process, introduces an optimization bias, and results in highly non-smooth policies unsuitable for real-world systems. In this paper, we show that by using a soft-greedy action selection, the policy improvement step used in FQI can be simplified to an inexpensive advantage-weighted regression. With this result, we derive a new, computationally efficient FQI algorithm that can even deal with high-dimensional action spaces.
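
    A minimal numpy sketch of the resulting policy improvement step, for a linear policy mean (the names, the temperature beta, and the linear parameterization are illustrative, not the paper's exact formulation):

        import numpy as np

        def awr_policy_mean(states, actions, advantages, beta=1.0):
            # Soft-greedy improvement as weighted regression: each sample is
            # weighted by exp(A/beta), so high-advantage actions dominate the
            # fit instead of an expensive greedy arg-max over continuous actions.
            w = np.exp((advantages - advantages.max()) / beta)  # stabilized weights
            sw = np.sqrt(w)[:, None]
            # Weighted least squares for a linear policy mean: a ~ s @ W
            W, *_ = np.linalg.lstsq(states * sw, actions * sw, rcond=None)
            return W

        states = np.random.randn(200, 4)
        actions = states @ np.random.randn(4, 2) + 0.1 * np.random.randn(200, 2)
        advantages = np.random.randn(200)
        print(awr_policy_mean(states, actions, advantages).shape)  # (4, 2)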

    On Uncertainty in Deep State Space Models for Model-Based Reinforcement Learning

    Improved state space models, such as Recurrent State Space Models (RSSMs), are a key factor behind recent advances in model-based reinforcement learning (RL). Yet, despite their empirical success, many of the underlying design choices are not well understood. We show that RSSMs use a suboptimal inference scheme and that models trained using this inference overestimate the aleatoric uncertainty of the ground truth system. We find that this overestimation implicitly regularizes RSSMs and allows them to succeed in model-based RL. We postulate that this implicit regularization fulfills the same functionality as explicitly modeling epistemic uncertainty, which is crucial for many other model-based RL approaches. Yet, overestimating aleatoric uncertainty can also impair performance in cases where accurately estimating it matters, e.g., when we have to deal with occlusions, missing observations, or fusing sensor modalities at different frequencies. Moreover, the implicit regularization is a side effect of the inference scheme and not the result of a rigorous, principled formulation, which renders analyzing or improving RSSMs difficult. Thus, we propose an alternative approach built on well-understood components for modeling aleatoric and epistemic uncertainty, dubbed the Variational Recurrent Kalman Network (VRKN). This approach uses Kalman updates for exact smoothing inference in a latent space and Monte Carlo Dropout to model epistemic uncertainty. Due to the Kalman updates, the VRKN can naturally handle missing observations or sensor fusion problems with varying numbers of observations per time step. Our experiments show that using the VRKN instead of the RSSM improves performance in tasks where appropriately capturing aleatoric uncertainty is crucial, while matching it on the standard deterministic benchmarks.
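
    For reference, the exact latent-space measurement update such a model builds on is the standard Kalman step; a minimal numpy sketch (the generic update only; the full VRKN adds Monte Carlo Dropout and learned networks around it):

        import numpy as np

        def kalman_measurement_update(mu, Sigma, y, H, R):
            # Prior latent belief z ~ N(mu, Sigma); observation model y ~ N(Hz, R).
            if y is None:
                # Missing observation at this time step: the prior simply passes
                # through, which is how such gaps are handled naturally.
                return mu, Sigma
            S = H @ Sigma @ H.T + R                        # innovation covariance
            K = Sigma @ H.T @ np.linalg.inv(S)             # Kalman gain
            mu_post = mu + K @ (y - H @ mu)
            Sigma_post = (np.eye(len(mu)) - K @ H) @ Sigma
            return mu_post, Sigma_post

        mu, Sigma = np.zeros(3), np.eye(3)
        H, R = np.eye(2, 3), 0.1 * np.eye(2)
        print(kalman_measurement_update(mu, Sigma, np.array([0.5, -0.2]), H, R)[0])
        print(kalman_measurement_update(mu, Sigma, None, H, R)[0])  # prior unchanged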

    "The numerical accuracy of truncated Ewald sums for periodic systems with long-range Coulomb interactions"

    Ewald summation is widely used to calculate electrostatic interactions in computer simulations of condensed-matter systems. We present an analysis of the errors arising from truncating the infinite real- and Fourier-space lattice sums in the Ewald formulation. We derive an optimal choice for the Fourier-space cutoff given a screening parameter η. We find that the number of vectors in Fourier space required to achieve a given accuracy scales with η³. The proposed method can be used to determine computationally efficient parameters for Ewald sums, to assess the quality of Ewald-sum implementations, and to compare different implementations. Comment: 6 pages, 3 figures (Encapsulated PostScript), LaTeX
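
    The analysis rests on the standard Ewald decomposition of the Coulomb kernel by the screening parameter η:

        \frac{1}{r} \;=\;
          \underbrace{\frac{\operatorname{erfc}(\eta r)}{r}}_{\text{short ranged: real-space sum, cutoff } r_c}
          \;+\;
          \underbrace{\frac{\operatorname{erf}(\eta r)}{r}}_{\text{smooth: Fourier-space sum, cutoff } k_c}

    Increasing η shortens the real-space part but broadens the reciprocal-space one, which is why the number of required Fourier vectors grows with η³.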

    Formation of Polymorphic Cluster Phases for Purely Repulsive Soft Spheres

    We present results from density functional theory and computer simulations that unambiguously predict the occurrence of first-order freezing transitions of a large class of ultrasoft model systems into cluster crystals. The clusters consist of fully overlapping particles and arise without the existence of attractive forces. The number of particles participating in a cluster scales linearly with density; the crystals therefore feature density-independent lattice constants. Clustering is accompanied by polymorphic bcc-fcc transitions, with fcc being the stable phase at high densities. Comment: 4 pages, 5 figures, submitted to Phys. Rev. Lett.
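
    The density-independent lattice constant follows in one line from the linear scaling of the cluster occupancy n_c (spelled out here for clarity):

        % If n_c = alpha * rho particles sit on each lattice site, the density
        % of sites is constant, and so is the lattice constant a
        \rho_{\mathrm{site}} \;=\; \frac{\rho}{n_c} \;=\; \frac{1}{\alpha}
        \qquad\Rightarrow\qquad
        a \;\propto\; \rho_{\mathrm{site}}^{-1/3} \;=\; \alpha^{1/3} \;=\; \text{const.}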

    MV6D: Multi-View 6D Pose Estimation on RGB-D Frames Using a Deep Point-wise Voting Network

    Estimating the 6D poses of objects is an essential computer vision task. However, most conventional approaches rely on camera data from a single perspective and therefore suffer from occlusions. We overcome this issue with our novel multi-view 6D pose estimation method called MV6D, which accurately predicts the 6D poses of all objects in a cluttered scene based on RGB-D images from multiple perspectives. We base our approach on the PVN3D network, which uses a single RGB-D image to predict keypoints of the target objects. We extend this approach by using a combined point cloud from multiple views and fusing the images from each view with a DenseFusion layer. In contrast to current multi-view pose detection networks such as CosyPose, our MV6D learns the fusion of multiple perspectives in an end-to-end manner and requires neither multiple prediction stages nor subsequent fine-tuning of the prediction. Furthermore, we present three novel photorealistic datasets of cluttered scenes with heavy occlusions. All of them contain RGB-D images from multiple perspectives as well as ground truth for instance segmentation and 6D pose estimation. MV6D significantly outperforms the state of the art in multi-view 6D pose estimation, even in cases where the camera poses are only inaccurately known. Furthermore, we show that our approach is robust to dynamic camera setups and that its accuracy increases with the number of perspectives. Comment: Accepted at IROS 2022
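
    A minimal numpy sketch of the geometric half of this fusion, merging per-view point clouds into one cloud given the camera poses (illustrative only; MV6D additionally fuses the per-view RGB features with a DenseFusion layer):

        import numpy as np

        def fuse_point_clouds(clouds, cam_poses):
            # clouds: list of (n_i, 3) point arrays, each in its camera's frame.
            # cam_poses: list of (4, 4) camera-to-world transforms.
            fused = []
            for pts, T in zip(clouds, cam_poses):
                homog = np.hstack([pts, np.ones((len(pts), 1))])  # homogeneous coords
                fused.append((homog @ T.T)[:, :3])                # into world frame
            # One combined cloud in a common frame for the voting network.
            return np.vstack(fused)

        clouds = [np.random.randn(100, 3), np.random.randn(80, 3)]
        poses = [np.eye(4), np.eye(4)]
        print(fuse_point_clouds(clouds, poses).shape)  # (180, 3)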