
    Deep Reinforcement Learning for Swarm Systems

    Recently, deep reinforcement learning (RL) methods have been applied successfully to multi-agent scenarios. Typically, these methods rely on a concatenation of agent states to represent the information content required for decentralized decision making. However, concatenation scales poorly to swarm systems with a large number of homogeneous agents, as it does not exploit the fundamental properties inherent to these systems: (i) the agents in the swarm are interchangeable and (ii) the exact number of agents in the swarm is irrelevant. Therefore, we propose a new state representation for deep multi-agent RL based on mean embeddings of distributions. We treat the agents as samples of a distribution and use the empirical mean embedding as input for a decentralized policy. We define different feature spaces of the mean embedding using histograms, radial basis functions, and a neural network learned end-to-end. We evaluate the representation on two well-known problems from the swarm literature (rendezvous and pursuit evasion), in both a globally and a locally observable setup. For the local setup, we furthermore introduce simple communication protocols. Of all approaches, the mean embedding representation using neural network features enables the richest information exchange between neighboring agents, facilitating the development of more complex collective strategies. Comment: 31 pages, 12 figures, version 3 (published in JMLR Volume 20)
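
    A minimal numpy sketch of the idea, using a radial-basis-function feature space (function names and parameters are illustrative; the paper also learns the feature map end-to-end as a neural network):

        import numpy as np

        def rbf_mean_embedding(neighbor_states, centers, bandwidth=0.5):
            # neighbor_states: (n_agents, d) observed agent states, treated as
            # samples from a distribution over the swarm.
            # centers: (n_features, d) RBF centers defining the feature space.
            diff = neighbor_states[:, None, :] - centers[None, :, :]
            sq_dist = np.sum(diff ** 2, axis=-1)             # (n_agents, n_features)
            features = np.exp(-sq_dist / (2.0 * bandwidth ** 2))
            # Empirical mean embedding: averaging over agents makes the policy
            # input permutation invariant and independent of the swarm size.
            return features.mean(axis=0)                     # (n_features,)

        centers = np.random.uniform(-1.0, 1.0, size=(16, 2))
        print(rbf_mean_embedding(np.random.randn(10, 2), centers).shape)  # (16,)
        print(rbf_mean_embedding(np.random.randn(50, 2), centers).shape)  # (16,)

    Ten agents and fifty agents yield the same fixed-size policy input, which is exactly the property that concatenation lacks.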

    Guided Deep Reinforcement Learning for Swarm Systems

    In this paper, we investigate how to learn to control a group of cooperative agents with limited sensing capabilities, such as robot swarms. The agents have only very basic sensor capabilities, yet as a group they can accomplish sophisticated tasks, such as distributed assembly or search and rescue. Learning a policy for a group of agents is difficult due to distributed partial observability of the state. Here, we follow a guided approach where a critic has central access to the global state during learning, which simplifies the policy evaluation problem from a reinforcement learning point of view. For example, we can obtain the positions of all robots in the swarm from a camera image of the scene. This camera image is available only to the critic, not to the control policies of the robots. We follow an actor-critic approach, where the actors base their decisions only on locally sensed information; the critic, in contrast, is learned from the true global state. Our algorithm uses deep reinforcement learning to approximate both the Q-function and the policy. The performance of the algorithm is evaluated on two tasks with simple simulated 2D agents: 1) finding and maintaining a certain distance to each other and 2) locating a target. Comment: 15 pages, 8 figures, accepted at the AAMAS 2017 Autonomous Robots and Multirobot Systems (ARMS) Workshop
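
    In schematic form, the asymmetric setup can be written as follows (one common way to state such a guided objective; the paper's exact update rules may differ):

        % Critic: trained on the true global state s (e.g., from the camera image)
        \min_{\phi}\;
          \mathbb{E}\Big[\big(Q_\phi(s_t, a_t) - r_t - \gamma\, Q_\phi(s_{t+1}, a_{t+1})\big)^2\Big]

        % Actors: each agent i acts only on its local observation o_i, but the
        % policy gradient is guided by the centrally informed critic
        \max_{\theta}\;
          \mathbb{E}\big[\, Q_\phi\big(s,\, \pi_\theta(o_1), \dots, \pi_\theta(o_N)\big) \big]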

    Variational inference for policy search in changing situations

    Many policy search algorithms minimize the Kullback-Leibler (KL) divergence to a certain target distribution in order to fit their policy. The commonly used KL-divergence forces the resulting policy to be 'reward-attracted': the policy tries to reproduce all positively rewarded experience while negative experience is neglected. However, the KL-divergence is not symmetric, and we can also minimize the reversed KL-divergence, which is typically used in variational inference. The policy then becomes 'cost-averse': it tries to avoid reproducing any negatively rewarded experience while maximizing exploration. Due to this cost-averseness, Variational Inference for Policy Search (VIP) has several interesting properties. It requires neither a kernel bandwidth nor an exploration rate; such settings are determined automatically by the inference. The algorithm matches the performance of state-of-the-art methods while being applicable to learning in multiple situations simultaneously. We concentrate on using VIP for policy search in robotics. We apply our algorithm to learn dynamic counterbalancing of different kinds of pushes with human-like 2-link and 4-link robots.
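
    The two fitting directions can be made explicit (standard identities, spelled out here for clarity):

        % 'Reward-attracted' direction: the expectation is under the target p,
        % so the policy must cover all positively rewarded experience (moment matching)
        \mathrm{KL}(p \,\|\, \pi) \;=\; \mathbb{E}_{p}\big[\log p(a) - \log \pi(a)\big]

        % Reversed, 'cost-averse' direction used in variational inference: the
        % expectation is under pi itself, so the policy is penalized for visiting
        % low-probability (negatively rewarded) regions of p (mode seeking)
        \mathrm{KL}(\pi \,\|\, p) \;=\; \mathbb{E}_{\pi}\big[\log \pi(a) - \log p(a)\big]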

    Pressure Calculation in Polar and Charged Systems using Ewald Summation: Results for the Extended Simple Point Charge Model of Water

    Ewald summation and physically equivalent methods such as particle-mesh Ewald, kubic-harmonic expansions, or Lekner sums are commonly used to calculate long-range electrostatic interactions in computer simulations of polar and charged substances. We investigate the calculation of pressures in such systems. We find that the virial and thermodynamic pressures differ because of the explicit volume dependence of the effective, resummed Ewald potential. The thermodynamic pressure, obtained from the volume derivative of the Helmholtz free energy, can be expressed easily for both ionic and rigid molecular systems. For a system of rigid molecules, only the electrostatic energy and the forces at the atom positions are required, both of which are readily available in molecular dynamics codes. We then calculate the virial and thermodynamic pressures for the extended simple point charge (SPC/E) water model at standard conditions. We find that the thermodynamic pressure exhibits considerably less system-size dependence than the virial pressure. From an analysis of the cross-correlation between the virial and thermodynamic pressures, we conclude that the thermodynamic pressure should be used to drive volume fluctuations in constant-pressure simulations. Comment: RevTeX, 19 pages, 2 EPS figures; in press: Journal of Chemical Physics, 15 August 1998
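
    Schematically, the distinction is the following (textbook statistical mechanics, with the volume derivative taken at fixed scaled coordinates; the paper works out the explicit term for the resummed Ewald potential):

        % Standard virial pressure from particle positions r_i and forces f_i
        P_{\mathrm{virial}} \;=\; \frac{N k_B T}{V}
          + \frac{1}{3V}\Big\langle \sum_i \mathbf{r}_i \cdot \mathbf{f}_i \Big\rangle

        % Thermodynamic pressure from the Helmholtz free energy A; the extra term
        % stems from the explicit volume dependence of the Ewald potential
        P_{\mathrm{thermo}} \;=\; -\Big(\frac{\partial A}{\partial V}\Big)_{N,T}
          \;=\; P_{\mathrm{virial}}
          \;-\; \Big\langle \frac{\partial U_{\mathrm{Ewald}}}{\partial V}\Big|_{\text{explicit}} \Big\rangle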

    Fitted Q-iteration by advantage weighted regression

    Recently, fitted Q-iteration (FQI) based methods have become more popular due to their increased sample efficiency, a more stable learning process, and the higher quality of the resulting policy. However, these methods remain hard to use for continuous action spaces, which frequently occur in real-world tasks, e.g., in robotics and other technical applications. The greedy action selection commonly used for the policy improvement step is particularly problematic, as it is expensive for continuous actions, can cause an unstable learning process, introduces an optimization bias, and results in highly non-smooth policies unsuitable for real-world systems. In this paper, we show that by using a soft-greedy action selection, the policy improvement step used in FQI can be simplified to an inexpensive advantage-weighted regression. With this result, we derive a new, computationally efficient FQI algorithm that can even deal with high-dimensional action spaces.
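
    A minimal numpy sketch of the resulting policy improvement step, for a linear policy mean (the names, the temperature beta, and the linear parameterization are illustrative, not the paper's exact formulation):

        import numpy as np

        def awr_policy_mean(states, actions, advantages, beta=1.0):
            # Soft-greedy improvement as weighted regression: each sample is
            # weighted by exp(A/beta), so high-advantage actions dominate the
            # fit instead of an expensive greedy arg-max over continuous actions.
            w = np.exp((advantages - advantages.max()) / beta)  # stabilized weights
            sw = np.sqrt(w)[:, None]
            # Weighted least squares for a linear policy mean: a ~ s @ W
            W, *_ = np.linalg.lstsq(states * sw, actions * sw, rcond=None)
            return W

        states = np.random.randn(200, 4)
        actions = states @ np.random.randn(4, 2) + 0.1 * np.random.randn(200, 2)
        advantages = np.random.randn(200)
        print(awr_policy_mean(states, actions, advantages).shape)  # (4, 2)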

    On Uncertainty in Deep State Space Models for Model-Based Reinforcement Learning

    Improved state space models, such as Recurrent State Space Models (RSSMs), are a key factor behind recent advances in model-based reinforcement learning (RL). Yet, despite their empirical success, many of the underlying design choices are not well understood. We show that RSSMs use a suboptimal inference scheme and that models trained using this inference overestimate the aleatoric uncertainty of the ground truth system. We find that this overestimation implicitly regularizes RSSMs and allows them to succeed in model-based RL. We postulate that this implicit regularization fulfills the same functionality as explicitly modeling epistemic uncertainty, which is crucial for many other model-based RL approaches. Yet, overestimating aleatoric uncertainty can also impair performance in cases where accurately estimating it matters, e.g., when we have to deal with occlusions, missing observations, or fusing sensor modalities at different frequencies. Moreover, the implicit regularization is a side effect of the inference scheme and not the result of a rigorous, principled formulation, which renders analyzing or improving RSSMs difficult. Thus, we propose an alternative approach built on well-understood components for modeling aleatoric and epistemic uncertainty, dubbed the Variational Recurrent Kalman Network (VRKN). This approach uses Kalman updates for exact smoothing inference in a latent space and Monte Carlo Dropout to model epistemic uncertainty. Due to the Kalman updates, the VRKN can naturally handle missing observations or sensor fusion problems with varying numbers of observations per time step. Our experiments show that using the VRKN instead of the RSSM improves performance in tasks where appropriately capturing aleatoric uncertainty is crucial, while matching it on the standard deterministic benchmarks.
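
    For reference, the exact latent-space measurement update such a model builds on is the standard Kalman step; a minimal numpy sketch (the generic update only; the full VRKN adds Monte Carlo Dropout and learned networks around it):

        import numpy as np

        def kalman_measurement_update(mu, Sigma, y, H, R):
            # Prior latent belief z ~ N(mu, Sigma); observation model y ~ N(Hz, R).
            if y is None:
                # Missing observation at this time step: the prior simply passes
                # through, which is how such gaps are handled naturally.
                return mu, Sigma
            S = H @ Sigma @ H.T + R                        # innovation covariance
            K = Sigma @ H.T @ np.linalg.inv(S)             # Kalman gain
            mu_post = mu + K @ (y - H @ mu)
            Sigma_post = (np.eye(len(mu)) - K @ H) @ Sigma
            return mu_post, Sigma_post

        mu, Sigma = np.zeros(3), np.eye(3)
        H, R = np.eye(2, 3), 0.1 * np.eye(2)
        print(kalman_measurement_update(mu, Sigma, np.array([0.5, -0.2]), H, R)[0])
        print(kalman_measurement_update(mu, Sigma, None, H, R)[0])  # prior unchanged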

    "The numerical accuracy of truncated Ewald sums for periodic systems with long-range Coulomb interactions"

    Ewald summation is widely used to calculate electrostatic interactions in computer simulations of condensed-matter systems. We present an analysis of the errors arising from truncating the infinite real- and Fourier-space lattice sums in the Ewald formulation. We derive an optimal choice for the Fourier-space cutoff given a screening parameter η. We find that the number of vectors in Fourier space required to achieve a given accuracy scales with η³. The proposed method can be used to determine computationally efficient parameters for Ewald sums, to assess the quality of Ewald-sum implementations, and to compare different implementations. Comment: 6 pages, 3 figures (Encapsulated PostScript), LaTeX
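
    The analysis rests on the standard Ewald decomposition of the Coulomb kernel by the screening parameter η:

        \frac{1}{r} \;=\;
          \underbrace{\frac{\operatorname{erfc}(\eta r)}{r}}_{\text{short ranged: real-space sum, cutoff } r_c}
          \;+\;
          \underbrace{\frac{\operatorname{erf}(\eta r)}{r}}_{\text{smooth: Fourier-space sum, cutoff } k_c}

    Increasing η shortens the real-space part but broadens the reciprocal-space one, which is why the number of required Fourier vectors grows with η³.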

    Formation of Polymorphic Cluster Phases for Purely Repulsive Soft Spheres

    We present results from density functional theory and computer simulations that unambiguously predict the occurrence of first-order freezing transitions of a large class of ultrasoft model systems into cluster crystals. The clusters consist of fully overlapping particles and arise without the existence of attractive forces. The number of particles participating in a cluster scales linearly with density; the crystals therefore feature density-independent lattice constants. Clustering is accompanied by polymorphic bcc-fcc transitions, with fcc being the stable phase at high densities. Comment: 4 pages, 5 figures, submitted to Phys. Rev. Lett.
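
    The density-independent lattice constant follows in one line from the linear scaling of the cluster occupancy n_c (spelled out here for clarity):

        % If n_c = alpha * rho particles sit on each lattice site, the density
        % of sites is constant, and so is the lattice constant a
        \rho_{\mathrm{site}} \;=\; \frac{\rho}{n_c} \;=\; \frac{1}{\alpha}
        \qquad\Rightarrow\qquad
        a \;\propto\; \rho_{\mathrm{site}}^{-1/3} \;=\; \alpha^{1/3} \;=\; \text{const.}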

    MV6D: Multi-View 6D Pose Estimation on RGB-D Frames Using a Deep Point-wise Voting Network

    Estimating the 6D poses of objects is an essential computer vision task. However, most conventional approaches rely on camera data from a single perspective and therefore suffer from occlusions. We overcome this issue with our novel multi-view 6D pose estimation method called MV6D, which accurately predicts the 6D poses of all objects in a cluttered scene based on RGB-D images from multiple perspectives. We base our approach on the PVN3D network, which uses a single RGB-D image to predict keypoints of the target objects. We extend this approach by using a combined point cloud from multiple views and fusing the images from each view with a DenseFusion layer. In contrast to current multi-view pose detection networks such as CosyPose, our MV6D learns the fusion of multiple perspectives in an end-to-end manner and requires neither multiple prediction stages nor subsequent fine-tuning of the prediction. Furthermore, we present three novel photorealistic datasets of cluttered scenes with heavy occlusions. All of them contain RGB-D images from multiple perspectives as well as ground truth for instance segmentation and 6D pose estimation. MV6D significantly outperforms the state of the art in multi-view 6D pose estimation, even in cases where the camera poses are only inaccurately known. Furthermore, we show that our approach is robust to dynamic camera setups and that its accuracy increases with the number of perspectives. Comment: Accepted at IROS 2022
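
    A minimal numpy sketch of the geometric half of this fusion, merging per-view point clouds into one cloud given the camera poses (illustrative only; MV6D additionally fuses the per-view RGB features with a DenseFusion layer):

        import numpy as np

        def fuse_point_clouds(clouds, cam_poses):
            # clouds: list of (n_i, 3) point arrays, each in its camera's frame.
            # cam_poses: list of (4, 4) camera-to-world transforms.
            fused = []
            for pts, T in zip(clouds, cam_poses):
                homog = np.hstack([pts, np.ones((len(pts), 1))])  # homogeneous coords
                fused.append((homog @ T.T)[:, :3])                # into world frame
            # One combined cloud in a common frame for the voting network.
            return np.vstack(fused)

        clouds = [np.random.randn(100, 3), np.random.randn(80, 3)]
        poses = [np.eye(4), np.eye(4)]
        print(fuse_point_clouds(clouds, poses).shape)  # (180, 3)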