
4 research outputs found

Deep Reinforcement Learning for Swarm Systems

Author: Sosic Adrian
Hüttenrauch Maximilian
Neumann Gerhard
Publication venue: Journal of Machine Learning Research
Publication date: 28/02/2019

Recently, deep reinforcement learning (RL) methods have been applied successfully to multi-agent scenarios. Typically, the observation vector for decentralized decision making is represented by a concatenation of the (local) information an agent gathers about other agents. However, concatenation scales poorly to swarm systems with a large number of homogeneous agents as it does not exploit the fundamental properties inherent to these systems: (i) the agents in the swarm are interchangeable and (ii) the exact number of agents in the swarm is irrelevant. Therefore, we propose a new state representation for deep multi-agent RL based on mean embeddings of distributions, where we treat the agents as samples and use the empirical mean embedding as input for a decentralized policy. We define different feature spaces of the mean embedding using histograms, radial basis functions and neural networks trained end-to-end. We evaluate the representation on two well-known problems from the swarm literature in a globally and locally observable setup. For the local setup we furthermore introduce simple communication protocols. Of all approaches, the mean embedding representation using neural network features enables the richest information exchange between neighboring agents, facilitating the development of complex collective strategies.
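
As an illustration of the mean-embedding idea described in the abstract above, the following is a minimal sketch, not the authors' implementation. It assumes 2-D neighbor observations and a fixed grid of radial basis function centers; the names `rbf_features`, `mean_embedding`, `centers`, and `bandwidth` are hypothetical. Averaging the per-neighbor feature vectors yields an input whose length does not depend on the number of neighbors and whose value does not depend on their order.

```python
import math

def rbf_features(obs, centers, bandwidth):
    """Map one 2-D neighbor observation to a vector of RBF activations."""
    return [
        math.exp(-((obs[0] - c[0]) ** 2 + (obs[1] - c[1]) ** 2)
                 / (2.0 * bandwidth ** 2))
        for c in centers
    ]

def mean_embedding(neighbor_obs, centers, bandwidth=1.0):
    """Average the feature vectors of all observed neighbors."""
    if not neighbor_obs:
        # Assumption for this sketch: an all-zero vector when nothing is observed.
        return [0.0] * len(centers)
    feats = [rbf_features(o, centers, bandwidth) for o in neighbor_obs]
    n = len(feats)
    # Column-wise mean over neighbors gives one fixed-length vector.
    return [sum(col) / n for col in zip(*feats)]

# Hypothetical 3x3 grid of RBF centers and three observed neighbors.
centers = [(x, y) for x in (-1.0, 0.0, 1.0) for y in (-1.0, 0.0, 1.0)]
neighbors = [(0.5, -0.2), (-0.8, 0.9), (0.1, 0.1)]

emb = mean_embedding(neighbors, centers)
emb_permuted = mean_embedding(list(reversed(neighbors)), centers)
emb_doubled = mean_embedding(neighbors * 2, centers)
```

Here `emb`, `emb_permuted`, and `emb_doubled` all have nine entries and agree up to floating-point rounding, which reflects properties (i) and (ii) named in the abstract: the agents are interchangeable and their exact number is irrelevant.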

University of Lincoln Institutional Repository

KITopen

Exploiting Local Communication Protocols for Learning Complex Swarm Behaviors with Deep Reinforcement Learning

Author: Huttenrauch Max
Neumann Gerhard
Sosic Adrian
Publication venue: Springer International Publishing
Publication date: 01/01/2018

Swarm systems constitute a challenging problem for reinforcement learning (RL) as the algorithm needs to learn decentralized control policies that can cope with limited local sensing and communication abilities of the agents. While it is often difficult to directly define the behavior of the agents, simple communication protocols can be defined more easily using prior knowledge about the given task. In this paper, we propose a number of simple communication protocols that can be exploited by deep reinforcement learning to find decentralized control policies in a multi-robot swarm environment. The protocols are based on histograms that encode the local neighborhood relations of the agents and can also transmit task-specific information, such as the shortest distance and direction to a desired target. In our framework, we use an adaptation of Trust Region Policy Optimization to learn complex collaborative tasks, such as formation building and building a communication link. We evaluate our findings in a simulated 2D-physics environment, and compare the implications of different communication protocols.
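
To make the histogram-based encoding mentioned in the abstract above more concrete, here is a minimal sketch under stated assumptions, not the protocol from the paper. It assumes each agent knows the relative 2-D positions of neighbors within a sensing range and counts them in bins over distance and bearing; the function name `neighborhood_histogram` and the parameters `max_range`, `n_dist_bins`, and `n_bearing_bins` are hypothetical.

```python
import math

def neighborhood_histogram(agent_pos, agent_heading, neighbor_positions,
                           max_range=2.0, n_dist_bins=3, n_bearing_bins=4):
    """Count neighbors in a (distance, bearing) grid relative to one agent."""
    hist = [[0] * n_bearing_bins for _ in range(n_dist_bins)]
    for nx, ny in neighbor_positions:
        dx, dy = nx - agent_pos[0], ny - agent_pos[1]
        dist = math.hypot(dx, dy)
        if dist >= max_range:
            continue  # outside the assumed sensing range
        # Bearing measured relative to the agent's heading, wrapped to [0, 2*pi).
        bearing = (math.atan2(dy, dx) - agent_heading) % (2.0 * math.pi)
        d_bin = min(int(dist / max_range * n_dist_bins), n_dist_bins - 1)
        b_bin = min(int(bearing / (2.0 * math.pi) * n_bearing_bins),
                    n_bearing_bins - 1)
        hist[d_bin][b_bin] += 1
    return hist

# Hypothetical scene: one agent at the origin and four other agents,
# the last of which lies beyond the sensing range.
others = [(0.5, 0.1), (-0.2, 1.0), (-1.5, -0.5), (5.0, 5.0)]
hist = neighborhood_histogram((0.0, 0.0), 0.0, others)
total_seen = sum(sum(row) for row in hist)
```

The resulting grid has a fixed shape regardless of how many neighbors are in range, so it can be flattened and passed to a policy network as a fixed-length observation.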

University of Lincoln Institutional Repository

sNN-LDS: Spatio-temporal Non-negative Sparse Coding for Human Action Recognition

Author: Eggert Julian
Guthier Thomas
Sosic Adrian
Willert Volker
Publication date: 01/01/2014

TUbiblio

Crossref

Finding a Tradeoff between Compression and Loss in Motion Compensated Video Coding

Author: Eggert Julian
Guthier Thomas
Sosic Adrian
Willert Volker
Publication date: 01/01/2012

In video coding, affine motion models combined with a quadtree decomposition have often been suggested as an extension to the commonly used translational models combined with a blockwise decomposition. What is missing so far is a thorough analysis to judge the tradeoff between using more complex motion models or more elaborate decomposition methods in terms of data compression and information loss. In this paper, we compare different polynomial motion models with a quadtree decomposition concerning motion model complexity and granularity of decomposition. We provide a statistical evaluation based on optical flow databases to quantitatively find a tradeoff between bitrate and reconstruction error.

TUbiblio