
    Perspective Taking in Deep Reinforcement Learning Agents

    Perspective taking is the ability to take the point of view of another agent. This skill is not unique to humans, as it is also displayed by other animals such as chimpanzees. It is an essential ability for social interactions, including efficient cooperation, competition, and communication. Here we present our progress toward building artificial agents with such abilities. We implemented a perspective taking task inspired by experiments done with chimpanzees. We show that agents controlled by artificial neural networks can learn via reinforcement learning to pass simple tests that require perspective taking capabilities. We studied whether this ability is more readily learned by agents with information encoded in allocentric or egocentric form, for both their visual perception and motor actions. We believe that, in the long run, building better artificial agents with perspective taking ability can help us develop artificial intelligence that is more human-like and easier to communicate with.
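    The abstract contrasts allocentric (world-frame) and egocentric (agent-frame) encodings of observations. The sketch below is only an illustration of that distinction, not code from the paper; the helper `to_egocentric` and the grid-world coordinates are hypothetical.

```python
import numpy as np

def to_egocentric(landmark_xy, agent_xy, agent_heading):
    """Hypothetical helper: re-express an allocentric (world-frame) position
    relative to the agent's own position and heading (egocentric frame)."""
    # Translate so the agent sits at the origin.
    rel = np.asarray(landmark_xy, dtype=float) - np.asarray(agent_xy, dtype=float)
    # Rotate by -heading so the agent's facing direction becomes the +x axis.
    c, s = np.cos(-agent_heading), np.sin(-agent_heading)
    rot = np.array([[c, -s], [s, c]])
    return rot @ rel

# Allocentric observation: world coordinates of a food item and of the agent.
# Egocentric observation: the same food item expressed in the agent's frame.
food_ego = to_egocentric(landmark_xy=(3.0, 1.0), agent_xy=(1.0, 1.0), agent_heading=np.pi / 2)
print(food_ego)  # [0., -2.]: two units to the agent's right when it faces "north"
```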

    Do Deep Reinforcement Learning Agents Model Intentions?

    Inferring other agents’ mental states, such as their knowledge, beliefs and intentions, is thought to be essential for effective interactions with other agents. Recently, multi-agent systems trained via deep reinforcement learning have been shown to succeed in solving various tasks. Still, how each agent models or represents other agents in its environment remains unclear. In this work, we test whether deep reinforcement learning agents trained with the multi-agent deep deterministic policy gradient (MADDPG) algorithm explicitly represent other agents’ intentions (their specific aims or plans) during a task in which the agents have to coordinate the covering of different spots in a 2D environment. In particular, we tracked over time the performance of a linear decoder trained to predict the final targets of all agents from the hidden-layer activations of each agent’s neural network controller. We observed that the hidden layers of agents represented explicit information about other agents’ intentions, i.e., the target landmark the other agent ended up covering. We also performed a series of experiments in which some agents were replaced by others with fixed targets to test the levels of generalization of the trained agents. We noticed that during the training phase, the agents developed a preference for each landmark, which hindered generalization. To alleviate the above problem, we evaluated simple changes to the MADDPG training algorithm that led to better generalization against unseen agents. Our method for confirming intention modeling in deep learning agents is simple to implement and can be used to improve the generalization of multi-agent systems in fields such as robotics, autonomous vehicles, and smart cities.
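    The core measurement here is a linear readout from hidden-layer activations to the other agent's final target landmark. A minimal sketch of that idea is given below, assuming scikit-learn and placeholder activations/labels; the array shapes, landmark count, and random data are assumptions for illustration, not the authors' pipeline.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Assumed shapes: hidden-layer activations of one agent's policy network,
# collected over many timesteps, paired with the landmark index that the
# *other* agent eventually covered in the corresponding episode.
hidden_acts = np.random.randn(5000, 64)              # (samples, hidden units) -- placeholder data
other_agent_target = np.random.randint(0, 3, 5000)   # landmark id in {0, 1, 2} -- placeholder labels

X_train, X_test, y_train, y_test = train_test_split(
    hidden_acts, other_agent_target, test_size=0.2, random_state=0)

# A purely linear decoder: if it predicts the other agent's final target well
# above chance, the hidden layer carries explicit information about that intention.
decoder = LogisticRegression(max_iter=1000)
decoder.fit(X_train, y_train)
print("decoding accuracy:", decoder.score(X_test, y_test))
```

    Tracking this decoding accuracy over the course of training, rather than at a single point, is what lets one ask when such intention information emerges.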

    Efficient neural decoding of self-location with a deep recurrent network.

    Place cells in the mammalian hippocampus signal self-location with sparse, spatially stable firing fields. Based on observations of place cell activity, it is possible to accurately decode an animal's location. The precision of this decoding sets a lower bound on the amount of information that the hippocampal population conveys about the location of the animal. In this work, we use a novel recurrent neural network (RNN) decoder to infer the location of freely moving rats from single-unit hippocampal recordings. RNNs are biologically plausible models of neural circuits that learn to incorporate relevant temporal context without the need to make complicated assumptions about the use of prior information to predict the current state. When decoding animal position from spike counts in 1D and 2D environments, we show that the RNN consistently outperforms a standard Bayesian approach with either flat priors or with memory. In addition, we conducted a series of sensitivity analyses on the RNN decoder to determine which neurons and sections of firing fields were the most influential. We found that applying RNNs to neural data allowed flexible integration of temporal context, yielded improved accuracy relative to the more commonly used Bayesian approaches, and opens new avenues for exploration of the neural code.
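    For readers unfamiliar with this style of decoder, the following is a minimal PyTorch sketch of the general idea (spike-count windows in, 2D position out). The architecture, layer sizes, and random placeholder tensors are assumptions for illustration; the paper's actual model and hyperparameters may differ.

```python
import torch
import torch.nn as nn

class SpikeRNNDecoder(nn.Module):
    """Minimal recurrent decoder: windows of spike counts -> 2D position estimate."""
    def __init__(self, n_cells, hidden_size=128):
        super().__init__()
        self.rnn = nn.LSTM(input_size=n_cells, hidden_size=hidden_size, batch_first=True)
        self.readout = nn.Linear(hidden_size, 2)  # (x, y) coordinates

    def forward(self, spikes):
        # spikes: (batch, time bins, n_cells); the LSTM integrates temporal context
        # instead of relying on an explicit prior over trajectories.
        out, _ = self.rnn(spikes)
        return self.readout(out[:, -1])  # position at the final time bin

# Placeholder data: 100 windows of 20 time bins from 50 simultaneously recorded cells.
spikes = torch.randn(100, 20, 50)
positions = torch.randn(100, 2)

model = SpikeRNNDecoder(n_cells=50)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()
for _ in range(5):  # a few illustrative gradient steps
    optimizer.zero_grad()
    loss = loss_fn(model(spikes), positions)
    loss.backward()
    optimizer.step()
```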

    Progression of behavioral statistics when passing from a competitive to a collaborative rewarding scheme.

    Each blue dot corresponds to the average of one game. Red line depicts the average across games (also given in S2 Table, http://www.plosone.org/article/info:doi/10.1371/journal.pone.0172395#pone.0172395.s009). (a) The game lasts longer when the agents have a strong incentive to collaborate. (b) Forcing the agents to collaborate decreases the proportion of angled shots that bounce off the walls before reaching the opposite player. Notice the two aberrant values for ρ = −0.75 correspond to games where the agents never reach the collaborative strategy of keeping the ball alive by passing it horizontally. (c) Serving time decreases when agents receive stronger positive rewards for scoring.

    Cooperative game—game situations and the Q-values predicted by the agents.

    A) The ball is moving slowly and the future reward expectation is not very low—the agents do not expect to miss the slow balls. B) The ball is moving faster and the reward expectation is much more negative—the agents expect to miss the ball in the near future. C) The ball is inevitably going out of play. Both agents’ reward expectations drop accordingly. See supporting information for videos illustrating other game situations and the corresponding agents’ Q-values.