19 research outputs found

    Perspective Taking in Deep Reinforcement Learning Agents

    Perspective taking is the ability to take the point of view of another agent. This skill is not unique to humans, as it is also displayed by other animals such as chimpanzees. It is an essential ability for social interactions, including efficient cooperation, competition, and communication. Here we present our progress toward building artificial agents with such abilities. We implemented a perspective taking task inspired by experiments done with chimpanzees. We show that agents controlled by artificial neural networks can learn via reinforcement learning to pass simple tests that require perspective taking capabilities. We studied whether this ability is more readily learned by agents whose visual perception and motor actions are encoded in allocentric or egocentric form. We believe that, in the long run, building better artificial agents with perspective taking ability can help us develop artificial intelligence that is more human-like and easier to communicate with.
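
    The allocentric-versus-egocentric distinction studied here can be made concrete with a toy grid world. The sketch below is an illustrative assumption, not the paper's code: allocentric_view returns the world in fixed map coordinates, while egocentric_view re-centres and rotates the map into the agent's own frame; the grid layout, heading convention, and function names are made up for illustration.

    import numpy as np

    def allocentric_view(grid):
        """World-fixed frame: the observation is the full grid as seen from above."""
        return grid.copy()

    def egocentric_view(grid, agent_pos, heading):
        """Agent-centred frame: shift the map so the agent sits in the centre and
        rotate it so the agent's heading always points 'up'."""
        h, w = grid.shape
        centred = np.roll(grid, (h // 2 - agent_pos[0], w // 2 - agent_pos[1]), axis=(0, 1))
        # heading: 0 = up, 1 = right, 2 = down, 3 = left
        return np.rot90(centred, k=heading)

    # Toy example: a 5x5 world with a single 'food' cell (value 1).
    world = np.zeros((5, 5), dtype=int)
    world[1, 3] = 1
    print(allocentric_view(world))                              # identical for every agent pose
    print(egocentric_view(world, agent_pos=(4, 2), heading=1))  # depends on where the agent is and where it faces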

    Release, in criminal proceedings, of property presumed to be the proceeds of a crime to its lawful possessor (original title: Kuriteoga saaduks peetava vara kriminaalmenetluslik väljaandmine seaduslikule valdajale)

    No full text
    https://www.ester.ee/record=b3524346*es

    Efficient neural decoding of self-location with a deep recurrent network.

    No full text
    Place cells in the mammalian hippocampus signal self-location with sparse, spatially stable firing fields. From observations of place cell activity it is possible to accurately decode an animal's location, and the precision of this decoding sets a lower bound on the amount of information that the hippocampal population conveys about the location of the animal. In this work we use a novel recurrent neural network (RNN) decoder to infer the location of freely moving rats from single-unit hippocampal recordings. RNNs are biologically plausible models of neural circuits that learn to incorporate relevant temporal context without complicated assumptions about how prior information is used to predict the current state. When decoding animal position from spike counts in 1D and 2D environments, we show that the RNN consistently outperforms a standard Bayesian approach with either flat priors or with memory. In addition, we conducted a set of sensitivity analyses on the RNN decoder to determine which neurons and which sections of the firing fields were most influential. We found that applying RNNs to neural data allows flexible integration of temporal context, yields improved accuracy relative to the more commonly used Bayesian approaches, and opens new avenues for exploration of the neural code.
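
    For context, the flat-prior Bayesian baseline that the RNN is compared against can be summarised in a few lines: under independent Poisson spiking, the most probable position maximises the log-likelihood of the observed spike counts given each cell's firing-rate map. The sketch below is a generic illustration of that standard approach, not the authors' implementation; the rate maps, bin width, and data are invented for the example.

    import numpy as np

    def bayes_decode(spike_counts, rate_maps, dt):
        """Maximum a posteriori position under a flat prior and independent Poisson spiking.
        spike_counts: (n_cells,) spike counts in one time bin
        rate_maps:    (n_cells, n_positions) expected firing rates f_i(x)
        dt:           bin width in seconds"""
        # log P(x | n) = sum_i n_i * log f_i(x) - dt * sum_i f_i(x) + const
        log_post = spike_counts @ np.log(rate_maps + 1e-12) - dt * rate_maps.sum(axis=0)
        return int(np.argmax(log_post))

    rng = np.random.default_rng(0)
    rate_maps = rng.uniform(0.1, 20.0, size=(3, 10))   # 3 cells, 10 linearised positions
    counts = rng.poisson(rate_maps[:, 4] * 0.2)        # spikes simulated at position 4
    print(bayes_decode(counts, rate_maps, dt=0.2))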

    Cooperative game—game situations and the Q-values predicted by the agents.

    No full text
    A) The ball is moving slowly and the future reward expectation is not very low: the agents do not expect to miss the slow balls. B) The ball is moving faster and the reward expectation is much more negative: the agents expect to miss the ball in the near future. C) The ball is inevitably going out of play. Both agents’ reward expectations drop accordingly. See the supporting information for videos illustrating other game situations and the corresponding agents’ Q-values.

    Evolution of the behavior of the collaborative agents during training.

    No full text
    (a) The number of paddle-bounces increases as the players get better at reaching the ball. (b) The frequency of the ball hitting the upper and lower walls decreases significantly with training. The first 10 epochs are omitted from the plot as very few paddle-bounces were made by the agents and the metric was very noisy. (c) Serving takes a long time: the agents learn to postpone putting the ball into play.
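
    One way such behavioural statistics could be extracted from a game log is sketched below. The per-frame event format, the event names, and the frame rate are assumptions made for illustration; this is not the paper's analysis code.

    from collections import Counter

    def behavioural_stats(events, fps=60):
        """events: one string per frame, e.g. 'paddle_bounce', 'wall_bounce',
        'serve', or '' for an uneventful frame."""
        counts = Counter(events)
        first_serve = events.index("serve") if "serve" in events else len(events)
        return {
            "paddle_bounces": counts["paddle_bounce"],
            "wall_bounces": counts["wall_bounce"],
            "serving_time_s": first_serve / fps,   # how long the agents postpone the serve
        }

    print(behavioural_stats(["", "", "serve", "paddle_bounce", "wall_bounce", "paddle_bounce"]))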

    Progression of behavioral statistics when passing from a competitive to a collaborative rewarding scheme.

    No full text
    Each blue dot corresponds to the average of one game. The red line depicts the average across games (also given in S2 Table, http://www.plosone.org/article/info:doi/10.1371/journal.pone.0172395#pone.0172395.s009). (a) The game lasts longer when the agents have a strong incentive to collaborate. (b) Forcing the agents to collaborate decreases the proportion of angled shots that bounce off the walls before reaching the opposite player. Notice that the two aberrant values for ρ = −0.75 correspond to games where the agents never reach the collaborative strategy of keeping the ball alive by passing it horizontally. (c) Serving time decreases when agents receive stronger positive rewards for scoring.
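
    One plausible reading of the rewarding scheme parametrised by ρ, consistent with negative values such as ρ = −0.75 marking collaborative settings, is that the player who lets the ball pass always receives −1 while the other player receives ρ, so ρ = +1 is fully competitive and ρ = −1 fully collaborative. The sketch below is a hedged reconstruction under that assumption, not the published code.

    def pong_rewards(missed_by, rho):
        """Rewards handed out when the ball goes out of play.
        missed_by: 'left' or 'right', the player that let the ball pass.
        Returns (reward_left, reward_right)."""
        if missed_by == "left":
            return -1.0, rho
        return rho, -1.0

    print(pong_rewards("left", rho=1.0))    # fully competitive: (-1.0, 1.0)
    print(pong_rewards("left", rho=-1.0))   # fully collaborative: (-1.0, -1.0)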

    Results of games between multiplayer DQN, single-player DQN and four hand-coded algorithms.

    No full text
    The values correspond to an average over 10 games with different random seeds. The score difference is the points scored by the agent mentioned first minus the points scored by the agent mentioned second. (a) Performance of the Multi and Single DQN agents against each other and against the hand-coded agent with N = 4, as a function of training time. (b) Scores of the Single DQN and Multi DQN agents against 4 versions of a hand-coded agent that tries to keep the center of its paddle level with the ball. N refers to the number of frames a selected action is repeated by the algorithm before a new action is selected (the smaller N, the better the hand-coded agent).
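
    The hand-coded baseline described in (b) amounts to a simple tracking policy. The sketch below is an illustrative reconstruction, not the published code; the class name, action labels, and coordinate convention are assumptions. It moves the paddle centre towards the ball's vertical position and only re-decides every N frames, repeating the previous action in between, so smaller N reacts faster.

    class HandCodedAgent:
        """Keeps the centre of its paddle level with the ball, re-deciding only
        every N frames and repeating the chosen action in between."""
        def __init__(self, N):
            self.N = N
            self.action = "stay"

        def act(self, frame_idx, paddle_center_y, ball_y):
            if frame_idx % self.N == 0:          # a new action may be selected
                if ball_y < paddle_center_y:
                    self.action = "up"
                elif ball_y > paddle_center_y:
                    self.action = "down"
                else:
                    self.action = "stay"
            return self.action                   # otherwise repeat the previous action

    # Toy usage: with N = 4 the agent re-evaluates only every 4th frame.
    agent = HandCodedAgent(N=4)
    for t in range(8):
        print(t, agent.act(t, paddle_center_y=40, ball_y=25))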