
    Direction Opponency, Not Quadrature, Is Key to the 1/4 Cycle Preference for Apparent Motion in the Motion Energy Model

    Get PDF
    Sensitivity to visual motion is a fundamental property of neurons in the visual cortex and has received wide attention in terms of mathematical models. A key feature of many popular models for cortical motion sensors is the use of pairs of functions that are related by a 90° phase shift. This phase relationship, known as quadrature, is the hallmark of the motion energy model and played an important role in the development of a class of models dubbed elaborated Reichardt detectors. For decades, the literature has supported a link between quadrature and the observation that motion detectors and human observers often prefer a 1/4 cycle displacement of an apparent motion stimulus that consists of a pair of sinusoidal gratings. We show that there is essentially no link between quadrature and this preference. Quadrature is neither necessary nor sufficient for a motion sensor to prefer a 1/4 cycle displacement, and motion energy is not maximized for a 1/4 cycle step. Other properties of motion sensors are the key: the opponent subtraction of two oppositely tuned stages that individually have sinusoidal displacement tuning curves. Thus, psychophysical and neurophysiological data revealing a preference at or near a 1/4 cycle displacement do not offer specific support for common quadrature or energy-based motion models. Instead, they point to a broader class of models.
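
    As a quick illustration of the paper's central claim (a minimal numeric sketch, not the authors' implementation), the opponent subtraction of two stages with assumed sinusoidal displacement tuning curves peaks at a 1/4 cycle step:

        # Hypothetical tuning curves; the wavelength and response offsets are assumptions.
        import numpy as np

        wavelength = 1.0                                  # grating period, in cycles
        steps = np.linspace(0.0, wavelength, 401)         # apparent-motion displacements

        rightward = 1.0 + np.sin(2 * np.pi * steps / wavelength)   # stage preferring rightward steps
        leftward  = 1.0 - np.sin(2 * np.pi * steps / wavelength)   # oppositely tuned stage

        opponent = rightward - leftward                   # direction-opponent output
        print(f"peak at {steps[np.argmax(opponent)]:.3f} cycles")  # ~0.25, i.e. a 1/4 cycle step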

    Reinforcement Learning Agents acquire Flocking and Symbiotic Behaviour in Simulated Ecosystems

    Get PDF
    In nature, group behaviours such as flocking, as well as cross-species symbiotic partnerships, are observed in vastly different forms and circumstances. We hypothesize that such strategies can arise in response to generic predator-prey pressures in a spatial environment with range-limited sensation and action. We evaluate whether these forms of coordination can emerge through independent multi-agent reinforcement learning in simple multi-species ecosystems. In contrast to prior work, we avoid hand-crafted shaping rewards, specific actions, or dynamics that would directly encourage coordination across agents. Instead, we test whether coordination emerges as a consequence of adaptation alone, even though these specific forms of coordination confer only an indirect benefit. Our simulated ecosystems consist of a generic food chain involving three trophic levels: apex predator, mid-level predator, and prey. We conduct experiments on two different platforms: a 3D physics engine with tens of agents and a 2D grid world with up to thousands. The results clearly confirm our hypothesis and show substantial coordination both within and across species. To obtain these results, we leverage and adapt recent advances in deep reinforcement learning within an ecosystem training protocol featuring homogeneous groups of independent agents from different species (sets of policies), acting in many different random combinations in parallel habitats. The policies utilize neural network architectures that are invariant to agent identity but not to type (species) and that generalize across varying numbers of observed other agents. While the emergence of complexity in artificial ecosystems has long been studied in the artificial life community, the focus has been more on individual complexity, genetic algorithms, and explicit modelling, and less on the group complexity and reinforcement learning emphasized in this article. Unlike what the name and intuition suggest, reinforcement learning here adapts over evolutionary history rather than a lifetime, addressing the sequential optimization of fitness that the artificial life community usually approaches with genetic algorithms. We utilize a shift from procedures to objectives, allowing us to bring powerful new machinery to bear, and we see the emergence of complex behaviour from a sequence of simple optimization problems.
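
    The architecture described above (invariant to agent identity but not to type, and able to handle varying numbers of observed agents) can be sketched with a per-species encoder and a pooling step; the layer sizes and mean pooling below are illustrative assumptions, not the paper's network:

        import torch
        import torch.nn as nn

        class TypePooledEncoder(nn.Module):
            """Pools observed agents per species so the policy input ignores individual identity."""
            def __init__(self, feat_dim=4, hidden=32, n_types=3):
                super().__init__()
                self.per_type = nn.ModuleList(
                    nn.Sequential(nn.Linear(feat_dim, hidden), nn.ReLU()) for _ in range(n_types)
                )

            def forward(self, obs_by_type):
                # obs_by_type: one tensor per species, each (num_seen, feat_dim); num_seen may vary.
                pooled = []
                for net, obs in zip(self.per_type, obs_by_type):
                    if obs.shape[0] == 0:
                        pooled.append(torch.zeros(net[0].out_features))
                    else:
                        pooled.append(net(obs).mean(dim=0))   # order- and identity-invariant
                return torch.cat(pooled)                      # fixed-size input for the policy head

        encoder = TypePooledEncoder()
        obs = [torch.randn(5, 4), torch.randn(0, 4), torch.randn(2, 4)]   # e.g. 5 prey, 0 apex, 2 mid-level seen
        print(encoder(obs).shape)                             # torch.Size([96])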

    Learning to Communicate: A Machine Learning Framework for Heterogeneous Multi-Agent Robotic Systems

    Full text link
    We present a machine learning framework for multi-agent systems to learn both the optimal policy for maximizing the rewards and the encoding of the high-dimensional visual observation. The encoding is useful for sharing local visual observations with other agents under communication resource constraints. The actor-encoder encodes the raw images and chooses an action based on local observations and messages sent by the other agents. The machine learning agent generates not only an actuator command to the physical device, but also a communication message to the other agents. We formulate a reinforcement learning problem, which extends the action space to consider the communication action as well. The feasibility of the reinforcement learning framework is demonstrated using a 3D simulation environment with two collaborating agents. The environment provides realistic visual observations to be used and shared between the two agents. Comment: AIAA SciTech 201
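
    A minimal sketch of the extended action space the abstract describes, in which the actor-encoder emits both an actuator command and a message; the shapes, layer sizes, and message length are illustrative assumptions:

        import torch
        import torch.nn as nn

        class ActorEncoder(nn.Module):
            def __init__(self, img_channels=3, msg_dim=8, act_dim=2):
                super().__init__()
                self.encoder = nn.Sequential(                         # compress raw pixels
                    nn.Conv2d(img_channels, 16, 5, stride=2), nn.ReLU(),
                    nn.Conv2d(16, 32, 5, stride=2), nn.ReLU(),
                    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                )
                self.policy = nn.Linear(32 + msg_dim, act_dim)        # actuator command
                self.message = nn.Linear(32, msg_dim)                 # message to the other agent

            def forward(self, image, incoming_msg):
                z = self.encoder(image)                               # local visual encoding
                action = self.policy(torch.cat([z, incoming_msg], dim=-1))
                outgoing_msg = self.message(z)                        # shared under bandwidth constraints
                return action, outgoing_msg

        agent = ActorEncoder()
        action, msg = agent(torch.randn(1, 3, 64, 64), torch.randn(1, 8))
        print(action.shape, msg.shape)                                # (1, 2) actuator, (1, 8) message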

    Game Plan: What AI can do for Football, and What Football can do for AI

    Get PDF
    The rapid progress in artificial intelligence (AI) and machine learning has opened unprecedented analytics possibilities in various team and individual sports, including baseball, basketball, and tennis. More recently, AI techniques have been applied to football, due to a huge increase in data collection by professional teams, increased computational power, and advances in machine learning, with the goal of better addressing new scientific challenges involved in the analysis of both individual players’ and coordinated teams’ behaviors. The research challenges associated with predictive and prescriptive football analytics require new developments and progress at the intersection of statistical learning, game theory, and computer vision. In this paper, we provide an overarching perspective highlighting how the combination of these fields, in particular, forms a unique microcosm for AI research, while offering mutual benefits for professional teams, spectators, and broadcasters in the years to come. We illustrate that this duality makes football analytics a game changer of tremendous value, in terms of not only changing the game of football itself, but also in terms of what this domain can mean for the field of AI. We review the state-of-the-art and exemplify the types of analysis enabled by combining the aforementioned fields, including illustrative examples of counterfactual analysis using predictive models, and the combination of game-theoretic analysis of penalty kicks with statistical learning of player attributes. We conclude by highlighting envisioned downstream impacts, including possibilities for extensions to other sports (real and virtual).
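
    As a toy example of the game-theoretic penalty-kick analysis mentioned above (illustrative numbers only, not data or results from the paper), a 2x2 zero-sum game between kicker and goalkeeper can be solved for the kicker's equilibrium mixture:

        import numpy as np

        # Rows: kicker aims left/right. Columns: keeper dives left/right.
        # Entries: assumed probability that the kick scores.
        P = np.array([[0.58, 0.95],
                      [0.93, 0.70]])

        # At equilibrium the kicker mixes so that the keeper is indifferent between dives.
        q = (P[1, 1] - P[1, 0]) / (P[0, 0] - P[0, 1] - P[1, 0] + P[1, 1])
        value = q * P[0, 0] + (1 - q) * P[1, 0]
        print(f"kick left with probability {q:.2f}; expected scoring rate {value:.2f}")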

    Bayes-Adaptive Simulation-based Search with Value Function Approximation

    No full text
    Bayes-adaptive planning offers a principled solution to the exploration-exploitation trade-off under model uncertainty. It finds the optimal policy in belief space, which explicitly accounts for the expected effect on future rewards of reductions in uncertainty. However, the Bayes-adaptive solution is typically intractable in domains with large or continuous state spaces. We present a tractable method for approximating the Bayes-adaptive solution by combining simulation-based search with a novel value function approximation technique that generalises over belief space. Our method outperforms prior approaches in both discrete bandit tasks and simple continuous navigation and control tasks.
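
    A much-simplified sketch of the idea (all constants, the priors, and the greedy inner policy are assumptions, and the paper's value function approximation over beliefs is omitted): for a two-armed Bernoulli bandit, the value of acting from the current Beta belief can be estimated by simulating belief-updating rollouts under models sampled at the root:

        import numpy as np

        rng = np.random.default_rng(0)

        def rollout_value(alpha, beta, first_arm, horizon=20, n_sims=500):
            """Average return of pulling first_arm, then acting greedily on the evolving belief."""
            total = 0.0
            for _ in range(n_sims):
                a, b = alpha.copy(), beta.copy()
                theta = rng.beta(a, b)                 # sample a model from the root belief
                arm = first_arm
                for _ in range(horizon):
                    r = rng.random() < theta[arm]      # simulate an outcome under the sampled model
                    a[arm] += r
                    b[arm] += 1 - r                    # belief update inside the simulation
                    total += r
                    arm = int(np.argmax(a / (a + b)))  # greedy w.r.t. the simulated posterior mean
            return total / n_sims

        alpha, beta = np.array([1.0, 1.0]), np.array([1.0, 1.0])      # uniform priors
        print([round(rollout_value(alpha, beta, arm), 2) for arm in (0, 1)])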

    The Body is Not a Given: Joint Agent Policy Learning and Morphology Evolution

    Get PDF
    Reinforcement learning (RL) has proven to be a powerful paradigm for deriving complex behaviors from simple reward signals in a wide range of environments. When applying RL to continuous control agents in simulated physics environments, the body is usually considered to be part of the environment. However, during evolution the physical body of biological organisms and their controlling brains are co-evolved, thus exploring a much larger space of actuator/controller configurations. Put differently, the intelligence does not reside only in the agent’s mind, but also in the design of their body. We propose a method for uncovering strong agents, consisting of a good combination of a body and policy, based on combining RL with an evolutionary procedure. Given the resulting agent, we also propose an approach for identifying the body changes that contributed the most to the agent performance. We use the Shapley value from cooperative game theory to find the fair contribution of individual components, taking into account synergies between components. We evaluate our methods in an environment similar to the recently proposed Robo-Sumo task, where agents in a software physics simulator compete to tip over their opponents or push them out of the arena. Our results show that the proposed methods are indeed capable of generating strong agents, significantly outperforming baselines that focus on optimizing the agent policy alone.
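
    A self-contained sketch of the Shapley-value attribution step (the body components and win rates below are made up for illustration; the real method evaluates trained agents with and without each body change):

        from itertools import combinations
        from math import factorial

        components = ["longer_legs", "wider_feet", "lower_torso"]    # hypothetical body changes

        def performance(subset):
            """Assumed win rate when only this subset of body changes is applied."""
            table = {frozenset(): 0.50,
                     frozenset({"longer_legs"}): 0.62, frozenset({"wider_feet"}): 0.55,
                     frozenset({"lower_torso"}): 0.53,
                     frozenset({"longer_legs", "wider_feet"}): 0.72,
                     frozenset({"longer_legs", "lower_torso"}): 0.66,
                     frozenset({"wider_feet", "lower_torso"}): 0.58,
                     frozenset(components): 0.78}
            return table[frozenset(subset)]

        n = len(components)
        for c in components:
            others = [x for x in components if x != c]
            phi = 0.0
            for k in range(len(others) + 1):
                for S in combinations(others, k):
                    weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                    phi += weight * (performance(set(S) | {c}) - performance(S))
            print(f"Shapley value of {c}: {phi:.3f}")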

    Filtering variational objectives

    No full text
    When used as a surrogate objective for maximum likelihood estimation in latent variable models, the evidence lower bound (ELBO) produces state-of-the-art results. Inspired by this, we consider the extension of the ELBO to a family of lower bounds defined by a particle filter’s estimator of the marginal likelihood, the filtering variational objectives (FIVOs). FIVOs take the same arguments as the ELBO, but can exploit a model’s sequential structure to form tighter bounds. We present results that relate the tightness of FIVO’s bound to the variance of the particle filter’s estimator by considering the generic case of bounds defined as log-transformed likelihood estimators. Experimentally, we show that training with FIVO results in substantial improvements over training with the ELBO on sequential data.
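
    A minimal numeric sketch of the objective (a bootstrap particle filter on a toy linear-Gaussian state-space model; the model, data, and particle count are assumptions for illustration): FIVO is the log of the filter's marginal-likelihood estimate, which lower-bounds the log marginal likelihood in expectation:

        import numpy as np

        rng = np.random.default_rng(1)
        T, K = 25, 8                                     # sequence length, number of particles
        y = np.cumsum(rng.normal(size=T)) + rng.normal(scale=0.5, size=T)   # fake observations

        def fivo_bound(obs, n_particles=K, trans_std=1.0, obs_std=0.5):
            z = rng.normal(size=n_particles)             # initial particles z_1 ~ N(0, 1)
            log_p_hat = 0.0
            for t, y_t in enumerate(obs):
                if t > 0:
                    z = z + trans_std * rng.normal(size=n_particles)   # propose from the transition
                logw = -0.5 * ((y_t - z) / obs_std) ** 2 - np.log(obs_std * np.sqrt(2 * np.pi))
                log_p_hat += np.logaddexp.reduce(logw) - np.log(n_particles)   # log-mean-exp of weights
                w = np.exp(logw - logw.max()); w /= w.sum()
                z = z[rng.choice(n_particles, size=n_particles, p=w)]          # resample
            return log_p_hat

        print(f"FIVO bound with {K} particles: {fivo_bound(y):.2f}")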

    Just-In-Time Kernel Regression for Expectation Propagation

    No full text
    We propose an efficient nonparametric strategy for learning a message operator in expectation propagation (EP), which takes as input the set of incoming messages to a factor node, and produces an outgoing message as output. This learned operator replaces the multivariate integral required in classical EP, which may not have an analytic expression. We use kernel-based regression, which is trained on a set of probability distributions representing the incoming messages, and the associated outgoing messages. The kernel approach has two main advantages: first, it is fast, as it is implemented using a novel two-layer random feature representation of the input message distributions; second, it has principled uncertainty estimates, and can be cheaply updated online, meaning it can request and incorporate new training data when it encounters inputs on which it is uncertain. In experiments, our approach is able to solve learning problems where a single message operator is required for multiple, substantially different data sets (logistic regression for a variety of classification problems), where the ability to accurately assess uncertainty and to efficiently and robustly update the message operator are essential.
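
    A hedged sketch of the general flavour, simplified to a single layer of random Fourier features on (mean, log-variance) summaries of the incoming messages rather than the paper's two-layer distribution-embedding features; the stand-in target map and all sizes are illustrative assumptions:

        import numpy as np

        rng = np.random.default_rng(2)

        def rff(X, n_features=200, bandwidth=1.0, seed=3):
            """Random Fourier features approximating a Gaussian kernel on message summaries."""
            r = np.random.default_rng(seed)
            W = r.normal(scale=1.0 / bandwidth, size=(X.shape[1], n_features))
            b = r.uniform(0, 2 * np.pi, size=n_features)
            return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)

        # Training set: incoming-message summaries -> outgoing-message parameters.
        # The "true" operator here is a made-up nonlinear map standing in for the EP integral.
        X = rng.normal(size=(1000, 4))                   # e.g. (mean, log-var) of two incoming messages
        Y = np.tanh(X[:, :2]) + 0.1 * X[:, 2:]           # stand-in outgoing (mean, log-var)

        Phi = rff(X)
        ridge = 1e-3
        W_out = np.linalg.solve(Phi.T @ Phi + ridge * np.eye(Phi.shape[1]), Phi.T @ Y)

        x_new = rng.normal(size=(1, 4))
        print("predicted outgoing message parameters:", rff(x_new) @ W_out)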

    Distral: robust multitask reinforcement learning

    No full text
    Most deep reinforcement learning algorithms are data inefficient in complex and rich environments, limiting their applicability to many scenarios. One direction for improving data efficiency is multitask learning with shared neural network parameters, where efficiency may be improved through transfer across related tasks. In practice, however, this is not usually observed, because gradients from different tasks can interfere negatively, making learning unstable and sometimes even less data efficient. Another issue is the different reward schemes between tasks, which can easily lead to one task dominating the learning of a shared model. We propose a new approach for joint training of multiple tasks, which we refer to as Distral (distill & transfer learning). Instead of sharing parameters between the different workers, we propose to share a "distilled" policy that captures common behaviour across tasks. Each worker is trained to solve its own task while constrained to stay close to the shared policy, while the shared policy is trained by distillation to be the centroid of all task policies. Both aspects of the learning process are derived by optimizing a joint objective function. We show that our approach supports efficient transfer on complex 3D environments, outperforming several related methods. Moreover, the proposed learning process is more robust to hyperparameter settings and more stable, attributes that are critical in deep reinforcement learning.
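
    A minimal sketch of the kind of KL-regularized objective the abstract describes: each task policy is rewarded for its task while being penalized for deviating from the shared distilled policy (and for low entropy), and the shared policy is fit by distillation. Coefficients, shapes, and the exact loss forms are assumptions, not the paper's code:

        import torch
        import torch.nn.functional as F

        def distral_step_objective(task_logits, shared_logits, reward, alpha=0.5, beta=5.0):
            """Per-step term r + (alpha/beta) * log pi_0(a|s) - (1/beta) * log pi_i(a|s),
            taken in expectation under the task policy pi_i."""
            log_pi = F.log_softmax(task_logits, dim=-1)      # task policy pi_i
            log_pi0 = F.log_softmax(shared_logits, dim=-1)   # shared/distilled policy pi_0
            pi = log_pi.exp()
            return reward + (pi * (alpha / beta * log_pi0 - (1.0 / beta) * log_pi)).sum(-1)

        def distillation_loss(task_logits, shared_logits):
            """Cross-entropy pulling the shared policy toward the (frozen) task policies."""
            pi = F.softmax(task_logits, dim=-1).detach()
            return -(pi * F.log_softmax(shared_logits, dim=-1)).sum(-1).mean()

        task_logits, shared_logits = torch.randn(4, 6), torch.randn(4, 6)   # 4 states, 6 actions
        print(distral_step_objective(task_logits, shared_logits, torch.randn(4)))
        print(distillation_loss(task_logits, shared_logits))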