
15 research outputs found

Stabilizing Unsupervised Environment Design with a Learned Adversary

Author: Dennis M
Jiang M
Mediratta I
Parker-Holder J
Rocktaschel T
Vinitsky E
Publication venue: PMLR: Proceedings of Machine Learning Research
Publication date: 21/11/2023

A key challenge in training generally-capable agents is the design of training tasks that facilitate broad generalization and robustness to environment variations. This challenge motivates the problem setting of Unsupervised Environment Design (UED), whereby a student agent trains on an adaptive distribution of tasks proposed by a teacher agent. A pioneering approach for UED is PAIRED, which uses reinforcement learning (RL) to train a teacher policy to design tasks from scratch, making it possible to directly generate tasks that are adapted to the agent’s current capabilities. Despite its strong theoretical backing, PAIRED suffers from a variety of challenges that hinder its practical performance. Thus, state-of-the-art methods currently rely on curation and mutation rather than generation of new tasks. In this work, we investigate several key shortcomings of PAIRED and propose solutions for each shortcoming. As a result, we make it possible for PAIRED to match or exceed state-of-the-art methods, producing robust agents in several established challenging procedurally-generated environments, including a partially-observed maze navigation task and a continuous-control car racing environment. We believe this work motivates a renewed emphasis on UED methods based on learned models that directly generate challenging environments, potentially unlocking more open-ended RL training and, as a result, more general agents.

UCL Discovery

Large-scale biomedical concept recognition: an evaluation of current automatic annotators and their parameters

Author: A Aronson
A Doms
A Jimeno
A Koike
A Sokolov
AT McCray
B Settles
Benjamin Garcia
C Brewster
C Jonquet
C Roeder
C Verspoor
Christophe Roeder
Christopher Funk
D Ferrucci
D Hancock
D Rebholz-Schuhmann
DA Natale
DL Wheeler
DS DeLuca
FM Couto
H Liu
H Yu
HM Muller
IBM
J Bard
JC Denny
JG Caporaso
K Bretonnel Cohen
K Degtyarenko
K Eilbeck
K Verspoor
Karin Verspoor
KB Cohen
L Hunter
L Reeve
L Yao
Lawrence E Hunter
M Bada
M Krallinger
M Tanenblatt
Michael Bada
MJ Schuemie
N Kang
N Shah
The Gene Ontology Consortium
P Khatri
PV Ogren
Q Zou
R Leaman
S Ray
S Van Landeghem
SA Stewart
T Rocktaschel
William Baumgartner
WW Chu
Publication venue: Springer Science and Business Media LLC

Crossref

Stable opponent shaping in differentiable games

Author: Balduzzi D
Foerster J
Letcher A
Rocktaschel T
Whiteson S
Publication venue: OpenReview
Publication date: 01/01/2019

A growing number of learning methods are actually differentiable games whose players optimise multiple, interdependent objectives in parallel – from GANs and intrinsic curiosity to multi-agent RL. Opponent shaping is a powerful approach to improve learning dynamics in these games, accounting for player influence on others’ updates. Learning with Opponent-Learning Awareness (LOLA) is a recent algorithm that exploits this response and leads to cooperation in settings like the Iterated Prisoner’s Dilemma. Although experimentally successful, we show that LOLA agents can exhibit ‘arrogant’ behaviour directly at odds with convergence. In fact, remarkably few algorithms have theoretical guarantees applying across all (n-player, non-convex) games. In this paper we present Stable Opponent Shaping (SOS), a new method that interpolates between LOLA and a stable variant named LookAhead. We prove that LookAhead converges locally to equilibria and avoids strict saddles in all differentiable games. SOS inherits these essential guarantees, while also shaping the learning of opponents and consistently either matching or outperforming LOLA experimentally.

Oxford University Research Archive

My body is a cage: the role of morphology in graph-based incompatible control

Author: Boehmer W
Igl M
Kurin V
Rocktaschel T
Whiteson S
Publication venue: OpenReview
Publication date: 01/01/2021

Oxford University Research Archive

Stable opponent shaping in differentiable games

Author: Balduzzi D
Foerster J
Letcher A
Rocktaschel T
Whiteson S
Publication date: 01/01/2019

UCL Discovery

Oxford University Research Archive

A baseline for any order gradient estimation in stochastic computation graphs

Author: Al-Shedivat M
Farquhar G
Foerster J
Mao J
Rocktaschel T
Whiteson S
Publication date: 01/01/2019

By enabling correct differentiation in Stochastic Computation Graphs (SCGs), the infinitely differentiable Monte-Carlo estimator (DiCE) can generate correct estimates for the higher order gradients that arise in, e.g., multi-agent reinforcement learning and meta-learning. However, the baseline term in DiCE that serves as a control variate for reducing variance applies only to first order gradient estimation, limiting the utility of higher-order gradient estimates. To improve the sample efficiency of DiCE, we propose a new baseline term for higher order gradient estimation. This term may be easily included in the objective, and produces unbiased variance-reduced estimators under (automatic) differentiation, without affecting the estimate of the objective itself or of the first order gradient estimate. It reuses the same baseline function (e.g., the state-value function in reinforcement learning) already used for the first order baseline. We provide theoretical analysis and numerical evaluations of this new baseline, which demonstrate that it can dramatically reduce the variance of DiCE’s second order gradient estimators and also show empirically that it reduces the variance of third and fourth order gradients. This computational tool can be easily used to estimate higher order gradients with unprecedented efficiency and simplicity wherever automatic differentiation is utilised, and it has the potential to unlock applications of higher order gradients in reinforcement learning and meta-learning.

UCL Discovery

Oxford University Research Archive

A corpus for plant-chemical relationships in the biomedical domain

Author: A Hamosh
AJ Viera
AY Esmat
Baeksoo Kim
BC Bennett
CH Wei
CYC Chen
DC Comeau
Doheon Lee
DS Wishart
E Pafilis
FE Koehn
H Ye
Hyejin Cho
Hyunju Lee
J Bjorne
J Zhao
JB Calixto
JH Han
K Jenson
KB Cohen
L Chiticariu
L Wang
M Gerner
M Krallinger
M Kuhn
M Marcus
M O’Hara
R Leaman
R Xue
S Federhen
T Rocktaschel
T Wiegers
Wonjun Choi
X Chen
Y Garten
Publication venue: Springer Science and Business Media LLC

Crossref

The CHEMDNER corpus of chemicals and drugs and its annotation principles

Author: Ahmad Akhondi Saber
Alves R
An X
Ata C
Bajec M
Batista-Navarro RT
Campos D
Can T
Choi M
Couto FM
Dai HJ
Dieb TM
Ekbal A
Giles CL
Ho Ryu K
Huber T
Immer M
Ji D
Khabsa M
Kors Jan
Krallinger M
Kumar Sikdar U
Lamurias A
Leaman R
Leitner F
Liu H
Lowe DM
Lu Y
Lu Z
Martinez P
Matos S
Munkhdalai T
Nathan S
Oyarzabal J
Rabal O
Rak R
Ramanan S
Ravikumar KE
Rocktaschel T
Salgado D
Sayle R
Segura-Bedmar I
Tang B
Tsai RT
Usie A
Valencia A
Vazquez M
Verspoor A
Weber L
Xu H
Xu S
Yoshioka M
Zitnik S
Publication date: 01/01/2015

Analysis of bacterial, fungal and archaeal populations from a municipal wastewater treatment plant developing an innovative aerobic granular sludge process

Author: A Cydzik-Kwiatkowska
A Giesen
AM Enrigt
APHA
Balasubramanian Sellamuthu
BJ Ni
DJ Lee
DW Gao
E Morgenroth
F Guo
G Gonzalez-Gil
H Liu
J Li
JB Hughes
JS Hiraswa
Jun Li
Jun Liu
KL Huang
KY Show
L Roesch
MK Jungles
MK de Kreuk
MZ Khan
N Morales
ND Gray
QX Yang
RM Mckeown
Ryan Walsh
S Subramanian
SB Subramanian
SD Weber
SL McLellan
SS Adav
T Rocktaschel
T Zhang
TT More
WL Huang
Yaqiang Tao
YQ Liu
Publication venue: Springer Science and Business Media LLC

Crossref

Acid-Base Disturbances in the Intensive Care Unit: Current Issues and the Use of Continuous Renal Replacement Therapy as a Customized Treatment Tool

Author: Aucella F.
Barenbrock M.
Cole L.
Cusack R.J.
Davenport A.
Dondorp A.M.
Dubin A.
Gauthier P.M.
Gore D.C.
Gunnerson K.J.
Hamill-Ruth R.J.
Heering P.
Kaplan L.J.
Kellum J.A.
Kierdorf H.P.
Leblanc M.
Mehta R.L.
Meier-Kriesche H.U.
Meyer T.W.
Morimatsu H.
Naka T.
Nimmo G.R.
O'Reilly P.
Paganini E.
Rocktaeschel J.
Rocktaschel J.
Ronco C.
Schoolwerth A.C.
Sigler M.H.
Thomas A.N.
Troyanov S.
Publication venue: SAGE Publications

Crossref