11 research outputs found

    DiCE: The Infinitely Differentiable Monte-Carlo Estimator

    The score function estimator is widely used for estimating gradients of stochastic objectives in stochastic computation graphs (SCG), e.g., in reinforcement learning and meta-learning. While deriving the first-order gradient estimators by differentiating a surrogate loss (SL) objective is computationally and conceptually simple, using the same approach for higher-order derivatives is more challenging. Firstly, analytically deriving and implementing such estimators is laborious and not compliant with automatic differentiation. Secondly, repeatedly applying SL to construct new objectives for each order derivative involves increasingly cumbersome graph manipulations. Lastly, to match the first-order gradient under differentiation, SL treats part of the cost as a fixed sample, which we show leads to missing and wrong terms for estimators of higher-order derivatives. To address all these shortcomings in a unified way, we introduce DiCE, which provides a single objective that can be differentiated repeatedly, generating correct estimators of derivatives of any order in SCGs. Unlike SL, DiCE relies on automatic differentiation for performing the requisite graph manipulations. We verify the correctness of DiCE both through a proof and numerical evaluation of the DiCE derivative estimates. We also use DiCE to propose and evaluate a novel approach for multi-agent learning. Our code is available at https://www.github.com/alshedivat/lola

    DiCE: The infinitely differentiable Monte Carlo estimator

    The score function estimator is widely used for estimating gradients of stochastic objectives in stochastic computation graphs (SCG), e.g., in reinforcement learning and meta-learning. While deriving the first order gradient estimators by differentiating a surrogate loss (SL) objective is computationally and conceptually simple, using the same approach for higher order derivatives is more challenging. Firstly, analytically deriving and implementing such estimators is laborious and not compliant with automatic differentiation. Secondly, repeatedly applying SL to construct new objectives for each order derivative involves increasingly cumbersome graph manipulations. Lastly, to match the first order gradient under differentiation, SL treats part of the cost as a fixed sample, which we show leads to missing and wrong terms for estimators of higher order derivatives. To address all these shortcomings in a unified way, we introduce DiCE, which provides a single objective that can be differentiated repeatedly, generating correct estimators of derivatives of any order in SCGs. Unlike SL, DiCE relies on automatic differentiation for performing the requisite graph manipulations. We verify the correctness of DiCE both through a proof and numerical evaluation of the DiCE derivative estimates. We also use DiCE to propose and evaluate a novel approach for multi-agent learning. Our code is available at https://goo.gl/xkkGxN
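    The core mechanism described in these abstracts is the MagicBox operator, which evaluates to 1 in the forward pass but re-injects score-function terms every time it is differentiated. Below is a minimal sketch of that idea in a PyTorch-style autodiff setting; the function names and the trajectory-style bookkeeping are illustrative and not taken from the paper's released code.

```python
import torch

def magic_box(logp_sum):
    # MagicBox: equals 1 in value (exp(0)), but differentiating it multiplies by
    # d(logp_sum)/d(theta), so repeated differentiation keeps producing the
    # score-function terms needed for correct higher-order estimators.
    return torch.exp(logp_sum - logp_sum.detach())

def dice_objective(log_probs, costs):
    # log_probs[t]: log-probability of the t-th sampled stochastic node (e.g. an action)
    # costs[t]:     cost influenced by the stochastic nodes 0..t
    # Each cost is weighted by MagicBox of the log-probs of the nodes it depends on.
    total = torch.zeros(())
    cumulative_logp = torch.zeros(())
    for logp, cost in zip(log_probs, costs):
        cumulative_logp = cumulative_logp + logp
        total = total + magic_box(cumulative_logp) * cost
    return total
```

    Differentiating this objective with torch.autograd.grad(..., create_graph=True), and then differentiating the result again, would, per the paper's claim, yield correct estimators of second- and higher-order derivatives without any manual graph manipulation.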

    A baseline for any order gradient estimation in stochastic computation graphs

    By enabling correct differentiation in Stochastic Computation Graphs (SCGs), the infinitely differentiable Monte-Carlo estimator (DiCE) can generate correct estimates for the higher-order gradients that arise in, e.g., multi-agent reinforcement learning and meta-learning. However, the baseline term in DiCE that serves as a control variate for reducing variance applies only to first-order gradient estimation, limiting the utility of higher-order gradient estimates. To improve the sample efficiency of DiCE, we propose a new baseline term for higher-order gradient estimation. This term may be easily included in the objective, and produces unbiased variance-reduced estimators under (automatic) differentiation, without affecting the estimate of the objective itself or of the first-order gradient estimate. It reuses the same baseline function (e.g., the state-value function in reinforcement learning) already used for the first-order baseline. We provide theoretical analysis and numerical evaluations of this new baseline, which demonstrate that it can dramatically reduce the variance of DiCE's second-order gradient estimators, and also show empirically that it reduces the variance of third- and fourth-order gradients. This computational tool can be easily used to estimate higher-order gradients with unprecedented efficiency and simplicity wherever automatic differentiation is utilised, and it has the potential to unlock applications of higher-order gradients in reinforcement learning and meta-learning.
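    The exact form of the new higher-order baseline is not reproduced here; as an illustration of the mechanism it builds on, below is a sketch of the first-order DiCE baseline term, which is identically zero in value but contributes variance-reducing terms once differentiated. The setting and names (PyTorch, action-independent baseline values) are assumptions made for the sketch, not the paper's implementation.

```python
import torch

def first_order_baseline(log_probs, baseline_values):
    # First-order DiCE baseline: sum_w (1 - magic_box(logp_w)) * b_w.
    # The factor (1 - magic_box(.)) evaluates to 0, so adding this term leaves
    # the objective's value unchanged; under differentiation it contributes
    # -b_w * d(logp_w)/d(theta), acting as a control variate for the gradient.
    term = torch.zeros(())
    for logp, b in zip(log_probs, baseline_values):
        # b should not depend on the sampled node itself (e.g. use a detached
        # state-value estimate), otherwise the estimator would be biased.
        term = term + (1.0 - torch.exp(logp - logp.detach())) * b
    return term
```

    The paper's contribution is an analogous correction term whose derivatives remain valid control variates at second and higher order, reusing the same baseline function b_w.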

    Learning with opponent-learning awareness

    Multi-agent settings are quickly gathering importance in machine learning. This includes a plethora of recent work on deep multi-agent reinforcement learning, but can also be extended to hierarchical reinforcement learning, generative adversarial networks and decentralised optimisation. In all these settings the presence of multiple learning agents renders the training problem non-stationary and often leads to unstable training or undesired final results. We present Learning with Opponent-Learning Awareness (LOLA), a method in which each agent shapes the anticipated learning of the other agents in the environment. The LOLA learning rule includes an additional term that accounts for the impact of one agent's policy on the anticipated parameter update of the other agents. Preliminary results show that the encounter of two LOLA agents leads to the emergence of tit-for-tat and therefore cooperation in the iterated prisoners' dilemma (IPD), while independent learning does not. In this domain, LOLA also receives higher payouts compared to a naive learner, and is robust against exploitation by higher-order gradient-based methods. Applied to infinitely repeated matching pennies, LOLA agents converge to the Nash equilibrium. In a round-robin tournament we show that LOLA agents can successfully shape the learning of a range of multi-agent learning algorithms from the literature, resulting in the highest average returns on the IPD. We also show that the LOLA update rule can be efficiently calculated using an extension of the likelihood ratio policy gradient estimator, making the method suitable for model-free reinforcement learning. This method thus scales to large parameter and input spaces and nonlinear function approximators. We also apply LOLA to a grid-world task with an embedded social dilemma using deep recurrent policies and opponent modelling. Again, by explicitly considering the learning of the other agent, LOLA agents learn to cooperate out of self-interest.
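    As a concrete illustration of the LOLA learning rule, the sketch below implements an exact-gradient variant for two agents with differentiable value functions V1(theta1, theta2) and V2(theta1, theta2): agent 1 differentiates through the opponent's anticipated naive-learning step. This is a simplified sketch (PyTorch, single parameter tensors, illustrative step sizes), not the paper's released code; the model-free version instead estimates these terms with the likelihood-ratio policy gradient estimator mentioned above.

```python
import torch

def lola_update(theta1, theta2, V1, V2, lr=0.1, opp_lr=0.1):
    # Opponent's anticipated naive-learner step, kept on the autodiff graph
    # (create_graph=True) so that its dependence on theta1 is differentiated
    # through -- this is what produces LOLA's opponent-shaping term.
    dV2_dtheta2 = torch.autograd.grad(V2(theta1, theta2), theta2, create_graph=True)[0]
    theta2_anticipated = theta2 + opp_lr * dV2_dtheta2

    # Agent 1 ascends its own value evaluated at the opponent's anticipated parameters.
    objective = V1(theta1, theta2_anticipated)
    dtheta1 = torch.autograd.grad(objective, theta1)[0]
    return (theta1 + lr * dtheta1).detach().requires_grad_()
```

    A naive learner would use only torch.autograd.grad(V1(theta1, theta2), theta1); the extra contribution flowing through theta2_anticipated is what allows a LOLA agent to shape the opponent's learning.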

    Fully memristive neural networks for pattern classification with unsupervised learning

    This paper was accepted for publication in the journal Nature Electronics; the definitive published version is available at https://doi.org/10.1038/s41928-018-0023-2. Neuromorphic computers composed of artificial neurons and synapses could provide a more efficient approach to implementing neural network algorithms than traditional hardware. Recently, artificial neurons based on memristors have been developed, but with limited bio-realistic dynamics and no direct interaction with the artificial synapses in an integrated network. Here we show that a diffusive memristor based on silver nanoparticles in a dielectric film can be used to create an artificial neuron with stochastic leaky integrate-and-fire dynamics and tunable integration time, which is determined by silver migration alone or its interaction with circuit capacitance. We integrate these neurons with nonvolatile memristive synapses to build fully memristive artificial neural networks. With these integrated networks, we experimentally demonstrate unsupervised synaptic weight updating and pattern classification.
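    The abstract describes the neuron's behaviour as stochastic leaky integrate-and-fire with a tunable integration time. Purely as a software illustration of those dynamics (not a device model from the paper), a minimal simulation could look like the following; the parameter names and values are invented for the sketch.

```python
import numpy as np

def stochastic_lif(input_current, leak=0.95, threshold=1.0, noise_std=0.05, seed=0):
    # Leaky integrate-and-fire with a noisy threshold: the membrane variable
    # integrates the input, decays each step (leak), and fires when it crosses
    # a stochastically perturbed threshold, after which it resets.
    rng = np.random.default_rng(seed)
    v, spikes = 0.0, []
    for i_t in input_current:
        v = leak * v + i_t
        if v >= threshold + noise_std * rng.standard_normal():
            spikes.append(1)
            v = 0.0  # reset after a spike
        else:
            spikes.append(0)
    return spikes
```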