
    On Backdoors to Tractable Constraint Languages

    In the context of CSPs, a strong backdoor is a subset of variables such that every complete assignment yields a residual instance guaranteed to have a specified property. If the property allows efficient solving, then a small strong backdoor provides a reasonable decomposition of the original instance into easy instances. An important challenge is the design of algorithms that can quickly find a small strong backdoor if one exists. We present a systematic study of the parameterized complexity of backdoor detection when the target property is a restricted type of constraint language defined by means of a family of polymorphisms. In particular, we show that under the weak assumption that the polymorphisms are idempotent, the problem is unlikely to be FPT when the parameter is either r (the constraint arity) or k (the size of the backdoor), unless P = NP or FPT = W[2]. When the parameter is k + r, however, we are able to identify large classes of languages for which the problem of finding a small backdoor is FPT.
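    A minimal brute-force sketch of the strong-backdoor check defined above, assuming a CSP given as domains plus constraints of the form (scope, set of allowed tuples); the helper `restrict` and the oracle `has_property` are illustrative placeholders, not the paper's algorithm:

        from itertools import product

        def restrict(constraints, assignment):
            """Residual instance after fixing `assignment`: drop assigned variables
            from each scope and keep only the compatible tuples."""
            residual = []
            for scope, tuples in constraints:
                keep = [i for i, v in enumerate(scope) if v not in assignment]
                allowed = {tuple(t[i] for i in keep) for t in tuples
                           if all(t[i] == assignment[v]
                                  for i, v in enumerate(scope) if v in assignment)}
                if not keep and not allowed:
                    return None  # a fully assigned constraint is violated
                residual.append((tuple(scope[i] for i in keep), allowed))
            return residual

        def is_strong_backdoor(domains, constraints, candidate, has_property):
            """Every complete assignment to `candidate` must yield a residual
            instance with the target property (or an outright contradiction)."""
            cand = sorted(candidate)
            for values in product(*(domains[v] for v in cand)):
                residual = restrict(constraints, dict(zip(cand, values)))
                if residual is not None and not has_property(residual):
                    return False
            return True

        # Toy usage: the 'property' here is just that every residual constraint
        # has arity at most 2, a stand-in for a real tractable-language test.
        doms = {"x": [0, 1], "y": [0, 1], "z": [0, 1]}
        cons = [(("x", "y", "z"), {(0, 0, 0), (1, 1, 1)})]
        print(is_strong_backdoor(doms, cons, {"x"},
                                 lambda inst: all(len(s) <= 2 for s, _ in inst)))

    Note the d^k factor from enumerating assignments to the candidate set: this is why detection is naturally parameterized by the backdoor size k.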

    On the reduction of the CSP dichotomy conjecture to digraphs

    It is well known that the constraint satisfaction problem over general relational structures can be reduced in polynomial time to digraphs. We present a simple variant of such a reduction and use it to show that the algebraic dichotomy conjecture is equivalent to its restriction to digraphs and that the polynomial reduction can be made in logspace. We also show that our reduction preserves the bounded width property, i.e., solvability by local consistency methods. We discuss further algorithmic properties that are preserved and related open problems. Comment: 34 pages. Article is to appear in CP2013. This version includes two appendices with proofs of claims omitted from the main article.
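    For intuition about the digraph setting (this is not the paper's reduction itself): the CSP over a fixed digraph template H is exactly the H-homomorphism problem, as in this minimal brute-force sketch:

        from itertools import product

        def homomorphism_exists(G_vertices, G_edges, H_vertices, H_edges):
            """Is there a map f from G to H with (f(u), f(v)) an edge of H for
            every edge (u, v) of G? CSP with template H is exactly this problem."""
            H_edges = set(H_edges)
            G_vertices = list(G_vertices)
            for image in product(H_vertices, repeat=len(G_vertices)):
                f = dict(zip(G_vertices, image))
                if all((f[u], f[v]) in H_edges for u, v in G_edges):
                    return True
            return False

        # A directed 3-cycle maps onto a single looped vertex:
        print(homomorphism_exists([0, 1, 2], [(0, 1), (1, 2), (2, 0)],
                                  ["a"], [("a", "a")]))  # True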

    Artificial Neural Network-based error compensation procedure for low-cost encoders

    An Artificial Neural Network-based error compensation method is proposed for improving the accuracy of resolver-based 16-bit encoders by compensating for their respective systematic error profiles. The error compensation procedure for a particular encoder involves obtaining its error profile by calibrating it on a precision rotary table, training the neural network on part of this data, and then determining the corrected encoder angle by subtracting the ANN-predicted error from the measured value of the encoder angle. Since it is not guaranteed that all the resolvers will have identical error profiles, owing to inherent microscale differences in their construction, the ANN has been trained on one error profile at a time, and the corresponding weight file is then used only for compensating the systematic error of that particular encoder. The systematic nature of the error profile for each of the encoders has also been validated by repeated calibration of the encoders over a period of time; the error profiles of a particular encoder recorded at different epochs show nearly reproducible behavior. The ANN-based error compensation procedure has been implemented for 4 encoders by training the ANN with their respective error profiles, and the results indicate that the accuracy of the encoders can be improved by nearly an order of magnitude, from quoted values of ~6 arc-min to ~0.65 arc-min, when their corresponding ANN-generated weight files are used for determining the corrected encoder angle. Comment: 16 pages, 4 figures. Accepted for publication in Measurement Science and Technology (MST).
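    A minimal sketch of the described compensation pipeline on a synthetic error profile; the sinusoidal error model, network size, and 70/30 training split are illustrative assumptions, not the paper's calibration data:

        import numpy as np
        from sklearn.neural_network import MLPRegressor

        rng = np.random.default_rng(0)

        # Stand-in for a calibrated systematic error profile (arc-min vs. degrees).
        angle = np.linspace(0.0, 360.0, 2000)
        error = 5.0 * np.sin(np.radians(2 * angle)) + 1.5 * np.sin(np.radians(7 * angle + 30))

        # Encode the angle as (sin, cos) so the network sees its periodicity.
        features = np.column_stack([np.sin(np.radians(angle)), np.cos(np.radians(angle))])
        train = rng.random(angle.size) < 0.7  # train on part of the profile

        net = MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=5000, random_state=0)
        net.fit(features[train], error[train])

        # Corrected angle = measured angle - ANN-predicted error (arc-min -> deg).
        predicted = net.predict(features)
        corrected = angle - predicted / 60.0
        print(f"max residual on held-out points: "
              f"{np.abs(error[~train] - predicted[~train]).max():.2f} arc-min")

    Per the abstract, one network would be trained per encoder, since the profiles are systematic but not identical across units.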

    Deep Reinforcement Learning: An Overview

    In recent years, a specific machine learning method called deep learning has gained huge attention, as it has obtained astonishing results in broad applications such as pattern recognition, speech recognition, computer vision, and natural language processing. Recent research has also shown that deep learning techniques can be combined with reinforcement learning methods to learn useful representations for problems with high-dimensional raw data input. This chapter reviews the recent advances in deep reinforcement learning, with a focus on the most used deep architectures, such as autoencoders, convolutional neural networks, and recurrent neural networks, which have been successfully combined with the reinforcement learning framework. Comment: Proceedings of SAI Intelligent Systems Conference (IntelliSys) 2016.
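    As one concrete instance of that combination, here is a minimal DQN-style sketch (a neural network as Q-function approximator trained on bootstrapped TD targets); the dimensions, hyperparameters, and random batch are placeholders for a real environment and replay buffer:

        import torch
        import torch.nn as nn

        obs_dim, n_actions, gamma = 4, 2, 0.99
        qnet = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, n_actions))
        opt = torch.optim.Adam(qnet.parameters(), lr=1e-3)

        def td_step(s, a, r, s_next, done):
            """One gradient step on a batch of (s, a, r, s', done) transitions."""
            q = qnet(s).gather(1, a.unsqueeze(1)).squeeze(1)      # Q(s, a)
            with torch.no_grad():                                 # bootstrapped target
                target = r + gamma * (1 - done) * qnet(s_next).max(dim=1).values
            loss = nn.functional.mse_loss(q, target)
            opt.zero_grad()
            loss.backward()
            opt.step()
            return loss.item()

        B = 32  # dummy batch standing in for replay-buffer samples
        print(td_step(torch.randn(B, obs_dim), torch.randint(n_actions, (B,)),
                      torch.randn(B), torch.randn(B, obs_dim), torch.zeros(B)))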

    Tractability in Constraint Satisfaction Problems: A Survey

    Even though the Constraint Satisfaction Problem (CSP) is NP-complete, many tractable classes of CSP instances have been identified. After discussing different forms and uses of tractability, we describe some landmark tractable classes and survey recent theoretical results. Although we concentrate on the classical CSP, we also cover its important extensions to infinite domains and optimisation, as well as #CSP and QCSP.
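    As one example of the local-consistency methods behind several landmark tractable classes, a minimal AC-3 sketch for binary CSPs (the toy instance is purely illustrative):

        from collections import deque

        def ac3(domains, constraints):
            """AC-3: `constraints[(x, y)]` is the set of allowed (vx, vy) pairs.
            Prunes domains in place; returns False if a domain empties."""
            arcs = deque(constraints)
            while arcs:
                x, y = arcs.popleft()
                allowed = constraints[(x, y)]
                supported = {vx for vx in domains[x]
                             if any((vx, vy) in allowed for vy in domains[y])}
                if supported != domains[x]:
                    domains[x] = supported
                    if not supported:
                        return False
                    # Revisit arcs pointing into x, since its domain shrank.
                    arcs.extend((z, w) for (z, w) in constraints if w == x)
            return True

        # Toy instance x < y over {1, 2, 3}: AC-3 prunes 3 from x and 1 from y.
        doms = {"x": {1, 2, 3}, "y": {1, 2, 3}}
        lt = {(a, b) for a in range(1, 4) for b in range(1, 4) if a < b}
        cons = {("x", "y"): lt, ("y", "x"): {(b, a) for (a, b) in lt}}
        print(ac3(doms, cons), doms)  # True {'x': {1, 2}, 'y': {2, 3}}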

    Approximate policy iteration: A survey and some new methods

    We consider the classical policy iteration method of dynamic programming (DP), where approximations and simulation are used to deal with the curse of dimensionality. We survey a number of issues: convergence and rate of convergence of approximate policy evaluation methods, singularity and susceptibility to simulation noise of policy evaluation, exploration issues, constrained and enhanced policy iteration, policy oscillation and chattering, and optimistic and distributed policy iteration. Our discussion of policy evaluation is couched in general terms and aims to unify the available methods in the light of recent research developments and to compare the two main policy evaluation approaches: projected equations and temporal differences (TD), and aggregation. In the context of these approaches, we survey two different types of simulation-based algorithms: matrix inversion methods, such as least-squares temporal difference (LSTD), and iterative methods, such as least-squares policy evaluation (LSPE) and TD(λ), and their scaled variants. We discuss a recent method, based on regression and regularization, which rectifies the unreliability of LSTD for nearly singular projected Bellman equations. An iterative version of this method belongs to the LSPE class of methods and provides the connecting link between LSTD and LSPE. Our discussion of policy improvement focuses on the role of policy oscillation and its effect on performance guarantees. We illustrate that policy evaluation, when done by the projected equation/TD approach, may lead to policy oscillation, but when done by aggregation it does not. This implies better error bounds and more regular performance for aggregation, at the expense of some loss of generality in cost function representation capability. Hard aggregation provides the connecting link between projected equation/TD-based and aggregation-based policy evaluation, and is characterized by favorable error bounds.
    Funding: National Science Foundation (U.S.) grant ECCS-0801549; Los Alamos National Laboratory, Information Science and Technology Institute; United States Air Force grant FA9550-10-1-0412.
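    A minimal simulation-based LSTD(0) sketch in the projected-equation spirit, with a small ridge term nodding to the near-singularity issue raised above; the random-walk task and tabular features are toy assumptions:

        import numpy as np

        def lstd(transitions, phi, n_features, gamma=0.95, reg=1e-6):
            """Accumulate A = sum phi(s)(phi(s) - gamma*phi(s'))^T and
            b = sum phi(s)*r over sampled transitions, then solve A theta = b;
            `reg` regularizes a nearly singular A."""
            A = np.zeros((n_features, n_features))
            b = np.zeros(n_features)
            for s, r, s_next in transitions:
                f, f_next = phi(s), phi(s_next)
                A += np.outer(f, f - gamma * f_next)
                b += r * f
            return np.linalg.solve(A + reg * np.eye(n_features), b)

        # Toy 5-state random walk, reward 1 on reaching the right end.
        rng = np.random.default_rng(0)
        phi = lambda s: np.eye(5)[s]
        transitions, s = [], 2
        for _ in range(5000):
            s_next = min(max(s + rng.choice([-1, 1]), 0), 4)
            transitions.append((s, float(s_next == 4), s_next))
            s = 2 if s_next in (0, 4) else s_next  # restart episodes at the middle
        theta = lstd(transitions, phi, 5)
        print(np.round(theta, 2))  # V(s) ~= phi(s) @ theta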

    Spike-Based Reinforcement Learning in Continuous State and Action Space: When Policy Gradient Methods Fail

    Changes of synaptic connections between neurons are thought to be the physiological basis of learning. These changes can be gated by neuromodulators that encode the presence of reward. We study a family of reward-modulated synaptic learning rules for spiking neurons on a learning task in continuous space inspired by the Morris water maze. The synaptic update rule modifies the release probability of synaptic transmission and depends on the timing of presynaptic spike arrival, postsynaptic action potentials, as well as the membrane potential of the postsynaptic neuron. The family of learning rules includes an optimal rule derived from policy gradient methods as well as reward-modulated Hebbian learning. The synaptic update rule is implemented in a population of spiking neurons using a network architecture that combines feedforward input with lateral connections. Actions are represented by a population of hypothetical action cells with strong Mexican-hat connectivity and are read out at theta frequency. We show that in this architecture, a standard policy gradient rule fails to solve the Morris water maze task, whereas a variant with a Hebbian bias can learn the task within 20 trials, consistent with experiments. This result does not depend on implementation details such as the size of the neuronal populations. Our theoretical approach shows how learning new behaviors can be linked to reward-modulated plasticity at the level of single synapses and makes predictions about the voltage and spike-timing dependence of synaptic plasticity and the influence of neuromodulators such as dopamine. It is an important step towards connecting formal theories of reinforcement learning with neuronal and synaptic properties.
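    A rate-based caricature of such a reward-modulated Hebbian rule with an eligibility trace (the three-factor structure: presynaptic activity x postsynaptic activity x neuromodulatory reward); it deliberately abstracts away the paper's spiking, voltage, and spike-timing dependence, and the toy task is an assumption:

        import numpy as np

        rng = np.random.default_rng(1)
        n_pre, n_post = 20, 2
        w = rng.normal(0.0, 0.1, (n_post, n_pre))
        eta, tau_e, baseline = 0.05, 0.9, 0.0

        def trial(stimulus):
            """One trial: noisy postsynaptic rates build a Hebbian eligibility
            trace; the reward then gates the weight change as a third factor."""
            global w, baseline
            e = np.zeros_like(w)
            for _ in range(50):  # 50 time steps per trial
                post = 1.0 / (1.0 + np.exp(-(w @ stimulus + rng.normal(0, 0.5, n_post))))
                e = tau_e * e + np.outer(post - 0.5, stimulus)  # pre x post coactivity
            action = int(np.argmax(post))                        # population readout
            # Toy task: respond to whichever half of the input is stronger.
            reward = float(action == int(stimulus[:n_pre // 2].sum() > stimulus[n_pre // 2:].sum()))
            w += eta * (reward - baseline) * e                   # reward-gated update
            baseline += 0.1 * (reward - baseline)                # running reward baseline
            return reward

        rewards = [trial(rng.random(n_pre)) for _ in range(500)]
        print(f"mean reward, last 100 trials: {np.mean(rewards[-100:]):.2f}")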

    An Imperfect Dopaminergic Error Signal Can Drive Temporal-Difference Learning

    An open problem in the field of computational neuroscience is how to link synaptic plasticity to system-level learning. A promising framework in this context is temporal-difference (TD) learning. Experimental evidence that supports the hypothesis that the mammalian brain performs temporal-difference learning includes the resemblance of the phasic activity of the midbrain dopaminergic neurons to the TD error and the discovery that cortico-striatal synaptic plasticity is modulated by dopamine. However, as the phasic dopaminergic signal does not reproduce all the properties of the theoretical TD error, it is unclear whether it is capable of driving behavior adaptation in complex tasks. Here, we present a spiking temporal-difference learning model based on the actor-critic architecture. The model dynamically generates a dopaminergic signal with realistic firing rates and exploits this signal to modulate the plasticity of synapses as a third factor. The predictions of our proposed plasticity dynamics are in good agreement with experimental results with respect to dopamine, pre- and post-synaptic activity. An analytical mapping from the parameters of our proposed plasticity dynamics to those of the classical discrete-time TD algorithm reveals that the biological constraints of the dopaminergic signal entail a modified TD algorithm with self-adapting learning parameters and an adapting offset. We show that the neuronal network is able to learn a task with sparse positive rewards as fast as the corresponding classical discrete-time TD algorithm. However, the performance of the neuronal network is impaired with respect to the traditional algorithm on a task with both positive and negative rewards and breaks down entirely on a task with purely negative rewards. Our model demonstrates that the asymmetry of a realistic dopaminergic signal enables TD learning when learning is driven by positive rewards but not when driven by negative rewards.
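    For reference, the classical discrete-time TD actor-critic that the spiking model maps onto, as a minimal tabular sketch on a toy chain task; the TD error delta plays the role of the dopamine-like third factor:

        import numpy as np

        rng = np.random.default_rng(0)
        n_states, n_actions = 6, 2               # chain; reward only at the right end
        gamma, alpha_v, alpha_p = 0.9, 0.1, 0.1
        V = np.zeros(n_states)                   # critic: state values
        prefs = np.zeros((n_states, n_actions))  # actor: action preferences

        def softmax(x):
            z = np.exp(x - x.max())
            return z / z.sum()

        for episode in range(500):
            s = 0
            while s != n_states - 1:
                pi = softmax(prefs[s])
                a = rng.choice(n_actions, p=pi)
                s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
                r = float(s_next == n_states - 1)
                delta = r + gamma * V[s_next] - V[s]   # TD error ("dopamine")
                V[s] += alpha_v * delta                                    # critic
                prefs[s] += alpha_p * delta * (np.eye(n_actions)[a] - pi)  # actor
                s = s_next

        print(np.round(V, 2))  # learned values increase toward the rewarded end

    This sketch uses sparse positive rewards, the regime in which, per the abstract, the realistic asymmetric dopaminergic signal still supports TD learning.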

    Differential Allocation of Constitutive and Induced Chemical Defenses in Pine Tree Juveniles: A Test of the Optimal Defense Theory

    Optimal defense theory (ODT) predicts that the within-plant quantitative allocation of defenses is not random, but driven by the potential relative contribution of particular plant tissues to overall fitness. These predictions have been poorly tested on long-lived woody plants. We explored the allocation of constitutive and methyl-jasmonate (MJ) inducible chemical defenses in six half-sib families of Pinus radiata juveniles. Specifically, we studied the quantitative allocation of resin and polyphenolics (the two major secondary chemicals in pine trees) to tissues with contrasting fitness value (stem phloem, stem xylem and needles) across three parts of the plants (basal, middle and apical upper part), using nitrogen concentration as a proxy of tissue value. Concentration of nitrogen in the phloem, xylem and needles was found to be greater higher up the plant. As predicted by the ODT, the same pattern was found for the concentration of non-volatile resin in the stem. However, in leaf tissues the concentrations of both resin and total phenolics were greater towards the base of the plant. Two weeks after MJ application, the concentrations of nitrogen in the phloem, resin in the stem and total phenolics in the needles increased by roughly 25% compared with the control plants; inducibility was similar across all plant parts; and families differed in the inducibility of resin compounds in the stem. In contrast, no significant changes were observed either for phenolics in the stems, or for resin in the needles after MJ application. Concentration of resin in the phloem was double that in the xylem and MJ-inducible, with inducibility being greater towards the base of the stem. In contrast, resin in the xylem was not MJ-inducible and increased in concentration higher up the plant. The pattern of inducibility by MJ-signaling in juvenile P. radiata is tissue, chemical-defense and plant-part specific, and is genetically variable.