Projective simulation for artificial intelligence
We propose a model of a learning agent whose interaction with the environment
is governed by a simulation-based projection, which allows the agent to project
itself into future situations before it takes real action. Projective
simulation is based on a random walk through a network of clips, which are
elementary patches of episodic memory. The network of clips changes
dynamically, both due to new perceptual input and due to certain compositional
principles of the simulation process. During simulation, the clips are screened
for specific features which trigger factual action of the agent. The scheme is
different from other, computational, notions of simulation, and it provides a
new element in an embodied cognitive science approach to intelligent action and
learning. Our model provides a natural route for generalization to
quantum-mechanical operation and connects the fields of reinforcement learning
and quantum computation.
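A minimal sketch of a two-layer projective-simulation agent may help fix ideas: percept clips connect to action clips through weighted edges (h-values), the random walk reduces to a single hop, and learning damps all h-values toward their baseline while reinforcing rewarded transitions. The environment interface, parameter values, and the toy task below are illustrative assumptions, not part of the paper.

```python
import numpy as np

class ProjectiveSimulationAgent:
    """Minimal two-layer projective simulation (PS) agent: percept clips
    connect to action clips by weighted edges (h-values), so the random
    walk through the clip network reduces to a single hop."""

    def __init__(self, n_percepts, n_actions, gamma=0.01, reward_scale=1.0):
        # h-values: unnormalized hopping weights from percept to action clips.
        self.h = np.ones((n_percepts, n_actions))
        self.gamma = gamma              # damping: slow forgetting toward h = 1
        self.reward_scale = reward_scale
        self.last = None                # last (percept, action) pair

    def act(self, percept):
        # Simulation step: hop from the percept clip to an action clip
        # with probability proportional to the h-values.
        p = self.h[percept] / self.h[percept].sum()
        action = np.random.choice(len(p), p=p)
        self.last = (percept, action)
        return action

    def learn(self, reward):
        # Damp all h-values toward the baseline, then reinforce the edge
        # traversed on the last (rewarded) walk.
        self.h += -self.gamma * (self.h - 1.0)
        s, a = self.last
        self.h[s, a] += self.reward_scale * reward

# Hypothetical usage on a toy task: matching the percept is rewarded.
agent = ProjectiveSimulationAgent(n_percepts=2, n_actions=2)
for _ in range(500):
    percept = np.random.randint(2)
    action = agent.act(percept)
    agent.learn(1.0 if action == percept else 0.0)
```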
Sequential Quasi-Monte Carlo
We derive and study SQMC (Sequential Quasi-Monte Carlo), a class of
algorithms obtained by introducing QMC point sets in particle filtering. SQMC
is related to, and may be seen as an extension of, the array-RQMC algorithm of
L'Ecuyer et al. (2006). The complexity of SQMC is $O(N \log N)$, where $N$ is
the number of simulations at each iteration, and its error rate is smaller than
the Monte Carlo rate $O_P(N^{-1/2})$. The only requirement to implement SQMC is
the ability to write the simulation of particle $x_t^n$ given $x_{t-1}^n$ as a
deterministic function of $x_{t-1}^n$ and a fixed number of uniform variates.
We show that SQMC is amenable to the same extensions as standard SMC, such as
forward smoothing, backward smoothing, unbiased likelihood evaluation, and so
on. In particular, SQMC may replace SMC within a PMCMC (particle Markov chain
Monte Carlo) algorithm. We establish several convergence results. We provide
numerical evidence that SQMC may significantly outperform SMC in practical
scenarios.
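To make the stated requirement concrete, here is a one-dimensional sketch of a single SQMC step, assuming scipy's qmc module. In one dimension the Hilbert-curve ordering of the general algorithm reduces to a plain sort; the first coordinate of each scrambled Sobol point drives resampling and the second drives propagation, written as a deterministic function of the ancestor and one uniform variate. The transition model in the usage lines is a toy assumption.

```python
import numpy as np
from scipy.stats import qmc

def sqmc_step(x_prev, logw, propagate, seed=0):
    """One step of a one-dimensional SQMC filter.

    x_prev: (N,) particle positions at time t-1
    logw:   (N,) log-weights
    propagate: deterministic map (x, u) -> x_new, with u ~ U(0,1)
    """
    N = len(x_prev)
    # Scrambled Sobol points in [0,1)^2: coordinate 0 drives resampling,
    # coordinate 1 drives propagation.
    u = qmc.Sobol(d=2, scramble=True, seed=seed).random(N)

    # In one dimension the Hilbert sort of SQMC reduces to sorting particles.
    order = np.argsort(x_prev)
    x_sorted = x_prev[order]
    w = np.exp(logw - logw.max())
    w_sorted = w[order] / w[order].sum()

    # QMC resampling: invert the CDF of the sorted weights at the first
    # QMC coordinate, keeping each point paired with its second coordinate.
    cdf = np.cumsum(w_sorted)
    idx = np.minimum(np.searchsorted(cdf, u[:, 0]), N - 1)
    ancestors = x_sorted[idx]

    # Propagation as a deterministic function of the ancestor and one
    # uniform variate -- the key requirement stated in the abstract.
    return propagate(ancestors, u[:, 1])

# Hypothetical usage: a Gaussian random-walk transition via the inverse CDF.
from scipy.stats import norm
x = np.random.randn(128)
logw = -0.5 * x**2
x_new = sqmc_step(x, logw, lambda x, u: x + norm.ppf(u))
```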
Regularized fitted Q-iteration: application to planning
We consider planning in a Markovian decision problem, i.e., the problem of finding a good policy given access to a generative model of the environment. We propose to use fitted Q-iteration with penalized (or regularized) least-squares regression as the regression subroutine to address the problem of controlling model complexity. The algorithm is presented in detail for the case when the function space is a reproducing kernel Hilbert space underlying a user-chosen kernel function. We derive bounds on the quality of the solution and argue that data-dependent penalties can lead to almost optimal performance. A simple example is used to illustrate the benefits of using a penalized procedure.
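A minimal sketch of the scheme, with scikit-learn's kernel ridge regression standing in for the penalized least-squares subroutine in an RKHS; the toy generative model, kernel choice, and penalty value are illustrative assumptions, not the paper's specification.

```python
import numpy as np
from sklearn.kernel_ridge import KernelRidge

def regularized_fitted_q_iteration(sample, n_actions, gamma=0.95,
                                   n_iters=50, alpha=1e-2):
    """Fitted Q-iteration with penalized least-squares regression.

    sample: array of transitions (s, a, r, s') from a generative model,
            with s, s' scalar states for simplicity.
    Returns one KernelRidge regressor per action, approximating Q(s, a).
    """
    s, a, r, s_next = (sample[:, 0:1], sample[:, 1].astype(int),
                       sample[:, 2], sample[:, 3:4])
    models = [None] * n_actions
    q_next = np.zeros((len(sample), n_actions))   # start from Q = 0
    for _ in range(n_iters):
        # Regression targets: one-step Bellman backups.
        y = r + gamma * q_next.max(axis=1)
        for act in range(n_actions):
            mask = a == act
            # The ridge penalty (alpha) controls model complexity.
            models[act] = KernelRidge(kernel="rbf", alpha=alpha).fit(
                s[mask], y[mask])
        q_next = np.column_stack([m.predict(s_next) for m in models])
    return models

# Hypothetical usage on a toy chain: action 1 moves right, reward at s > 1.
rng = np.random.default_rng(0)
s0 = rng.uniform(-2, 2, size=1000)
a0 = rng.integers(0, 2, size=1000)
s1 = s0 + np.where(a0 == 1, 0.5, -0.5) + 0.1 * rng.standard_normal(1000)
sample = np.column_stack([s0, a0, (s1 > 1).astype(float), s1])
models = regularized_fitted_q_iteration(sample, n_actions=2)
```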
The velocity distribution of nearby stars from Hipparcos data I. The significance of the moving groups
We present a three-dimensional reconstruction of the velocity distribution of
nearby stars (<~ 100 pc) using a maximum likelihood density estimation
technique applied to the two-dimensional tangential velocities of stars. The
underlying distribution is modeled as a mixture of Gaussian components. The
algorithm reconstructs the error-deconvolved distribution function, even when
the individual stars have unique error and missing-data properties. We apply
this technique to the tangential velocity measurements from a kinematically
unbiased sample of 11,865 main sequence stars observed by the Hipparcos
satellite. We explore various methods for validating the complexity of the
resulting velocity distribution function, including criteria based on Bayesian
model selection and how accurately our reconstruction predicts the radial
velocities of a sample of stars from the Geneva-Copenhagen survey (GCS). Using
this very conservative external validation test based on the GCS, we find that
there is little evidence for structure in the distribution function beyond the
moving groups established prior to the Hipparcos mission. This is in sharp
contrast with internal tests performed here and in previous analyses, which
point consistently to maximal structure in the velocity distribution. We
quantify the information content of the radial velocity measurements and find
that the mean amount of new information gained from a radial velocity
measurement of a single star is significant. This argues for complementary
radial velocity surveys to upcoming astrometric surveys.
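The error-deconvolved mixture fit described above admits a closed-form EM algorithm in which each star carries its own Gaussian error covariance (the "extreme deconvolution" approach). A minimal numpy sketch of one EM pass follows; it ignores the missing-data projections the full method also handles, and the variable names are assumptions.

```python
import numpy as np

def xd_em_step(w, S, alpha, m, V):
    """One EM step for a Gaussian mixture seen through per-point errors:
        w_i = v_i + noise_i,  noise_i ~ N(0, S_i),
        v_i ~ sum_k alpha_k N(m_k, V_k).
    w: (n, d) observations; S: (n, d, d) error covariances;
    alpha: (K,); m: (K, d); V: (K, d, d)."""
    n, d = w.shape
    K = len(alpha)
    q = np.empty((n, K))
    b = np.empty((n, K, d))
    B = np.empty((n, K, d, d))
    for k in range(K):
        T = V[k] + S                                  # (n, d, d)
        Tinv = np.linalg.inv(T)
        diff = w - m[k]
        maha = np.einsum('ni,nij,nj->n', diff, Tinv, diff)
        logdet = np.linalg.slogdet(T)[1]
        q[:, k] = (np.log(alpha[k])
                   - 0.5 * (maha + logdet + d * np.log(2 * np.pi)))
        gain = np.einsum('ij,njk->nik', V[k], Tinv)   # V_k T^{-1}
        b[:, k] = m[k] + np.einsum('nij,nj->ni', gain, diff)
        B[:, k] = V[k] - np.einsum('nij,jk->nik', gain, V[k])
    # Responsibilities (E-step).
    q = np.exp(q - q.max(axis=1, keepdims=True))
    q /= q.sum(axis=1, keepdims=True)

    # M-step: closed-form updates of the deconvolved mixture parameters.
    qk = q.sum(axis=0)
    alpha_new = qk / n
    m_new = np.einsum('nk,nkd->kd', q, b) / qk[:, None]
    r = b - m_new[None, :, :]
    V_new = (np.einsum('nk,nkij->kij', q,
                       np.einsum('nki,nkj->nkij', r, r) + B)
             / qk[:, None, None])
    return alpha_new, m_new, V_new
```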
Deep Reinforcement Learning: An Overview
In recent years, a machine learning method called deep learning has gained
enormous attention, as it has obtained astonishing results in broad
applications such as pattern recognition, speech recognition, computer vision,
and natural language processing. Recent research has also shown that deep
learning techniques can be combined with reinforcement learning methods to
learn useful representations for problems with high-dimensional raw data
input. This chapter reviews the recent advances in deep reinforcement learning,
with a focus on the most used deep architectures, such as autoencoders,
convolutional neural networks, and recurrent neural networks, which have been
successfully combined with the reinforcement learning framework.
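As a minimal illustration of combining a deep architecture with reinforcement learning, the sketch below performs a DQN-style temporal-difference update on a small fully connected Q-network. The environment stand-in and hyperparameters are assumptions; practical systems add experience replay and a separate target network.

```python
import torch
import torch.nn as nn

# Small fully connected Q-network: raw observation -> one Q-value per action.
q_net = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 2))
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)
gamma = 0.99

def td_update(s, a, r, s_next, done):
    """One DQN-style temporal-difference update on a batch of transitions."""
    q_sa = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        # Bootstrapped target; a separate target network would go here.
        target = r + gamma * (1 - done) * q_net(s_next).max(dim=1).values
    loss = nn.functional.mse_loss(q_sa, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Hypothetical usage with random transitions standing in for an environment.
s, s2 = torch.randn(32, 4), torch.randn(32, 4)
a = torch.randint(0, 2, (32,))
r, done = torch.randn(32), torch.zeros(32)
td_update(s, a, r, s2, done)
```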
Primer on using neural networks for forecasting market variables
The ability to forecast market variables is critical to analysts, economists, and investors. Neural networks, which are used across many disciplines to map complex relationships, are gaining popularity as a tool for forecasting market variables.
We present a primer on using neural networks to forecast market variables in general and, in particular, to forecast the volatility of S&P 500 Index futures prices. We compare volatility forecasts from neural networks with implied volatility from S&P 500 Index futures options, computed using the Barone-Adesi and Whaley (BAW) model for pricing American options on futures. Forecasts from neural networks outperform implied volatility forecasts. Volatility forecasts from neural networks are not found to be significantly different from realized volatility, whereas implied volatility forecasts are significantly different from realized volatility in two of three cases.
A revised version of this paper has since been published in the Journal of Business Research; please cite that version: Hamid, S. A., & Iqbal, Z. (2004). Using Neural Networks for Forecasting Volatility of S&P 500 Index Futures Prices. Journal of Business Research, 57(10), 1116-1125.
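A minimal sketch of the kind of network the primer describes: a small multilayer perceptron mapping lagged realized-volatility features to next-period volatility. The synthetic data, feature construction, and hyperparameters here are illustrative assumptions, not the specification used in the paper.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)

# Toy stand-in for futures returns; real work would use S&P 500 futures data.
returns = 0.01 * rng.standard_normal(2000)

# Realized volatility over rolling 20-day windows.
window = 20
rv = np.array([returns[i:i + window].std()
               for i in range(len(returns) - window)])

# Features: the last 5 realized-volatility values; target: the next value.
lags = 5
X = np.column_stack([rv[i:len(rv) - lags + i] for i in range(lags)])
y = rv[lags:]

# Small MLP as the forecasting model; evaluate on a held-out tail.
model = MLPRegressor(hidden_layer_sizes=(16,), max_iter=2000, random_state=0)
split = int(0.8 * len(y))
model.fit(X[:split], y[:split])
print("out-of-sample R^2:", model.score(X[split:], y[split:]))
```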
Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path
We consider the problem of finding a near-optimal policy in continuous-space, discounted Markovian decision problems given the trajectory of some behaviour policy. We study the policy iteration algorithm where, in successive iterations, the action-value functions of the intermediate policies are obtained by picking a function from some fixed function set (chosen by the user) that minimizes an unbiased finite-sample approximation to a novel loss function that upper-bounds the unmodified Bellman-residual criterion. The main result is a finite-sample, high-probability bound on the performance of the resulting policy that depends on the mixing rate of the trajectory, the capacity of the function set as measured by a novel capacity concept that we call the VC-crossing dimension, the approximation power of the function set, and the discounted-average concentrability of the future-state distribution. To the best of our knowledge, this is the first theoretical reinforcement learning result for off-policy control learning over continuous state spaces using a single trajectory.
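One common way to build a Bellman-residual loss that can be estimated without double sampling is to subtract from the squared empirical residual of a candidate Q the squared residual of an auxiliary function h fit to the same bootstrapped targets, which removes the variance bias that plain Bellman-residual minimization suffers from on a single sample path. The sketch below computes a loss of this modified form with linear function approximation; the feature map, the closed-form fit of h, and the random data are illustrative assumptions.

```python
import numpy as np

def modified_bellman_loss(theta, phi_sa, phi_s2pi, r, gamma=0.95):
    """Empirical modified Bellman-residual loss for policy evaluation.

    theta:    weights of the candidate linear Q-function
    phi_sa:   (n, d) features of observed (s, a) pairs from the trajectory
    phi_s2pi: (n, d) features of (s', pi(s')) for the evaluated policy pi
    r:        (n,) observed rewards
    """
    q = phi_sa @ theta
    target = r + gamma * (phi_s2pi @ theta)
    # Auxiliary h: least-squares projection of the target onto the features;
    # subtracting its residual removes the variance bias of the plain
    # squared Bellman residual.
    w, *_ = np.linalg.lstsq(phi_sa, target, rcond=None)
    h = phi_sa @ w
    return np.mean((q - target) ** 2) - np.mean((h - target) ** 2)

# Hypothetical usage: random features standing in for a real trajectory.
rng = np.random.default_rng(0)
n, d = 200, 8
phi_sa, phi_s2pi = rng.standard_normal((n, d)), rng.standard_normal((n, d))
r = rng.standard_normal(n)
print(modified_bellman_loss(rng.standard_normal(d), phi_sa, phi_s2pi, r))
```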