14 research outputs found

    Quantitative analysis of retinal vessel attenuation in eyes with retinitis pigmentosa

    No full text
    DOI: 10.1167/iovs.11-8596 · Investigative Ophthalmology and Visual Science 53(7): 4306–4314 (IOVS)

    Sample Complexity Bounds of Exploration

    No full text
    Abstract Efficient exploration is widely recognized as a fundamental challenge inherent in reinforcement learning. Algorithms that explore efficiently converge faster to near-optimal policies. While heuristic techniques are popular in practice, they lack formal guarantees and may not work well in general. This chapter studies algorithms with polynomial sample complexity of exploration, both model-based and model-free, in a unified manner. These so-called PAC-MDP algorithms behave near-optimally except in a “small” number of steps, with high probability. A new learning model known as KWIK is used to unify most existing model-based PAC-MDP algorithms for various subclasses of Markov decision processes. We also compare the sample-complexity framework to alternative formalizations of exploration efficiency, such as regret minimization and Bayes-optimal solutions.
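    To make the PAC-MDP idea concrete, the sketch below illustrates the optimism mechanism behind R-max, a canonical model-based PAC-MDP algorithm: state-action pairs visited fewer than m times are treated as "unknown" and assumed to pay the maximum reward, which drives the agent to visit them. The toy MDP, the parameter values, and all function names here are illustrative assumptions, not taken from the chapter.

    ```python
    import numpy as np

    def optimistic_q(counts, r_sum, t_counts, r_max, m, gamma=0.9, iters=100):
        """Value iteration on the empirical model; unknown (s, a) pairs
        are replaced by an optimistic value of r_max / (1 - gamma)."""
        S, A = counts.shape
        V = np.zeros(S)
        for _ in range(iters):
            Q = np.empty((S, A))
            for s in range(S):
                for a in range(A):
                    n = counts[s, a]
                    if n < m:  # "unknown": assume the best possible return
                        Q[s, a] = r_max / (1.0 - gamma)
                    else:      # "known": use empirical reward and transitions
                        r_hat = r_sum[s, a] / n
                        p_hat = t_counts[s, a] / n
                        Q[s, a] = r_hat + gamma * (p_hat @ V)
            V = Q.max(axis=1)
        return Q

    # Toy deterministic 2-state, 2-action MDP (an assumption for illustration):
    # action 1 in state 0 moves to state 1; action 0 in state 1 pays 0.5 and
    # stays; every other transition pays 0 and leads to state 0.
    def step(s, a):
        if s == 0 and a == 1:
            return 1, 0.0
        if s == 1 and a == 0:
            return 1, 0.5
        return 0, 0.0

    S, A, r_max, m = 2, 2, 1.0, 1
    counts = np.zeros((S, A))
    r_sum = np.zeros((S, A))
    t_counts = np.zeros((S, A, S))

    s = 0
    for _ in range(50):  # act greedily w.r.t. the optimistic model
        a = int(optimistic_q(counts, r_sum, t_counts, r_max, m).argmax(axis=1)[s])
        s2, r = step(s, a)
        counts[s, a] += 1
        r_sum[s, a] += r
        t_counts[s, a, s2] += 1
        s = s2

    # Optimism forces every state-action pair to be tried at least m times,
    # after which the agent settles into the rewarding loop at state 1.
    print(counts.min(), s)
    ```

    The "small number of steps" in the PAC-MDP guarantee corresponds to the steps spent on unknown pairs: once every pair has been tried m times, the empirical model is accurate enough that acting greedily on it is near-optimal.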