A Generalization Error for Q-Learning
Planning problems that involve learning a policy from a single training set of finite horizon trajectories arise in both social science and medical fields. We consider Q-learning with function approximation for this setting and derive an upper bound on the generalization error. This upper bound is in terms of quantities minimized by a Q-learning algorithm, the complexity of the approximation space, and an approximation term due to the mismatch between Q-learning and the goal of learning a policy that maximizes the value function.
Funding: National Institutes of Health (NIDA grants K02 DA15674 and P50 DA 10075 to the Methodology Center). Peer reviewed.
http://deepblue.lib.umich.edu/bitstream/2027.42/57425/2/murphy05a.pd
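The setting this abstract describes, batch Q-learning with function approximation over a fixed set of finite horizon trajectories, can be made concrete with a short sketch. The backward recursion below is illustrative only, not the paper's estimator; the trajectory layout and the `featurize` helper are assumed conventions for the example.

    import numpy as np

    def fitted_q_learning(trajectories, n_actions, horizon, featurize):
        # trajectories: list of length-`horizon` lists of (state, action, reward).
        # featurize(state, action) -> feature vector (hypothetical helper).
        dim = featurize(trajectories[0][0][0], 0).shape[0]
        weights = [np.zeros(dim) for _ in range(horizon)]
        for t in reversed(range(horizon)):
            X, y = [], []
            for traj in trajectories:
                s, a, r = traj[t]
                target = r
                if t + 1 < horizon:
                    s_next = traj[t + 1][0]
                    # Bootstrap from the already-fitted later stage.
                    target += max(featurize(s_next, b) @ weights[t + 1]
                                  for b in range(n_actions))
                X.append(featurize(s, a))
                y.append(target)
            # Least-squares fit at stage t: the kind of empirical quantity
            # that the generalization bound is expressed in terms of.
            beta, *_ = np.linalg.lstsq(np.asarray(X), np.asarray(y), rcond=None)
            weights[t] = beta
        return weights

The learned policy at stage t then picks the action maximizing featurize(s, a) @ weights[t].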
Does generalization performance of $l^q$ regularization learning depend on $q$? A negative example
$l^q$-regularization has been demonstrated to be an attractive technique in machine learning and statistical modeling. It attempts to improve the generalization (prediction) capability of a machine (model) by appropriately shrinking its coefficients. The shape of an $l^q$ estimator differs for varying choices of the regularization order $q$. In particular, $q = 1$ leads to the LASSO estimate, while $q = 2$ corresponds to smooth ridge regression. This makes the order $q$ a potential tuning parameter in applications. To facilitate the use of $l^q$-regularization, we seek a modeling strategy in which an elaborate selection of $q$ can be avoided. In this spirit, we place our investigation within a general framework of $l^q$-regularized kernel learning under a sample dependent hypothesis space (SDHS). For a designated class of kernel functions, we show that all $l^q$ estimators for $0 < q < \infty$ attain similar generalization error bounds. These bounds are almost optimal in the sense that, up to a logarithmic factor, the upper and lower bounds are asymptotically identical. This finding tentatively reveals that, in some modeling contexts, the choice of $q$ might not have a strong impact on the generalization capability. From this perspective, $q$ can be specified arbitrarily, or specified merely by criteria other than generalization, such as smoothness, computational complexity, or sparsity.
Comment: 35 pages, 3 figures
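A minimal sketch of what an $l^q$-regularized kernel estimator in a sample dependent hypothesis space looks like, assuming a kernel expansion over the training inputs and squared loss; the solver choice (Nelder-Mead, which tolerates the nonsmooth penalty for $q \le 1$ on small problems) is an illustrative assumption, not the paper's method.

    import numpy as np
    from scipy.optimize import minimize

    def lq_kernel_estimator(X, y, kernel, q=1.0, lam=0.1):
        # SDHS hypothesis: f(x) = sum_i alpha_i * K(x, x_i), penalized by
        # lam * sum_i |alpha_i|^q.  q = 1 gives a LASSO-type estimate,
        # q = 2 a ridge-type one.
        K = np.array([[kernel(a, b) for b in X] for a in X])
        n = len(y)

        def objective(alpha):
            resid = K @ alpha - y
            return resid @ resid / n + lam * np.sum(np.abs(alpha) ** q)

        alpha = minimize(objective, np.zeros(n), method="Nelder-Mead",
                         options={"maxiter": 20000, "xatol": 1e-6}).x
        return lambda x: sum(a * kernel(x, xi) for a, xi in zip(alpha, X))

    # Usage sketch: gaussian = lambda a, b: np.exp(-np.sum((a - b) ** 2))
    #               f_hat = lq_kernel_estimator(X_train, y_train, gaussian, q=0.5)

The paper's point is that, for a suitable kernel class, the generalization error bounds attained here are essentially the same for any $q$ in $(0, \infty)$.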
Q-learning with censored data
We develop methodology for a multistage decision problem with a flexible number
of stages in which the rewards are survival times that are subject to
censoring. We present a novel Q-learning algorithm that is adjusted for
censored data and allows a flexible number of stages. We provide finite sample
bounds on the generalization error of the policy learned by the algorithm, and
show that when the optimal Q-function belongs to the approximation space, the
expected survival time for policies obtained by the algorithm converges to that
of the optimal policy. We simulate a multistage clinical trial with a flexible number of stages and apply the proposed censored-Q-learning algorithm to find
individualized treatment regimens. The methodology presented in this paper has
implications in the design of personalized medicine trials in cancer and in
other life-threatening diseases.
Comment: Published at http://dx.doi.org/10.1214/12-AOS968 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org).
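The paper's censoring adjustment is specific to its estimator; as a rough illustration of the general idea, here is a sketch of one weighted Q-regression stage using inverse-probability-of-censoring weighting (IPCW). The weighting scheme and the `censor_survival` callable (e.g., a Kaplan-Meier fit to the censoring distribution) are assumptions made for this example, not the authors' construction.

    import numpy as np

    def ipcw_weights(followup, event, censor_survival):
        # Weights delta_i / S_C(U_i): followup is the observed time
        # U_i = min(T_i, C_i); event is 1 if the survival time was observed.
        # censor_survival(u) estimates P(C > u) (hypothetical callable).
        return event / np.maximum(censor_survival(followup), 1e-8)

    def censored_q_stage(features, rewards, w):
        # Weighted least-squares Q-regression: only uncensored subjects
        # contribute, re-weighted to represent the full sample.
        sw = np.sqrt(w)
        beta, *_ = np.linalg.lstsq(features * sw[:, None], rewards * sw,
                                   rcond=None)
        return beta

Repeating such a stage backward across a flexible number of decision points gives a censoring-adjusted analogue of the fitted Q-learning recursion.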
Lifting the Veil: Unlocking the Power of Depth in Q-learning
With the help of massive data and rich computational resources, deep
Q-learning has been widely used in operations research and management science
and has contributed to great success in numerous applications, including
recommender systems, supply chains, games, and robotic manipulation. However,
the success of deep Q-learning lacks solid theoretical verification and
interpretability. The aim of this paper is to theoretically verify the power of
depth in deep Q-learning. Within the framework of statistical learning theory,
we rigorously prove that deep Q-learning outperforms its traditional version by
demonstrating its good generalization error bound. Our results reveal that the
main reason for the success of deep Q-learning is the excellent performance of
deep neural networks (deep nets) in capturing the special properties of rewards, namely spatial sparseness and piecewise constancy, rather than their large
capacities. In this paper, we make fundamental contributions to the field of
reinforcement learning by answering the following three questions: Why does
deep Q-learning perform so well? When does deep Q-learning perform better than
traditional Q-learning? How many samples are required to achieve a specific
prediction accuracy for deep Q-learning? Our theoretical assertions are
verified by applying deep Q-learning in the well-known beer game in supply
chain management and a simulated recommender system.
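For concreteness, here is a generic deep Q-learning update of the kind the paper analyzes, written with PyTorch; the fully connected ReLU architecture and the hyperparameters are placeholder choices, and the paper's beer-game and recommender experiments would wrap a step like this in their own environments.

    import torch
    import torch.nn as nn

    def make_q_net(state_dim, n_actions, width=64, depth=3):
        # Depth is the object of the paper's analysis: stacking layers lets
        # the network capture spatially sparse, piecewise constant rewards.
        layers, d = [], state_dim
        for _ in range(depth):
            layers += [nn.Linear(d, width), nn.ReLU()]
            d = width
        layers.append(nn.Linear(d, n_actions))
        return nn.Sequential(*layers)

    def dqn_step(q_net, target_net, optimizer, batch, gamma=0.99):
        # One temporal-difference update on a replay batch (s, a, r, s', done);
        # a is int64, done is 0/1 float, all torch tensors.
        s, a, r, s_next, done = batch
        q_sa = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
        with torch.no_grad():
            target = r + gamma * (1 - done) * target_net(s_next).max(dim=1).values
        loss = nn.functional.mse_loss(q_sa, target)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()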
Statistical Mechanics of Soft Margin Classifiers
We study the typical learning properties of the recently introduced Soft
Margin Classifiers (SMCs), learning realizable and unrealizable tasks, with the
tools of Statistical Mechanics. We derive analytically the behaviour of the
learning curves in the regime of very large training sets. We obtain
exponential and power laws for the decay of the generalization error towards
the asymptotic value, depending on the task and on general characteristics of
the distribution of stabilities of the patterns to be learned. The optimal
learning curves of the SMCs, which give the minimal generalization error, are
obtained by tuning the coefficient controlling the trade-off between the error
and the regularization terms in the cost function. If the task is realizable by
the SMC, the optimal performance is better than that of a hard margin Support
Vector Machine and is very close to that of a Bayesian classifier.
Comment: 26 pages, 12 figures, submitted to Physical Review
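As a reference point for the object being analyzed, a soft margin classifier can be trained by subgradient descent on the usual hinge-loss cost; the coefficient C below plays the role of the error/regularization trade-off that the paper tunes to obtain the optimal learning curves. This plain-Python sketch is illustrative and is not the statistical mechanics calculation itself.

    import numpy as np

    def soft_margin_train(X, y, C=1.0, lr=0.01, epochs=200):
        # Subgradient descent on ||w||^2 / 2 + C * sum_i max(0, 1 - y_i (w.x_i + b)),
        # with labels y in {-1, +1}.
        n, d = X.shape
        w, b = np.zeros(d), 0.0
        for _ in range(epochs):
            margins = y * (X @ w + b)
            viol = margins < 1            # patterns inside or beyond the margin
            grad_w = w - C * (y[viol][:, None] * X[viol]).sum(axis=0)
            grad_b = -C * y[viol].sum()
            w -= lr * grad_w
            b -= lr * grad_b
        return w, b

Large C forces the classifier toward a hard margin solution; smaller C trades training errors for a larger margin, which is where the soft margin variant can outperform the hard margin SVM on unrealizable tasks.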
Statistical Mechanics of Nonlinear On-line Learning for Ensemble Teachers
We analyze the generalization performance of a student in a model composed of
nonlinear perceptrons: a true teacher, ensemble teachers, and the student. We
calculate the generalization error of the student analytically or numerically
using statistical mechanics in the framework of on-line learning. We treat two
well-known learning rules: Hebbian learning and perceptron learning. As a
result, it is proven that the nonlinear model shows qualitatively different
behaviors from the linear model. Moreover, it is clarified that Hebbian
learning and perceptron learning show qualitatively different behaviors from
each other. In Hebbian learning, we can analytically obtain the solutions. In
this case, the generalization error monotonically decreases. The steady value
of the generalization error is independent of the learning rate. The larger the number of teachers and the more varied the ensemble of teachers, the smaller the generalization error. In perceptron learning, we have to
numerically obtain the solutions. In this case, the dynamical behaviors of the
generalization error are non-monotonic. The smaller the learning rate, the larger the number of teachers, and the more varied the ensemble of teachers, the smaller the minimum value of the generalization error.
Comment: 13 pages, 9 figures
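The two on-line rules compared in the abstract are easy to state in code. The sketch below uses assumed conventions (Gaussian inputs, unit-norm teacher vectors, a uniformly chosen ensemble teacher labeling each fresh example) and omits the paper's true teacher and nonlinear output functions for brevity.

    import numpy as np

    def online_ensemble_learning(teachers, dim, steps, eta, rule="hebbian",
                                 rng=None):
        # Student vector J learns on-line from K ensemble teachers B_k.
        # Hebbian learning updates on every example; perceptron learning
        # updates only when the student disagrees with the teacher's label.
        rng = rng or np.random.default_rng(0)
        J = np.zeros(dim)
        for _ in range(steps):
            x = rng.standard_normal(dim)              # fresh random input
            B = teachers[rng.integers(len(teachers))]  # random ensemble teacher
            label = np.sign(B @ x)
            if rule == "hebbian" or np.sign(J @ x) != label:
                J += (eta / dim) * label * x
        return J

Tracking the student-teacher overlaps along such runs is what yields the monotonic (Hebbian) versus non-monotonic (perceptron) generalization error curves described above.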