A Generalization Error for Q-Learning
Planning problems that involve learning a policy from a single training set of finite horizon trajectories arise in both social science and medical fields. We consider Q-learning with function approximation for this setting and derive an upper bound on the generalization error. This upper bound is in terms of quantities minimized by a Q-learning algorithm, the complexity of the approximation space, and an approximation term due to the mismatch between Q-learning and the goal of learning a policy that maximizes the value function.
Funding: National Institutes of Health (NIDA grants K02 DA15674 and P50 DA 10075 to the Methodology Center). Peer reviewed.
http://deepblue.lib.umich.edu/bitstream/2027.42/57425/2/murphy05a.pd
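The setting this abstract describes, batch Q-learning with function approximation over a fixed set of finite horizon trajectories, can be made concrete with a short sketch. The backward recursion below is illustrative only, not the paper's estimator; the trajectory layout and the `featurize` helper are assumed conventions for the example.

    import numpy as np

    def fitted_q_learning(trajectories, n_actions, horizon, featurize):
        # trajectories: list of length-`horizon` lists of (state, action, reward).
        # featurize(state, action) -> feature vector (hypothetical helper).
        dim = featurize(trajectories[0][0][0], 0).shape[0]
        weights = [np.zeros(dim) for _ in range(horizon)]
        for t in reversed(range(horizon)):
            X, y = [], []
            for traj in trajectories:
                s, a, r = traj[t]
                target = r
                if t + 1 < horizon:
                    s_next = traj[t + 1][0]
                    # Bootstrap from the already-fitted later stage.
                    target += max(featurize(s_next, b) @ weights[t + 1]
                                  for b in range(n_actions))
                X.append(featurize(s, a))
                y.append(target)
            # Least-squares fit at stage t: the kind of empirical quantity
            # that the generalization bound is expressed in terms of.
            beta, *_ = np.linalg.lstsq(np.asarray(X), np.asarray(y), rcond=None)
            weights[t] = beta
        return weights

The learned policy at stage t then picks the action maximizing featurize(s, a) @ weights[t].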
Does generalization performance of $l^q$ regularization learning depend on $q$? A negative example
$l^q$-regularization has been demonstrated to be an attractive technique in machine learning and statistical modeling. It attempts to improve the generalization (prediction) capability of a machine (model) by appropriately shrinking its coefficients. The shape of an $l^q$ estimator differs for varying choices of the regularization order $q$. In particular, $q = 1$ leads to the LASSO estimate, while $q = 2$ corresponds to smooth ridge regression. This makes the order $q$ a potential tuning parameter in applications. To facilitate the use of $l^q$-regularization, we seek a modeling strategy in which an elaborate selection of $q$ can be avoided. In this spirit, we place our investigation within a general framework of $l^q$-regularized kernel learning under a sample dependent hypothesis space (SDHS). For a designated class of kernel functions, we show that all $l^q$ estimators for $0 < q < \infty$ attain similar generalization error bounds. These bounds are almost optimal in the sense that, up to a logarithmic factor, the upper and lower bounds are asymptotically identical. This finding tentatively reveals that, in some modeling contexts, the choice of $q$ might not have a strong impact on the generalization capability. From this perspective, $q$ can be specified arbitrarily, or specified merely by criteria other than generalization, such as smoothness, computational complexity, or sparsity.
Comment: 35 pages, 3 figures
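A minimal sketch of what an $l^q$-regularized kernel estimator in a sample dependent hypothesis space looks like, assuming a kernel expansion over the training inputs and squared loss; the solver choice (Nelder-Mead, which tolerates the nonsmooth penalty for $q \le 1$ on small problems) is an illustrative assumption, not the paper's method.

    import numpy as np
    from scipy.optimize import minimize

    def lq_kernel_estimator(X, y, kernel, q=1.0, lam=0.1):
        # SDHS hypothesis: f(x) = sum_i alpha_i * K(x, x_i), penalized by
        # lam * sum_i |alpha_i|^q.  q = 1 gives a LASSO-type estimate,
        # q = 2 a ridge-type one.
        K = np.array([[kernel(a, b) for b in X] for a in X])
        n = len(y)

        def objective(alpha):
            resid = K @ alpha - y
            return resid @ resid / n + lam * np.sum(np.abs(alpha) ** q)

        alpha = minimize(objective, np.zeros(n), method="Nelder-Mead",
                         options={"maxiter": 20000, "xatol": 1e-6}).x
        return lambda x: sum(a * kernel(x, xi) for a, xi in zip(alpha, X))

    # Usage sketch: gaussian = lambda a, b: np.exp(-np.sum((a - b) ** 2))
    #               f_hat = lq_kernel_estimator(X_train, y_train, gaussian, q=0.5)

The paper's point is that, for a suitable kernel class, the generalization error bounds attained here are essentially the same for any $q$ in $(0, \infty)$.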
Q-learning with censored data
We develop methodology for a multistage decision problem with a flexible number
of stages in which the rewards are survival times that are subject to
censoring. We present a novel Q-learning algorithm that is adjusted for
censored data and allows a flexible number of stages. We provide finite sample
bounds on the generalization error of the policy learned by the algorithm, and
show that when the optimal Q-function belongs to the approximation space, the
expected survival time for policies obtained by the algorithm converges to that
of the optimal policy. We simulate a multistage clinical trial with a flexible number of stages and apply the proposed censored-Q-learning algorithm to find
individualized treatment regimens. The methodology presented in this paper has
implications in the design of personalized medicine trials in cancer and in
other life-threatening diseases.
Comment: Published at http://dx.doi.org/10.1214/12-AOS968 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org).
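The paper's censoring adjustment is specific to its estimator; as a rough illustration of the general idea, here is a sketch of one weighted Q-regression stage using inverse-probability-of-censoring weighting (IPCW). The weighting scheme and the `censor_survival` callable (e.g., a Kaplan-Meier fit to the censoring distribution) are assumptions made for this example, not the authors' construction.

    import numpy as np

    def ipcw_weights(followup, event, censor_survival):
        # Weights delta_i / S_C(U_i): followup is the observed time
        # U_i = min(T_i, C_i); event is 1 if the survival time was observed.
        # censor_survival(u) estimates P(C > u) (hypothetical callable).
        return event / np.maximum(censor_survival(followup), 1e-8)

    def censored_q_stage(features, rewards, w):
        # Weighted least-squares Q-regression: only uncensored subjects
        # contribute, re-weighted to represent the full sample.
        sw = np.sqrt(w)
        beta, *_ = np.linalg.lstsq(features * sw[:, None], rewards * sw,
                                   rcond=None)
        return beta

Repeating such a stage backward across a flexible number of decision points gives a censoring-adjusted analogue of the fitted Q-learning recursion.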
Lifting the Veil: Unlocking the Power of Depth in Q-learning
With the help of massive data and rich computational resources, deep
Q-learning has been widely used in operations research and management science
and has contributed to great success in numerous applications, including
recommender systems, supply chains, games, and robotic manipulation. However,
the success of deep Q-learning lacks solid theoretical verification and
interpretability. The aim of this paper is to theoretically verify the power of
depth in deep Q-learning. Within the framework of statistical learning theory,
we rigorously prove that deep Q-learning outperforms its traditional version by
demonstrating its good generalization error bound. Our results reveal that the
main reason for the success of deep Q-learning is the excellent performance of
deep neural networks (deep nets) in capturing the special properties of rewards, namely spatial sparseness and piecewise constancy, rather than their large
capacities. In this paper, we make fundamental contributions to the field of
reinforcement learning by answering the following three questions: Why does
deep Q-learning perform so well? When does deep Q-learning perform better than
traditional Q-learning? How many samples are required to achieve a specific
prediction accuracy for deep Q-learning? Our theoretical assertions are
verified by applying deep Q-learning in the well-known beer game in supply
chain management and a simulated recommender system.
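For concreteness, here is a generic deep Q-learning update of the kind the paper analyzes, written with PyTorch; the fully connected ReLU architecture and the hyperparameters are placeholder choices, and the paper's beer-game and recommender experiments would wrap a step like this in their own environments.

    import torch
    import torch.nn as nn

    def make_q_net(state_dim, n_actions, width=64, depth=3):
        # Depth is the object of the paper's analysis: stacking layers lets
        # the network capture spatially sparse, piecewise constant rewards.
        layers, d = [], state_dim
        for _ in range(depth):
            layers += [nn.Linear(d, width), nn.ReLU()]
            d = width
        layers.append(nn.Linear(d, n_actions))
        return nn.Sequential(*layers)

    def dqn_step(q_net, target_net, optimizer, batch, gamma=0.99):
        # One temporal-difference update on a replay batch (s, a, r, s', done);
        # a is int64, done is 0/1 float, all torch tensors.
        s, a, r, s_next, done = batch
        q_sa = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
        with torch.no_grad():
            target = r + gamma * (1 - done) * target_net(s_next).max(dim=1).values
        loss = nn.functional.mse_loss(q_sa, target)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()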
Statistical Mechanics of Soft Margin Classifiers
We study the typical learning properties of the recently introduced Soft
Margin Classifiers (SMCs), learning realizable and unrealizable tasks, with the
tools of Statistical Mechanics. We derive analytically the behaviour of the
learning curves in the regime of very large training sets. We obtain
exponential and power laws for the decay of the generalization error towards
the asymptotic value, depending on the task and on general characteristics of
the distribution of stabilities of the patterns to be learned. The optimal
learning curves of the SMCs, which give the minimal generalization error, are
obtained by tuning the coefficient controlling the trade-off between the error
and the regularization terms in the cost function. If the task is realizable by
the SMC, the optimal performance is better than that of a hard margin Support
Vector Machine and is very close to that of a Bayesian classifier.
Comment: 26 pages, 12 figures, submitted to Physical Review
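As a reference point for the object being analyzed, a soft margin classifier can be trained by subgradient descent on the usual hinge-loss cost; the coefficient C below plays the role of the error/regularization trade-off that the paper tunes to obtain the optimal learning curves. This plain-Python sketch is illustrative and is not the statistical mechanics calculation itself.

    import numpy as np

    def soft_margin_train(X, y, C=1.0, lr=0.01, epochs=200):
        # Subgradient descent on ||w||^2 / 2 + C * sum_i max(0, 1 - y_i (w.x_i + b)),
        # with labels y in {-1, +1}.
        n, d = X.shape
        w, b = np.zeros(d), 0.0
        for _ in range(epochs):
            margins = y * (X @ w + b)
            viol = margins < 1            # patterns inside or beyond the margin
            grad_w = w - C * (y[viol][:, None] * X[viol]).sum(axis=0)
            grad_b = -C * y[viol].sum()
            w -= lr * grad_w
            b -= lr * grad_b
        return w, b

Large C forces the classifier toward a hard margin solution; smaller C trades training errors for a larger margin, which is where the soft margin variant can outperform the hard margin SVM on unrealizable tasks.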
Statistical Mechanics of Nonlinear On-line Learning for Ensemble Teachers
We analyze the generalization performance of a student in a model composed of
nonlinear perceptrons: a true teacher, ensemble teachers, and the student. We
calculate the generalization error of the student analytically or numerically
using statistical mechanics in the framework of on-line learning. We treat two
well-known learning rules: Hebbian learning and perceptron learning. As a
result, it is proven that the nonlinear model shows qualitatively different
behaviors from the linear model. Moreover, it is clarified that Hebbian
learning and perceptron learning show qualitatively different behaviors from
each other. In Hebbian learning, we can analytically obtain the solutions. In
this case, the generalization error monotonically decreases. The steady value
of the generalization error is independent of the learning rate. The larger the number of teachers and the more varied the ensemble of teachers, the smaller the generalization error. In perceptron learning, we have to
numerically obtain the solutions. In this case, the dynamical behaviors of the
generalization error are non-monotonic. The smaller the learning rate, the larger the number of teachers, and the more varied the ensemble of teachers, the smaller the minimum value of the generalization error.
Comment: 13 pages, 9 figures
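The two on-line rules compared in the abstract are easy to state in code. The sketch below uses assumed conventions (Gaussian inputs, unit-norm teacher vectors, a uniformly chosen ensemble teacher labeling each fresh example) and omits the paper's true teacher and nonlinear output functions for brevity.

    import numpy as np

    def online_ensemble_learning(teachers, dim, steps, eta, rule="hebbian",
                                 rng=None):
        # Student vector J learns on-line from K ensemble teachers B_k.
        # Hebbian learning updates on every example; perceptron learning
        # updates only when the student disagrees with the teacher's label.
        rng = rng or np.random.default_rng(0)
        J = np.zeros(dim)
        for _ in range(steps):
            x = rng.standard_normal(dim)              # fresh random input
            B = teachers[rng.integers(len(teachers))]  # random ensemble teacher
            label = np.sign(B @ x)
            if rule == "hebbian" or np.sign(J @ x) != label:
                J += (eta / dim) * label * x
        return J

Tracking the student-teacher overlaps along such runs is what yields the monotonic (Hebbian) versus non-monotonic (perceptron) generalization error curves described above.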