813 research outputs found
Policy evaluation with temporal differences: a survey and comparison
Policy evaluation is an essential step in most reinforcement learning approaches. It yields a value function, the quality assessment of states for a given policy, which can be used in a policy improvement step. Since the late 1980s, this research area has been dominated by temporal-difference (TD) methods due to their data-efficiency. However, core issues such as stability guarantees in the off-policy scenario, improved sample efficiency and probabilistic treatment of the uncertainty in the estimates have only been tackled recently, which has led to a large number of new approaches.
This paper aims at making these new developments accessible in a concise overview, with foci on underlying cost functions, the off-policy scenario as well as on regularization in high dimensional feature spaces. By presenting the first extensive, systematic comparative evaluations comparing TD, LSTD, LSPE, FPKF, the residual- gradient algorithm, Bellman residual minimization, GTD, GTD2 and TDC, we shed light on the strengths and weaknesses of the methods. Moreover, we present alternative versions of LSTD and LSPE with drastically improved off-policy performance
On the Inversion of High Energy Proton
Inversion of the K-fold stochastic autoconvolution integral equation is an
elementary nonlinear problem, yet there are no de facto methods to solve it
with finite statistics. To fix this problem, we introduce a novel inverse
algorithm based on a combination of minimization of relative entropy, the Fast
Fourier Transform and a recursive version of Efron's bootstrap. This gives us
power to obtain new perspectives on non-perturbative high energy QCD, such as
probing the ab initio principles underlying the approximately negative binomial
distributions of observed charged particle final state multiplicities, related
to multiparton interactions, the fluctuating structure and profile of proton
and diffraction. As a proof-of-concept, we apply the algorithm to ALICE
proton-proton charged particle multiplicity measurements done at different
center-of-mass energies and fiducial pseudorapidity intervals at the LHC,
available on HEPData. A strong double peak structure emerges from the
inversion, barely visible without it.Comment: 29 pages, 10 figures, v2: extended analysis (re-projection ratios,
2D
Adaptation and learning over networks for nonlinear system modeling
In this chapter, we analyze nonlinear filtering problems in distributed
environments, e.g., sensor networks or peer-to-peer protocols. In these
scenarios, the agents in the environment receive measurements in a streaming
fashion, and they are required to estimate a common (nonlinear) model by
alternating local computations and communications with their neighbors. We
focus on the important distinction between single-task problems, where the
underlying model is common to all agents, and multitask problems, where each
agent might converge to a different model due to, e.g., spatial dependencies or
other factors. Currently, most of the literature on distributed learning in the
nonlinear case has focused on the single-task case, which may be a strong
limitation in real-world scenarios. After introducing the problem and reviewing
the existing approaches, we describe a simple kernel-based algorithm tailored
for the multitask case. We evaluate the proposal on a simulated benchmark task,
and we conclude by detailing currently open problems and lines of research.Comment: To be published as a chapter in `Adaptive Learning Methods for
Nonlinear System Modeling', Elsevier Publishing, Eds. D. Comminiello and J.C.
Principe (2018
- …