813 research outputs found

Policy evaluation with temporal differences: a survey and comparison

Author: Dann C.
Neumann G.
Peters J.
Publication venue: Massachusetts Institute of Technology Press (MIT Press) / Microtome Publishing
Publication date: 01/01/2014

Policy evaluation is an essential step in most reinforcement learning approaches. It yields a value function, the quality assessment of states for a given policy, which can be used in a policy improvement step. Since the late 1980s, this research area has been dominated by temporal-difference (TD) methods due to their data-efficiency. However, core issues such as stability guarantees in the off-policy scenario, improved sample efficiency and probabilistic treatment of the uncertainty in the estimates have only been tackled recently, which has led to a large number of new approaches. This paper aims at making these new developments accessible in a concise overview, with foci on underlying cost functions, the off-policy scenario as well as on regularization in high dimensional feature spaces. By presenting the first extensive, systematic comparative evaluation of TD, LSTD, LSPE, FPKF, the residual-gradient algorithm, Bellman residual minimization, GTD, GTD2 and TDC, we shed light on the strengths and weaknesses of the methods. Moreover, we present alternative versions of LSTD and LSPE with drastically improved off-policy performance.
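
For readers unfamiliar with the TD methods named in this abstract, the following is a minimal illustrative sketch of tabular TD(0) policy evaluation. It is not taken from the paper: the random-walk environment, the function name `td0_evaluate` and all parameter values are assumptions chosen only to keep the example small. The policy being evaluated is a fixed uniform-random walk, and the value function is updated from each observed transition.

```python
import random


def td0_evaluate(n_states=5, episodes=2000, alpha=0.05, gamma=1.0, seed=0):
    """Tabular TD(0) on a hypothetical random-walk chain (illustrative only).

    States 0..n_states-1 lie on a line. The fixed policy steps left or right
    with equal probability. Leaving on the right yields reward 1, leaving on
    the left yields reward 0, and both exits end the episode.
    """
    rng = random.Random(seed)
    values = [0.0] * n_states
    for _ in range(episodes):
        state = n_states // 2  # every episode starts in the middle
        while True:
            next_state = state + rng.choice((-1, 1))
            if next_state < 0:
                reward, next_value, done = 0.0, 0.0, True
            elif next_state >= n_states:
                reward, next_value, done = 1.0, 0.0, True
            else:
                reward, next_value, done = 0.0, values[next_state], False
            # TD(0): move V(s) toward the one-step bootstrapped target.
            td_error = reward + gamma * next_value - values[state]
            values[state] += alpha * td_error
            if done:
                break
            state = next_state
    return values


if __name__ == "__main__":
    # For this chain the exact values are (i + 1) / (n_states + 1).
    print([round(v, 2) for v in td0_evaluate()])
```

With a constant step size the estimates keep fluctuating around the exact values rather than converging to them exactly, which is one reason the least-squares variants listed in the abstract are of interest.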

University of Lincoln Institutional Repository

TUbiblio

MPG.PuRe

On the Inversion of High Energy Proton

Author: Mieskolainen Mikael
Publication date: 01/01/2019

Inversion of the K-fold stochastic autoconvolution integral equation is an elementary nonlinear problem, yet there are no de facto methods to solve it with finite statistics. To fix this problem, we introduce a novel inverse algorithm based on a combination of minimization of relative entropy, the Fast Fourier Transform and a recursive version of Efron's bootstrap. This gives us power to obtain new perspectives on non-perturbative high energy QCD, such as probing the ab initio principles underlying the approximately negative binomial distributions of observed charged particle final state multiplicities, related to multiparton interactions, the fluctuating structure and profile of proton and diffraction. As a proof-of-concept, we apply the algorithm to ALICE proton-proton charged particle multiplicity measurements done at different center-of-mass energies and fiducial pseudorapidity intervals at the LHC, available on HEPData. A strong double peak structure emerges from the inversion, barely visible without it.

Comment: 29 pages, 10 figures, v2: extended analysis (re-projection ratios, 2D
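
As a rough illustration of the forward operation that the abstract proposes to invert, the sketch below computes a K-fold autoconvolution of a discrete probability distribution for a fixed K. This is an assumption-laden simplification and not the paper's algorithm: it uses direct summation instead of the Fast Fourier Transform, treats K as a fixed integer rather than a random quantity, and does not attempt the inversion. The function names `convolve` and `autoconvolve` are hypothetical.

```python
def convolve(p, q):
    """Discrete convolution of two finite sequences by direct summation."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, p_i in enumerate(p):
        for j, q_j in enumerate(q):
            out[i + j] += p_i * q_j
    return out


def autoconvolve(p, k):
    """Return the k-fold autoconvolution of p (p convolved with itself).

    If p is the distribution of a non-negative integer random variable, the
    result is the distribution of the sum of k independent copies of it.
    """
    if k < 1:
        raise ValueError("k must be a positive integer")
    result = list(p)
    for _ in range(k - 1):
        result = convolve(result, p)
    return result


if __name__ == "__main__":
    # Two fair coin flips: the number of heads is distributed 1/4, 1/2, 1/4.
    print(autoconvolve([0.5, 0.5], 2))
```

The inverse problem runs the other way: given a noisy estimate of the output, recover `p`, which is nonlinear in `p` for any K greater than 1.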

arXiv.org e-Print Archive

CERN Document Server

Adaptation and learning over networks for nonlinear system modeling

Author: Argyriou
Balakrishnan
Bouboulis
Bouboulis
Cattivelli
Cevher
Chen
Chen
Chen
Chen
Chen
Chen
Chen
Chen
Chouvardas
Di Lorenzo
Di Lorenzo
Evgeniou
Forero
Gao
Honeine
Honeine
Huang
Igelnik
Jin
Lazarevic
Li
Lopes
Mateos
Matta
Nassif
Nassif
Navia-Vazquez
Parreira
Predd
Rahimi
Richard
Rusu
Sandryhaila
Sayed
Sayed
Scardapane
Scardapane
Scardapane
Scardapane
Scarpiniti
Scarpiniti
Shin
Singh
Tsitsiklis
Yuan
Zhao
Zhao
Publication date: 28/04/2017

In this chapter, we analyze nonlinear filtering problems in distributed environments, e.g., sensor networks or peer-to-peer protocols. In these scenarios, the agents in the environment receive measurements in a streaming fashion, and they are required to estimate a common (nonlinear) model by alternating local computations and communications with their neighbors. We focus on the important distinction between single-task problems, where the underlying model is common to all agents, and multitask problems, where each agent might converge to a different model due to, e.g., spatial dependencies or other factors. Currently, most of the literature on distributed learning in the nonlinear case has focused on the single-task case, which may be a strong limitation in real-world scenarios. After introducing the problem and reviewing the existing approaches, we describe a simple kernel-based algorithm tailored for the multitask case. We evaluate the proposal on a simulated benchmark task, and we conclude by detailing currently open problems and lines of research.

Comment: To be published as a chapter in 'Adaptive Learning Methods for Nonlinear System Modeling', Elsevier Publishing, Eds. D. Comminiello and J.C. Principe (2018

arXiv.org e-Print Archive

Crossref

HAL-INSU