
    Policy evaluation with temporal differences: a survey and comparison

    Policy evaluation is an essential step in most reinforcement learning approaches. It yields a value function, the quality assessment of states for a given policy, which can be used in a policy improvement step. Since the late 1980s, this research area has been dominated by temporal-difference (TD) methods due to their data efficiency. However, core issues such as stability guarantees in the off-policy scenario, improved sample efficiency, and a probabilistic treatment of the uncertainty in the estimates have only been tackled recently, which has led to a large number of new approaches. This paper aims to make these new developments accessible in a concise overview, with a focus on the underlying cost functions, the off-policy scenario, and regularization in high-dimensional feature spaces. By presenting the first extensive, systematic comparative evaluation of TD, LSTD, LSPE, FPKF, the residual-gradient algorithm, Bellman residual minimization, GTD, GTD2, and TDC, we shed light on the strengths and weaknesses of these methods. Moreover, we present alternative versions of LSTD and LSPE with drastically improved off-policy performance.
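    As context for the abstract above, the following is a minimal, illustrative sketch of tabular TD(0) policy evaluation, the basic per-sample method the survey builds on. It is not code from the paper; the environment interface (env.reset, env.step) and the policy function are hypothetical placeholders.

```python
import numpy as np

# Hypothetical sketch of tabular TD(0) policy evaluation (not from the paper).
# `env` is assumed to expose reset() -> state and step(action) -> (state, reward, done);
# `policy` maps a state to an action.

def td0_policy_evaluation(env, policy, n_states, episodes=500, alpha=0.1, gamma=0.99):
    """Estimate the value function V of a fixed policy with TD(0)."""
    V = np.zeros(n_states)
    for _ in range(episodes):
        s = env.reset()
        done = False
        while not done:
            s_next, r, done = env.step(policy(s))
            # TD error: bootstrapped one-step target minus the current estimate.
            td_error = r + gamma * V[s_next] * (not done) - V[s]
            V[s] += alpha * td_error
            s = s_next
    return V
```

    The least-squares and gradient-TD methods compared in the paper (LSTD, LSPE, GTD, GTD2, TDC) replace this per-sample stochastic update with batch or gradient-corrected estimates over a linear feature space.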

    On the Inversion of High Energy Proton

    Inversion of the K-fold stochastic autoconvolution integral equation is an elementary nonlinear problem, yet there are no de facto methods to solve it with finite statistics. To address this, we introduce a novel inverse algorithm based on a combination of relative-entropy minimization, the Fast Fourier Transform, and a recursive version of Efron's bootstrap. This gives us the power to obtain new perspectives on non-perturbative high-energy QCD, such as probing the ab initio principles underlying the approximately negative binomial distributions of observed charged-particle final-state multiplicities, related to multiparton interactions, the fluctuating structure and profile of the proton, and diffraction. As a proof of concept, we apply the algorithm to ALICE proton-proton charged-particle multiplicity measurements taken at different center-of-mass energies and fiducial pseudorapidity intervals at the LHC, available on HEPData. A strong double-peak structure emerges from the inversion, barely visible without it. Comment: 29 pages, 10 figures, v2: extended analysis (re-projection ratios, 2D
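    To make the forward problem concrete: the K-fold autoconvolution of a discrete multiplicity distribution can be computed by exponentiating its Fourier transform. The sketch below shows only this forward operation, not the paper's inverse algorithm (relative-entropy minimization with bootstrap uncertainties); the function name and the toy distribution are illustrative assumptions.

```python
import numpy as np

# Forward K-fold discrete autoconvolution via FFT (illustrative only; the
# paper's contribution is the *inverse* of this map under finite statistics).

def autoconvolve(p, k):
    """Return the k-fold autoconvolution of a probability vector p."""
    size = k * (len(p) - 1) + 1          # full support of the k-fold result
    ph = np.fft.rfft(p, n=size)          # zero-padded transform, computed once
    q = np.fft.irfft(ph ** k, n=size)    # pointwise power = k-fold convolution
    q = np.clip(q, 0.0, None)            # suppress tiny negative FFT ripples
    return q / q.sum()                   # renormalize to a distribution

# Toy example: a short multiplicity distribution folded k = 3 times.
p = np.array([0.5, 0.3, 0.15, 0.05])
q3 = autoconvolve(p, k=3)
```

    For a fixed K, inverting this map corresponds to taking a K-th root in Fourier space, which is ill-posed under statistical fluctuations; this is what motivates the regularized, bootstrap-aware approach described in the abstract.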

    Adaptation and learning over networks for nonlinear system modeling

    In this chapter, we analyze nonlinear filtering problems in distributed environments, e.g., sensor networks or peer-to-peer protocols. In these scenarios, the agents in the environment receive measurements in a streaming fashion, and they are required to estimate a common (nonlinear) model by alternating local computations and communications with their neighbors. We focus on the important distinction between single-task problems, where the underlying model is common to all agents, and multitask problems, where each agent might converge to a different model due to, e.g., spatial dependencies or other factors. Currently, most of the literature on distributed learning in the nonlinear case has focused on the single-task setting, which may be a strong limitation in real-world scenarios. After introducing the problem and reviewing the existing approaches, we describe a simple kernel-based algorithm tailored to the multitask case. We evaluate the proposal on a simulated benchmark task, and we conclude by detailing currently open problems and lines of research. Comment: To be published as a chapter in 'Adaptive Learning Methods for Nonlinear System Modeling', Elsevier Publishing, Eds. D. Comminiello and J.C. Principe (2018).
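    To illustrate the alternation of local computation and neighbor communication the abstract describes, here is a minimal sketch of diffusion adaptation (adapt-then-combine) over a network. The chapter's algorithm is kernel-based and multitask-aware; this toy uses a plain linear LMS model and a fixed combination matrix purely for illustration, and all names here are assumptions.

```python
import numpy as np

# Hedged sketch of adapt-then-combine diffusion adaptation with a linear LMS
# model; the chapter's actual algorithm is kernel-based and multitask.

def diffusion_lms(X, d, A, mu=0.01, iters=200):
    """X: (n_agents, T, dim) regressors, d: (n_agents, T) desired signals,
    A: combination matrix with columns summing to 1 (A[l, k] weights l -> k)."""
    n_agents, T, dim = X.shape
    w = np.zeros((n_agents, dim))
    for t in range(iters):
        psi = np.empty_like(w)
        for k in range(n_agents):
            x, y = X[k, t % T], d[k, t % T]
            # Adapt: local LMS step on the newest streaming sample.
            psi[k] = w[k] + mu * (y - x @ w[k]) * x
        # Combine: each agent fuses its neighbors' intermediate estimates.
        w = A.T @ psi
    return w

# Usage sketch: three agents on a line graph sharing one underlying model.
rng = np.random.default_rng(0)
w_true = np.array([1.0, -2.0])
X = rng.normal(size=(3, 100, 2))
d = X @ w_true + 0.01 * rng.normal(size=(3, 100))
A = np.array([[0.5, 1/3, 0.0],
              [0.5, 1/3, 0.5],
              [0.0, 1/3, 0.5]])   # uniform weights over each agent's neighborhood
w_est = diffusion_lms(X, d, A)
```

    In the multitask setting the abstract highlights, the uniform combination step is typically replaced by weights that couple only agents whose underlying models are believed to be similar.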