2 research outputs found

    Off-policy reinforcement learning with Gaussian processes

    An off-policy Bayesian nonparametric approximate reinforcement learning framework, termed GPQ, that employs a Gaussian process (GP) model of the value (Q) function is presented in both the batch and online settings. Sufficient conditions on GP hyperparameter selection are established to guarantee convergence of off-policy GPQ in the batch setting, and theoretical and practical extensions are provided for the online case. Empirical results demonstrate that GPQ has competitive learning speed in addition to its convergence guarantees and its ability to automatically choose its own basis locations.
    United States. Office of Naval Research (Autonomy Program N000140910625)

    IEEE Transactions on Neural Networks and Learning Systems: Vol. 24, No. 12, December 2013

    Canonical Correlation Analysis on Data With Censoring and Error Information - J. Sun and S. Keates
    Highly Accurate Moving Object Detection in Variable Bit Rate Video-Based Traffic Monitoring Systems - S.-C. Huang and B.-H. Chen
    Recurrent Neural Collective Classification - D. D. Monner and J. A. Reggia
    Online Selective Kernel-Based Temporal Difference Learning - X. Chen, Y. Gao, and R. Wang
    Stability and Synchronization of Discrete-Time Neural Network With Switching Parameters and Time-Varying Delays - L. Wu, Z. Feng, and J. Lam
    Artificial Endocrine Controller for Power Management in Robotic Systems - C. Sauze and M. Neal
    Operator Control of Interneural Computing Machines - M.-H. Shih and F.-S. Tsai
    Multiple Graph Label Propagation by Sparse Integration - M. Karasuyama and H. Mamitsuka
    Universal Blind Image Quality Assessment Metrics Via Natural Scene Statistics and Multiple Kernel Learning - X. Gao, F. Gao, D. Tao, and X. Li
    H∞ State Estimation for Complex Networks With Uncertain Inner Coupling and Incomplete Measurements - B. Shen, Z. Wang, D. Ding, and H. Shu
    Goal Representation Heuristic Dynamic Programming on Maze Navigation - Z. Ni, H. He, J. Wen, and X. Xu
    Accelerated Canonical Polyadic Decomposition Using Mode Reduction - G. Zhou, A. Cichocki, and S. Xie
    Hardware Friendly Probabilistic Spiking Neural Network With Long-Term and Short-Term Plasticity - H.-Y. Hsieh and K.-T. Tang
    Neural Network Architecture for Cognitive Navigation in Dynamic Environments - J. A. Villacorta-Atienza and V. A. Makarov
    An Equivalence Between Adaptive Dynamic Programming With a Critic and Backpropagation Through Time - M. Fairbank, E. Alonso, and D. Prokhorov
    Semisupervised Multitask Learning With Gaussian Processes - G. Skolidis and G. Sanguinetti
    BRIEF PAPERS
    Nonlinear Projection Trick in Kernel Methods: An Alternative to the Kernel Trick - N. Kwak
    ANNOUNCEMENTS
    IEEE WCCI 2014
    Etc.