5,577 research outputs found

The Divergence of Reinforcement Learning Algorithms with Value-Iteration and Function Approximation

Authors: Alonso Eduardo; Fairbank Michael
Publication date: 01/01/2012

This paper gives specific divergence examples of value-iteration for several major Reinforcement Learning and Adaptive Dynamic Programming algorithms, when using a function approximator for the value function. These divergence examples differ from previous divergence examples in the literature, in that they are applicable for a greedy policy, i.e. in a "value iteration" scenario. Perhaps surprisingly, with a greedy policy, it is also possible to get divergence for the algorithms TD(1) and Sarsa(1). In addition to these divergences, we also achieve divergence for the Adaptive Dynamic Programming algorithms HDP, DHP and GDHP.

Comment: 8 pages, 4 figures. In Proceedings of the IEEE International Joint Conference on Neural Networks, June 2012, Brisbane (IEEE IJCNN 2012), pp. 3070--307
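For readers unfamiliar with what divergence under function approximation looks like, the following is a minimal sketch of a textbook-style two-state construction with a fixed update distribution. It is the kind of earlier example the abstract contrasts itself with, not one of the paper's greedy-policy constructions. The function name, features, step size, and discount factor are all illustrative assumptions.

```python
def td0_weight_trajectory(w0=1.0, alpha=0.1, gamma=0.9, steps=50):
    """Repeat TD(0) updates on a single transition s1 -> s2 with reward 0.

    Assumed setup (illustrative, not from the paper): a linear value
    estimate V(s) = w * phi(s) with phi(s1) = 1 and phi(s2) = 2, and
    updates applied only at s1.
    """
    phi_s1, phi_s2, reward = 1.0, 2.0, 0.0
    w = w0
    history = [w]
    for _ in range(steps):
        # TD error: bootstrapped target minus the current estimate at s1.
        td_error = reward + gamma * w * phi_s2 - w * phi_s1
        # Semi-gradient step: the gradient of V(s1) with respect to w is phi(s1).
        w += alpha * td_error * phi_s1
        history.append(w)
    return history


if __name__ == "__main__":
    for gamma in (0.4, 0.9):
        final_w = td0_weight_trajectory(gamma=gamma)[-1]
        print(f"gamma={gamma}: final weight {final_w:.3f}")
```

Each update multiplies the weight by 1 + alpha * (2 * gamma - 1), so under these assumptions the weight grows without bound whenever gamma exceeds 0.5 and shrinks toward zero otherwise.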

arXiv.org e-Print Archive

CiteSeerX

City Research Online

Crossref

Contrastive learning and neural oscillations

Authors: Baldi Pierre; Pineda Fernando
Publication venue: 'MIT Press - Journals'
Publication date: 01/01/1991

The concept of Contrastive Learning (CL) is developed as a family of possible learning algorithms for neural networks. CL is an extension of Deterministic Boltzmann Machines to more general dynamical systems. During learning, the network oscillates between two phases. One phase has a teacher signal and one phase has no teacher signal. The weights are updated using a learning rule that corresponds to gradient descent on a contrast function that measures the discrepancy between the free network and the network with a teacher signal. The CL approach provides a general unified framework for developing new learning algorithms. It also shows that many different types of clamping and teacher signals are possible. Several examples are given and an analysis of the landscape of the contrast function is proposed with some relevant predictions for the CL curves. An approach that may be suitable for collective analog implementations is described. Simulation results and possible extensions are briefly discussed together with a new conjecture regarding the function of certain oscillations in the brain. In the appendix, we also examine two extensions of contrastive learning to time-dependent trajectories.

Caltech Authors

Playing Atari with Deep Reinforcement Learning

Authors: Antonoglou Ioannis; Graves Alex; Kavukcuoglu Koray; Mnih Volodymyr; Riedmiller Martin; Silver David; Wierstra Daan
Publication date: 01/01/2013

We present the first deep learning model to successfully learn control policies directly from high-dimensional sensory input using reinforcement learning. The model is a convolutional neural network, trained with a variant of Q-learning, whose input is raw pixels and whose output is a value function estimating future rewards. We apply our method to seven Atari 2600 games from the Arcade Learning Environment, with no adjustment of the architecture or learning algorithm. We find that it outperforms all previous approaches on six of the games and surpasses a human expert on three of them.

Comment: NIPS Deep Learning Workshop 201
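As background for the Q-learning rule the abstract refers to, here is a minimal tabular sketch of the standard update. It is not the paper's variant: the abstract describes a convolutional network over raw pixels in place of the table, and the details of its training procedure are not given here. The function name, table layout, and hyperparameter values are illustrative assumptions.

```python
def q_learning_update(q, state, action, reward, next_state, done,
                      alpha=0.1, gamma=0.99):
    """Apply one tabular Q-learning step in place and return the target.

    `q` is assumed to be a list of per-state lists of action values
    (a hypothetical layout chosen for this sketch).
    """
    # Bootstrapped estimate of future reward: the best action value in the
    # next state, or zero if the episode ended on this transition.
    best_next = 0.0 if done else max(q[next_state])
    target = reward + gamma * best_next
    # Move the current estimate a fraction alpha toward the target.
    q[state][action] += alpha * (target - q[state][action])
    return target


if __name__ == "__main__":
    q_table = [[0.0, 0.0], [1.0, 2.0]]
    q_learning_update(q_table, state=0, action=1, reward=1.0,
                      next_state=1, done=False, alpha=0.5, gamma=0.9)
    print(q_table[0])
```

With a function approximator, the same target would typically serve as the regression label for the network output at the chosen action, rather than as an in-place table update.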

arXiv.org e-Print Archive

CiteSeerX

UCL Discovery