Transfer Value Iteration Networks
Value iteration networks (VINs) have been demonstrated to have a good
generalization ability for reinforcement learning tasks across similar domains.
However, based on our experiments, a policy learned by VINs still fails to
generalize well to a domain whose action and feature spaces are not
identical to those of the domain where it is trained. In this paper, we propose
a transfer learning approach on top of VINs, termed Transfer VINs (TVINs), such
that a learned policy from a source domain can be generalized to a target
domain with only limited training data, even if the source domain and the
target domain have domain-specific actions and features. We empirically verify
that our proposed TVINs outperform VINs when the source and the target domains
have similar but not identical action and feature spaces. Furthermore, we show
that the performance improvement is consistent across different environments,
maze sizes, and dataset sizes, as well as different values of hyperparameters
such as the number of iterations and the kernel size.
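The value-iteration recurrence that VINs unroll as a differentiable module can be sketched in plain numpy (a minimal tabular illustration, not the paper's architecture; the maze layout, rewards, and hyperparameters below are hypothetical):

```python
import numpy as np

def shift(V, dy, dx):
    # shift the value map by (dy, dx), treating out-of-grid cells as impassable
    P = np.pad(V, 1, constant_values=-np.inf)
    h, w = V.shape
    return P[1 + dy:1 + dy + h, 1 + dx:1 + dx + w]

def grid_value_iteration(reward, obstacles, gamma=0.99, n_iters=50):
    """Tabular value iteration on a 2D grid world: the update
    V <- max_a (R + gamma * shift_a(V)), which VINs approximate with a
    convolution followed by a channel-wise max. Actions: stay/up/down/left/right."""
    V = np.zeros_like(reward)
    actions = [(0, 0), (-1, 0), (1, 0), (0, -1), (0, 1)]
    for _ in range(n_iters):
        Q = np.stack([reward + gamma * shift(V, dy, dx) for dy, dx in actions])
        V = Q.max(axis=0)
        V[obstacles] = -np.inf  # no value flows through blocked cells
    return V

# hypothetical 5x5 maze: goal at (4, 4), wall across part of row 2
reward = np.full((5, 5), -0.1)
reward[4, 4] = 1.0
obstacles = np.zeros((5, 5), dtype=bool)
obstacles[2, 1:4] = True
V = grid_value_iteration(reward, obstacles)
```

After enough iterations the value map encodes shortest-path cost-to-go around the wall, which is exactly the goal-directed signal the VIN's planning module learns to produce.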
Value Iteration Networks on Multiple Levels of Abstraction
Learning-based methods are promising to plan robot motion without performing
extensive search, which is needed by many non-learning approaches. Recently,
Value Iteration Networks (VINs) received much interest since---in contrast to
standard CNN-based architectures---they learn goal-directed behaviors which
generalize well to unseen domains. However, VINs are restricted to small and
low-dimensional domains, limiting their applicability to real-world planning
problems.
To address this issue, we propose to extend VINs to representations with
multiple levels of abstraction. While the vicinity of the robot is represented
in sufficient detail, the representation gets spatially coarser with increasing
distance from the robot. The information loss caused by the decreasing
resolution is compensated by increasing the number of features representing a
cell. We show that our approach is capable of solving significantly larger 2D
grid world planning tasks than the original VIN implementation. In contrast to
a multiresolution coarse-to-fine VIN implementation which does not employ
additional descriptive features, our approach is capable of solving challenging
environments, which demonstrates that the proposed method learns to encode
useful information in the additional features. As an application for solving
real-world planning tasks, we successfully employ our method to plan
omnidirectional driving for a search-and-rescue robot in cluttered terrain.
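The multi-level representation described above can be illustrated with a small numpy sketch (a hypothetical construction, assuming 2x coarsening per level and two hand-picked aggregate features per coarse cell, mean and max occupancy; the paper's features are learned and may differ):

```python
import numpy as np

def multilevel_maps(grid, robot_yx, levels=3, patch=8):
    """Hypothetical multi-level abstraction of a 2D occupancy grid: each
    level covers twice the area of the previous one at half the resolution,
    so every level is a fixed patch-by-patch map. Resolution lost at coarse
    levels is compensated by extra per-cell features."""
    y, x = robot_yx
    maps = []
    for lvl in range(levels):
        half = (patch // 2) << lvl          # window half-extent grows 2x per level
        pad = np.pad(grid, half, constant_values=1.0)  # out-of-map = occupied
        win = pad[y:y + 2 * half, x:x + 2 * half]      # window centred on robot
        k = 1 << lvl                                   # downsampling factor
        blocks = win.reshape(patch, k, patch, k)
        # two features per coarse cell: mean and max occupancy of its block
        maps.append(np.stack([blocks.mean(axis=(1, 3)),
                              blocks.max(axis=(1, 3))]))
    return maps  # list of (2, patch, patch) feature maps

# hypothetical 32x32 occupancy grid with an occupied top row
grid = np.zeros((32, 32))
grid[0, :] = 1.0
maps = multilevel_maps(grid, robot_yx=(16, 16))
```

The fine level sees only the robot's free vicinity, while the coarsest level's features register the distant occupied row, mirroring the paper's detail-near/coarse-far design.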
The Divergence of Reinforcement Learning Algorithms with Value-Iteration and Function Approximation
This paper gives specific divergence examples of value-iteration for several
major Reinforcement Learning and Adaptive Dynamic Programming algorithms, when
using a function approximator for the value function. These divergence examples
differ from previous divergence examples in the literature, in that they are
applicable for a greedy policy, i.e. in a "value iteration" scenario. Perhaps
surprisingly, with a greedy policy, it is also possible to get divergence for
the algorithms TD(1) and Sarsa(1). In addition to these divergences, we also
achieve divergence for the Adaptive Dynamic Programming algorithms HDP, DHP and
GDHP.
Comment: 8 pages, 4 figures. In Proceedings of the IEEE International Joint
Conference on Neural Networks, June 2012, Brisbane (IEEE IJCNN 2012), pp.
3070--307
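The flavour of such divergences can be seen in a classic two-state construction often attributed to Tsitsiklis and Van Roy (in the same spirit as these examples, though not taken from the paper): least-squares value iteration with a linear approximator diverges even though all rewards are zero and the true value function is identically 0.

```python
import numpy as np

# Two-state chain: both states transition deterministically to state 2,
# all rewards are zero, so the true value function is identically 0.
# Linear approximation V(s) = w * phi(s) with features phi = (1, 2).
phi = np.array([1.0, 2.0])
nxt = np.array([1, 1])   # successor index for states 0 and 1
gamma = 0.9

w = 1.0
history = [w]
for _ in range(50):
    target = gamma * w * phi[nxt]   # Bellman backup T V (rewards = 0)
    # least-squares projection onto span{phi}: w <- argmin ||phi*w' - target||^2
    w = phi @ target / (phi @ phi)  # closed form: w <- (6*gamma/5) * w
    history.append(w)
# w is scaled by 6*gamma/5 = 1.08 each step, so the iterates diverge
# for any gamma > 5/6, while exact value iteration would converge to 0
```

The projection step is what breaks the contraction: the Bellman backup shrinks values, but projecting back onto the feature span can expand them again.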
Representation Learning on Graphs: A Reinforcement Learning Application
In this work, we study value function approximation in reinforcement learning
(RL) problems with high dimensional state or action spaces via a generalized
version of representation policy iteration (RPI). We consider the limitations
of proto-value functions (PVFs) at accurately approximating the value function
in low dimensions, and we highlight the importance of feature learning for an
improved low-dimensional value function approximation. Then, we adopt different
representation learning algorithms on graphs to learn the basis functions that
best represent the value function. We empirically show that node2vec, an
algorithm for scalable feature learning in networks, and the Variational Graph
Auto-Encoder consistently outperform the commonly used smooth proto-value
functions in low-dimensional feature spaces.
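The proto-value-function baseline can be reproduced in a few lines (a minimal sketch on a hypothetical 20-state chain; the graph, the target value function, and the helper `pvf_basis` are illustrative, not the paper's setup):

```python
import numpy as np

def pvf_basis(adj, k):
    """Proto-value functions: the k smoothest eigenvectors of the
    combinatorial graph Laplacian L = D - A, used as basis functions
    for linear value-function approximation."""
    deg = np.diag(adj.sum(axis=1))
    lap = deg - adj
    vals, vecs = np.linalg.eigh(lap)  # eigenvalues in ascending order
    return vecs[:, :k]                # columns = smoothest basis functions

# hypothetical 20-state chain MDP with a goal at state 19
n = 20
adj = np.zeros((n, n))
for i in range(n - 1):
    adj[i, i + 1] = adj[i + 1, i] = 1.0
v_true = 0.9 ** np.arange(n)[::-1]    # smooth value, decaying away from goal

phi = pvf_basis(adj, k=5)
w, *_ = np.linalg.lstsq(phi, v_true, rcond=None)
err = np.linalg.norm(phi @ w - v_true) / np.linalg.norm(v_true)
```

Because the chain's Laplacian eigenvectors are global cosine-like modes, a handful of them fit this smooth value function reasonably well; the paper's point is that learned embeddings can do better when the value function is less smooth on the graph.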
Dynamic Programming for Optimal Control of Set-Up Scheduling with Neural Network Modifications
This paper demonstrates an optimal control solution to change-of-machine set-up scheduling based on dynamic programming average-cost-per-stage value iteration, as set forth by Caramanis et al. [2] for the 2D case. The difficulty with the optimal approach lies in the explosive computational growth of the resulting solution. A method of reducing the computational complexity is developed using ideas from biology and neural networks. A real-time controller is described that uses a linear-log representation of state space with neural networks employed to fit cost surfaces.
Defense Advanced Research Projects Agency (90-0083
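Average-cost-per-stage value iteration of the kind the optimal approach relies on can be sketched with relative value iteration, a standard renormalized variant (the 2-state set-up example and its costs below are hypothetical, not from the paper):

```python
import numpy as np

def relative_value_iteration(cost, P, n_iters=500):
    """Average-cost DP via relative value iteration. cost[s, a] is the
    stage cost, P[a, s, s'] the transition kernel. Subtracting the value
    at a reference state each sweep keeps the iterates bounded. Returns
    the average cost per stage g and the bias vector h (h[0] = 0)."""
    S, A = cost.shape
    h = np.zeros(S)
    for _ in range(n_iters):
        q = cost + np.einsum('ast,t->sa', P, h)  # one-step lookahead
        Th = q.min(axis=1)
        g = Th[0]   # gain estimate at the reference state
        h = Th - g  # renormalize so h stays bounded
    return g, h

# hypothetical 2-state set-up problem: state = machine set up for part A (0)
# or part B (1); action 0 = keep the current set-up, action 1 = switch
cost = np.array([[1.0, 4.0],    # in A: running is cheap, switching costs 4
                 [2.0, 4.0]])   # in B: running is pricier, switching costs 4
P = np.zeros((2, 2, 2))
P[0] = np.eye(2)                     # keep: stay in the same set-up
P[1] = np.array([[0., 1.], [1., 0.]])  # switch: flip the set-up
g, h = relative_value_iteration(cost, P)
```

Here the optimal policy is to switch once out of the expensive set-up and then keep producing part A, giving an average cost of 1 per stage; the bias h captures the one-time switching penalty. The computational growth the paper addresses comes from the state space of such problems exploding as machines and part types are added.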