Transfer Value Iteration Networks
Value iteration networks (VINs) have been demonstrated to have a good
generalization ability for reinforcement learning tasks across similar domains.
However, based on our experiments, a policy learned by VINs still fails to
generalize well to a domain whose action and feature spaces are not
identical to those of the domain where it is trained. In this paper, we propose
a transfer learning approach on top of VINs, termed Transfer VINs (TVINs), such
that a learned policy from a source domain can be generalized to a target
domain with only limited training data, even if the source domain and the
target domain have domain-specific actions and features. We empirically verify
that our proposed TVINs outperform VINs when the source and the target domains
have similar but not identical action and feature spaces. Furthermore, we show
that the performance improvement is consistent across different environments,
maze sizes, and dataset sizes, as well as different values of hyperparameters
such as the number of iterations and the kernel size.
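The value-iteration recurrence that VINs unroll as a differentiable module can be sketched in plain numpy (a minimal tabular illustration, not the paper's architecture; the maze layout, rewards, and hyperparameters below are hypothetical):

```python
import numpy as np

def shift(V, dy, dx):
    # shift the value map by (dy, dx), treating out-of-grid cells as impassable
    P = np.pad(V, 1, constant_values=-np.inf)
    h, w = V.shape
    return P[1 + dy:1 + dy + h, 1 + dx:1 + dx + w]

def grid_value_iteration(reward, obstacles, gamma=0.99, n_iters=50):
    """Tabular value iteration on a 2D grid world: the update
    V <- max_a (R + gamma * shift_a(V)), which VINs approximate with a
    convolution followed by a channel-wise max. Actions: stay/up/down/left/right."""
    V = np.zeros_like(reward)
    actions = [(0, 0), (-1, 0), (1, 0), (0, -1), (0, 1)]
    for _ in range(n_iters):
        Q = np.stack([reward + gamma * shift(V, dy, dx) for dy, dx in actions])
        V = Q.max(axis=0)
        V[obstacles] = -np.inf  # no value flows through blocked cells
    return V

# hypothetical 5x5 maze: goal at (4, 4), wall across part of row 2
reward = np.full((5, 5), -0.1)
reward[4, 4] = 1.0
obstacles = np.zeros((5, 5), dtype=bool)
obstacles[2, 1:4] = True
V = grid_value_iteration(reward, obstacles)
```

After enough iterations the value map encodes shortest-path cost-to-go around the wall, which is exactly the goal-directed signal the VIN's planning module learns to produce.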
Value Iteration Networks on Multiple Levels of Abstraction
Learning-based methods are promising to plan robot motion without performing
extensive search, which is needed by many non-learning approaches. Recently,
Value Iteration Networks (VINs) received much interest since---in contrast to
standard CNN-based architectures---they learn goal-directed behaviors which
generalize well to unseen domains. However, VINs are restricted to small and
low-dimensional domains, limiting their applicability to real-world planning
problems.
To address this issue, we propose to extend VINs to representations with
multiple levels of abstraction. While the vicinity of the robot is represented
in sufficient detail, the representation gets spatially coarser with increasing
distance from the robot. The information loss caused by the decreasing
resolution is compensated by increasing the number of features representing a
cell. We show that our approach is capable of solving significantly larger 2D
grid world planning tasks than the original VIN implementation. In contrast to
a multiresolution coarse-to-fine VIN implementation which does not employ
additional descriptive features, our approach is capable of solving challenging
environments, which demonstrates that the proposed method learns to encode
useful information in the additional features. As an application for solving
real-world planning tasks, we successfully employ our method to plan
omnidirectional driving for a search-and-rescue robot in cluttered terrain.
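The multi-level representation described above can be illustrated with a small numpy sketch (a hypothetical construction, assuming 2x coarsening per level and two hand-picked aggregate features per coarse cell, mean and max occupancy; the paper's features are learned and may differ):

```python
import numpy as np

def multilevel_maps(grid, robot_yx, levels=3, patch=8):
    """Hypothetical multi-level abstraction of a 2D occupancy grid: each
    level covers twice the area of the previous one at half the resolution,
    so every level is a fixed patch-by-patch map. Resolution lost at coarse
    levels is compensated by extra per-cell features."""
    y, x = robot_yx
    maps = []
    for lvl in range(levels):
        half = (patch // 2) << lvl          # window half-extent grows 2x per level
        pad = np.pad(grid, half, constant_values=1.0)  # out-of-map = occupied
        win = pad[y:y + 2 * half, x:x + 2 * half]      # window centred on robot
        k = 1 << lvl                                   # downsampling factor
        blocks = win.reshape(patch, k, patch, k)
        # two features per coarse cell: mean and max occupancy of its block
        maps.append(np.stack([blocks.mean(axis=(1, 3)),
                              blocks.max(axis=(1, 3))]))
    return maps  # list of (2, patch, patch) feature maps

# hypothetical 32x32 occupancy grid with an occupied top row
grid = np.zeros((32, 32))
grid[0, :] = 1.0
maps = multilevel_maps(grid, robot_yx=(16, 16))
```

The fine level sees only the robot's free vicinity, while the coarsest level's features register the distant occupied row, mirroring the paper's detail-near/coarse-far design.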
The Divergence of Reinforcement Learning Algorithms with Value-Iteration and Function Approximation
This paper gives specific divergence examples of value-iteration for several
major Reinforcement Learning and Adaptive Dynamic Programming algorithms, when
using a function approximator for the value function. These divergence examples
differ from previous divergence examples in the literature, in that they are
applicable for a greedy policy, i.e. in a "value iteration" scenario. Perhaps
surprisingly, with a greedy policy, it is also possible to get divergence for
the algorithms TD(1) and Sarsa(1). In addition to these divergences, we also
achieve divergence for the Adaptive Dynamic Programming algorithms HDP, DHP and
GDHP.
Comment: 8 pages, 4 figures. In Proceedings of the IEEE International Joint
Conference on Neural Networks, June 2012, Brisbane (IEEE IJCNN 2012), pp.
3070--307
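The flavour of such divergences can be seen in a classic two-state construction often attributed to Tsitsiklis and Van Roy (in the same spirit as these examples, though not taken from the paper): least-squares value iteration with a linear approximator diverges even though all rewards are zero and the true value function is identically 0.

```python
import numpy as np

# Two-state chain: both states transition deterministically to state 2,
# all rewards are zero, so the true value function is identically 0.
# Linear approximation V(s) = w * phi(s) with features phi = (1, 2).
phi = np.array([1.0, 2.0])
nxt = np.array([1, 1])   # successor index for states 0 and 1
gamma = 0.9

w = 1.0
history = [w]
for _ in range(50):
    target = gamma * w * phi[nxt]   # Bellman backup T V (rewards = 0)
    # least-squares projection onto span{phi}: w <- argmin ||phi*w' - target||^2
    w = phi @ target / (phi @ phi)  # closed form: w <- (6*gamma/5) * w
    history.append(w)
# w is scaled by 6*gamma/5 = 1.08 each step, so the iterates diverge
# for any gamma > 5/6, while exact value iteration would converge to 0
```

The projection step is what breaks the contraction: the Bellman backup shrinks values, but projecting back onto the feature span can expand them again.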
Representation Learning on Graphs: A Reinforcement Learning Application
In this work, we study value function approximation in reinforcement learning
(RL) problems with high dimensional state or action spaces via a generalized
version of representation policy iteration (RPI). We consider the limitations
of proto-value functions (PVFs) at accurately approximating the value function
in low dimensions, and we highlight the importance of feature learning for an
improved low-dimensional value function approximation. Then, we adopt different
representation learning algorithms on graphs to learn the basis functions that
best represent the value function. We empirically show that node2vec, an
algorithm for scalable feature learning in networks, and the Variational Graph
Auto-Encoder consistently outperform the commonly used smooth proto-value
functions in low-dimensional feature spaces.
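The proto-value-function baseline can be reproduced in a few lines (a minimal sketch on a hypothetical 20-state chain; the graph, the target value function, and the helper `pvf_basis` are illustrative, not the paper's setup):

```python
import numpy as np

def pvf_basis(adj, k):
    """Proto-value functions: the k smoothest eigenvectors of the
    combinatorial graph Laplacian L = D - A, used as basis functions
    for linear value-function approximation."""
    deg = np.diag(adj.sum(axis=1))
    lap = deg - adj
    vals, vecs = np.linalg.eigh(lap)  # eigenvalues in ascending order
    return vecs[:, :k]                # columns = smoothest basis functions

# hypothetical 20-state chain MDP with a goal at state 19
n = 20
adj = np.zeros((n, n))
for i in range(n - 1):
    adj[i, i + 1] = adj[i + 1, i] = 1.0
v_true = 0.9 ** np.arange(n)[::-1]    # smooth value, decaying away from goal

phi = pvf_basis(adj, k=5)
w, *_ = np.linalg.lstsq(phi, v_true, rcond=None)
err = np.linalg.norm(phi @ w - v_true) / np.linalg.norm(v_true)
```

Because the chain's Laplacian eigenvectors are global cosine-like modes, a handful of them fit this smooth value function reasonably well; the paper's point is that learned embeddings can do better when the value function is less smooth on the graph.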
Dynamic Programming for Optimal Control of Set-Up Scheduling with Neural Network Modifications
This paper demonstrates an optimal control solution to change-of-machine set-up scheduling based on dynamic programming average-cost-per-stage value iteration, as set forth by Caramanis et al. [2] for the 2D case. The difficulty with the optimal approach lies in the explosive computational growth of the resulting solution. A method of reducing the computational complexity is developed using ideas from biology and neural networks. A real-time controller is described that uses a linear-log representation of state space with neural networks employed to fit cost surfaces.
Defense Advanced Research Projects Agency (90-0083
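Average-cost-per-stage value iteration of the kind the optimal approach relies on can be sketched with relative value iteration, a standard renormalized variant (the 2-state set-up example and its costs below are hypothetical, not from the paper):

```python
import numpy as np

def relative_value_iteration(cost, P, n_iters=500):
    """Average-cost DP via relative value iteration. cost[s, a] is the
    stage cost, P[a, s, s'] the transition kernel. Subtracting the value
    at a reference state each sweep keeps the iterates bounded. Returns
    the average cost per stage g and the bias vector h (h[0] = 0)."""
    S, A = cost.shape
    h = np.zeros(S)
    for _ in range(n_iters):
        q = cost + np.einsum('ast,t->sa', P, h)  # one-step lookahead
        Th = q.min(axis=1)
        g = Th[0]   # gain estimate at the reference state
        h = Th - g  # renormalize so h stays bounded
    return g, h

# hypothetical 2-state set-up problem: state = machine set up for part A (0)
# or part B (1); action 0 = keep the current set-up, action 1 = switch
cost = np.array([[1.0, 4.0],    # in A: running is cheap, switching costs 4
                 [2.0, 4.0]])   # in B: running is pricier, switching costs 4
P = np.zeros((2, 2, 2))
P[0] = np.eye(2)                     # keep: stay in the same set-up
P[1] = np.array([[0., 1.], [1., 0.]])  # switch: flip the set-up
g, h = relative_value_iteration(cost, P)
```

Here the optimal policy is to switch once out of the expensive set-up and then keep producing part A, giving an average cost of 1 per stage; the bias h captures the one-time switching penalty. The computational growth the paper addresses comes from the state space of such problems exploding as machines and part types are added.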