8,757 research outputs found
Belief Tree Search for Active Object Recognition
Active Object Recognition (AOR) has been approached as an unsupervised
learning problem, in which optimal trajectories for object inspection are not
known and are to be discovered by reducing label uncertainty measures or
training with reinforcement learning. Such approaches have no guarantees of the
quality of their solution. In this paper, we treat AOR as a Partially
Observable Markov Decision Process (POMDP) and find near-optimal policies on
training data using Belief Tree Search (BTS) on the corresponding belief Markov
Decision Process (MDP). AOR then reduces to the problem of knowledge transfer
from near-optimal policies on training set to the test set. We train a Long
Short Term Memory (LSTM) network to predict the best next action on the
training set rollouts. We sho that the proposed AOR method generalizes well to
novel views of familiar objects and also to novel objects. We compare this
supervised scheme against guided policy search, and find that the LSTM network
reaches higher recognition accuracy compared to the guided policy method. We
further look into optimizing the observation function to increase the total
collected reward of optimal policy. In AOR, the observation function is known
only approximately. We propose a gradient-based method update to this
approximate observation function to increase the total reward of any policy. We
show that by optimizing the observation function and retraining the supervised
LSTM network, the AOR performance on the test set improves significantly.Comment: IROS 201
Reinforcement Learning via AIXI Approximation
This paper introduces a principled approach for the design of a scalable
general reinforcement learning agent. This approach is based on a direct
approximation of AIXI, a Bayesian optimality notion for general reinforcement
learning agents. Previously, it has been unclear whether the theory of AIXI
could motivate the design of practical algorithms. We answer this hitherto open
question in the affirmative, by providing the first computationally feasible
approximation to the AIXI agent. To develop our approximation, we introduce a
Monte Carlo Tree Search algorithm along with an agent-specific extension of the
Context Tree Weighting algorithm. Empirically, we present a set of encouraging
results on a number of stochastic, unknown, and partially observable domains.Comment: 8 LaTeX pages, 1 figur
Learning Policies from Self-Play with Policy Gradients and MCTS Value Estimates
In recent years, state-of-the-art game-playing agents often involve policies
that are trained in self-playing processes where Monte Carlo tree search (MCTS)
algorithms and trained policies iteratively improve each other. The strongest
results have been obtained when policies are trained to mimic the search
behaviour of MCTS by minimising a cross-entropy loss. Because MCTS, by design,
includes an element of exploration, policies trained in this manner are also
likely to exhibit a similar extent of exploration. In this paper, we are
interested in learning policies for a project with future goals including the
extraction of interpretable strategies, rather than state-of-the-art
game-playing performance. For these goals, we argue that such an extent of
exploration is undesirable, and we propose a novel objective function for
training policies that are not exploratory. We derive a policy gradient
expression for maximising this objective function, which can be estimated using
MCTS value estimates, rather than MCTS visit counts. We empirically evaluate
various properties of resulting policies, in a variety of board games.Comment: Accepted at the IEEE Conference on Games (CoG) 201
Assessing the Potential of Classical Q-learning in General Game Playing
After the recent groundbreaking results of AlphaGo and AlphaZero, we have
seen strong interests in deep reinforcement learning and artificial general
intelligence (AGI) in game playing. However, deep learning is
resource-intensive and the theory is not yet well developed. For small games,
simple classical table-based Q-learning might still be the algorithm of choice.
General Game Playing (GGP) provides a good testbed for reinforcement learning
to research AGI. Q-learning is one of the canonical reinforcement learning
methods, and has been used by (Banerjee Stone, IJCAI 2007) in GGP. In this
paper we implement Q-learning in GGP for three small-board games (Tic-Tac-Toe,
Connect Four, Hex)\footnote{source code: https://github.com/wh1992v/ggp-rl}, to
allow comparison to Banerjee et al.. We find that Q-learning converges to a
high win rate in GGP. For the -greedy strategy, we propose a first
enhancement, the dynamic algorithm. In addition, inspired by (Gelly
Silver, ICML 2007) we combine online search (Monte Carlo Search) to
enhance offline learning, and propose QM-learning for GGP. Both enhancements
improve the performance of classical Q-learning. In this work, GGP allows us to
show, if augmented by appropriate enhancements, that classical table-based
Q-learning can perform well in small games.Comment: arXiv admin note: substantial text overlap with arXiv:1802.0594
- …