QHD: A brain-inspired hyperdimensional reinforcement learning algorithm
Reinforcement Learning (RL) has opened up new opportunities to solve a wide
range of complex decision-making tasks. However, modern RL algorithms, e.g.,
Deep Q-Learning, are based on deep neural networks, which incur high
computational costs when running on edge devices. In this paper, we propose
QHD, a hyperdimensional reinforcement learning algorithm that mimics brain
properties to achieve robust and real-time learning. QHD relies on a
lightweight brain-inspired model to learn an optimal policy in an unknown
environment. We first develop a novel mathematical foundation and encoding
module that maps the state-action space into a high-dimensional space. We then
develop a hyperdimensional regression model to approximate the Q-value
function, and the QHD-powered agent makes decisions by comparing the Q-values
of all possible actions. We evaluate the effect of different RL training batch
sizes and local memory capacities on QHD's quality of learning. QHD is also
capable of online learning with a tiny local memory capacity, which can be as
small as the training batch size, and it provides real-time learning when the
memory capacity and batch size are decreased further. This makes QHD suitable
for highly efficient reinforcement learning in edge environments, where
supporting online and real-time learning is crucial. Our solution also
supports a small experience replay batch size that provides a 12.3x speedup
over DQN while ensuring minimal quality loss. Our evaluation shows QHD's
capability for real-time learning, providing a 34.6x speedup and significantly
better quality of learning than state-of-the-art deep RL algorithms.
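
To make the idea concrete, below is a minimal sketch (not the paper's
implementation) of what a hyperdimensional Q-regression agent might look like.
It assumes a nonlinear random-projection encoder and a delta-rule update
toward the TD target; the class name, encoding, and hyperparameters are
illustrative, not taken from the paper.

    import numpy as np

    class HDQAgent:
        # One model hypervector per action; Q(s, a) is the dot product between
        # the encoded state and that action's model hypervector.
        def __init__(self, state_dim, n_actions, hd_dim=10000, lr=0.05,
                     gamma=0.99, seed=0):
            self.rng = np.random.default_rng(seed)
            # Random projection used to encode states into HD space (assumed;
            # the paper develops its own encoding module).
            self.proj = self.rng.standard_normal((hd_dim, state_dim))
            self.models = np.zeros((n_actions, hd_dim))
            self.lr, self.gamma = lr, gamma

        def encode(self, state):
            # Nonlinear random-projection encoding of a state into a hypervector.
            return np.cos(self.proj @ np.asarray(state, dtype=float))

        def q_values(self, state):
            return self.models @ self.encode(state)

        def act(self, state, epsilon=0.1):
            if self.rng.random() < epsilon:
                return int(self.rng.integers(len(self.models)))
            return int(np.argmax(self.q_values(state)))

        def update(self, state, action, reward, next_state, done):
            h = self.encode(state)
            q = self.models[action] @ h
            target = reward if done else (
                reward + self.gamma * np.max(self.q_values(next_state)))
            # Hyperdimensional regression: delta-rule update of the model
            # hypervector toward the TD target.
            self.models[action] += self.lr * (target - q) * h

Each update touches only one high-dimensional vector rather than running a
full backward pass through a deep network, which is consistent with the
speedups over DQN that the abstract reports.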
Cover Tree Bayesian Reinforcement Learning
This paper proposes an online tree-based Bayesian approach for reinforcement
learning. For inference, we employ a generalised context tree model. This
defines a distribution on multivariate Gaussian piecewise-linear models, which
can be updated in closed form. The tree structure itself is constructed using
the cover tree method, which remains efficient in high dimensional spaces. We
combine the model with Thompson sampling and approximate dynamic programming to
obtain effective exploration policies in unknown environments. The flexibility
and computational simplicity of the model render it suitable for many
reinforcement learning problems in continuous state spaces. We demonstrate this
in an experimental comparison with least-squares policy iteration.
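
As a rough illustration of the Thompson-sampling side of this approach, the
sketch below maintains a single Bayesian linear-Gaussian dynamics model with
closed-form conjugate posterior updates and draws one plausible model per
episode. This is a deliberate simplification: the paper instead keeps a
distribution over piecewise-linear models organised by a cover tree, and all
names and priors here are assumptions for illustration.

    import numpy as np

    class BayesLinearDynamics:
        # Single global linear-Gaussian model s' ~ N(W s, noise_var * I) with
        # a Gaussian posterior over W that is updated in closed form. (The
        # paper keeps many such local models at the nodes of a cover tree.)
        def __init__(self, state_dim, prior_var=1.0, noise_var=0.1):
            self.precision = np.eye(state_dim) / prior_var  # row-wise precision of W
            self.xty = np.zeros((state_dim, state_dim))
            self.noise_var = noise_var

        def update(self, s, s_next):
            # Closed-form conjugate update after observing a transition (s, s').
            s = np.asarray(s, dtype=float)
            self.precision += np.outer(s, s) / self.noise_var
            self.xty += np.outer(s, np.asarray(s_next, dtype=float)) / self.noise_var

        def sample(self, rng):
            # Thompson sampling: draw one dynamics matrix W from the posterior;
            # each row of W shares the same posterior covariance.
            cov = np.linalg.inv(self.precision)
            mean = cov @ self.xty          # column i = mean weights for output i
            L = np.linalg.cholesky(cov)
            return (mean + L @ rng.standard_normal(mean.shape)).T

An agent built on this would resample W at the start of each episode, compute
a policy for the sampled model by approximate dynamic programming, and act
greedily until the next resample, which is the exploration scheme the abstract
describes.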