A Pragmatic Look at Deep Imitation Learning
The introduction of the generative adversarial imitation learning (GAIL)
algorithm has spurred the development of scalable imitation learning approaches
using deep neural networks. Many of the algorithms that followed used a similar
procedure, combining on-policy actor-critic algorithms with inverse
reinforcement learning. More recently there has been an even larger breadth of
approaches, most of which use off-policy algorithms. However, with this breadth
of algorithms, everything from datasets to base reinforcement learning
algorithms to evaluation settings can vary, making it difficult to compare them
fairly. In this work we re-implement 6 different imitation learning (IL)
algorithms, updating 3 of them to be off-policy, base them on a common
off-policy algorithm (SAC), and evaluate them on a widely-used expert
trajectory dataset (D4RL) for the most common benchmark (MuJoCo). After giving
all algorithms the same hyperparameter optimisation budget, we compare their
results for a range of expert trajectories. In summary, GAIL, with all of its
improvements, consistently performs well across a range of sample sizes; AdRIL
is a simple contender that performs well, with one important hyperparameter to
tune; and behavioural cloning remains a strong baseline when data is more
plentiful.
Comment: Asian Conference on Machine Learning, 202
Efficient Supervision for Robot Learning via Imitation, Simulation, and Adaptation
Recent successes in machine learning have led to a shift in the design of
autonomous systems, improving performance on existing tasks and rendering new
applications possible. Data-focused approaches gain relevance across diverse,
intricate applications when developing data collection and curation pipelines
becomes more effective than manual behaviour design. The following work aims at
increasing the efficiency of this pipeline in two principal ways: by utilising
more powerful sources of informative data and by extracting additional
information from existing data. In particular, we target three orthogonal
fronts: imitation learning, domain adaptation, and transfer from simulation.
Comment: Dissertation Summary
An FPGA-Based On-Device Reinforcement Learning Approach using Online Sequential Learning
DQN (Deep Q-Network) is a method for performing Q-learning for reinforcement
learning using deep neural networks. DQNs require a large buffer and batch
processing for experience replay and rely on backpropagation-based iterative
optimization, making them difficult to implement on resource-limited edge
devices. In this paper, we propose a lightweight on-device reinforcement
learning approach for low-cost FPGA devices. It exploits a recently proposed
neural-network-based on-device learning approach that does not rely on
backpropagation but instead uses an OS-ELM (Online Sequential Extreme Learning
Machine) based training algorithm. In addition, we propose a combination of L2
regularization and spectral normalization for the on-device reinforcement
learning so that the output values of the neural network fit within a bounded
range and the reinforcement learning remains stable. The proposed reinforcement
learning approach is designed for the PYNQ-Z1 board as a low-cost FPGA
platform. The evaluation results using OpenAI Gym demonstrate that the proposed
algorithm and its FPGA implementation complete a CartPole-v0 task 29.77x and
89.40x faster, respectively, than a conventional DQN-based approach when the
number of hidden-layer nodes is 64.
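To make the backpropagation-free training concrete, below is a minimal NumPy sketch of the standard OS-ELM sequential least-squares update, with L2 regularization folded into the initial inverse-covariance matrix and spectral normalization applied to the fixed random hidden weights. It illustrates the general technique under assumed shapes and hyperparameters (lam, hidden size), not the paper's FPGA implementation.

import numpy as np

class OSELM:
    def __init__(self, n_in, n_hidden, n_out, lam=1e-2, rng=None):
        rng = rng or np.random.default_rng(0)
        W = rng.standard_normal((n_in, n_hidden))
        # Spectral normalization: divide by the largest singular value so the
        # fixed hidden map is 1-Lipschitz and network outputs stay bounded.
        W /= np.linalg.svd(W, compute_uv=False)[0]
        self.W = W
        self.b = rng.standard_normal(n_hidden)
        # L2-regularized start: P = (lam * I)^-1 before any data is seen.
        self.P = np.eye(n_hidden) / lam
        self.beta = np.zeros((n_hidden, n_out))

    def _hidden(self, X):
        return np.tanh(X @ self.W + self.b)

    def partial_fit(self, X, T):
        """One recursive least-squares update on a mini-batch (no backprop)."""
        H = self._hidden(X)                                   # (batch, n_hidden)
        K = np.linalg.inv(np.eye(len(X)) + H @ self.P @ H.T)  # (batch, batch)
        self.P -= self.P @ H.T @ K @ H @ self.P
        self.beta += self.P @ H.T @ (T - H @ self.beta)

    def predict(self, X):
        return self._hidden(X) @ self.beta

In a Q-learning loop of this style, partial_fit would be called with observed states as X and one-step temporal-difference targets as T, replacing the gradient-descent update of a conventional DQN.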