AccMER: Accelerating Multi-Agent Experience Replay with Cache Locality-aware Prioritization
Multi-Agent Experience Replay (MER) is a key component of off-policy
reinforcement learning (RL) algorithms. By remembering and reusing past
experiences, experience replay significantly improves the stability and
learning efficiency of RL algorithms. In many scenarios, multiple agents
interact in a shared environment during online training under the centralized
training and decentralized execution (CTDE) paradigm. Current multi-agent
reinforcement learning (MARL) algorithms use experience replay with uniform
or priority-weighted sampling to improve sample efficiency in the sampling
phase. However, moving each agent's transition data histories through the
processor memory hierarchy is a performance limiter. Moreover, because the
agents' transitions are renewed every iteration, the finite cache capacity
leads to increased cache misses.
To this end, we propose AccMER, which, instead of sampling new transitions at
each step, repeatedly reuses transitions (experiences) for a window of steps
to improve cache locality and minimize transition data movement.
Specifically, our optimization uses priority
weights to select the transitions so that only high-priority transitions will
be reused frequently, thereby improving the cache performance. Our experimental
results on the Predator-Prey environment demonstrate the effectiveness of
reusing the essential transitions based on their priority weights, where we
observe an end-to-end training time reduction of (for agents)
compared to existing prioritized MER algorithms without notable degradation in
the mean reward.
Comment: Accepted to ASAP'2
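The reuse-within-a-window idea can be sketched as follows. This is our own illustrative Python, not the authors' implementation; the class name and parameters such as `reuse_window` are assumptions. A batch of high-priority transitions is sampled once, then returned unchanged for the next few training steps, so the same data stays hot in cache before a fresh sample is drawn.

```python
import numpy as np

class ReusePrioritizedReplay:
    """Illustrative sketch: sample a high-priority batch once, then reuse
    it for `reuse_window` training steps before resampling."""

    def __init__(self, capacity, batch_size=32, reuse_window=4, alpha=0.6, seed=0):
        self.capacity = capacity
        self.batch_size = batch_size
        self.reuse_window = reuse_window
        self.alpha = alpha                      # priority exponent
        self.rng = np.random.default_rng(seed)
        self.buffer = []
        self.priorities = []
        self._cached_batch = None
        self._steps_since_sample = 0

    def add(self, transition, priority=1.0):
        # Evict the oldest transition once capacity is reached.
        if len(self.buffer) >= self.capacity:
            self.buffer.pop(0)
            self.priorities.pop(0)
        self.buffer.append(transition)
        self.priorities.append(priority)

    def sample(self):
        # Within the reuse window, return the cached batch (no data movement).
        if self._cached_batch is not None and self._steps_since_sample < self.reuse_window:
            self._steps_since_sample += 1
            return self._cached_batch
        # Otherwise draw a fresh priority-weighted batch.
        p = np.asarray(self.priorities, dtype=np.float64) ** self.alpha
        p /= p.sum()
        idx = self.rng.choice(len(self.buffer),
                              size=min(self.batch_size, len(self.buffer)), p=p)
        self._cached_batch = [self.buffer[i] for i in idx]
        self._steps_since_sample = 1
        return self._cached_batch
```

Because the cached batch is reused for several consecutive updates, only high-priority transitions incur repeated memory traffic, which is the cache-locality intuition the abstract describes.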
Progressive Transfer Learning for Dexterous In-Hand Manipulation with Multi-Fingered Anthropomorphic Hand
Dexterous in-hand manipulation with a multi-fingered anthropomorphic hand is
extremely difficult because of the high-dimensional state and action spaces
and the rich contact patterns between the fingers and objects. Even though deep
reinforcement learning has made moderate progress and demonstrated its strong
potential for manipulation, it still faces certain challenges, such as
large-scale data collection and high sample complexity. In particular, for
scenes with even slight changes, it typically needs to re-collect vast amounts
of data and carry out numerous iterations of fine-tuning. Remarkably, humans can quickly
transfer learned manipulation skills to different scenarios with little
supervision. Inspired by humans' flexible transfer learning capability, we
propose a novel progressive transfer learning framework (PTL) for dexterous
in-hand manipulation that efficiently utilizes the collected trajectories and
the source-trained dynamics model. This framework adopts progressive neural
networks for dynamics-model transfer learning on samples selected by a new
sample-selection method based on the dynamics properties, rewards, and scores
of the trajectories. Experimental results on contact-rich anthropomorphic hand
manipulation tasks show that our method can efficiently and effectively learn
in-hand manipulation skills with only a few online attempts and adjustment
steps in the new scene. Compared to learning from scratch, our method reduces
training time costs by 95%.
Comment: 12 pages, 7 figures, submitted to TNNL
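The score-based trajectory selection described above can be sketched as follows. This is a hypothetical illustration in our own Python, not the paper's method: the function name, the weighting scheme, and the use of dynamics prediction error as the "dynamics property" are all assumptions. Trajectories are ranked by a weighted combination of normalized return and agreement with the source-trained dynamics model, and the top-k are kept for transfer.

```python
import numpy as np

def select_transfer_trajectories(trajectories, returns, dynamics_errors,
                                 k=2, w_return=0.7):
    """Hypothetical sketch: score each trajectory by normalized return and
    by (negated) source dynamics-model prediction error, keep the top-k."""
    r = np.asarray(returns, dtype=float)
    e = np.asarray(dynamics_errors, dtype=float)

    def norm(x):
        # Min-max normalize to [0, 1]; constant arrays map to zeros.
        span = x.max() - x.min()
        return (x - x.min()) / span if span > 0 else np.zeros_like(x)

    # High return is good; high dynamics error is bad.
    score = w_return * norm(r) + (1 - w_return) * (1 - norm(e))
    top = np.argsort(score)[::-1][:k]
    return [trajectories[i] for i in top]
```

The design choice here is that a trajectory the source dynamics model already predicts well is cheaper to transfer, so low prediction error raises the score alongside high return.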
Single- and multiobjective reinforcement learning in dynamic adversarial games
This thesis uses reinforcement learning (RL) to address dynamic adversarial games in the context of air combat manoeuvring simulation. A sequential decision problem commonly encountered in operations research, air combat manoeuvring simulation has conventionally relied on agent programming methods that require significant domain knowledge to be manually encoded into the simulation environment. These methods are appropriate for determining the effectiveness of existing tactics in different simulated scenarios. However, in order to maximise the advantages provided by new technologies (such as autonomous aircraft), new tactics will need to be discovered. As a proven technique for solving sequential decision problems, RL has the potential to discover these new tactics. This thesis explores four RL approaches (tabular, deep, discrete-to-deep and multiobjective) as mechanisms for discovering new behaviours in simulations of air combat manoeuvring. It implements and tests several methods for each approach and compares those methods in terms of learning time, baseline and comparative performance, and implementation complexity. In addition to evaluating the utility of existing approaches for the specific task of air combat manoeuvring, this thesis proposes and investigates two novel methods, discrete-to-deep supervised policy learning (D2D-SPL) and discrete-to-deep supervised Q-value learning (D2D-SQL), which can be applied more generally. D2D-SPL and D2D-SQL offer the generalisability of deep RL at a cost closer to that of the tabular approach.
Doctor of Philosoph
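The discrete-to-deep idea of D2D-SPL (train a tabular agent first, then fit a network to its policy with supervised learning) can be illustrated by its dataset-construction step. This is our own sketch, not the thesis's implementation; the function and variable names are assumptions. Each tabular state contributes one labelled example: its feature vector paired with the greedy action under the learned Q-table.

```python
import numpy as np

def distill_policy_dataset(q_table, state_features):
    """Illustrative sketch: turn a learned tabular Q-function into a
    supervised dataset of (state features, greedy action) pairs, which a
    neural network can then fit as an ordinary classification problem."""
    X, y = [], []
    for s, q_values in enumerate(q_table):
        X.append(state_features[s])
        y.append(int(np.argmax(q_values)))  # greedy action as class label
    return np.asarray(X), np.asarray(y)
```

Training a classifier on this dataset is standard supervised learning, which is where the approach gains deep RL's generalisation to unseen states at roughly tabular training cost.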