Atari games and Intel processors
The asynchronous nature of state-of-the-art reinforcement learning algorithms,
such as the Asynchronous Advantage Actor-Critic (A3C) algorithm, makes them
exceptionally suitable for CPU computation. However, because deep reinforcement
learning often involves interpreting visual information, a large part of
training and inference time is spent performing convolutions. In this work we
present our results on learning strategies in Atari games using a Convolutional
Neural Network, the Intel Math Kernel Library (MKL), and the TensorFlow 0.11rc0
machine learning framework. We also analyze the effects of asynchronous
computation on the convergence of reinforcement learning algorithms.
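As a toy illustration of why this workload suits multi-core CPUs, the sketch below (mine, not the paper's implementation) mimics the asynchronous, Hogwild-style update pattern that A3C relies on: several worker threads compute gradients against a shared parameter vector and apply them with no global synchronization barrier. The linear model and squared-error surrogate loss are stand-ins chosen only for illustration.

```python
import threading
import numpy as np

shared_weights = np.zeros(4)      # parameters shared by all workers
LR = 0.01

def worker(seed, steps=2000):
    rng = np.random.default_rng(seed)
    for _ in range(steps):
        # Each worker draws its own (simulated) experience.
        x = rng.normal(size=4)
        target = x.sum()          # true weights are all ones
        pred = shared_weights @ x
        grad = 2.0 * (pred - target) * x
        # Lock-free, asynchronous in-place update of shared parameters.
        shared_weights[:] -= LR * grad

threads = [threading.Thread(target=worker, args=(s,)) for s in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print("learned weights:", shared_weights)   # approaches [1, 1, 1, 1]
```

Despite racing on the shared vector, the workers converge in practice; the convergence-under-asynchrony question is exactly what the abstract's last sentence refers to.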
Pipelined genetic propagation
Genetic Algorithms (GAs) are a class of numerical and combinatorial optimisers which are especially useful for solving complex non-linear and non-convex problems. However, the required execution time often limits their application to small-scale or latency-insensitive problems, so techniques to increase the computational efficiency of GAs are needed. FPGA-based acceleration has significant potential for speeding up genetic algorithms, but existing FPGA GAs are limited by the generational approaches inherited from software GAs. Many parts of the generational approach do not map well to hardware, such as the large shared population memory and intrinsic loop-carried dependency. To address this problem, this paper proposes a new hardware-oriented approach to GAs, called Pipelined Genetic Propagation (PGP), which is intrinsically distributed and pipelined. PGP represents a GA solver as a graph of loosely coupled genetic operators, which allows the solution to be scaled to the available resources, and also to dynamically change topology at run-time to explore different solution strategies. Experiments show that pipelined genetic propagation is effective in solving seven different applications. Our PGP design is 5 times faster than a recent FPGA-based GA system, and 90 times faster than a CPU-based GA system.
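The following Python sketch is a software analogy of the PGP idea, not the authors' FPGA design: genetic operators become loosely coupled pipeline stages exchanging individuals through bounded queues, so there is no shared population memory and no generational barrier. The ring topology, queue depths, and one-max fitness function are illustrative choices.

```python
import queue
import random
import threading
import time

GENOME_LEN, POOL_SIZE = 16, 32
best = {"fit": -1}                      # monitored from the main thread

def fitness(g):                         # toy one-max objective
    return sum(g)

def select_stage(inbox, outbox):
    pool = [[random.randint(0, 1) for _ in range(GENOME_LEN)]
            for _ in range(POOL_SIZE)]
    while True:
        try:                            # absorb individuals flowing back
            pool.append(inbox.get_nowait())
        except queue.Empty:
            pass
        pool.sort(key=fitness, reverse=True)
        del pool[POOL_SIZE:]            # keep only the fittest
        best["fit"] = max(best["fit"], fitness(pool[0]))
        outbox.put((random.choice(pool[:8]), random.choice(pool[:8])))

def crossover_stage(inbox, outbox):
    while True:
        a, b = inbox.get()
        cut = random.randrange(1, GENOME_LEN)
        outbox.put(a[:cut] + b[cut:])   # single-point crossover

def mutate_stage(inbox, outbox):
    while True:
        g = list(inbox.get())
        g[random.randrange(GENOME_LEN)] ^= 1   # flip one bit
        outbox.put(g)

# Ring topology: select -> crossover -> mutate -> back to select.
q1, q2, q3 = queue.Queue(8), queue.Queue(8), queue.Queue(8)
for fn, args in [(select_stage, (q3, q1)),
                 (crossover_stage, (q1, q2)),
                 (mutate_stage, (q2, q3))]:
    threading.Thread(target=fn, args=args, daemon=True).start()

time.sleep(1.0)
print("best fitness found:", best["fit"], "of", GENOME_LEN)
```

On an FPGA each stage would be a physical circuit processing a continuous stream of individuals; the queues stand in for the on-chip channels that decouple the operators.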
Optimizing Data Collection in Deep Reinforcement Learning
Reinforcement learning (RL) workloads take a notoriously long time to train
due to the large number of samples collected at run-time from simulators.
Unfortunately, cluster scale-up approaches remain expensive, and commonly used
CPU implementations of simulators induce high overhead when switching back and
forth between GPU computations. We explore two optimizations that increase RL
data collection efficiency by increasing GPU utilization: (1) GPU
vectorization: parallelizing simulation on the GPU for increased hardware
parallelism, and (2) simulator kernel fusion: fusing multiple simulation steps
to run in a single GPU kernel launch to reduce global memory bandwidth
requirements. We find that GPU vectorization can achieve substantial speedups
over commonly used CPU simulators. We profile the performance of different
implementations and show that for a simple simulator, ML compiler
implementations (XLA) of GPU vectorization outperform a DNN framework (PyTorch)
by reducing CPU overhead from repeated Python to DL backend API calls. We show
that simulator kernel fusion provides speedups with a simple simulator, and
that these speedups grow as simulator complexity increases in terms of memory
bandwidth requirements. We show that the speedups from simulator kernel fusion
are orthogonal and combinable with GPU vectorization, leading to a
multiplicative speedup.

Comment: MLBench 2022 ( https://memani1.github.io/mlbench22/ ) camera ready
submission
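A minimal sketch of the two ideas on a toy point-mass simulator (mine, not the paper's code): vectorization advances N environments in one array operation, and "fusing" K steps inside a single call pays the per-call dispatch overhead once instead of K times. On a GPU, a compiler such as XLA would additionally emit correspondingly fewer kernel launches.

```python
import numpy as np

N, DT = 4096, 0.01                     # parallel environments, step size

def step_all(pos, vel, act):
    """One simulator step for all N environments as one array op."""
    vel = vel + DT * act               # vectorized dynamics update
    pos = pos + DT * vel
    return pos, vel

def step_fused(pos, vel, act, k=8):
    """K steps inside one call: the 'kernel fusion' analogue."""
    for _ in range(k):
        pos, vel = step_all(pos, vel, act)
    return pos, vel

rng = np.random.default_rng(0)
pos = np.zeros(N, dtype=np.float32)
vel = np.zeros(N, dtype=np.float32)
act = rng.normal(size=N).astype(np.float32)

pos, vel = step_fused(pos, vel, act)   # 8 steps, a single call
print(pos[:3], vel[:3])
```

The two optimizations compose because they attack different costs: vectorization widens each step across environments, while fusion reduces how often the host has to dispatch a step at all.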