Workload-aware Automatic Parallelization for Multi-GPU DNN Training

Authors: Choi Jungwook, Jo Youngmin, Shin Sungho, Srinivasan Vijayalakshmi, Sung Wonyong, Venkataramani Swagath
Publication date: 06/02/2019

Deep neural networks (DNNs) have emerged as successful solutions for a variety of artificial intelligence applications, but their very large and deep models impose high computational requirements during training. Multi-GPU parallelization is a popular way to accelerate the demanding computations in DNN training, but most state-of-the-art multi-GPU deep learning frameworks not only require users to have an in-depth understanding of the frameworks' implementation, but also apply parallelization in a straightforward way without optimizing GPU utilization. In this work, we propose a workload-aware auto-parallelization framework (WAP) for DNN training, in which the work is automatically distributed to multiple GPUs based on the workload characteristics. We evaluate WAP using TensorFlow with popular DNN benchmarks (AlexNet and VGG-16) and show competitive training throughput compared with state-of-the-art frameworks. We also demonstrate that WAP automatically optimizes GPU assignment based on the workload's compute requirements, thereby improving energy efficiency.
Comment: This paper is accepted in ICASSP201

arXiv.org e-Print Archive

Crossref

SNU Open Repository and Archive

Importance mixing: Improving sample reuse in evolutionary policy search methods

Authors: Perrin Nicolas, Pourchot Aloïs, Sigaud Olivier
Publication date: 17/08/2018

Deep neuroevolution, that is, evolutionary policy search based on deep neural networks, has recently emerged as a competitor to deep reinforcement learning algorithms due to its better parallelization capabilities. However, these methods still suffer from far worse sample efficiency. In this paper we investigate whether a mechanism known as "importance mixing" can significantly improve their sample efficiency. We provide a didactic presentation of importance mixing and explain how it can be extended to reuse more samples. Then, from an empirical comparison based on a simple benchmark, we show that, although importance mixing does provide better sample efficiency, it is still far from the sample efficiency of deep reinforcement learning, though it is more stable.
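The abstract does not spell out the mechanism, so the following is only a minimal sketch of importance mixing as it is commonly described for Gaussian search distributions; it is not the authors' code. It assumes an isotropic Gaussian with a shared standard deviation, and the names `log_pdf`, `importance_mixing`, and the parameter `alpha` (a minimal refresh rate, which must be positive) are illustrative choices rather than anything taken from the paper.

```python
import math
import random


def log_pdf(x, mean, std):
    """Log-density of an isotropic Gaussian (assumed search distribution)."""
    norm = math.log(std * math.sqrt(2.0 * math.pi))
    return sum(-0.5 * ((xi - mi) / std) ** 2 - norm for xi, mi in zip(x, mean))


def density_ratio(x, num, den):
    """p_num(x) / p_den(x), clamped so math.exp cannot overflow."""
    log_ratio = log_pdf(x, *num) - log_pdf(x, *den)
    return math.exp(min(log_ratio, 50.0))


def importance_mixing(old_samples, old, new, alpha=0.01, rng=None):
    """Build a population for `new`, reusing samples drawn from `old`.

    `old` and `new` are (mean, std) pairs. Returns (population, reused),
    where `reused` counts the old samples that were kept.
    """
    rng = rng or random.Random()
    n = len(old_samples)

    # Step 1: keep an old sample with probability min(1, (1 - alpha) * p_new / p_old).
    population = [
        x for x in old_samples
        if rng.random() < min(1.0, (1.0 - alpha) * density_ratio(x, new, old))
    ]
    reused = len(population)

    # Step 2: fill up with fresh samples from the new distribution, each
    # accepted with probability max(alpha, 1 - p_old / p_new).
    new_mean, new_std = new
    while len(population) < n:
        x = [rng.gauss(m, new_std) for m in new_mean]
        if rng.random() < max(alpha, 1.0 - density_ratio(x, old, new)):
            population.append(x)
    return population, reused


if __name__ == "__main__":
    rng = random.Random(0)
    old = ([0.0, 0.0], 1.0)
    new = ([0.2, -0.1], 1.0)  # a small update of the search distribution
    old_samples = [[rng.gauss(m, old[1]) for m in old[0]] for _ in range(100)]
    population, reused = importance_mixing(old_samples, old, new, rng=rng)
    print(f"reused {reused} of {len(population)} samples")
```

Only the freshly drawn samples need a new evaluation, which is where the saving comes from: the closer the new distribution is to the old one, the more samples are reused.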

arXiv.org e-Print Archive

Efficient and versatile data analytics for deep networks

Authors: Badia Sala Rosa Maria, Conejero Javier, Cortés García Claudio Ulises, Espinosa-Oviedo Javier A., Garcia Gasulla Dario, Moreno Vázquez Jonatan, Suzumura Toyotaro, Vargas-Solar Genoveva
Publication venue: Barcelona Supercomputing Center
Publication date: 05/05/2015

Deep networks (DN) perform cognitive tasks related to images and text at human level. To extract and exploit the knowledge encoded within these networks, we propose a framework that combines state-of-the-art technology in parallelization, storage, and analysis. Our goal is to make DN models available to all data scientists.

UPCommons. Portal del coneixement obert de la UPC

Demystifying Parallel and Distributed Deep Learning: An In-Depth Concurrency Analysis

Authors: Ben-Nun Tal, Hoefler Torsten
Publication date: 15/09/2018

Deep Neural Networks (DNNs) are becoming an important tool in modern computing applications. Accelerating their training is a major challenge, and techniques range from distributed algorithms to low-level circuit design. In this survey, we describe the problem from a theoretical perspective, followed by approaches for its parallelization. We present trends in DNN architectures and the resulting implications for parallelization strategies. We then review and model the different types of concurrency in DNNs: from the single operator, through parallelism in network inference and training, to distributed deep learning. We discuss asynchronous stochastic optimization, distributed system architectures, communication schemes, and neural architecture search. Based on those approaches, we extrapolate potential directions for parallelism in deep learning.

arXiv.org e-Print Archive

Repository for Publications and Research Data

Single stream parallelization of generalized LSTM-like RNNs on a GPU

Authors: Hwang Kyuyeon, Sung Wonyong
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 10/03/2015

Recurrent neural networks (RNNs) have shown outstanding performance in processing sequence data. However, they suffer from long training times, which demand parallel implementations of the training procedure. Parallelizing the training algorithms for RNNs is very challenging because internal recurrent paths form dependencies between different time frames. In this paper, we first propose a generalized graph-based RNN structure that covers the most popular long short-term memory (LSTM) network. Then, we present a parallelization approach that automatically explores the parallelism of arbitrary RNNs by analyzing the graph structure. The experimental results show that the proposed approach achieves a large speed-up even with a single training stream, and further accelerates training when combined with multiple parallel training streams.
Comment: Accepted by the 40th IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 201
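To illustrate what "analyzing the graph structure" could look like in the simplest case, the sketch below groups the nodes of a single time step of an LSTM-like graph into levels whose members have no mutual dependencies and could therefore run concurrently. This is a generic topological levelization, not the method from the paper; the function `parallel_levels` and the node names are hypothetical, and values carried over from the previous time step (`h_prev`, `c_prev`) are treated as plain inputs.

```python
def parallel_levels(deps):
    """Group nodes of a dependency graph into levels that can run in parallel.

    `deps` maps each node to the list of nodes it depends on. Every node in
    a level depends only on nodes in earlier levels.
    """
    nodes = set(deps)
    for parents in deps.values():
        nodes.update(parents)

    indegree = {n: 0 for n in nodes}
    children = {n: [] for n in nodes}
    for node, parents in deps.items():
        for parent in parents:
            indegree[node] += 1
            children[parent].append(node)

    frontier = sorted(n for n in nodes if indegree[n] == 0)
    levels, seen = [], 0
    while frontier:
        levels.append(frontier)
        seen += len(frontier)
        ready = []
        for n in frontier:
            for child in children[n]:
                indegree[child] -= 1
                if indegree[child] == 0:
                    ready.append(child)
        frontier = sorted(ready)

    if seen != len(nodes):
        raise ValueError("dependency graph contains a cycle")
    return levels


# One time step of an LSTM-like cell (illustrative node names).
lstm_step = {
    "input_gate": ["x_t", "h_prev"],
    "forget_gate": ["x_t", "h_prev"],
    "output_gate": ["x_t", "h_prev"],
    "candidate": ["x_t", "h_prev"],
    "cell": ["input_gate", "forget_gate", "candidate", "c_prev"],
    "hidden": ["output_gate", "cell"],
}

if __name__ == "__main__":
    for depth, level in enumerate(parallel_levels(lstm_step)):
        print(depth, level)
```

In this toy graph the four gate computations land in the same level, so they can be issued together even when only one training stream is available.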

arXiv.org e-Print Archive

Crossref