
    An efficient task-based all-reduce for machine learning applications

    All-Reduce is a collective-combine operation frequently utilised in synchronous parameter updates in parallel machine learning algorithms. The performance of this operation - and subsequently of the algorithm itself - is heavily dependent on its implementation, its configuration and the supporting hardware on which it is run. Given the pivotal role of all-reduce, a shortfall in any of these regards will significantly impact the resulting scientific output. In this research we explore the performance of alternative all-reduce algorithms in data-flow graphs and compare these to the commonly used reduce-broadcast approach. We present an architecture and interface for all-reduce in task-based frameworks, and a parallelization scheme for object serialization and computation. We present a concrete, novel application of a butterfly all-reduce algorithm on the Apache Spark framework on a high-performance compute cluster, and demonstrate the effectiveness of the new butterfly algorithm with a logarithmic speed-up with respect to the vector length compared with the original reduce-broadcast method - a 9x speed-up is observed for vector lengths on the order of 10^8. This improvement comprises both algorithmic changes (65%) and parallel-processing optimization (35%). The effectiveness of the new butterfly all-reduce is demonstrated using real-world neural network applications with the Spark framework. For the model-update operation we observe significant speed-ups using the new butterfly algorithm compared with the original reduce-broadcast, for both smaller (CIFAR and MNIST) and larger (ImageNet) datasets.
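
    To illustrate the algorithmic idea behind the butterfly all-reduce described in the abstract, the sketch below simulates the recursive-doubling exchange pattern over in-memory "workers" in Python. It is an assumption-laden illustration only: the paper's actual implementation is a task-based scheme on Apache Spark with parallel object serialization, none of which is reproduced here. The function name and the use of NumPy arrays are hypothetical choices for the example.

    ```python
    # Minimal sketch of a butterfly (recursive-doubling) all-reduce, simulated
    # over in-memory workers. Illustrative only; not the paper's Spark code.
    import numpy as np

    def butterfly_allreduce(vectors):
        """Element-wise-sum all-reduce across a power-of-two number of workers.

        In round k, worker i exchanges its partial result with worker i XOR 2**k
        and both combine, so every worker holds the full sum after log2(p) rounds
        (versus the single-root bottleneck of a reduce-then-broadcast).
        """
        p = len(vectors)
        assert p & (p - 1) == 0, "this sketch assumes a power-of-two worker count"
        state = [v.copy() for v in vectors]
        step = 1
        while step < p:
            new_state = [None] * p
            for i in range(p):
                partner = i ^ step          # pair with the worker 'step' ranks away
                new_state[i] = state[i] + state[partner]
            state = new_state
            step *= 2                       # log2(p) rounds in total
        return state

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        workers = [rng.standard_normal(8) for _ in range(4)]
        result = butterfly_allreduce(workers)
        expected = sum(workers)
        assert all(np.allclose(r, expected) for r in result)
        print("every worker holds the reduced vector:", result[0])
    ```

    In this pattern each of the log2(p) rounds moves and combines the full vector on every worker in parallel, which is why large vectors (such as neural-network model updates) benefit most relative to funnelling everything through a single reduce root.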