
204 research outputs found

Taming Unbalanced Training Workloads in Deep Learning with Partial Collective Operations

Author: Alistarh Dan
Chilimbi Trishul
Devlin Jacob
Ho Qirong
Hoefler T.
Hsieh Kevin
Interface Forum Message Passing
Jayarajan Anand
Lian Xiangru
Recht B.
Seide Frank
Strom Nikko
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2020

Load imbalance pervasively exists in distributed deep learning training systems, either caused by the inherent imbalance in learned tasks or by the system itself. Traditional synchronous Stochastic Gradient Descent (SGD) achieves good accuracy for a wide variety of tasks, but relies on global synchronization to accumulate the gradients at every training step. In this paper, we propose eager-SGD, which relaxes the global synchronization for decentralized accumulation. To implement eager-SGD, we propose to use two partial collectives: solo and majority. With solo allreduce, the faster processes contribute their gradients eagerly without waiting for the slower processes, whereas with majority allreduce, at least half of the participants must contribute gradients before continuing, all without using a central parameter server. We theoretically prove the convergence of the algorithms and describe the partial collectives in detail. Experimental results on load-imbalanced environments (CIFAR-10, ImageNet, and UCF101 datasets) show that eager-SGD achieves 1.27x speedup over the state-of-the-art synchronous SGD, without losing accuracy. Comment: Published in Proceedings of the 25th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP'20), pp. 45-61. 202
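The quorum rule behind the solo and majority collectives in the abstract above can be illustrated with a small single-process simulation. This is only a sketch under stated assumptions, not the eager-SGD implementation: the function name `partial_allreduce`, the `ready_times` input, and the choice to simply leave out late gradients are all hypothetical simplifications introduced here.

```python
import math


def partial_allreduce(grads, ready_times, mode="majority"):
    """Toy model of one partial allreduce round (hypothetical helper).

    grads       -- one gradient vector (list of floats) per process
    ready_times -- time at which each process has its gradient ready
    mode        -- "solo" (quorum 1), "majority" (quorum ceil(n/2)),
                   or "sync" (quorum n, i.e. ordinary allreduce)

    Assumption: processes that are not ready when the quorum is reached
    simply do not contribute to this round.
    """
    n = len(grads)
    if mode == "solo":
        quorum = 1
    elif mode == "majority":
        quorum = math.ceil(n / 2)
    elif mode == "sync":
        quorum = n
    else:
        raise ValueError(f"unknown mode: {mode}")

    # The round completes as soon as `quorum` processes are ready.
    trigger = sorted(ready_times)[quorum - 1]
    contributors = [i for i, t in enumerate(ready_times) if t <= trigger]

    # Sum only the gradients of the processes that made it in time.
    total = [0.0] * len(grads[0])
    for i in contributors:
        for k, value in enumerate(grads[i]):
            total[k] += value
    return total, trigger, contributors


grads = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
ready = [0.1, 0.5, 0.2, 0.9]
for mode in ("solo", "majority", "sync"):
    print(mode, partial_allreduce(grads, ready, mode))
```

In this toy model the completion time of a round is set by the fastest process (solo), the median process (majority), or the slowest process (sync), which is the trade-off the abstract describes.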

arXiv.org e-Print Archive

Crossref

IST Austria: PubRep (Institute of Science and Technology)

SparCML: High-Performance Sparse Communication for Machine Learning

Author: Abadi Martín
Alistarh Dan
Chen Tianqi
Chilimbi Trishul M
Devlin Jacob
Hoefler T.
Interface Forum Message Passing
Ketkar Nikhil
Sa Christopher De
Seide Frank
Strom Nikko
Wen Wei
Publication date: 01/01/2019

Applying machine learning techniques to the quickly growing data in science and industry requires highly-scalable algorithms. Large datasets are most commonly processed "data parallel" distributed across many nodes. Each node's contribution to the overall gradient is summed using a global allreduce. This allreduce is the single communication and thus scalability bottleneck for most machine learning workloads. We observe that frequently, many gradient values are (close to) zero, leading to sparse or sparsifiable communications. To exploit this insight, we analyze, design, and implement a set of communication-efficient protocols for sparse input data, in conjunction with efficient machine learning algorithms which can leverage these primitives. Our communication protocols generalize standard collective operations, by allowing processes to contribute arbitrary sparse input data vectors. Our generic communication library, SparCML, extends MPI to support additional features, such as non-blocking (asynchronous) operations and low-precision data representations. As such, SparCML and its techniques will form the basis of future highly-scalable machine learning frameworks.
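The idea of an allreduce over sparse contributions, as described in the abstract above, can be illustrated as follows. This is a minimal single-process sketch, not SparCML's protocol or API: the function name `sparse_allreduce_sum` and the index-to-value dictionary representation are assumptions made here for illustration.

```python
def sparse_allreduce_sum(contributions):
    """Sum sparse vectors given as {index: value} dictionaries.

    Hypothetical helper: each dictionary stands for one process's
    sparse gradient, holding only its nonzero entries.
    """
    result = {}
    for vec in contributions:
        for idx, val in vec.items():
            result[idx] = result.get(idx, 0.0) + val
    return result


# Three "processes", each contributing a few nonzeros of a length-10 vector.
dim = 10
contributions = [{0: 1.0, 3: 2.0}, {3: 0.5}, {7: -1.0}]

summed = sparse_allreduce_sum(contributions)
sent_sparse = sum(len(v) for v in contributions)  # (index, value) pairs sent
sent_dense = dim * len(contributions)             # values sent if dense
print(summed, sent_sparse, sent_dense)
```

The sketch only counts entries; a real protocol would also account for the cost of transmitting indices and for the result becoming denser as more contributions are merged.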

arXiv.org e-Print Archive

Crossref

IST Austria: PubRep (Institute of Science and Technology)

Optimized Broadcast for Deep Learning Workloads on Dense-GPU InfiniBand Clusters: MPI or NCCL?

Author: Banerjee D. S.
Barnett M.
Chu Ching-Hsiang
Hoefler T.
Jia Yangqing
Liu J.
Mamidala A. R.
Shi R.
Venkatesh A.
Publication date: 28/07/2017

Dense Multi-GPU systems have recently gained a lot of attention in the HPC arena. Traditionally, MPI runtimes have been primarily designed for clusters with a large number of nodes. However, with the advent of MPI+CUDA applications and CUDA-Aware MPI runtimes like MVAPICH2 and OpenMPI, it has become important to address efficient communication schemes for such dense Multi-GPU nodes. This coupled with new application workloads brought forward by Deep Learning frameworks like Caffe and Microsoft CNTK pose additional design constraints due to very large message communication of GPU buffers during the training phase. In this context, special-purpose libraries like NVIDIA NCCL have been proposed for GPU-based collective communication on dense GPU systems. In this paper, we propose a pipelined chain (ring) design for the MPI_Bcast collective operation along with an enhanced collective tuning framework in MVAPICH2-GDR that enables efficient intra-/inter-node multi-GPU communication. We present an in-depth performance landscape for the proposed MPI_Bcast schemes along with a comparative analysis of NVIDIA NCCL Broadcast and NCCL-based MPI_Bcast. The proposed designs for MVAPICH2-GDR enable up to 14X and 16.6X improvement, compared to NCCL-based solutions, for intra- and inter-node broadcast latency, respectively. In addition, the proposed designs provide up to 7% improvement over NCCL-based solutions for data parallel training of the VGG network on 128 GPUs using Microsoft CNTK. Comment: 8 pages, 3 figures
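A pipelined chain broadcast of the general kind named in the abstract above can be modelled in a few lines. This is a toy step-counting simulation, not the MVAPICH2-GDR design: the function name `chain_broadcast`, the chunking scheme, and the assumption that every rank forwards at most one chunk to its successor per step are all introduced here for illustration.

```python
def chain_broadcast(data, num_ranks, num_chunks):
    """Simulate a pipelined chain broadcast from rank 0 (hypothetical helper).

    data is a non-empty list held by rank 0. It is split into chunks, and in
    each step every rank forwards to its successor the next chunk the
    successor is missing, if it already holds that chunk.
    Returns (per-rank reassembled buffers, number of steps).
    """
    if not data or num_ranks < 1 or num_chunks < 1:
        raise ValueError("need non-empty data, num_ranks >= 1, num_chunks >= 1")

    size = -(-len(data) // num_chunks)  # ceiling division
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    m = len(chunks)

    received = [[None] * m for _ in range(num_ranks)]
    received[0] = list(chunks)

    steps = 0
    while any(c is None for c in received[-1]):
        # Decide all sends from the state at the start of the step.
        sends = []
        for r in range(num_ranks - 1):
            missing = next(
                (k for k in range(m) if received[r + 1][k] is None), None
            )
            if missing is not None and received[r][missing] is not None:
                sends.append((r + 1, missing, received[r][missing]))
        for dst, k, chunk in sends:
            received[dst][k] = chunk
        steps += 1

    buffers = [[x for chunk in buf for x in chunk] for buf in received]
    return buffers, steps


buffers, steps = chain_broadcast(list(range(8)), num_ranks=4, num_chunks=4)
print(steps, buffers[-1])
```

In this model the last rank finishes after (num_ranks - 1) + (chunks - 1) steps, so splitting a large message into chunks lets transfers along the chain overlap instead of forwarding the whole message hop by hop.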

arXiv.org e-Print Archive

Crossref


Hybrid MPI: Efficient Message Passing for Shared and Distributed Memory

Author: Bronevetsky G
Friedley A
Hoefler T
Lumsdaine A
Publication venue: Lawrence Livermore National Laboratory
Publication date: 12/02/2013

UNT Digital Library


Scalable High Performance Message Passing over InfiniBand for Open MPI

Author: Friedley A.
Hoefler T.
Leininger M. L.
Lumsdaine A.
Publication venue: Lawrence Livermore National Laboratory
Publication date: 24/10/2007

InfiniBand (IB) is a popular network technology for modern high-performance computing systems. MPI implementations traditionally support IB using a reliable, connection-oriented (RC) transport. However, per-process resource usage that grows linearly with the number of processes makes this approach prohibitive for large-scale systems. IB provides an alternative in the form of a connectionless unreliable datagram transport (UD), which allows for near-constant resource usage and initialization overhead as the process count increases. This paper describes a UD-based implementation for IB in Open MPI as a scalable alternative to existing RC-based schemes. We use the software reliability capabilities of Open MPI to provide the guaranteed delivery semantics required by MPI. Results show that UD not only requires fewer resources at scale, but also allows for shorter MPI startup times. A connectionless model also improves performance for applications that tend to send small messages to many different processes.

UNT Digital Library

Towards Efficient MapReduce Using MPI

Author: A. Gara
A. Kimball
A.N. Langville
B. He
C. Ranger
C.T. Chu
D. Gregor
G.E. Fagg
J. Dean
R. Lämmel
R. Pike
S. Ghemawat
T. Hoefler
W. Gropp
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2009

Crossref

Kilometer-scale climate models: Prospects and challenges

Author: Arteaga A.
Ban N.
Charpilloz C.
Di Girolamo S.
Fuhrer O.
Hentgen L.
Hoefler T.
Lapillonne X.
Leutwyler D.
Osterried K.
Panosetti D.
Rüdisühli S.
Schlemmer L.
Schulthess T.
Schär C.
Sprenger M.
Ubbiali S.
Wernli H.
Publication venue: 'American Meteorological Society'
Publication date: 01/06/2020

Currently major efforts are underway toward refining the horizontal resolution (or grid spacing) of climate models to about 1 km, using both global and regional climate models (GCMs and RCMs). Several groups have succeeded in conducting kilometer-scale multiweek GCM simulations and decadelong continental-scale RCM simulations. There is the well-founded hope that this increase in resolution represents a quantum jump in climate modeling, as it enables replacing the parameterization of moist convection by an explicit treatment. It is expected that this will improve the simulation of the water cycle and extreme events and reduce uncertainties in climate change projections. While kilometer-scale resolution is commonly employed in limited-area numerical weather prediction, enabling it on global scales for extended climate simulations requires a concerted effort. In this paper, we exploit an RCM that runs entirely on graphics processing units (GPUs) and show examples that highlight the prospects of this approach. A particular challenge addressed in this paper relates to the growth in output volumes. It is argued that the data avalanche of high-resolution simulations will make it impractical or impossible to store the data. Rather, repeating the simulation and conducting online analysis will become more efficient. A prototype of this methodology is presented. It makes use of a bit-reproducible model version that ensures reproducible simulations across hardware architectures, in conjunction with a data virtualization layer as a common interface for output analyses. An assessment of the potential of these novel approaches will be provided

MPG.PuRe

Increased neutrophil-lymphocyte ratio is a poor prognostic factor in patients with primary operable and inoperable pancreatic cancer

Author: A Zoughbi W.
Eisner F.
Gerger A.
Hoefler G.
Kornprat P.
L Ress A.
Lackner C.
Loibner H.
Pichler Martin
Samonigg H.
Seggewies F. S.
Stojakovic T.
Stotz M.
Szkandera J.
Publication date: 01/01/2013

Background: The neutrophil-lymphocyte ratio (NLR) has been proposed as an indicator of systemic inflammatory response. Previous findings from small-scale studies revealed conflicting results about its independent prognostic significance with regard to different clinical end points in pancreatic cancer (PC) patients. Therefore, the aim of our study was the external validation of the prognostic significance of NLR in a large cohort of PC patients. Methods: Data from 371 consecutive PC patients, treated between 2004 and 2010 at a single centre, were evaluated retrospectively. The whole cohort was stratified into two groups according to the treatment modality. Group 1 comprised 261 patients with inoperable PC at diagnosis and group 2 comprised 110 patients with surgically resected PC. Cancer-specific survival (CSS) was assessed using the Kaplan–Meier method. To evaluate the independent prognostic significance of the NLR, the modified Glasgow prognostic score (mGPS) and the platelet-lymphocyte ratio univariate and multivariate Cox regression models were applied. Results: Multivariate analysis identified increased NLR as an independent prognostic factor for inoperable PC patients (hazard ratio (HR)=2.53, confidence interval (CI)=1.64–3.91, P<0.001) and surgically resected PC patients (HR=1.61, CI=1.02–2.53, P=0.039). In inoperable PC patients, the mGPS was associated with poor CSS only in univariate analysis (HR=1.44, CI=1.04–1.98). Conclusion: Risk prediction for cancer-related end points using NLR does add independent prognostic information to other well-established prognostic factors in patients with PC, regardless of the undergoing therapeutic modality. Thus, the NLR should be considered for future individual risk assessment in patients with PC

OPUS Augsburg

Crossref

PubMed Central

Report from the OECI Oncology Days 2014

Author: Bagg A.
Becker K.F.
Buiga R.
Bussolati G.
Cadariu P.A.
Ciuleanu T.
de Paoli P.
Dono M.
Folprecht G.
Goedbloed N.
Harten W.H. van
Haybaeck J.
Hoefler G.
Lemare F.
Lombardo Claudio
Lopez Guerrero J.A.
Lorenzo F. de
Lovey J.
Paparo F.
Pierotti M.
Razavi D.
Riegman P.
Rollandi G.A.
Saghatchian M.
Stanta G.
Truini M.
Weiner G.
Zupo S.
Publication date: 01/01/2014

The 2014 OECI Oncology Days was held at the ‘Prof. Dr. Ion Chiricuta’ Oncology Institute in Cluj, Romania, from 12 to 13 June. The focus of this year’s gathering was on developments in personalised medicine and other treatment advances which have made the cost of cancer care too high for many regions throughout Europe

Archivio istituzionale della ricerca - Università di Trieste

PubMed Central

HAL Descartes

EUR Research Repository

DI-fusion

Erasmus University Digital Repository

University of Twente Research Information

Outcomes in randomised controlled trials in prevention and management of carious lesions: a systematic review

Author: A Agnihotry
A Ahovuo-Saloranta
C Echevarria
Colin Levey
D Eyding
D Moher
D Ricketts
E Gargon
F Schwendicke
Falk Schwendicke
Gerd Göstemeyer
JP Ioannidis
K Dwan
KF Schulz
LY Chong
Nicola Innes
NJ Kassebaum
NP Innes
P Glasziou
P Tugwell
PR Williamson
PS Fleming
PS Myles
R Knapp
RM Smyth
S Ebrahim
S Hancocks
S Listl
SJ Gentles
T Lamont
T Vos
Thomas Lamont
V Hoefler
V Smail-Faugeron
V Yengopal
VC Marinho
W Marcenes
Z Iheozor-Ejiofor
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/11/2017

Background: Inconsistent outcome reporting is one significant hurdle to combining results from trials into systematic reviews. Core outcome sets (COS) can reduce this barrier. The aim of this review was to map outcomes reported in caries prevention and management randomised controlled trials (RCT) as a first step to COS development. We also investigated RCT characteristics and reporting of primary outcomes and sample size calculations. Methods: PubMed, Embase, Web of Knowledge and Cochrane CENTRAL were systematically searched (1 January 1968 to 25 August 2015). Inclusion criteria: RCTs comparing any technique for prevention or management of caries with another or placebo and RCTs comparing interventions to support patients undergoing treatment of caries (without setting, dentition or age restrictions). Categories were developed through piloting and group consensus and outcomes grouped accordingly. Results: Of 4773 search results, 764 were potentially relevant, full text was available for 731 papers and 605 publications met the inclusion criteria and were included. For all outcomes across the time periods 1968–1980 and 2001–2010, reporting of outcome ‘caries experience’ reduced from 39% to 18%; ‘clinical performance of the restoration’ reporting increased from 33% to 42% although there was a reduction to 22% in 2011–2015. Emerging outcome domains include ‘lesion activity’ and ‘pulp health-related outcomes’, accounting for 1% and 0%, respectively, during 1968–1980 and 10% and 4% for 2011–2015. Reporting ‘resource efficiency’ and ‘quality of life measures’ have remained at a low level. No publications reported tooth survival independent of an index such as DMFT or equivalent. Primary outcomes were only identified as such in 414 (68%) of the reports.
Conclusions: Over the past 50 years, outcome reporting for trials on prevention and management of carious lesions have tended to focus on outcomes measuring caries experience and restoration material clinical performance with lesion activity and cost-effectiveness increasingly being reported. Patient-reported and patient-focused outcomes are becoming more common (although as secondary outcomes) but remain low in use. The challenge with developing a COS will be balancing commonly previously reported outcomes against those more relevant for the future. Trial registration: PROSPERO, CRD42015025310. Registered on 14 August 2015, Trials (Schwendicke et al., Trials 16:397, 2015) and COMET initiative online (COMET, 2017)

Crossref

Online Research @ Cardiff

Directory of Open Access Journals

University of Dundee Online Publications