Accelerating Deep Learning with Shrinkage and Recall
Deep Learning is a powerful machine learning approach, but training its large
number of parameters across multiple layers is slow when both the data and the
architecture are large. Inspired by the shrinking technique used to accelerate
Support Vector Machine (SVM) training and the screening technique used in
LASSO, we propose a shrinking Deep Learning with recall (sDLr) approach to
speed up deep learning computation. We evaluate sDLr with Deep Neural Networks
(DNN), Deep Belief Networks (DBN) and Convolutional Neural Networks (CNN) on
four datasets. Results show that sDLr can reach speedups of more than 2.0
while still giving competitive classification performance.
Comment: The 22nd IEEE International Conference on Parallel and Distributed
Systems (ICPADS 2016)
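The abstract describes shrinking with recall only at a high level. As a rough
illustration of the general idea (an active-set style heuristic in the spirit
of SVM shrinking and LASSO screening, not the authors' exact sDLr algorithm),
the sketch below periodically restricts training to the highest-loss examples
and "recalls" the full set every few epochs; the model, names and thresholds
are hypothetical.

```python
# Illustrative sketch only: a shrink-then-recall active-set heuristic applied
# to plain logistic regression with NumPy. The paper's sDLr operates on deep
# networks; the model, schedule and thresholds here are invented for clarity.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 20))
y = (X @ rng.normal(size=20) + 0.1 * rng.normal(size=2000) > 0).astype(float)

w = np.zeros(20)
active = np.arange(len(X))          # start with all examples active

def losses(w, X, y):
    p = 1.0 / (1.0 + np.exp(-(X @ w)))
    return -(y * np.log(p + 1e-12) + (1 - y) * np.log(1 - p + 1e-12))

for epoch in range(30):
    # SGD pass over the active (shrunken) set only
    for i in rng.permutation(active):
        p = 1.0 / (1.0 + np.exp(-(X[i] @ w)))
        w -= 0.1 * (p - y[i]) * X[i]

    if epoch % 5 == 4:
        # Recall: bring every example back and re-score on the full data set
        active = np.arange(len(X))
    else:
        # Shrink: keep only the hardest (highest-loss) half of the active set
        l = losses(w, X[active], y[active])
        active = active[np.argsort(l)[len(active) // 2:]]

acc = ((1 / (1 + np.exp(-(X @ w))) > 0.5) == y).mean()
print("final training accuracy:", acc)
```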
What does fault tolerant Deep Learning need from MPI?
Deep Learning (DL) algorithms have become the de facto Machine Learning (ML)
algorithms for large scale data analysis. DL algorithms are computationally
expensive: even distributed DL implementations that use MPI require days of
training (model learning) time on commonly studied datasets. Long-running DL
applications thus become susceptible to faults, requiring a fault tolerant
system infrastructure in addition to fault tolerant DL algorithms. This raises
an important question: what is needed from MPI for designing fault tolerant DL
implementations? In this paper, we address this problem for permanent faults.
We motivate the need for a fault tolerant MPI specification through an
in-depth consideration of recent innovations in DL algorithms and their
properties, which drive the need for specific fault tolerance features. We
present an in-depth discussion of the suitability of different parallelism
types (model, data and hybrid); the need (or lack thereof) for checkpointing
of critical data structures; and, most importantly, several fault tolerance
proposals in MPI (user-level fault mitigation (ULFM), Reinit) and their
applicability to fault tolerant DL implementations. We leverage a
distributed-memory implementation of Caffe, currently available in the
Machine Learning Toolkit for Extreme Scale (MaTEx), and implement our
approaches by extending MaTEx-Caffe with a ULFM-based implementation. Our
evaluation using the ImageNet dataset with the AlexNet and GoogLeNet neural
network topologies demonstrates the effectiveness of the proposed fault
tolerant DL implementation using OpenMPI-based ULFM.
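The abstract weighs parallelism types and checkpointing without showing code.
The sketch below illustrates the baseline pattern the fault-tolerance
discussion starts from: data-parallel gradient averaging over MPI with
periodic checkpointing of the critical model state, using mpi4py and NumPy. It
is a minimal illustration, not the MaTEx-Caffe code, and it deliberately omits
the ULFM-specific recovery path (revoking and shrinking the communicator on a
failed rank), whose availability depends on the MPI build.

```python
# Minimal data-parallel training skeleton with periodic checkpointing.
# Not the MaTEx-Caffe implementation; ULFM recovery is omitted on purpose.
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

rng = np.random.default_rng(rank)
w = rng.normal(size=10) if rank == 0 else None   # rank 0 initializes the model
w = comm.bcast(w, root=0)                        # all ranks start identically

for step in range(100):
    # Each rank computes a gradient on its own data shard (synthetic here).
    X = rng.normal(size=(32, 10))
    y = X @ np.ones(10)
    grad = 2 * X.T @ (X @ w - y) / len(X)

    # Data parallelism: average gradients across all ranks.
    grad = comm.allreduce(grad, op=MPI.SUM) / size
    w -= 0.01 * grad

    # Checkpoint the only critical data structure (the model) periodically,
    # so a restart after a permanent fault can resume from here.
    if step % 20 == 0 and rank == 0:
        np.save("checkpoint_w.npy", w)

if rank == 0:
    print("final weights (first 3):", w[:3])
```

Run with, e.g., `mpiexec -n 4 python train.py`; with ULFM-capable MPI, the
allreduce failure on a dead rank is where recovery logic would be inserted.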
CFDNet: a deep learning-based accelerator for fluid simulations
CFD is widely used in physical system design and optimization, where it is
used to predict engineering quantities of interest, such as the lift on a
plane wing or the drag on a motor vehicle. However, many systems of interest
are prohibitively expensive for design optimization because of the cost of
evaluating CFD simulations. To render the computation tractable, reduced-order
or surrogate models are used to accelerate simulations while respecting the
convergence constraints provided by the higher-fidelity solution. This paper
introduces CFDNet, a framework that couples physical simulation with deep
learning to accelerate the convergence of Reynolds-Averaged Navier-Stokes
simulations. CFDNet is designed to predict the primary physical properties of
the fluid, including velocity, pressure, and eddy viscosity, using a single
convolutional neural network at its core. We evaluate CFDNet on a variety of
use cases, both extrapolative and interpolative, where test geometries are or
are not observed during training. Our results show that CFDNet meets the
convergence constraints of the domain-specific physics solver while
outperforming it by 1.9-7.4x on both steady laminar and turbulent flows.
Moreover, we demonstrate the generalization capacity of CFDNet by testing its
predictions on new geometries unseen during training. In this case, the
approach meets the CFD convergence criterion while still providing significant
speedups over traditional domain-only models.
Comment: Accepted for publication in the International Conference on
Supercomputing (ICS) 202
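The abstract describes CFDNet's coupling, warm-starting the physics solver
from a CNN prediction of the near-converged flow field, but does not show the
loop. Below is a minimal sketch of that solver-network-solver pattern;
`solver_iterate` and `cnn_predict` are hypothetical stand-ins for the RANS
solver and the paper's trained CNN, and the convergence test is simplified.

```python
# Sketch of the solver -> CNN -> solver coupling described for CFDNet.
# `solver_iterate` and `cnn_predict` are placeholders, not the paper's code.
import numpy as np

def solver_iterate(field, n_iters):
    """Placeholder for n_iters iterations of the physics solver."""
    for _ in range(n_iters):
        field = 0.9 * field + 0.1 * field.mean()   # toy smoothing "solver"
    return field

def cnn_predict(intermediate_field):
    """Placeholder for the CNN mapping an intermediate field to a
    near-converged one (velocity, pressure, eddy viscosity channels)."""
    return intermediate_field * 0.5                # stand-in for the network

def residual(a, b):
    return np.abs(a - b).max()

field = np.random.default_rng(0).normal(size=(64, 64, 4))   # u, v, p, nu_t

# 1) A few cheap solver iterations produce a physically plausible input.
field = solver_iterate(field, n_iters=10)

# 2) One CNN inference jumps close to the converged solution.
field = cnn_predict(field)

# 3) Hand the prediction back to the solver and iterate until the usual
#    convergence criterion is met, so accuracy guarantees come from the solver.
prev = field
for it in range(1000):
    field = solver_iterate(field, n_iters=1)
    if residual(field, prev) < 1e-6:
        print(f"converged after {it + 1} refinement iterations")
        break
    prev = field
```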
Iso-energy-efficiency: An approach to power-constrained parallel computation
Future large scale high performance supercomputer systems require high energy efficiency to achieve exaflops computational power and beyond. Despite the need to understand energy efficiency in high-performance systems, there are few techniques to evaluate energy efficiency at scale. In this paper, we propose a system-level iso-energy-efficiency model to analyze, evaluate and predict the energy and performance of data intensive parallel applications with various execution patterns running on large scale power-aware clusters. Our analytical model can help users explore the effects of machine- and application-dependent characteristics on system energy efficiency and isolate efficient ways to scale system parameters (e.g., processor count, CPU power/frequency, workload size and network bandwidth) to balance energy use and performance. We derive our iso-energy-efficiency model and apply it to the NAS Parallel Benchmarks on two power-aware clusters. Our results indicate that the model accurately predicts total system energy consumption within 5% error on average for parallel applications with various execution and communication patterns. We demonstrate effective use of the model for various application contexts and in scalability decision-making.
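The abstract describes the iso-energy-efficiency model only in words. As a
hedged illustration of the kind of analysis such a model enables (not the
paper's actual equations), the sketch below uses a toy energy model,
E(p) = p * (P_dynamic * t_compute + P_static * t_total), and asks how the
workload must scale with processor count to hold energy efficiency constant.
All constants and functional forms are invented for the example.

```python
# Illustration only: a toy model in the spirit of iso-energy-efficiency
# analysis. The forms and constants are assumptions, NOT the paper's model.
P_DYNAMIC = 80.0     # watts per node while computing (assumed)
P_STATIC = 40.0      # watts per node, baseline/idle component (assumed)

def times(workload, p):
    """Toy execution-time model: perfectly divisible compute plus a
    communication term that grows with processor count."""
    t_compute = workload / p            # useful work per node (seconds)
    t_comm = 0.001 * p                  # communication overhead (seconds)
    return t_compute, t_compute + t_comm

def energy_efficiency(workload, p):
    """Useful work per joule under the toy model."""
    t_compute, t_total = times(workload, p)
    energy = p * (P_DYNAMIC * t_compute + P_STATIC * t_total)
    return workload / energy

# Iso-energy-efficiency question: how much must the workload grow as nodes
# are added so that efficiency stays at the 64-node level?
target = energy_efficiency(workload=1e4, p=64)
for p in (128, 256, 512):
    w = 1e4
    while energy_efficiency(w, p) < target:   # crude search for the iso point
        w *= 1.01
    print(f"p={p}: workload ~{w:.0f} holds efficiency at {target:.2e}")
```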