A DAG Model of Synchronous Stochastic Gradient Descent in Distributed Deep Learning
With huge amounts of training data, deep learning has made great
breakthroughs in many artificial intelligence (AI) applications. However, such
large-scale data sets present computational challenges, requiring training to
be distributed on a cluster equipped with accelerators like GPUs. With the fast
increase of GPU computing power, the data communications among GPUs have become
a potential bottleneck on the overall training performance. In this paper, we
first propose a general directed acyclic graph (DAG) model to describe the
distributed synchronous stochastic gradient descent (S-SGD) algorithm, which
has been widely used in distributed deep learning frameworks. To understand the
practical impact of data communications on training performance, we conduct
extensive empirical studies on four state-of-the-art distributed deep learning
frameworks (i.e., Caffe-MPI, CNTK, MXNet and TensorFlow) over multi-GPU and
multi-node environments with different data communication techniques, including
PCIe, NVLink, 10GbE, and InfiniBand. Through both analytical and experimental
studies, we identify the potential bottlenecks and overheads that could be
further optimized. Finally, we make the data set of our experimental traces
publicly available, which could be used to support simulation-based studies.
Comment: 8 pages. Accepted by ICPADS'201
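The synchronous step that the paper models as a DAG can be sketched as data-parallel gradient averaging: per-worker gradient computations are independent nodes, the all-reduce is the communication edge joining them, and the update is applied identically everywhere. A minimal illustration, assuming a toy least-squares objective and two workers (all data and names here are illustrative, not the paper's benchmarks):

```python
import numpy as np

def local_gradient(w, x_shard, y_shard):
    """Least-squares gradient on one worker's shard: d/dw ||Xw - y||^2 / n."""
    n = len(y_shard)
    return 2.0 * x_shard.T @ (x_shard @ w - y_shard) / n

def ssgd_step(w, shards, lr):
    # Compute phase: independent per-worker gradient nodes in the DAG.
    grads = [local_gradient(w, x, y) for x, y in shards]
    # Communication phase: all-reduce, here simulated as a plain average.
    g = np.mean(grads, axis=0)
    # Update phase: every worker applies the identical averaged update.
    return w - lr * g

X = np.array([[1., 0, 0], [0, 1, 0], [0, 0, 1],
              [1, 1, 0], [0, 1, 1], [1, 0, 1]])
true_w = np.array([1.0, -2.0, 0.5])
y = X @ true_w
shards = [(X[:3], y[:3]), (X[3:], y[3:])]  # two "workers"

w = np.zeros(3)
for _ in range(100):
    w = ssgd_step(w, shards, lr=0.5)
```

In a real framework the averaging step is a collective operation (e.g. an NCCL or MPI all-reduce over PCIe, NVLink, or InfiniBand), which is exactly the edge whose cost the paper's DAG model captures.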
IDMoB: IoT Data Marketplace on Blockchain
Today, Internet of Things (IoT) devices are the powerhouse of data generation
with their ever-increasing numbers and widespread penetration. Similarly,
artificial intelligence (AI) and machine learning (ML) solutions are getting
integrated into all kinds of services, making products significantly
"smarter". The centerpiece of these technologies is "data". IoT device vendors
should be able to keep up with the increased throughput and come up with new
business models. On the other hand, AI/ML solutions will produce better results
if training data is diverse and plentiful.
In this paper, we propose a blockchain-based, decentralized and trustless
data marketplace where IoT device vendors and AI/ML solution providers may
interact and collaborate. By facilitating a transparent data exchange platform,
access to consented data will be democratized and the variety of services
targeting end-users will increase. The proposed data marketplace is implemented
as a smart contract on the Ethereum blockchain, with Swarm used as the
distributed storage platform.
Comment: Presented at Crypto Valley Conference on Blockchain Technology (CVCBT
2018), 20-22 June 2018 - published version may differ
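The contract's core bookkeeping can be sketched in plain Python rather than Solidity: vendors list a data stream with its storage pointer and price, buyers pay and receive the pointer. Everything here (the `DataMarketplace` class, method names, the Swarm hash value) is a hypothetical model of the idea, not the paper's actual contract interface:

```python
class DataMarketplace:
    """Toy model of the marketplace contract's state (illustrative only)."""

    def __init__(self):
        self.listings = {}   # listing_id -> (vendor, swarm_hash, price)
        self.balances = {}   # vendor -> accumulated payments
        self.access = set()  # (buyer, listing_id) pairs granted access

    def list_data(self, listing_id, vendor, swarm_hash, price):
        # Vendor registers a dataset; the content itself lives in Swarm,
        # only the pointer (content hash) is stored on-chain.
        self.listings[listing_id] = (vendor, swarm_hash, price)

    def purchase(self, buyer, listing_id, payment):
        vendor, swarm_hash, price = self.listings[listing_id]
        if payment < price:
            raise ValueError("insufficient payment")
        self.balances[vendor] = self.balances.get(vendor, 0) + payment
        self.access.add((buyer, listing_id))
        return swarm_hash  # pointer into distributed storage

market = DataMarketplace()
market.list_data("sensor-feed-1", "vendorA", "0xabc123", price=10)
pointer = market.purchase("mlcorp", "sensor-feed-1", payment=10)
```

The on-chain contract only brokers consent and payment; the bulky IoT data stays off-chain in Swarm, which keeps transaction costs independent of dataset size.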
SHADHO: Massively Scalable Hardware-Aware Distributed Hyperparameter Optimization
Computer vision is experiencing an AI renaissance, in which machine learning
models are expediting important breakthroughs in academic research and
commercial applications. Effectively training these models, however, is not
trivial due in part to hyperparameters: user-configured values that control a
model's ability to learn from data. Existing hyperparameter optimization
methods are highly parallel but make no effort to balance the search across
heterogeneous hardware or to prioritize searching high-impact spaces. In this
paper, we introduce a framework for massively Scalable Hardware-Aware
Distributed Hyperparameter Optimization (SHADHO). Our framework calculates the
relative complexity of each search space and monitors performance on the
learning task over all trials. These metrics are then used as heuristics to
assign hyperparameters to distributed workers based on their hardware. We first
demonstrate that our framework achieves double the throughput of a standard
distributed hyperparameter optimization framework by optimizing SVM for MNIST
using 150 distributed workers. We then conduct model search with SHADHO over
the course of one week using 74 GPUs across two compute clusters to optimize
U-Net for a cell segmentation task, discovering 515 models that achieve a lower
validation loss than standard U-Net.
Comment: 10 pages, 6 figures
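The hardware-aware heuristic described above can be sketched as a ranking problem: score each search space by its relative complexity and observed performance, then pair higher-priority spaces with faster workers. The scoring formula and worker model below are simplified assumptions for illustration, not SHADHO's exact definitions:

```python
def priority(space_size, best_loss):
    # Illustrative heuristic: larger search spaces and spaces that have
    # produced lower losses so far both get higher priority.
    return space_size * (1.0 / (1e-9 + best_loss))

def assign(spaces, workers):
    """spaces: {name: (size, best_loss)}; workers: {name: throughput}.
    Returns {worker: space}, pairing faster hardware with higher-priority
    search spaces (greedy rank-matching)."""
    ranked_spaces = sorted(spaces, key=lambda s: -priority(*spaces[s]))
    ranked_workers = sorted(workers, key=lambda w: -workers[w])
    return dict(zip(ranked_workers, ranked_spaces))

# Hypothetical state: a large RBF-kernel SVM space doing well, and a
# small linear-kernel space doing worse, on a two-node cluster.
spaces = {"svm_rbf": (1000, 0.08), "svm_linear": (10, 0.20)}
workers = {"gpu_node": 5.0, "cpu_node": 1.0}
assignment = assign(spaces, workers)
```

The point of the heuristic is that an expensive, promising space should not sit on slow hardware while a cheap space occupies a GPU; re-ranking after each batch of trials keeps the allocation adaptive.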
AdaComp : Adaptive Residual Gradient Compression for Data-Parallel Distributed Training
Highly distributed training of Deep Neural Networks (DNNs) on future compute
platforms (offering 100s of TeraOps/s of computational capacity) is expected to
be severely communication constrained. To overcome this limitation, new
gradient compression techniques are needed that are computationally friendly,
applicable to a wide variety of layers seen in Deep Neural Networks and
adaptable to variations in network architectures as well as their
hyper-parameters. In this paper we introduce a novel technique - the Adaptive
Residual Gradient Compression (AdaComp) scheme. AdaComp is based on localized
selection of gradient residues and automatically tunes the compression rate
depending on local activity. We show excellent results on a wide spectrum of
state of the art Deep Learning models in multiple domains (vision, speech,
language), datasets (MNIST, CIFAR10, ImageNet, BN50, Shakespeare), optimizers
(SGD with momentum, Adam) and network parameters (number of learners,
minibatch-size etc.). Exploiting both sparsity and quantization, we demonstrate
end-to-end compression rates of ~200X for fully-connected and recurrent layers,
and ~40X for convolutional layers, without any noticeable degradation in model
accuracies.
Comment: IBM Research AI, 9 pages, 7 figures, AAAI18 accepted
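Residual gradient compression of the kind described above can be sketched as follows: gradients accumulate into a residue, the largest-magnitude entry within each local bin is selected and transmitted, and only transmitted values are cleared from the residue. The fixed bin size and one-entry-per-bin rule here are simplified assumptions; AdaComp itself additionally self-tunes the selection based on local activity and quantizes the sent values:

```python
def compress_step(grad, residue, bin_size=4):
    """One illustrative compression step: returns (sent, new_residue)."""
    # Accumulate the fresh gradient into the carried-over residue.
    residue = [r + g for r, g in zip(residue, grad)]
    sent = [0.0] * len(residue)
    for start in range(0, len(residue), bin_size):
        bin_idx = range(start, min(start + bin_size, len(residue)))
        # Localized selection: keep the largest-magnitude entry per bin.
        k = max(bin_idx, key=lambda i: abs(residue[i]))
        sent[k] = residue[k]
        residue[k] = 0.0  # only transmitted values leave the residue
    return sent, residue

grad = [0.1, -0.5, 0.2, 0.05, 1.0, -0.1, 0.3, 0.0]
residue = [0.0] * 8
sent, residue = compress_step(grad, residue)
```

Because unsent entries stay in the residue, no gradient information is dropped outright, only delayed, which is why such schemes can reach high compression rates with little accuracy loss.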
