SHADHO: Massively Scalable Hardware-Aware Distributed Hyperparameter Optimization
Computer vision is experiencing an AI renaissance, in which machine learning
models are expediting important breakthroughs in academic research and
commercial applications. Effectively training these models, however, is not
trivial due in part to hyperparameters: user-configured values that control a
model's ability to learn from data. Existing hyperparameter optimization
methods are highly parallel but make no effort to balance the search across
heterogeneous hardware or to prioritize searching high-impact spaces. In this
paper, we introduce a framework for massively Scalable Hardware-Aware
Distributed Hyperparameter Optimization (SHADHO). Our framework calculates the
relative complexity of each search space and monitors performance on the
learning task over all trials. These metrics are then used as heuristics to
assign hyperparameters to distributed workers based on their hardware. We first
demonstrate that our framework achieves double the throughput of a standard
distributed hyperparameter optimization framework by optimizing SVM for MNIST
using 150 distributed workers. We then conduct model search with SHADHO over
the course of one week using 74 GPUs across two compute clusters to optimize
U-Net for a cell segmentation task, discovering 515 models that achieve a lower
validation loss than standard U-Net.
Comment: 10 pages, 6 figures
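
As a rough illustration of the heuristic described in this abstract (this is not the SHADHO API; every name, data structure, and constant below is hypothetical), the assignment step can be sketched as ranking search spaces by a complexity-and-performance priority and routing the highest-priority spaces to the fastest workers:

    import math

    def space_complexity(space):
        """Rough size proxy for a search space: product of discrete choices,
        with a fixed weight per continuous dimension (purely illustrative)."""
        size = 1.0
        for candidates in space.values():
            size *= len(candidates) if isinstance(candidates, (list, tuple)) else 10.0
        return math.log(size)

    def assign_to_workers(spaces, best_losses, workers):
        """Route the most complex, most promising spaces to the fastest hardware.

        spaces      -- {space_name: {hyperparam: candidate list or 'continuous'}}
        best_losses -- {space_name: best validation loss observed so far}
        workers     -- [(worker_id, relative_speed)], e.g. GPU nodes before CPU nodes
        """
        # Priority grows with complexity and with how promising the space looks
        # (lower loss => higher priority); the exact weighting is arbitrary here.
        priority = {
            name: space_complexity(spec) / (best_losses.get(name, 1.0) + 1e-8)
            for name, spec in spaces.items()
        }
        spaces_ranked = sorted(priority, key=priority.get, reverse=True)
        workers_ranked = [wid for wid, _ in sorted(workers, key=lambda w: w[1], reverse=True)]
        # Fastest worker gets the highest-priority space, and so on down the list.
        return dict(zip(workers_ranked, spaces_ranked))
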
Hyper: Distributed Cloud Processing for Large-Scale Deep Learning Tasks
Training and deploying deep learning models in real-world applications
require processing large amounts of data. This is a challenging task when the
amount of data grows to a hundred terabytes, or even, petabyte-scale. We
introduce a hybrid distributed cloud framework with a unified view of multiple
clouds and an on-premise infrastructure for processing tasks using both CPU and
GPU compute instances at scale. The system implements a distributed file system
and failure-tolerant task processing scheduler, independent of the language and
Deep Learning framework used. It makes it possible to utilize cheap but unstable
cloud resources to significantly reduce costs. We demonstrate the scalability of
the framework by running pre-processing, distributed training, hyperparameter
search, and large-scale inference tasks on 10,000 CPU cores and 300 GPU
instances with an overall processing power of 30 petaflops.
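
A minimal sketch of the failure-tolerant scheduling idea, under the assumption that a task whose worker is preempted can simply be re-queued (this is not the Hyper system itself; the class and its interface are invented for illustration):

    from queue import Queue

    class RetryScheduler:
        """Re-queue tasks whose cheap, preemptible worker disappears mid-run."""

        def __init__(self, tasks, max_attempts=5):
            self.pending = Queue()
            for task in tasks:
                self.pending.put((task, 0))
            self.max_attempts = max_attempts
            self.done, self.failed = [], []

        def run(self, execute):
            """execute(task) returns a result, or raises if the worker was lost."""
            while not self.pending.empty():
                task, attempts = self.pending.get()
                try:
                    self.done.append(execute(task))
                except Exception:
                    if attempts + 1 < self.max_attempts:
                        # Worker preempted (e.g. spot instance reclaimed): retry later.
                        self.pending.put((task, attempts + 1))
                    else:
                        self.failed.append(task)
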
Brain-Inspired Computing
This open access book constitutes revised selected papers from the 4th International Workshop on Brain-Inspired Computing, BrainComp 2019, held in Cetraro, Italy, in July 2019. The 11 papers presented in this volume were carefully reviewed and selected for inclusion in this book. They deal with research on brain atlasing, multi-scale models and simulation, HPC and data infrastructures for neuroscience, as well as artificial and natural neural architectures.