SHADHO: Massively Scalable Hardware-Aware Distributed Hyperparameter Optimization

Authors: Kinnison Jeff; Kremer-Herman Nathaniel; Scheirer Walter; Thain Douglas
Publication date: 22/01/2018

Computer vision is experiencing an AI renaissance, in which machine learning models are expediting important breakthroughs in academic research and commercial applications. Effectively training these models, however, is not trivial due in part to hyperparameters: user-configured values that control a model's ability to learn from data. Existing hyperparameter optimization methods are highly parallel but make no effort to balance the search across heterogeneous hardware or to prioritize searching high-impact spaces. In this paper, we introduce a framework for massively Scalable Hardware-Aware Distributed Hyperparameter Optimization (SHADHO). Our framework calculates the relative complexity of each search space and monitors performance on the learning task over all trials. These metrics are then used as heuristics to assign hyperparameters to distributed workers based on their hardware. We first demonstrate that our framework achieves double the throughput of a standard distributed hyperparameter optimization framework by optimizing SVM for MNIST using 150 distributed workers. We then conduct model search with SHADHO over the course of one week using 74 GPUs across two compute clusters to optimize U-Net for a cell segmentation task, discovering 515 models that achieve a lower validation loss than standard U-Net. Comment: 10 pages, 6 figures

arXiv.org e-Print Archive

Crossref

The Value-of-Information in Matching with Queues

Authors: Bishop C.; Chung F.
Publication date: 27/03/2015

We consider the problem of optimal matching with queues in dynamic systems and investigate the value-of-information. In such systems, the operators match tasks and resources stored in queues, with the objective of maximizing the system utility of the matching reward profile, minus the average matching cost. This problem appears in many practical systems and the main challenges are the no-underflow constraints, and the lack of matching-reward information and system dynamics statistics. We develop two online matching algorithms: Learning-aided Reward optimAl Matching ($\mathtt{LRAM}$) and Dual-$\mathtt{LRAM}$ ($\mathtt{DRAM}$) to effectively resolve both challenges. Both algorithms are equipped with a learning module for estimating the matching-reward information, while $\mathtt{DRAM}$ incorporates an additional module for learning the system dynamics. We show that both algorithms achieve an $O(\epsilon+\delta_r)$ close-to-optimal utility performance for any $\epsilon>0$, while $\mathtt{DRAM}$ achieves a faster convergence speed and a better delay compared to $\mathtt{LRAM}$, i.e., $O(\delta_z/\epsilon + \log(1/\epsilon)^2)$ delay and $O(\delta_z/\epsilon)$ convergence under $\mathtt{DRAM}$ compared to $O(1/\epsilon)$ delay and convergence under $\mathtt{LRAM}$ ($\delta_r$ and $\delta_z$ are maximum estimation errors for reward and system dynamics). Our results reveal that information of different system components can play very different roles in algorithm performance and provide a systematic way for designing joint learning-control algorithms for dynamic systems.

arXiv.org e-Print Archive

Crossref

Managing Uncertainty: A Case for Probabilistic Grid Scheduling

Authors: Lazarevic Aleksandar; Prnjat Ognjen; Sacks Lionel
Publication date: 01/07/2006

Grid technology is evolving into a global, service-orientated architecture, a universal platform for delivering future high-demand computational services. Strong adoption of the Grid and the utility computing concept is leading to an increasing number of Grid installations running a wide range of applications of different size and complexity. In this paper we address the problem of delivering deadline/economy-based scheduling in a heterogeneous application environment using statistical properties of historical job executions and their associated meta-data. This approach is motivated by a study of the six-month computational load generated by Grid applications in a multi-purpose Grid cluster serving a community of twenty e-Science projects. The observed job statistics, resource utilisation and user behaviour are discussed in the context of management approaches and models most suitable for supporting a probabilistic and autonomous scheduling architecture.

arXiv.org e-Print Archive

CiteSeerX

UCL Discovery