SHADHO: Massively Scalable Hardware-Aware Distributed Hyperparameter Optimization
Computer vision is experiencing an AI renaissance, in which machine learning
models are expediting important breakthroughs in academic research and
commercial applications. Effectively training these models, however, is not
trivial due in part to hyperparameters: user-configured values that control a
model's ability to learn from data. Existing hyperparameter optimization
methods are highly parallel but make no effort to balance the search across
heterogeneous hardware or to prioritize searching high-impact spaces. In this
paper, we introduce a framework for massively Scalable Hardware-Aware
Distributed Hyperparameter Optimization (SHADHO). Our framework calculates the
relative complexity of each search space and monitors performance on the
learning task over all trials. These metrics are then used as heuristics to
assign hyperparameters to distributed workers based on their hardware. We first
demonstrate that our framework achieves double the throughput of a standard
distributed hyperparameter optimization framework when optimizing an SVM on MNIST
with 150 distributed workers. We then conduct a model search with SHADHO over
the course of one week using 74 GPUs across two compute clusters to optimize
U-Net for a cell segmentation task, discovering 515 models that achieve a lower
validation loss than the standard U-Net.
Comment: 10 pages, 6 figures
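As a rough illustration of the assignment idea described above, the Python sketch below scores each search space by combining its relative complexity with its relative performance, then hands the highest-scoring spaces to the fastest hardware. It is not SHADHO's actual heuristic. The complexity proxy (number of hyperparameters), the performance proxy (inverse of the best validation loss so far), the multiplicative combination, the hardware ranks, and all names (`SearchSpace`, `priorities`, `assign`, the example spaces and hardware classes) are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class SearchSpace:
    name: str
    n_params: int      # hypothetical complexity proxy: number of hyperparameters
    best_loss: float   # best validation loss observed so far in this space


def complexity(space: SearchSpace) -> float:
    # Assumption: more hyperparameters means a more complex space.
    return float(space.n_params)


def priorities(spaces: list[SearchSpace]) -> dict[str, float]:
    """Combine relative complexity and relative performance into one score per space."""
    total_c = sum(complexity(s) for s in spaces)
    # Lower loss is better, so invert it; the epsilon avoids division by zero.
    inv_loss = {s.name: 1.0 / (s.best_loss + 1e-8) for s in spaces}
    total_p = sum(inv_loss.values())
    return {
        s.name: (complexity(s) / total_c) * (inv_loss[s.name] / total_p)
        for s in spaces
    }


def assign(spaces: list[SearchSpace], hardware_ranks: dict[str, int]) -> dict[str, str]:
    """Map each hardware class (higher rank = faster) to a search space,
    giving the highest-priority spaces to the fastest hardware."""
    ranked_spaces = sorted(priorities(spaces).items(), key=lambda kv: kv[1], reverse=True)
    ranked_hw = sorted(hardware_ranks, key=hardware_ranks.get, reverse=True)
    # Extra hardware classes wrap around to the top-ranked spaces; with fewer
    # classes than spaces, the lowest-priority spaces are left unassigned.
    return {hw: ranked_spaces[i % len(ranked_spaces)][0] for i, hw in enumerate(ranked_hw)}


if __name__ == "__main__":
    # Hypothetical SVM kernel search spaces and hardware classes.
    spaces = [
        SearchSpace("svm_rbf", 2, 0.05),
        SearchSpace("svm_poly", 3, 0.08),
        SearchSpace("svm_linear", 1, 0.10),
    ]
    hardware = {"gpu_v100": 3, "gpu_k80": 2, "cpu": 1}
    print(assign(spaces, hardware))
    # {'gpu_v100': 'svm_rbf', 'gpu_k80': 'svm_poly', 'cpu': 'svm_linear'}
```

Multiplying the two normalized scores means a space must be both reasonably complex and reasonably promising to reach the fastest hardware. A real scheduler would also re-score spaces as new trial results arrive.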
Contextual Out-of-Domain Utterance Handling With Counterfeit Data Augmentation
Neural dialog models often lack robustness to anomalous user input and
produce inappropriate responses, which leads to a frustrating user experience.
Although there is a set of prior approaches to out-of-domain (OOD) utterance
detection, they share a few restrictions: they rely on OOD data or multiple
sub-domains, and their OOD detection is context-independent, which leads to
suboptimal performance in a dialog. The goal of this paper is to propose a
novel OOD detection method that does not require OOD data by utilizing
counterfeit OOD turns in the context of a dialog. To foster further research,
we also release new dialog datasets: three publicly available dialog corpora
augmented with OOD turns in a controllable way. Our
method outperforms state-of-the-art dialog models equipped with a conventional
OOD detection mechanism by a large margin in the presence of OOD utterances.
Comment: ICASSP 201
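The idea of augmenting in-domain dialogs with OOD turns in a controllable way can be sketched roughly as follows. This is a hypothetical illustration, not the paper's procedure. It assumes that a pool of foreign utterances stands in for OOD input, that each counterfeit OOD user turn is answered by a fixed fallback system turn, and that the probability `p_ood` together with a seeded random generator provides the control. `Turn`, `FALLBACK`, and `augment_dialog` are made-up names.

```python
import random
from dataclasses import dataclass


@dataclass
class Turn:
    speaker: str         # "user" or "system"
    text: str
    is_ood: bool = False


# Hypothetical fallback reply the system should give to an OOD turn.
FALLBACK = "Sorry, I can't help with that."


def augment_dialog(dialog: list[Turn], ood_pool: list[str],
                   p_ood: float, rng: random.Random) -> list[Turn]:
    """Return a new dialog in which, before each user turn, a counterfeit OOD
    exchange (foreign user utterance + fallback reply) is inserted with
    probability p_ood."""
    out: list[Turn] = []
    for turn in dialog:
        if turn.speaker == "user" and rng.random() < p_ood:
            out.append(Turn("user", rng.choice(ood_pool), is_ood=True))
            out.append(Turn("system", FALLBACK))
        out.append(turn)
    return out


if __name__ == "__main__":
    dialog = [
        Turn("user", "book a table for two"),
        Turn("system", "For what time?"),
        Turn("user", "7pm"),
        Turn("system", "Done."),
    ]
    pool = ["what's the capital of peru", "tell me a joke"]
    for t in augment_dialog(dialog, pool, 0.5, random.Random(0)):
        print(f"{t.speaker:>6} {'[OOD]' if t.is_ood else '     '} {t.text}")
```

Inserting the OOD user turn and its fallback reply as a pair before an existing user turn keeps the user/system alternation intact, so the augmented dialog can be fed to the same context-aware model as the original.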