Fast Task-Aware Architecture Inference
Neural architecture search has been shown to hold great promise towards the
automation of deep learning. However, in spite of its potential, neural
architecture search remains quite costly. To this end, we propose a novel
gradient-based framework for efficient architecture search by sharing
information across several tasks. We start by training many model architectures
on several related (training) tasks. When a new unseen task is presented, the
framework performs architecture inference in order to quickly identify a good
candidate architecture, before any model is trained on the new task. At the
core of our framework lies a deep value network that can predict the
performance of input architectures on a task by utilizing task meta-features
and the previous model training experiments performed on related tasks. We
adopt a continuous parametrization of the model architecture which allows for
efficient gradient-based optimization. Given a new task, an effective
architecture is quickly identified by maximizing the estimated performance with
respect to the model architecture parameters with simple gradient ascent. It is
key to point out that our goal is to achieve reasonable performance at the
lowest cost. We provide experimental results showing the effectiveness of the
framework despite its high computational efficiency.
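The architecture-inference step described in this abstract (gradient ascent on a predicted performance with respect to continuous architecture parameters) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: `predicted_performance` is a toy quadratic stand-in for the trained deep value network, the task-dependent optimum `0.5 * feature` is invented, and the function names, step count, and learning rate are hypothetical. It is not the authors' implementation.

```python
def task_optimum(task_features):
    # Hypothetical: pretend the best architecture parameters are a simple
    # function of the task meta-features (assumption, not from the paper).
    return [0.5 * f for f in task_features]


def predicted_performance(arch, task_features):
    # Toy stand-in for a trained value network V(arch, task): a concave
    # quadratic that peaks at the task-dependent optimum.
    target = task_optimum(task_features)
    return -sum((a - t) ** 2 for a, t in zip(arch, target))


def performance_gradient(arch, task_features):
    # Analytic gradient of the toy predictor with respect to arch.
    target = task_optimum(task_features)
    return [-2.0 * (a - t) for a, t in zip(arch, target)]


def infer_architecture(task_features, steps=200, lr=0.1):
    # Plain gradient ascent on the predicted performance; no model is
    # trained on the new task during this loop.
    arch = [0.0] * len(task_features)
    for _ in range(steps):
        grad = performance_gradient(arch, task_features)
        arch = [a + lr * g for a, g in zip(arch, grad)]
    return arch


new_task = [1.0, -2.0, 4.0]  # hypothetical meta-features of an unseen task
best_arch = infer_architecture(new_task)
print(best_arch, predicted_performance(best_arch, new_task))
```

In a real system the predictor would be a learned network and the gradient would come from automatic differentiation; the loop itself would keep the same shape.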
HyperSTAR: Task-Aware Hyperparameters for Deep Networks
While deep neural networks excel in solving visual recognition tasks, they
require significant effort to find hyperparameters that make them work
optimally. Hyperparameter Optimization (HPO) approaches have automated the
process of finding good hyperparameters but they do not adapt to a given task
(task-agnostic), making them computationally inefficient. To reduce HPO time,
we present HyperSTAR (System for Task Aware Hyperparameter Recommendation), a
task-aware method to warm-start HPO for deep neural networks. HyperSTAR ranks
and recommends hyperparameters by predicting their performance conditioned on a
joint dataset-hyperparameter space. It learns a dataset (task) representation
along with the performance predictor directly from raw images in an end-to-end
fashion. The recommendations, when integrated with an existing HPO method, make
it task-aware and significantly reduce the time to achieve optimal performance.
We conduct extensive experiments on 10 publicly available large-scale image
classification datasets over two different network architectures, validating
that HyperSTAR evaluates 50% fewer configurations to achieve the best
performance compared to existing methods. We further demonstrate that HyperSTAR
makes Hyperband (HB) task-aware, achieving the optimal accuracy in just 25% of
the budget required by both vanilla HB and Bayesian Optimized HB (BOHB).
Comment: Published at CVPR 2020 (Oral)
AutoML using Metadata Language Embeddings
As a human choosing a supervised learning algorithm, it is natural to begin
by reading a text description of the dataset and documentation for the
algorithms you might use. We demonstrate that the same idea improves the
performance of automated machine learning methods. We use language embeddings
from modern NLP to improve state-of-the-art AutoML systems by augmenting their
recommendations with vector embeddings of datasets and of algorithms. We use
these embeddings in a neural architecture to learn the distance between
best-performing pipelines. The resulting (meta-)AutoML framework improves on
the performance of existing AutoML frameworks. Our zero-shot AutoML system
using dataset metadata embeddings provides good solutions instantaneously,
running in under one second of computation. Performance is competitive with
AutoML systems OBOE, AutoSklearn, AlphaD3M, and TPOT when each framework is
allocated a minute of computation. We make our data, models, and code publicly
available.
Ranking architectures using meta-learning
Neural architecture search has recently attracted considerable research effort, as
it promises to automate the manual design of neural networks. However, it
requires a large amount of computing resources and in order to alleviate this,
a performance prediction network has been recently proposed that enables
efficient architecture search by forecasting the performance of candidate
architectures, instead of relying on actual model training. The performance
predictor is task-aware taking as input not only the candidate architecture but
also task meta-features and it has been designed to collectively learn from
several tasks. In this work, we introduce a pairwise ranking loss for training
a network able to rank candidate architectures for a new unseen task
conditioning on its task meta-features. We present experimental results,
showing that the ranking network is more effective in architecture search than
the previously proposed performance predictor.
Comment: NeurIPS 2019 Meta-Learning workshop
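To make the idea of a pairwise ranking loss concrete, here is a minimal sketch. The abstract does not give the exact loss, so this uses a common logistic (RankNet-style) form as an assumption; the function name and the example scores and accuracies are hypothetical. For every pair of architectures whose measured accuracies differ, the loss is small when the ranking network scores the better architecture higher.

```python
import math
from itertools import combinations


def pairwise_ranking_loss(scores, accuracies):
    """Mean logistic loss over all pairs with different measured accuracy.

    scores:     ranking-network outputs, one per candidate architecture
    accuracies: measured performance of the same architectures on a task
    """
    total, n_pairs = 0.0, 0
    for i, j in combinations(range(len(scores)), 2):
        if accuracies[i] == accuracies[j]:
            continue  # ties carry no ordering information
        if accuracies[i] < accuracies[j]:
            i, j = j, i  # make i the better architecture
        margin = scores[i] - scores[j]
        total += math.log1p(math.exp(-margin))  # log(1 + e^{-margin})
        n_pairs += 1
    return total / n_pairs if n_pairs else 0.0


measured = [0.9, 0.8, 0.7]
print(pairwise_ranking_loss([3.0, 2.0, 1.0], measured))  # scores agree with the order
print(pairwise_ranking_loss([1.0, 2.0, 3.0], measured))  # scores reverse the order
```

Only the relative order of the scores matters for this loss, which is why a ranker can be easier to fit than a network that must predict absolute performance.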
Privileged Zero-Shot AutoML
This work improves the quality of automated machine learning (AutoML) systems
by using dataset and function descriptions while significantly decreasing
computation time from minutes to milliseconds by using a zero-shot approach.
Given a new dataset and a well-defined machine learning task, humans begin by
reading a description of the dataset and documentation for the algorithms to be
used. This work is the first to use these textual descriptions, which we call
privileged information, for AutoML. We use a pre-trained Transformer model to
process the privileged text and demonstrate that using this information
improves AutoML performance. Thus, our approach leverages the progress of
unsupervised representation learning in natural language processing to provide
a significant boost to AutoML. We demonstrate that using only textual
descriptions of the data and functions achieves reasonable classification
performance, and adding textual descriptions to data meta-features improves
classification across tabular datasets. To achieve zero-shot AutoML we train a
graph neural network with these description embeddings and the data
meta-features. Each node represents a training dataset, which we use to predict
the best machine learning pipeline for a new test dataset in a zero-shot
fashion. Our zero-shot approach rapidly predicts a high-quality pipeline for a
supervised learning task and dataset. In contrast, most AutoML systems require
tens or hundreds of pipeline evaluations. We show that zero-shot AutoML reduces
running and prediction times from minutes to milliseconds, consistently across
datasets. By speeding up AutoML by orders of magnitude this work demonstrates
real-time AutoML.
Comment: 16 pages, 4 figures
Neural Architecture Transfer
Neural architecture search (NAS) has emerged as a promising avenue for
automatically designing task-specific neural networks. Existing NAS approaches
require one complete search for each deployment specification of hardware or
objective. This is a computationally impractical endeavor given the potentially
large number of application scenarios. In this paper, we propose Neural
Architecture Transfer (NAT) to overcome this limitation. NAT is designed to
efficiently generate task-specific custom models that are competitive under
multiple conflicting objectives. To realize this goal we learn task-specific
supernets from which specialized subnets can be sampled without any additional
training. The key to our approach is an integrated online transfer learning and
many-objective evolutionary search procedure. A pre-trained supernet is
iteratively adapted while simultaneously searching for task-specific subnets.
We demonstrate the efficacy of NAT on 11 benchmark image classification tasks
ranging from large-scale multi-class to small-scale fine-grained datasets. In
all cases, including ImageNet, NATNets improve upon the state-of-the-art under
mobile settings (≤ 600M Multiply-Adds). Surprisingly, small-scale
fine-grained datasets benefit the most from NAT. At the same time, the
architecture search and transfer is orders of magnitude more efficient than
existing NAS methods. Overall, the experimental evaluation indicates that,
across diverse image classification tasks and computational objectives, NAT is
an appreciably more effective alternative to conventional transfer learning of
fine-tuning weights of an existing network architecture learned on standard
datasets. Code is available at
https://github.com/human-analysis/neural-architecture-transfer
Weight-Sharing Neural Architecture Search: A Battle to Shrink the Optimization Gap
Neural architecture search (NAS) has attracted increasing attention in both
academia and industry. In the early days, researchers mostly applied individual
search methods which sample and evaluate the candidate architectures separately
and thus incur heavy computational overheads. To alleviate the burden,
weight-sharing methods were proposed in which exponentially many architectures
share weights in the same super-network, and the costly training procedure is
performed only once. These methods, though much faster, often suffer from
instability. This paper provides a literature review on NAS, in
particular the weight-sharing methods, and points out that the major challenge
comes from the optimization gap between the super-network and the
sub-architectures. From this perspective, we summarize existing approaches into
several categories according to their efforts in bridging the gap, and analyze
both advantages and disadvantages of these methodologies. Finally, we share our
opinions on the future directions of NAS and AutoML. Due to the expertise of
the authors, this paper mainly focuses on the application of NAS to computer
vision problems and may be biased towards the work in our group.
Comment: 24 pages, 3 figures, 2 tables, meta data updated