Searching for A Robust Neural Architecture in Four GPU Hours
Conventional neural architecture search (NAS) approaches are based on
reinforcement learning or evolutionary strategies, which take more than 3000 GPU
hours to find a good model on CIFAR-10. We propose an efficient NAS approach
that learns to search by gradient descent. Our approach represents the search
space as a directed acyclic graph (DAG). This DAG contains billions of
sub-graphs, each of which indicates a kind of neural architecture. To avoid
traversing all the possibilities of the sub-graphs, we develop a differentiable
sampler over the DAG. This sampler is learnable and optimized by the validation
loss after training the sampled architecture. In this way, our approach can be
trained end-to-end by gradient descent; we name it Gradient-based search using
Differentiable Architecture Sampler (GDAS). In experiments, we can
finish one searching procedure in four GPU hours on CIFAR-10, and the
discovered model obtains a test error of 2.82% with only 2.5M parameters,
which is on par with the state-of-the-art. Code is publicly available on
GitHub: https://github.com/D-X-Y/NAS-Projects.
Comment: Minor modifications to the CVPR 2019 camera-ready version (add code link).
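To make the abstract's sampling idea concrete, below is a minimal sketch of a differentiable, learnable sampler over candidate operations on a single DAG edge, assuming PyTorch and a Gumbel-softmax relaxation. The operation set, class names, and training step are illustrative assumptions, not the GDAS code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Sketch of a differentiable architecture sampler (GDAS-style idea, not the
# authors' implementation): each DAG edge holds learnable logits over candidate
# operations; a hard Gumbel-softmax draw picks one op per forward pass, while
# gradients still reach the logits through the straight-through relaxation.
CANDIDATE_OPS = [nn.Identity(), nn.ReLU(), nn.Tanh()]  # placeholder op set

class SampledEdge(nn.Module):
    def __init__(self, ops, tau=1.0):
        super().__init__()
        self.ops = nn.ModuleList(ops)
        self.logits = nn.Parameter(torch.zeros(len(ops)))  # architecture params
        self.tau = tau

    def forward(self, x):
        # One-hot (hard) sample that remains differentiable w.r.t. self.logits.
        weights = F.gumbel_softmax(self.logits, tau=self.tau, hard=True)
        # Only the sampled op contributes; the weighted sum keeps autograd intact.
        return sum(w * op(x) for w, op in zip(weights, self.ops))

edge = SampledEdge(CANDIDATE_OPS)
arch_opt = torch.optim.Adam([edge.logits], lr=3e-4)  # optimizes the sampler
loss = edge(torch.randn(8, 16)).mean()  # stand-in for a validation loss
loss.backward()
arch_opt.step()  # updates the sampling distribution by gradient descent
```

In the paper's setting, the sampler's logits would be updated with the validation loss while the operations' weights are trained on the training loss, alternating the two steps.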
MANAS: Multi-Agent Neural Architecture Search
The Neural Architecture Search (NAS) problem is typically formulated as a
graph search problem where the goal is to learn the optimal operations over
edges in order to maximise a graph-level global objective. Due to the large
architecture parameter space, efficiency is a key bottleneck preventing the
practical use of NAS. In this paper, we address this issue by framing NAS as a
multi-agent problem where agents control a subset of the network and coordinate
to reach optimal architectures. We provide two distinct lightweight
implementations, with reduced memory requirements (1/8th of state-of-the-art),
and performances above those of much more computationally expensive methods.
Theoretically, we demonstrate vanishing regrets of the form O(sqrt(T)), with T
being the total number of rounds. Finally, aware that random search is an
often-ignored yet effective baseline, we perform additional experiments on 3
alternative datasets and 2 network configurations, and achieve favourable
results in comparison.
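The multi-agent framing can be illustrated with a toy sketch: one bandit-style agent per edge picks an operation each round, and all agents update from a shared reward such as validation accuracy. The EXP3-style update, the operation list, and the evaluate stand-in below are assumptions for illustration, not the MANAS algorithm itself.

```python
import math
import random

# Toy sketch of NAS as a multi-agent bandit problem: each agent owns one edge
# of the architecture graph, picks an operation per round, and updates from a
# shared reward. The exponential-weights rule is a generic no-regret scheme.
OPS = ["skip", "conv3x3", "conv5x5", "maxpool"]

class EdgeAgent:
    def __init__(self, n_ops, gamma=0.1):
        self.weights = [1.0] * n_ops
        self.gamma = gamma
        self.last_choice = None
        self.last_prob = None

    def act(self):
        total = sum(self.weights)
        probs = [(1 - self.gamma) * w / total + self.gamma / len(self.weights)
                 for w in self.weights]
        self.last_choice = random.choices(range(len(probs)), weights=probs)[0]
        self.last_prob = probs[self.last_choice]
        return self.last_choice

    def update(self, reward):
        # Importance-weighted exponential update on the chosen arm only.
        est = reward / self.last_prob
        self.weights[self.last_choice] *= math.exp(
            self.gamma * est / len(self.weights))

def evaluate(architecture):
    # Hypothetical stand-in for "train the sampled architecture and return
    # validation accuracy in [0, 1]"; no real training happens here.
    base = sum(len(op) for op in architecture) / (10 * len(architecture))
    return min(1.0, max(0.0, base + random.gauss(0.0, 0.05)))

agents = [EdgeAgent(len(OPS)) for _ in range(6)]      # one agent per edge
for t in range(200):                                  # T rounds in total
    choices = [agent.act() for agent in agents]
    reward = evaluate([OPS[c] for c in choices])      # shared bandit feedback
    for agent in agents:
        agent.update(reward)
```

Because each agent only controls its own edge, memory grows with the number of operations per edge rather than with the full architecture space, which is the intuition behind the reduced memory footprint claimed above.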
A Study on Encodings for Neural Architecture Search
Neural architecture search (NAS) has been extensively studied in the past few
years. A popular approach is to represent each neural architecture in the
search space as a directed acyclic graph (DAG), and then search over all DAGs
by encoding the adjacency matrix and list of operations as a set of
hyperparameters. Recent work has demonstrated that even small changes to the
way each architecture is encoded can have a significant effect on the
performance of NAS algorithms.
In this work, we present the first formal study on the effect of architecture
encodings for NAS, including a theoretical grounding and an empirical study.
First, we formally define architecture encodings and give a theoretical
characterization of the scalability of the encodings we study. Then we identify
the main encoding-dependent subroutines which NAS algorithms employ, running
experiments to show which encodings work best with each subroutine for many
popular algorithms. The experiments act as an ablation study for prior work,
disentangling the algorithmic and encoding-based contributions, as well as a
guideline for future work. Our results demonstrate that NAS encodings are an
important design decision which can have a significant impact on overall
performance. Our code is available at
https://github.com/naszilla/nas-encodings
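As a concrete illustration of the adjacency-matrix encoding described above, the sketch below flattens a small cell DAG into a fixed-length vector of edge bits plus one-hot operation labels. The operation vocabulary and the encode helper are hypothetical and not taken from the paper's code.

```python
import numpy as np

# Sketch of an adjacency-matrix encoding for a cell: the upper-triangular part
# of the adjacency matrix (edges i -> j with i < j) is flattened and
# concatenated with one-hot operation labels per node, giving a fixed-length
# vector of hyperparameters that a NAS algorithm can search over.
OPS = ["input", "conv1x1", "conv3x3", "maxpool", "output"]

def encode(adjacency, node_ops):
    adjacency = np.asarray(adjacency)
    n = adjacency.shape[0]
    edge_bits = adjacency[np.triu_indices(n, k=1)]     # DAG edges only
    one_hot = np.zeros((n, len(OPS)))
    for i, op in enumerate(node_ops):
        one_hot[i, OPS.index(op)] = 1.0
    return np.concatenate([edge_bits, one_hot.ravel()])

# A 4-node cell: input -> conv3x3 -> maxpool -> output, plus a skip edge.
adj = [[0, 1, 0, 1],
       [0, 0, 1, 0],
       [0, 0, 0, 1],
       [0, 0, 0, 0]]
ops = ["input", "conv3x3", "maxpool", "output"]
print(encode(adj, ops))   # length 6 + 4 * 5 = 26 hyperparameters
```

Small changes to this representation, such as truncating the edge bits or encoding paths instead of edges, change which architectures look similar to the search algorithm, which is why the choice of encoding can have such a large effect on NAS performance.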