Searching for A Robust Neural Architecture in Four GPU Hours
Conventional neural architecture search (NAS) approaches are based on
reinforcement learning or evolutionary strategies, which take more than 3000
GPU hours to find a good model on CIFAR-10. We propose an efficient NAS
approach that learns to search by gradient descent. Our approach represents the search
space as a directed acyclic graph (DAG). This DAG contains billions of
sub-graphs, each of which represents a distinct neural architecture. To avoid
traversing all the possibilities of the sub-graphs, we develop a differentiable
sampler over the DAG. This sampler is learnable and optimized by the validation
loss after training the sampled architecture. In this way, our approach can be
trained end-to-end by gradient descent; we name it Gradient-based search using
Differentiable Architecture Sampler (GDAS). In experiments, one search run
finishes in four GPU hours on CIFAR-10, and the discovered model obtains a
test error of 2.82% with only 2.5M parameters,
which is on par with the state-of-the-art. Code is publicly available on
GitHub: https://github.com/D-X-Y/NAS-Projects.
Comment: Minor modifications to the CVPR 2019 camera-ready version (add code link).
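To make the sampling step concrete, here is a minimal sketch of a differentiable one-op-per-edge sampler in the spirit of GDAS, assuming PyTorch: each DAG edge holds learnable logits over candidate operations, and a hard Gumbel-softmax sample picks a single operation while the straight-through estimator keeps the choice differentiable. The candidate list, class names, and temperature are illustrative assumptions, not the paper's code.

import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy 1-D candidates standing in for the usual conv/pool/skip operations.
CANDIDATE_OPS = [
    lambda c: nn.Identity(),
    lambda c: nn.Conv1d(c, c, 3, padding=1),
    lambda c: nn.Conv1d(c, c, 5, padding=2),
    lambda c: nn.AvgPool1d(3, stride=1, padding=1),
]

class SampledEdge(nn.Module):
    """One DAG edge: sample a single op via hard Gumbel-softmax."""

    def __init__(self, channels: int, tau: float = 10.0):
        super().__init__()
        self.ops = nn.ModuleList([make(channels) for make in CANDIDATE_OPS])
        self.logits = nn.Parameter(torch.zeros(len(self.ops)))  # architecture parameters
        self.tau = tau  # temperature, annealed toward 0 as the search proceeds

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # hard=True yields a one-hot sample whose gradient flows through the
        # underlying softmax (straight-through estimator).
        weights = F.gumbel_softmax(self.logits, tau=self.tau, hard=True)
        index = int(weights.argmax())
        # Only the sampled op runs, so one search step costs one sub-graph
        # rather than the whole super-network.
        return weights[index] * self.ops[index](x)

edge = SampledEdge(channels=8)
loss = edge(torch.randn(2, 8, 16)).sum()  # stand-in for the validation loss
loss.backward()                           # gradients reach ops and logits alike
print(edge.logits.grad)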
NATS-Bench: Benchmarking NAS Algorithms for Architecture Topology and Size
Neural architecture search (NAS) has attracted a lot of attention and has
been shown to bring tangible benefits to a large number of applications in
the past few years. Architecture topology and architecture size have been
regarded as two of the most important aspects of deep learning model
performance, and the community has developed many search algorithms for both
aspects of neural architectures. However, the performance gains from these
search algorithms are achieved under different search spaces and training
setups. This makes the overall performance of the algorithms largely
incomparable and the improvement from any single sub-module of the search
pipeline unclear. In this paper, we propose NATS-Bench, a unified benchmark on
searching for both topology and size, for (almost) any up-to-date NAS
algorithm. NATS-Bench includes the search space of 15,625 neural cell
candidates for architecture topology and 32,768 for architecture size on three
datasets. We analyze the validity of our benchmark in terms of various criteria
and performance comparison of all candidates in the search space. We also show
the versatility of NATS-Bench by benchmarking 13 recent state-of-the-art NAS
algorithms on it. All logs and diagnostic information, obtained by training
each candidate under the same setup, are provided. This enables a much larger
community of researchers to focus on developing better NAS algorithms in a
more comparable and computationally affordable environment. All code is
publicly available at: https://xuanyidong.com/assets/projects/NATS-Bench.
Comment: Accepted to IEEE TPAMI 2021; an extended version of NAS-Bench-201 (ICLR 2020) [arXiv:2001.00326].
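The practical consequence is that a search algorithm can be evaluated by table lookup instead of training. The following self-contained sketch runs random search over a hypothetical precomputed table with the same topology-space size as NATS-Bench (5^6 = 15,625 cells); the encoding and the accuracy values are fabricated for illustration and do not use the actual nats_bench API.

import itertools
import random

OPS = ["none", "skip_connect", "conv_1x1", "conv_3x3", "avg_pool_3x3"]
EDGES = 6  # a cell with 4 nodes has 6 edges, giving 5**6 = 15,625 topologies

# Hypothetical precomputed table: cell encoding -> validation accuracy.
random.seed(0)
benchmark = {arch: random.uniform(70.0, 95.0)
             for arch in itertools.product(OPS, repeat=EDGES)}

def random_search(num_queries: int):
    """Random search reduced to table lookups; each 'evaluation' is free."""
    best_arch, best_acc = None, -1.0
    for _ in range(num_queries):
        arch = tuple(random.choice(OPS) for _ in range(EDGES))
        acc = benchmark[arch]  # with NATS-Bench, a query replaces full training
        if acc > best_acc:
            best_arch, best_acc = arch, acc
    return best_arch, best_acc

arch, acc = random_search(500)
print(f"best of 500 queries: {acc:.2f}% with cell {arch}")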
Asymptotic Soft Filter Pruning for Deep Convolutional Neural Networks
Deeper and wider Convolutional Neural Networks (CNNs) achieve superior
performance but incur expensive computational costs. Accelerating such
over-parameterized neural networks has received increasing attention. A typical
pruning algorithm is a three-stage pipeline, i.e., training, pruning, and
retraining. Prevailing approaches fix the pruned filters to zero during
retraining, and thus significantly reduce the optimization space. Moreover, they
prune a large number of filters outright at the start, which causes
unrecoverable information loss. To solve these problems, we propose an
Asymptotic Soft Filter Pruning (ASFP) method to accelerate the inference
procedure of the deep neural networks. First, we update the pruned filters
during the retraining stage. As a result, the optimization space of the pruned
model is not reduced but remains the same as that of the original model. In
this way, the model has enough capacity to learn from the training data.
Second, we prune the network asymptotically: we prune only a few filters at
first and gradually prune more as training proceeds. With
asymptotic pruning, the information in the training set gradually concentrates
in the remaining filters, so the subsequent training and pruning process
remains stable. Experiments show the effectiveness of our ASFP on
image classification benchmarks. Notably, on ILSVRC-2012, our ASFP reduces
more than 40% of the FLOPs of ResNet-50 with only 0.14% top-5 accuracy
degradation, improving over soft filter pruning (SFP) by 8%.
Comment: Extended journal version of arXiv:1808.0686
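As a rough illustration of the two ideas, soft pruning (zeroed filters stay trainable) and an asymptotic pruning schedule, here is a minimal PyTorch sketch; the exponential ramp and the function names are assumptions for illustration, not the paper's implementation.

import math
import torch
import torch.nn as nn

def current_rate(epoch: int, total: int, target: float, k: float = 4.0) -> float:
    """Pruning rate that starts near zero and approaches `target` asymptotically."""
    return target * (1.0 - math.exp(-k * epoch / total))

@torch.no_grad()
def soft_prune_conv(conv: nn.Conv2d, rate: float) -> None:
    """Zero out the `rate` fraction of filters with the smallest L2 norms."""
    norms = conv.weight.flatten(1).norm(dim=1)   # one norm per output filter
    num_prune = int(rate * conv.out_channels)
    if num_prune == 0:
        return
    weakest = norms.argsort()[:num_prune]        # indices of the weakest filters
    conv.weight[weakest] = 0.0                   # zeroed, but still updated later
    if conv.bias is not None:
        conv.bias[weakest] = 0.0

# Usage: interleave pruning with ordinary training (the epoch body is elided).
conv = nn.Conv2d(64, 128, 3, padding=1)
for epoch in range(100):
    # ... one epoch of training on the full, un-fixed model ...
    rate = current_rate(epoch, total=100, target=0.4)  # ramp toward 40% pruned
    soft_prune_conv(conv, rate)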
Triformer: Triangular, Variable-Specific Attentions for Long Sequence Multivariate Time Series Forecasting -- Full Version
A variety of real-world applications rely on far future information to make
decisions, thus calling for efficient and accurate long sequence multivariate
time series forecasting. While recent attention-based forecasting models show
strong abilities in capturing long-term dependencies, they still suffer from
two key limitations. First, canonical self-attention has quadratic complexity
w.r.t. the input time series length, thus falling short in efficiency. Second,
different variables' time series often have distinct temporal dynamics, which
existing studies fail to capture, as they use the same model parameter space,
e.g., projection matrices, for all variables' time series, thus falling short
in accuracy. To ensure high efficiency and accuracy, we propose Triformer, a
triangular, variable-specific attention. (i) Linear complexity: we introduce a
novel patch attention with linear complexity. When multiple layers of patch
attention are stacked, a triangular structure ensures that the layer sizes
shrink exponentially, thus maintaining linear overall complexity. (ii)
Variable-specific parameters: we propose a light-weight method to enable
distinct sets of model parameters for different variables' time series to
enhance accuracy without compromising efficiency or memory usage. Strong
empirical evidence on four datasets from multiple domains justifies our design
choices and demonstrates that Triformer outperforms state-of-the-art methods
in both accuracy and efficiency. This is an extended version of
"Triformer: Triangular, Variable-Specific Attentions for Long Sequence
Multivariate Time Series Forecasting", to appear in IJCAI 2022 [Cirstea et al.,
2022a], including additional experimental results.
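To illustrate how patch attention keeps the cost linear while the stack shrinks triangularly, here is a minimal PyTorch sketch: each layer attends from a learnable pseudo-query to the timestamps within each patch and emits one token per patch, so the sequence length drops by the patch size at every layer. The pseudo-query formulation, names, and sizes are simplifying assumptions rather than the paper's implementation, and the variable-specific parameterization is omitted.

import torch
import torch.nn as nn
import torch.nn.functional as F

class PatchAttention(nn.Module):
    def __init__(self, dim: int, patch: int):
        super().__init__()
        self.patch = patch
        self.query = nn.Parameter(torch.randn(dim))  # shared learnable pseudo-query
        self.key = nn.Linear(dim, dim)
        self.value = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, length, dim); length must be divisible by the patch size.
        b, t, d = x.shape
        x = x.view(b, t // self.patch, self.patch, d)       # (b, patches, patch, d)
        scores = (self.key(x) * self.query).sum(-1) / d**0.5
        attn = F.softmax(scores, dim=-1)                    # attention within each patch only
        return (attn.unsqueeze(-1) * self.value(x)).sum(2)  # one token per patch

# Triangular stack: length 64 -> 16 -> 4 with patch size 4, so per-layer cost
# shrinks geometrically and the total stays linear in the input length.
layers = nn.Sequential(PatchAttention(32, 4), PatchAttention(32, 4))
out = layers(torch.randn(8, 64, 32))
print(out.shape)  # torch.Size([8, 4, 32])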