Powering One-shot Topological NAS with Stabilized Share-parameter Proxy
One-shot NAS methods have attracted much interest from the research community due to their remarkable training efficiency and capacity to discover high-performance models. However, the search spaces of previous one-shot works usually relied on hand-crafted design and offered little flexibility in network topology. In this work, we enhance one-shot NAS by exploring high-performing network architectures in our large-scale Topology Augmented Search Space (i.e., over 3.4*10^10 different topological structures). Specifically, the difficulty of searching such a complex space is addressed by the proposed stabilized share-parameter proxy, which employs Stochastic Gradient Langevin Dynamics to enable fast shared-parameter sampling, and thereby a stabilized measurement of architecture performance even in search spaces with complex topological structures. The proposed method, namely Stabilized Topological Neural Architecture Search (ST-NAS), achieves state-of-the-art performance under a Multiply-Adds (MAdds) constraint on ImageNet. Our lite model ST-NAS-A achieves 76.4% top-1 accuracy with only 326M MAdds, and our moderate model ST-NAS-B achieves 77.9% top-1 accuracy with just 503M MAdds. Both models outperform concurrent one-shot NAS works.
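The abstract gives no implementation details; as a hedged illustration of the core ingredient, here is a minimal sketch of one Stochastic Gradient Langevin Dynamics update on shared supernet weights, where the step size, temperature, and supernet setup are assumptions rather than ST-NAS's actual procedure.

```python
import math
import torch

def sgld_step(params, lr=1e-3, temperature=1e-4):
    """One SGLD update: a gradient step plus Gaussian noise with standard
    deviation sqrt(2 * lr * temperature). Assumes .grad was populated by a
    backward pass on a sampled sub-architecture (illustrative setup)."""
    noise_std = math.sqrt(2.0 * lr * temperature)
    with torch.no_grad():
        for p in params:
            if p.grad is None:
                continue
            p.add_(p.grad, alpha=-lr)                     # descent step
            p.add_(torch.randn_like(p), alpha=noise_std)  # Langevin noise

# Usage sketch: after loss.backward() on one sampled path,
# sgld_step(supernet.parameters()) draws a fresh shared-weight sample.
```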
DADA: Differentiable Automatic Data Augmentation
Data augmentation (DA) techniques aim to increase data variability, and thus
train deep networks with better generalisation. The pioneering AutoAugment
automated the search for optimal DA policies with reinforcement learning.
However, AutoAugment is extremely computationally expensive, limiting its wide
applicability. Follow-up works such as Population Based Augmentation (PBA) and
Fast AutoAugment improved efficiency, but their optimization speed remains a
bottleneck. In this paper, we propose Differentiable Automatic Data
Augmentation (DADA) which dramatically reduces the cost. DADA relaxes the
discrete DA policy selection to a differentiable optimization problem via
Gumbel-Softmax. In addition, we introduce an unbiased gradient estimator, RELAX, yielding an efficient one-pass optimization strategy that learns an accurate DA policy. We conduct extensive experiments on
CIFAR-10, CIFAR-100, SVHN, and ImageNet datasets. Furthermore, we demonstrate
the value of Auto DA in pre-training for downstream detection problems. Results
show our DADA is at least one order of magnitude faster than the
state-of-the-art while achieving very comparable accuracy. The code is
available at https://github.com/VDIGPKU/DADA.
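As a hedged illustration of the Gumbel-Softmax relaxation described above (this sketch covers only the relaxation, not the RELAX estimator; the operation count and names are assumptions):

```python
import torch
import torch.nn.functional as F

K = 4  # assumed op set, e.g. {identity, rotate, color-jitter, cutout}
policy_logits = torch.zeros(K, requires_grad=True)  # learnable policy params

# F.gumbel_softmax draws a differentiable (soft) one-hot sample, so a loss
# computed on the weighted mix of augmented batches can backpropagate into
# policy_logits, turning discrete policy selection into gradient descent.
weights = F.gumbel_softmax(policy_logits, tau=1.0, hard=False)
print(weights)  # soft selection over the K augmentation operations
```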
Improving One-shot NAS by Suppressing the Posterior Fading
There is a growing interest in automated neural architecture search (NAS). To
improve the efficiency of NAS, previous approaches adopt a weight-sharing method that forces all models to share the same set of weights. However, it has been
observed that a model performing better with shared weights does not
necessarily perform better when trained alone. In this paper, we analyse
existing weight-sharing one-shot NAS approaches from a Bayesian point of view
and identify the posterior fading problem, which compromises the effectiveness
of shared weights. To alleviate this problem, we present a practical approach
to guide the parameter posterior towards its true distribution. Moreover, a
hard latency constraint is introduced during the search so that the desired
latency can be achieved. The resulting method, namely Posterior Convergent NAS (PC-NAS), achieves state-of-the-art performance under a standard GPU latency constraint on ImageNet. In our small search space, our model PC-NAS-S attains
76.8% top-1 accuracy, 2.1% higher than MobileNetV2 (1.4x) with the same latency. When applied to the large search space, PC-NAS-L achieves 78.1% top-1
accuracy within 11ms. The discovered architecture also transfers well to other
computer vision applications such as object detection and person
re-identification.
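The abstract does not say how the hard latency constraint is enforced; one hedged sketch is to reject sampled operators whose estimated latency would push a candidate past the budget. The latency table and budget below are purely illustrative.

```python
import random

# Hypothetical per-operator latency estimates (ms) and a search budget.
latency_table = {"conv3x3": 2.1, "conv5x5": 3.4, "mbconv": 1.8, "skip": 0.1}
budget_ms = 11.0

def sample_architecture(num_layers=10):
    """Sample a layer-wise architecture that respects the latency budget."""
    arch, total = [], 0.0
    for _ in range(num_layers):
        op = random.choice(list(latency_table))
        if total + latency_table[op] > budget_ms:
            op = "skip"  # fall back to the cheapest op to stay within budget
        arch.append(op)
        total += latency_table[op]
    return arch, total

arch, lat = sample_architecture()
print(arch, f"{lat:.1f} ms")
```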
Meta Approach to Data Augmentation Optimization
Data augmentation policies drastically improve the performance of image
recognition tasks, especially when the policies are optimized for the target
data and tasks. In this paper, we propose to optimize image recognition models
and data augmentation policies simultaneously to improve the performance using
gradient descent. Unlike prior methods, our approach avoids using proxy tasks
or reducing search space, and can directly improve the validation performance.
Our method achieves efficient and scalable training by approximating the gradient of the policies with an implicit gradient computed via a Neumann series approximation. We
demonstrate that our approach can improve the performance of various image
classification tasks, including ImageNet classification and fine-grained
recognition, without using dataset-specific hyperparameter tuning.
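A hedged sketch of the Neumann-series trick behind such implicit gradients: the inverse-Hessian-vector product H^{-1}v is approximated by the truncated series alpha * sum_{i=0..K} (I - alpha*H)^i v. The scale alpha and the truncation length are illustrative assumptions.

```python
import torch

def neumann_inverse_hvp(loss_train, params, v, alpha=0.01, steps=5):
    """Approximate H^{-1} v, where H is the Hessian of loss_train w.r.t.
    params, via a truncated Neumann series (sketch; not the paper's code).
    v is a list of tensors shaped like params."""
    grads = torch.autograd.grad(loss_train, params, create_graph=True)
    p = [vi.clone() for vi in v]    # running term (I - alpha*H)^i v
    acc = [vi.clone() for vi in v]  # accumulated partial sum
    for _ in range(steps):
        hvp = torch.autograd.grad(grads, params, grad_outputs=p,
                                  retain_graph=True)
        p = [pi - alpha * hi for pi, hi in zip(p, hvp)]
        acc = [ai + pi for ai, pi in zip(acc, p)]
    return [alpha * ai for ai in acc]
```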
Learning Data Augmentation with Online Bilevel Optimization for Image Classification
Data augmentation is a key practice in machine learning for improving
generalization performance. However, finding the best data augmentation
hyperparameters requires domain knowledge or a computationally demanding
search. We address this issue by proposing an efficient approach to
automatically train a network that learns an effective distribution of
transformations to improve its generalization. Using bilevel optimization, we
directly optimize the data augmentation parameters using a validation set. This
framework can be used as a general solution to learn the optimal data
augmentation jointly with an end task model like a classifier. Results show
that our joint training method produces an image classification accuracy that
is comparable to or better than carefully hand-crafted data augmentation. Yet,
it does not need an expensive external validation loop on the data augmentation
hyperparameters.
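A minimal sketch of one such online bilevel update: a differentiable one-step lookahead of the model on augmented training data, then a validation-loss gradient with respect to the augmentation parameters. `model`, `augmenter`, `criterion`, the batches, and the inner learning rate are all illustrative assumptions.

```python
import torch
from torch.func import functional_call

def bilevel_step(model, augmenter, criterion, x_tr, y_tr, x_val, y_val,
                 inner_lr=0.1):
    """Return the validation-loss gradient w.r.t. the augmenter's parameters,
    differentiated through a one-step model update (hedged sketch)."""
    names = [n for n, _ in model.named_parameters()]
    params = [p for _, p in model.named_parameters()]

    # Inner step: training loss on augmented data; keep the graph so the
    # update itself stays differentiable w.r.t. the augmenter.
    loss_tr = criterion(model(augmenter(x_tr)), y_tr)
    grads = torch.autograd.grad(loss_tr, params, create_graph=True)
    updated = {n: p - inner_lr * g for n, p, g in zip(names, params, grads)}

    # Outer step: validation loss under the one-step-updated weights.
    loss_val = criterion(functional_call(model, updated, (x_val,)), y_val)
    return torch.autograd.grad(loss_val, list(augmenter.parameters()))
```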
PV-NAS: Practical Neural Architecture Search for Video Recognition
Recently, deep learning has been utilized to solve video recognition problems due to its prominent representation ability. Deep neural networks for video tasks are highly customized, and the design of such networks requires domain experts and costly trial-and-error tests. Recent advances in network architecture search have boosted image recognition performance by a large margin. However, the automatic design of video recognition networks is less explored. In this study, we propose a practical solution, namely Practical Video Neural Architecture Search (PV-NAS). Our PV-NAS can efficiently search across a tremendously large space of architectures in a novel spatial-temporal network search space using gradient-based search methods. To avoid getting stuck in sub-optimal solutions, we propose a novel learning rate scheduler to
encourage sufficient network diversity of the searched models. Extensive
empirical evaluations show that the proposed PV-NAS achieves state-of-the-art
performance with much fewer computational resources: 1) among light-weight models, our PV-NAS-L achieves 78.7% and 62.5% top-1 accuracy on Kinetics-400 and Something-Something V2, surpassing the previous state-of-the-art method (i.e., TSM) by a large margin (4.6% and 3.4%, respectively), and 2) among medium-weight models, our PV-NAS-M achieves the best performance (also a new record) on the Something-Something V2 dataset.
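The abstract does not detail the proposed scheduler; as a stand-in that illustrates the general idea of periodically restarting the learning rate so the supernet keeps exploring diverse architectures, here is PyTorch's cosine annealing with warm restarts (hyperparameters are illustrative, not PV-NAS's).

```python
import torch

net = torch.nn.Linear(8, 2)  # stand-in for a supernet
opt = torch.optim.SGD(net.parameters(), lr=0.1)
sched = torch.optim.lr_scheduler.CosineAnnealingWarmRestarts(
    opt, T_0=10, T_mult=2)  # decay within a cycle, then restart at full lr

for epoch in range(30):
    # ... one epoch of supernet training would go here ...
    sched.step()
    print(epoch, opt.param_groups[0]["lr"])
```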
Improving Auto-Augment via Augmentation-Wise Weight Sharing
The recent progress on automatically searching augmentation policies has
boosted the performance substantially for various tasks. A key component of automatic augmentation search is the evaluation process for a particular augmentation policy, which is used to return a reward and usually runs
thousands of times. A plain evaluation process, which includes full model
training and validation, would be time-consuming. To achieve efficiency, many
choose to sacrifice evaluation reliability for speed. In this paper, we dive
into the dynamics of augmented training of the model. This inspires us to
design a powerful and efficient proxy task based on the Augmentation-Wise
Weight Sharing (AWS) to form a fast yet accurate evaluation process in an
elegant way. Comprehensive analysis verifies the superiority of this approach
in terms of effectiveness and efficiency. The augmentation policies found by
our method achieve the best accuracy compared with existing auto-augmentation
search methods. On CIFAR-10, we achieve a top-1 error rate of 1.24%, currently the best result among single models trained without extra data. On
ImageNet, we get a top-1 error rate of 20.36% for ResNet-50, a 3.34% absolute error rate reduction over the baseline augmentation.
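A hedged sketch of an augmentation-wise weight-sharing proxy along these lines: one shared early-training stage, then a short policy-specific fine-tuning per candidate. The epoch counts and the `train_epoch`/`evaluate` callables are assumptions standing in for real training and validation loops, not the paper's API.

```python
import copy

def evaluate_policies(model, policies, train_epoch, evaluate,
                      shared_epochs=20, finetune_epochs=2):
    """Score each augmentation policy against one set of shared weights.

    train_epoch(model, policy) trains for one epoch (policy=None means a
    policy-agnostic stage); evaluate(model) returns validation accuracy.
    """
    for _ in range(shared_epochs):
        train_epoch(model, None)               # shared, policy-agnostic stage
    rewards = {}
    for policy in policies:
        probe = copy.deepcopy(model)           # reuse the shared weights
        for _ in range(finetune_epochs):
            train_epoch(probe, policy)         # short policy-specific stage
        rewards[policy] = evaluate(probe)      # reward for the search loop
    return rewards
```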
Direct Differentiable Augmentation Search
Data augmentation has been an indispensable tool for improving the performance of deep neural networks; however, augmentation policies hardly transfer across different tasks and datasets. Consequently, a recent trend is to adopt AutoML techniques to learn a proper augmentation policy without extensive hand-crafted tuning. In this paper, we propose an efficient differentiable search algorithm called Direct Differentiable Augmentation Search (DDAS). It exploits meta-learning with a one-step gradient update and a continuous relaxation of the expected training loss for efficient search. Our DDAS can achieve efficient augmentation search without relying on approximations such as Gumbel-Softmax or
second order gradient approximation. To further reduce the adverse effect of
improper augmentations, we organize the search space into a two-level hierarchy, in which we first decide whether to apply augmentation, and then
determine the specific augmentation policy. On standard image classification
benchmarks, our DDAS achieves state-of-the-art performance and efficiency
tradeoff while reducing the search cost dramatically, e.g. 0.15 GPU hours for
CIFAR-10. In addition, we also use DDAS to search augmentations for the object detection task and achieve performance comparable to AutoAugment, while being 1000x faster.
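A hedged sketch of such a two-level relaxation written directly as an expected training loss, with no Gumbel-Softmax sampling: one probability of applying any augmentation at all, then a distribution over concrete operations. The operation set and the loss closure are illustrative assumptions.

```python
import torch

apply_logit = torch.zeros((), requires_grad=True)  # level 1: augment or not
op_logits = torch.zeros(3, requires_grad=True)     # level 2: which op

def expected_loss(loss_fn, x, ops):
    """Differentiable expected loss over the two-level augmentation choice.
    loss_fn maps a batch to a scalar loss; ops is a list of len(op_logits)
    augmentation callables (assumed names)."""
    p_apply = torch.sigmoid(apply_logit)
    p_ops = torch.softmax(op_logits, dim=0)
    loss = (1 - p_apply) * loss_fn(x)               # skip augmentation
    for w, op in zip(p_ops, ops):
        loss = loss + p_apply * w * loss_fn(op(x))  # each concrete op
    return loss  # gradients flow into apply_logit and op_logits directly
```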
AutoAdapt: Automated Segmentation Network Search for Unsupervised Domain Adaptation
Neural network-based semantic segmentation has achieved remarkable results
when large amounts of annotated data are available, that is, in the supervised
case. However, such data is expensive to collect and so methods have been
developed to adapt models trained on related, often synthetic data for which
labels are readily available. Current adaptation approaches do not consider the
dependence of the generalization/transferability of these models on network
architecture. In this paper, we perform neural architecture search (NAS) to
provide an architecture-level perspective and analysis for domain adaptation. We
identify the optimization gap that exists when searching architectures for
unsupervised domain adaptation, which makes this NAS problem uniquely difficult.
We propose bridging this gap by using maximum mean discrepancy and regional
weighted entropy to estimate the accuracy metric. Experimental results on
several widely adopted benchmarks show that our proposed AutoAdapt framework
indeed discovers architectures that improve the performance of a number of
existing adaptation techniques.
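As a hedged illustration of the maximum mean discrepancy term, here is a simple (biased) RBF-kernel MMD^2 estimate between two feature batches; the bandwidth is an assumption.

```python
import torch

def mmd_rbf(x, y, sigma=1.0):
    """Biased RBF-kernel MMD^2 between feature batches x and y, each of
    shape (batch, dim). Sketch of the discrepancy term, not the paper's code."""
    def kernel(a, b):
        return torch.exp(-torch.cdist(a, b).pow(2) / (2 * sigma ** 2))
    return kernel(x, x).mean() + kernel(y, y).mean() - 2 * kernel(x, y).mean()

# Usage sketch: mmd_rbf(feats_source, feats_target) would proxy how well a
# candidate architecture aligns source and target feature distributions.
```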
AutoDO: Robust AutoAugment for Biased Data with Label Noise via Scalable Probabilistic Implicit Differentiation
AutoAugment has sparked an interest in automated augmentation methods for
deep learning models. These methods estimate image transformation policies for
train data that improve generalization to test data. While recent papers
evolved in the direction of decreasing policy search complexity, we show that
those methods are not robust when applied to biased and noisy data. To overcome
these limitations, we reformulate AutoAugment as a generalized automated
dataset optimization (AutoDO) task that minimizes the distribution shift between the test data and the distorted training dataset. In our AutoDO model, we explicitly estimate a set of per-point hyperparameters to flexibly change the distribution of the training data. In particular, we include hyperparameters for
augmentation, loss weights, and soft-labels that are jointly estimated using
implicit differentiation. We develop a theoretical probabilistic interpretation
of this framework using Fisher information and show that its complexity scales
linearly with the dataset size. Our experiments on SVHN, CIFAR-10/100, and
ImageNet classification show up to 9.3% improvement for biased datasets with
label noise compared to prior methods and, importantly, up to 36.6% gain for
underrepresented SVHN classes.
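A hedged sketch of per-point hyperparameters in this spirit: each training example gets its own loss weight and soft-label adjustment. For brevity they appear here as ordinary learnable tensors (the paper estimates them with implicit differentiation); the sizes and the 0.9/0.1 label mixing are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

n_train, n_classes = 1000, 10
loss_w_logits = torch.zeros(n_train, requires_grad=True)  # per-point weights
label_logits = torch.zeros(n_train, n_classes,
                           requires_grad=True)            # per-point soft labels

def weighted_loss(logits, y_hard, idx):
    """Per-example cross-entropy with learnable weights and soft labels.
    logits: (B, n_classes); y_hard: (B,) int64 labels; idx: (B,) dataset indices."""
    soft = F.softmax(label_logits[idx], dim=-1)
    target = 0.9 * F.one_hot(y_hard, n_classes).float() + 0.1 * soft
    per_ex = -(target * F.log_softmax(logits, dim=-1)).sum(-1)
    w = torch.sigmoid(loss_w_logits[idx])  # per-point loss weight in (0, 1)
    return (w * per_ex).mean()
```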