Learning Where To Look -- Generative NAS is Surprisingly Efficient
The efficient, automated search for well-performing neural architectures
(NAS) has drawn increasing attention in recent years. The predominant
research objective is to reduce the need for costly evaluations
of neural architectures while efficiently exploring large search spaces. To
this aim, surrogate models embed architectures in a latent space and predict
their performance, while generative models for neural architectures enable
optimization-based search within the latent space the generator draws from.
Both surrogate and generative models aim to facilitate query-efficient
search in a well-structured latent space. In this paper, we
further improve the trade-off between query efficiency and promising
architecture generation by leveraging the advantages of both efficient
surrogate models and generative design. To this end, we propose a generative model,
paired with a surrogate predictor, that iteratively learns to generate samples
from increasingly promising latent subspaces. This approach leads to very
effective and efficient architecture search, while keeping the query amount
low. In addition, our approach makes it straightforward to jointly
optimize for multiple objectives such as accuracy and hardware latency. We show
the benefit of this approach not only w.r.t. the optimization of architectures
for highest classification accuracy but also in the context of hardware
constraints and outperform state-of-the-art methods on several NAS benchmarks
for single and multiple objectives. We also achieve state-of-the-art
performance on ImageNet. The code is available at
http://github.com/jovitalukasik/AG-Net .
Comment: Accepted to European Conference on Computer Vision 202
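The iterative scheme described above (a generator proposes candidates, a surrogate ranks them, only a few top candidates are truly evaluated, and the sampling distribution is then shifted toward promising latent subspaces) can be illustrated with a toy sketch. Note that the generator, the ridge-regression surrogate, the quadratic stand-in for the true performance, and the distribution-update rule below are all hypothetical simplifications for illustration, not the AG-Net implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def generator(z):
    # Hypothetical stand-in: maps a latent vector to an "architecture encoding".
    return np.tanh(z)

def true_performance(arch):
    # Hypothetical expensive evaluation (in reality: training the network).
    return -np.sum((arch - 0.5) ** 2)

class LinearSurrogate:
    """Tiny ridge-regression surrogate predicting performance from encodings."""
    def __init__(self, dim):
        self.w = np.zeros(dim + 1)
    def fit(self, X, y):
        Xb = np.hstack([X, np.ones((len(X), 1))])
        self.w = np.linalg.solve(Xb.T @ Xb + 1e-3 * np.eye(Xb.shape[1]), Xb.T @ y)
    def predict(self, X):
        Xb = np.hstack([X, np.ones((len(X), 1))])
        return Xb @ self.w

dim, queries_per_round, rounds = 4, 5, 6
mu, sigma = np.zeros(dim), 1.0          # current latent sampling distribution
X_seen, y_seen = [], []

for _ in range(rounds):
    z = rng.normal(mu, sigma, size=(200, dim))   # sample latent candidates
    archs = generator(z)
    if X_seen:                                   # rank candidates by surrogate
        surrogate = LinearSurrogate(dim)
        surrogate.fit(np.array(X_seen), np.array(y_seen))
        order = np.argsort(-surrogate.predict(archs))
    else:
        order = rng.permutation(len(archs))
    for i in order[:queries_per_round]:          # query only a few candidates
        X_seen.append(archs[i])
        y_seen.append(true_performance(archs[i]))
    best = np.array(X_seen)[np.argsort(y_seen)[-queries_per_round:]]
    mu = np.arctanh(np.clip(best, -0.999, 0.999)).mean(axis=0)  # re-center latent prior
    sigma *= 0.8                                 # focus on the promising subspace

best_score = max(y_seen)                         # total true queries: 30
```

The key point of the scheme is that the expensive evaluation is called only `rounds * queries_per_round` times, while the surrogate screens thousands of generated candidates for free.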
DARTS for Inverse Problems: a Study on Hyperparameter Sensitivity
Differentiable architecture search (DARTS) is a widely researched tool for
the discovery of novel architectures, due to its promising results for image
classification. The main benefit of DARTS is its efficiency, achieved through
the weight-sharing one-shot paradigm, which allows fast architecture
search. In this work, we investigate DARTS in a systematic case study of
inverse problems, which allows us to analyze these potential benefits in a
controlled manner. We demonstrate that the success of DARTS can be extended
from image classification to signal reconstruction, in principle. However, our
experiments also expose three fundamental difficulties in the evaluation of
DARTS-based methods in inverse problems: First, the results show a large
variance in all test cases. Second, the final performance is highly dependent
on the hyperparameters of the optimizer. Third, the performance of the
weight-sharing architecture used during training does not reflect the final
performance of the found architecture well. Thus, we conclude that it is necessary to
1) report the results of any DARTS-based method over several runs along with
the underlying performance statistics, 2) show the correlation of the training
and final architecture performance, and 3) carefully consider if the
computational efficiency of DARTS outweighs the costs of hyperparameter
optimization and multiple runs.
Comment: 11 pages, 5 figures. First two and last two authors each contributed
equally
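The reporting practice recommended in points 1) and 2) can be sketched as follows. The accuracy numbers are made up purely for illustration, and Kendall's tau is used here as one possible rank-correlation measure between one-shot and final performance:

```python
import statistics

# Hypothetical numbers: validation accuracy of the weight-sharing (one-shot)
# network vs. final retrained accuracy of the derived architecture, per run.
oneshot = [88.1, 87.4, 89.0, 86.9, 88.5]
final   = [93.2, 93.8, 92.9, 93.5, 93.1]

# 1) Performance statistics over several runs, not a single best number.
mean = statistics.mean(final)
std = statistics.stdev(final)

def kendall_tau(a, b):
    """Rank correlation: fraction of concordant minus discordant pairs."""
    n = len(a)
    concordant = discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            s = (a[i] - a[j]) * (b[i] - b[j])
            if s > 0:
                concordant += 1
            elif s < 0:
                discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)

# 2) Correlation between training (one-shot) and final architecture performance.
tau = kendall_tau(oneshot, final)
```

With these illustrative values the correlation is strongly negative, which is exactly the failure mode the abstract warns about: the one-shot ranking does not predict the final ranking.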
Topology Learning for Prediction, Generation, and Robustness in Neural Architecture Search
In recent years, deep learning with Convolutional Neural Networks has become the key to success in computer vision tasks. However, designing new architectures is a compute-intensive and tedious trial-and-error process that depends on human expert knowledge. Neural Architecture Search (NAS) addresses this problem by automating the architecture design process to find high-performing architectures. Yet, initial approaches to NAS rely on training and evaluating thousands of networks, resulting in compute-intensive search times. In this thesis, we introduce efficient search methods which overcome these heavy search times. First, we present a surrogate model to predict the performance of architectures. Significantly, this surrogate model is able to predict the performance of architectures with topologies not seen during training, i.e., our proposed model can extrapolate into unseen regions.
In the second part, we introduce two generative architecture search approaches. The first is based on a variational autoencoder, which enables searching for architectures directly in the learned latent space and mapping the found architectures back to their discrete architecture topologies. The second approach improves on the former and uses a simple generative model, coupled with a surrogate model, to search for architectures directly. In addition, we optimize the latent space itself for the direct generation of high-performing architectures. The third part of this thesis analyzes the widely used differentiable one-shot method DARTS, asking whether this method is indeed an efficient search method and how sensitive it is to domain shifts, hyperparameters, and initializations.
Lastly, we pave the way for robustness in NAS research. We introduce a dataset for architecture design and robustness, which evaluates one complete NAS search space against adversarial attacks and corruptions, and thus allows for an in-depth analysis of how architectural design can improve robustness through topology alone.
Improving Native CNN Robustness with Filter Frequency Regularization
Neural networks tend to overfit the training distribution and perform poorly on out-of-distribution data. A conceptually simple solution lies in adversarial training, which introduces worst-case perturbations into the training data and thus improves model generalization to some extent. However, it is only one ingredient towards generally more robust models and requires knowledge about the potential attacks or inference-time data corruptions during model training. This paper focuses on the native robustness of models that can learn robust behavior directly from conventional training data without out-of-distribution examples. To this end, we study the frequencies in learned convolution filters. Clean-trained models often prioritize high-frequency information, whereas adversarial training forces models to shift their focus to low-frequency details during training. By mimicking this behavior through frequency regularization of learned convolution weights, we achieve improved native robustness to adversarial attacks, common corruptions, and other out-of-distribution tests. Additionally, this method leads to more favorable shifts in decision-making towards low-frequency information, such as shapes, which inherently aligns more closely with human vision.
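A minimal sketch of how a frequency regularizer on convolution weights could look, assuming a penalty on spectral energy outside a low-frequency band. The function name, the cutoff choice, and the DFT-based masking are illustrative assumptions, not the paper's exact method:

```python
import numpy as np

def high_frequency_penalty(filters, cutoff=1):
    """Regularization term: total spectral energy of the filters outside a
    low-frequency band.  `filters` has shape (out_ch, in_ch, k, k); `cutoff`
    is the number of low-frequency bins left unpenalized (a free choice here).
    """
    spectrum = np.fft.fft2(filters, axes=(-2, -1))        # per-filter 2D DFT
    energy = np.abs(spectrum) ** 2
    k = filters.shape[-1]
    freqs = np.fft.fftfreq(k) * k                         # integer frequency indices
    fy, fx = np.meshgrid(freqs, freqs, indexing="ij")
    high = (np.abs(fy) > cutoff) | (np.abs(fx) > cutoff)  # mask of high-freq bins
    return energy[..., high].sum()

rng = np.random.default_rng(0)
w = rng.normal(size=(8, 3, 5, 5))         # random 5x5 conv filters (mostly high-freq)
smooth = np.ones((8, 3, 5, 5)) / 25.0     # constant filters: all energy at DC

loss_noisy = high_frequency_penalty(w)
loss_smooth = high_frequency_penalty(smooth)   # essentially zero
```

During training, a scaled version of such a penalty would be added to the task loss, nudging the learned filters toward the low-frequency structure that adversarially trained models exhibit.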