
    Learning Where To Look -- Generative NAS is Surprisingly Efficient

    The efficient, automated search for well-performing neural architectures (NAS) has drawn increasing attention in the recent past. The predominant research objective is to reduce the need for costly evaluations of neural architectures while efficiently exploring large search spaces. To this end, surrogate models embed architectures in a latent space and predict their performance, while generative models for neural architectures enable optimization-based search within the latent space the generator draws from. Both surrogate and generative models aim to facilitate query-efficient search in a well-structured latent space. In this paper, we further improve the trade-off between query efficiency and promising architecture generation by combining the advantages of efficient surrogate models and generative design. To this end, we propose a generative model, paired with a surrogate predictor, that iteratively learns to generate samples from increasingly promising latent subspaces. This approach leads to very effective and efficient architecture search while keeping the number of queries low. In addition, our approach makes it straightforward to jointly optimize for multiple objectives such as accuracy and hardware latency. We show the benefit of this approach not only for optimizing architectures for highest classification accuracy but also under hardware constraints, and we outperform state-of-the-art methods on several NAS benchmarks for single and multiple objectives. We also achieve state-of-the-art performance on ImageNet. The code is available at http://github.com/jovitalukasik/AG-Net .
    Comment: Accepted to the European Conference on Computer Vision 2022
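
    The following is a minimal sketch of the kind of search loop this abstract describes: a generator maps latent codes to architecture encodings, a surrogate predicts their performance, only the surrogate's top candidates are truly evaluated, and new candidates are proposed by gradient ascent in the latent space. All names, dimensions, and the dummy evaluate() below are illustrative assumptions, not the authors' AG-Net implementation; the actual method additionally retrains the generator on the best samples so that it draws from increasingly promising latent subspaces.

    ```python
    import torch
    import torch.nn as nn

    LATENT_DIM, ENC_DIM = 16, 32   # assumed sizes of latent code / architecture encoding

    generator = nn.Sequential(nn.Linear(LATENT_DIM, 64), nn.ReLU(), nn.Linear(64, ENC_DIM))
    surrogate = nn.Sequential(nn.Linear(ENC_DIM, 64), nn.ReLU(), nn.Linear(64, 1))
    sur_opt = torch.optim.Adam(surrogate.parameters(), lr=1e-3)

    def evaluate(encoding: torch.Tensor) -> float:
        """Stand-in for the costly true evaluation (training/validating the network)."""
        return float(-encoding.pow(2).sum())        # dummy objective for illustration

    archive = []                                    # (encoding, measured score) pairs

    for step in range(5):                           # outer search iterations
        # 1) propose candidates by gradient ascent on the surrogate's prediction,
        #    optimizing latent codes through the (frozen) generator
        z = torch.randn(32, LATENT_DIM, requires_grad=True)
        z_opt = torch.optim.Adam([z], lr=0.05)
        for _ in range(50):
            z_opt.zero_grad()
            (-surrogate(generator(z)).mean()).backward()
            z_opt.step()

        # 2) query the true objective only for the surrogate's top candidates
        with torch.no_grad():
            cand = generator(z)
            top = surrogate(cand).squeeze(-1).topk(4).indices
        archive += [(cand[i], evaluate(cand[i])) for i in top]

        # 3) refit the surrogate on everything measured so far
        x = torch.stack([e for e, _ in archive])
        y = torch.tensor([[s] for _, s in archive])
        for _ in range(200):
            sur_opt.zero_grad()
            nn.functional.mse_loss(surrogate(x), y).backward()
            sur_opt.step()
    ```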

    DARTS for Inverse Problems: a Study on Hyperparameter Sensitivity

    Differentiable architecture search (DARTS) is a widely researched tool for the discovery of novel architectures, due to its promising results for image classification. The main benefit of DARTS is the effectiveness achieved through its weight-sharing one-shot paradigm, which allows efficient architecture search. In this work, we investigate DARTS in a systematic case study of inverse problems, which allows us to analyze these potential benefits in a controlled manner. We demonstrate that the success of DARTS can, in principle, be extended from image classification to signal reconstruction. However, our experiments also expose three fundamental difficulties in the evaluation of DARTS-based methods for inverse problems: First, the results show a large variance in all test cases. Second, the final performance depends strongly on the hyperparameters of the optimizer. Third, the performance of the weight-sharing architecture used during training does not reflect the final performance of the found architecture well. We therefore conclude that it is necessary to 1) report the results of any DARTS-based method over several runs along with the underlying performance statistics, 2) show the correlation between the training and the final architecture performance, and 3) carefully consider whether the computational efficiency of DARTS outweighs the cost of hyperparameter optimization and multiple runs.
    Comment: 11 pages, 5 figures. The first two and last two authors each contributed equally
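
    A minimal sketch of the reporting practice this abstract recommends: collect results from several independent DARTS runs, report their statistics rather than a single best value, and check how well the one-shot (weight-sharing) performance correlates with the final, retrained-architecture performance. The numbers below are made-up placeholders, not results from the paper.

    ```python
    import numpy as np
    from scipy.stats import spearmanr

    # one entry per independent search run (different seed / optimizer hyperparameters)
    one_shot_psnr = np.array([28.1, 27.4, 28.9, 26.8, 28.3])   # during weight-sharing training
    final_psnr    = np.array([29.0, 29.4, 28.2, 27.9, 29.1])   # after retraining the found architecture

    print(f"final PSNR: mean={final_psnr.mean():.2f}  std={final_psnr.std(ddof=1):.2f}  "
          f"min={final_psnr.min():.2f}  max={final_psnr.max():.2f}  (n={len(final_psnr)})")

    rho, p = spearmanr(one_shot_psnr, final_psnr)
    print(f"rank correlation one-shot vs. final: rho={rho:.2f} (p={p:.3f})")
    ```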

    Topology Learning for Prediction, Generation, and Robustness in Neural Architecture Search

    In recent years, deep learning with convolutional neural networks has become the key to success in computer vision tasks. However, designing new architectures is compute-intensive and a tedious trial-and-error process that depends on human expert knowledge. Neural Architecture Search (NAS) addresses this problem by automating the architecture design process to find high-performing architectures. Yet, initial approaches to NAS relied on training and evaluating thousands of networks, resulting in compute-intensive search times. In this thesis, we introduce efficient search methods which overcome these heavy search times. First, we present a surrogate model to predict the performance of architectures. Significantly, this surrogate model is able to predict the performance of architectures with topologies that were not seen during training, i.e., our proposed model can extrapolate into unseen regions. In the second part, we introduce two generative architecture search approaches. The first is based on a variational autoencoder, which enables searching for architectures directly in the generated latent space and maps the found architectures back to their discrete topologies. The second approach improves on the former and uses a simpler generative model, which is furthermore coupled with a surrogate model to search for architectures directly. In addition, we optimize the latent space itself for a direct generation of high-performing architectures. The third part of this thesis analyzes the widely used differentiable one-shot method DARTS, asking whether it is indeed an efficient search method and how sensitive it is to domain shifts, hyperparameters, and initializations. Lastly, we pave the way for robustness in NAS research: we introduce a dataset for architecture design and robustness that evaluates one complete NAS search space against adversarial attacks and corruptions, and thus allows for an in-depth analysis of how architectural design alone can improve robustness.
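
    A minimal sketch of the variational-autoencoder idea mentioned in the abstract, under assumptions of my own: a cell topology (adjacency matrix plus one-hot operation choices) is flattened into a vector, encoded into a latent space that can be searched directly, and decoded back to a discrete topology by thresholding and argmax. Cell size, operation count, and all module names are illustrative, not the thesis code.

    ```python
    import torch
    import torch.nn as nn

    N_NODES, N_OPS = 7, 5                       # assumed cell size / operation set
    ADJ_DIM, OP_DIM = N_NODES * N_NODES, N_NODES * N_OPS
    IN_DIM, LATENT = ADJ_DIM + OP_DIM, 16

    class ArchVAE(nn.Module):
        def __init__(self):
            super().__init__()
            self.enc = nn.Sequential(nn.Linear(IN_DIM, 128), nn.ReLU(), nn.Linear(128, 2 * LATENT))
            self.dec = nn.Sequential(nn.Linear(LATENT, 128), nn.ReLU(), nn.Linear(128, IN_DIM))

        def forward(self, x):
            mu, logvar = self.enc(x).chunk(2, dim=-1)
            z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()   # reparameterization
            return self.dec(z), mu, logvar

    def decode_to_discrete(vae, z):
        """Map a latent point back to a discrete topology (thresholded adjacency, argmax ops)."""
        out = vae.dec(z)
        adj = (torch.sigmoid(out[..., :ADJ_DIM]) > 0.5).float().view(-1, N_NODES, N_NODES)
        ops = out[..., ADJ_DIM:].view(-1, N_NODES, N_OPS).argmax(-1)
        return adj, ops

    vae = ArchVAE()
    recon, mu, logvar = vae(torch.rand(4, IN_DIM))          # reconstruct a batch of encodings
    adj, ops = decode_to_discrete(vae, torch.randn(4, LATENT))  # decode sampled latent points
    ```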

    Improving Native CNN Robustness with Filter Frequency Regularization

    Neural networks tend to overfit the training distribution and perform poorly on out-of-distribution data. A conceptually simple solution lies in adversarial training, which introduces worst-case perturbations into the training data and thus improves model generalization to some extent. However, it is only one ingredient towards generally more robust models and requires knowledge about the potential attacks or inference-time data corruptions during model training. This paper focuses on the native robustness of models that can learn robust behavior directly from conventional training data, without out-of-distribution examples. To this end, we study the frequencies in learned convolution filters. Clean-trained models often prioritize high-frequency information, whereas adversarial training forces models to shift their focus to low-frequency details during training. By mimicking this behavior through frequency regularization of the learned convolution weights, we achieve improved native robustness to adversarial attacks, common corruptions, and other out-of-distribution tests. Additionally, this method shifts decision-making towards low-frequency information, such as shapes, which aligns more closely with human vision.
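
    A minimal sketch of the regularization idea as described in the abstract, not the authors' implementation: add a penalty on the high-frequency spectral energy of every learned convolution kernel, computed with a 2-D FFT over the kernel's spatial dimensions. The cutoff, weighting, and the toy model are illustrative assumptions.

    ```python
    import torch
    import torch.nn as nn

    def high_freq_penalty(model: nn.Module, cutoff: int = 1) -> torch.Tensor:
        """Sum of spectral energy outside the lowest `cutoff` frequencies of each conv kernel."""
        penalty = torch.zeros(())
        for m in model.modules():
            if isinstance(m, nn.Conv2d) and m.kernel_size[0] > 1:
                spec = torch.fft.rfft2(m.weight, dim=(-2, -1)).abs() ** 2  # [out, in, kH, kW//2+1]
                low = spec[..., :cutoff, :cutoff].sum()                    # low-frequency corner kept
                penalty = penalty + (spec.sum() - low)
        return penalty

    # usage: add the penalty to the ordinary training loss (toy model and head for illustration)
    model = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.Conv2d(16, 16, 3, padding=1))
    x, y = torch.randn(8, 3, 32, 32), torch.randint(0, 16, (8,))
    logits = model(x).mean(dim=(-2, -1))                    # dummy classification head
    loss = nn.functional.cross_entropy(logits, y) + 1e-3 * high_freq_penalty(model)
    loss.backward()
    ```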