PR-DARTS: Pruning-Based Differentiable Architecture Search
The deployment of Convolutional Neural Networks (CNNs) on edge devices is
hindered by the substantial gap between performance requirements and available
processing power. While recent research has made large strides in developing
network pruning methods for reducing the computing overhead of CNNs, there
remains considerable accuracy loss, especially at high pruning ratios.
Observing that architectures designed for unpruned networks may not be
effective once pruned, we propose searching for architectures tailored to
pruning by defining a new search space and a novel search objective. To
improve the generalization of the pruned networks, we propose two novel
PrunedConv and PrunedLinear operations. Specifically, these operations mitigate
the problem of unstable gradients by regularizing the objective function of the
pruned networks. The proposed search objective enables us to train architecture
parameters with respect to the pruned weight elements. Quantitative analyses
demonstrate that our searched architectures outperform those used in
state-of-the-art pruned networks on CIFAR-10 and ImageNet. In terms of
hardware effectiveness, PR-DARTS increases MobileNet-v2's accuracy from 73.44%
to 81.35% (+7.91% improvement) and runs 3.87× faster.
Comment: 18 pages with 11 figures
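As a rough illustration of the general idea (not the paper's actual PrunedConv/PrunedLinear operations, whose details are not given in the abstract), a pruning-aware layer can gate its weights with a differentiable soft mask, so that pruning parameters are trained jointly with the weights and binarized only at deployment. All names and shapes below are hypothetical:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class PrunedLinearSketch:
    """Toy linear layer with a learnable soft pruning mask (illustrative only)."""

    def __init__(self, in_dim, out_dim, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.standard_normal((out_dim, in_dim)) * 0.1
        # Mask logits: sigmoid(m) in (0, 1) acts as a soft keep-probability
        # per weight, so the pruning pattern itself is differentiable.
        self.m = rng.standard_normal((out_dim, in_dim))

    def forward(self, x, hard=False, threshold=0.5):
        mask = sigmoid(self.m)
        if hard:
            # At deployment, binarize the mask to actually prune weights.
            mask = (mask > threshold).astype(float)
        return x @ (self.W * mask).T

layer = PrunedLinearSketch(4, 3)
x = np.ones((2, 4))
soft_out = layer.forward(x)             # differentiable, for search/training
hard_out = layer.forward(x, hard=True)  # sparse, for deployment
print(soft_out.shape, hard_out.shape)
```

In a full implementation the mask logits would receive gradients through the soft mask during search, which is the sense in which architecture parameters can be trained "with respect to the pruned weight elements".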
BANANAS: Bayesian Optimization with Neural Architectures for Neural Architecture Search
Over the past half-decade, many methods have been considered for neural
architecture search (NAS). Bayesian optimization (BO), which has long had
success in hyperparameter optimization, has recently emerged as a very
promising strategy for NAS when it is coupled with a neural predictor. Recent
work has proposed different instantiations of this framework, for example,
using Bayesian neural networks or graph convolutional networks as the
predictive model within BO. However, the analyses in these papers often focus
on the full-fledged NAS algorithm, so it is difficult to tell which individual
components of the framework lead to the best performance.
In this work, we give a thorough analysis of the "BO + neural predictor"
framework by identifying five main components: the architecture encoding,
neural predictor, uncertainty calibration method, acquisition function, and
acquisition optimization strategy. We test several different methods for each
component and also develop a novel path-based encoding scheme for neural
architectures, which we show theoretically and empirically scales better than
other encodings. Using all of our analyses, we develop a final algorithm called
BANANAS, which achieves state-of-the-art performance on NAS search spaces. We
adhere to the NAS research checklist (Lindauer and Hutter 2019) to facilitate
best practices, and our code is available at
https://github.com/naszilla/naszilla
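To make the path-based encoding concrete, the sketch below encodes a cell architecture as a binary vector with one bit per possible input-to-output operation sequence. The conventions here (a cell as an adjacency matrix over labeled nodes, paths truncated at a fixed maximum length) are assumptions for illustration, not the exact conventions of the BANANAS codebase:

```python
from itertools import product

def enumerate_paths(adj, ops, src=0, dst=None):
    """Return all operation sequences along directed paths src -> dst in a cell DAG."""
    n = len(ops)
    dst = n - 1 if dst is None else dst
    paths = []
    def dfs(v, seq):
        if v == dst:
            paths.append(tuple(seq))
            return
        for u in range(n):
            if adj[v][u]:
                # Record the operation at each intermediate node visited.
                dfs(u, seq + ([ops[u]] if u != dst else []))
    dfs(src, [])
    return paths

def path_encoding(adj, ops, op_set, max_len):
    """Binary vector with one bit per possible op-sequence up to max_len."""
    all_seqs = [s for L in range(max_len + 1) for s in product(op_set, repeat=L)]
    index = {s: i for i, s in enumerate(all_seqs)}
    vec = [0] * len(all_seqs)
    for p in set(enumerate_paths(adj, ops)):
        vec[index[p]] = 1
    return vec

# Tiny cell: input(0) -> conv3x3(1) -> output(2), plus a skip input -> output.
ops = ["in", "conv3x3", "out"]
adj = [[0, 1, 1],
       [0, 0, 1],
       [0, 0, 0]]
vec = path_encoding(adj, ops, op_set=("conv3x3", "maxpool"), max_len=1)
print(sum(vec), len(vec))  # 2 paths present out of 3 possible sequences
```

Because each bit answers "does this exact operation sequence appear as a path?", two cells with the same set of input-to-output computations get the same encoding regardless of how the DAG is wired, which is one intuition for why such an encoding can generalize well.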
Empirics-based Line Searches for Deep Learning
This dissertation takes an empirically based perspective on optimization in deep learning.
It is motivated by the lack of empirical understanding of the loss landscape's properties for typical deep learning tasks, and by a lack of understanding of why and how optimization approaches work for such tasks. We solidify the empirical understanding of stochastic loss landscapes, bringing color to these white areas of the scientific map with empirical observations. Based on these observations, we introduce understandable line search approaches that compete with, and in many cases outperform, state-of-the-art line search approaches introduced for the deep learning field.
This work includes a comprehensive introduction to optimization, focusing on line searches in the deep learning field. Based on and guided by this introduction, empirical observations of the loss landscapes of typical image-classification benchmark tasks are presented, along with observations of how optimizers perform and move on such landscapes. From these observations, the line search approaches Parabolic Approximation Line Search (PAL) and Large Batch Parabolic Approximation Line Search (LABPAL) are derived. In particular, the latter outperforms all competing line searches in this field in most cases. Furthermore, these observations reveal that well-tuned Stochastic Gradient Descent already closely approximates an almost exact line search, which partly explains why it is so hard to beat.
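The core of a parabolic approximation line search can be sketched as follows, assuming the simplest variant: measure the loss at the current point and at one trial step along the normalized negative gradient, use the directional derivative to fit a one-dimensional parabola, and jump to its vertex. The safeguards of the published PAL/LABPAL (step clipping, large-batch reuse, etc.) are omitted here:

```python
import numpy as np

def pal_step(loss, grad, x, mu=0.1):
    """One parabolic-approximation line search step (simplified sketch).

    Fits l(t) ~ a*t**2 + b*t + c along the unit descent direction using
    l(0), l(mu), and the directional derivative l'(0), then steps to the
    parabola's minimum t = -b / (2a).
    """
    g = grad(x)
    d = -g / np.linalg.norm(g)           # unit descent direction
    c = loss(x)                          # l(0)
    b = float(g @ d)                     # directional derivative l'(0) < 0
    a = (loss(x + mu * d) - c - b * mu) / mu**2
    if a > 0:                            # parabola opens upward: use its vertex
        t = -b / (2 * a)
    else:                                # otherwise fall back to the trial step
        t = mu
    return x + t * d

# Toy quadratic: an exact parabola, so one step reaches the minimum.
loss = lambda x: 0.5 * float(x @ x)
grad = lambda x: x
x = np.array([3.0, -4.0])
x_new = pal_step(loss, grad, x)
print(np.round(x_new, 6))  # near the minimum [0, 0]
```

On a truly quadratic loss the parabolic fit is exact, which is why the step above lands at the minimum in one iteration; on real stochastic deep learning losses the fit is only a local approximation, hence the empirical focus of the dissertation.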
Given the empirical observations made, it is straightforward to comprehend why and how our optimization approaches work. This contrasts with the methodology of many optimization papers in this field, which build upon theoretical assumptions that are not empirically justified.
Consequently, a general contribution of this work is that it justifies and demonstrates the importance of empirical work in this rather theoretical field.