Efficient Structured Pruning and Architecture Searching for Group Convolution
Efficient inference of Convolutional Neural Networks is a thriving research
topic. When deploying a pre-trained model, it is desirable to achieve the
maximal test accuracy under given inference budget constraints. Network
pruning is a commonly used technique, but it may produce irregular sparse
models that can hardly gain actual speed-up. Group convolution is a promising
pruning target due to its regular structure; however, incorporating this
structure into the pruning procedure is challenging, because structural
constraints are hard to describe and can make pruning intractable to solve.
The need to configure the group convolution architecture, i.e., the number of
groups, so as to maximise test accuracy adds further difficulty.
This paper presents an efficient method to address this challenge. We
formulate group convolution pruning as finding the optimal channel permutation
that imposes the structural constraints, and solve it efficiently with
heuristics. We also apply local search to explore group configurations based
on estimated pruning cost, in order to maximise test accuracy. Compared to
prior work, results show that our method produces competitive group
convolution models for various tasks within a shorter pruning period and
enables rapid exploration of group configurations subject to inference budget
constraints.

Comment: Published as an ICCV'19 NEUARCH workshop paper.
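As a rough illustration of the channel-permutation idea (not the paper's actual algorithm), the sketch below greedily assigns the input channels of a flattened convolution weight to groups so that a block-diagonal group-convolution mask retains as much weight magnitude as possible. The function name, the contiguous output grouping, and the greedy assignment rule are all our own assumptions.

```python
import numpy as np

def group_conv_masks(weight, num_groups):
    """Greedy heuristic sketch: assign input channels to groups so that a
    block-structured (group-convolution) sparsity mask keeps as much weight
    magnitude as possible.  `weight` is (out_channels, in_channels), e.g. a
    1x1 conv kernel.  Illustrative only, not the paper's exact method."""
    out_c, in_c = weight.shape
    assert out_c % num_groups == 0 and in_c % num_groups == 0
    imp = np.abs(weight)

    # Fix a simple grouping of output channels: contiguous chunks.
    out_group = np.repeat(np.arange(num_groups), out_c // num_groups)

    # Total importance of each input channel towards each output group.
    gain = np.stack([imp[out_group == g].sum(axis=0)
                     for g in range(num_groups)])        # (groups, in_c)

    cap = in_c // num_groups
    in_group = np.full(in_c, -1, dtype=int)
    counts = np.zeros(num_groups, dtype=int)
    # Assign input channels greedily, most "decisive" channels first.
    for j in np.argsort(-gain.max(axis=0)):
        for g in np.argsort(-gain[:, j]):                # best group first
            if counts[g] < cap:
                in_group[j] = g
                counts[g] += 1
                break

    # Keep only within-group connections; after permuting channels by group
    # index, the retained weights form a block-diagonal matrix.
    mask = (out_group[:, None] == in_group[None, :]).astype(weight.dtype)
    return mask, out_group, in_group
```

Here `mask * weight` zeroes the cross-group weights, and `np.argsort(in_group)` gives the input-channel permutation under which the kept weights become block diagonal, i.e., a valid group convolution.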
Collegial Ensembles
Modern neural network performance typically improves as model size increases.
A recent line of research on the Neural Tangent Kernel (NTK) of
over-parameterized networks indicates that the improvement with size increase
is a product of a better-conditioned loss landscape. In this work, we
investigate a form of over-parameterization achieved through ensembling, where
we define collegial ensembles (CE) as the aggregation of multiple independent
models with identical architectures, trained as a single model. We show that
the optimization dynamics of CE simplify dramatically when the number of models
in the ensemble is large, resembling the dynamics of wide models, yet scale
much more favorably. We use recent theoretical results on the finite width
corrections of the NTK to perform efficient architecture search in a space of
finite width CE that aims to either minimize capacity, or maximize trainability
under a set of constraints. The resulting ensembles can be efficiently
implemented in practical architectures using group convolutions and block
diagonal layers. Finally, we show how our framework can be used to
analytically derive optimal group convolution modules that were originally
found using expensive grid searches, without having to train a single model.
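A minimal PyTorch sketch of how such an ensemble can be realised with group convolutions, under our own assumptions about depth, width, and the aggregation rule (the paper's exact architectures may differ): `groups=m` keeps the m branches block diagonal, and their logits are averaged while the whole ensemble is trained as one model.

```python
import torch.nn as nn

class CollegialEnsemble(nn.Module):
    """Sketch of a collegial ensemble: m identical convolutional branches of
    width w, kept independent via grouped convolutions and aggregated by
    averaging their logits.  Illustrative assumptions throughout."""
    def __init__(self, in_channels, num_classes, m=4, w=32, depth=3):
        super().__init__()
        self.m, self.num_classes = m, num_classes
        layers = [nn.Conv2d(in_channels, m * w, 3, padding=1), nn.ReLU()]
        for _ in range(depth - 1):
            # groups=m gives a block-diagonal weight: the m branches never
            # mix, so this is m separate models stored in one tensor.
            layers += [nn.Conv2d(m * w, m * w, 3, padding=1, groups=m),
                       nn.ReLU()]
        self.features = nn.Sequential(*layers)
        self.heads = nn.Conv2d(m * w, m * num_classes, 1, groups=m)
        self.pool = nn.AdaptiveAvgPool2d(1)

    def forward(self, x):
        z = self.pool(self.heads(self.features(x)))      # (B, m*C, 1, 1)
        logits = z.flatten(1).view(-1, self.m, self.num_classes)
        # Average the m branch predictions: the "collegial" aggregation.
        return logits.mean(dim=1)
```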
AutoML: A Survey of the State-of-the-Art
Deep learning (DL) techniques have penetrated all aspects of our lives and
brought us great convenience. However, building a high-quality DL system for a
specific task highly relies on human expertise, hindering the applications of
DL to more areas. Automated machine learning (AutoML) has become a promising
solution for building DL systems without human assistance, and a growing
number of researchers are focusing on it. In this paper, we provide a comprehensive and
up-to-date review of the state-of-the-art (SOTA) in AutoML. First, we introduce
AutoML methods according to the pipeline, covering data preparation, feature
engineering, hyperparameter optimization, and neural architecture search (NAS).
We focus more on NAS, as it is currently a very active sub-topic of AutoML. We
summarize the performance of representative NAS algorithms on the CIFAR-10
and ImageNet datasets, and further discuss several directions worth studying for
NAS methods: one/two-stage NAS, one-shot NAS, and joint hyperparameter and
architecture optimization. Finally, we discuss some open problems of the
existing AutoML methods for future research.

Comment: automated machine learning (AutoML); submitted to Knowledge-Based
Systems for review.
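As a toy illustration of the search loop underlying many of the methods such surveys cover, the sketch below runs random search over a joint hyperparameter/architecture space. The search space and the `train_and_evaluate` stub are placeholders of our own, not any specific surveyed algorithm.

```python
import random

# Illustrative joint hyperparameter/architecture search space.
SEARCH_SPACE = {
    "depth": [8, 14, 20],
    "width": [16, 32, 64],
    "learning_rate": [1e-1, 1e-2, 1e-3],
    "optimizer": ["sgd", "adam"],
}

def train_and_evaluate(config):
    """Placeholder: a real AutoML system would train a model with `config`
    (e.g. on CIFAR-10) and return its validation accuracy.  Here we return
    a deterministic dummy score instead."""
    return random.Random(str(config)).random()

def random_search(num_trials=20):
    """Sample configurations at random, evaluate each, keep the best."""
    best_config, best_acc = None, -1.0
    for _ in range(num_trials):
        config = {k: random.choice(v) for k, v in SEARCH_SPACE.items()}
        acc = train_and_evaluate(config)
        if acc > best_acc:
            best_config, best_acc = config, acc
    return best_config, best_acc

if __name__ == "__main__":
    config, acc = random_search()
    print(f"best config: {config}  (val acc {acc:.3f})")
```

One-shot and one/two-stage NAS methods replace this per-trial training with weight sharing or proxy estimates, which is precisely what makes them cheaper than naive search loops like this one.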