Interleaved Group Convolutions for Deep Neural Networks
In this paper, we present a simple and modularized neural network
architecture, named interleaved group convolutional neural networks (IGCNets).
The main point lies in a novel building block, a pair of successive
interleaved group convolutions: primary group convolution and secondary group
convolution. The two group convolutions are complementary: (i) the convolution
on each partition in primary group convolution is a spatial convolution, while
on each partition in secondary group convolution, the convolution is a
point-wise convolution; (ii) the channels in the same secondary partition come
from different primary partitions. We discuss one representative advantage: the block is wider than a regular convolution while the number of parameters and the computation complexity are preserved. We also show that regular convolutions, group
convolution with summation fusion, and the Xception block are special cases of
interleaved group convolutions. Empirical results over standard benchmarks,
CIFAR-10, CIFAR-100, SVHN and ImageNet demonstrate that our networks are more efficient in parameter usage and computation complexity, with similar or higher accuracy.
Comment: To appear in ICCV 2017
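To make the block concrete, here is a minimal PyTorch sketch of the pair of interleaved group convolutions described above: a spatial (3x3) primary group convolution, an interleaving channel permutation, and a point-wise (1x1) secondary group convolution. The channel and partition counts are illustrative, not the configurations used in the paper.

```python
import torch.nn as nn

class InterleavedGroupConvBlock(nn.Module):
    """Sketch of an IGC building block: primary (spatial) group convolution,
    channel interleaving, then secondary (point-wise) group convolution."""

    def __init__(self, channels=64, primary_partitions=4):
        super().__init__()
        assert channels % primary_partitions == 0
        self.L = primary_partitions              # number of primary partitions
        self.M = channels // primary_partitions  # channels per primary partition
        # Primary group convolution: a 3x3 spatial convolution on each partition.
        self.primary = nn.Conv2d(channels, channels, 3, padding=1,
                                 groups=self.L, bias=False)
        # Secondary group convolution: a 1x1 point-wise convolution on each of
        # the M secondary partitions, whose channels come from different
        # primary partitions after the interleaving permutation.
        self.secondary = nn.Conv2d(channels, channels, 1,
                                   groups=self.M, bias=False)

    def interleave(self, x):
        # Reorder channels so each secondary partition takes one channel from
        # every primary partition.
        n, c, h, w = x.shape
        return (x.view(n, self.L, self.M, h, w)
                 .transpose(1, 2).reshape(n, c, h, w))

    def forward(self, x):
        return self.secondary(self.interleave(self.primary(x)))
```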
IGCV2: Interleaved Structured Sparse Convolutional Neural Networks
In this paper, we study the problem of designing efficient convolutional
neural network architectures with the interest in eliminating the redundancy in
convolution kernels. In addition to structured sparse kernels, low-rank kernels
and the product of low-rank kernels, the product of structured sparse kernels,
which is a framework for interpreting the recently-developed interleaved group
convolutions (IGC) and its variants (e.g., Xception), has been attracting
increasing interest.
Motivated by the observation that the convolutions contained in a group
convolution in IGC can be further decomposed in the same manner, we present a
modularized building block, IGCV2: interleaved structured sparse
convolutions. It generalizes interleaved group convolutions, which is composed
of two structured sparse kernels, to the product of more structured sparse
kernels, further eliminating the redundancy. We present the complementary
condition and the balance condition to guide the design of structured sparse
kernels, obtaining a balance among three aspects: model size, computation
complexity and classification accuracy. Experimental results demonstrate the
advantage on the balance among these three aspects compared to interleaved
group convolutions and Xception, and competitive performance compared to other
state-of-the-art architecture design methods.
Comment: Accepted by CVPR 2018
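The generalization can be sketched as a stack of group convolutions separated by interleaving permutations, so that the product of the sparse kernels is dense. This is only a hypothetical composition; the group counts below are illustrative and the paper's complementary and balance conditions determine the actual choices.

```python
import torch.nn as nn

class ChannelInterleave(nn.Module):
    """Permute channels so that each new partition takes one channel from
    every partition of the preceding group convolution."""
    def __init__(self, groups):
        super().__init__()
        self.groups = groups

    def forward(self, x):
        n, c, h, w = x.shape
        g = self.groups
        return x.view(n, g, c // g, h, w).transpose(1, 2).reshape(n, c, h, w)

def igcv2_like_stack(channels=64, spatial_groups=64, pointwise_groups=(8, 8)):
    """Hypothetical IGCV2-style composition: one spatial group convolution
    followed by several point-wise group convolutions, each preceded by an
    interleaving permutation over the previous layer's partitions."""
    layers = [nn.Conv2d(channels, channels, 3, padding=1,
                        groups=spatial_groups, bias=False)]
    prev_groups = spatial_groups
    for g in pointwise_groups:
        layers += [ChannelInterleave(prev_groups),
                   nn.Conv2d(channels, channels, 1, groups=g, bias=False)]
        prev_groups = g
    return nn.Sequential(*layers)
```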
Selective Kernel Networks
In standard Convolutional Neural Networks (CNNs), the receptive fields of
artificial neurons in each layer are designed to share the same size. It is
well-known in the neuroscience community that the receptive field size of
visual cortical neurons is modulated by the stimulus, which has rarely been
considered in constructing CNNs. We propose a dynamic selection mechanism in
CNNs that allows each neuron to adaptively adjust its receptive field size
based on multiple scales of input information. A building block called
Selective Kernel (SK) unit is designed, in which multiple branches with
different kernel sizes are fused using softmax attention that is guided by the
information in these branches. Different attentions on these branches yield
different sizes of the effective receptive fields of neurons in the fusion
layer. Multiple SK units are stacked into a deep network termed Selective Kernel
Networks (SKNets). On the ImageNet and CIFAR benchmarks, we empirically show
that SKNet outperforms the existing state-of-the-art architectures with lower
model complexity. Detailed analyses show that the neurons in SKNet can capture
target objects with different scales, which verifies the capability of neurons
for adaptively adjusting their receptive field sizes according to the input.
The code and models are available at https://github.com/implus/SKNet.
Comment: CVPR 2019
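A minimal PyTorch sketch of an SK unit may help: two branches with different kernel sizes (the 5x5 branch realised as a dilated 3x3), fused by a softmax attention over branches computed from the globally pooled sum of the branch outputs. The branch count and reduction ratio are illustrative, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class SKUnit(nn.Module):
    """Sketch of a Selective Kernel unit with two branches and softmax
    attention across branches."""

    def __init__(self, channels=64, reduction=4):
        super().__init__()
        self.branch3 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        # 5x5 receptive field realised as a dilated 3x3 convolution.
        self.branch5 = nn.Conv2d(channels, channels, 3, padding=2,
                                 dilation=2, bias=False)
        hidden = max(channels // reduction, 8)
        self.fc = nn.Sequential(nn.Linear(channels, hidden), nn.ReLU(inplace=True))
        self.select = nn.Linear(hidden, channels * 2)  # one logit set per branch

    def forward(self, x):
        u3, u5 = self.branch3(x), self.branch5(x)
        s = (u3 + u5).mean(dim=(2, 3))                 # global average pooling
        z = self.fc(s)                                 # compact descriptor
        logits = self.select(z).view(-1, 2, u3.size(1))
        attn = torch.softmax(logits, dim=1)            # softmax across branches
        a3 = attn[:, 0].unsqueeze(-1).unsqueeze(-1)
        a5 = attn[:, 1].unsqueeze(-1).unsqueeze(-1)
        return a3 * u3 + a5 * u5                       # attention-weighted fusion
```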
Seesaw-Net: Convolution Neural Network With Uneven Group Convolution
In this paper, we are interested in boosting the representation capability of
convolutional neural networks that utilize the inverted residual structure. Based on the success of the inverted residual structure [Sandler et al. 2018] and interleaved low-rank group convolutions [Sun et al. 2018], we rethink these two patterns of network structure. Rather than relying on NAS (neural architecture search) methods [Zoph and Le 2017; Pham et al. 2018; Liu et al. 2018b], we introduce uneven point-wise group convolution, which provides a novel search space for designing basic blocks with a better trade-off between representation capability and computational cost. Meanwhile, we propose two novel information flow patterns that enable cross-group information flow across multiple group convolution layers, with and without a channel permute/shuffle operation. Extensive experiments on image classification show that our proposed model, named Seesaw-Net, achieves state-of-the-art (SOTA) performance with limited computation and memory cost. Our code will be open-sourced and made available together with pre-trained models.
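The uneven point-wise group convolution can be sketched as follows: the channels are split into groups of unequal width and each group is processed by its own 1x1 convolution. The split ratios below are hypothetical examples of the kind of search space the abstract describes.

```python
import torch
import torch.nn as nn

class UnevenPointwiseGroupConv(nn.Module):
    """Sketch of an uneven point-wise group convolution: input channels are
    split into groups of unequal width, each handled by its own 1x1 conv."""

    def __init__(self, in_channels=64, out_channels=64, split=(0.5, 0.25, 0.25)):
        super().__init__()
        in_sizes = [int(in_channels * r) for r in split]
        out_sizes = [int(out_channels * r) for r in split]
        in_sizes[-1] = in_channels - sum(in_sizes[:-1])     # absorb rounding
        out_sizes[-1] = out_channels - sum(out_sizes[:-1])
        self.in_sizes = in_sizes
        self.convs = nn.ModuleList(
            nn.Conv2d(ci, co, 1, bias=False) for ci, co in zip(in_sizes, out_sizes))

    def forward(self, x):
        chunks = torch.split(x, self.in_sizes, dim=1)
        return torch.cat([conv(c) for conv, c in zip(self.convs, chunks)], dim=1)
```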
Fully Learnable Group Convolution for Acceleration of Deep Neural Networks
Benefiting from its great success on many tasks, deep learning is increasingly used on low-computational-cost devices, e.g., smartphones, embedded
devices, etc. To reduce the high computational and memory cost, in this work,
we propose a fully learnable group convolution module (FLGC for short) which is
quite efficient and can be embedded into any deep neural networks for
acceleration. Specifically, our proposed method automatically learns the group
structure in the training stage in a fully end-to-end manner, leading to a
better structure than the existing pre-defined, two-step, or iterative strategies. Moreover, our method can be further combined with depthwise separable convolution, resulting in a 5x acceleration over the vanilla ResNet50 on a single CPU. An additional advantage is that in our FLGC the number of groups can be set to any value, rather than being restricted to 2^k as in most existing methods, enabling a better trade-off between accuracy and speed. As evaluated in
our experiments, our method achieves better performance than existing learnable
group convolution and standard group convolution when using the same number of
groups.
Comment: Accepted by CVPR 2019
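As a loose illustration of learning group structure end-to-end (not the paper's exact formulation, which learns binarized selection matrices), one can mask a full 1x1 convolution weight with a learned soft assignment of filters and input channels to groups:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SoftLearnableGroupConv(nn.Module):
    """Simplified sketch of a learnable-group 1x1 convolution: a full weight
    matrix is masked by a learned soft assignment of filters and input
    channels to groups. The soft relaxation is only for illustration."""

    def __init__(self, in_channels=64, out_channels=64, groups=4):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_channels, in_channels) * 0.01)
        # Learnable logits assigning each filter / input channel to a group.
        self.filter_logits = nn.Parameter(torch.zeros(out_channels, groups))
        self.channel_logits = nn.Parameter(torch.zeros(in_channels, groups))

    def forward(self, x):
        f = torch.softmax(self.filter_logits, dim=1)    # (out, G)
        c = torch.softmax(self.channel_logits, dim=1)   # (in, G)
        mask = f @ c.t()                                # (out, in) same-group affinity
        w = (self.weight * mask).view(*self.weight.shape, 1, 1)
        return F.conv2d(x, w)                           # masked 1x1 convolution
```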
Machine Learning with Clos Networks
We present a new methodology for improving the accuracy of small neural
networks by applying the concept of a Clos network to achieve maximum expression in a smaller network. We explore the design space to show that more layers are beneficial, given the same number of parameters. We also present findings on how the ReLU nonlinearity affects accuracy in separable networks. We present results from early work on the CIFAR-10 dataset.
Looking Fast and Slow: Memory-Guided Mobile Video Object Detection
With a single eye fixation lasting a fraction of a second, the human visual
system is capable of forming a rich representation of a complex environment,
reaching a holistic understanding which facilitates object recognition and
detection. This phenomenon is known as recognizing the "gist" of the scene and
is accomplished by relying on relevant prior knowledge. This paper addresses
the analogous question of whether using memory in computer vision systems can
not only improve the accuracy of object detection in video streams, but also
reduce the computation time. By interleaving conventional feature extractors
with extremely lightweight ones which only need to recognize the gist of the
scene, we show that minimal computation is required to produce accurate
detections when temporal memory is present. In addition, we show that the
memory contains enough information for deploying reinforcement learning
algorithms to learn an adaptive inference policy. Our model achieves
state-of-the-art performance among mobile methods on the ImageNet VID 2015 dataset, while running at speeds of up to 70+ FPS on a Pixel 3 phone.
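Schematically, the interleaving idea reduces to a simple loop: a heavy feature extractor runs only occasionally while a lightweight, gist-level extractor handles the remaining frames, with a recurrent memory fusing both streams. All names below are placeholders, and the fixed interval stands in for the learned adaptive policy described above.

```python
def interleaved_video_features(frames, heavy_extractor, light_extractor,
                               memory_cell, interval=10):
    """Run the heavy extractor every `interval` frames and the lightweight
    extractor otherwise; a recurrent memory carries information across frames.
    Placeholder interfaces: each extractor maps a frame to a feature map, and
    memory_cell(features, state) returns (fused_features, new_state)."""
    state = None
    outputs = []
    for t, frame in enumerate(frames):
        features = (heavy_extractor(frame) if t % interval == 0
                    else light_extractor(frame))
        fused, state = memory_cell(features, state)
        outputs.append(fused)
    return outputs
```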
VarGNet: Variable Group Convolutional Neural Network for Efficient Embedded Computing
In this paper, we propose a novel network design mechanism for efficient
embedded computing. Inspired by the limited computing patterns, we propose to
fix the number of channels in a group convolution, instead of the existing practice of fixing the total number of groups. The resulting network, named Variable Group Convolutional Network (VarGNet), can be more easily optimized on the hardware side, due to the more unified computing schemes across layers.
Extensive experiments on various vision tasks, including classification,
detection, pixel-wise parsing and face recognition, have demonstrated the
practical value of our VarGNet.
Comment: Technical report
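The core design choice translates into a small helper: fix the number of channels handled by each group and derive the group count from the layer width. The value of channels_per_group below is illustrative, not necessarily the paper's setting.

```python
import torch.nn as nn

def variable_group_conv(in_channels, out_channels, kernel_size=3,
                        channels_per_group=8):
    """Group convolution with a fixed number of channels per group, so the
    number of groups varies with layer width instead of being fixed."""
    assert in_channels % channels_per_group == 0
    groups = in_channels // channels_per_group
    assert out_channels % groups == 0
    return nn.Conv2d(in_channels, out_channels, kernel_size,
                     padding=kernel_size // 2, groups=groups, bias=False)
```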
Deep Scale-spaces: Equivariance Over Scale
We introduce deep scale-spaces (DSS), a generalization of convolutional
neural networks, exploiting the scale symmetry structure of conventional image
recognition tasks. Put plainly, the class of an image is invariant to the scale
at which it is viewed. We construct scale equivariant cross-correlations based
on a principled extension of convolutions, grounded in the theory of
scale-spaces and semigroups. As a very basic operation, these
cross-correlations can be used in almost any modern deep learning architecture
in a plug-and-play manner. We demonstrate our networks on the Patch Camelyon
and Cityscapes datasets, to prove their utility and perform introspective
studies to further understand their properties.
IGCV3: Interleaved Low-Rank Group Convolutions for Efficient Deep Neural Networks
In this paper, we are interested in building lightweight and efficient
convolutional neural networks. Inspired by the success of two design patterns,
composition of structured sparse kernels, e.g., interleaved group convolutions
(IGC), and composition of low-rank kernels, e.g., bottle-neck modules, we study
the combination of such two design patterns, using the composition of
structured sparse low-rank kernels, to form a convolutional kernel. Rather than
introducing a complementary condition over channels, we introduce a loose
complementary condition, which is formulated by imposing the complementary
condition over super-channels, to guide the design for generating a dense
convolutional kernel. The resulting network is called IGCV3. We empirically
demonstrate that the combination of low-rank and sparse kernels boosts performance, and that our proposed approach is superior to the state-of-the-art IGCV2 and MobileNetV2 on image classification on CIFAR and ImageNet and on object detection on COCO.
Comment: 10 pages, 2 figures, accepted by BMVC 2018
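A rough PyTorch sketch of an IGCV3-style block ties the two patterns together: a group-wise 1x1 expansion (low-rank, since each per-group kernel is non-square), a channel permutation, a 3x3 depthwise convolution, another permutation, and a group-wise 1x1 projection, wrapped in a residual connection. The group count and expansion ratio are illustrative, and the loose complementary condition over super-channels is not reproduced here.

```python
import torch.nn as nn

class IGCV3LikeBlock(nn.Module):
    """Sketch of an interleaved low-rank group convolution block: group-wise
    1x1 expansion, permutation, 3x3 depthwise, permutation, group-wise 1x1
    projection, plus a residual connection."""

    def __init__(self, channels=32, expand=6, groups=2):
        super().__init__()
        hidden = channels * expand
        self.groups = groups
        self.expand = nn.Conv2d(channels, hidden, 1, groups=groups, bias=False)
        self.depthwise = nn.Conv2d(hidden, hidden, 3, padding=1,
                                   groups=hidden, bias=False)
        self.project = nn.Conv2d(hidden, channels, 1, groups=groups, bias=False)

    def permute(self, x):
        # Interleave channels across the point-wise group partitions.
        n, c, h, w = x.shape
        g = self.groups
        return x.view(n, g, c // g, h, w).transpose(1, 2).reshape(n, c, h, w)

    def forward(self, x):
        out = self.permute(self.expand(x))
        out = self.permute(self.depthwise(out))
        out = self.project(out)
        return out + x    # residual connection, as in inverted residual blocks
```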