Multi-Scale Dense Networks for Resource Efficient Image Classification
In this paper we investigate image classification with computational resource
limits at test time. Two such settings are: 1. anytime classification, where
the network's prediction for a test example is progressively updated,
facilitating the output of a prediction at any time; and 2. budgeted batch
classification, where a fixed amount of computation is available to classify a
set of examples that can be spent unevenly across "easier" and "harder" inputs.
In contrast to most prior work, such as the popular Viola and Jones algorithm,
our approach is based on convolutional neural networks. We train multiple
classifiers with varying resource demands, which we adaptively apply during
test time. To maximally re-use computation between the classifiers, we
incorporate them as early-exits into a single deep convolutional neural network
and inter-connect them with dense connectivity. To facilitate high quality
classification early on, we use a two-dimensional multi-scale network
architecture that maintains coarse- and fine-level features throughout the
network. Experiments on three image-classification tasks demonstrate that our
framework substantially improves the existing state-of-the-art in both
settings.
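The two settings admit a compact sketch. The exit structure, confidence rule, and per-exit cost model below are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def anytime_predict(exit_logits, budget):
    """Anytime setting: return the prediction of the deepest exit affordable
    within `budget`, assuming exit k costs k+1 compute units."""
    k = min(len(exit_logits), budget) - 1
    return int(np.argmax(exit_logits[k]))

def budgeted_predict(exit_logits, threshold):
    """Budgeted-batch setting: walk the early-exits in order and stop at the
    first whose max softmax probability clears `threshold`, so "easy" inputs
    spend less compute than "hard" ones."""
    for k, z in enumerate(exit_logits):
        p = softmax(z)
        if p.max() >= threshold:
            return int(np.argmax(p)), k
    return int(np.argmax(exit_logits[-1])), len(exit_logits) - 1
```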
Resolution Adaptive Networks for Efficient Inference
Adaptive inference is an effective mechanism to achieve a dynamic tradeoff
between accuracy and computational cost in deep networks. Existing works mainly
exploit architecture redundancy in network depth or width. In this paper, we
focus on spatial redundancy of input samples and propose a novel Resolution
Adaptive Network (RANet), which is inspired by the intuition that
low-resolution representations are sufficient for classifying "easy" inputs
containing large objects with prototypical features, while only some "hard"
samples need spatially detailed information. In RANet, the input images are
first routed to a lightweight sub-network that efficiently extracts
low-resolution representations, and those samples with high prediction
confidence will exit early from the network without being further processed.
Meanwhile, high-resolution paths in the network maintain the capability to
recognize the "hard" samples. Therefore, RANet can effectively reduce the
spatial redundancy involved in inferring high-resolution inputs. Empirically,
we demonstrate the effectiveness of the proposed RANet on the CIFAR-10,
CIFAR-100 and ImageNet datasets in both the anytime prediction setting and the
budgeted batch classification setting.
Comment: CVPR 202
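A minimal sketch of the routing logic (the confidence threshold and the relative costs of the low- and high-resolution paths are made-up numbers, not values from the paper):

```python
import numpy as np

def route_by_confidence(confidences, threshold=0.9):
    """Split a batch: samples whose low-resolution prediction confidence
    clears `threshold` exit early; the rest go to the high-resolution path."""
    conf = np.asarray(confidences)
    easy = np.flatnonzero(conf >= threshold)
    hard = np.flatnonzero(conf < threshold)
    return easy, hard

def batch_cost(confidences, threshold=0.9, low=1.0, high=4.0):
    """Total compute: every sample pays for the cheap sub-network, and only
    the "hard" samples additionally pay for the high-resolution path."""
    _, hard = route_by_confidence(confidences, threshold)
    return len(confidences) * low + len(hard) * high
```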
DSOD: Learning Deeply Supervised Object Detectors from Scratch
We present Deeply Supervised Object Detector (DSOD), a framework that can
learn object detectors from scratch. State-of-the-art object detectors rely
heavily on off-the-shelf networks pre-trained on large-scale classification
datasets like ImageNet, which incurs learning bias due to the differences in
both the loss functions and the category distributions between classification
and detection tasks. Model fine-tuning for the detection task could alleviate
this bias to some extent but not fundamentally. Besides, transferring
pre-trained models from classification to detection between discrepant domains
is even more difficult (e.g. RGB to depth images). A better solution to tackle
these two critical problems is to train object detectors from scratch, which
motivates our proposed DSOD. Previous efforts in this direction mostly failed
due to much more complicated loss functions and limited training data in object
detection. In DSOD, we contribute a set of design principles for training
object detectors from scratch. One of the key findings is that deep
supervision, enabled by dense layer-wise connections, plays a critical role in
learning a good detector. Combining with several other principles, we develop
DSOD following the single-shot detection (SSD) framework. Experiments on PASCAL
VOC 2007, 2012 and MS COCO datasets demonstrate that DSOD can achieve better
results than the state-of-the-art solutions with much more compact models. For
instance, DSOD outperforms SSD on all three benchmarks with real-time detection
speed, while requiring only 1/2 the parameters of SSD and 1/10 the parameters
of Faster RCNN. Our code and models are available at: https://github.com/szq0214/DSOD .
Comment: ICCV 2017.
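The dense layer-wise connections behind deep supervision can be sketched on flat feature vectors. This is a toy illustration with random placeholder weights; the real DSOD operates on convolutional feature maps:

```python
import numpy as np

def dense_block(x, num_layers, growth, rng):
    """Toy dense block: each layer sees the concatenation of the block input
    and every earlier layer's output, and appends `growth` new features."""
    features = [x]
    for _ in range(num_layers):
        inp = np.concatenate(features)
        w = rng.standard_normal((growth, inp.size))  # placeholder weights
        features.append(np.maximum(w @ inp, 0.0))    # linear + ReLU
    return np.concatenate(features)
```

Because every layer is connected to the final output through the concatenation, gradient signal reaches early layers directly, which is the "deep supervision" effect.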
Seesaw-Net: Convolution Neural Network With Uneven Group Convolution
In this paper, we are interested in boosting the representation capability of
convolutional neural networks that utilize the inverted residual structure.
Building on the success of the Inverted Residual structure [Sandler et al.
2018] and Interleaved Low-Rank Group Convolutions [Sun et al. 2018], we rethink
these two patterns of network structure. Rather than using NAS (Neural
Architecture Search) methods [Zoph and Le 2017; Pham et al. 2018; Liu et al.
2018b], we introduce uneven point-wise group convolution, which provides a
novel search space for designing basic blocks with a better trade-off between
representation capability and computational cost. Meanwhile, we propose two
novel information flow patterns that enable cross-group information flow across
multiple group convolution layers, with and without channel permute/shuffle
operations. Extensive experiments on image classification tasks show that our
proposed model, named Seesaw-Net, achieves state-of-the-art (SOTA) performance
with limited computation and memory cost. Our code will be open-sourced and
made available together with pre-trained models.
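The two ingredients can be sketched on toy tensors: a point-wise group convolution over an uneven channel split, and a channel shuffle for cross-group information flow. Shapes and weights below are arbitrary illustrations, not the Seesaw-Net configuration:

```python
import numpy as np

def uneven_group_pointwise(x, splits, weights):
    """Point-wise (1x1) group convolution with an uneven channel split.
    x: (C, H, W); splits: per-group channel counts summing to C;
    weights: one (out_c, in_c) matrix per group."""
    outs, start = [], 0
    for c, w in zip(splits, weights):
        chunk = x[start:start + c]                     # (c, H, W)
        outs.append(np.einsum('oc,chw->ohw', w, chunk))
        start += c
    return np.concatenate(outs, axis=0)

def channel_shuffle(x, groups):
    """Interleave channels across (even) groups so the next group
    convolution sees features from every group."""
    c, h, w = x.shape
    return x.reshape(groups, c // groups, h, w).transpose(1, 0, 2, 3).reshape(c, h, w)
```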
Model Slicing for Supporting Complex Analytics with Elastic Inference Cost and Resource Constraints
Deep learning models have been used to support analytics beyond simple
aggregation, where deeper and wider models have been shown to yield great
results. These models consume a huge amount of memory and computational
operations. However, most large-scale industrial applications are
computational-budget constrained. In practice, the peak workload of an
inference service can be 10x higher than the average case, with the presence of
unpredictable extreme cases. Lots of computational resources could be wasted
during off-peak hours and the system may crash when the workload exceeds system
capacity. How to support deep learning services with dynamic workload
cost-efficiently remains a challenging problem. In this paper, we address the
challenge with a general and novel training scheme called model slicing, which
enables deep learning models to provide predictions within the prescribed
computational resource budget dynamically. Model slicing could be viewed as an
elastic computation solution without requiring more computational resources.
Succinctly, each layer in the model is divided into groups of contiguous blocks
of basic components (i.e., neurons in dense layers and channels in
convolutional layers), and a partial order is then introduced over these groups
by enforcing that the groups participating in each forward pass always run from
the first group to a dynamically determined rightmost group. Trained by
dynamically indexing the rightmost group with a single parameter, the slice
rate, the network is encouraged to build up group-wise, residual
representations. Then, during inference, a sub-model with fewer groups can be
readily deployed for efficiency, whose computation is roughly quadratic in the
width controlled by the slice rate. Extensive experiments show that models
trained with model slicing can effectively support on-demand workloads with
elastic inference cost.
Comment: 14 pages, 8 figures. arXiv admin note: text overlap with
arXiv:1706.02093 by other author
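The contiguous-group indexing, and the roughly quadratic cost in the slice rate, can be sketched on a single dense layer (the layer sizes are illustrative, not from the paper):

```python
import numpy as np

def sliced_dense(x, W, b, slice_rate):
    """Dense layer under model slicing: only the leftmost ceil(r * out)
    units run, and they read only the leftmost ceil(r * in) inputs."""
    n_in = int(np.ceil(slice_rate * W.shape[1]))
    n_out = int(np.ceil(slice_rate * W.shape[0]))
    return W[:n_out, :n_in] @ x[:n_in] + b[:n_out]

def sliced_macs(n_out, n_in, slice_rate):
    """Multiply-accumulates of the sliced layer: ~quadratic in the rate."""
    return int(np.ceil(slice_rate * n_out)) * int(np.ceil(slice_rate * n_in))
```

At slice rate 0.5, an 8x8 layer runs 16 of its 64 multiply-accumulates, a quarter of the full cost, matching the quadratic scaling the abstract describes.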
Object Detection from Scratch with Deep Supervision
We propose Deeply Supervised Object Detectors (DSOD), an object detection
framework that can be trained from scratch. Recent advances in object detection
heavily depend on the off-the-shelf models pre-trained on large-scale
classification datasets like ImageNet and OpenImage. However, one problem is
that adopting pre-trained models from classification for the detection task may
incur learning bias due to the different objective functions and diverse
distributions of object categories. Techniques like fine-tuning on the
detection task can alleviate this issue to some extent but do not resolve it
fundamentally.
Furthermore, transferring these pre-trained models across discrepant domains
will be more difficult (e.g., from RGB to depth images). Thus, a better
solution to handle these critical problems is to train object detectors from
scratch, which motivates our proposed method. Previous efforts in this
direction mostly failed because of limited training data and naive backbone
network structures for object detection. In DSOD, we contribute a set of design
principles for learning object detectors from scratch. One of the key
principles is that deep supervision, enabled by layer-wise dense connections in
both the backbone network and the prediction layers, plays a critical role in
learning good detectors from scratch. After incorporating several other
principles, we build
our DSOD based on the single-shot detection framework (SSD). We evaluate our
method on PASCAL VOC 2007, 2012 and COCO datasets. DSOD achieves consistently
better results than the state-of-the-art methods with much more compact models.
Specifically, DSOD outperforms the baseline method SSD on all three benchmarks,
while requiring only 1/2 the parameters. We also observe that DSOD can achieve
comparable or slightly better results than Mask RCNN + FPN (under similar input
size) with only 1/3 the parameters, using no extra data or pre-trained models.
Comment: More results and analysis in this version. This is an extension of
our previous conference paper: arXiv:1708.0124
Training CNNs with Selective Allocation of Channels
Recent progress in deep convolutional neural networks (CNNs) has enabled a
simple paradigm of architecture design: larger models typically achieve better
accuracy. Because of this, in modern CNN architectures it becomes more
important to design models that generalize well under certain resource
constraints, e.g., the number of parameters. In this paper, we propose a simple
way to improve the
capacity of any CNN model having large-scale features, without adding more
parameters. In particular, we modify a standard convolutional layer to have a
new functionality of channel-selectivity, so that the layer is trained to
select important channels to re-distribute their parameters. Our experimental
results under various CNN architectures and datasets demonstrate that the
proposed new convolutional layer allows new optima that generalize better via
efficient resource utilization, compared to the baseline.
Comment: 15 pages; Accepted to ICML 201
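One way to picture channel-selectivity is a gate per channel: normalise the gates, keep the top-k, and suppress the rest so parameters concentrate on the selected channels. The gating form here is a guess for illustration, not the paper's exact layer:

```python
import numpy as np

def select_channels(x, gates, k):
    """Keep the k channels with the largest softmax-normalised gate values
    and zero out the rest; x: (C, H, W), gates: (C,)."""
    g = np.exp(gates - gates.max())
    g = g / g.sum()
    keep = np.argsort(g)[-k:]
    mask = np.zeros_like(g)
    mask[keep] = 1.0
    return x * (g * mask)[:, None, None]
```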
Sample Dropout for Audio Scene Classification Using Multi-Scale Dense Connected Convolutional Neural Network
Acoustic scene classification is an intricate problem for a machine. As an
emerging field of research, deep Convolutional Neural Networks (CNN) achieve
convincing results. In this paper, we explore the use of multi-scale Dense
connected convolutional neural network (DenseNet) for the classification task,
with the goal to improve the classification performance as multi-scale features
can be extracted from the time-frequency representation of the audio signal. On
the other hand, most previous CNN-based audio scene classification
approaches aim to improve classification accuracy by employing different
regularization techniques, such as the dropout of hidden units and data
augmentation, to reduce overfitting. It is widely known that outliers in the
training set have a high negative influence on the trained model, and culling
the outliers may improve the classification performance, while it is often
under-explored in previous studies. In this paper, inspired by silence
removal in speech signal processing, a novel sample dropout approach is
proposed, which aims to remove outliers from the training dataset. Using the
DCASE 2017 audio scene classification datasets, the experimental results
demonstrate that the proposed multi-scale DenseNet provides superior
performance compared to the traditional single-scale DenseNet, while the sample
dropout method further improves the classification robustness of the
multi-scale DenseNet.
Comment: Accepted to 2018 Pacific Rim Knowledge Acquisition Workshop (PKAW
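The culling idea can be sketched as dropping the highest-loss training samples; the percentile criterion below is an illustrative stand-in for whatever outlier score the method actually uses:

```python
import numpy as np

def sample_dropout(samples, losses, percentile=95):
    """Cull training samples whose loss exceeds the given percentile,
    treating them as outliers (analogous to removing silence frames)."""
    cut = np.percentile(losses, percentile)
    keep = np.flatnonzero(np.asarray(losses) <= cut)
    return [samples[i] for i in keep]
```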
NAS-FPN: Learning Scalable Feature Pyramid Architecture for Object Detection
Current state-of-the-art convolutional architectures for object detection are
manually designed. Here we aim to learn a better architecture of feature
pyramid network for object detection. We adopt Neural Architecture Search and
discover a new feature pyramid architecture in a novel scalable search space
covering all cross-scale connections. The discovered architecture, named
NAS-FPN, consists of a combination of top-down and bottom-up connections to
fuse features across scales. NAS-FPN, combined with various backbone models in
the RetinaNet framework, achieves better accuracy and latency tradeoff compared
to state-of-the-art object detection models. NAS-FPN improves mobile detection
accuracy by 2 AP compared to state-of-the-art SSDLite with MobileNetV2 model in
[32] and achieves 48.3 AP which surpasses Mask R-CNN [10] detection accuracy
with less computation time.
Comment: Accepted at CVPR 201
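The cross-scale connections NAS-FPN searches over are built from binary merging cells; a minimal "sum" cell, using nearest-neighbour resizing and toy shapes (not the searched architecture itself), looks like:

```python
import numpy as np

def resize_nn(x, out_hw):
    """Nearest-neighbour resize of a (C, H, W) feature map."""
    _, h, w = x.shape
    h_idx = np.arange(out_hw[0]) * h // out_hw[0]
    w_idx = np.arange(out_hw[1]) * w // out_hw[1]
    return x[:, h_idx][:, :, w_idx]

def sum_merge_cell(a, b, out_hw):
    """Fuse two feature maps from different scales by resizing both to the
    output resolution and summing."""
    return resize_nn(a, out_hw) + resize_nn(b, out_hw)
```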
Tiny-DSOD: Lightweight Object Detection for Resource-Restricted Usages
Object detection has made great progress in the past few years along with the
development of deep learning. However, most current object detection methods
are resource-hungry, which hinders their wide deployment in many
resource-restricted usages such as always-on devices, battery-powered low-end
devices, etc. This paper considers the resource and accuracy trade-off for
resource-restricted usages in designing the whole object detection
framework. Based on the deeply supervised object detection (DSOD) framework, we
propose Tiny-DSOD, dedicated to resource-restricted usages. Tiny-DSOD
introduces two innovative and ultra-efficient architecture blocks: depthwise
dense block (DDB) based backbone and depthwise feature-pyramid-network (D-FPN)
based front-end. We conduct extensive experiments on three famous benchmarks
(PASCAL VOC 2007, KITTI, and COCO), and compare Tiny-DSOD to the
state-of-the-art ultra-efficient object detection solutions such as Tiny-YOLO,
MobileNet-SSD (v1 & v2), SqueezeDet, Pelee, etc. Results show that Tiny-DSOD
outperforms these solutions on all three metrics (parameter size, FLOPs,
accuracy) in each comparison. For instance, Tiny-DSOD achieves 72.1% mAP with
only 0.95M parameters and 1.06B FLOPs, which is by far the state-of-the-art
result at such a low resource requirement.
Comment: 12 pages, 3 figures, accepted by BMVC 201