A Multi-Dimensional Pruning Framework Based on Double Evaluation Mechanism
In this article, we propose EMNAPE, a multi-dimensional pruning framework to collaboratively prune the three dimensions (depth, width, and resolution) of convolutional neural networks (CNNs) for better execution efficiency on embedded hardware. In EMNAPE, we first introduce a two-stage importance evaluation framework, which efficiently and comprehensively evaluates each pruning unit according to both the local importance inside each dimension and the global importance across different dimensions. Based on the evaluation framework, we present a heuristic pruning algorithm to progressively prune the three dimensions of CNNs toward the optimal trade-off between accuracy and efficiency. Experiments on multiple benchmarks validate the advantages of EMNAPE over existing state-of-the-art (SOTA) approaches.
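To make the two-stage evaluation concrete, here is a minimal Python sketch of a local-then-global pruning loop over depth, width, and resolution. The scoring functions, unit counts, and budget are hypothetical stand-ins; the abstract does not specify EMNAPE's actual criteria.

```python
# Hypothetical sketch of local (per-dimension) + global (cross-dimension)
# scoring with progressive pruning. Scores are random placeholders, not
# EMNAPE's actual importance measures.
import random

DIMENSIONS = ("depth", "width", "resolution")

def local_importance(dim, units_left):
    # Stage 1 (assumed): rank pruning units inside one dimension.
    return random.random()

def global_importance(dim, local_score, remaining):
    # Stage 2 (assumed): re-weight local scores across dimensions so that
    # no single dimension is pruned away disproportionately.
    return local_score * remaining[dim] / sum(remaining.values())

def prune(remaining, budget):
    """Progressively remove the globally least-important unit per step."""
    removed = []
    while sum(remaining.values()) > budget:
        candidates = []
        for dim in DIMENSIONS:
            if remaining[dim] > 1:  # keep at least one unit per dimension
                s = local_importance(dim, remaining[dim])
                candidates.append((global_importance(dim, s, remaining), dim))
        if not candidates:
            break
        _, dim = min(candidates)
        remaining[dim] -= 1
        removed.append(dim)
    return removed

print(prune({"depth": 16, "width": 64, "resolution": 224}, budget=200))
```

Each step removes the globally least-important unit, mirroring the progressive pruning the abstract describes; a real implementation would rescore accuracy and efficiency after each removal rather than use fixed placeholder scores.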
Latency-aware Unified Dynamic Networks for Efficient Image Recognition
Dynamic computation has emerged as a promising avenue to enhance the
inference efficiency of deep networks. It allows selective activation of
computational units, leading to a reduction in unnecessary computations for
each input sample. However, the actual efficiency of these dynamic models can
deviate from theoretical predictions. This mismatch arises from: 1) the lack of
a unified approach due to fragmented research; 2) the focus on algorithm design
over critical scheduling strategies, especially in CUDA-enabled GPU contexts;
and 3) challenges in measuring practical latency, given that most libraries
cater to static operations. Addressing these issues, we unveil the
Latency-Aware Unified Dynamic Networks (LAUDNet), a framework that integrates
three primary dynamic paradigms: spatially adaptive computation, dynamic layer
skipping, and dynamic channel skipping. To bridge the theoretical and practical
efficiency gap, LAUDNet merges algorithmic design with scheduling optimization,
guided by a latency predictor that accurately gauges dynamic operator latency.
We have tested LAUDNet across multiple vision tasks, demonstrating its capacity
to reduce the latency of models like ResNet-101 by over 50% on platforms such
as V100, RTX3090, and TX2 GPUs. Notably, LAUDNet stands out in balancing
accuracy and efficiency. Code is available at:
https://www.github.com/LeapLabTHU/LAUDNet
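The central point, that realized speedup depends on scheduling granularity and hardware utilization rather than on skipped FLOPs alone, can be sketched with a toy latency predictor. The roofline-style model and all constants below are illustrative assumptions, not LAUDNet's trained predictor.

```python
# Toy latency model for a spatially adaptive operator: the fraction of
# skipped computation only translates into wall-clock savings when the
# sparsity pattern schedules well on the GPU. All numbers are assumptions.
def predicted_latency(total_flops, activation_rate, granularity,
                      peak_flops=14e12, overhead_us=5.0):
    """Estimate latency (us) of a dynamic conv.

    activation_rate: fraction of spatial positions actually computed.
    granularity: patch size of the spatial mask; coarser masks map to
                 contiguous work and achieve higher utilization.
    """
    utilization = min(1.0, 0.3 + 0.1 * granularity)  # assumed scheduling model
    effective_flops = total_flops * activation_rate
    return overhead_us + effective_flops / (peak_flops * utilization) * 1e6

# Same 50% activation rate, very different realized latency by granularity:
for g in (1, 4, 8):
    print(f"granularity {g}: {predicted_latency(1e9, 0.5, g):.1f} us")
```

Running this shows coarse-grained masks recovering far more wall-clock speedup than fine-grained ones at the same activation rate, which is exactly the theory-practice gap the paper targets with its latency predictor.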
Zero Time Waste: Recycling Predictions in Early Exit Neural Networks
The problem of reducing processing time of large deep learning models is a
fundamental challenge in many real-world applications. Early exit methods
strive towards this goal by attaching additional Internal Classifiers (ICs) to
intermediate layers of a neural network. ICs can quickly return predictions for
easy examples and, as a result, reduce the average inference time of the whole
model. However, if a particular IC decides not to return an answer early,
its predictions are discarded, with its computations effectively being wasted.
To solve this issue, we introduce Zero Time Waste (ZTW), a novel approach in
which each IC reuses predictions returned by its predecessors by (1) adding
direct connections between ICs and (2) combining previous outputs in an
ensemble-like manner. We conduct extensive experiments across various datasets
and architectures to demonstrate that ZTW achieves a significantly better
accuracy vs. inference time trade-off than other recently proposed early exit
methods.
Comment: Accepted at NeurIPS 2021.
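The recycling mechanism can be sketched in a few lines: each IC adds its logits to a running combination of its predecessors' outputs and exits once the averaged ensemble is confident. The toy logits, the averaging rule, and the threshold below are illustrative assumptions, not the paper's trained cascade.

```python
# Minimal sketch of the Zero Time Waste idea: internal classifiers (ICs)
# reuse predecessors' outputs instead of discarding them. Toy stand-ins.
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def ztw_inference(ic_logits, threshold=0.9):
    """ic_logits: list of per-IC logit vectors for one input, in depth order."""
    ensemble = np.zeros_like(ic_logits[0])
    for k, logits in enumerate(ic_logits):
        # (1) cascade connection: predecessors' outputs flow into the sum;
        # (2) ensemble-like combination of current and previous predictions.
        ensemble = ensemble + logits
        probs = softmax(ensemble / (k + 1))
        if probs.max() >= threshold:
            return probs.argmax(), k  # early exit at IC k
    return probs.argmax(), len(ic_logits) - 1  # fell through to the last IC

rng = np.random.default_rng(0)
print(ztw_inference([rng.normal(size=10) for _ in range(4)]))
```

No IC's computation is wasted here: even when IC k declines to exit, its logits remain in the running ensemble that later ICs build on.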
Glance and Focus Networks for Dynamic Visual Recognition
Spatial redundancy widely exists in visual recognition tasks, i.e.,
discriminative features in an image or video frame usually correspond to only a
subset of pixels, while the remaining regions are irrelevant to the task at
hand. Therefore, static models which process all the pixels with an equal
amount of computation result in considerable redundancy in terms of time and
space consumption. In this paper, we formulate the image recognition problem as
a sequential coarse-to-fine feature learning process, mimicking the human
visual system. Specifically, the proposed Glance and Focus Network (GFNet)
first extracts a quick global representation of the input image at a low
resolution scale, and then strategically attends to a series of salient (small)
regions to learn finer features. The sequential process naturally facilitates
adaptive inference at test time, as it can be terminated once the model is
sufficiently confident about its prediction, avoiding further redundant
computation. It is worth noting that the problem of locating discriminative
regions in our model is formulated as a reinforcement learning task, thus
requiring no additional manual annotations other than classification labels.
GFNet is general and flexible as it is compatible with any off-the-shelf
backbone models (such as MobileNets, EfficientNets and TSM), which can be
conveniently deployed as the feature extractor. Extensive experiments on a
variety of image classification and video recognition tasks and with various
backbone models demonstrate the remarkable efficiency of our method. For
example, it reduces the average latency of the highly efficient MobileNet-V3 on
an iPhone XS Max by 1.3x without sacrificing accuracy. Code and pre-trained
models are available at https://github.com/blackfeather-wang/GFNet-Pytorch.
Comment: Accepted by IEEE Transactions on Pattern Analysis and Machine Intelligence (T-PAMI). Journal version of arXiv:2010.05300 (NeurIPS 2020). The first two authors contributed equally.