Fractional Skipping: Towards Finer-Grained Dynamic CNN Inference
While increasingly deep networks are still generally desired for achieving
state-of-the-art performance, a simpler network often already suffices for
many specific inputs. Existing works exploit this observation by learning to
skip convolutional layers in an input-dependent manner. However, we argue that
their binary decision scheme, i.e., either fully executing or completely
bypassing one layer for a specific input, can be enhanced by introducing
finer-grained, "softer" decisions. We therefore propose a Dynamic Fractional
Skipping (DFS)
framework. The core idea of DFS is to hypothesize layer-wise quantization (to
different bitwidths) as intermediate "soft" choices to be made between fully
utilizing and skipping a layer. For each input, DFS dynamically assigns a
bitwidth to both weights and activations of each layer, where fully executing
and skipping could be viewed as two "extremes" (i.e., full bitwidth and zero
bitwidth). In this way, DFS can "fractionally" exploit a layer's expressive
power during input-adaptive inference, enabling finer-grained
accuracy-computational cost trade-offs. DFS thus presents a unified view that
links input-adaptive layer skipping with input-adaptive hybrid quantization.
Extensive experimental results demonstrate the superior trade-off between
computational cost and model expressive power (accuracy) achieved by DFS.
Visualizations further show a smooth and consistent transition in DFS
behaviors, especially in the learned choices between layer skipping and
different quantization levels as the total computational budget varies,
validating our hypothesis that layer quantization can be viewed as an
intermediate variant of layer skipping. Our source code and supplementary
material are available at https://github.com/Torment123/DFS
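To make the core idea concrete, the following is a minimal PyTorch-style sketch of one DFS-style block; it is an illustration under stated assumptions, not the authors' implementation. The names FractionalSkipBlock and quantize, the candidate bitwidth set {0, 4, 8, 32}, and the gating head are all hypothetical. A per-input gate picks a bitwidth for the layer: bitwidth 0 reduces to skipping the layer, and full bitwidth reduces to executing it at full precision.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def quantize(x, bits):
    """Uniform fake-quantization of a tensor to the given bitwidth.
    bits >= 32 is treated as full precision."""
    if bits >= 32:
        return x
    scale = x.abs().max().clamp(min=1e-8)
    levels = 2 ** bits - 1
    return torch.round(x / scale * levels) / levels * scale

class FractionalSkipBlock(nn.Module):
    """Hypothetical DFS-style residual block: a per-input gate picks a
    bitwidth from {0 (skip), 4, 8, 32 (full)} for this layer's weights
    and activations."""
    BITS = (0, 4, 8, 32)

    def __init__(self, channels):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, 3, padding=1)
        # Tiny gating head: globally pooled features -> one logit per choice.
        self.gate = nn.Linear(channels, len(self.BITS))

    def forward(self, x):
        logits = self.gate(x.mean(dim=(2, 3)))  # (N, num_choices)
        # Hard choice per input; training would need a differentiable
        # relaxation (e.g., Gumbel-softmax) or RL instead of plain argmax.
        choice = logits.argmax(dim=1)
        outs = []
        for i in range(x.size(0)):
            bits = self.BITS[choice[i]]
            xi = x[i : i + 1]
            if bits == 0:
                outs.append(xi)  # zero bitwidth == skip the layer entirely
            else:
                w = quantize(self.conv.weight, bits)
                a = quantize(xi, bits)
                y = F.conv2d(a, w, self.conv.bias, padding=1)
                outs.append(F.relu(y) + xi)  # residual connection
        return torch.cat(outs, dim=0)
```

The zero-bitwidth branch costs only the gate evaluation, intermediate bitwidths buy proportionally cheaper arithmetic at some accuracy cost, and full bitwidth recovers the original layer, which is exactly the skipping-to-execution spectrum the abstract describes.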
Dual Dynamic Inference: Enabling More Efficient, Adaptive and Controllable Deep Inference
State-of-the-art convolutional neural networks (CNNs) yield record-breaking
predictive performance, yet at the cost of energy-intensive inference, which
prohibits their wide deployment in resource-constrained Internet of Things
(IoT) applications. We propose a dual dynamic inference (DDI) framework that
highlights the following aspects: 1) we integrate both input-dependent and
resource-dependent dynamic inference mechanisms under a unified framework to
fit the varying IoT resource requirements in practice; DDI can both
consistently suppress unnecessary costs for easy samples and halt inference
for all samples to meet enforced hard resource constraints; 2) we
propose a flexible multi-grained learning-to-skip (MGL2S) approach for
input-dependent inference that allows simultaneous layer-wise and channel-wise
skipping; 3) we extend DDI to complex CNN backbones such as DenseNet and show
that DDI can be applied to optimize specific resource goals, including
inference latency or energy cost. Extensive experiments demonstrate
the superior inference accuracy-resource trade-off achieved by DDI, as well as
the flexibility to control such trade-offs compared to existing peer methods.
Specifically, DDI achieves up to 4x computational savings with the same or
even higher accuracy compared to existing competitive baselines.
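The MGL2S idea can likewise be sketched as two gates attached to a single block. This is a hypothetical PyTorch sketch, not the paper's code; MultiGrainedSkipBlock and its gating heads are illustrative names. A layer-level gate decides per input whether the whole block executes, and a channel-level gate masks individual output channels when it does.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiGrainedSkipBlock(nn.Module):
    """Hypothetical MGL2S-style block with two skipping granularities:
    a layer-wise gate (execute vs. skip the block) and a channel-wise
    gate (mask individual output channels)."""

    def __init__(self, channels):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, 3, padding=1)
        self.layer_gate = nn.Linear(channels, 1)           # whole-layer on/off
        self.channel_gate = nn.Linear(channels, channels)  # per-channel on/off

    def forward(self, x):
        ctx = x.mean(dim=(2, 3))  # (N, C) pooled context for both gates
        # Hard 0/1 decisions; training would use a differentiable
        # relaxation or RL, as with most dynamic-inference gates.
        keep_layer = (torch.sigmoid(self.layer_gate(ctx)) > 0.5).float()   # (N, 1)
        keep_chan = (torch.sigmoid(self.channel_gate(ctx)) > 0.5).float()  # (N, C)
        y = F.relu(self.conv(x)) * keep_chan[:, :, None, None]
        # Layer-wise skip: samples with keep_layer == 0 take the identity path.
        g = keep_layer[:, :, None, None]
        return g * (y + x) + (1 - g) * x
```

The masks here only emulate the decisions; an actual deployment would branch on the gate outputs and bypass the skipped computation entirely, which is where the latency and energy savings come from.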
- …