Dynamic computation has emerged as a promising avenue to enhance the
inference efficiency of deep networks. It allows selective activation of
computational units, leading to a reduction in unnecessary computations for
each input sample. However, the practical efficiency of these dynamic models
often deviates from theoretical predictions. This mismatch stems from: 1) the
lack of a unified framework, owing to fragmented research; 2) an emphasis on
algorithm design over scheduling strategies, which are critical on
CUDA-enabled GPUs; and 3) the difficulty of measuring practical latency, since
most deep learning libraries are optimized for static operators. To address
these issues, we present the
Latency-Aware Unified Dynamic Networks (LAUDNet), a framework that integrates
three primary dynamic paradigms: spatially adaptive computation, dynamic layer
skipping, and dynamic channel skipping. To bridge the theoretical and practical
efficiency gap, LAUDNet merges algorithmic design with scheduling optimization,
guided by a latency predictor that accurately gauges dynamic operator latency.
We have evaluated LAUDNet across multiple vision tasks, demonstrating that it
can reduce the latency of models such as ResNet-101 by over 50% on platforms
including V100, RTX3090, and TX2 GPUs, while achieving a favorable balance
between accuracy and efficiency. Code is available at:
https://www.github.com/LeapLabTHU/LAUDNet
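
To make one of the paradigms named above concrete, the following is a minimal
sketch of dynamic layer skipping in PyTorch: a lightweight learned gate decides
whether a residual block's body executes at inference time. This is an
illustrative assumption, not the LAUDNet implementation; names such as
`SkippableBlock` and the batch-level hard-threshold gate are invented here for
exposition, and the repository above contains the authors' actual code.

```python
# Hypothetical sketch of dynamic layer skipping (not the LAUDNet code).
import torch
import torch.nn as nn

class SkippableBlock(nn.Module):
    """Residual block whose body runs only when a learned gate fires."""

    def __init__(self, channels: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
        )
        # Tiny gating head: global pooling followed by a linear scorer.
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(channels, 1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        score = torch.sigmoid(self.gate(x))  # (N, 1) execution probability
        # Batch-level hard skip at inference, for simplicity; per-sample
        # skipping would require gather/scatter over the batch dimension.
        if not self.training and score.mean() < 0.5:
            return x  # identity path: the body is never executed
        # Scale the residual by the soft score so the gate stays
        # differentiable during training.
        return x + score.view(-1, 1, 1, 1) * self.body(x)

block = SkippableBlock(64).eval()
out = block(torch.randn(2, 64, 56, 56))  # body runs only if the gate fires
```

In such a design, the saved computation is only realized in wall-clock time if
the scheduler and kernels avoid launching the skipped work, which is precisely
the theory-practice gap the abstract describes.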