
    Surrogate Lagrangian Relaxation: A Path To Retrain-free Deep Neural Network Pruning

    Network pruning is a widely used technique to reduce computation cost and model size for deep neural networks. However, the typical three-stage pipeline significantly increases the overall training time. In this paper, we develop a systematic weight-pruning optimization approach based on Surrogate Lagrangian Relaxation (SLR), which is tailored to overcome difficulties caused by the discrete nature of the weight-pruning problem. We prove that our method ensures fast convergence of the model compression problem, and convergence of SLR is further accelerated by using quadratic penalties. Model parameters obtained by SLR during the training phase are much closer to their optimal values than those obtained by other state-of-the-art methods. We evaluate our method on image classification tasks using CIFAR-10 and ImageNet with state-of-the-art models including MLP-Mixer, Swin Transformer, VGG-16, ResNet-18, ResNet-50, ResNet-110, and MobileNetV2. We also evaluate object detection and segmentation tasks on the COCO and KITTI benchmarks and the TuSimple lane detection dataset using a variety of models. Experimental results demonstrate that our SLR-based weight-pruning optimization approach achieves a higher compression rate than state-of-the-art methods under the same accuracy requirement, and higher accuracy under the same compression rate requirement. On classification tasks, our SLR approach converges to the desired accuracy 3× faster on both datasets. On object detection and segmentation tasks, SLR also converges 2× faster to the desired accuracy. Further, our SLR achieves high model accuracy even at the hard-pruning stage without retraining, which reduces the traditional three-stage pruning pipeline to a two-stage process. Given a limited budget of retraining epochs, our approach quickly recovers the model's accuracy.
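    The abstract describes pruning as a constrained optimization solved with a Lagrangian-style formulation and quadratic penalties. Below is a minimal, hedged sketch of that general idea (alternating between a penalized training loss and a hard sparsity projection, with a multiplier update), not the authors' exact SLR algorithm; the tensor shapes, keep ratio, and rho value are illustrative assumptions.

```python
# Sketch of Lagrangian-style pruning with a quadratic penalty (SLR/ADMM spirit).
import torch

def project_to_sparsity(w: torch.Tensor, keep_ratio: float) -> torch.Tensor:
    """Keep only the largest-magnitude entries of w (hard sparsity projection)."""
    k = max(1, int(keep_ratio * w.numel()))
    threshold = w.abs().flatten().kthvalue(w.numel() - k + 1).values
    return torch.where(w.abs() >= threshold, w, torch.zeros_like(w))

def penalized_loss(task_loss, weights, z, u, rho=1e-3):
    """Task loss plus a quadratic penalty pulling weights toward their sparse copies."""
    penalty = sum(((w - zi + ui) ** 2).sum() for w, zi, ui in zip(weights, z, u))
    return task_loss + 0.5 * rho * penalty

# One outer iteration on a toy weight tensor (illustrative only):
w = torch.randn(64, 64, requires_grad=True)
u = torch.zeros_like(w)                                     # multiplier estimate
z = project_to_sparsity((w + u).detach(), keep_ratio=0.1)   # sparse auxiliary copy
loss = penalized_loss((w ** 2).mean(), [w], [z], [u])       # dummy task loss
loss.backward()                                             # gradient step on w would follow
u = u + (w.detach() - z)                                    # multiplier update
```

    In such schemes the quadratic penalty keeps the trained weights close to a feasible sparse solution, which is consistent with the paper's claim that hard pruning after training loses little accuracy and retraining can be skipped.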

    Achieving on-Mobile Real-Time Super-Resolution with Neural Architecture and Pruning Search

    Though recent years have witnessed remarkable progress in single image super-resolution (SISR) tasks with the prosperous development of deep neural networks (DNNs), deep learning methods are confronted with computation and memory consumption issues in practice, especially on resource-limited platforms such as mobile devices. To overcome this challenge and facilitate real-time deployment of SISR tasks on mobile, we combine neural architecture search with pruning search and propose an automatic search framework that derives sparse super-resolution (SR) models with high image quality while satisfying the real-time inference requirement. To decrease the search cost, we leverage the weight sharing strategy by introducing a supernet and decouple the search problem into three stages: supernet construction, compiler-aware architecture and pruning search, and compiler-aware pruning ratio search. With the proposed framework, we are the first to achieve real-time SR inference (with only tens of milliseconds per frame) for 720p resolution with competitive image quality (in terms of PSNR and SSIM) on mobile platforms (Samsung Galaxy S20).
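    The third stage, compiler-aware pruning ratio search, amounts to choosing per-layer sparsity so that estimated on-device latency stays under a real-time budget while a quality proxy stays high. The sketch below illustrates that search loop only in schematic form; the layer names, candidate ratios, latency model, and quality proxy are placeholder assumptions, not the authors' implementation (which uses compiler-measured latency and supernet evaluation).

```python
# Schematic per-layer pruning-ratio search under a latency budget.
import random

CANDIDATE_RATIOS = [0.0, 0.25, 0.5, 0.75]          # fraction of weights pruned per layer
LAYERS = ["conv1", "conv2", "conv3", "upsample"]   # hypothetical SR backbone layers

def estimated_latency_ms(ratios):
    """Placeholder latency model; a real system would query a compiler/profiler."""
    base = {"conv1": 8.0, "conv2": 12.0, "conv3": 12.0, "upsample": 6.0}
    return sum(base[l] * (1.0 - r) for l, r in zip(LAYERS, ratios))

def proxy_quality(ratios):
    """Placeholder quality proxy (e.g., validation PSNR of the pruned supernet)."""
    return 30.0 - sum(2.0 * r for r in ratios)     # pruning more costs quality

def random_search(budget_ms=25.0, trials=200):
    best = None
    for _ in range(trials):
        ratios = [random.choice(CANDIDATE_RATIOS) for _ in LAYERS]
        if estimated_latency_ms(ratios) > budget_ms:
            continue                               # violates the real-time constraint
        score = proxy_quality(ratios)
        if best is None or score > best[0]:
            best = (score, ratios)
    return best

print(random_search())
```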

    YOLObile: Real-Time Object Detection on Mobile Devices via Compression-Compilation Co-Design

    The rapid development and wide utilization of object detection techniques have drawn attention to both the accuracy and the speed of object detectors. However, current state-of-the-art object detection works are either accuracy-oriented, using large models at the cost of high latency, or speed-oriented, using lightweight models at the cost of accuracy. In this work, we propose the YOLObile framework, which achieves real-time object detection on mobile devices via compression-compilation co-design. A novel block-punched pruning scheme is proposed that applies to any kernel size. To improve computational efficiency on mobile devices, a GPU-CPU collaborative scheme is adopted along with advanced compiler-assisted optimizations. Experimental results indicate that our pruning scheme achieves a 14× compression rate of YOLOv4 with 49.0 mAP. Under our YOLObile framework, we achieve 17 FPS inference speed using the GPU on a Samsung Galaxy S20. By incorporating our proposed GPU-CPU collaborative scheme, the inference speed is increased to 19.1 FPS, outperforming the original YOLOv4 with a 5× speedup. Source code is at: https://github.com/nightsnack/YOLObile
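    The key compression idea is block-punched pruning: the weight tensor is divided into fixed-size blocks, and within each block the same positions are pruned across all rows, which keeps the remaining computation regular enough for compiler optimization. Below is an illustrative magnitude-based sketch of such a mask under assumed block sizes and pruning ratio; it is not the paper's exact scheme or code.

```python
# Illustrative block-punched pruning mask: within each block, whole columns
# (the same positions across all rows of the block) are zeroed by magnitude.
import numpy as np

def block_punched_mask(weight: np.ndarray, block_rows=4, block_cols=8, prune_ratio=0.5):
    mask = np.ones_like(weight)
    rows, cols = weight.shape
    for r0 in range(0, rows, block_rows):
        for c0 in range(0, cols, block_cols):
            block = weight[r0:r0 + block_rows, c0:c0 + block_cols]
            col_scores = np.abs(block).sum(axis=0)          # importance of each column in the block
            n_prune = int(prune_ratio * block.shape[1])
            prune_cols = np.argsort(col_scores)[:n_prune]   # weakest columns in this block
            mask[r0:r0 + block_rows, c0 + prune_cols] = 0.0
    return mask

w = np.random.randn(16, 32)
sparse_w = w * block_punched_mask(w)   # shared punched positions per block keep execution regular
```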