Multi-layer pruning framework for compressing Single Shot MultiBox detector
We propose a framework for compressing the state-of-the-art Single Shot MultiBox Detector (SSD). The framework addresses compression in three stages: Sparsity Induction, Filter Selection, and Filter Pruning. In the Sparsity Induction stage, the object detector model is sparsified via an improved global threshold. In the Filter Selection and Pruning stages, we select and remove filters using sparsity statistics of filter weights in two consecutive convolutional layers. This yields a model smaller than most existing compact architectures. We evaluate the performance of our framework on multiple datasets and compare against multiple methods. Experimental results show that our method achieves state-of-the-art compression of 6.7X and 4.9X on the PASCAL VOC dataset for the SSD300 and SSD512 models respectively. We further show that the method produces a maximum compression of 26X with SSD512 on the German Traffic Sign Detection Benchmark (GTSDB). Additionally, we empirically show our method's adaptability to the classification architecture VGG16 on the CIFAR and German Traffic Sign Recognition Benchmark (GTSRB) datasets, achieving compression rates of 125X and 200X with reductions in FLOPs of 90.50% and 96.6% respectively, with no loss of accuracy. Moreover, our method does not require any special libraries or hardware support for the resulting compressed models.
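The staged pipeline described in this abstract can be sketched in simplified form. This is a minimal illustration under assumed details, not the paper's implementation: the "improved global threshold" and the two-layer sparsity statistics are not specified in the abstract, so a plain global magnitude threshold and a per-filter surviving-weight count stand in for them, with hypothetical helper names.

```python
# Sketch: global-threshold sparsification followed by filter selection
# based on sparsity statistics of the surviving weights.

def sparsify(filters, keep_ratio=0.5):
    """Zero out the smallest-magnitude weights across ALL filters
    using a single global threshold (simplified stand-in for the
    paper's improved global threshold)."""
    all_weights = sorted((abs(w) for f in filters for w in f), reverse=True)
    k = max(1, int(len(all_weights) * keep_ratio))
    threshold = all_weights[k - 1]
    return [[w if abs(w) >= threshold else 0.0 for w in f] for f in filters]

def filter_sparsity(f):
    """Fraction of a filter's weights that survived sparsification."""
    return sum(1 for w in f if w != 0.0) / len(f)

def select_filters(filters, n_keep):
    """Keep the n_keep filters with the most surviving weights."""
    ranked = sorted(range(len(filters)),
                    key=lambda i: filter_sparsity(filters[i]),
                    reverse=True)
    return sorted(ranked[:n_keep])

# Toy example: three 3-weight filters; the nearly-zero filter is pruned.
filters = [[0.9, -0.8, 0.7], [0.05, 0.02, -0.01], [0.6, 0.04, -0.5]]
sparse = sparsify(filters, keep_ratio=0.5)
kept = select_filters(sparse, n_keep=2)
```

In a real detector the same ranking would be computed over pairs of consecutive convolutional layers, and the pruned model would then be fine-tuned.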
Real-time human detection for electricity conservation using pruned-SSD and arduino
Electricity conservation techniques have gained importance in recent years. Many smart techniques have been invented to save electricity with the help of assistive devices such as sensors. Although these save electricity, they add sensor cost to the system. This work aims to develop a system that manages the electric power supply only when it is actually needed, i.e., the system enables the power supply when a human is present in the location and disables it otherwise. The system avoids additional costs by using the closed-circuit television cameras already installed in most places for security reasons. Human detection is performed by a modified Single Shot Detector with a specific hyperparameter tuning method. The model is then pruned to reduce the computational cost of the framework, which in turn drastically reduces the network's processing time. The model passes its output to an Arduino microcontroller, which enables the power supply in and around the location only when a human is detected and disables it when the human exits. The model is evaluated on the ChokePoint dataset and real-time video surveillance footage. Experimental results show an average accuracy of 85.82% with a processing time of 2.1 seconds per frame.
Fine-Pruning: Joint Fine-Tuning and Compression of a Convolutional Network with Bayesian Optimization
When approaching a novel visual recognition problem in a specialized image
domain, a common strategy is to start with a pre-trained deep neural network
and fine-tune it to the specialized domain. If the target domain covers a
smaller visual space than the source domain used for pre-training (e.g.
ImageNet), the fine-tuned network is likely to be over-parameterized. However,
applying network pruning as a post-processing step to reduce the memory
requirements has drawbacks: fine-tuning and pruning are performed
independently; pruning parameters are set once and cannot adapt over time; and
the highly parameterized nature of state-of-the-art pruning methods makes it
prohibitive to manually search the pruning parameter space for deep networks,
leading to coarse approximations. We propose a principled method for jointly
fine-tuning and compressing a pre-trained convolutional network that overcomes
these limitations. Experiments on two specialized image domains (remote sensing
images and describable textures) demonstrate the validity of the proposed
approach. Comment: BMVC 2017 oral.
Leveraging Filter Correlations for Deep Model Compression
We present a filter correlation based model compression approach for deep
convolutional neural networks. Our approach iteratively identifies pairs of
filters with the largest pairwise correlations and drops one of the filters
from each such pair. However, instead of discarding one of the filters from
each such pair naïvely, the model is re-optimized to make the filters in
these pairs maximally correlated, so that discarding one of the filters from
the pair results in minimal information loss. Moreover, after discarding the
filters in each round, we further finetune the model to recover from the
potential small loss incurred by the compression. We evaluate our proposed
approach using a comprehensive set of experiments and ablation studies. Our
compression method yields state-of-the-art FLOPs compression rates on various
architectures, such as LeNet-5, VGG-16, and ResNet-50/56, while still achieving
excellent predictive performance for tasks such as object detection on
benchmark datasets. Comment: IEEE Winter Conference on Applications of Computer Vision (WACV), 2020.
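The pair-selection step of this approach can be sketched as follows. This is a minimal single-round illustration under assumed details: it uses plain Pearson correlation on flattened filters, and it omits the re-optimization and finetuning steps the abstract describes; all function names are illustrative.

```python
# Sketch: find the most correlated pair of filters and drop one of them.
from math import sqrt

def correlation(a, b):
    """Pearson correlation between two flattened filters."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = sqrt(sum((x - ma) ** 2 for x in a))
    sb = sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb)

def most_correlated_pair(filters):
    """Indices (i, j) of the filter pair with the largest |correlation|."""
    pairs = [(i, j) for i in range(len(filters))
             for j in range(i + 1, len(filters))]
    return max(pairs, key=lambda p: abs(correlation(filters[p[0]],
                                                    filters[p[1]])))

def prune_one(filters):
    """Drop one filter from the most correlated pair."""
    _, j = most_correlated_pair(filters)
    return filters[:j] + filters[j + 1:]

# Toy example: filters 0 and 1 are nearly linearly dependent.
filters = [[1.0, 2.0, 3.0], [2.1, 3.9, 6.2], [3.0, -1.0, 0.5]]
pruned = prune_one(filters)
```

In the actual method this step would be applied iteratively, with the model re-optimized before each removal so that discarding one filter of the pair loses minimal information.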
Accuracy Booster: Performance Boosting using Feature Map Re-calibration
Convolutional Neural Networks (CNNs) have been extremely successful in solving
intensive computer vision tasks. The convolutional filters used in CNNs have
played a major role in this success, by extracting useful features from the
inputs. Recently, researchers have tried to boost the performance of CNNs by
re-calibrating the feature maps produced by these filters, e.g.,
Squeeze-and-Excitation Networks (SENets). These approaches have achieved better
performance by exciting the important channels or feature maps while
diminishing the rest. However, in the process, architectural complexity has
increased. We propose an architectural block that introduces much lower
complexity than the existing methods of CNN performance boosting while
performing significantly better than them. We carry out experiments on the
CIFAR, ImageNet and MS-COCO datasets, and show that the proposed block can
challenge the state-of-the-art results. Our method boosts the ResNet-50
architecture to perform comparably to the ResNet-152 architecture, which is a
three times deeper network, on classification. We also show experimentally that
our method is not limited to classification but also generalizes well to other
tasks such as object detection. Comment: IEEE Winter Conference on Applications of Computer Vision (WACV), 2020.
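The feature-map re-calibration mechanism this abstract builds on (as in SENets) can be sketched as follows. The proposed lower-complexity block itself is not detailed in the abstract, so this shows only the general squeeze-excite-scale pattern, with a simplified per-channel scalar gate standing in for SENet's two fully connected layers; all names are illustrative.

```python
# Sketch: squeeze-and-excitation-style channel re-calibration.
from math import exp

def sigmoid(x):
    return 1.0 / (1.0 + exp(-x))

def squeeze(feature_maps):
    """Global average pool each channel to a single descriptor."""
    return [sum(fm) / len(fm) for fm in feature_maps]

def excite(descriptors, weights):
    """Gate each channel: a sigmoid of a learned per-channel weight
    (simplified from SENet's two-layer bottleneck)."""
    return [sigmoid(w * d) for w, d in zip(weights, descriptors)]

def recalibrate(feature_maps, weights):
    """Scale each channel's (flattened) feature map by its gate value."""
    gates = excite(squeeze(feature_maps), weights)
    return [[g * v for v in fm] for g, fm in zip(gates, feature_maps)]

# Toy example: an active channel is kept, an inactive one is down-weighted.
recalibrated = recalibrate([[2.0, 2.0], [0.0, 0.0]], [10.0, 10.0])
```

The point of such blocks is that this gating adds only a few parameters per channel while letting the network emphasize informative feature maps.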
Pelee: A Real-Time Object Detection System on Mobile Devices
There has been rising interest in running high-quality Convolutional Neural Network (CNN) models under strict constraints on memory and computational budget. A number of efficient architectures have been proposed in recent years, for example, MobileNet, ShuffleNet, and NASNet-A. However, all of these architectures depend heavily on depthwise separable convolution, which lacks an efficient implementation in most deep learning frameworks. Meanwhile, few studies combine efficient models with fast object detection algorithms. This research explores the design of an efficient CNN architecture for both image classification and object detection tasks. We propose an efficient architecture named PeleeNet, which is built with conventional convolution instead. On the ImageNet ILSVRC 2012 dataset, our proposed PeleeNet achieves 0.6% higher accuracy and 11% lower computational cost than MobileNet, the state-of-the-art efficient architecture. It is also important to point out that PeleeNet is only 66% of the model size of MobileNet and 1/49 the size of VGG.
We then propose a real-time object detection system for mobile devices. We combine PeleeNet with the Single Shot MultiBox Detector (SSD) method and optimize the architecture for speed. We also port SSD to iOS and provide an optimized code implementation. Our proposed detection system, named Pelee, achieves 70.9% mAP on the PASCAL VOC2007 dataset at 17 FPS on an iPhone 6s and 23.6 FPS on an iPhone 8. Compared to TinyYOLOv2, the most widely used computationally efficient object detection system, Pelee is more accurate (70.9% vs. 57.1%), 2.88 times lower in computational cost, and 2.92 times smaller in model size.
Model Compression Methods for YOLOv5: A Review
Over the past few years, extensive research has been devoted to enhancing
YOLO object detectors. Since its introduction, eight major versions of YOLO
have been released with the aim of improving its accuracy and efficiency.
While the evident merits of YOLO have led to its extensive use in many
areas, deploying it on resource-limited devices poses challenges. To address
this issue, various neural network compression methods have been developed,
which fall under three main categories, namely network pruning, quantization,
and knowledge distillation. The fruitful outcomes of utilizing model
compression methods, such as lowering memory usage and inference time, make
them favorable, if not necessary, for deploying large neural networks on
hardware-constrained edge devices. In this review paper, our focus is on
pruning and quantization due to their comparative modularity. We categorize
them and analyze the practical results of applying those methods to YOLOv5. By
doing so, we identify gaps in adapting pruning and quantization for compressing
YOLOv5, and provide future directions in this area for further exploration.
Among several versions of YOLO, we specifically choose YOLOv5 for its excellent
trade-off between recency and popularity in literature. This is the first
specific review paper that surveys pruning and quantization methods from an
implementation point of view on YOLOv5. Our study is also extendable to newer
versions of YOLO as implementing them on resource-limited devices poses the
same challenges that persist even today. This paper targets those interested in
the practical deployment of model compression methods on YOLOv5, and in
exploring different compression techniques that can be used for subsequent
versions of YOLO. Comment: 18 pages, 7 figures.
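Of the two compression families the review focuses on, quantization can be illustrated with the standard uniform affine (asymmetric) scheme. This is a generic sketch with illustrative parameter names, not a method from any specific YOLOv5 work: a real-valued range [w_min, w_max] is mapped onto unsigned integer codes via a scale and zero-point.

```python
# Sketch: uniform affine (asymmetric) quantization and dequantization.

def quant_params(w_min, w_max, n_bits=8):
    """Derive scale and zero-point mapping [w_min, w_max] to [0, 2^n - 1]."""
    qmax = 2 ** n_bits - 1
    scale = (w_max - w_min) / qmax
    zero_point = round(qmax * -w_min / (w_max - w_min))
    return scale, zero_point

def quantize(weights, scale, zero_point, n_bits=8):
    """Map real-valued weights to clamped unsigned integer codes."""
    qmax = 2 ** n_bits - 1
    return [min(qmax, max(0, round(w / scale) + zero_point)) for w in weights]

def dequantize(q, scale, zero_point):
    """Recover approximate real values from the integer codes."""
    return [(qi - zero_point) * scale for qi in q]

# Toy example: an asymmetric range, as is typical for layer activations.
weights = [-0.4, 0.0, 0.4, 1.2]
scale, zero_point = quant_params(min(weights), max(weights))
q = quantize(weights, scale, zero_point)
recovered = dequantize(q, scale, zero_point)  # within one scale step of weights
```

Storing 8-bit codes plus one scale/zero-point pair per tensor (or per channel) is what yields the roughly 4x memory reduction over 32-bit floats that makes such schemes attractive on edge devices.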