
    Multi-layer pruning framework for compressing Single Shot MultiBox detector

    We propose a framework for compressing the state-of-the-art Single Shot MultiBox Detector (SSD). The framework addresses compression in three stages: Sparsity Induction, Filter Selection, and Filter Pruning. In the Sparsity Induction stage, the object detector model is sparsified via an improved global threshold. In the Filter Selection and Pruning stage, we select and remove filters using sparsity statistics of filter weights in two consecutive convolutional layers. This results in a model smaller than most existing compact architectures. We evaluate the performance of our framework on multiple datasets and compare it against multiple methods. Experimental results show that our method achieves state-of-the-art compression of 6.7X and 4.9X on the PASCAL VOC dataset for the SSD300 and SSD512 models, respectively. We further show that the method produces a maximum compression of 26X with SSD512 on the German Traffic Sign Detection Benchmark (GTSDB). Additionally, we empirically show our method's adaptability to the classification architecture VGG16 on the CIFAR and German Traffic Sign Recognition Benchmark (GTSRB) datasets, achieving compression rates of 125X and 200X with FLOPs reductions of 90.50% and 96.6%, respectively, with no loss of accuracy. Moreover, our method does not require any special libraries or hardware support for the resulting compressed models.
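    The abstract's two stages can be pictured with a short PyTorch sketch. This is a minimal illustration, not the paper's implementation: the quantile-based global threshold and the averaged sparsity statistic are assumptions standing in for the paper's improved global threshold and its exact filter-selection criterion.

```python
import torch
import torch.nn as nn

def sparsity_induction(model: nn.Module, quantile: float = 0.5):
    """Stage 1: zero out all conv weights below one global magnitude threshold."""
    all_weights = torch.cat([m.weight.detach().abs().flatten()
                             for m in model.modules() if isinstance(m, nn.Conv2d)])
    threshold = torch.quantile(all_weights, quantile)  # assumed threshold rule
    for m in model.modules():
        if isinstance(m, nn.Conv2d):
            m.weight.data.mul_(m.weight.detach().abs() >= threshold)

def filters_to_prune(conv_a: nn.Conv2d, conv_b: nn.Conv2d, min_sparsity: float = 0.9):
    """Stage 2: pick output filters of conv_a that are almost entirely zero,
    judged jointly with the matching input channels of the next layer conv_b."""
    out_sparsity = (conv_a.weight.detach() == 0).float().mean(dim=(1, 2, 3))
    in_sparsity = (conv_b.weight.detach() == 0).float().mean(dim=(0, 2, 3))
    combined = (out_sparsity + in_sparsity) / 2   # assumed joint statistic
    return [i for i, s in enumerate(combined.tolist()) if s >= min_sparsity]
```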

    Real-time human detection for electricity conservation using pruned-SSD and arduino

    Electricity conservation techniques have gained importance in recent years. Many smart techniques save electricity with the help of assistive devices such as sensors; though they save electricity, they add sensor costs to the system. This work develops a system that manages the electric power supply only when it is actually needed, i.e., the system enables the power supply when a human is present in the location and disables it otherwise. The system avoids additional hardware costs by using closed-circuit television, which is already installed in most places for security reasons. Human detection is performed by a modified Single Shot Detector (SSD) with a specific hyperparameter tuning method. The model is further pruned to reduce the computational cost of the framework, which in turn drastically reduces the network's processing time. The model outputs to an Arduino microcontroller, which enables the power supply in and around the location only when a human is detected and disables it when the human exits. The model is evaluated on the CHOKEPOINT dataset and real-time video surveillance footage. Experimental results show an average accuracy of 85.82% with 2.1 seconds of processing time per frame.
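    The control loop the abstract describes might look like the following sketch, assuming OpenCV for the CCTV feed and pyserial for the Arduino link; `detect_humans`, the serial port, and the '1'/'0' relay protocol are all hypothetical stand-ins, not the paper's code.

```python
import cv2      # pip install opencv-python
import serial   # pip install pyserial

arduino = serial.Serial("/dev/ttyUSB0", 9600)  # port and baud rate are assumptions
cap = cv2.VideoCapture(0)                      # CCTV feed, here camera index 0

power_on = False
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    human_present = detect_humans(frame)       # hypothetical pruned-SSD inference
    if human_present != power_on:              # write only on state changes
        arduino.write(b"1" if human_present else b"0")
        power_on = human_present
```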

    Fine-Pruning: Joint Fine-Tuning and Compression of a Convolutional Network with Bayesian Optimization

    When approaching a novel visual recognition problem in a specialized image domain, a common strategy is to start with a pre-trained deep neural network and fine-tune it to the specialized domain. If the target domain covers a smaller visual space than the source domain used for pre-training (e.g., ImageNet), the fine-tuned network is likely to be over-parameterized. However, applying network pruning as a post-processing step to reduce the memory requirements has drawbacks: fine-tuning and pruning are performed independently; pruning parameters are set once and cannot adapt over time; and the highly parameterized nature of state-of-the-art pruning methods makes it prohibitive to manually search the pruning parameter space for deep networks, leading to coarse approximations. We propose a principled method for jointly fine-tuning and compressing a pre-trained convolutional network that overcomes these limitations. Experiments on two specialized image domains (remote sensing images and describable textures) demonstrate the validity of the proposed approach. Comment: BMVC 2017 oral
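    As a rough sketch of the joint loop, assuming scikit-optimize for the Bayesian optimization: the optimizer proposes a pruning rate, the network is pruned and briefly fine-tuned, and validation loss is returned as the objective. `prune_model`, `finetune`, `val_loss`, and `pretrained_net` are hypothetical helpers; the paper's actual parameterization is richer than a single rate.

```python
import copy
from skopt import gp_minimize  # pip install scikit-optimize

def objective(params):
    rate = params[0]
    candidate = copy.deepcopy(pretrained_net)  # hypothetical pre-trained network
    prune_model(candidate, rate)               # hypothetical: prune at this rate
    finetune(candidate, epochs=1)              # hypothetical: short fine-tuning pass
    return val_loss(candidate)                 # hypothetical: held-out loss

# Bayesian optimization over the pruning rate; each call prunes and fine-tunes.
result = gp_minimize(objective, dimensions=[(0.1, 0.9)], n_calls=15)
best_rate = result.x[0]
```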

    Leveraging Filter Correlations for Deep Model Compression

    We present a filter-correlation-based model compression approach for deep convolutional neural networks. Our approach iteratively identifies pairs of filters with the largest pairwise correlations and drops one of the filters from each such pair. However, instead of discarding a filter naïvely, the model is first re-optimized to make the filters in these pairs maximally correlated, so that discarding one filter from the pair results in minimal information loss. Moreover, after discarding the filters in each round, we further fine-tune the model to recover from the small potential loss incurred by the compression. We evaluate our proposed approach using a comprehensive set of experiments and ablation studies. Our compression method yields state-of-the-art FLOPs compression rates on various benchmarks, such as LeNet-5, VGG-16, and ResNet-50/56, while still achieving excellent predictive performance for tasks such as object detection on benchmark datasets. Comment: IEEE Winter Conference on Applications of Computer Vision (WACV), 2020
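    The pair-selection step described above is straightforward to sketch in PyTorch: flatten each filter of a layer, compute pairwise correlations, and pick the most correlated pair as the next pruning candidate. This is a minimal sketch of that one step; the paper's re-optimization and fine-tuning rounds are omitted.

```python
import torch
import torch.nn as nn

def most_correlated_pair(conv: nn.Conv2d):
    w = conv.weight.detach().flatten(start_dim=1)  # one row per filter
    corr = torch.corrcoef(w).abs()                 # pairwise filter correlations
    corr.fill_diagonal_(0)                         # ignore self-correlation
    flat = torch.argmax(corr).item()
    i, j = divmod(flat, corr.size(0))
    return i, j, corr[i, j].item()

conv = nn.Conv2d(3, 16, kernel_size=3)
i, j, c = most_correlated_pair(conv)  # filter j is the candidate to drop
```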

    Accuracy Booster: Performance Boosting using Feature Map Re-calibration

    Convolutional Neural Networks (CNNs) have been extremely successful in solving intensive computer vision tasks. The convolutional filters used in CNNs have played a major role in this success by extracting useful features from the inputs. Recently, researchers have tried to boost the performance of CNNs by re-calibrating the feature maps produced by these filters, e.g., Squeeze-and-Excitation Networks (SENets). These approaches achieve better performance by exciting the important channels or feature maps while diminishing the rest, but in the process the architectural complexity increases. We propose an architectural block that introduces much lower complexity than existing methods of CNN performance boosting while performing significantly better than them. We carry out experiments on the CIFAR, ImageNet, and MS-COCO datasets and show that the proposed block can challenge state-of-the-art results. Our method boosts the ResNet-50 architecture to perform comparably on classification to the ResNet-152 architecture, a three times deeper network. We also show experimentally that our method is not limited to classification but generalizes well to other tasks such as object detection. Comment: IEEE Winter Conference on Applications of Computer Vision (WACV), 2020
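    For context, the following sketch shows the SENet-style recalibration the abstract refers to: feature maps are squeezed to per-channel descriptors, passed through a small bottleneck, and used to rescale the channels. The paper's proposed block is cheaper than this; its exact design is not given in the abstract.

```python
import torch
import torch.nn as nn

class SqueezeExcite(nn.Module):
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):
        scale = x.mean(dim=(2, 3))           # squeeze: global average pool
        scale = self.fc(scale)               # excite: per-channel weights in (0, 1)
        return x * scale[:, :, None, None]   # re-calibrate the feature maps

x = torch.randn(2, 64, 32, 32)
print(SqueezeExcite(64)(x).shape)  # torch.Size([2, 64, 32, 32])
```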

    Pelee: A Real-Time Object Detection System on Mobile Devices

    There has been rising interest in running high-quality Convolutional Neural Network (CNN) models under strict constraints on memory and computational budget. A number of efficient architectures have been proposed in recent years, for example, MobileNet, ShuffleNet, and NASNet-A. However, all of these architectures depend heavily on depthwise separable convolution, which lacks an efficient implementation in most deep learning frameworks. Meanwhile, there are few studies that combine efficient models with fast object detection algorithms. This research explores the design of an efficient CNN architecture for both image classification and object detection tasks. We propose an efficient architecture named PeleeNet, which is built with conventional convolution instead. On the ImageNet ILSVRC 2012 dataset, our proposed PeleeNet achieves 0.6% higher accuracy and 11% lower computational cost than MobileNet, the state-of-the-art efficient architecture. PeleeNet is also only 66% of the model size of MobileNet and 1/49 the size of VGG. We then propose a real-time object detection system on mobile devices: we combine PeleeNet with the Single Shot MultiBox Detector (SSD) method and optimize the architecture for speed. We also port SSD to iOS and provide an optimized code implementation. Our proposed detection system, named Pelee, achieves 70.9% mAP on the PASCAL VOC2007 dataset at a speed of 17 FPS on iPhone 6s and 23.6 FPS on iPhone 8. Compared to TinyYOLOv2, the most widely used computationally efficient object detection system, Pelee is more accurate (70.9% vs. 57.1%), 2.88 times lower in computational cost, and 2.92 times smaller in model size.
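    The trade-off behind PeleeNet's design choice can be checked with back-of-the-envelope arithmetic: depthwise separable convolution needs far fewer multiply-accumulates on paper, but PeleeNet stays with conventional convolution because the separable form lacks efficient framework implementations. The layer shape below is an arbitrary example, not taken from the paper.

```python
def conv_macs(h, w, c_in, c_out, k):
    """Multiply-accumulates for a conventional k x k convolution."""
    return h * w * c_in * c_out * k * k

def separable_macs(h, w, c_in, c_out, k):
    """Depthwise k x k convolution plus 1 x 1 pointwise convolution."""
    return h * w * c_in * k * k + h * w * c_in * c_out

std = conv_macs(56, 56, 128, 128, 3)       # 462,422,016 MACs
sep = separable_macs(56, 56, 128, 128, 3)  #  54,992,896 MACs
print(f"conventional / separable = {std / sep:.1f}x")  # about 8.4x
```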

    Model Compression Methods for YOLOv5: A Review

    Over the past few years, extensive research has been devoted to enhancing YOLO object detectors. Since its introduction, eight major versions of YOLO have been released with the purpose of improving its accuracy and efficiency. While the evident merits of YOLO have led to its extensive use in many areas, deploying it on resource-limited devices poses challenges. To address this issue, various neural network compression methods have been developed, which fall under three main categories: network pruning, quantization, and knowledge distillation. The fruitful outcomes of model compression methods, such as lower memory usage and inference time, make them favorable, if not necessary, for deploying large neural networks on hardware-constrained edge devices. In this review paper, our focus is on pruning and quantization due to their comparative modularity. We categorize them and analyze the practical results of applying those methods to YOLOv5. By doing so, we identify gaps in adapting pruning and quantization for compressing YOLOv5, and provide future directions in this area for further exploration. Among the several versions of YOLO, we specifically choose YOLOv5 for its excellent trade-off between recency and popularity in the literature. This is the first review paper that specifically surveys pruning and quantization methods from an implementation point of view on YOLOv5. Our study is also extendable to newer versions of YOLO, as implementing them on resource-limited devices poses the same challenges that persist even today. This paper targets those interested in the practical deployment of model compression methods on YOLOv5, and in exploring compression techniques that can be used for subsequent versions of YOLO. Comment: 18 pages, 7 figures
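    The two method families the review covers are available out of the box in PyTorch. The sketch below applies unstructured magnitude pruning and dynamic int8 quantization to a toy stand-in model, not to YOLOv5's actual codebase; the pruning ratio and layer choices are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

# Toy stand-in for a detector; YOLOv5 itself needs its own repo and weights.
model = nn.Sequential(nn.Conv2d(3, 16, 3), nn.ReLU(), nn.Flatten(),
                      nn.Linear(16 * 30 * 30, 10))

# Pruning: zero the 30% smallest-magnitude weights in every conv layer.
for m in model.modules():
    if isinstance(m, nn.Conv2d):
        prune.l1_unstructured(m, name="weight", amount=0.3)
        prune.remove(m, "weight")  # bake the pruning mask into the weights

# Quantization: dynamic int8 quantization of the linear layers.
quantized = torch.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)
```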