23,694 research outputs found

    SBNet: Sparse Blocks Network for Fast Inference

    Full text link
    Conventional deep convolutional neural networks (CNNs) apply convolution operators uniformly in space across all feature maps for hundreds of layers - this incurs a high computational cost for real-time applications. For many problems such as object detection and semantic segmentation, we are able to obtain a low-cost computation mask, either from a priori problem knowledge or from a low-resolution segmentation network. We show that such computation masks can be used to reduce computation in the high-resolution main network. Variants of sparse activation CNNs have previously been explored on small-scale tasks and showed no degradation in object classification accuracy, but they often measured gains in theoretical FLOPs without realizing a practical speed-up over highly optimized dense convolution implementations. In this work, we leverage the sparsity structure of computation masks and propose a novel tiling-based sparse convolution algorithm. We verified the effectiveness of our sparse CNN on LiDAR-based 3D object detection, and we report significant wall-clock speed-ups compared to dense convolution without noticeable loss of accuracy.
    Comment: 10 pages, CVPR 2018
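
    As a rough illustration of the tiling idea, the sketch below convolves only the tiles that the computation mask marks as active, gathering each active tile with a halo for the kernel support and scattering the result back. The function name, tile size, and Python loop are illustrative assumptions, not the authors' SBNet implementation, which fuses the gather and scatter into custom GPU kernels - that fusion is where the wall-clock speed-up over dense convolution comes from.

```python
# Illustrative sketch of tile-based sparse convolution (not the SBNet code).
# Assumes a 3x3 kernel and a precomputed boolean computation mask.
import torch
import torch.nn.functional as F

def sparse_block_conv(x, weight, mask, tile_size=16):
    # x: (1, C_in, H, W) feature map; weight: (C_out, C_in, 3, 3)
    # mask: (H, W) boolean computation mask at feature-map resolution
    _, _, H, W = x.shape
    out = x.new_zeros(1, weight.shape[0], H, W)
    xp = F.pad(x, (1, 1, 1, 1))  # halo of 1 pixel for the 3x3 kernel
    for i in range(0, H, tile_size):
        for j in range(0, W, tile_size):
            if mask[i:i + tile_size, j:j + tile_size].any():  # active tile
                # gather the tile plus its halo, convolve it densely,
                # and scatter the result back into the output map
                tile = xp[:, :, i:i + tile_size + 2, j:j + tile_size + 2]
                out[:, :, i:i + tile_size, j:j + tile_size] = F.conv2d(tile, weight)
    return out
```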

    Fast object detection in compressed JPEG Images

    Full text link
    Object detection in still images has drawn a lot of attention over the past few years, and with the advent of Deep Learning impressive performances have been achieved in numerous industrial applications. Most of these deep learning models rely on RGB images to localize and identify objects in the image. However, in some application scenarios, images are compressed either for storage savings or fast transmission. A time-consuming image decompression step is therefore compulsory before the aforementioned deep models can be applied. To alleviate this drawback, we propose a fast deep architecture for object detection in JPEG images, one of the most widespread compression formats. We train a neural network to detect objects based on the blockwise DCT (discrete cosine transform) coefficients issued from the JPEG compression algorithm. We modify the well-known Single Shot multibox Detector (SSD) by replacing its first layers with one convolutional layer dedicated to processing the DCT inputs. Experimental evaluations on PASCAL VOC and an industrial dataset comprising images of road traffic surveillance show that the model is about 2× faster than regular SSD with promising detection performances. To the best of our knowledge, this paper is the first to address detection in compressed JPEG images.
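
    A minimal sketch of the kind of DCT input stem the abstract describes, assuming the 64 coefficients of each 8x8 luma block are stacked into 64 channels at one-eighth resolution; `DCTStem` and its layer sizes are hypothetical, not the paper's exact configuration.

```python
# Hypothetical DCT input stem replacing the early RGB layers of SSD.
import torch
import torch.nn as nn

class DCTStem(nn.Module):
    def __init__(self, out_channels=512):
        super().__init__()
        # one convolutional layer dedicated to processing the DCT inputs
        self.conv = nn.Conv2d(64, out_channels, kernel_size=3, padding=1)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, dct_blocks):
        # dct_blocks: (N, 64, H/8, W/8) -- the 64 DCT coefficients per block
        return self.relu(self.conv(dct_blocks))
```

    The stem output would then be fed to the remaining SSD layers in place of the usual early feature map, skipping both decompression and the first convolutional stages.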

    Driving Scene Perception Network: Real-time Joint Detection, Depth Estimation and Semantic Segmentation

    Full text link
    The demand for high-level autonomous driving has increased in recent years, and visual perception is one of the critical features needed to enable it. In this paper, we introduce an efficient approach for simultaneous object detection, depth estimation and pixel-level semantic segmentation using a shared convolutional architecture. The proposed network model, which we named Driving Scene Perception Network (DSPNet), uses multi-level feature maps and multi-task learning to improve the accuracy and efficiency of object detection, depth estimation and image segmentation from a single input image. The resulting network model uses less than 850 MiB of GPU memory and achieves 14.0 fps on an NVIDIA GeForce GTX 1080 with a 1024x512 input image, and both precision and efficiency are improved over a combination of single-task models.
    Comment: 9 pages, 7 figures, WACV'18
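
    The shared-encoder, multi-head layout such a network implies can be sketched as below; the backbone, head shapes and class counts are illustrative assumptions, since the abstract does not specify them, and the real DSPNet additionally taps multi-level feature maps.

```python
# Hypothetical shared-encoder / multi-head sketch of a joint perception net.
import torch.nn as nn

class SharedMultiTaskNet(nn.Module):
    def __init__(self, num_classes=19, num_anchors=6):
        super().__init__()
        self.encoder = nn.Sequential(  # shared backbone (illustrative)
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(inplace=True),
        )
        # one lightweight head per task on top of the shared features
        self.det_head = nn.Conv2d(128, num_anchors * (num_classes + 4), 1)
        self.depth_head = nn.Conv2d(128, 1, 1)           # per-pixel depth
        self.seg_head = nn.Conv2d(128, num_classes, 1)   # per-pixel labels

    def forward(self, x):
        f = self.encoder(x)  # computed once, reused by all three tasks
        return self.det_head(f), self.depth_head(f), self.seg_head(f)
```

    Computing the backbone once and reusing it across all three heads is what keeps memory and latency below running three separate single-task networks.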

    DEEP FULLY RESIDUAL CONVOLUTIONAL NEURAL NETWORK FOR SEMANTIC IMAGE SEGMENTATION

    Get PDF
    Department of Computer Science and Engineering
    The goal of semantic image segmentation is to partition the pixels of an image into semantically meaningful parts and to classify those parts according to a predefined label set. Although object recognition models have recently achieved remarkable performance, even surpassing the human ability to recognize objects, semantic segmentation models still lag behind. One of the reasons semantic segmentation is a relatively hard problem is that it requires image understanding at the pixel level while considering global context, as opposed to object recognition. Another challenge is transferring the knowledge of an object recognition model to the task of semantic segmentation. In this thesis, we delineate some of the main challenges we faced in approaching semantic image segmentation with machine learning algorithms. Our main focus was how to use deep learning algorithms for this task, since they require the least amount of feature engineering and have been shown to scale to large datasets with remarkable performance. More precisely, we worked on a variation of convolutional neural networks (CNN) suitable for the semantic segmentation task. We propose a model called deep fully residual convolutional network (DFRCN) to tackle this problem. Utilizing residual learning makes the training of deep models feasible, which ultimately leads to a rich, powerful visual representation. Our model also benefits from skip-connections, which ease the propagation of information from the encoder module to the decoder module. This enables our model to have fewer parameters in the decoder module while achieving better performance. We also benchmarked the most effective variation of the proposed model on a semantic segmentation benchmark. We first give a thorough review of current high-performance models and the problems one might face when trying to replicate them, which mainly arise from the lack of sufficient published detail. Then we describe our novel method, the deep fully residual convolutional network (DFRCN), and show that it exhibits state-of-the-art performance on a challenging benchmark for aerial image segmentation.
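
    The decoder-side skip connection the abstract describes can be sketched as follows; `SkipDecoderBlock` is a hypothetical name and the channel arithmetic is illustrative, not the thesis' DFRCN specification.

```python
# Illustrative decoder block with an additive skip connection from the encoder.
import torch.nn as nn

class SkipDecoderBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.up = nn.ConvTranspose2d(channels, channels, 2, stride=2)  # upsample x2
        self.conv = nn.Conv2d(channels, channels, 3, padding=1)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x, skip):
        # `skip` is the same-resolution encoder feature map; adding it lets
        # the decoder reuse encoder detail and stay small (fewer parameters)
        x = self.up(x)
        return self.relu(self.conv(x + skip))
```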

    FCN-rLSTM: Deep Spatio-Temporal Neural Networks for Vehicle Counting in City Cameras

    Full text link
    In this paper, we develop deep spatio-temporal neural networks to sequentially count vehicles from low-quality videos captured by city cameras (citycams). Citycam videos have low resolution, low frame rate, high occlusion and large perspective, making most existing methods lose their efficacy. To overcome the limitations of existing methods and incorporate the temporal information of traffic video, we design a novel FCN-rLSTM network to jointly estimate vehicle density and vehicle count by connecting fully convolutional neural networks (FCN) with long short-term memory networks (LSTM) in a residual learning fashion. This design leverages the strengths of FCN for pixel-level prediction and the strengths of LSTM for learning complex temporal dynamics. The residual learning connection reformulates vehicle count regression as learning residual functions with reference to the sum of densities in each frame, which significantly accelerates the training of the networks. To preserve feature map resolution, we propose a Hyper-Atrous combination that integrates atrous convolution in the FCN and combines feature maps of different convolution layers. FCN-rLSTM enables a refined feature representation and a novel end-to-end trainable mapping from pixels to vehicle count. We extensively evaluated the proposed method on different counting tasks with three datasets, with experimental results demonstrating its effectiveness and robustness. In particular, FCN-rLSTM reduces the mean absolute error (MAE) from 5.31 to 4.21 on TRANCOS, and reduces the MAE from 2.74 to 1.53 on WebCamT. The training process is accelerated by 5 times on average.
    Comment: Accepted by International Conference on Computer Vision (ICCV), 2017
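
    The residual count regression can be sketched as below: the LSTM predicts a correction to the per-frame density sum rather than the count itself, so the network only has to learn the residual. Shapes, layer sizes and names (`ResidualCounter`, `frame_feats`) are assumptions for illustration, not the paper's exact configuration.

```python
# Sketch of residual count regression: count = density sum + LSTM residual.
import torch
import torch.nn as nn

class ResidualCounter(nn.Module):
    def __init__(self, feat_dim=1024, hidden=100):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden, num_layers=3, batch_first=True)
        self.fc = nn.Linear(hidden, 1)

    def forward(self, density_maps, frame_feats):
        # density_maps: (N, T, H, W) predicted by the FCN
        # frame_feats:  (N, T, feat_dim) flattened FCN features per frame
        base_count = density_maps.sum(dim=(2, 3))   # (N, T) density sums
        h, _ = self.lstm(frame_feats)
        residual = self.fc(h).squeeze(-1)           # (N, T) learned correction
        return base_count + residual                # residual learning connection
```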

    Practical classification of different moving targets using automotive radar and deep neural networks

    Get PDF
    In this work, the authors present results for the classification of different classes of targets (car, single and multiple people, bicycle) using automotive radar data and different neural networks. A fast implementation of radar algorithms for detection, tracking, and micro-Doppler extraction is proposed in conjunction with the automotive radar transceiver TEF810X and microcontroller unit SR32R274 manufactured by NXP Semiconductors. Three different types of neural networks are considered, namely a classic convolutional network, a residual network, and a combination of convolutional and recurrent networks, for different classification problems across the four classes of targets recorded. Considerable accuracy (close to 100% in some cases) and low latency of the radar pre-processing prior to classification (∼0.55 s to produce a 0.5 s long spectrogram) are demonstrated in this study, and possible shortcomings and outstanding issues are discussed.
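
    For the "classic convolutional network" variant, a minimal classifier over micro-Doppler spectrograms of the four target classes might look like the sketch below; the layer sizes are illustrative, not the authors' architecture.

```python
# Minimal CNN over micro-Doppler spectrograms (illustrative layer sizes).
import torch.nn as nn

class SpectrogramCNN(nn.Module):
    def __init__(self, num_classes=4):  # car, single person, multiple people, bicycle
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(inplace=True), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(inplace=True), nn.MaxPool2d(2),
        )
        self.head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, num_classes)
        )

    def forward(self, spec):
        # spec: (N, 1, F, T) -- a 0.5 s micro-Doppler spectrogram
        return self.head(self.features(spec))
```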