108,632 research outputs found

    Applying the Convolutional Neural Network Deep Learning Technology to Behavioural Recognition in Intelligent Video

    Get PDF
    In order to improve the accuracy and real-time performance of abnormal behaviour identification in massive video monitoring data, the authors design intelligent video technology based on convolutional neural network deep learning and apply it to the smart city on the basis of summarizing video development technology. First, the technical framework of intelligent video monitoring algorithm is divided into bottom (object detection), middle (object identification) and high (behaviour analysis) layers. The object detection based on background modelling is applied to routine real-time detection and early warning. The object detection based on object modelling is applied to after-event data query and retrieval. The related optical flow algorithms are used to achieve the identification and detection of abnormal behaviours. In order to improve the accuracy, effectiveness and intelligence of identification, the deep learning technology based on convolutional neural network is applied to enhance the learning and identification ability of learning machine and realize the real-time upgrade of intelligence video’s "brain". This research has a good popularization value in the application field of intelligent video technology

    Fast YOLO: A Fast You Only Look Once System for Real-time Embedded Object Detection in Video

    Get PDF
    Object detection is considered one of the most challenging problems in this field of computer vision, as it involves the combination of object classification and object localization within a scene. Recently, deep neural networks (DNNs) have been demonstrated to achieve superior object detection performance compared to other approaches, with YOLOv2 (an improved You Only Look Once model) being one of the state-of-the-art in DNN-based object detection methods in terms of both speed and accuracy. Although YOLOv2 can achieve real-time performance on a powerful GPU, it still remains very challenging for leveraging this approach for real-time object detection in video on embedded computing devices with limited computational power and limited memory. In this paper, we propose a new framework called Fast YOLO, a fast You Only Look Once framework which accelerates YOLOv2 to be able to perform object detection in video on embedded devices in a real-time manner. First, we leverage the evolutionary deep intelligence framework to evolve the YOLOv2 network architecture and produce an optimized architecture (referred to as O-YOLOv2 here) that has 2.8X fewer parameters with just a ~2% IOU drop. To further reduce power consumption on embedded devices while maintaining performance, a motion-adaptive inference method is introduced into the proposed Fast YOLO framework to reduce the frequency of deep inference with O-YOLOv2 based on temporal motion characteristics. Experimental results show that the proposed Fast YOLO framework can reduce the number of deep inferences by an average of 38.13%, and an average speedup of ~3.3X for objection detection in video compared to the original YOLOv2, leading Fast YOLO to run an average of ~18FPS on a Nvidia Jetson TX1 embedded system

    Real-Time Facial Emotion Recognition Using Fast R-CNN

    Get PDF
    In computer vision and image processing, object detection algorithms are used to detect semantic objects of certain classes of images and videos. Object detector algorithms use deep learning networks to classify detected regions. Unprecedented advancements in Convolutional Neural Networks (CNN) have led to new possibilities and implementations for object detectors. An object detector which uses a deep learning algorithm detect objects through proposed regions, and then classifies the region using a CNN. Object detectors are computationally efficient unlike a typical CNN which is computationally complex and expensive. Object detectors are widely used for face detection, recognition, and object tracking. In this thesis, deep learning based object detection algorithms are implemented to classify facially expressed emotions in real-time captured through a webcam. A typical CNN would classify images without specifying regions within an image, which could be considered as a limitation towards better understanding the network performance which depend on different training options. It would also be more difficult to verify whether a network have converged and is able to generalize, which is the ability to classify unseen data, data which was not part of the training set. Fast Region-based Convolutional Neural Network, an object detection algorithm; used to detect facially expressed emotion in real-time by classifying proposed regions. The Fast R-CNN is trained using a high-quality video database, consisting of 24 actors, facially expressing eight different emotions, obtained from images which were processed from 60 videos per actor. An object detector’s performance is measured using various metrics. Regardless of how an object detector performed with respect to average precision or miss rate, doing well on such metrics would not necessarily mean that the network is correctly classifying regions. This may result from the fact that the network model has been over-trained. In our work we showed that object detector algorithm such as Fast R-CNN performed surprisingly well in classifying facially expressed emotions in real-time, performing better than CNN

    Training a Binary Weight Object Detector by Knowledge Transfer for Autonomous Driving

    Get PDF
    Autonomous driving has harsh requirements of small model size and energy efficiency, in order to enable the embedded system to achieve real-time on-board object detection. Recent deep convolutional neural network based object detectors have achieved state-of-the-art accuracy. However, such models are trained with numerous parameters and their high computational costs and large storage prohibit the deployment to memory and computation resource limited systems. Low-precision neural networks are popular techniques for reducing the computation requirements and memory footprint. Among them, binary weight neural network (BWN) is the extreme case which quantizes the float-point into just 11 bit. BWNs are difficult to train and suffer from accuracy deprecation due to the extreme low-bit representation. To address this problem, we propose a knowledge transfer (KT) method to aid the training of BWN using a full-precision teacher network. We built DarkNet- and MobileNet-based binary weight YOLO-v2 detectors and conduct experiments on KITTI benchmark for car, pedestrian and cyclist detection. The experimental results show that the proposed method maintains high detection accuracy while reducing the model size of DarkNet-YOLO from 257 MB to 8.8 MB and MobileNet-YOLO from 193 MB to 7.9 MB.Comment: Accepted by ICRA 201

    SBNet: Sparse Blocks Network for Fast Inference

    Full text link
    Conventional deep convolutional neural networks (CNNs) apply convolution operators uniformly in space across all feature maps for hundreds of layers - this incurs a high computational cost for real-time applications. For many problems such as object detection and semantic segmentation, we are able to obtain a low-cost computation mask, either from a priori problem knowledge, or from a low-resolution segmentation network. We show that such computation masks can be used to reduce computation in the high-resolution main network. Variants of sparse activation CNNs have previously been explored on small-scale tasks and showed no degradation in terms of object classification accuracy, but often measured gains in terms of theoretical FLOPs without realizing a practical speed-up when compared to highly optimized dense convolution implementations. In this work, we leverage the sparsity structure of computation masks and propose a novel tiling-based sparse convolution algorithm. We verified the effectiveness of our sparse CNN on LiDAR-based 3D object detection, and we report significant wall-clock speed-ups compared to dense convolution without noticeable loss of accuracy.Comment: 10 pages, CVPR 201
    corecore