
    Threshold adaptation and XOR accumulation algorithm for objects detection

    Object detection, tracking, and video analysis are vital tasks for intelligent video surveillance systems and computer vision applications. Background modelling is a major technique for extracting moving objects from video streams. This paper presents the threshold adaptation and XOR accumulation (TAXA) algorithm, which operates in three systematic stages over a video sequence. First, noisy background details are continuously computed, updated, and eliminated with hybrid statistical techniques. Second, thresholds for detecting object pixels are calculated with effective mean and Gaussian statistics. Third, a novel decision step uses XOR accumulation to extract object pixels from the thresholded results accurately. Each stage is presented with practical illustrations and theoretical explanations. The proposed algorithm was tested on high-resolution video with difficult scenes and lighting conditions. It achieved an average precision of 0.90 with 6.56% memory usage and 20% CPU usage, along with strong time performance, an overall result superior to the major foreground object extraction algorithms in use. In conclusion, compared with other popular OpenCV methods, the proposed TAXA algorithm has excellent detection ability.
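    The abstract names the three stages but not their exact rules, so the following is only a minimal sketch of one plausible reading: a per-pixel running Gaussian background model, a mean-based and a Gaussian-likelihood-based threshold, and an XOR-style combination of the two binary masks. The function name, constants (alpha, k, p), and the specific decision rule are assumptions, not the paper's definitions.

        import numpy as np

        def taxa_step(frame, bg_mean, bg_var, prev_fg, alpha=0.01, k=2.5, p=0.05):
            """One hypothetical TAXA-style step on a grayscale frame.

            frame, bg_mean, bg_var: float32 arrays of identical shape (H, W);
            prev_fg: boolean (H, W) foreground mask from the previous frame.
            """
            frame = frame.astype(np.float32)
            diff = np.abs(frame - bg_mean)

            # Mean-based threshold: deviation above k standard deviations.
            mask_mean = diff > k * np.sqrt(bg_var + 1e-6)
            # Gaussian-based threshold: low likelihood under the per-pixel Gaussian.
            mask_gauss = np.exp(-(diff ** 2) / (2.0 * bg_var + 1e-6)) < p

            # XOR-accumulation decision (one plausible reading): pixels where both
            # tests agree are confident foreground; pixels where they disagree (the
            # XOR set) are kept only if they were already foreground, which damps
            # single-test noise while preserving persistent objects.
            fg = (mask_mean & mask_gauss) | (np.logical_xor(mask_mean, mask_gauss) & prev_fg)

            # Update background statistics only on pixels judged static.
            bg = ~fg
            bg_mean[bg] += alpha * (frame[bg] - bg_mean[bg])
            bg_var[bg] += alpha * (diff[bg] ** 2 - bg_var[bg])
            return fg

    A caller would initialize bg_mean from the first frame, bg_var with a small positive constant, and prev_fg with all-False, then call taxa_step once per frame.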

    Deep Learning-Based Low Complexity and High Efficiency Moving Object Detection Methods

    Moving object detection (MOD) is the process of extracting dynamic foreground content, such as moving vehicles or pedestrians, from video frames while discarding the non-moving background. It plays an essential role in the computer vision field. Traditional methods meet difficulties in complex scenarios, such as videos with illumination changes, shadows, night scenes, and dynamic backgrounds. Deep learning methods have been actively applied to moving object detection in recent years and have demonstrated impressive results. However, many existing models deliver superior detection accuracy at the cost of high computational complexity and slow inference speed, which has hindered their use in mobile and embedded vision tasks that must run in a timely fashion on computationally limited platforms. This research uses separable convolution in both 2D and 3D CNNs, together with a proposed multi-input multi-output strategy and a two-branch structure, to devise new deep network models that significantly improve inference speed, require a smaller model size, and reduce floating-point operations compared with existing deep learning models, while keeping competitive detection accuracy. Three deep neural network models were devised, addressing the following main problems in moving object detection:
    1. Improving detection accuracy by extracting both spatial and temporal information: the proposed models adopt 3D convolution, which is better suited than 2D convolution to extracting spatial and temporal information from video data. Placing this 3D convolution in a two-branch network that extracts both high-level global features and low-level detailed features further increases accuracy.
    2. Reducing model size and computational complexity by changing the network structure: standard 2D and 3D convolutions are decomposed into depthwise and pointwise convolutions (see the sketch below). While existing 3D separable CNNs have addressed other problems, such as gesture recognition, force prediction, and 3D object classification or reconstruction, this work applies them to the moving object detection task for the first time in the literature.
    3. Increasing inference speed by changing the input-output relationship: a multi-input multi-output (MIMO) strategy takes multiple frames as network input and outputs multiple frames of detection results. Embedding this MIMO strategy in a 3D separable CNN further increases inference speed significantly while maintaining high detection accuracy.
    Compared with state-of-the-art approaches, the proposed methods significantly increase inference speed and reduce model size, while achieving the highest detection accuracy in the scene-dependent evaluation (SDE) setup and maintaining competitive detection accuracy in the scene-independent evaluation (SIE) setup. The SDE setup, in which the training and test sets come from the same video, is widely used to tune and test a model on a specific video; the SIE setup assesses the generalization capability of a model on completely unseen videos.
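    Since the depthwise/pointwise decomposition is the central efficiency technique here, a minimal PyTorch sketch may help. The class name, channel counts, and clip size below are illustrative assumptions, not the thesis' actual architecture; the point is only how the factorization replaces one standard Conv3d.

        import torch
        import torch.nn as nn

        class SeparableConv3d(nn.Module):
            """Depthwise-separable 3D convolution: a per-channel spatio-temporal
            filter (groups=in_ch) followed by a 1x1x1 pointwise channel mix."""
            def __init__(self, in_ch, out_ch, kernel=3, padding=1):
                super().__init__()
                self.depthwise = nn.Conv3d(in_ch, in_ch, kernel,
                                           padding=padding, groups=in_ch)
                self.pointwise = nn.Conv3d(in_ch, out_ch, kernel_size=1)

            def forward(self, x):  # x: (batch, channels, frames, height, width)
                return self.pointwise(self.depthwise(x))

        # MIMO-style usage: several frames in, features for every frame out.
        block = SeparableConv3d(3, 16)
        clip = torch.randn(1, 3, 8, 112, 112)   # a clip of 8 RGB frames
        features = block(clip)                  # (1, 16, 8, 112, 112)

    For a 3x3x3 kernel, the factorization replaces roughly in_ch*out_ch*27 multiplications per output location with in_ch*27 + in_ch*out_ch, which is where the reported FLOP and model-size savings come from.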

    Universal Foreground Segmentation Based on Deep Feature Fusion Network for Multi-Scene Videos

    Foreground/background (fg/bg) classification is an important first step for several video analysis tasks such as people counting, activity recognition, and anomaly detection. As with several other computer vision problems, the advent of deep convolutional neural network (CNN) methods has led to major improvements in this field. However, despite their success, CNN-based methods have difficulty coping with multi-scene videos, where the scene changes multiple times along the time sequence. In this paper, we propose a foreground segmentation method based on a deep feature fusion network (DFFnetSeg) that is more robust to scene changes and unseen scenes than competitive state-of-the-art methods. At the heart of DFFnetSeg lies a fusion network that takes as input deep features extracted from a current frame, a previous frame, and a reference frame, and produces as output a segmentation mask separating background from foreground objects. We show the advantages of using a fusion network and the three-frame group in dealing with the unseen-scene and bootstrap challenges. In addition, we show that a simple reference-frame updating strategy makes DFFnetSeg robust to sudden scene changes inside video sequences, and we present a motion-map-based post-processing method that further reduces false positives. Experimental results on the test dataset generated from CDnet2014 and LASIESTA demonstrate the advantages of the DFFnetSeg method.
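    The abstract fixes the inputs (features of a current, previous, and reference frame) but not the fusion architecture, so the PyTorch sketch below is only an illustrative reading: the three feature maps are concatenated along the channel axis and fused by a small convolutional head into a per-pixel foreground probability. All layer sizes and names are assumptions, not the paper's network.

        import torch
        import torch.nn as nn

        class FusionHead(nn.Module):
            """Illustrative fusion head over three per-frame feature maps."""
            def __init__(self, feat_ch=64):
                super().__init__()
                self.fuse = nn.Sequential(
                    nn.Conv2d(3 * feat_ch, feat_ch, 3, padding=1),
                    nn.ReLU(inplace=True),
                    nn.Conv2d(feat_ch, 1, kernel_size=1),
                    nn.Sigmoid(),
                )

            def forward(self, f_cur, f_prev, f_ref):
                # Each input: (batch, feat_ch, H, W) features of one frame.
                return self.fuse(torch.cat([f_cur, f_prev, f_ref], dim=1))

        head = FusionHead(feat_ch=64)
        f = lambda: torch.randn(1, 64, 60, 80)  # stand-in backbone features
        mask = head(f(), f(), f())              # (1, 1, 60, 80) fg probability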

    Video foreground segmentation with deep learning

    This thesis tackles the problem of foreground segmentation in videos, even under extremely challenging conditions. The task comes with a plethora of hurdles, as the model needs to distinguish moving objects from irrelevant background motion caused by weather, illumination, camera movement, etc. As foreground segmentation is often the first step of various highly important applications (video surveillance for security, patient/infant monitoring, etc.), it is crucial to develop a model capable of producing excellent results under all kinds of conditions. To tackle this problem, we follow the recent trend in other computer vision areas and harness the power of deep learning, designing convolutional neural network architectures specifically targeted at the aforementioned challenges. We first propose a 3D CNN that models the spatial and temporal information of the scene simultaneously. The network is deep enough to successfully cover more than 50 different scenes of various conditions with no need for any fine-tuning; these conditions include illumination (day or night), weather (sunny, rainy, or snowing), background movement (trees swaying in the wind, fountains, etc.), and others. Next, we propose a data augmentation method specifically targeted at illumination changes and show that artificially augmenting the data set with it significantly improves segmentation results, even when testing under sudden illumination changes. We also present a post-processing method that exploits the temporal information of the input video. Finally, we propose a deep learning model that learns the illumination of the scene and performs foreground segmentation simultaneously.
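    The abstract does not spell out the augmentation itself; a common way to simulate illumination changes, sketched below under that assumption, is to apply random global gain and gamma shifts to training frames while leaving the ground-truth masks untouched, since brightness changes do not move the foreground objects.

        import numpy as np

        def augment_illumination(frame, rng=np.random):
            """Hypothetical illumination augmentation: random gain and gamma.

            frame: uint8 array (H, W, 3). Returns a photometrically perturbed
            copy; the segmentation ground truth stays unchanged.
            """
            gain = rng.uniform(0.5, 1.5)    # global brightness scale
            gamma = rng.uniform(0.7, 1.4)   # nonlinear tone shift
            x = frame.astype(np.float32) / 255.0
            x = np.clip(gain * np.power(x, gamma), 0.0, 1.0)
            return (x * 255.0).astype(np.uint8)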

    Data-driven Speech Enhancement: from Non-negative Matrix Factorization to Deep Representation Learning

    Moving Object Detection based on RGBD Information

    This thesis targets the topic of moving object detection, more specifically background subtraction. We propose two approaches that use color and depth information to solve background subtraction; the following two paragraphs briefly summarize each. First, we propose a framework for improving traditional background subtraction techniques. The framework operates on two data types, color and depth: preliminary background segmentation results are obtained from the depth and RGB channels independently and then fused by an algorithm to produce the final result. Experiments on the SBM-RGBD dataset using four methods (ViBe, LOBSTER, SuBSENSE, and PAWCS) show that the proposed framework achieves impressive performance compared with the original RGB-based techniques from the state of the art. Second, the dissertation proposes a novel deep learning model called Deep Multi-Scale Network (DMSN) for background subtraction. This convolutional neural network uses RGB color channels and depth maps as inputs, fusing semantic and spatial information. Compared with previous deep learning background subtraction techniques, which lack information because they use only RGB channels, our RGBD version overcomes most of their drawbacks, especially on certain challenges. Further, this study introduces a new scene-independent evaluation protocol for the SBM-RGBD dataset, dedicated to deep learning methods, to set up a competitive platform that includes more challenging situations. The proposed methods proved efficient at solving background subtraction in complex problems at different levels, and the experimental results verify that the proposed work outperforms the state of the art on the SBM-RGBD and GSM datasets.
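    The abstract names the fusion step of the first approach without detailing it. One simple reading, sketched below as an assumption rather than the thesis' actual rule, is to OR the two cues where the depth measurement is valid (depth is robust to color camouflage and shadows) and fall back to the RGB mask where depth is missing.

        import numpy as np

        def fuse_masks(mask_rgb, mask_depth, depth_valid):
            """Hypothetical RGB/Depth fusion of two binary foreground masks.

            mask_rgb, mask_depth: boolean (H, W) masks from two independent
            background subtractors (e.g. ViBe run on RGB and on depth).
            depth_valid: boolean (H, W), False where the sensor gave no depth.
            """
            # Depth adds detections that color misses where it is reliable;
            # elsewhere (reflective or distant surfaces) trust RGB alone.
            return np.where(depth_valid, mask_rgb | mask_depth, mask_rgb)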