
    CVABS: Moving Object Segmentation with Common Vector Approach for Videos

    Background modelling is a fundamental step for several real-time computer vision applications, such as security and monitoring systems. An accurate background model helps detect the activity of moving objects in the video. In this work, we have developed a new subspace-based background modelling algorithm using the concept of the Common Vector Approach with Gram-Schmidt orthogonalization. Once the background model capturing the common characteristics of different views of the same scene is acquired, a smart foreground detection and background updating procedure is applied based on dynamic control parameters. A variety of experiments are conducted on different problem types related to dynamic backgrounds. Several types of metrics are utilized as objective measures, and the obtained visual results are judged subjectively. It was observed that the proposed method performs successfully on all problem types reported in the CDNet2014 dataset by updating the background frames with a self-learning feedback mechanism.
    Comment: 12 pages, 4 figures, 1 table
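
    The following is a minimal NumPy sketch of the common-vector idea described above: orthonormalize the difference vectors between frames with Gram-Schmidt, then remove from a reference frame its projection onto that difference subspace. Function and parameter names are illustrative, and the paper's dynamic control parameters and feedback update are not reproduced.

```python
import numpy as np

def common_vector(frames, eps=1e-8):
    """Estimate a common (background) vector from flattened frames of one scene.

    frames: iterable of 1-D arrays of equal length (flattened grayscale frames).
    Returns the component of the reference frame orthogonal to the
    difference subspace, i.e. a rough background estimate.
    """
    X = np.asarray(frames, dtype=np.float64)      # shape: (n_frames, n_pixels)
    ref = X[0]
    diffs = X[1:] - ref                           # generators of the difference subspace
    basis = []
    for d in diffs:                               # classical Gram-Schmidt
        for b in basis:
            d = d - np.dot(d, b) * b
        norm = np.linalg.norm(d)
        if norm > eps:
            basis.append(d / norm)
    common = ref.copy()
    for b in basis:                               # project out the difference subspace
        common -= np.dot(common, b) * b
    return common

# Usage sketch:
# frames = [f.ravel().astype(np.float64) for f in grayscale_frames]
# background = common_vector(frames).reshape(height, width)
```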

    Universal Foreground Segmentation Based on Deep Feature Fusion Network for Multi-Scene Videos

    Foreground/background (fg/bg) classification is an important first step for several video analysis tasks such as people counting, activity recognition and anomaly detection. As is the case for several other computer vision problems, the advent of deep Convolutional Neural Network (CNN) methods has led to major improvements in this field. However, despite their success, CNN-based methods have difficulty coping with multi-scene videos where the scene changes multiple times along the time sequence. In this paper, we propose a deep feature fusion network based foreground segmentation method (DFFnetSeg), which is robust to both scene changes and unseen scenes compared with competitive state-of-the-art methods. At the heart of DFFnetSeg lies a fusion network that takes as input deep features extracted from a current frame, a previous frame, and a reference frame, and produces as output a segmentation mask separating foreground objects from the background. We show the advantages of using a fusion network and the three-frame group in dealing with the unseen-scene and bootstrap challenges. In addition, we show that a simple reference-frame updating strategy makes DFFnetSeg robust to sudden scene changes inside video sequences, and we introduce a motion-map-based post-processing method which further reduces false positives. Experimental results on test datasets generated from CDnet2014 and LASIESTA demonstrate the advantages of the DFFnetSeg method.
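
    A toy PyTorch stand-in for the fusion idea is sketched below: deep features from the current, previous and reference frames are concatenated and decoded into a two-class fg/bg mask. Channel counts and decoder depth are assumptions, not the DFFnetSeg specification.

```python
import torch
import torch.nn as nn

class FusionSeg(nn.Module):
    """Fuse per-frame deep features and predict FG/BG logits per pixel."""
    def __init__(self, feat_channels=256):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Conv2d(3 * feat_channels, 256, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(256, 64, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(64, 2, kernel_size=1),      # logits: background / foreground
        )

    def forward(self, f_cur, f_prev, f_ref):
        x = torch.cat([f_cur, f_prev, f_ref], dim=1)
        return self.fuse(x)                       # upsample to frame size as needed

# Usage sketch with feature maps from a shared backbone stage:
# logits = FusionSeg()(feat_t, feat_t_minus_1, feat_ref)
# mask = logits.argmax(dim=1)
```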

    Hierarchical improvement of foreground segmentation masks in background subtraction

    A plethora of algorithms have been defined for foreground segmentation, a fundamental stage for many computer vision applications. In this work, we propose a post-processing framework to improve the foreground segmentation performance of background subtraction algorithms. We define a hierarchical framework for extending segmented foreground pixels to undetected foreground object areas and for removing erroneously segmented foreground. Firstly, we create a motion-aware hierarchical image segmentation of each frame that prevents merging foreground and background image regions. Then, we estimate the quality of the foreground mask through the fitness of the binary regions in the mask and the hierarchy of segmented regions. Finally, the improved foreground mask is obtained as an optimal labeling by jointly exploiting foreground quality and spatial color relations in a pixel-wise fully-connected Conditional Random Field. Experiments are conducted over four large and heterogeneous datasets with varied challenges (CDNET2014, LASIESTA, SABS and BMC), demonstrating the capability of the proposed framework to improve background subtraction results.
    This work was partially supported by the Spanish Government (HAVideo, TEC2014-53176-R).
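
    For the final labeling step, a common way to realize a pixel-wise fully-connected CRF is with the pydensecrf package; the sketch below refines a soft foreground mask with color-dependent pairwise terms, assuming that package is available. The paper's hierarchical fitness scoring, which would supply the unary term, is not reproduced here, and the kernel parameters are illustrative.

```python
import numpy as np
import pydensecrf.densecrf as dcrf
from pydensecrf.utils import unary_from_softmax

def refine_mask(rgb, fg_prob, iters=5):
    """Refine a soft foreground mask with a fully-connected CRF.

    rgb: HxWx3 uint8 frame; fg_prob: HxW float foreground probabilities.
    Returns a binary mask (1 = foreground).
    """
    h, w = fg_prob.shape
    probs = np.stack([1.0 - fg_prob, fg_prob]).astype(np.float32)   # (2, h, w)
    crf = dcrf.DenseCRF2D(w, h, 2)
    crf.setUnaryEnergy(unary_from_softmax(probs))
    crf.addPairwiseGaussian(sxy=3, compat=3)                        # smoothness term
    crf.addPairwiseBilateral(sxy=50, srgb=13,
                             rgbim=np.ascontiguousarray(rgb),       # color-dependent term
                             compat=10)
    q = np.array(crf.inference(iters)).reshape(2, h, w)
    return q.argmax(axis=0).astype(np.uint8)
```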

    Background Subtraction with Real-time Semantic Segmentation

    Accurate and fast foreground object extraction is very important for object tracking and recognition in video surveillance. Although many background subtraction (BGS) methods have been proposed in the recent past, it is still regarded as a tough problem due to the variety of challenging situations that occur in real-world scenarios. In this paper, we explore this problem from a new perspective and propose a novel background subtraction framework with real-time semantic segmentation (RTSS). Our proposed framework consists of two components, a traditional BGS segmenter $\mathcal{B}$ and a real-time semantic segmenter $\mathcal{S}$. The BGS segmenter $\mathcal{B}$ aims to construct background models and segment foreground objects. The real-time semantic segmenter $\mathcal{S}$ is used to refine the foreground segmentation outputs as feedback for improving the model updating accuracy. $\mathcal{B}$ and $\mathcal{S}$ work in parallel on two threads. For each input frame $I_t$, the BGS segmenter $\mathcal{B}$ computes a preliminary foreground/background (FG/BG) mask $B_t$. At the same time, the real-time semantic segmenter $\mathcal{S}$ extracts the object-level semantics $S_t$. Then, some specific rules are applied on $B_t$ and $S_t$ to generate the final detection $D_t$. Finally, the refined FG/BG mask $D_t$ is fed back to update the background model. Comprehensive experiments evaluated on the CDnet 2014 dataset demonstrate that our proposed method achieves state-of-the-art performance among all unsupervised background subtraction methods while operating in real time, and even performs better than some deep-learning-based supervised algorithms. In addition, our proposed framework is very flexible and has the potential for generalization.
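
    A rough sketch of one RTSS-style combination step is shown below, assuming OpenCV's MOG2 as the BGS segmenter $\mathcal{B}$ and a placeholder semantic model for $\mathcal{S}$. The thresholds and rules are assumptions for illustration; the paper defines its own specific rules and feedback mechanism.

```python
import cv2
import numpy as np

# Hypothetical stand-in for the semantic segmenter S: any real-time model that
# returns per-pixel probabilities of foreground-relevant classes (person, car, ...).
def semantic_foreground_prob(frame):
    raise NotImplementedError  # plug in a real-time semantic segmenter here

bgs = cv2.createBackgroundSubtractorMOG2(detectShadows=True)   # BGS segmenter B

def rtss_step(frame, tau_fg=0.8, tau_bg=0.2):
    """Combine the BGS mask B_t with semantic probabilities S_t into D_t."""
    b_t = bgs.apply(frame)                        # 255 = fg, 127 = shadow, 0 = bg
    s_t = semantic_foreground_prob(frame)         # float map in [0, 1]
    d_t = (b_t == 255).astype(np.uint8)
    d_t[s_t >= tau_fg] = 1                        # semantics strongly indicate foreground
    d_t[s_t <= tau_bg] = 0                        # semantics strongly indicate background
    return d_t * 255                              # D_t; the paper feeds this back to the model
```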

    Detection in Aerial Images Using Spatial Transformer Networks

    Many tasks in the field of computer vision rely on an underlying change detection algorithm in images or video sequences. Although much research has focused on change detection in consumer images, there is little work related to change detection in aerial imagery, where individual images are recorded from aerial platforms over time. This thesis presents two deep learning approaches for detection in aerial images. Both systems leverage Spatial Transformer Networks (STNs), which identify a coordinate transformation, for their localization capabilities. The first approach is semi-supervised and learns to locate changes within a difference image. The second is a fully supervised approach which learns to locate and discriminate relevant targets. The supervised approach is shown to locate nearly 78% of positive samples with an Intersection over Union (IoU) criterion of over 0.5, and nearly 94% of positive samples with an IoU over 0.3.
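
    For reference, a minimal PyTorch Spatial Transformer block is sketched below: a small localization network regresses a 2x3 affine transform, which then warps the input feature map. Layer sizes are illustrative and do not reproduce the thesis architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpatialTransformer(nn.Module):
    """Minimal STN block: regress an affine transform and warp the input."""
    def __init__(self, in_channels):
        super().__init__()
        self.loc = nn.Sequential(
            nn.Conv2d(in_channels, 16, kernel_size=7), nn.ReLU(True),
            nn.AdaptiveAvgPool2d(8),
            nn.Flatten(),
            nn.Linear(16 * 8 * 8, 32), nn.ReLU(True),
            nn.Linear(32, 6),
        )
        # Initialize the regressor to the identity transform for stable training.
        self.loc[-1].weight.data.zero_()
        self.loc[-1].bias.data.copy_(
            torch.tensor([1, 0, 0, 0, 1, 0], dtype=torch.float))

    def forward(self, x):
        theta = self.loc(x).view(-1, 2, 3)
        grid = F.affine_grid(theta, x.size(), align_corners=False)
        return F.grid_sample(x, grid, align_corners=False)
```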

    ROBUST BACKGROUND SUBTRACTION FOR MOVING CAMERAS AND THEIR APPLICATIONS IN EGO-VISION SYSTEMS

    Background subtraction is the algorithmic process that segments out the region of interest, often known as the foreground, from the background. Extensive literature and numerous algorithms exist in this domain, but most research has focused on videos captured by static cameras. The proliferation of portable platforms equipped with cameras has resulted in a large amount of video data being generated from moving cameras. This motivates the need for foundational algorithms for foreground/background segmentation in videos from moving cameras. In this dissertation, I propose three new types of background subtraction algorithms for moving cameras based on appearance, motion, and a combination of the two. Comprehensive evaluation of the proposed approaches on publicly available test sequences shows the superiority of our system over state-of-the-art algorithms. The first method is an appearance-based global modeling of foreground and background. Features are extracted by sliding a fixed-size window over the entire image without any spatial constraint, to accommodate arbitrary camera movements. A supervised learning method is then used to build foreground and background models. This method is suitable for limited-scene scenarios such as Pan-Tilt-Zoom surveillance cameras. The second method relies on motion. It comprises an innovative background motion approximation mechanism followed by spatial regulation through a Mega-Pixel denoising process. This work does not need to maintain any costly appearance models and is therefore appropriate for resource-constrained ego-vision systems. The proposed segmentation, combined with skin cues, is validated by a novel application on authenticating hand-gestured signatures captured by wearable cameras. The third method combines both motion and appearance. Foreground probabilities are jointly estimated by motion and appearance. After the Mega-Pixel denoising process, the probability estimates and the gradient image are combined by Graph-Cut to produce the segmentation mask. This method is universal, as it can handle all types of moving cameras.
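
    The first, appearance-based method's sliding-window feature extraction with a supervised classifier can be sketched as follows. The window size, stride, raw-pixel features, and choice of a random forest are assumptions for illustration, not the dissertation's actual features or classifier.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def window_features(gray, win=16, stride=16):
    """Slide a fixed-size window over the frame (no spatial constraint) and
    return one flattened patch per position, plus the window origins."""
    feats, pos = [], []
    h, w = gray.shape
    for y in range(0, h - win + 1, stride):
        for x in range(0, w - win + 1, stride):
            feats.append(gray[y:y + win, x:x + win].ravel())
            pos.append((y, x))
    return np.asarray(feats, dtype=np.float32), pos

# Illustrative supervised fg/bg model built from labelled training patches
# (X_train, y_train with y in {0: background, 1: foreground}).
clf = RandomForestClassifier(n_estimators=100)
# clf.fit(X_train, y_train)
# feats, pos = window_features(gray_frame)
# labels = clf.predict(feats)   # per-window fg/bg decision
```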