    High-quality region-based foreground segmentation using a spatial grid of SVM classifiers

    This paper presents a novel background modeling system that uses a spatial grid of Support Vector Machine classifiers for segmenting moving objects, a key step in many video-based consumer applications. The system can adapt to a wide range of dynamic background situations since no parametric model or statistical distribution is assumed. This is achieved by using a different classifier per image region that learns the specific appearance of that scene region and its variations (illumination changes, dynamic backgrounds, etc.). The proposed system has been tested on a recent public database, outperforming other state-of-the-art algorithms.
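
    As a rough illustration of the grid-of-classifiers idea, the sketch below trains one SVM per image cell on simple intensity statistics. This is a minimal sketch assuming scikit-learn and NumPy; the grid size, the mean/std features, and the training protocol are illustrative choices, not the paper's actual configuration.

```python
import numpy as np
from sklearn.svm import SVC

GRID = 8  # split each frame into an 8x8 grid of cells (illustrative)

def region_features(frame, r, c, grid=GRID):
    """Simple per-cell feature vector: mean and std of intensity."""
    h, w = frame.shape[:2]
    cell = frame[r * h // grid:(r + 1) * h // grid,
                 c * w // grid:(c + 1) * w // grid]
    return np.array([cell.mean(), cell.std()])

def train_grid(bg_frames, fg_frames):
    """Train one SVM per cell; bg_frames/fg_frames are frames whose cells
    are known to show background or foreground, respectively."""
    classifiers = {}
    for r in range(GRID):
        for c in range(GRID):
            X = [region_features(f, r, c) for f in bg_frames + fg_frames]
            y = [0] * len(bg_frames) + [1] * len(fg_frames)
            clf = SVC(kernel="rbf", gamma="scale")
            clf.fit(X, y)
            classifiers[(r, c)] = clf
    return classifiers

def segment(frame, classifiers):
    """Label each cell foreground (1) or background (0)."""
    mask = np.zeros((GRID, GRID), dtype=np.uint8)
    for (r, c), clf in classifiers.items():
        mask[r, c] = clf.predict([region_features(frame, r, c)])[0]
    return mask
```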

    A Fusion Framework for Camouflaged Moving Foreground Detection in the Wavelet Domain

    Detecting camouflaged moving foreground objects has been known to be difficult due to the similarity between the foreground objects and the background. Conventional methods cannot distinguish the foreground from the background due to the small differences between them and thus suffer from under-detection of the camouflaged foreground objects. In this paper, we present a fusion framework to address this problem in the wavelet domain. We first show that the small differences in the image domain can be highlighted in certain wavelet bands. Then the likelihood of each wavelet coefficient being foreground is estimated by formulating foreground and background models for each wavelet band. The proposed framework effectively aggregates the likelihoods from different wavelet bands based on the characteristics of the wavelet transform. Experimental results demonstrated that the proposed method significantly outperformed existing methods in detecting camouflaged foreground objects. Specifically, the average F-measure for the proposed algorithm was 0.87, compared to 0.71 to 0.80 for the other state-of-the-art methods.
    Comment: 13 pages, accepted by IEEE TI
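
    To make the wavelet-band idea concrete, the sketch below decomposes the frame and a background image, takes per-band differences, and fuses them into one score map. It is a minimal sketch assuming PyWavelets; the uniform fusion weights and the simple difference-based likelihood are illustrative stand-ins for the paper's per-band models.

```python
import numpy as np
import pywt

def band_differences(frame, background, wavelet="haar", level=2):
    """Decompose both images and return absolute per-band difference maps."""
    f = pywt.wavedec2(frame.astype(float), wavelet, level=level)
    b = pywt.wavedec2(background.astype(float), wavelet, level=level)
    diffs = [np.abs(f[0] - b[0])]  # approximation band
    for fd, bd in zip(f[1:], b[1:]):  # (horizontal, vertical, diagonal) details
        diffs.extend(np.abs(x - y) for x, y in zip(fd, bd))
    return diffs

def fused_foreground_score(frame, background):
    """Aggregate per-band difference maps into one score map in [0, 1]."""
    shape = frame.shape
    score = np.zeros(shape, dtype=float)
    for d in band_differences(frame, background):
        # Nearest-neighbor upsample each band map to frame size, then
        # accumulate with uniform weights (the paper instead derives its
        # weights from the characteristics of the wavelet transform).
        reps = (int(np.ceil(shape[0] / d.shape[0])),
                int(np.ceil(shape[1] / d.shape[1])))
        up = np.kron(d, np.ones(reps))[:shape[0], :shape[1]]
        score += up / max(up.max(), 1e-12)
    return score / max(score.max(), 1e-12)
```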

    Foreground segmentation in depth imagery using depth and spatial dynamic models for video surveillance applications

    Low-cost systems that can obtain a high-quality foreground segmentation almost independently of the existing illumination conditions for indoor environments are very desirable, especially for security and surveillance applications. In this paper, a novel foreground segmentation algorithm that uses only a Kinect depth sensor is proposed to satisfy the aforementioned system characteristics. This is achieved by combining a mixture of Gaussians-based background subtraction algorithm with a new Bayesian network that robustly predicts the foreground/background regions between consecutive time steps. The Bayesian network explicitly exploits the intrinsic characteristics of the depth data by means of two dynamic models that estimate the spatial and depth evolution of the foreground/background regions. The most remarkable contribution is the depth-based dynamic model that predicts the changes in the foreground depth distribution between consecutive time steps. This is a key difference with regard to visible imagery, where the color/gray distribution of the foreground is typically assumed to be constant. Experiments carried out on two different depth-based databases demonstrate that the proposed combination of algorithms is able to obtain a more accurate segmentation of the foreground/background than other state-of-the-art approaches.
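
    The sketch below shows only the mixture-of-Gaussians stage applied to depth frames, assuming OpenCV; the paper's Bayesian-network prediction step and dynamic models are not reproduced, and the depth range and subtractor parameters are illustrative.

```python
import cv2

# MOG2 models a per-pixel mixture of Gaussians over the incoming values.
subtractor = cv2.createBackgroundSubtractorMOG2(history=500, varThreshold=16,
                                                detectShadows=False)

def segment_depth(depth_frame_mm):
    """Binary foreground mask from a 16-bit Kinect depth frame (millimeters)."""
    # Scale the ~0-4096 mm working range into 8 bits for the subtractor.
    depth8 = cv2.convertScaleAbs(depth_frame_mm, alpha=255.0 / 4096.0)
    mask = subtractor.apply(depth8)
    # Kinect reports invalid measurements as zero; exclude them from the mask.
    mask[depth_frame_mm == 0] = 0
    return mask
```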

    Background modeling using adaptive pixelwise kernel variances in a hybrid feature space

    Recent work on background subtraction has shown developments on two major fronts. In one, there has been increasing sophistication of probabilistic models, from mixtures of Gaussians at each pixel [7], to kernel density estimates at each pixel [1], and more recently to joint domain-range density estimates that incorporate spatial information [6]. Another line of work has shown the benefits of increasingly complex feature representations, including the use of texture information, local binary patterns, and recently scale-invariant local ternary patterns [4]. In this work, we use joint domain-range based estimates for background and foreground scores and show that dynamically choosing kernel variances in our kernel estimates at each individual pixel can significantly improve results. We give a heuristic method for selectively applying the adaptive kernel calculations which is nearly as accurate as the full procedure but runs much faster. We combine these modeling improvements with recently developed complex features [4] and show significant improvements on a standard backgrounding benchmark.
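
    The sketch below illustrates per-pixel kernel density scoring with an adaptive bandwidth, assuming NumPy. The adaptation rule (a robust spread estimate of the pixel's recent history) is an illustrative stand-in for the paper's variance-selection heuristic, and the joint domain-range and complex-feature components are omitted.

```python
import numpy as np

def background_score(pixel_value, history, min_sigma=2.0):
    """KDE background score for one pixel from its recent intensity history."""
    # Adaptive bandwidth from the median absolute deviation of the history,
    # so pixels over dynamic background automatically get wider kernels.
    mad = np.median(np.abs(history - np.median(history)))
    sigma = max(min_sigma, 1.4826 * mad)
    kernels = np.exp(-0.5 * ((pixel_value - history) / sigma) ** 2)
    return kernels.mean() / (sigma * np.sqrt(2.0 * np.pi))

# Usage: flag a pixel as foreground when its background score is low.
history = np.array([112.0, 115.0, 110.0, 118.0, 113.0])
is_foreground = background_score(140.0, history) < 1e-3
```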

    Interest Detection in Image, Video and Multiple Videos: Model and Applications

    Interest detection is detecting an object, event, or process that draws attention. In this dissertation, we focus on interest detection in images, video, and multiple videos. Interest detection in an image or a video is closely related to visual attention. However, interest detection in multiple videos needs to consider all the videos as a whole rather than considering the attention in each single video independently. Visual attention is an important mechanism of human vision. The computational model of visual attention has recently attracted a lot of interest in the computer vision community, mainly because it helps find the objects or regions that efficiently represent a scene and thus aids in solving complex vision problems such as scene understanding.

    In this dissertation, we first introduce a new computational visual-attention model for detecting regions of interest in static images and/or videos. This model constructs the saliency map for each image and takes the region with the highest saliency value as the region of interest. Specifically, we use the Earth Mover's Distance (EMD) to measure the center-surround difference in the receptive field. Furthermore, we propose to take two steps of biologically inspired nonlinear operations for combining different features: combining subsets of basic features into a set of super features using the Lm-norm, and then combining the super features using the Winner-Take-All mechanism. Then, we extend the proposed model to construct dynamic saliency maps from videos by computing the center-surround difference in the spatio-temporal receptive field.

    Motivated by the natural relation between visual saliency and objects/regions of interest, we then propose an algorithm to isolate infrequently moving foreground from background with frequent local motions, in which the saliency detection technique is used to identify the foreground (object/region of interest) and the background. Traditional motion detection usually assumes that the background is static while the foreground objects are moving most of the time. However, in practice, especially in surveillance, the foreground objects may show infrequent motion. For example, a person may stand in the same place for most of the time. Meanwhile, the background may contain frequent local motions, such as trees and/or grass waving in the breeze. Such complexities may prevent existing background subtraction algorithms from correctly identifying the foreground objects. In this dissertation, we propose a background subtraction approach that can detect foreground objects with frequent and/or infrequent motions.

    Finally, we focus on the task of locating the co-interest person from multiple temporally synchronized videos taken by multiple wearable cameras. More specifically, we propose a co-interest detection algorithm that can find persons that draw attention from most camera wearers, even if multiple similar-appearance persons are present in the videos. Our basic idea is to exploit the motion pattern, location, and size of persons detected in the different synchronized videos and use them to correlate the detected persons across videos: one person in a video may be the same person in another video at the same time. We utilize a Conditional Random Field (CRF) to achieve this goal, taking each frame as a node and the detected persons as the states at each node. We collect three sets of wearable-camera videos to test the proposed algorithm, where each set consists of six temporally synchronized videos.
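
    The sketch below illustrates only the EMD-based center-surround step, assuming SciPy and NumPy; the window sizes and histogram binning are illustrative, and the super-feature combination, spatio-temporal extension, and CRF stages are not reproduced.

```python
import numpy as np
from scipy.stats import wasserstein_distance

def center_surround_emd(image, y, x, center=8, surround=24, bins=32):
    """EMD between center-patch and surround-patch intensity histograms at (y, x)."""
    def hist(patch):
        h, _ = np.histogram(patch, bins=bins, range=(0, 255), density=True)
        return h
    c = image[max(0, y - center):y + center, max(0, x - center):x + center]
    s = image[max(0, y - surround):y + surround, max(0, x - surround):x + surround]
    support = np.arange(bins)  # histogram bins act as the 1-D ground metric
    # A large distance means the center patch stands out from its surround.
    return wasserstein_distance(support, support, hist(c), hist(s))
```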