663 research outputs found

    Foreground-Background Segmentation Based on Codebook and Edge Detector

    Background modeling techniques are used for moving object detection in video. Many object detection algorithms exist, each serving a different purpose. In this paper, we propose an improvement to moving object detection based on codebook segmentation. We combine the original codebook algorithm with an edge detection algorithm. Our goal is to demonstrate the benefit of using an edge detection algorithm together with a background modeling algorithm. Throughout our study, we compare the quality of moving object detection when the codebook segmentation algorithm is combined with several standard edge detectors. In each case, we use frame-based metrics to evaluate the detection. The different results are presented and analyzed. Comment: to appear in the 10th International Conference on Signal Image Technology & Internet Based Systems, 201
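The abstract above does not spell out the codebook model, so the following is only a minimal sketch of the general idea, under loud assumptions: grayscale pixels, codewords reduced to a plain `[low, high]` intensity interval, and a hypothetical `tolerance` parameter. The published codebook algorithm uses colour distortion and brightness bounds, and the coupling with an edge detector is not reproduced here. The names `PixelCodebook`, `build_model` and `segment` are illustrative, not taken from the paper.

```python
class PixelCodebook:
    """Simplified per-pixel codebook: a list of [low, high] intensity intervals."""

    def __init__(self, tolerance=10):
        self.tolerance = tolerance  # hypothetical matching margin
        self.codewords = []

    def _find(self, value):
        # Return the first codeword whose widened interval contains the value.
        for cw in self.codewords:
            if cw[0] - self.tolerance <= value <= cw[1] + self.tolerance:
                return cw
        return None

    def train(self, value):
        cw = self._find(value)
        if cw is None:
            self.codewords.append([value, value])  # new background mode
        else:
            cw[0] = min(cw[0], value)  # stretch the matched interval
            cw[1] = max(cw[1], value)

    def is_foreground(self, value):
        return self._find(value) is None


def build_model(frames, tolerance=10):
    """Train one codebook per pixel from a list of grayscale frames."""
    height, width = len(frames[0]), len(frames[0][0])
    model = [[PixelCodebook(tolerance) for _ in range(width)]
             for _ in range(height)]
    for frame in frames:
        for y in range(height):
            for x in range(width):
                model[y][x].train(frame[y][x])
    return model


def segment(model, frame):
    """Return a binary mask: 1 where no codeword matches (foreground)."""
    return [[1 if model[y][x].is_foreground(v) else 0
             for x, v in enumerate(row)]
            for y, row in enumerate(frame)]
```

A pixel that alternates between two background values (for example a flickering light) ends up with two codewords, which is the main advantage of a codebook over a single-mode background model.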

    Human shape modelling for carried object detection and segmentation

    Detecting carried objects is one of the requirements for developing systems that reason about activities involving people and objects. This thesis presents novel methods to detect and segment carried objects in surveillance videos. The contributions are divided into three main chapters. In the first, we introduce our carried object detector, which can detect a generic class of objects. We formulate carried object detection as a contour classification problem. We classify moving object contours into two classes: carried object and person. A probability mask for a person's contours is generated based on an ensemble of contour exemplars (ECE) of walking or standing humans seen from different viewing directions. Contours that do not fall within the generated hypothesis mask are considered candidates for carried object contours. Then, a region is assigned to each carried object candidate contour using Biased Normalized Cut (BNC) with a probability obtained by a weighted function of its overlap with the person's contour hypothesis mask and the segmented foreground. Finally, carried objects are detected by applying a Non-Maximum Suppression (NMS) method, which eliminates the low-scoring carried object candidates. The second contribution presents an approach to detect carried objects with an innovative method for extracting features from foreground regions based on their local contours and superpixel information. Initially, a moving object in a video frame is segmented into multi-scale superpixels. Then human-like regions in the foreground area are identified by matching a set of features extracted from superpixels against a codebook of local shapes. Here the definition of human-like regions is equivalent to a person's probability map in our first proposed method (ECE). 
Our second carried object detector benefits from the novel feature descriptor to produce a more accurate probability map. The complement of the matching probabilities of superpixels to human-like regions in the foreground is taken as a carried object probability map. In the end, each group of neighboring superpixels that has a high carried object probability and strong edge support is merged to form a carried object. Finally, in the third contribution, we present a method to detect and segment carried objects. The proposed method adopts the new superpixel-based descriptor to identify carried object-like candidate regions using human shape modeling. Using spatio-temporal information about the candidate regions, the consistency of recurring carried object candidates viewed over time is obtained and serves to detect carried objects. Last, the detected carried object regions are refined by integrating information about their appearance and location over time with a spatio-temporal extension of GrabCut. This final stage is used to accurately segment carried objects in frames. Our methods are fully automatic and make minimal assumptions about the person, the carried objects and the videos. We evaluate the aforementioned methods using two available datasets, PETS 2006 and i-Lids AVSS. We compare our detector and segmentation methods against a state-of-the-art detector. Experimental evaluation on the two datasets demonstrates that both our carried object detection and segmentation methods significantly outperform competing algorithms.
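The thesis abstract mentions Non-Maximum Suppression (NMS) for discarding low-scoring candidates but gives no details. As an illustration only, here is a sketch of the common greedy, box-based form of NMS. It assumes candidates are axis-aligned boxes `(x1, y1, x2, y2)` with scalar scores; the thesis works with contours and regions, so this is not its exact procedure, and `iou_threshold` and `min_score` are hypothetical parameters.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(boxes: Sequence[Box], scores: Sequence[float],
                        iou_threshold: float = 0.5,
                        min_score: float = 0.0) -> List[int]:
    """Return indices of kept boxes, highest score first."""
    # Drop candidates below the score floor, then visit the rest best-first.
    order = sorted((i for i, s in enumerate(scores) if s >= min_score),
                   key=lambda i: scores[i], reverse=True)
    kept: List[int] = []
    for i in order:
        # Keep a box only if it does not overlap a better box too strongly.
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in kept):
            kept.append(i)
    return kept
```

With two heavily overlapping candidates, only the higher-scoring one survives, while a distant candidate is kept regardless of its rank.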

    Video foreground extraction for mobile camera platforms

    Foreground object detection is a fundamental task in computer vision with many applications in areas such as object tracking, event identification, and behavior analysis. Most conventional foreground object detection methods work only in stable illumination environments using fixed cameras. In real-world applications, however, the algorithm often needs to operate under the following challenging conditions: drastic lighting changes, object shape complexity, moving cameras, low frame capture rates, and low resolution images. This thesis presents four novel approaches for foreground object detection on real-world datasets using cameras deployed on moving vehicles. The first problem addresses passenger detection and tracking tasks for public transport buses, investigating the problem of changing illumination conditions and low frame capture rates. Our approach integrates a stable SIFT (Scale Invariant Feature Transform) background seat modelling method with a human shape model into a weighted Bayesian framework to detect passengers. To deal with the problem of tracking multiple targets, we employ the Reversible Jump Monte Carlo Markov Chain tracking algorithm. Using the SVM classifier, the appearance transformation models capture changes in the appearance of the foreground objects across two consecutive frames under low frame rate conditions. In the second problem, we present a system for pedestrian detection involving scenes captured by a mobile bus surveillance system. It integrates scene localization, foreground-background separation, and pedestrian detection modules into a unified detection framework. The scene localization module performs a two-stage clustering of the video data. In the first stage, SIFT Homography is applied to cluster frames in terms of their structural similarity, and the second stage further clusters these aligned frames according to consistency in illumination. 
This produces clusters of images that are differential in viewpoint and lighting. A kernel density estimation (KDE) technique for colour and gradient is then used to construct background models for each image cluster, which are in turn used to detect candidate foreground pixels. Finally, using a hierarchical template matching approach, pedestrians can be detected. In addition to the second problem, we present three direct pedestrian detection methods that extend the HOG (Histogram of Oriented Gradient) techniques (Dalal and Triggs, 2005) and provide a comparative evaluation of these approaches. The three approaches are: a) a new histogram feature formed by the weighted sum of both the gradient magnitude and the filter responses from a set of elongated Gaussian filters (Leung and Malik, 2001) corresponding to the quantised orientation, which we refer to as the Histogram of Oriented Gradient Banks (HOGB) approach; b) the codebook-based HOG feature with the branch-and-bound (efficient subwindow search) algorithm (Lampert et al., 2008); and c) the codebook-based HOGB approach. In the third problem, a unified framework that combines 3D and 2D background modelling is proposed to detect scene changes using a camera mounted on a moving vehicle. The 3D scene is first reconstructed from a set of videos taken at different times. The 3D background modelling identifies inconsistent scene structures as foreground objects. For the 2D approach, foreground objects are detected using the spatio-temporal MRF algorithm. Finally, the 3D and 2D results are combined using morphological operations. The significance of this research is that it provides basic frameworks for automatic large-scale mobile surveillance applications and facilitates many higher-level applications such as object tracking and behaviour analysis.

    Survey of Object Detection Methods in Camouflaged Image

    Camouflage is an attempt to conceal the signature of a target object within the background image. Camouflage detection methods, also called decamouflaging methods, are used to detect a foreground object hidden in the background image. In this paper, the authors present a survey of camouflage detection methods for different applications and areas.

    A comprehensive review of vehicle detection using computer vision

    A crucial step in designing intelligent transport systems (ITS) is vehicle detection. The challenges of vehicle detection on urban roads arise from camera position, background variations, occlusion, multiple foreground objects and vehicle pose. The current study provides a synopsis of state-of-the-art vehicle detection techniques, categorized into motion-based and appearance-based techniques, ranging from frame differencing and background subtraction to feature extraction, a comparatively more complicated model. The advantages and disadvantages of the techniques are also highlighted, with a conclusion as to the most accurate one for vehicle detection.

    Hand Pointing Detection Using Live Histogram Template of Forehead Skin

    Hand pointing detection has applications in many fields, such as virtual reality and device control in smart homes. In this paper, we propose a novel approach to detect the pointing vector in the 2D space of a room. After background subtraction, the face and forehead are detected. In the second step, H-S plane histograms of the forehead skin in HSV space are calculated. Using these histogram templates of the user's skin and the back projection method, skin areas are detected. The contours of the hand are extracted using the Freeman chain code algorithm. The next step is finding the fingertips. Points on the hand contour that are candidates for the fingertip can be found in the convexity defects between the convex hull and the contour. We introduce a novel method for finding the fingertip based on special points on the contour and their relationships. Our approach detects hand-pointing vectors in live video from a common webcam with 94% TP and 85% TN. Comment: Accepted for oral presentation in DSP201
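The histogram template and back projection step described above can be illustrated with a small pure-Python sketch. It is not the authors' implementation. It assumes pixels are already given as `(H, S)` pairs with hue in `[0, 180)` and saturation in `[0, 256)` (the convention OpenCV uses for 8-bit images), and the bin counts and `threshold` are hypothetical values chosen for illustration.

```python
def _bin(value, upper, bins):
    """Map a value in [0, upper) to a bin index in [0, bins)."""
    return min(int(value * bins / upper), bins - 1)


def hs_histogram(pixels, h_bins=8, s_bins=8):
    """Build a peak-normalised H-S histogram from template (h, s) pixels."""
    hist = [[0.0] * s_bins for _ in range(h_bins)]
    for h, s in pixels:
        hist[_bin(h, 180, h_bins)][_bin(s, 256, s_bins)] += 1.0
    peak = max(max(row) for row in hist)
    if peak > 0:
        hist = [[v / peak for v in row] for row in hist]
    return hist


def back_project(image_hs, hist, threshold=0.4):
    """Mark pixels whose (h, s) bin is well represented in the template."""
    h_bins, s_bins = len(hist), len(hist[0])
    return [[1 if hist[_bin(h, 180, h_bins)][_bin(s, 256, s_bins)] >= threshold
             else 0
             for h, s in row]
            for row in image_hs]


# Template pixels sampled from a (hypothetical) forehead region.
forehead = [(10, 150), (12, 160), (11, 155)]
template = hs_histogram(forehead)
image = [[(11, 152), (100, 30)],
         [(12, 161), (90, 200)]]
skin_mask = back_project(image, template)
```

Because the template is sampled live from the user's own forehead, the histogram adapts to skin tone and lighting without a fixed colour range.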

    Weakly Labeled Action Recognition and Detection

    Research in human action recognition strives to develop increasingly generalized methods that are robust to intra-class variability and inter-class ambiguity. Recent years have seen tremendous strides in improving recognition accuracy on ever larger and more complex benchmark datasets comprising videos of realistic actions in the wild. Unfortunately, the all-encompassing, dense, global representations that bring about such improvements often benefit from inherent characteristics, specific to datasets and classes, that do not necessarily reflect knowledge about the entity to be recognized. This results in specific models that perform well within datasets but generalize poorly. Furthermore, training supervised action recognition and detection methods requires many precise spatio-temporal manual annotations to achieve good recognition and detection accuracy. For instance, current deep learning architectures require millions of accurately annotated videos to learn robust action classifiers. However, these annotations are quite difficult to obtain. In the first part of this dissertation, we explore the reasons for poor classifier performance when tested on novel datasets, and quantify the effect of scene backgrounds on action representations and recognition. We attempt to address the problem of recognizing human actions while training and testing on distinct datasets when test videos are neither labeled nor available during training. In this scenario, learning a joint vocabulary and domain transfer techniques are not applicable. We perform different types of partitioning of the GIST feature space for several datasets and compute measures of background scene complexity, as well as of the extent to which scenes are helpful in action classification. We then propose a new process to obtain a measure of confidence that each pixel of the video belongs to a foreground region, using motion, appearance, and saliency together in a 3D Markov Random Field (MRF) based framework. 
We also propose multiple ways to exploit the foreground confidence: to improve the bag-of-words vocabulary, the histogram representation of a video, and a novel histogram decomposition based representation and kernel. The above-mentioned work provides the probability of each pixel belonging to the actor; however, it does not give the precise spatio-temporal location of the actor. Furthermore, the above framework would require precise spatio-temporal manual annotations to train an action detector. However, manual annotations in videos are laborious, require several annotators and contain human biases. Therefore, in the second part of this dissertation, we propose a weakly labeled approach to automatically obtain spatio-temporal annotations of actors in action videos. We first obtain a large number of action proposals in each video. To capture a few of the most representative action proposals in each video and avoid processing thousands of them, we rank them using optical flow and saliency in a 3D-MRF based framework and select a few proposals using a MAP-based proposal subset selection method. We demonstrate that this ranking preserves the high-quality action proposals. Several such proposals are generated for each video of the same action. Our next challenge is to iteratively select one proposal from each video so that all proposals are globally consistent. We formulate this as a Generalized Maximum Clique Graph Problem (GMCP) using shape, global and fine-grained similarity of proposals across the videos. The output of our method is the most action-representative proposal from each video. Our method can also annotate multiple instances of the same action in a video. Moreover, action detection experiments using annotations obtained by our method and several baselines demonstrate the superiority of our approach. The above-mentioned annotation method uses multiple videos of the same action. 
Therefore, in the third part of this dissertation, we tackle the problem of spatio-temporal action localization in a video, without assuming the availability of multiple videos or any prior annotations. The action is localized by employing images downloaded from the Internet using the action label. Given web images, we first dampen image noise using a random walk and avoid distracting backgrounds within images using image action proposals. Then, given a video, we generate multiple spatio-temporal action proposals. We suppress proposals generated by camera motion and background by exploiting optical flow gradients within proposals. To obtain the most action-representative proposals, we propose to reconstruct action proposals in the video by leveraging the action proposals in images. Moreover, we preserve the temporal smoothness of the video and reconstruct all proposal bounding boxes jointly, using constraints that push the coefficients for each bounding box toward a common consensus, thus enforcing coefficient similarity across multiple frames. We solve this optimization problem using a variant of the two-metric projection algorithm. Finally, the video proposal that has the lowest reconstruction cost and is motion salient is used to localize the action. Our method is applicable not only to trimmed videos but also to action localization in untrimmed videos, which is a very challenging problem. Finally, in the last part of this dissertation, we propose a novel approach to generate a few properly ranked action proposals from a large number of noisy proposals. The proposed approach begins by dividing each proposal into sub-proposals. We assume that the quality of a proposal remains the same within each sub-proposal. We then employ a graph optimization method to recombine the sub-proposals of all action proposals in a single video in order to optimally build new action proposals and rank them by the combined node and edge scores. 
For an untrimmed video, we first divide the video into shots and then build the above-mentioned graph within each shot. Our method generates a few ranked proposals that can be better than all the existing underlying proposals. Our experimental results validate that properly ranked action proposals can significantly boost action detection results. Our extensive experimental results on different challenging and realistic action datasets, comparisons with several competitive baselines and detailed analysis of each step of the proposed methods validate the proposed ideas and frameworks.

    Real-time Foreground Object Detection Combining the PBAS Background Modelling Algorithm and Feedback from Scene Analysis Module

    The article presents a hardware implementation of the foreground object detection algorithm PBAS (Pixel-Based Adaptive Segmenter) with a scene analysis module. A mechanism for static object detection is proposed, which is based on consecutive frame differencing. The method makes it possible to distinguish stopped foreground objects (e.g. a car at an intersection, abandoned luggage) from false detections (so-called ghosts) using edge similarity. The improved algorithm was compared with the original version on popular test sequences from the changedetection.net dataset. The results indicate that the proposed approach improves the performance of the method on sequences with stopped objects. The algorithm has been implemented and successfully verified on a hardware platform with a Virtex 7 FPGA device. The PBAS segmentation, consecutive frame differencing, Sobel edge detection and advanced one-pass connected component analysis modules were designed. The system is capable of processing 50 frames per second at a resolution of 720 × 576 pixels.
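Two of the building blocks named in the abstract, consecutive frame differencing and Sobel edge detection, are simple enough to sketch in software. The snippet below is a plain Python illustration on small grayscale frames stored as nested lists. It is not the article's FPGA design, it does not reproduce PBAS or the edge-similarity test for ghosts, and the `threshold` value is a hypothetical choice.

```python
def frame_difference(prev, curr, threshold=20):
    """Binary motion mask: 1 where consecutive frames differ noticeably."""
    return [[1 if abs(c - p) > threshold else 0
             for p, c in zip(prev_row, curr_row)]
            for prev_row, curr_row in zip(prev, curr)]


def sobel_magnitude(img):
    """Approximate gradient magnitude |gx| + |gy|; border pixels stay 0."""
    height, width = len(img), len(img[0])
    out = [[0] * width for _ in range(height)]
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            # Horizontal gradient: right column minus left column.
            gx = ((img[y - 1][x + 1] + 2 * img[y][x + 1] + img[y + 1][x + 1])
                  - (img[y - 1][x - 1] + 2 * img[y][x - 1] + img[y + 1][x - 1]))
            # Vertical gradient: bottom row minus top row.
            gy = ((img[y + 1][x - 1] + 2 * img[y + 1][x] + img[y + 1][x + 1])
                  - (img[y - 1][x - 1] + 2 * img[y - 1][x] + img[y - 1][x + 1]))
            out[y][x] = abs(gx) + abs(gy)
    return out
```

The `|gx| + |gy|` approximation avoids a square root, a simplification often preferred in hardware; whether the article's module uses it is not stated in the abstract.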