956 research outputs found

    Finding Temporally Consistent Occlusion Boundaries in Videos using Geometric Context

    We present an algorithm for finding temporally consistent occlusion boundaries in videos to support segmentation of dynamic scenes. We learn occlusion boundaries in a pairwise Markov random field (MRF) framework. We first estimate the probability that a spatio-temporal edge is an occlusion boundary using appearance, flow, and geometric features. Next, we enforce occlusion boundary continuity in an MRF model by learning pairwise occlusion probabilities using a random forest. Then, we temporally smooth boundaries to remove temporal inconsistencies in occlusion boundary estimation. Our proposed framework provides an efficient approach for finding temporally consistent occlusion boundaries in video by utilizing causality, redundancy in videos, and the semantic layout of the scene. We have developed a dataset with fully annotated ground-truth occlusion boundaries of over 30 videos (~5000 frames). This dataset is used to evaluate temporal occlusion boundaries and provides a much-needed baseline for future studies. We perform experiments to demonstrate the role of scene layout and temporal information for occlusion reasoning in dynamic scenes. Comment: Applications of Computer Vision (WACV), 2015 IEEE Winter Conference o
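The abstract above mentions temporally smoothing boundary estimates and exploiting causality, but does not say how. As a rough illustration only, the sketch below applies a causal exponential moving average to the per-frame occlusion probability of a single edge. The function names, the `alpha` weight, and the 0.5 threshold are assumptions made for this example and are not taken from the paper.

```python
def smooth_causal(probs, alpha=0.5):
    """Causally smooth one edge's per-frame boundary probabilities.

    probs: list of floats in [0, 1], one per frame.
    alpha: weight given to the current frame (hypothetical parameter).
    Only past and current frames are used, so the filter is causal.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    smoothed = []
    state = None
    for p in probs:
        # The first frame initialises the running estimate.
        state = p if state is None else alpha * p + (1.0 - alpha) * state
        smoothed.append(state)
    return smoothed


def label_boundaries(probs, threshold=0.5, alpha=0.5):
    """Threshold the smoothed probabilities into boundary / non-boundary."""
    return [p >= threshold for p in smooth_causal(probs, alpha)]


if __name__ == "__main__":
    # A single-frame dropout (0.0) between confident detections.
    print(smooth_causal([1.0, 0.0, 1.0, 1.0]))
    print(label_boundaries([1.0, 0.0, 1.0, 1.0]))
```

With these settings, a one-frame dropout between confident detections stays labelled as a boundary, which is the kind of temporal inconsistency the abstract describes removing.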

    Long-range video motion estimation using point trajectories

    Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2006. Includes bibliographical references (leaves 97-104). This thesis describes a new approach to video motion estimation, in which motion is represented using a set of particles. Each particle is an image point sample with a long-duration trajectory and other properties. To optimize these particles, we measure point-based matching along the particle trajectories and distortion between the particles. The resulting motion representation is useful for a variety of applications and differs from optical flow, feature tracking, and parametric or layer-based models. We demonstrate the algorithm on challenging real-world videos that include complex scene geometry, multiple types of occlusion, regions with low texture, and non-rigid deformation. By Peter Sand. Ph.D.

    A Fusion Approach for Multi-Frame Optical Flow Estimation

    To date, top-performing optical flow estimation methods only take pairs of consecutive frames into account. While elegant and appealing, the idea of using more than two frames has not yet produced state-of-the-art results. We present a simple yet effective fusion approach for multi-frame optical flow that benefits from longer-term temporal cues. Our method first warps the optical flow from previous frames to the current frame, thereby yielding multiple plausible estimates. It then fuses the complementary information carried by these estimates into a new optical flow field. At the time of writing, our method ranks first among published results in the MPI Sintel and KITTI 2015 benchmarks. Our models will be available on https://github.com/NVlabs/PWC-Net. Comment: Work accepted at IEEE Winter Conference on Applications of Computer Vision (WACV 2019)
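The abstract above describes fusing several plausible flow estimates into one field, without detailing the fusion step. The sketch below is a deliberately simple stand-in, not the authors' method: it assumes the candidate flow fields have already been warped to the current frame and that a per-pixel matching cost is available for each, and it keeps the lowest-cost candidate at every pixel. The function name `fuse_flows` and the data layout are assumptions made for this example.

```python
def fuse_flows(candidates, errors):
    """Per pixel, keep the candidate flow vector with the lowest error.

    candidates: list of K flow fields, each an H x W grid of (u, v) tuples,
                assumed to be already aligned to the current frame.
    errors:     list of K matching H x W grids of non-negative costs
                (for example a photometric error; hypothetical input).
    """
    if not candidates or len(candidates) != len(errors):
        raise ValueError("need one error map per candidate flow field")
    height, width = len(candidates[0]), len(candidates[0][0])
    fused = []
    for y in range(height):
        row = []
        for x in range(width):
            # Index of the candidate with the smallest cost at (y, x).
            best = min(range(len(candidates)), key=lambda k: errors[k][y][x])
            row.append(candidates[best][y][x])
        fused.append(row)
    return fused


if __name__ == "__main__":
    current_pair = [[(1.0, 0.0), (1.0, 0.0)]]    # flow from the current frame pair
    warped_prev = [[(2.0, 0.0), (2.0, 0.0)]]     # flow warped from an earlier pair
    costs = [[[0.1, 0.9]], [[0.5, 0.2]]]
    print(fuse_flows([current_pair, warped_prev], costs))
```

A winner-take-all selection like this is only one possible fusion rule; it is used here because it makes the idea of combining complementary estimates easy to see.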

    Tracking in the Context of Interaction

    Detection, tracking and event analysis are areas of video analysis that are of great importance in robotics applications and automated surveillance. Although they have been studied extensively as individual problems, there has been little work on performing them jointly so that they mutually influence and improve each other. In this thesis we present a novel approach for jointly estimating the track of a moving object and recognising the events in which it participates. The contributions are divided into three main chapters. In the first, we introduce our geometric carried object detector, which can detect a generic class of objects. This detector primarily uses geometric shape models instead of pre-trained object class models and does not rely solely on protrusion regions. The second main chapter presents our spatial consistency tracker, which incorporates events at a detection level within a tracklet building process. This tracker enforces spatial consistency between objects and other pre-tracked entities in the scene. Finally, in the third main chapter we present our joint tracking and event analysis framework, posed as maximisation of a posterior probability defined over event sequences and temporally-disjoint subsets of tracklets. In this framework events are incorporated at a tracking level, where tracking and event analysis mutually influence and improve each other. We evaluate the aforementioned framework using three datasets. We compare our detector and spatial consistency tracker against a state-of-the-art detector by providing detection and tracking results. We evaluate the tracking performance of our joint tracking and event analysis framework using tracklets from two state-of-the-art trackers, and additionally our own from our spatial consistency tracker; we demonstrate improved tracking performance in each case due to jointly incorporating events within the tracking process, while also subsequently improving event recognition.

    Human shape modelling for carried object detection and segmentation

    Detecting carried objects is one of the requirements for developing systems that reason about activities involving people and objects. This thesis presents novel methods to detect and segment carried objects in surveillance videos. The contributions are divided into three main chapters. In the first, we introduce our carried object detector, which can detect a generic class of objects. We formulate carried object detection as a contour classification problem. We classify moving object contours into two classes: carried object and person. A probability mask for a person's contours is generated based on an ensemble of contour exemplars (ECE) of walking/standing humans in different viewing directions. Contours that do not fall within the generated hypothesis mask are considered candidates for carried object contours. Then, a region is assigned to each carried object candidate contour using Biased Normalized Cut (BNC) with a probability obtained by a weighted function of its overlap with the person's contour hypothesis mask and the segmented foreground. Finally, carried objects are detected by applying a Non-Maximum Suppression (NMS) method which eliminates the low-scoring carried object candidates. The second contribution presents an approach to detect carried objects with an innovative method for extracting features from foreground regions based on their local contours and superpixel information. Initially, a moving object in a video frame is segmented into multi-scale superpixels. Then human-like regions in the foreground area are identified by matching a set of features extracted from superpixels against a codebook of local shapes. Here the definition of human-like regions is equivalent to a person's probability map in our first proposed method (ECE). Our second carried object detector benefits from the novel feature descriptor to produce a more accurate probability map. The complement of the matching probabilities of superpixels to human-like regions in the foreground is taken as a carried object probability map. Each group of neighboring superpixels with a high carried object probability and strong edge support is then merged to form a carried object. Finally, in the third contribution we present a method to detect and segment carried objects. The proposed method adopts the new superpixel-based descriptor to identify carried object-like candidate regions using human shape modeling. Using spatio-temporal information of the candidate regions, the consistency of recurring carried object candidates viewed over time is obtained and serves to detect carried objects. Last, the detected carried object regions are refined by integrating information about their appearance and location over time with a spatio-temporal extension of GrabCut. This final stage is used to accurately segment carried objects in frames. Our methods are fully automatic and make minimal assumptions about the person, the carried objects, and the videos. We evaluate the aforementioned methods using two available datasets, PETS 2006 and i-Lids AVSS. We compare our detector and segmentation methods against a state-of-the-art detector. Experimental evaluation on the two datasets demonstrates that both our carried object detection and segmentation methods significantly outperform competing algorithms.
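The abstract above names Non-Maximum Suppression as the final step of the first detector. For readers unfamiliar with it, here is a minimal sketch of the common greedy, IoU-based variant on axis-aligned boxes. The thesis works with contours and regions, so the box representation, the `iou_threshold` and `min_score` parameters, and the function names are all assumptions made for this example rather than the thesis's formulation.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5, min_score=0.0):
    """Greedy NMS: return indices of kept boxes, highest score first.

    A candidate is dropped if its score is below min_score or if it
    overlaps an already kept box by more than iou_threshold.
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if scores[i] < min_score:
            continue
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


if __name__ == "__main__":
    boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
    scores = [0.9, 0.8, 0.7]
    print(nms(boxes, scores))
```

In the example, the second box overlaps the first heavily and has a lower score, so only the first and third candidates survive.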