106 research outputs found

    Probabilistic modeling of texture transition for fast tracking and delineation

    Get PDF
    In this thesis a probabilistic approach to texture boundary detection for tracking applications is presented. We develop a novel fast algorithm for Bayesian estimation of texture transition locations from a short sequence of pixels on a scanline, combining the speed of edge-based line search with the sophistication of Bayesian texture analysis given a small set of observations. For cases where the observations are too few for reliable Bayesian estimation of the probability of texture change, we propose a machine learning technique to generate a probabilistic texture transition model. This is achieved by training on a dataset of small patches of blending textures. By including in the training set enough examples to accurately model the texture transitions of interest, we can construct a predictor for object boundary tracking that copes with few observations and with demanding cases such as tracking arbitrarily textured objects against cluttered backgrounds. Object outlines are then obtained by combining the texture crossing probabilities across a set of scanlines. We show that a rigid geometric model of the tracked object, or smoothness constraints in the absence of such a model, can be used to coalesce the scanline texture crossing probabilities obtained with the methods above. We propose a Hidden Markov Model to robustly aggregate the sparse transition probabilities of scanlines sampled along the projected hypothesis model contour; continuous object contours can then be extracted by a posteriori maximization of the texture transition probabilities. Stronger geometric constraints, such as an available rigid model of the target, are enforced directly by robust stochastic optimization. In addition to being fast, the appeal of the proposed probabilistic framework is that it provides a single infrastructure for tracking heterogeneous objects, using the machine learning-based predictor and the Bayesian estimator interchangeably in conjunction with robust optimization to extract object contours robustly. We apply the developed methods to tracking of textured and non-textured rigid objects, deformable body outlines, and monocular articulated human motion in challenging conditions. Finally, because it is fast, our method can also serve as an interactive texture segmentation tool.
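
    As a minimal illustration of the scanline idea, the sketch below computes a posterior over texture-transition locations on a 1-D scanline under a deliberately simplified model: each side of a candidate split is treated as i.i.d. Gaussian with known sigma, and the prior over split positions is uniform. The thesis's actual texture likelihoods and HMM aggregation are richer; the function name and parameters here are illustrative.

        import numpy as np

        def transition_posterior(scanline, sigma=1.0):
            # Posterior over change-point locations on a scanline,
            # assuming each side is i.i.d. Gaussian with known sigma
            # and a uniform prior over split positions.
            n = len(scanline)
            log_lik = np.full(n, -np.inf)
            for k in range(1, n - 1):  # candidate split points
                left, right = scanline[:k], scanline[k:]
                ll = -np.sum((left - left.mean()) ** 2) / (2 * sigma ** 2)
                lr = -np.sum((right - right.mean()) ** 2) / (2 * sigma ** 2)
                log_lik[k] = ll + lr
            log_lik -= log_lik.max()  # normalize in log space
            post = np.exp(log_lik)
            return post / post.sum()

        # a step in mean around index 20 should dominate the posterior
        probs = transition_posterior(np.r_[np.random.randn(20), 3 + np.random.randn(20)])
        print(probs.argmax())

    In the tracker, posteriors like this one, computed on scanlines sampled along the hypothesized contour, would be the per-scanline evidence that the HMM then aggregates.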

    Object Recognition and Modeling Using SIFT Features

    Get PDF
    In this paper we present a technique for object recognition and modeling based on local image feature matching. Given a complete set of views of an object, the goal of our technique is to recognize the same object in an image of a cluttered environment containing it and to estimate its pose. The method is based on visual modeling of objects from a multi-view representation of the object to recognize. The first step creates the object model by selecting a subset of the available views, using SIFT descriptors to evaluate image similarity and relevance. The selected views are then taken as the model of the object, and we show that they can effectively represent the main visual aspects of the object. Recognition is performed by comparing the image containing an object in a generic position with the views selected as object models. Once an object has been recognized, its pose can be estimated by searching the complete set of views of the object. Experimental results are very encouraging on both a private dataset acquired in our lab and a publicly available dataset.
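
    A minimal sketch of the kind of SIFT matching such a pipeline relies on, using OpenCV with Lowe's ratio test; the match count serves as a stand-in for the paper's view-similarity score, whose exact form is not reproduced here.

        import cv2

        def sift_match_count(img_a, img_b, ratio=0.75):
            # Count ratio-test SIFT matches between two grayscale images.
            sift = cv2.SIFT_create()
            _, des_a = sift.detectAndCompute(img_a, None)
            _, des_b = sift.detectAndCompute(img_b, None)
            if des_a is None or des_b is None:
                return 0
            matcher = cv2.BFMatcher(cv2.NORM_L2)
            pairs = matcher.knnMatch(des_a, des_b, k=2)
            # Lowe's ratio test discards ambiguous correspondences
            return sum(len(p) == 2 and p[0].distance < ratio * p[1].distance
                       for p in pairs)

    Scoring a query image against each selected model view with such a count, and keeping the best-scoring view, gives both a recognition decision and a coarse pose hypothesis (the viewpoint of the winning view).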

    Video Object Recognition and Modeling by SIFT Matching Optimization

    Get PDF
    In this paper we present a novel technique for object modeling and object recognition in video. Given a set of videos containing 360-degree views of objects, we compute a model for each object, then analyze short videos to determine whether the object depicted is one of the modeled objects. The object model is built from a video spanning a 360-degree view of the object taken against a uniform background. To create the object model, the proposed technique selects a few representative frames from each video and extracts local features from those frames. Recognition selects a few frames from the query video, extracts local features from each frame, and looks for matches in all the representative frames constituting the models of all the objects. If the number of matches exceeds a fixed threshold, the corresponding object is considered recognized. To evaluate our approach we acquired a dataset of 25 videos representing 25 different objects and used these videos to build the object models. We then took 25 test videos, each containing one of the known objects, and 5 videos containing only unknown objects. Experiments showed that, despite significant compression in the model, recognition results are satisfactory.
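
    The decision rule described above can be sketched as follows; the value of `threshold` and the per-pair ratio test are illustrative, and `object_models` is assumed to map object names to lists of SIFT descriptor arrays extracted from each model's representative frames.

        import cv2

        def count_matches(des_q, des_m, matcher, ratio=0.75):
            # Ratio-test match count between two SIFT descriptor sets.
            pairs = matcher.knnMatch(des_q, des_m, k=2)
            return sum(len(p) == 2 and p[0].distance < ratio * p[1].distance
                       for p in pairs)

        def recognize(query_descriptors, object_models, threshold=30):
            # Match every query frame against every representative frame
            # of every model; accept the best object only if its total
            # match count clears the fixed threshold.
            matcher = cv2.BFMatcher(cv2.NORM_L2)
            scores = {name: sum(count_matches(q, m, matcher)
                                for q in query_descriptors
                                for m in model_frames)
                      for name, model_frames in object_models.items()}
            best = max(scores, key=scores.get)
            return best if scores[best] >= threshold else None  # None = unknown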

    Adversarially Tuned Scene Generation

    Full text link
    Generalization performance of trained computer vision systems that use computer graphics (CG) generated data is not yet effective due to the 'domain shift' between virtual and real data. Although simulated data augmented with a few real-world samples has been shown to mitigate domain shift and improve the transferability of trained models, guiding or bootstrapping the virtual data generation with distributions learnt from the target real-world domain is desirable, especially in fields where annotating even a few real images is laborious (such as semantic labeling and intrinsic images). To address this problem in an unsupervised manner, our work combines recent advances in CG (which aims to generate stochastic scene layouts coupled with large collections of 3D object models) and generative adversarial training (which aims to train generative models by measuring the discrepancy between generated and real data in terms of their separability in the space of a deep discriminatively trained classifier). Our method iteratively estimates the posterior density of the prior distributions of a generative graphical model, within a rejection sampling framework. Initially, we assume uniform distributions as priors on the parameters of a scene described by the generative graphical model. As iterations proceed, the prior distributions are updated towards the (unknown) distributions of the target data. We demonstrate the utility of adversarially tuned scene generation on two real-world benchmark datasets (CityScapes and CamVid) for traffic scene semantic labeling with a deep convolutional net (DeepLab). We realized performance improvements of 2.28 and 3.14 IoU points between the DeepLab models trained on simulated sets prepared from the scene generation models before and after tuning to CityScapes and CamVid, respectively. Comment: 9 pages, accepted at CVPR 2017
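
    The tuning loop, as described, can be sketched as rejection sampling driven by the discriminator's realism score; `render` and `discriminator` are hypothetical callables standing in for the CG pipeline and the adversarially trained classifier, and the Gaussian refit is a simplification of updating the graphical model's priors.

        import numpy as np

        def tune_prior(sample_prior, render, discriminator, rounds=5, n=200,
                       rng=np.random):
            # Start from the initial (e.g. uniform) prior and repeatedly:
            # sample scene parameters, render them, keep each sample with
            # probability equal to the discriminator's realism score,
            # then refit the prior to the accepted parameters.
            prior = sample_prior  # callable: () -> parameter vector
            for _ in range(rounds):
                accepted = []
                while len(accepted) < n:
                    theta = prior()
                    score = discriminator(render(theta))  # assumed in [0, 1]
                    if rng.rand() < score:                # rejection step
                        accepted.append(theta)
                accepted = np.asarray(accepted)
                mu, sd = accepted.mean(0), accepted.std(0) + 1e-6
                prior = lambda mu=mu, sd=sd: rng.normal(mu, sd)
            return prior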

    Unsupervised video indexing on audiovisual characterization of persons

    Get PDF
    This thesis proposes a method for the unsupervised characterization of persons in audiovisual documents, exploiting data related to their physical appearance and their voice. In general, automatic identification methods, whether in video or audio, require a large amount of a priori knowledge about the content. In this work, the goal is to study the two modalities in a correlated way and to exploit their respective properties collaboratively and robustly, in order to produce a reliable result that is as independent as possible of any a priori knowledge. More specifically, we studied the characteristics of the audio stream and proposed several methods for speaker segmentation and clustering, which we evaluated in a French evaluation campaign. We then carried out an in-depth study of visual descriptors (face, clothing), which led us to propose new approaches for the detection, tracking, and clustering of persons. Finally, the work focused on audio-video fusion, proposing an approach based on the computation of a co-occurrence matrix that allowed us to establish an association between the audio index and the video index and to correct them. We can thus produce a dynamic audiovisual model of the speakers.
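
    The fusion step lends itself to a compact sketch: given per-frame audio speaker labels and video person labels, accumulate a co-occurrence matrix and read off the association by row-wise argmax. The label convention here (-1 for "absent") is an assumption, not the thesis's exact scheme, and the thesis additionally uses the matrix to correct the two indexes.

        import numpy as np

        def cooccurrence(audio_idx, video_idx, n_speakers, n_persons):
            # Frame-level co-occurrence between audio speaker clusters
            # and video person clusters; -1 marks frames where a
            # modality has no active cluster.
            C = np.zeros((n_speakers, n_persons), dtype=int)
            for s, p in zip(audio_idx, video_idx):
                if s >= 0 and p >= 0:
                    C[s, p] += 1
            # associate each speaker with its most co-occurring person
            return C, C.argmax(axis=1)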

    Integrating Grammar and Segmentation for Human Pose Estimation

    Full text link