88 research outputs found

    Visual Object Tracking: The Initialisation Problem

    Get PDF
    Model initialisation is an important component of object tracking. Tracking algorithms are generally provided with the first frame of a sequence and a bounding box (BB) indicating the location of the object. This BB may contain a large number of background pixels in addition to the object and can lead to parts-based tracking algorithms initialising their object models in background regions of the BB. In this paper, we tackle this as a missing labels problem, marking pixels sufficiently away from the BB as belonging to the background and learning the labels of the unknown pixels. Three techniques, One-Class SVM (OC-SVM), Sampled-Based Background Model (SBBM) (a novel background model based on pixel samples), and Learning Based Digital Matting (LBDM), are adapted to the problem. These are evaluated with leave-one-video-out cross-validation on the VOT2016 tracking benchmark. Our evaluation shows both OC-SVMs and SBBM are capable of providing a good level of segmentation accuracy but are too parameter-dependent to be used in real-world scenarios. We show that LBDM achieves significantly increased performance with parameters selected by cross validation and we show that it is robust to parameter variation.Comment: 15th Conference on Computer and Robot Vision (CRV 2018). Source code available at https://github.com/georgedeath/initialisation-proble

    Segmentation by transduction

    Get PDF
    International audienceThis paper addresses the problem of segmenting an image into regions consistent with user-supplied seeds (e.g., a sparse set of broad brush strokes). We view this task as a statistical transductive inference, in which some pixels are already associated with given zones and the remaining ones need to be classified. Our method relies on the Laplacian graph regularizer, a powerful manifold learning tool that is based on the estimation of variants of the Laplace-Beltrami operator and is tightly related to diffusion processes. Segmentation is modeled as the task of finding matting coefficients for unclassified pixels given known matting coefficients for seed pixels. The proposed algorithm essentially relies on a high margin assumption in the space of pixel characteristics. It is simple, fast, and accurate, as demonstrated by qualitative results on natural images and a quantitative comparison with state-of-the-art methods on the Microsoft GrabCut segmentation database

    Object Tracking in Video with Part-Based Tracking by Feature Sampling

    Get PDF
    Visual tracking of arbitrary objects is an active research topic in computer vision, with applications across multiple disciplines including video surveillance, activity analysis, robot vision, and human computer interface. Despite great progress having been made in object tracking in recent years, it still remains a challenge to design trackers that can deal with difficult tracking scenarios, such as camera motion, object motion change, occlusion, illumination changes, and object deformation. A promising way of tackling these types of problems is to use a part-based method; one which models and tracks small regions of the object and estimates the location of the object based on the tracked part's positions. These approaches typically model parts of objects with histograms of various hand-crafted features extracted from the region in which the part is located. However, it is unclear how such relatively homogeneous regions should be represented to form an effective part-based tracker. In this thesis we present a part-based tracker that includes a model for object parts that is designed to empirically characterise the underlying colour distribution of an image region, representing it by pairs of randomly selected colour features and counts of how many pixels are similar to each feature. This novel feature representation is used to find probable locations for the part in future frames via a Bhattacharyya Distance-based metric, which is modified to prefer higher quality matches. Sets of candidate patch locations are generated by randomly generating non-shearing affine transformations of the part's previous locations and locally optimising the most likely sets of parts to allow for small intra-frame object deformations. We also present a study of model initialisation in online, model-free tracking and evaluate several techniques for selecting the regions of an image, given a target bounding box most likely to contain an object. The strengths and limitations of the combined tracker are evaluated on the VOT2016 and VOT2018 datasets using their evaluation protocol, which also allows an extensive evaluation of parameter robustness. The presented tracker is ranked first among part-based trackers on the VOT2018 dataset and is particularly robust to changes in object and camera motion, as well as object size changes

    FactorMatte: Redefining Video Matting for Re-Composition Tasks

    Full text link
    We propose "factor matting", an alternative formulation of the video matting problem in terms of counterfactual video synthesis that is better suited for re-composition tasks. The goal of factor matting is to separate the contents of video into independent components, each visualizing a counterfactual version of the scene where contents of other components have been removed. We show that factor matting maps well to a more general Bayesian framing of the matting problem that accounts for complex conditional interactions between layers. Based on this observation, we present a method for solving the factor matting problem that produces useful decompositions even for video with complex cross-layer interactions like splashes, shadows, and reflections. Our method is trained per-video and requires neither pre-training on external large datasets, nor knowledge about the 3D structure of the scene. We conduct extensive experiments, and show that our method not only can disentangle scenes with complex interactions, but also outperforms top methods on existing tasks such as classical video matting and background subtraction. In addition, we demonstrate the benefits of our approach on a range of downstream tasks. Please refer to our project webpage for more details: https://factormatte.github.ioComment: Project webpage: https://factormatte.github.i

    Multiclass Data Segmentation using Diffuse Interface Methods on Graphs

    Full text link
    We present two graph-based algorithms for multiclass segmentation of high-dimensional data. The algorithms use a diffuse interface model based on the Ginzburg-Landau functional, related to total variation compressed sensing and image processing. A multiclass extension is introduced using the Gibbs simplex, with the functional's double-well potential modified to handle the multiclass case. The first algorithm minimizes the functional using a convex splitting numerical scheme. The second algorithm is a uses a graph adaptation of the classical numerical Merriman-Bence-Osher (MBO) scheme, which alternates between diffusion and thresholding. We demonstrate the performance of both algorithms experimentally on synthetic data, grayscale and color images, and several benchmark data sets such as MNIST, COIL and WebKB. We also make use of fast numerical solvers for finding the eigenvectors and eigenvalues of the graph Laplacian, and take advantage of the sparsity of the matrix. Experiments indicate that the results are competitive with or better than the current state-of-the-art multiclass segmentation algorithms.Comment: 14 page

    Generative Adversarial Networks in Computer Vision: A Survey and Taxonomy

    Get PDF
    Generative adversarial networks (GANs) have been extensively studied in the past few years. Arguably their most significant impact has been in the area of computer vision where great advances have been made in challenges such as plausible image generation, image-to-image translation, facial attribute manipulation and similar domains. Despite the significant successes achieved to date, applying GANs to real-world problems still poses significant challenges, three of which we focus on here. These are: (1) the generation of high quality images, (2) diversity of image generation, and (3) stable training. Focusing on the degree to which popular GAN technologies have made progress against these challenges, we provide a detailed review of the state of the art in GAN-related research in the published scientific literature. We further structure this review through a convenient taxonomy we have adopted based on variations in GAN architectures and loss functions. While several reviews for GANs have been presented to date, none have considered the status of this field based on their progress towards addressing practical challenges relevant to computer vision. Accordingly, we review and critically discuss the most popular architecture-variant, and loss-variant GANs, for tackling these challenges. Our objective is to provide an overview as well as a critical analysis of the status of GAN research in terms of relevant progress towards important computer vision application requirements. As we do this we also discuss the most compelling applications in computer vision in which GANs have demonstrated considerable success along with some suggestions for future research directions. Code related to GAN-variants studied in this work is summarized on https://github.com/sheqi/GAN_Review.Comment: Accepted by ACM Computing Surveys, 23 November 202

    ON NEURAL ARCHITECTURES FOR SEGMENTATION IN NATURAL AND MEDICAL IMAGES

    Get PDF
    Segmentation is an important research field in computer vision. It requires recognizing and segmenting the objects at the pixel level. In the past decade, many deep neural networks have been proposed, which have been central to the development in this area. These frameworks have demonstrated human-level or beyond performance on many challenging benchmarks, and have been widely used in many real-life applications, including surveillance, autonomous driving, and medical image analysis. However, it is non-trivial to design neural architectures with both efficiency and effectiveness, especially when they need to be tailored to the target tasks and datasets. In this dissertation, I will present our research works in this area from the following aspects. (i) To enable automatic neural architecture design on the costly 3D medical image segmentation, we propose an efficient and effective neural architecture search algorithm that tackles the problem in a coarse-to-fine manner. (ii) To further take advantage of the neural architecture search, we propose to search for a channel-level replacement for 3D networks, which leads to strong alternatives to 3D networks. (iii) To perform segmentation with great detail, we design a coarse-to-fine segmentation framework for matting-level segmentation; (iv) To provide stronger features for segmentation, we propose a stronger transformer-based backbone that can work on dense tasks. (v) To better resolve the panoptic segmentation problem in an end-to-end manner, we propose to combine transformers with the traditional clustering algorithm, which leads to a more intuitive segmentation framework with better performance

    Realistic reconstruction and rendering of detailed 3D scenarios from multiple data sources

    Get PDF
    During the last years, we have witnessed significant improvements in digital terrain modeling, mainly through photogrammetric techniques based on satellite and aerial photography, as well as laser scanning. These techniques allow the creation of Digital Elevation Models (DEM) and Digital Surface Models (DSM) that can be streamed over the network and explored through virtual globe applications like Google Earth or NASA WorldWind. The resolution of these 3D scenes has improved noticeably in the last years, reaching in some urban areas resolutions up to 1m or less for DEM and buildings, and less than 10 cm per pixel in the associated aerial imagery. However, in rural, forest or mountainous areas, the typical resolution for elevation datasets ranges between 5 and 30 meters, and typical resolution of corresponding aerial photographs ranges between 25 cm to 1 m. This current level of detail is only sufficient for aerial points of view, but as the viewpoint approaches the surface the terrain loses its realistic appearance. One approach to augment the detail on top of currently available datasets is adding synthetic details in a plausible manner, i.e. including elements that match the features perceived in the aerial view. By combining the real dataset with the instancing of models on the terrain and other procedural detail techniques, the effective resolution can potentially become arbitrary. There are several applications that do not need an exact reproduction of the real elements but would greatly benefit from plausibly enhanced terrain models: videogames and entertainment applications, visual impact assessment (e.g. how a new ski resort would look), virtual tourism, simulations, etc. In this thesis we propose new methods and tools to help the reconstruction and synthesis of high-resolution terrain scenes from currently available data sources, in order to achieve realistically looking ground-level views. In particular, we decided to focus on rural scenarios, mountains and forest areas. Our main goal is the combination of plausible synthetic elements and procedural detail with publicly available real data to create detailed 3D scenes from existing locations. Our research has focused on the following contributions: - An efficient pipeline for aerial imagery segmentation - Plausible terrain enhancement from high-resolution examples - Super-resolution of DEM by transferring details from the aerial photograph - Synthesis of arbitrary tree picture variations from a reduced set of photographs - Reconstruction of 3D tree models from a single image - A compact and efficient tree representation for real-time rendering of forest landscapesDurant els darrers anys, hem presenciat avenços significatius en el modelat digital de terrenys, principalment gràcies a tècniques fotogramètriques, basades en fotografia aèria o satèl·lit, i a escàners làser. Aquestes tècniques permeten crear Models Digitals d'Elevacions (DEM) i Models Digitals de Superfícies (DSM) que es poden retransmetre per la xarxa i ser explorats mitjançant aplicacions de globus virtuals com ara Google Earth o NASA WorldWind. La resolució d'aquestes escenes 3D ha millorat considerablement durant els darrers anys, arribant a algunes àrees urbanes a resolucions d'un metre o menys per al DEM i edificis, i fins a menys de 10 cm per píxel a les fotografies aèries associades. No obstant, en entorns rurals, boscos i zones muntanyoses, la resolució típica per a dades d'elevació es troba entre 5 i 30 metres, i per a les corresponents fotografies aèries varia entre 25 cm i 1m. Aquest nivell de detall només és suficient per a punts de vista aeris, però a mesura que ens apropem a la superfície el terreny perd tot el realisme. Una manera d'augmentar el detall dels conjunts de dades actuals és afegint a l'escena detalls sintètics de manera plausible, és a dir, incloure elements que encaixin amb les característiques que es perceben a la vista aèria. Així, combinant les dades reals amb instàncies de models sobre el terreny i altres tècniques de detall procedural, la resolució efectiva del model pot arribar a ser arbitrària. Hi ha diverses aplicacions per a les quals no cal una reproducció exacta dels elements reals, però que es beneficiarien de models de terreny augmentats de manera plausible: videojocs i aplicacions d'entreteniment, avaluació de l'impacte visual (per exemple, com es veuria una nova estació d'esquí), turisme virtual, simulacions, etc. En aquesta tesi, proposem nous mètodes i eines per ajudar a la reconstrucció i síntesi de terrenys en alta resolució partint de conjunts de dades disponibles públicament, per tal d'aconseguir vistes a nivell de terra realistes. En particular, hem decidit centrar-nos en escenes rurals, muntanyes i àrees boscoses. El nostre principal objectiu és la combinació d'elements sintètics plausibles i detall procedural amb dades reals disponibles públicament per tal de generar escenes 3D d'ubicacions existents. La nostra recerca s'ha centrat en les següents contribucions: - Un pipeline eficient per a segmentació d'imatges aèries - Millora plausible de models de terreny a partir d'exemples d’alta resolució - Super-resolució de models d'elevacions transferint-hi detalls de la fotografia aèria - Síntesis d'un nombre arbitrari de variacions d’imatges d’arbres a partir d'un conjunt reduït de fotografies - Reconstrucció de models 3D d'arbres a partir d'una única fotografia - Una representació compacta i eficient d'arbres per a navegació en temps real d'escenesPostprint (published version
    corecore