
    Co-interest Person Detection from Multiple Wearable Camera Videos

    Wearable cameras, such as Google Glass and GoPro, enable video data collection over larger areas and from different views. In this paper, we tackle a new problem of locating the co-interest person (CIP), i.e., the one who draws attention from most camera wearers, from temporally synchronized videos taken by multiple wearable cameras. Our basic idea is to exploit the motion patterns of people and use them to correlate the persons across different videos, instead of performing appearance-based matching as in traditional video co-segmentation/localization. This way, we can identify the CIP even if a group of people with similar appearance are present in the view. More specifically, we detect a set of persons in each frame as candidates for the CIP and then build a Conditional Random Field (CRF) model to select the one with consistent motion patterns across different videos and high spatio-temporal consistency within each video. We collect three sets of wearable-camera videos for testing the proposed algorithm. All the involved people have similar appearances in the collected videos, and the experiments demonstrate the effectiveness of the proposed algorithm. Comment: ICCV 201
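    To make the motion-consistency idea concrete, the following is a toy sketch, not the paper's CRF model: given hypothetical per-person motion descriptors from two synchronized videos, it simply picks the cross-video candidate pair whose descriptors agree most under cosine similarity; the descriptors and data below are illustrative assumptions.

```python
# Toy sketch (not the authors' CRF model): pick, in each of two synchronized
# videos, the candidate person whose motion descriptor best matches across
# views. The descriptors are hypothetical per-person motion histograms; a
# real system would extract them from tracked detections.
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def pick_co_interest(cands_v1, cands_v2):
    """Return the (i, j) candidate pair with the most consistent motion."""
    best, best_score = None, -np.inf
    for i, d1 in enumerate(cands_v1):
        for j, d2 in enumerate(cands_v2):
            s = cosine(d1, d2)          # cross-video motion agreement
            if s > best_score:
                best, best_score = (i, j), s
    return best, best_score

rng = np.random.default_rng(0)
video1 = [rng.random(16) for _ in range(4)]   # 4 candidate persons in view 1
video2 = [video1[2] + 0.05 * rng.random(16)] + [rng.random(16) for _ in range(3)]
print(pick_co_interest(video1, video2))       # expected to pair candidate 2 with candidate 0
```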

    Semantic Instance Annotation of Street Scenes by 3D to 2D Label Transfer

    Semantic annotations are vital for training models for object recognition, semantic segmentation or scene understanding. Unfortunately, pixelwise annotation of images at very large scale is labor-intensive, and little labeled data is available, particularly at instance level and for street scenes. In this paper, we propose to tackle this problem by lifting the semantic instance labeling task from 2D into 3D. Given reconstructions from stereo or laser data, we annotate static 3D scene elements with rough bounding primitives and develop a model which transfers this information into the image domain. We leverage our method to obtain 2D labels for a novel suburban video dataset which we have collected, resulting in 400k semantic and instance image annotations. A comparison of our method to state-of-the-art label transfer baselines reveals that 3D information enables more efficient annotation while at the same time resulting in improved accuracy and time-coherent labels. Comment: 10 pages, in Conference on Computer Vision and Pattern Recognition (CVPR), 201
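    As a simplified illustration of the 3D-to-2D transfer idea (a stand-in for, not a reproduction of, the paper's model), the sketch below projects the corners of an annotated 3D bounding primitive through an assumed pinhole camera and paints the enclosing 2D box with the object's label; the intrinsics, box geometry, and label id are made-up values.

```python
# Minimal sketch of 3D-to-2D label transfer: project the corners of an
# annotated 3D bounding box through a pinhole camera and rasterize a coarse
# 2D label from the enclosing image-plane rectangle.
import numpy as np

K = np.array([[720.0, 0.0, 320.0],
              [0.0, 720.0, 240.0],
              [0.0,   0.0,   1.0]])          # hypothetical camera intrinsics

def project(points_cam):
    """Project Nx3 camera-frame points to Nx2 pixel coordinates."""
    uvw = (K @ points_cam.T).T
    return uvw[:, :2] / uvw[:, 2:3]

def transfer_label(label_img, corners_cam, label_id):
    """Paint the 2D box enclosing the projected 3D corners with label_id."""
    uv = project(corners_cam)
    u0, v0 = np.floor(uv.min(axis=0)).astype(int)
    u1, v1 = np.ceil(uv.max(axis=0)).astype(int)
    h, w = label_img.shape
    label_img[max(v0, 0):min(v1, h), max(u0, 0):min(u1, w)] = label_id
    return label_img

# 8 corners of a 2x2x2 m box, roughly 10 m in front of the camera (camera frame, metres)
corners = np.array([[x, y, z] for x in (-1, 1) for y in (-1, 1) for z in (9, 11)], float)
labels = transfer_label(np.zeros((480, 640), np.uint8), corners, label_id=1)  # hypothetical class id
print(labels.sum() > 0)
```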

    A Survey on Deep Learning Technique for Video Segmentation

    Video segmentation -- partitioning video frames into multiple segments or objects -- plays a critical role in a broad range of practical applications, from enhancing visual effects in movies, to understanding scenes in autonomous driving, to creating virtual backgrounds in video conferencing. Recently, with the renaissance of connectionism in computer vision, there has been an influx of deep learning based approaches for video segmentation that have delivered compelling performance. In this survey, we comprehensively review two basic lines of research -- generic object segmentation (of unknown categories) in videos, and video semantic segmentation -- by introducing their respective task settings, background concepts, perceived need, development history, and main challenges. We also offer a detailed overview of representative literature on both methods and datasets. We further benchmark the reviewed methods on several well-known datasets. Finally, we point out open issues in this field and suggest opportunities for further research. We also provide a public website to continuously track developments in this fast-advancing field: https://github.com/tfzhou/VS-Survey. Comment: Accepted by TPAMI. Website: https://github.com/tfzhou/VS-Survey
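    As a minimal illustration of the video semantic segmentation task discussed in the survey (not a method from it), the sketch below applies an off-the-shelf image segmentation network independently to each frame; the torchvision model, weights argument, and preprocessing are assumptions tied to the installed torchvision version, and dedicated video methods additionally exploit temporal context.

```python
# Per-frame baseline sketch: run an image segmentation network (torchvision's
# DeepLabV3) independently on every video frame. File-free toy usage: pass any
# iterable of HxWx3 uint8 RGB frames.
import torch
import torchvision
from torchvision import transforms

# "DEFAULT" weights assume torchvision >= 0.13; older versions use pretrained=True.
model = torchvision.models.segmentation.deeplabv3_resnet50(weights="DEFAULT").eval()
preprocess = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

@torch.no_grad()
def segment_frames(frames):
    """frames: iterable of HxWx3 uint8 RGB arrays -> list of HxW class-id maps."""
    masks = []
    for frame in frames:
        x = preprocess(frame).unsqueeze(0)          # 1x3xHxW normalized tensor
        logits = model(x)["out"]                    # 1xCxHxW per-class scores
        masks.append(logits.argmax(dim=1).squeeze(0).cpu().numpy())
    return masks
```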

    Higher-order Losses and Optimization for Low-level and Deep Segmentation

    Regularized objectives are common in low-level and deep segmentation. Regularization incorporates prior knowledge into objectives or losses. It represents constraints necessary to address ill-posedness, data noise, outliers, lack of supervision, etc. However, such constraints come at significant costs. First, regularization priors may lead to unintended biases, known or unknown. Since these can adversely affect specific applications, it is important to understand the causes and effects of these biases and to develop solutions for them. Second, common regularized objectives are highly non-convex and present challenges for optimization. As is known in low-level vision, first-order approaches like gradient descent are significantly weaker than more advanced algorithms. Yet, variants of gradient descent dominate optimization of the loss functions for deep neural networks due to their size and complexity. Hence, standard segmentation networks still require an overwhelming amount of precise pixel-level supervision for training. This thesis addresses three related problems concerning higher-order objectives and higher-order optimizers. First, we focus on a challenging application: unsupervised vascular tree extraction in large 3D volumes containing complex "entanglements" of near-capillary vessels. In the context of vasculature with unrestricted topology, we propose a new general curvature-regularizing model for arbitrarily complex one-dimensional curvilinear structures. In contrast, standard surface regularization methods are impractical for thin vessels due to strong shrinking bias or the complexity of Gaussian/min curvature modeling for two-dimensional manifolds. In general, the shrinking bias is one well-known example of bias in standard regularization methods. The second contribution of this thesis is a characterization of other new forms of biases in classical segmentation models that were not understood in the past. We develop new theories establishing data density biases in common pairwise or graph-based clustering objectives, such as kernel K-means and normalized cut. This theoretical understanding inspires our new segmentation algorithms that avoid such biases. The third contribution of the thesis is a new optimization algorithm addressing the limitations of gradient descent in the context of regularized losses for deep learning. Our general trust-region algorithm can be seen as a high-order chain rule for network training. It can use many standard low-level regularizers and their powerful solvers. We improve the state of the art in weakly-supervised semantic segmentation using a well-motivated low-level regularization model and its graph-cut solver.
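    As a simplified illustration of a regularized loss for weakly supervised segmentation (optimized here by plain gradient descent rather than the thesis' trust-region/graph-cut approach), the sketch below combines partial cross-entropy on the few annotated pixels with a generic pairwise smoothness term on the network's soft predictions; the weighting, ignore index, and toy data are assumptions.

```python
# Minimal sketch of a regularized loss for weakly supervised segmentation:
# supervised cross-entropy only on annotated (e.g. scribbled) pixels plus an
# unsupervised smoothness regularizer that discourages label changes between
# neighboring pixels.
import torch
import torch.nn.functional as F

def regularized_loss(logits, sparse_labels, weight=0.1, ignore_index=255):
    """logits: BxCxHxW scores; sparse_labels: BxHxW with ignore_index on unlabeled pixels."""
    # Supervised term, evaluated only where annotations exist.
    ce = F.cross_entropy(logits, sparse_labels, ignore_index=ignore_index)
    # Pairwise regularizer on soft predictions (horizontal + vertical neighbors).
    probs = logits.softmax(dim=1)
    smooth = ((probs[:, :, 1:, :] - probs[:, :, :-1, :]).abs().mean() +
              (probs[:, :, :, 1:] - probs[:, :, :, :-1]).abs().mean())
    return ce + weight * smooth

# Toy check with random data: 2 images, 3 classes, ~1% of pixels labeled.
logits = torch.randn(2, 3, 64, 64, requires_grad=True)
labels = torch.full((2, 64, 64), 255, dtype=torch.long)
labels[:, ::10, ::10] = torch.randint(0, 3, (2, 7, 7))
regularized_loss(logits, labels).backward()
```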

    Advanced deep learning for medical image segmentation: Towards global and data-efficient learning
