Co-interest Person Detection from Multiple Wearable Camera Videos
Wearable cameras, such as Google Glass and GoPro, enable video data
collection over larger areas and from different views. In this paper, we tackle
a new problem of locating the co-interest person (CIP), i.e., the one who draws
attention from most camera wearers, from temporally synchronized videos taken
by multiple wearable cameras. Our basic idea is to exploit the motion patterns
of people and use them to correlate the persons across different videos,
instead of performing appearance-based matching as in traditional video
co-segmentation/localization. This way, we can identify CIP even if a group of
people with similar appearance are present in the view. More specifically, we
detect a set of persons on each frame as the candidates of the CIP and then
build a Conditional Random Field (CRF) model to select the one with consistent
motion patterns in different videos and high spatial-temporal consistency in
each video. We collect three sets of wearable-camera videos for testing the
proposed algorithm. All the involved people have similar appearances in the
collected videos and the experiments demonstrate the effectiveness of the
proposed algorithm.
Comment: ICCV 201
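The selection step described in the abstract above (one candidate per frame, chosen for cross-video motion agreement and consistency within each video) can be illustrated with a much simpler chain model solved by dynamic programming. This is a minimal sketch, not the paper's CRF: the function name `select_track`, the `unary` scores standing in for motion agreement, and the distance-based smoothness penalty are all assumptions made for illustration.

```python
from math import hypot


def select_track(centers, unary, smooth=1.0):
    """Pick one candidate per frame by dynamic programming (Viterbi).

    centers[t][i] is the (x, y) center of candidate i in frame t.
    unary[t][i] is a hypothetical score for how well candidate i's motion
    agrees with the other videos (higher is better).
    smooth weights the penalty on jumps between consecutive frames.
    Assumes every frame has at least one candidate.
    """
    n_frames = len(centers)
    if n_frames == 0:
        return []
    # best[i]: best total score of any path ending at candidate i of the current frame
    best = list(unary[0])
    # back[t - 1][j]: best predecessor in frame t - 1 for candidate j in frame t
    back = []
    for t in range(1, n_frames):
        new_best, pointers = [], []
        for j, (xj, yj) in enumerate(centers[t]):
            scores = [
                best[i] - smooth * hypot(xj - xi, yj - yi)
                for i, (xi, yi) in enumerate(centers[t - 1])
            ]
            i_star = max(range(len(scores)), key=scores.__getitem__)
            new_best.append(scores[i_star] + unary[t][j])
            pointers.append(i_star)
        best = new_best
        back.append(pointers)
    # Trace the best path back from the last frame to the first
    j = max(range(len(best)), key=best.__getitem__)
    path = [j]
    for pointers in reversed(back):
        j = pointers[j]
        path.append(j)
    path.reverse()
    return path


if __name__ == "__main__":
    # Two candidates per frame; candidate 1 is spatially stable across frames.
    centers = [[(0, 0), (10, 10)], [(1, 0), (10, 11)], [(2, 0), (10, 12)]]
    unary = [[1.0, 0.9], [0.2, 1.0], [1.0, 0.9]]
    print(select_track(centers, unary, smooth=1.0))
```

With `smooth=0.0` the sketch reduces to picking the best-scoring candidate in each frame independently; a positive `smooth` trades per-frame score against continuity of the selected track.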
Semantic Instance Annotation of Street Scenes by 3D to 2D Label Transfer
Semantic annotations are vital for training models for object recognition,
semantic segmentation or scene understanding. Unfortunately, pixelwise
annotation of images at very large scale is labor-intensive and only little
labeled data is available, particularly at instance level and for street
scenes. In this paper, we propose to tackle this problem by lifting the
semantic instance labeling task from 2D into 3D. Given reconstructions from
stereo or laser data, we annotate static 3D scene elements with rough bounding
primitives and develop a model which transfers this information into the image
domain. We leverage our method to obtain 2D labels for a novel suburban video
dataset which we have collected, resulting in 400k semantic and instance image
annotations. A comparison of our method to state-of-the-art label transfer
baselines reveals that 3D information enables more efficient annotation while
at the same time resulting in improved accuracy and time-coherent labels.
Comment: 10 pages in Conference on Computer Vision and Pattern Recognition
(CVPR), 201
A Survey on Deep Learning Technique for Video Segmentation
Video segmentation -- partitioning video frames into multiple segments or
objects -- plays a critical role in a broad range of practical applications,
from enhancing visual effects in movies, to understanding scenes in autonomous
driving, to creating virtual backgrounds in video conferencing. Recently, with
the renaissance of connectionism in computer vision, there has been an influx
of deep learning based approaches for video segmentation that have delivered
compelling performance. In this survey, we comprehensively review two basic
lines of research -- generic object segmentation (of unknown categories) in
videos, and video semantic segmentation -- by introducing their respective task
settings, background concepts, perceived need, development history, and main
challenges. We also offer a detailed overview of representative literature on
both methods and datasets. We further benchmark the reviewed methods on several
well-known datasets. Finally, we point out open issues in this field, and
suggest opportunities for further research. We also provide a public website to
continuously track developments in this fast-advancing field:
https://github.com/tfzhou/VS-Survey.
Comment: Accepted by TPAMI. Website: https://github.com/tfzhou/VS-Surve
Higher-order Losses and Optimization for Low-level and Deep Segmentation
Regularized objectives are common in low-level and deep segmentation. Regularization incorporates prior knowledge into objectives or losses. It represents constraints necessary to address ill-posedness, data noise, outliers, lack of supervision, etc. However, such constraints come at significant costs. First, regularization priors may lead to unintended biases, known or unknown. Since these can adversely affect specific applications, it is important to understand the causes and effects of these biases and to develop solutions for them. Second, common regularized objectives are highly non-convex and present challenges for optimization. As is known in low-level vision, first-order approaches like gradient descent are significantly weaker than more advanced algorithms. Yet variants of gradient descent dominate optimization of the loss functions for deep neural networks due to their size and complexity. Hence, standard segmentation networks still require an overwhelming amount of precise pixel-level supervision for training.
This thesis addresses three related problems concerning higher-order objectives and higher-order optimizers. First, we focus on a challenging application: unsupervised vascular tree extraction in large 3D volumes containing complex "entanglements" of near-capillary vessels. In the context of vasculature with unrestricted topology, we propose a new general curvature-regularizing model for arbitrarily complex one-dimensional curvilinear structures. In contrast, the standard surface regularization methods are impractical for thin vessels due to strong shrinking bias or the complexity of Gaussian/min curvature modeling for two-dimensional manifolds. In general, the shrinking bias is one well-known example of bias in the standard regularization methods. The second contribution of this thesis is a characterization of other new forms of biases in classical segmentation models that were not understood in the past. We develop new theories establishing data density biases in common pairwise or graph-based clustering objectives, such as kernel K-means and normalized cut. This theoretical understanding inspires our new segmentation algorithms avoiding such biases. The third contribution of the thesis is a new optimization algorithm addressing the limitations of gradient descent in the context of regularized losses for deep learning. Our general trust-region algorithm can be seen as a high-order chain rule for network training. It can use many standard low-level regularizers and their powerful solvers. We improve the state-of-the-art in weakly-supervised semantic segmentation using a well-motivated low-level regularization model and its graph-cut solver.
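For readers unfamiliar with the two clustering objectives named in the abstract above, their standard textbook forms are sketched below. The notation (partition S_1, ..., S_K of the point set Ω, features f_p, embedding φ, affinities A_pq) is an assumption chosen for illustration and is not necessarily the notation used in the thesis.

```latex
% Kernel K-means over a partition S_1, ..., S_K of points p with features f_p
E_{\mathrm{kKM}}(S) = \sum_{k=1}^{K} \sum_{p \in S_k} \left\| \phi(f_p) - \mu_k \right\|^2,
\qquad \mu_k = \frac{1}{|S_k|} \sum_{q \in S_k} \phi(f_q)

% Normalized cut with pairwise affinities A_{pq} \ge 0 over the full set \Omega
E_{\mathrm{NC}}(S) = \sum_{k=1}^{K} \frac{\mathrm{cut}(S_k, \Omega \setminus S_k)}{\mathrm{assoc}(S_k, \Omega)},
\qquad \mathrm{cut}(S, T) = \sum_{p \in S} \sum_{q \in T} A_{pq},
\quad \mathrm{assoc}(S_k, \Omega) = \sum_{p \in S_k} \sum_{q \in \Omega} A_{pq}
```

Both objectives depend on the data only through pairwise terms (kernel values or affinities), which is why the abstract groups them as pairwise or graph-based clustering objectives.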