18,421 research outputs found
Unsupervised Action Proposal Ranking through Proposal Recombination
Recently, action proposal methods have played an important role in action
recognition tasks, as they reduce the search space dramatically. Most
unsupervised action proposal methods tend to generate hundreds of action
proposals which include many noisy, inconsistent, and unranked action
proposals, while supervised action proposal methods take advantage of
predefined object detectors (e.g., human detector) to refine and score the
action proposals, but they require thousands of manual annotations to train.
Given the action proposals in a video, the goal of the proposed work is to
generate a few better action proposals that are ranked properly. In our
approach, we first divide action proposal into sub-proposal and then use
Dynamic Programming based graph optimization scheme to select the optimal
combinations of sub-proposals from different proposals and assign each new
proposal a score. We propose a new unsupervised image-based actioness detector
that leverages web images and employs it as one of the node scores in our graph
formulation. Moreover, we capture motion information by estimating the number
of motion contours within each action proposal patch. The proposed method is an
unsupervised method that neither needs bounding box annotations nor video level
labels, which is desirable with the current explosion of large-scale action
datasets. Our approach is generic and does not depend on a specific action
proposal method. We evaluate our approach on several publicly available trimmed
and un-trimmed datasets and obtain better performance compared to several
proposal ranking methods. In addition, we demonstrate that properly ranked
proposals produce significantly better action detection as compared to
state-of-the-art proposal based methods
Temporally coherent 4D reconstruction of complex dynamic scenes
This paper presents an approach for reconstruction of 4D temporally coherent
models of complex dynamic scenes. No prior knowledge is required of scene
structure or camera calibration allowing reconstruction from multiple moving
cameras. Sparse-to-dense temporal correspondence is integrated with joint
multi-view segmentation and reconstruction to obtain a complete 4D
representation of static and dynamic objects. Temporal coherence is exploited
to overcome visual ambiguities resulting in improved reconstruction of complex
scenes. Robust joint segmentation and reconstruction of dynamic objects is
achieved by introducing a geodesic star convexity constraint. Comparative
evaluation is performed on a variety of unstructured indoor and outdoor dynamic
scenes with hand-held cameras and multiple people. This demonstrates
reconstruction of complete temporally coherent 4D scene models with improved
nonrigid object segmentation and shape reconstruction.Comment: To appear in The IEEE Conference on Computer Vision and Pattern
Recognition (CVPR) 2016 . Video available at:
https://www.youtube.com/watch?v=bm_P13_-Ds
A spatially distributed model for foreground segmentation
Foreground segmentation is a fundamental first processing stage for vision systems which monitor real-world activity. In this paper we consider the problem of achieving robust segmentation in scenes where the appearance of the background varies unpredictably over time. Variations may be caused by processes such as moving water, or foliage moved by wind, and typically degrade the performance of standard per-pixel background models.
Our proposed approach addresses this problem by modeling homogeneous regions of scene pixels as an adaptive mixture of Gaussians in color and space. Model components are used to represent both the scene background and moving foreground objects. Newly observed pixel values are probabilistically classified, such that the spatial variance of the model components supports correct classification even when the background appearance is significantly distorted. We evaluate our method over several challenging video sequences, and compare our results with both per-pixel and Markov Random Field based models. Our results show the effectiveness of our approach in reducing incorrect classifications
- …