11,166 research outputs found
TREC video retrieval evaluation: a case study and status report
The TREC Video Retrieval Evaluation is a multiyear, international effort, funded by the US Advanced Research and Development Agency (ARDA) and the National Institute of Standards and Technology (NIST) to promote progress in content-based retrieval from digital video via open, metrics-based evaluation. Now beginning its fourth year, it aims over time to develop both a better understanding of
how systems can effectively accomplish such retrieval
and how one can reliably benchmark their performance. This paper can be seen as a case study in the development of video retrieval systems and their evaluation as well as a report on their status to-date. After an introduction to the evolution of the evaluation over the past three years, the paper reports on the most recent evaluation TRECVID 2003: the evaluation framework — the 4 tasks (shot boundary determination, high-level feature extraction, story segmentation and typing, search), 133 hours of US television
news data, and measures —, the results, and the approaches taken by the 24 participating groups
MonoPerfCap: Human Performance Capture from Monocular Video
We present the first marker-less approach for temporally coherent 3D
performance capture of a human with general clothing from monocular video. Our
approach reconstructs articulated human skeleton motion as well as medium-scale
non-rigid surface deformations in general scenes. Human performance capture is
a challenging problem due to the large range of articulation, potentially fast
motion, and considerable non-rigid deformations, even from multi-view data.
Reconstruction from monocular video alone is drastically more challenging,
since strong occlusions and the inherent depth ambiguity lead to a highly
ill-posed reconstruction problem. We tackle these challenges by a novel
approach that employs sparse 2D and 3D human pose detections from a
convolutional neural network using a batch-based pose estimation strategy.
Joint recovery of per-batch motion allows to resolve the ambiguities of the
monocular reconstruction problem based on a low dimensional trajectory
subspace. In addition, we propose refinement of the surface geometry based on
fully automatically extracted silhouettes to enable medium-scale non-rigid
alignment. We demonstrate state-of-the-art performance capture results that
enable exciting applications such as video editing and free viewpoint video,
previously infeasible from monocular video. Our qualitative and quantitative
evaluation demonstrates that our approach significantly outperforms previous
monocular methods in terms of accuracy, robustness and scene complexity that
can be handled.Comment: Accepted to ACM TOG 2018, to be presented on SIGGRAPH 201
- …