5 research outputs found
Content-sensitive Supervoxels via uniform tessellations on video manifolds
Supervoxels are perceptually meaningful atomic regions in videos, obtained by grouping voxels that exhibit coherence in both appearance and motion. In this paper, we propose content-sensitive supervoxels (CSS), which are regularly shaped 3D primitive volumes with the following characteristic: they are typically larger and longer in content-sparse regions (i.e., regions with homogeneous appearance and motion), and smaller and shorter in content-dense regions (i.e., regions with high variation in appearance and/or motion). To compute CSS, we map a video to a 3-dimensional manifold M embedded in R^6, whose volume elements give a good measure of the content density in the video. We propose an efficient Lloyd-like method with a splitting-merging scheme to compute a uniform tessellation on M, which induces the CSS in the video. Theoretically, our method has a good competitive ratio of O(1). We also present a simple extension of CSS to streaming CSS for processing long videos that cannot be loaded into main memory at once. We evaluate CSS, streaming CSS and seven representative supervoxel methods on four video datasets. The results show that our method outperforms existing supervoxel methods.
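The pipeline above (map each voxel into R^6, then compute a uniform tessellation with a Lloyd-like iteration) can be sketched as follows. This is a minimal illustration, not the paper's method: the function names, the simple (x, y, t, L, a, b) embedding, and the color-stretch weight `lam` are all assumptions; the paper derives the embedding from the content density of the video manifold instead.

```python
import numpy as np

def voxel_features(video, lam=1.0):
    """Map each voxel (x, y, t) of a video to a point in R^6 by
    concatenating its position with its (scaled) color.

    `video` is a (T, H, W, 3) array of color values; `lam` is an
    assumed weight trading color against position.
    """
    T, H, W, _ = video.shape
    t, y, x = np.meshgrid(np.arange(T), np.arange(H), np.arange(W),
                          indexing="ij")
    coords = np.stack([x, y, t], axis=-1).astype(float)
    return np.concatenate([coords, lam * video], axis=-1).reshape(-1, 6)

def lloyd_step(points, centers):
    """One Lloyd iteration: assign each point to its nearest center,
    then move every center to the centroid of its cell."""
    d = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    labels = d.argmin(1)
    new_centers = np.array([points[labels == k].mean(0)
                            if np.any(labels == k) else centers[k]
                            for k in range(len(centers))])
    return labels, new_centers
```

Iterating `lloyd_step` to convergence yields a centroidal tessellation of the embedded points; the resulting 6D cells, read back on the voxel grid, play the role of supervoxels. The paper's splitting-merging scheme (not shown) additionally adapts the number of cells to the content density.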
Feature-aware uniform tessellations on video manifold for content-sensitive supervoxels
Over-segmenting a video into supervoxels has strong potential to reduce the complexity of computer vision applications. Content-sensitive supervoxels (CSS) are typically smaller in content-dense regions and larger in content-sparse regions. In this paper, we propose to compute feature-aware CSS (FCSS): regularly shaped 3D primitive volumes that are well aligned with local object/region/motion boundaries in the video. To compute FCSS, we map a video to a 3-dimensional manifold, whose volume elements give a good measure of the video content density. Any uniform tessellation on this manifold then induces CSS. Our key idea is that, among all possible uniform tessellations, FCSS selects one whose cell boundaries align well with local video boundaries. To achieve this goal, we propose a novel tessellation method that simultaneously minimizes the tessellation energy and maximizes the average boundary distance. Theoretically, our method has an optimal competitive ratio of O(1). We also present a simple extension of FCSS to streaming FCSS for processing long videos that cannot be loaded into main memory at once. We evaluate FCSS, streaming FCSS and ten representative supervoxel methods on four video datasets and two novel video applications. The results show that our method simultaneously achieves state-of-the-art performance with respect to various evaluation criteria.
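One plausible way to encode "minimize tessellation energy while aligning cell boundaries with video boundaries" is sketched below. Everything here is an assumption for illustration: the function names, the both-sided boundary test, the precomputed per-voxel `edge_dist` (distance to the nearest appearance/motion edge), and the trade-off weight `mu` are made up; the paper's actual energy is more refined than this sketch. Rewarding interior voxels for being far from video edges pushes those edges toward the cell boundaries.

```python
import numpy as np

def boundary_mask(labels):
    """Mark voxels lying on supervoxel boundaries: voxels whose label
    differs from that of an adjacent voxel along some axis."""
    m = np.zeros(labels.shape, dtype=bool)
    for ax in range(labels.ndim):
        diff = np.diff(labels, axis=ax) != 0
        lo = [slice(None)] * labels.ndim
        hi = [slice(None)] * labels.ndim
        lo[ax] = slice(0, -1)
        hi[ax] = slice(1, None)
        m[tuple(lo)] |= diff  # voxel before the label change
        m[tuple(hi)] |= diff  # voxel after the label change
    return m

def fcss_objective(points, centers, labels3d, edge_dist, mu=0.5):
    """Illustrative FCSS-style objective (to be minimized): the Lloyd
    tessellation energy minus a reward for interior voxels sitting far
    from video edges, so that edges end up on cell boundaries."""
    labels = labels3d.ravel()
    energy = ((points - centers[labels]) ** 2).sum()
    interior = ~boundary_mask(labels3d)
    reward = edge_dist[interior].mean() if interior.any() else 0.0
    return energy - mu * reward
```

A tessellation that lowers this objective both keeps cells compact in the embedded space and drives their boundaries onto high-gradient video content.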
Evaluation on the compactness of supervoxels
Supervoxels are perceptually meaningful atomic spatiotemporal regions in videos, and have great potential to reduce the computational complexity of downstream video applications. Many methods have been proposed for generating supervoxels. To evaluate these methods effectively, a novel supervoxel library and benchmark called LIBSVX, with seven collected metrics, was recently established. In this paper, we propose a new compactness metric that measures the shape regularity of supervoxels and serves as a necessary complement to the existing metrics. To demonstrate its necessity, we first explore the relations between the new metric and the existing ones. Correlation analysis shows that the new metric has only a weak correlation with (i.e., is nearly independent of) the existing metrics, and thus reflects a new characteristic of supervoxel quality. Second, we investigate two real-world video applications. Experimental results show that the new metric can effectively predict performance in some important applications, while most existing metrics cannot.
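A shape-regularity measure of this kind can be illustrated with a 3D isoperimetric ratio: a supervoxel is compact when its surface area is close to that of a ball of equal volume. The sketch below is only an illustration of the general idea; the specific metric defined in the paper may be computed differently.

```python
import numpy as np

def compactness(labels):
    """Illustrative volume-weighted compactness score in (0, 1]:
    compare each supervoxel's surface area with that of a ball of
    equal volume, then average over supervoxels weighted by volume.

    `labels` is a (T, H, W) integer array of supervoxel ids.
    """
    score, total = 0.0, labels.size
    for k in np.unique(labels):
        mask = labels == k
        vol = mask.sum()
        # count exposed faces: occupancy transitions along each axis
        # of the zero-padded mask
        area = 0
        for ax in range(3):
            pad = [(1, 1) if a == ax else (0, 0) for a in range(3)]
            padded = np.pad(mask, pad).astype(np.int8)
            area += np.abs(np.diff(padded, axis=ax)).sum()
        # surface area of a ball with the same volume
        ball_area = (36 * np.pi) ** (1 / 3) * vol ** (2 / 3)
        score += vol * min(1.0, ball_area / area)
    return score / total
```

On this measure a cube-shaped supervoxel scores higher than an elongated strip of the same volume, which is exactly the regularity distinction a compactness metric is meant to capture.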