38,278 research outputs found
InstMove: Instance Motion for Object-centric Video Segmentation
Despite significant efforts, cutting-edge video segmentation methods remain
sensitive to occlusion and rapid movement because they rely on the appearance
of objects, in the form of object embeddings, which are vulnerable to these
disturbances. A common remedy is to use optical flow to provide motion
information, but optical flow captures only pixel-level motion, which still
depends on appearance similarity and hence is often inaccurate under occlusion
and fast movement. In this work, we study instance-level motion and present
InstMove, which stands for Instance Motion for Object-centric Video
Segmentation. In contrast to pixel-wise motion, InstMove relies mainly on
instance-level motion information that is free from image feature embeddings
and has a physical interpretation, making it more accurate and more robust to
occlusion and fast-moving objects. To better fit video segmentation tasks,
InstMove uses instance masks to model the physical presence of an object and
learns a dynamic model through a memory network to predict the object's
position and shape in the next frame. With only a few lines of code, InstMove
can be integrated into current SOTA methods for three different video
segmentation tasks and boosts their performance. Specifically, we improve the
previous state of the art by 1.5 AP on the OVIS dataset, which features heavy
occlusion, and by 4.9 AP on the YouTubeVIS-Long dataset, which mainly contains
fast-moving objects.
These results suggest that instance-level motion is robust and accurate, and
hence serves as a powerful tool for object-centric video segmentation in
complex scenarios.
Comment: Accepted to CVPR 2023
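To make the abstract's mechanism concrete, below is a minimal sketch of an
appearance-free mask predictor in the spirit it describes: a recurrent spatial
memory rolls over an instance's past masks and decodes the mask for the next
frame. The ConvGRU cell, layer widths, and every name below are illustrative
assumptions for exposition, not the paper's actual memory network.

```python
# Sketch: predict an instance's next-frame mask from its past masks alone,
# with no appearance features. All architecture choices here are assumptions.
import torch
import torch.nn as nn

class ConvGRUCell(nn.Module):
    """Convolutional GRU cell acting as a simple spatial memory."""
    def __init__(self, channels):
        super().__init__()
        self.gates = nn.Conv2d(2 * channels, 2 * channels, 3, padding=1)
        self.cand = nn.Conv2d(2 * channels, channels, 3, padding=1)

    def forward(self, x, h):
        zr = torch.sigmoid(self.gates(torch.cat([x, h], dim=1)))
        z, r = zr.chunk(2, dim=1)                    # update and reset gates
        h_tilde = torch.tanh(self.cand(torch.cat([x, r * h], dim=1)))
        return (1 - z) * h + z * h_tilde

class MaskMotionPredictor(nn.Module):
    """Encode past binary masks, update a recurrent memory, decode next mask."""
    def __init__(self, hidden=32):
        super().__init__()
        self.hidden = hidden
        self.enc = nn.Conv2d(1, hidden, 3, padding=1)
        self.cell = ConvGRUCell(hidden)
        self.dec = nn.Conv2d(hidden, 1, 3, padding=1)

    def forward(self, masks):                        # masks: (B, T, 1, H, W)
        b, t, _, h, w = masks.shape
        mem = masks.new_zeros(b, self.hidden, h, w)
        for i in range(t):                           # roll memory over the past
            mem = self.cell(torch.relu(self.enc(masks[:, i])), mem)
        return torch.sigmoid(self.dec(mem))          # next-frame mask probability

model = MaskMotionPredictor()
past = torch.rand(2, 4, 1, 64, 64)                   # 2 instances, 4 past frames
next_mask = model(past)                              # (2, 1, 64, 64)
```

Because the predictor consumes only masks, a module like this can be bolted
onto an existing segmentation model as a motion prior, which is the
integration style the abstract alludes to.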
Lucid Data Dreaming for Video Object Segmentation
Convolutional networks reach top quality in pixel-level video object
segmentation but require a large amount of training data (1k~100k) to deliver
such results. We propose a new training strategy which achieves
state-of-the-art results across three evaluation datasets while using 20x~1000x
less annotated data than competing methods. Our approach is suitable for both
single and multiple object segmentation. Instead of using large training sets
hoping to generalize across domains, we generate in-domain training data using
the provided annotation on the first frame of each video to synthesize ("lucid
dream") plausible future video frames. In-domain per-video training data allows
us to train high quality appearance- and motion-based models, as well as tune
the post-processing stage. This approach allows us to reach competitive
results even when training from only a single annotated frame, without
ImageNet pre-training. Our results indicate that using a larger training set is not
automatically better, and that for the video object segmentation task a smaller
training set that is closer to the target domain is more effective. This
changes the mindset regarding how many training samples and general
"objectness" knowledge are required for the video object segmentation task.Comment: Accepted in International Journal of Computer Vision (IJCV
Temporally coherent 4D reconstruction of complex dynamic scenes
This paper presents an approach for reconstruction of 4D temporally coherent
models of complex dynamic scenes. No prior knowledge of scene structure or
camera calibration is required, allowing reconstruction from multiple moving
cameras. Sparse-to-dense temporal correspondence is integrated with joint
multi-view segmentation and reconstruction to obtain a complete 4D
representation of static and dynamic objects. Temporal coherence is exploited
to overcome visual ambiguities, resulting in improved reconstruction of complex
scenes. Robust joint segmentation and reconstruction of dynamic objects is
achieved by introducing a geodesic star convexity constraint. Comparative
evaluation is performed on a variety of unstructured indoor and outdoor dynamic
scenes with hand-held cameras and multiple people. This demonstrates
reconstruction of complete temporally coherent 4D scene models with improved
nonrigid object segmentation and shape reconstruction.
Comment: To appear in the IEEE Conference on Computer Vision and Pattern
Recognition (CVPR) 2016. Video available at:
https://www.youtube.com/watch?v=bm_P13_-Ds
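For intuition about the star convexity shape prior mentioned above, the sketch
below checks whether a binary mask is star convex with respect to a centre
pixel: every foreground pixel must see the centre through foreground only. The
paper measures paths geodesically in the image; for brevity this check uses
straight-line (Euclidean) paths, the classical special case, so it illustrates
the constraint rather than the authors' exact prior.

```python
# Sketch: Euclidean star convexity check (geodesic paths in the paper).
import numpy as np

def line_pixels(c, p, steps=256):
    """Sample integer pixel coordinates along the segment from c to p."""
    t = np.linspace(0.0, 1.0, steps)
    ys = np.round(c[0] + t * (p[0] - c[0])).astype(int)
    xs = np.round(c[1] + t * (p[1] - c[1])).astype(int)
    return ys, xs

def is_star_convex(mask, centre):
    """True iff every foreground pixel sees the centre through foreground only."""
    for p in np.argwhere(mask):
        ys, xs = line_pixels(centre, p)
        if not mask[ys, xs].all():
            return False
    return True

mask = np.zeros((32, 32), dtype=bool)
mask[8:24, 8:24] = True                       # a filled square is star convex
print(is_star_convex(mask, (16, 16)))         # True
mask[10:14, 10:14] = False                    # hole between centre and a corner
print(is_star_convex(mask, (16, 16)))         # False: paths through it blocked
```

Used as a segmentation prior, this constraint rules out fragmented or
hole-ridden object labellings, which is what makes the joint segmentation and
reconstruction robust.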
General Dynamic Scene Reconstruction from Multiple View Video
This paper introduces a general approach to dynamic scene reconstruction from
multiple moving cameras without prior knowledge or limiting constraints on the
scene structure, appearance, or illumination. Existing techniques for dynamic
scene reconstruction from multiple wide-baseline camera views primarily focus
on accurate reconstruction in controlled environments, where the cameras are
fixed and calibrated and the background is known. These approaches are not robust
for general dynamic scenes captured with sparse moving cameras. Previous
approaches for outdoor dynamic scene reconstruction assume prior knowledge of
the static background appearance and structure. The primary contributions of
this paper are twofold: an automatic method for initial coarse dynamic scene
segmentation and reconstruction without prior knowledge of background
appearance or structure; and a general robust approach for joint segmentation
refinement and dense reconstruction of dynamic scenes from multiple
wide-baseline static or moving cameras. Evaluation is performed on a variety of
indoor and outdoor scenes with cluttered backgrounds and multiple dynamic
non-rigid objects such as people. Comparison with state-of-the-art approaches
demonstrates improved accuracy in both multiple view segmentation and dense
reconstruction. The proposed approach also eliminates the requirement for prior
knowledge of scene structure and appearance.
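The pipeline above builds on standard multi-view geometry primitives. As a
self-contained illustration of one such building block (not the paper's
algorithm), the sketch below triangulates a 3D point from two pixel
observations by linear DLT, assuming known 3x4 projection matrices; in the
paper these are estimated from the moving-camera footage itself.

```python
# Sketch: linear (DLT) triangulation of one point from two calibrated views.
import numpy as np

def triangulate(P1, P2, x1, x2):
    """Solve A X = 0 for the homogeneous 3D point X via SVD.

    P1, P2: (3, 4) projection matrices; x1, x2: (2,) pixel coordinates.
    """
    A = np.stack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]                                # null-space solution
    return X[:3] / X[3]                       # dehomogenise

# Two toy cameras: identity pose and a 1-unit translation along x.
K = np.array([[500.0, 0, 320], [0, 500.0, 240], [0, 0, 1]])
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([np.eye(3), np.array([[-1.0], [0], [0]])])

Xtrue = np.array([0.3, -0.2, 4.0, 1.0])       # ground-truth homogeneous point
x1 = (P1 @ Xtrue)[:2] / (P1 @ Xtrue)[2]       # project into each view
x2 = (P2 @ Xtrue)[:2] / (P2 @ Xtrue)[2]
print(triangulate(P1, P2, x1, x2))            # ~ [0.3, -0.2, 4.0]
```

Dense reconstruction repeats this kind of two-view (or multi-view) solve for
every matched correspondence, with the segmentation deciding which pixels
belong to which dynamic object.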