Learning to Segment Moving Objects in Videos
We segment moving objects in videos by ranking spatio-temporal segment
proposals according to "moving objectness": how likely they are to contain a
moving object. In each video frame, we compute segment proposals using multiple
figure-ground segmentations on per frame motion boundaries. We rank them with a
Moving Objectness Detector trained on image and motion fields to detect moving
objects and discard over/under segmentations or background parts of the scene.
We extend the top ranked segments into spatio-temporal tubes using random
walkers on motion affinities of dense point trajectories. Our final tube
ranking consistently outperforms previous segmentation methods in the two
largest video segmentation benchmarks currently available, for any number of
proposals. Further, our per frame moving object proposals increase the
detection rate up to 7% over previous state-of-the-art static proposal
methods.
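The per-frame ranking step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `Proposal` type, the `top_k_per_frame` helper, and the score values are all hypothetical, and the scores stand in for the output of a trained Moving Objectness Detector.

```python
from typing import Dict, List, NamedTuple


class Proposal(NamedTuple):
    frame: int        # index of the video frame the segment belongs to
    segment_id: int   # identifier of the segment within that frame
    score: float      # hypothetical "moving objectness" score in [0, 1]


def top_k_per_frame(proposals: List[Proposal], k: int) -> Dict[int, List[Proposal]]:
    """Group proposals by frame and keep the k highest-scoring ones in each."""
    by_frame: Dict[int, List[Proposal]] = {}
    for p in proposals:
        by_frame.setdefault(p.frame, []).append(p)
    return {
        frame: sorted(ps, key=lambda p: p.score, reverse=True)[:k]
        for frame, ps in by_frame.items()
    }


# Made-up scores for illustration only.
proposals = [
    Proposal(0, 0, 0.15),
    Proposal(0, 1, 0.92),
    Proposal(0, 2, 0.40),
    Proposal(1, 0, 0.71),
    Proposal(1, 1, 0.08),
]
ranked = top_k_per_frame(proposals, k=2)
for frame, kept in sorted(ranked.items()):
    print(frame, [(p.segment_id, p.score) for p in kept])
```

In the abstract, the top-ranked segments are then extended into spatio-temporal tubes; that step is not covered by this sketch.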
Motion Segmentation - Segmentation of Independently Moving Objects in Video
The ability to recognize motion is one of the most important functions of our visual system. Motion allows us both to recognize objects and to get a better understanding of the 3D world in which we are moving. Because of its importance, motion is used to answer a wide variety of fundamental questions in computer vision such as: (1) Which objects are moving independently in the world? (2) Which objects are close and which objects are far away? (3) How is the camera moving?
My work addresses the problem of moving object segmentation in unconstrained videos. I developed a probabilistic approach to segment independently moving objects in a video sequence, connecting aspects of camera motion estimation, relative depth and flow statistics. My work consists of three major parts: (1) Modeling motion using a simple (rigid) motion model that strictly follows the principles of perspective projection, and segmenting the video into its different motion components by assigning each pixel to its most likely motion model in a Bayesian fashion. (2) Combining piecewise rigid motions into more complex, deformable and articulated objects, guided by learned semantic object segmentations. (3) Learning highly variable motion patterns using a neural network trained on synthetic (unlimited) training data. The training data is generated automatically, strictly following the principles of perspective projection. In this way, well-known geometric constraints are precisely characterized during training, so that the network learns the principles of motion segmentation rather than identifying well-known structures that are likely to move.
This work shows that a careful analysis of the motion field not only leads to a consistent segmentation of moving objects in a video sequence, but also helps us understand the scene geometry of the world we are moving in.
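The Bayesian per-pixel assignment mentioned in the first part can be illustrated with a small sketch. It assumes that a likelihood of the observed flow under each candidate motion model is already available for every pixel; the function name `assign_motion_models`, the input layout, and the numbers are hypothetical and are not taken from the thesis.

```python
from typing import List, Sequence, Tuple


def assign_motion_models(
    likelihoods: Sequence[Sequence[float]],
    priors: Sequence[float],
) -> Tuple[List[int], List[List[float]]]:
    """Assign each pixel to its most probable motion model.

    likelihoods[i][k] is assumed to be p(observed flow at pixel i | model k),
    and priors[k] is p(model k). The posterior follows from Bayes' rule.
    """
    labels: List[int] = []
    posteriors: List[List[float]] = []
    for row in likelihoods:
        joint = [lik * prior for lik, prior in zip(row, priors)]
        evidence = sum(joint)
        if evidence > 0:
            post = [j / evidence for j in joint]
        else:
            # No model explains the pixel; fall back to a uniform posterior.
            post = [1.0 / len(priors)] * len(priors)
        posteriors.append(post)
        labels.append(max(range(len(post)), key=post.__getitem__))
    return labels, posteriors


# Three pixels, two candidate motion models (made-up numbers).
likelihoods = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]
priors = [0.7, 0.3]
labels, posteriors = assign_motion_models(likelihoods, priors)
print(labels)
```

The third pixel shows the effect of the prior: its likelihoods are tied, so the assignment follows the model with the larger prior.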
Learning Video Object Segmentation with Visual Memory
This paper addresses the task of segmenting moving objects in unconstrained
videos. We introduce a novel two-stream neural network with an explicit memory
module to achieve this. The two streams of the network encode spatial and
temporal features in a video sequence respectively, while the memory module
captures the evolution of objects over time. The module to build a "visual
memory" in video, i.e., a joint representation of all the video frames, is
realized with a convolutional recurrent unit learned from a small number of
training video sequences. Given a video frame as input, our approach assigns
each pixel an object or background label based on the learned spatio-temporal
features as well as the "visual memory" specific to the video, acquired
automatically without any manually-annotated frames. The visual memory is
implemented with convolutional gated recurrent units, which allow spatial
information to be propagated over time. We evaluate our method extensively on two
benchmarks, the DAVIS and Freiburg-Berkeley motion segmentation datasets, and show
state-of-the-art results. For example, our approach outperforms the top method
on the DAVIS dataset by nearly 6%. We also provide an extensive ablative
analysis to investigate the influence of each component in the proposed
framework.
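For reference, a convolutional gated recurrent unit is commonly written as below. This is the generic textbook form, given here as an assumption; the abstract does not state the exact variant used in the paper. Here $x_t$ is the input feature map at frame $t$, $h_t$ is the hidden state (the "visual memory"), $*$ denotes convolution, $\odot$ element-wise multiplication, and $\sigma$ the logistic sigmoid. The kernels $W$, $U$ and biases $b$ are learned parameters.

```latex
\begin{align}
z_t &= \sigma\left(W_z * x_t + U_z * h_{t-1} + b_z\right) \\
r_t &= \sigma\left(W_r * x_t + U_r * h_{t-1} + b_r\right) \\
\tilde{h}_t &= \tanh\left(W_h * x_t + U_h * (r_t \odot h_{t-1}) + b_h\right) \\
h_t &= (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t
\end{align}
```

Because the gates are computed by convolutions rather than fully connected layers, the hidden state keeps its spatial layout, which is what lets spatial information be carried from frame to frame. Some references swap the roles of $z_t$ and $1 - z_t$ in the last line; the two conventions are equivalent up to relabeling the gate.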
Learning Features by Watching Objects Move
This paper presents a novel yet intuitive approach to unsupervised feature
learning. Inspired by the human visual system, we explore whether low-level
motion-based grouping cues can be used to learn an effective visual
representation. Specifically, we use unsupervised motion-based segmentation on
videos to obtain segments, which we use as 'pseudo ground truth' to train a
convolutional network to segment objects from a single frame. Given the
extensive evidence that motion plays a key role in the development of the human
visual system, we hope that this straightforward approach to unsupervised
learning will be more effective than cleverly designed 'pretext' tasks studied
in the literature. Indeed, our extensive experiments show that this is the
case. When used for transfer learning on object detection, our representation
significantly outperforms previous unsupervised approaches across multiple
settings, especially when training data for the target task is scarce.
Comment: CVPR 201