Multi-Scale 3D Scene Flow from Binocular Stereo Sequences
Scene flow methods estimate the three-dimensional motion field for points in the world, using multi-camera video data. Such methods combine multi-view reconstruction with motion estimation. This paper describes an alternative formulation for dense scene flow estimation that provides reliable results using only two cameras, by fusing stereo and optical flow estimation into a single coherent framework. Internally, the proposed algorithm generates probability distributions for optical flow and disparity. Taking into account the uncertainty in these intermediate stages allows for more reliable estimation of the 3D scene flow than previous methods. To handle the aperture problems inherent in the estimation of optical flow and disparity, a multi-scale method along with a novel region-based technique is used within a regularized solution. This combined approach both preserves discontinuities and prevents over-regularization, two problems commonly associated with basic multi-scale approaches. Experiments with synthetic and real test data demonstrate the strength of the proposed approach.
National Science Foundation (CNS-0202067, IIS-0208876); Office of Naval Research (N00014-03-1-0108)
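The geometric core the abstract alludes to can be made concrete with a short sketch: given disparity maps for two stereo frames and the optical flow between them, each pixel can be triangulated at both times and the 3D displacement read off as scene flow. This is a minimal numpy illustration under an assumed pinhole calibration (f, cx, cy, baseline), not the paper's probabilistic formulation, which additionally propagates the uncertainty of these intermediate estimates.

```python
import numpy as np

def backproject(u, v, disp, f, cx, cy, baseline):
    """Triangulate pixels (u, v) with disparity disp into camera coordinates."""
    Z = f * baseline / np.maximum(disp, 1e-6)   # depth from disparity
    X = (u - cx) * Z / f
    Y = (v - cy) * Z / f
    return np.stack([X, Y, Z], axis=-1)

def scene_flow(disp_t, disp_t1, flow, f, cx, cy, baseline):
    """Per-pixel 3D motion between frames t and t+1, in camera coordinates."""
    h, w = disp_t.shape
    v, u = np.mgrid[0:h, 0:w].astype(np.float64)
    p3d_t = backproject(u, v, disp_t, f, cx, cy, baseline)
    # Follow each pixel along its optical flow into frame t+1, then
    # triangulate again using the disparity sampled there (nearest pixel).
    u1 = np.clip(u + flow[..., 0], 0, w - 1)
    v1 = np.clip(v + flow[..., 1], 0, h - 1)
    d1 = disp_t1[v1.round().astype(int), u1.round().astype(int)]
    p3d_t1 = backproject(u1, v1, d1, f, cx, cy, baseline)
    return p3d_t1 - p3d_t   # 3D displacement = scene flow
```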
Optical Flow in Mostly Rigid Scenes
The optical flow of natural scenes is a combination of the motion of the
observer and the independent motion of objects. Existing algorithms typically
focus on either recovering motion and structure under the assumption of a
purely static world or optical flow for general unconstrained scenes. We
combine these approaches in an optical flow algorithm that estimates an
explicit segmentation of moving objects from appearance and physical
constraints. In static regions we take advantage of strong constraints to
jointly estimate the camera motion and the 3D structure of the scene over
multiple frames. This allows us to also regularize the structure instead of the
motion. Our formulation uses a Plane+Parallax framework, which works even under
small baselines, and reduces the motion estimation to a one-dimensional search
problem, resulting in more accurate estimation. In moving regions the flow is
treated as unconstrained, and computed with an existing optical flow method.
The resulting Mostly-Rigid Flow (MR-Flow) method achieves state-of-the-art
results on both the MPI-Sintel and KITTI-2015 benchmarks.
Comment: 15 pages, 10 figures; accepted for publication at CVPR 2017
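The one-dimensional search the abstract mentions follows from the Plane+Parallax decomposition: once the motion induced by a dominant plane's homography is removed, the residual flow of any static point must lie along the line toward the epipole, leaving a single parallax scalar per pixel to estimate. A rough sketch of that parameterization, with all variable names assumed rather than taken from the paper:

```python
import numpy as np

def plane_plus_parallax_flow(pts, H, epipole, gamma):
    """pts: (N, 2) pixels; H: 3x3 homography of the dominant plane;
    epipole: (2,) epipole in the second image; gamma: (N,) parallax
    magnitudes (proportional to residual depth). Returns (N, 2) flow."""
    ones = np.ones((pts.shape[0], 1))
    ph = np.hstack([pts, ones]) @ H.T           # warp by the plane homography
    warped = ph[:, :2] / ph[:, 2:3]
    dirs = epipole[None, :] - warped            # epipolar line direction
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True) + 1e-12
    return (warped - pts) + gamma[:, None] * dirs   # planar part + 1D parallax
```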
Competitive Collaboration: Joint Unsupervised Learning of Depth, Camera Motion, Optical Flow and Motion Segmentation
We address the unsupervised learning of several interconnected problems in
low-level vision: single view depth prediction, camera motion estimation,
optical flow, and segmentation of a video into the static scene and moving
regions. Our key insight is that these four fundamental vision problems are
coupled through geometric constraints. Consequently, learning to solve them
together simplifies the problem because the solutions can reinforce each other.
We go beyond previous work by exploiting geometry more explicitly and
segmenting the scene into static and moving regions. To that end, we introduce
Competitive Collaboration, a framework that facilitates the coordinated
training of multiple specialized neural networks to solve complex problems.
Competitive Collaboration works much like expectation-maximization, but with
neural networks that act as both competitors to explain pixels that correspond
to static or moving regions, and as collaborators through a moderator that
assigns pixels to be either static or independently moving. Our novel method
integrates all these problems in a common framework and simultaneously reasons
about the segmentation of the scene into moving objects and the static
background, the camera motion, depth of the static scene structure, and the
optical flow of moving objects. Our model is trained without any supervision
and achieves state-of-the-art performance among joint unsupervised methods on
all sub-problems.
Comment: CVPR 2019
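The training scheme can be pictured as an EM-like alternation. The sketch below is a schematic reconstruction, not the authors' code; in particular the `photometric_loss` method on each network is a hypothetical stand-in for the warping-based reconstruction residuals the paper uses.

```python
import torch

def competitive_collaboration_step(static_net, flow_net, moderator,
                                   opt_players, opt_moderator, frames):
    # Phase 1 (competition): freeze the moderator's soft assignment and
    # train each competitor only on the pixels assigned to it.
    with torch.no_grad():
        mask = moderator(frames)                 # ~1 = static, ~0 = moving
    loss_static = (mask * static_net.photometric_loss(frames)).mean()
    loss_moving = ((1 - mask) * flow_net.photometric_loss(frames)).mean()
    opt_players.zero_grad()
    (loss_static + loss_moving).backward()
    opt_players.step()

    # Phase 2 (collaboration): freeze the competitors and train the
    # moderator to assign each pixel to whichever model explains it better.
    with torch.no_grad():
        r_static = static_net.photometric_loss(frames)
        r_moving = flow_net.photometric_loss(frames)
    mask = moderator(frames)
    loss_mod = (mask * r_static + (1 - mask) * r_moving).mean()
    opt_moderator.zero_grad()
    loss_mod.backward()
    opt_moderator.step()
```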
Object-based 2D-to-3D video conversion for effective stereoscopic content generation in 3D-TV applications
Three-dimensional television (3D-TV) has gained increasing popularity in the broadcasting domain, as it enables enhanced viewing experiences in comparison to conventional two-dimensional (2D) TV. However, its application has been constrained by the lack of essential content, i.e., stereoscopic videos. To alleviate this content shortage, an economical and practical solution is to reuse the huge media resources that are available in monoscopic 2D and convert them to stereoscopic 3D. Although stereoscopic video can be generated from monoscopic sequences using depth measurements extracted from cues like focus blur, motion, and size, the quality of the resulting video may be poor, as such measurements are usually arbitrarily defined and appear inconsistent with the real scenes. To help solve this problem, a novel method for object-based stereoscopic video generation is proposed, which features i) optical-flow-based occlusion reasoning for determining depth order, ii) object segmentation using improved region-growing from masks of the determined depth layers, and iii) a hybrid depth estimation scheme using content-based matching (inside a small library of true stereo image pairs) and depth-order-based regularization. Comprehensive experiments have validated the effectiveness of our proposed 2D-to-3D conversion method in generating stereoscopic videos of consistent depth measurements for 3D-TV applications.
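Step i) rests on a classical cue: at a motion boundary, the pixels of the farther surface are the ones that get occluded, which a forward-backward flow consistency check can expose. Below is a simplified numpy sketch of that reasoning; the threshold and the pairwise ordering heuristic are assumptions, not the paper's exact procedure.

```python
import numpy as np

def occlusion_mask(flow_fw, flow_bw, thresh=1.5):
    """Forward-backward consistency check; flow_fw, flow_bw: (H, W, 2)."""
    h, w = flow_fw.shape[:2]
    v, u = np.mgrid[0:h, 0:w]
    u1 = np.clip((u + flow_fw[..., 0]).round().astype(int), 0, w - 1)
    v1 = np.clip((v + flow_fw[..., 1]).round().astype(int), 0, h - 1)
    roundtrip = flow_fw + flow_bw[v1, u1]   # ~0 wherever the pixel stays visible
    return np.linalg.norm(roundtrip, axis=-1) > thresh   # True = likely occluded

def in_front(seg_a, seg_b, occluded):
    """Heuristic ordering of two adjacent segments (boolean masks): the
    segment with the larger fraction of occluded pixels is likely behind."""
    return occluded[seg_b].mean() > occluded[seg_a].mean()
```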
Vision and Learning for Deliberative Monocular Cluttered Flight
Cameras provide a rich source of information while being passive, cheap and
lightweight for small and medium Unmanned Aerial Vehicles (UAVs). In this work
we present the first implementation of receding horizon control, which is
widely used in ground vehicles, with monocular vision as the only sensing mode
for autonomous UAV flight in dense clutter. We make it feasible on UAVs via a
number of contributions: novel coupling of perception and control via relevant
and diverse, multiple interpretations of the scene around the robot, leveraging
recent advances in machine learning to showcase anytime budgeted cost-sensitive
feature selection, and fast non-linear regression for monocular depth
prediction. We empirically demonstrate the efficacy of our novel pipeline via
real-world experiments of more than 2 km through dense trees with a quadrotor
built from off-the-shelf parts. Moreover, our pipeline is designed to combine
information from other modalities like stereo and lidar as well, if available
- …
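Read as a control loop, the approach evaluates a library of candidate trajectories against the current monocular depth prediction, commits only to the first segment of the best one, and replans at the next frame. The sketch below is a hypothetical outline of that receding-horizon loop; every name in it (predict_depth, project, the trajectory interface) is assumed for illustration, not taken from the paper.

```python
import numpy as np

def score(traj, depth, project):
    """Cost of one candidate: penalize waypoints that project onto nearby
    obstacles in the predicted depth image, reward progress to the goal."""
    cost = 0.0
    for p in traj.waypoints:
        u, v = project(p)                            # 3D waypoint -> pixel (ints)
        cost += 1.0 / max(float(depth[v, u]), 0.1)   # nearer obstacle, higher cost
    return cost - traj.goal_progress

def receding_horizon_step(image, traj_library, predict_depth, project):
    depth = predict_depth(image)                     # fast monocular depth regression
    best = min(traj_library, key=lambda t: score(t, depth, project))
    return best.segment(0)                           # fly the first segment, then replan
```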