Learning Unsupervised Multi-View Stereopsis via Robust Photometric Consistency
We present a learning based approach for multi-view stereopsis (MVS). While
current deep MVS methods achieve impressive results, they crucially rely on
ground-truth 3D training data, and acquisition of such precise 3D geometry for
supervision is a major hurdle. Our framework instead leverages photometric
consistency between multiple views as supervisory signal for learning depth
prediction in a wide baseline MVS setup. However, naively applying photo
consistency constraints is undesirable due to occlusion and lighting changes
across views. To overcome this, we propose a robust loss formulation that: a)
enforces first order consistency and b) for each point, selectively enforces
consistency with some views, thus implicitly handling occlusions. We
demonstrate our ability to learn MVS without 3D supervision using a real
dataset, and show that each component of our proposed robust loss results in a
significant improvement. We qualitatively observe that our reconstructions are
often more complete than the acquired ground truth, further showing the merits
of this approach. Lastly, our learned model generalizes to novel settings, and
our approach allows adaptation of existing CNNs to datasets without
ground-truth 3D by unsupervised finetuning. Project webpage:
https://tejaskhot.github.io/unsup_mv
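The selective, per-point consistency described in this abstract can be illustrated with a minimal sketch. It assumes the per-view photometric errors for one pixel have already been computed by warping each source view into the reference view. The function name `robust_photo_loss` and the parameter `k` are hypothetical and not taken from the paper, and keeping the `k` smallest errors is only one plausible reading of "selectively enforces consistency with some views".

```python
def robust_photo_loss(per_view_errors, k=2):
    """Average the k smallest per-view errors for a single pixel.

    per_view_errors: non-negative photometric errors, one per source view
    (hypothetical input; computing them requires view warping, not shown).
    Views with a large error, such as occluded ones, are ignored.
    """
    if not per_view_errors:
        raise ValueError("need at least one view")
    # Clamp k so it is valid for any number of views.
    k = max(1, min(k, len(per_view_errors)))
    best = sorted(per_view_errors)[:k]
    return sum(best) / k


# A pixel that is visible in two source views and occluded in a third.
errors = [0.05, 0.07, 0.90]
print(robust_photo_loss(errors, k=2))  # mean of the two best views
print(sum(errors) / len(errors))       # naive mean over all views
```

In this toy case the occluded view dominates the naive mean, while the selective loss ignores it, which is the intuition behind handling occlusions implicitly.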
Learning a Multi-View Stereo Machine
We present a learnt system for multi-view stereopsis. In contrast to recent
learning based methods for 3D reconstruction, we leverage the underlying 3D
geometry of the problem through feature projection and unprojection along
viewing rays. By formulating these operations in a differentiable manner, we
are able to learn the system end-to-end for the task of metric 3D
reconstruction. End-to-end learning allows us to jointly reason about shape
priors while conforming to geometric constraints, enabling reconstruction from
far fewer images (even a single image) than required by classical approaches
as well as completion of unseen surfaces. We thoroughly evaluate our approach
on the ShapeNet dataset and demonstrate the benefits over classical approaches
as well as recent learning based methods.
Volumetric visualization of 3D data
In recent years, there has been a rapid growth in the ability to obtain detailed data on large complex structures in three dimensions. This development occurred first in the medical field, with CAT (computer aided tomography) scans and now magnetic resonance imaging, and in seismological exploration. With the advances in supercomputing and computational fluid dynamics, and in experimental techniques in fluid dynamics, there is now the ability to produce similar large data fields representing 3D structures and phenomena in these disciplines. These developments have produced a situation in which currently there is access to data which is too complex to be understood using the tools available for data reduction and presentation. Researchers in these areas are becoming limited by their ability to visualize and comprehend the 3D systems they are measuring and simulating.
DeepV2D: Video to Depth with Differentiable Structure from Motion
We propose DeepV2D, an end-to-end deep learning architecture for predicting
depth from video. DeepV2D combines the representation ability of neural
networks with the geometric principles governing image formation. We compose a
collection of classical geometric algorithms, which are converted into
trainable modules and combined into an end-to-end differentiable architecture.
DeepV2D interleaves two stages: motion estimation and depth estimation. During
inference, motion and depth estimation are alternated and converge to accurate
depth. Code is available at https://github.com/princeton-vl/DeepV2D
Self-Supervised Monocular Image Depth Learning and Confidence Estimation
Convolutional Neural Networks (CNNs) need large amounts of data with ground
truth annotation, which is a challenging problem that has limited the
development and fast deployment of CNNs for many computer vision tasks. We
propose a novel framework for depth estimation from monocular images with
corresponding confidence in a self-supervised manner. A fully differentiable
patch-based cost function is proposed by using the Zero-Mean Normalized Cross
Correlation (ZNCC) that takes multi-scale patches as a matching strategy. This
approach greatly increases the accuracy and robustness of the depth learning.
In addition, the proposed patch-based cost function can provide a 0 to 1
confidence, which is then used to supervise the training of a parallel network
for confidence map learning and estimation. Evaluation on the KITTI dataset
shows that our method outperforms the state of the art.
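The ZNCC measure named in this abstract can be sketched for a single pair of patches as follows. This is a minimal illustration, not the paper's multi-scale cost function: patches are flat lists of intensities, and the helper `confidence`, which maps the score linearly from [-1, 1] to [0, 1], is an assumption made here for illustration, since the abstract does not say how its 0-to-1 confidence is derived.

```python
import math


def zncc(patch_a, patch_b, eps=1e-8):
    """Zero-mean normalized cross-correlation of two equal-size patches.

    Returns a value in [-1, 1]; eps guards against division by zero
    when a patch has constant intensity.
    """
    if len(patch_a) != len(patch_b) or not patch_a:
        raise ValueError("patches must be non-empty and of equal size")
    n = len(patch_a)
    mean_a = sum(patch_a) / n
    mean_b = sum(patch_b) / n
    # Subtract the mean of each patch (the "zero-mean" step).
    dev_a = [a - mean_a for a in patch_a]
    dev_b = [b - mean_b for b in patch_b]
    num = sum(x * y for x, y in zip(dev_a, dev_b))
    den = math.sqrt(sum(x * x for x in dev_a) * sum(y * y for y in dev_b))
    return num / (den + eps)


def confidence(patch_a, patch_b):
    # Assumed linear mapping to [0, 1]; the paper may define this differently.
    return 0.5 * (zncc(patch_a, patch_b) + 1.0)


a = [10.0, 20.0, 30.0, 40.0]
b = [2.0 * v + 5.0 for v in a]  # same patch under a gain and offset change
print(zncc(a, b), confidence(a, b))
```

Because both patches are mean-centered and normalized, the score is insensitive to the gain and offset change applied to `b`, which is why ZNCC is a common choice when lighting differs between views.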
Cue combination for 3D location judgements
Cue combination rules have often been applied to the perception of surface shape but not to judgements of object location. Here, we used immersive virtual reality to explore the relationship between different cues to distance. Participants viewed a virtual scene and judged the change in distance of an object presented in two intervals, where the scene changed in size between intervals (by a factor of between 0.25 and 4). We measured thresholds for detecting a change in object distance when there were only 'physical' (stereo and motion parallax) or 'texture-based' cues (independent of the scale of the scene) and used these to predict biases in a distance matching task. Under a range of conditions, in which the viewing distance and position of the target relative to other objects was varied, the ratio of 'physical' to 'texture-based' thresholds was a good predictor of biases in the distance matching task. The cue combination approach, which successfully accounts for our data, relies on quite different principles from those underlying geometric reconstruction.
From Multiview Image Curves to 3D Drawings
Reconstructing 3D scenes from multiple views has made impressive strides in
recent years, chiefly by correlating isolated feature points, intensity
patterns, or curvilinear structures. In the general setting - without
controlled acquisition, abundant texture, curves and surfaces following
specific models or limiting scene complexity - most methods produce unorganized
point clouds, meshes, or voxel representations, with some exceptions producing
unorganized clouds of 3D curve fragments. Ideally, many applications require
structured representations of curves, surfaces and their spatial relationships.
This paper presents a step in this direction by formulating an approach that
combines 2D image curves into a collection of 3D curves, with topological
connectivity between them represented as a 3D graph. This results in a 3D
drawing, which is complementary to surface representations in the same sense as
a 3D scaffold complements a tent taut over it. We evaluate our results against
ground truth on synthetic and real datasets.
Comment: Expanded ECCV 2016 version with tweaked figures and including an
overview of the supplementary material available at
multiview-3d-drawing.sourceforge.ne
Stereoscopic Cinema
Stereoscopic cinema has seen a surge of activity in recent years, and for the
first time all of the major Hollywood studios released 3-D movies in 2009. This
is happening alongside the adoption of 3-D technology for sports broadcasting,
and the arrival of 3-D TVs for the home. Two previous attempts to introduce 3-D
cinema in the 1950s and the 1980s failed because the contemporary technology
was immature and resulted in viewer discomfort. But current technologies --
such as accurately-adjustable 3-D camera rigs with onboard computers to
automatically inform a camera operator of inappropriate stereoscopic shots,
digital processing for post-shooting rectification of the 3-D imagery, digital
projectors for accurate positioning of the two stereo projections on the cinema
screen, and polarized silver screens to reduce cross-talk between the viewer's
left and right eyes -- mean that the viewer experience is at a much higher
level of quality than in the past. Even so, creation of stereoscopic cinema is
an open, active research area, and there are many challenges from acquisition
to post-production to automatic adaptation for different-sized display. This
chapter describes the current state-of-the-art in stereoscopic cinema, and
directions of future work.
Comment: Published as Ronfard, Rémi and Taubin, Gabriel. Image and Geometry
Processing for 3-D Cinematography, 5, Springer Berlin Heidelberg, pp.11-51,
2010, Geometry and Computing, 978-3-642-12392-
The Surfacing of Multiview 3D Drawings via Lofting and Occlusion Reasoning
The three-dimensional reconstruction of scenes from multiple views has made
impressive strides in recent years, chiefly by methods correlating isolated
feature points, intensities, or curvilinear structure. In the general setting,
i.e., without requiring controlled acquisition, limited number of objects,
abundant patterns on objects, or object curves to follow particular models, the
majority of these methods produce unorganized point clouds, meshes, or voxel
representations of the reconstructed scene, with some exceptions producing 3D
drawings as networks of curves. Many applications, e.g., robotics, urban
planning, industrial design, and hard surface modeling, however, require
structured representations which make explicit 3D curves, surfaces, and their
spatial relationships. Reconstructing surface representations can now be
constrained by the 3D drawing acting like a scaffold to hang on the computed
representations, leading to increased robustness and quality of reconstruction.
This paper presents one way of completing such 3D drawings with surface
reconstructions, by exploring occlusion reasoning through lofting algorithms.
Comment: CVPR 2017 expanded version with improvements over camera ready,
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition
CVPR, 201
Depth Perception in Autostereograms: 1/f-Noise is Best
An autostereogram is a single image that encodes depth information that pops
out when looking at it. The trick is achieved by replicating a vertical strip
that sets a basic two-dimensional pattern with disparity shifts that encode a
three-dimensional scene. It is of interest to explore the dependency between
the ease of perceiving depth in autostereograms and the choice of the basic
pattern used for generating them. In this work we confirm a theory proposed by
Bruckstein et al. to explain the process of autostereographic depth perception,
providing a measure for the ease of "locking into" the depth profile, based on
the spectral properties of the basic pattern used. We report the results of
three sets of psychophysical experiments using autostereograms generated from
two-dimensional random noise patterns having power spectra of the form
$1/f^{\beta}$. The experiments were designed to test the ability of human
subjects to identify smooth, low resolution surfaces, as well as detail, in the
form of higher resolution objects in the depth profile, and to determine limits
in identifying small objects as a function of their size. In accordance with
the theory, we discover a significant advantage of the $1/f$ noise pattern
(pink noise) for fast depth lock-in and fine detail detection, showing that
such patterns are optimal choices for autostereogram design. Validating the
theoretical model predictions strengthens its underlying assumptions, and
contributes to a better understanding of the visual system's binocular
disparity mechanisms.
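The strip-replication idea described in this abstract can be sketched for one image row as follows. This is a minimal illustration under stated assumptions rather than the authors' generator: the function name `autostereogram_row` and the parameter `max_shift` are hypothetical, depth values are assumed to lie in [0, 1], and the usage example fills the base strip with plain white noise, since shaping the noise spectrum is not shown here.

```python
import random


def autostereogram_row(depth_row, pattern_row, max_shift=4):
    """Build one row of an autostereogram.

    depth_row: depth values in [0, 1], one per output pixel (1 = nearest).
    pattern_row: the base pattern strip; its length is the repeat period.
    Each pixel beyond the first strip copies the pixel roughly one period
    to its left, with the period shortened by a depth-dependent shift.
    """
    period = len(pattern_row)
    if not 0 <= max_shift < period:
        raise ValueError("max_shift must be in [0, period)")
    row = []
    for x, depth in enumerate(depth_row):
        if x < period:
            # The first strip is the unmodified base pattern.
            row.append(pattern_row[x])
        else:
            depth = min(max(depth, 0.0), 1.0)
            shift = int(round(depth * max_shift))
            row.append(row[x - period + shift])
    return row


random.seed(0)
pattern = [random.random() for _ in range(16)]       # white-noise strip
depth = [0.0] * 24 + [1.0] * 16 + [0.0] * 24         # a raised bar in the middle
row = autostereogram_row(depth, pattern, max_shift=4)
print(len(row))
```

With zero depth everywhere the row is simply the strip repeated, so any perceived depth comes only from the local changes in repeat distance.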