
    Learning Unsupervised Multi-View Stereopsis via Robust Photometric Consistency

    We present a learning-based approach for multi-view stereopsis (MVS). While current deep MVS methods achieve impressive results, they crucially rely on ground-truth 3D training data, and acquisition of such precise 3D geometry for supervision is a major hurdle. Our framework instead leverages photometric consistency between multiple views as a supervisory signal for learning depth prediction in a wide-baseline MVS setup. However, naively applying photo-consistency constraints is undesirable due to occlusion and lighting changes across views. To overcome this, we propose a robust loss formulation that: a) enforces first-order consistency, and b) for each point, selectively enforces consistency with some views, thus implicitly handling occlusions. We demonstrate our ability to learn MVS without 3D supervision using a real dataset, and show that each component of our proposed robust loss results in a significant improvement. We qualitatively observe that our reconstructions are often more complete than the acquired ground truth, further showing the merits of this approach. Lastly, our learned model generalizes to novel settings, and our approach allows adaptation of existing CNNs to datasets without ground-truth 3D by unsupervised finetuning. Project webpage: https://tejaskhot.github.io/unsup_mv
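
    A minimal PyTorch sketch of the kind of robust photometric loss the abstract describes. It assumes the source views have already been warped into the reference frame using the predicted depth; the function names and the exact form of the first-order (gradient) term are illustrative, not the authors' implementation. Per view and per pixel, an error combining intensity and image-gradient differences is computed, and only the k lowest-error views contribute, which implicitly discounts occluded views:

```python
import torch

def image_grads(x):
    # Forward differences, cropped to a common [..., H-1, W-1] size.
    gx = x[..., :-1, 1:] - x[..., :-1, :-1]
    gy = x[..., 1:, :-1] - x[..., :-1, :-1]
    return gx, gy

def robust_photo_loss(ref, warped, k=3):
    """ref: [C,H,W] reference image; warped: [V,C,H,W] source views
    warped into the reference frame via the predicted depth."""
    # Intensity term, averaged over channels: [V,H,W]
    diff = (warped - ref.unsqueeze(0)).abs().mean(1)
    # First-order term: also match image gradients: [V,H-1,W-1]
    gxr, gyr = image_grads(ref.unsqueeze(0))
    gxw, gyw = image_grads(warped)
    gdiff = ((gxw - gxr).abs() + (gyw - gyr).abs()).mean(1)
    err = diff[:, :-1, :-1] + gdiff
    # Keep only the k best views per pixel (implicit occlusion handling)
    best, _ = torch.topk(err, k, dim=0, largest=False)
    return best.mean()
```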

    Learning a Multi-View Stereo Machine

    We present a learnt system for multi-view stereopsis. In contrast to recent learning-based methods for 3D reconstruction, we leverage the underlying 3D geometry of the problem through feature projection and unprojection along viewing rays. By formulating these operations in a differentiable manner, we are able to learn the system end-to-end for the task of metric 3D reconstruction. End-to-end learning allows us to jointly reason about shape priors while conforming to geometric constraints, enabling reconstruction from far fewer images (even a single image) than required by classical approaches, as well as completion of unseen surfaces. We thoroughly evaluate our approach on the ShapeNet dataset and demonstrate its benefits over classical approaches as well as recent learning-based methods.
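
    The key geometric operation here is lifting 2D feature maps into a 3D grid along viewing rays. Below is a hedged sketch of such a differentiable unprojection in PyTorch, assuming pinhole [3,4] projection matrices; `unproject` and its arguments are hypothetical names, not the paper's API:

```python
import torch
import torch.nn.functional as F

def unproject(feats, P, grid_pts):
    """feats: [C,H,W] image features; P: [3,4] pinhole projection matrix;
    grid_pts: [N,3] world-space voxel centers. Returns [N,C] features."""
    N = grid_pts.shape[0]
    homog = torch.cat([grid_pts, torch.ones(N, 1)], dim=1)      # [N,4]
    uvw = homog @ P.T                                           # project: [N,3]
    uv = uvw[:, :2] / uvw[:, 2:3].clamp(min=1e-6)               # pixel coords
    H, W = feats.shape[1:]
    # Normalize to [-1,1] and bilinearly sample (differentiable w.r.t. feats;
    # points projecting outside the image receive zero features)
    norm = torch.stack([2 * uv[:, 0] / (W - 1) - 1,
                        2 * uv[:, 1] / (H - 1) - 1], dim=1)
    sampled = F.grid_sample(feats[None], norm[None, :, None, :],
                            align_corners=True)                 # [1,C,N,1]
    return sampled[0, :, :, 0].T                                # [N,C]
```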

    Volumetric visualization of 3D data

    In recent years, there has been a rapid growth in the ability to obtain detailed data on large complex structures in three dimensions. This development occurred first in the medical field, with CAT (computer-aided tomography) scans and now magnetic resonance imaging, and in seismological exploration. With the advances in supercomputing and computational fluid dynamics, and in experimental techniques in fluid dynamics, there is now the ability to produce similarly large data fields representing 3D structures and phenomena in these disciplines. These developments have produced a situation in which researchers have access to data that is too complex to be understood using the available tools for data reduction and presentation. Researchers in these areas are becoming limited by their ability to visualize and comprehend the 3D systems they are measuring and simulating.

    DeepV2D: Video to Depth with Differentiable Structure from Motion

    We propose DeepV2D, an end-to-end deep learning architecture for predicting depth from video. DeepV2D combines the representational ability of neural networks with the geometric principles governing image formation. We compose a collection of classical geometric algorithms, which are converted into trainable modules and combined into an end-to-end differentiable architecture. DeepV2D interleaves two stages: motion estimation and depth estimation. During inference, motion and depth estimation are alternated and converge to accurate depth. Code is available at https://github.com/princeton-vl/DeepV2D.
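
    The alternation the abstract describes can be summarized in a few lines. This is a hypothetical sketch of the inference loop only; `motion_net` and `depth_net` stand in for the paper's motion and depth modules and are not names from the linked repository:

```python
def video_to_depth(frames, motion_net, depth_net, n_iters=4):
    """Alternate motion and depth estimation until they converge."""
    depth = depth_net.initialize(frames)      # coarse initial depth estimate
    poses = None
    for _ in range(n_iters):
        poses = motion_net(frames, depth)     # update camera motion given depth
        depth = depth_net(frames, poses)      # update depth given camera motion
    return depth, poses
```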

    Self-Supervised Monocular Image Depth Learning and Confidence Estimation

    Convolutional Neural Networks (CNNs) need large amounts of data with ground-truth annotation, which is a challenging problem that has limited the development and fast deployment of CNNs for many computer vision tasks. We propose a novel framework for depth estimation from monocular images, with corresponding confidence, in a self-supervised manner. A fully differentiable patch-based cost function is proposed, using the Zero-Mean Normalized Cross-Correlation (ZNCC) over multi-scale patches as the matching strategy. This approach greatly increases the accuracy and robustness of depth learning. In addition, the proposed patch-based cost function provides a confidence value between 0 and 1, which is then used to supervise the training of a parallel network for confidence map learning and estimation. Evaluation on the KITTI dataset shows that our method outperforms state-of-the-art results.
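
    For reference, the ZNCC score underlying the proposed cost can be computed as below (a NumPy sketch for a single patch pair; the paper's multi-scale, batched, differentiable formulation is omitted). Because ZNCC lies in [-1, 1], an affine map such as (zncc + 1) / 2 is one way to obtain a 0-to-1 value of the kind the abstract mentions:

```python
import numpy as np

def zncc(a, b, eps=1e-8):
    """Zero-mean normalized cross-correlation of two equally sized patches.
    Returns a score in [-1, 1]: 1 for a perfect match up to gain/offset."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a ** 2).sum() * (b ** 2).sum()) + eps
    return float((a * b).sum() / denom)
```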

    Cue combination for 3D location judgements

    Cue combination rules have often been applied to the perception of surface shape but not to judgements of object location. Here, we used immersive virtual reality to explore the relationship between different cues to distance. Participants viewed a virtual scene and judged the change in distance of an object presented in two intervals, where the scene changed in size between intervals (by a factor of between 0.25 and 4). We measured thresholds for detecting a change in object distance when there were only 'physical' cues (stereo and motion parallax) or only 'texture-based' cues (independent of the scale of the scene), and used these to predict biases in a distance matching task. Under a range of conditions, in which the viewing distance and the position of the target relative to other objects were varied, the ratio of 'physical' to 'texture-based' thresholds was a good predictor of biases in the distance matching task. The cue combination approach, which successfully accounts for our data, relies on quite different principles from those underlying geometric reconstruction.
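
    The standard minimum-variance cue-combination rule this line of work builds on weights each cue's estimate by its reliability (inverse variance), with discrimination thresholds serving as a proxy for each cue's standard deviation. A small illustrative sketch of that principle, not the authors' exact model:

```python
import numpy as np

def combine_cues(estimates, thresholds):
    """Reliability-weighted combination: weight each cue's distance estimate
    by 1/sigma^2, using its threshold as a stand-in for sigma."""
    w = 1.0 / np.asarray(thresholds, dtype=float) ** 2
    w /= w.sum()
    return float(np.dot(w, estimates)), w

# A 'physical' cue twice as precise as the 'texture' cue gets 4x the weight:
# combine_cues([2.0, 3.0], thresholds=[0.1, 0.2]) -> (2.2, [0.8, 0.2])
```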

    From Multiview Image Curves to 3D Drawings

    Reconstructing 3D scenes from multiple views has made impressive strides in recent years, chiefly by correlating isolated feature points, intensity patterns, or curvilinear structures. In the general setting - without controlled acquisition, abundant texture, curves and surfaces following specific models, or limits on scene complexity - most methods produce unorganized point clouds, meshes, or voxel representations, with some exceptions producing unorganized clouds of 3D curve fragments. Ideally, many applications require structured representations of curves, surfaces, and their spatial relationships. This paper presents a step in this direction by formulating an approach that combines 2D image curves into a collection of 3D curves, with topological connectivity between them represented as a 3D graph. This results in a 3D drawing, which is complementary to surface representations in the same sense that a 3D scaffold complements a tent pulled taut over it. We evaluate our results against ground truth on synthetic and real datasets. Comment: Expanded ECCV 2016 version with tweaked figures and an overview of the supplementary material available at multiview-3d-drawing.sourceforge.net
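
    As a rough illustration of the output representation only (not the paper's data structures), such a "3D drawing" can be held as a graph whose edges are sampled 3D curve fragments and whose nodes are their junctions and endpoints:

```python
from dataclasses import dataclass, field
import numpy as np

@dataclass
class Curve3D:
    points: np.ndarray                      # [N, 3] ordered samples along the curve

@dataclass
class Drawing3D:
    """A 3D drawing: curve fragments plus their topological connectivity."""
    curves: list[Curve3D] = field(default_factory=list)
    junctions: list[np.ndarray] = field(default_factory=list)      # 3D junction points
    incident: dict[int, list[int]] = field(default_factory=dict)   # junction id -> curve ids
```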

    Stereoscopic Cinema

    Stereoscopic cinema has seen a surge of activity in recent years, and for the first time all of the major Hollywood studios released 3-D movies in 2009. This is happening alongside the adoption of 3-D technology for sports broadcasting, and the arrival of 3-D TVs for the home. Two previous attempts to introduce 3-D cinema, in the 1950s and the 1980s, failed because the contemporary technology was immature and resulted in viewer discomfort. But current technologies -- such as accurately adjustable 3-D camera rigs with onboard computers that automatically warn the camera operator of inappropriate stereoscopic shots, digital processing for post-shooting rectification of the 3-D imagery, digital projectors for accurate positioning of the two stereo projections on the cinema screen, and polarized silver screens to reduce cross-talk between the viewer's left and right eyes -- mean that the viewer experience is at a much higher level of quality than in the past. Even so, the creation of stereoscopic cinema is an open, active research area, and there are many challenges from acquisition to post-production to automatic adaptation for different-sized displays. This chapter describes the current state of the art in stereoscopic cinema and directions for future work. Comment: Published as Ronfard, Rémi and Taubin, Gabriel, Image and Geometry Processing for 3-D Cinematography, 5, Springer Berlin Heidelberg, pp. 11-51, 2010, Geometry and Computing, 978-3-642-12392-

    The Surfacing of Multiview 3D Drawings via Lofting and Occlusion Reasoning

    The three-dimensional reconstruction of scenes from multiple views has made impressive strides in recent years, chiefly by methods correlating isolated feature points, intensities, or curvilinear structure. In the general setting, i.e., without requiring controlled acquisition, a limited number of objects, abundant patterns on objects, or object curves that follow particular models, the majority of these methods produce unorganized point clouds, meshes, or voxel representations of the reconstructed scene, with some exceptions producing 3D drawings as networks of curves. Many applications, e.g., robotics, urban planning, industrial design, and hard-surface modeling, however, require structured representations that make explicit 3D curves, surfaces, and their spatial relationships. Surface reconstruction can now be constrained by the 3D drawing, which acts as a scaffold on which to hang the computed representations, leading to increased robustness and quality of reconstruction. This paper presents one way of completing such 3D drawings with surface reconstructions, by exploring occlusion reasoning through lofting algorithms. Comment: CVPR 2017 expanded version with improvements over the camera-ready version, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017

    Depth Perception in Autostereograms: 1/f-Noise is Best

    An autostereogram is a single image that encodes depth information that pops out when looking at it. The trick is achieved by replicating a vertical strip that sets a basic two-dimensional pattern, with disparity shifts that encode a three-dimensional scene. It is of interest to explore the dependency between the ease of perceiving depth in autostereograms and the choice of the basic pattern used for generating them. In this work we confirm a theory proposed by Bruckstein et al. to explain the process of autostereographic depth perception, providing a measure for the ease of "locking into" the depth profile based on the spectral properties of the basic pattern used. We report the results of three sets of psychophysical experiments using autostereograms generated from two-dimensional random noise patterns having power spectra of the form 1/f^β. The experiments were designed to test the ability of human subjects to identify smooth, low-resolution surfaces, as well as detail, in the form of higher-resolution objects in the depth profile, and to determine limits in identifying small objects as a function of their size. In accordance with the theory, we discover a significant advantage of the 1/f noise pattern (pink noise) for fast depth lock-in and fine-detail detection, showing that such patterns are optimal choices for autostereogram design. Validating the theoretical model's predictions strengthens its underlying assumptions and contributes to a better understanding of the visual system's binocular disparity mechanisms.
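
    A minimal sketch of the two ingredients involved: generating a noise pattern with a 1/f^β power spectrum, and encoding a depth map by replicating a strip with disparity shifts. Names and parameters are illustrative; this is the textbook construction, not the experimental stimulus code:

```python
import numpy as np

def pink_noise(h, w, beta=1.0, seed=0):
    """2D random noise with a 1/f^beta power spectrum (beta=1: pink noise)."""
    rng = np.random.default_rng(seed)
    fy = np.fft.fftfreq(h)[:, None]
    fx = np.fft.fftfreq(w)[None, :]
    f = np.sqrt(fx ** 2 + fy ** 2)
    f[0, 0] = 1.0                                  # avoid divide-by-zero at DC
    spectrum = rng.standard_normal((h, w)) + 1j * rng.standard_normal((h, w))
    img = np.fft.ifft2(spectrum / f ** (beta / 2)).real   # amplitude ~ f^(-beta/2)
    return (img - img.min()) / (np.ptp(img) + 1e-12)      # normalize to [0, 1]

def autostereogram(depth, strip_w=64, max_shift=16, beta=1.0):
    """Replicate the basic strip left to right, shifted by the local depth.
    depth: [H, W] array in [0, 1]; nearer points get larger disparity."""
    h, w = depth.shape
    out = pink_noise(h, w, beta)
    for y in range(h):
        for x in range(strip_w, w):
            shift = int(max_shift * depth[y, x])
            out[y, x] = out[y, x - strip_w + shift]
    return out
```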