
    A survey on rotation optimization in structure from motion

    We consider the problem of robust rotation optimization in Structure from Motion applications. A number of different approaches have been proposed recently, with solutions that are at times incompatible and at times complementary. The goal of this paper is to survey and compare these ideas in a unified manner, and to benchmark their robustness against the presence of outliers. In all, we have tested more than forty variants of these methods (including novel ones), and we find the best-performing combination.
    Funding: NSF DGE-0966142 (IGERT), NSF-IIS-1317788, NSF-IIP-1439681 (I/UCRC), NSF-IIS-1426840, ARL MAST-CTA W911NF-08-2-0004, ARL RCTA W911NF-10-2-0016, ONR N000141310778
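    As a concrete reference point for the surveyed methods, below is a minimal numpy sketch of the classical non-robust chordal L2 baseline (in the spirit of Martinec and Pajdla's linear method) that many robust variants build on. The function names, the gauge fix to the first camera, and the SVD-based projection are illustrative choices, not the survey's code.

```python
import numpy as np

def project_to_so3(M):
    """Nearest rotation matrix to M in the Frobenius norm, via SVD."""
    U, _, Vt = np.linalg.svd(M)
    R = U @ Vt
    if np.linalg.det(R) < 0:                   # enforce det(R) = +1
        R = U @ np.diag([1.0, 1.0, -1.0]) @ Vt
    return R

def chordal_rotation_averaging(n, measurements):
    """measurements: list of (i, j, R_ij) with R_ij ~= R_j @ R_i.T.
    Returns absolute rotations R_0..R_{n-1}, gauge-fixed so R_0 = I."""
    A = np.zeros((3 * len(measurements), 3 * n))
    for k, (i, j, R_ij) in enumerate(measurements):
        A[3*k:3*k+3, 3*i:3*i+3] = -R_ij        # encodes R_j - R_ij @ R_i = 0
        A[3*k:3*k+3, 3*j:3*j+3] = np.eye(3)
    # The stacked true rotations satisfy A @ G ~= 0, so recover G from the
    # right singular vectors belonging to the three smallest singular values.
    _, _, Vt = np.linalg.svd(A)
    G = Vt[-3:].T                              # shape (3n, 3)
    G = G @ np.linalg.inv(G[0:3])              # gauge: first camera = identity
    return [project_to_so3(G[3*i:3*i+3]) for i in range(n)]
```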

    WarpNet: Weakly Supervised Matching for Single-view Reconstruction

    We present an approach to matching images of objects in fine-grained datasets without using part annotations, with an application to the challenging problem of weakly supervised single-view reconstruction. This is in contrast to prior works that require part annotations, since matching objects across class and pose variations is challenging with appearance features alone. We overcome this challenge through a novel deep learning architecture, WarpNet, that aligns an object in one image with a different object in another. We exploit the structure of the fine-grained dataset to create artificial data for training this network in an unsupervised-discriminative learning approach. The output of the network acts as a spatial prior that allows generalization at test time to match real images across variations in appearance, viewpoint and articulation. On the CUB-200-2011 dataset of bird categories, we improve the AP over an appearance-only network by 13.6%. We further demonstrate that our WarpNet matches, together with the structure of fine-grained datasets, allow single-view reconstructions with quality comparable to using annotated point correspondences.
    Comment: to appear in IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2016
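    The central trick of the artificial-data step, obtaining correspondence supervision for free by warping an image with a known transform, can be sketched as follows. WarpNet itself uses thin-plate-spline warps between pose-graph neighbours; the plain affine warp, grayscale input, and function name below are simplifications for illustration.

```python
import numpy as np
from scipy import ndimage

def make_artificial_pair(image, rng, n_points=10):
    """Build a self-supervised training pair by warping a grayscale
    `image` with a random affine map; the known warp supplies exact
    point correspondences with no annotation cost.
    Coordinates are (row, col)."""
    h, w = image.shape
    A = np.eye(2) + 0.1 * rng.standard_normal((2, 2))  # small perturbation
    t = 5.0 * rng.standard_normal(2)
    # ndimage pulls pixels from input coordinates: warped(p) = image(A @ p + t).
    warped = ndimage.affine_transform(image, A, offset=t, order=1)
    # Ground truth: point q in `warped` corresponds to A @ q + t in `image`.
    q = rng.uniform([0, 0], [h - 1, w - 1], size=(n_points, 2))
    p = q @ A.T + t
    return warped, list(zip(q, p))
```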

    Robust Camera Location Estimation by Convex Programming

    3D structure recovery from a collection of 2D images requires the estimation of the camera locations and orientations, i.e. the camera motion. For large, irregular collections of images, existing methods for the location estimation part, which can be formulated as the inverse problem of estimating $n$ locations $\mathbf{t}_1, \mathbf{t}_2, \ldots, \mathbf{t}_n$ in $\mathbb{R}^3$ from noisy measurements of a subset of the pairwise directions $\frac{\mathbf{t}_i - \mathbf{t}_j}{\|\mathbf{t}_i - \mathbf{t}_j\|}$, are sensitive to outliers in direction measurements. In this paper, we first provide a complete characterization of well-posed instances of the location estimation problem by presenting its relation to the existing theory of parallel rigidity. For robust estimation of camera locations, we introduce a two-step approach, comprising a pairwise direction estimation method robust to outliers in point correspondences between image pairs, and a convex program that maintains robustness to outlier directions. In the presence of partially corrupted measurements, we empirically demonstrate that our convex formulation can even recover the locations exactly. Lastly, we demonstrate the utility of our formulations through experiments on Internet photo collections.
    Comment: 10 pages, 6 figures, 3 tables
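    A convex program of this flavour can be sketched in a few lines of cvxpy. This is a least-unsquared-deviations style formulation as read from the abstract; the scale constraint $d_{ij} \ge 1$ (which rules out the all-zero solution) and the zero-mean gauge are assumptions of the sketch, not necessarily the paper's exact constraints.

```python
import cvxpy as cp
import numpy as np

def recover_locations(n, directions):
    """directions: list of (i, j, gamma_ij) where gamma_ij is a unit
    vector estimating (t_i - t_j) / ||t_i - t_j||.
    Returns an (n, 3) array of estimated camera locations."""
    t = cp.Variable((n, 3))                  # unknown camera locations
    d = cp.Variable(len(directions))         # unknown pairwise distances
    residuals = [cp.norm(t[i] - t[j] - d[k] * np.asarray(g))
                 for k, (i, j, g) in enumerate(directions)]
    constraints = [d >= 1,                   # rules out the trivial solution
                   cp.sum(t, axis=0) == 0]   # fixes the translation gauge
    cp.Problem(cp.Minimize(sum(residuals)), constraints).solve()
    return t.value
```

    Because the objective sums unsquared norms, a few grossly wrong direction measurements contribute only linearly to the cost, which is what gives the formulation its robustness to outliers.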

    DA-RNN: Semantic Mapping with Data Associated Recurrent Neural Networks

    3D scene understanding is important for robots to interact with the 3D world in a meaningful way. Most previous works on 3D scene understanding focus on recognizing geometrical or semantic properties of the scene independently. In this work, we introduce Data Associated Recurrent Neural Networks (DA-RNNs), a novel framework for joint 3D scene mapping and semantic labeling. DA-RNNs use a new recurrent neural network architecture for semantic labeling on RGB-D videos. The output of the network is integrated with mapping techniques such as KinectFusion in order to inject semantic information into the reconstructed 3D scene. Experiments conducted on a real-world dataset and a synthetic dataset with RGB-D videos demonstrate the ability of our method to perform semantic 3D scene mapping.
    Comment: Published in RSS 2017
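    To make the "inject semantic information into the reconstruction" step concrete, here is a toy illustration of fusing per-frame semantic probabilities into a voxel map by running average. The class and method names are invented for this sketch, and the paper's actual KinectFusion integration is more involved.

```python
import numpy as np

class SemanticVoxelGrid:
    """Toy fusion of per-frame class probabilities into a voxel map."""

    def __init__(self, shape, n_classes):
        self.probs = np.full(shape + (n_classes,), 1.0 / n_classes)
        self.count = np.zeros(shape)

    def integrate(self, voxel_idx, class_probs):
        """voxel_idx: (N, 3) integer indices of voxels hit by the
        back-projected pixels of one frame (assumed unique here);
        class_probs: (N, n_classes) per-pixel network outputs."""
        i, j, k = voxel_idx.T
        c = self.count[i, j, k][:, None]
        # Running average of the label distribution seen at each voxel.
        self.probs[i, j, k] = (c * self.probs[i, j, k] + class_probs) / (c + 1.0)
        self.count[i, j, k] += 1.0

    def labels(self):
        """Per-voxel semantic label = argmax of the fused distribution."""
        return self.probs.argmax(axis=-1)
```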

    GraphMatch: Efficient Large-Scale Graph Construction for Structure from Motion

    We present GraphMatch, an approximate yet efficient method for building the matching graph for large-scale structure-from-motion (SfM) pipelines. Unlike modern SfM pipelines that use vocabulary (Voc.) trees to quickly build the matching graph and avoid a costly brute-force search of matching image pairs, GraphMatch does not require an expensive offline pre-processing phase to construct a Voc. tree. Instead, GraphMatch leverages two priors that can predict which image pairs are likely to match, thereby making the matching process for SfM much more efficient. The first is a score computed from the distance between the Fisher vectors of any two images. The second prior is based on the graph distance between vertices in the underlying matching graph. GraphMatch combines these two priors into an iterative "sample-and-propagate" scheme similar to the PatchMatch algorithm. Its sampling stage uses Fisher similarity priors to guide the search for matching image pairs, while its propagation stage explores neighbors of matched pairs to find new ones with a high image similarity score. Our experiments show that GraphMatch finds the most image pairs among competing approximate methods while also being the most efficient.
    Comment: Published at IEEE 3DV 2017
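    The sample-and-propagate loop can be sketched as follows. The parameter names, the greedy (rather than randomized) sampling, and the two-hop propagation rule are simplifications of the paper's scheme; `verify(i, j)` stands for the expensive geometric verification of an image pair.

```python
import numpy as np

def graphmatch(fishers, verify, rounds=5, samples_per_image=2):
    """fishers: (n, d) array of per-image Fisher vectors.
    verify(i, j): returns True if images i and j geometrically match.
    Returns the set of matched pairs (the matching graph's edges)."""
    n = len(fishers)
    edges, tried = set(), set()

    def try_pair(i, j):
        e = (min(i, j), max(i, j))
        if i != j and e not in tried:
            tried.add(e)
            if verify(*e):
                edges.add(e)

    for _ in range(rounds):
        # Sampling stage: the Fisher-vector distance prior decides which
        # untried pairs are worth the cost of full matching.
        for i in range(n):
            dists = np.linalg.norm(fishers - fishers[i], axis=1)
            picked = 0
            for j in np.argsort(dists):            # most similar first
                if picked >= samples_per_image:
                    break
                if j != i and (min(i, j), max(i, j)) not in tried:
                    try_pair(i, int(j))
                    picked += 1
        # Propagation stage: neighbours of matched images are likely
        # matches themselves, so explore two-hop pairs in the graph.
        nbrs = {}
        for a, b in edges:
            nbrs.setdefault(a, set()).add(b)
            nbrs.setdefault(b, set()).add(a)
        for a in list(nbrs):
            for b in list(nbrs[a]):
                for c in list(nbrs.get(b, ())):
                    try_pair(a, c)
    return edges
```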

    View Selection with Geometric Uncertainty Modeling

    Estimating positions of world points from features observed in images is a key problem in 3D reconstruction, image mosaicking, simultaneous localization and mapping, and structure from motion. We consider a special instance in which there is a dominant ground plane $\mathcal{G}$ viewed from a parallel viewing plane $\mathcal{S}$ above it. Such instances commonly arise, for example, in aerial photography. Consider a world point $g \in \mathcal{G}$ and its worst-case reconstruction uncertainty $\varepsilon(g,\mathcal{S})$ obtained by merging \emph{all} possible views of $g$ chosen from $\mathcal{S}$. We first show that one can pick two views $s_p$ and $s_q$ such that the uncertainty $\varepsilon(g,\{s_p,s_q\})$ obtained using only these two views is almost as good as (i.e. within a small constant factor of) $\varepsilon(g,\mathcal{S})$. Next, we extend the result to the entire ground plane $\mathcal{G}$ and show that one can pick a small subset $\mathcal{S}' \subseteq \mathcal{S}$ (which grows only linearly with the area of $\mathcal{G}$) and still obtain, for every point $g \in \mathcal{G}$, a constant-factor approximation to the minimum worst-case estimate obtained by merging all views in $\mathcal{S}$. Finally, we present a multi-resolution view selection method which extends our techniques to non-planar scenes. We show that the method can produce rich and accurate dense reconstructions with a small number of views. Our results provide a view selection mechanism with provable performance guarantees which can drastically increase the speed of scene reconstruction algorithms. In addition to the theoretical results, we demonstrate their effectiveness in an application where aerial imagery is used for monitoring farms and orchards.
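    For intuition about why two well-chosen views can suffice, the sketch below scores view pairs with a generic triangulation-uncertainty proxy (error grows as the rays through $g$ become parallel, so score by the reciprocal of the sine of the ray angle) and picks the best pair. This proxy and the brute-force search are stand-ins for illustration, not the paper's worst-case uncertainty model.

```python
import numpy as np

def pair_uncertainty(g, s_p, s_q):
    """Proxy for two-view reconstruction uncertainty of point g from
    viewpoints s_p and s_q: 1 / sin(angle between the two rays)."""
    r1 = (g - s_p) / np.linalg.norm(g - s_p)
    r2 = (g - s_q) / np.linalg.norm(g - s_q)
    sin_angle = np.linalg.norm(np.cross(r1, r2))
    return np.inf if sin_angle == 0 else 1.0 / sin_angle

def best_view_pair(g, views):
    """Brute-force search for the pair of views minimising the proxy."""
    best, best_pair = np.inf, None
    for a in range(len(views)):
        for b in range(a + 1, len(views)):
            u = pair_uncertainty(g, views[a], views[b])
            if u < best:
                best, best_pair = u, (a, b)
    return best_pair, best
```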

    An Integer Linear Programming Model for View Selection on Overlapping Camera Clusters

    Multi-View Stereo (MVS) algorithms scale poorly on large image sets, and quickly become infeasible to run on a single machine with limited memory. Typical solutions to lower the complexity include reducing the redundancy of the image set (view selection), and dividing the image set into groups to be processed independently (view clustering). A novel formulation for view selection is proposed here. We express the problem with an Integer Linear Programming (ILP) model, where cameras are modeled with binary variables, while the linear constraints enforce the completeness of the 3D reconstruction. The solution of the ILP leads to an optimal subset of selected cameras. As a second contribution, we integrate ILP camera selection with a view clustering approach which exploits Leveraged Affinity Propagation (LAP). LAP clustering can efficiently deal with large camera sets. We adapt the original algorithm so that it provides a set of overlapping clusters where the minimum and maximum sizes and the number of overlapping cameras can be specified. Evaluations on four different datasets show that our solution provides significant complexity reductions and guarantees near-perfect coverage, making large reconstructions feasible even on a single machine.
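    The shape of such an ILP is easy to show concretely. The sketch below, using the PuLP modeling library, minimises the number of selected cameras subject to a completeness constraint requiring every 3D point to remain visible in a minimum number of selected cameras. The specific constraint form and the `min_views` parameter are assumptions of this sketch; the paper's exact completeness constraints may differ.

```python
import pulp

def select_cameras(n_cameras, points_seen_by, min_views=3):
    """points_seen_by: list where entry p is the set of camera ids
    observing 3D point p. Returns the ids of the selected cameras."""
    prob = pulp.LpProblem("view_selection", pulp.LpMinimize)
    # One binary variable per camera: 1 = keep, 0 = discard.
    x = [pulp.LpVariable(f"cam_{i}", cat="Binary") for i in range(n_cameras)]
    prob += pulp.lpSum(x)                      # minimise number of cameras
    for p, cams in enumerate(points_seen_by):
        # Completeness: point p stays covered by enough selected cameras
        # (capped by how many cameras see it at all).
        prob += pulp.lpSum(x[i] for i in cams) >= min(min_views, len(cams))
    prob.solve()
    return [i for i in range(n_cameras) if x[i].value() > 0.5]
```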