477 research outputs found

    A Light Touch Approach to Teaching Transformers Multi-view Geometry

    Transformers are powerful visual learners, in large part due to their conspicuous lack of manually-specified priors. This flexibility can be problematic in tasks that involve multiple-view geometry, due to the near-infinite possible variations in 3D shapes and viewpoints (requiring flexibility) and the precise nature of projective geometry (obeying rigid laws). To resolve this conundrum, we propose a "light touch" approach, guiding visual Transformers to learn multiple-view geometry but allowing them to break free when needed. We achieve this by using epipolar lines to guide the Transformer's cross-attention maps, penalizing attention values outside the epipolar lines and encouraging higher attention along these lines, since they contain geometrically plausible matches. Unlike previous methods, our proposal does not require any camera pose information at test time. We focus on pose-invariant object instance retrieval, where standard Transformer networks struggle due to the large differences in viewpoint between query and retrieved images. Experimentally, our method outperforms state-of-the-art approaches at object retrieval, without needing pose information at test time.
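    As a rough sketch of this guidance (assumed names and thresholds, not the authors' implementation): given a fundamental matrix F relating a training pair, each query position x in one view induces an epipolar line l = Fx in the other view, and cross-attention mass falling far from that line can be penalized, for example via a soft mask like the one below.

```python
import numpy as np

def epipolar_attention_penalty(F, query_pts, key_pts, margin=2.0):
    """Soft mask penalizing attention between points far from the epipolar line.

    F         : (3, 3) fundamental matrix mapping image-A points to epipolar
                lines in image B (l = F @ x).
    query_pts : (Nq, 2) pixel coordinates of attention queries in image A.
    key_pts   : (Nk, 2) pixel coordinates of attention keys in image B.
    margin    : distance in pixels within which a key counts as "on" the line.
    Returns a (Nq, Nk) array in (0, 1]: ~1 near the line, ~0 far away.
    """
    # Homogeneous coordinates.
    q = np.hstack([query_pts, np.ones((len(query_pts), 1))])   # (Nq, 3)
    k = np.hstack([key_pts, np.ones((len(key_pts), 1))])       # (Nk, 3)

    lines = q @ F.T                                            # (Nq, 3) epipolar lines in B
    # Point-to-line distance |l . x| / sqrt(a^2 + b^2).
    num = np.abs(lines @ k.T)                                  # (Nq, Nk)
    den = np.linalg.norm(lines[:, :2], axis=1, keepdims=True)  # (Nq, 1)
    dist = num / np.maximum(den, 1e-8)

    # Smooth falloff: full weight within `margin`, decaying outside it.
    return np.exp(-np.maximum(dist - margin, 0.0) ** 2 / (2 * margin ** 2))
```

    During training, a loss term can sum the attention mass landing where this mask is near zero; at test time the mask, and the pose it requires, is simply dropped, consistent with the paper's claim that no pose information is needed at inference.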

    Structure from Recurrent Motion: From Rigidity to Recurrency

    This paper proposes a new method for Non-Rigid Structure-from-Motion (NRSfM) from a long monocular video sequence observing a non-rigid object performing a recurrent, possibly repetitive, dynamic action. Departing from the traditional idea of using a linear low-order or low-rank shape model for the task of NRSfM, our method exploits the property of shape recurrency (i.e., many deforming shapes tend to repeat themselves in time). We show that recurrency is in fact a generalized rigidity. Based on this, we reduce NRSfM problems to rigid ones provided that a certain recurrency condition is satisfied. Given such a reduction, standard rigid-SfM techniques are directly applicable (without any change) to the reconstruction of non-rigid dynamic shapes. To implement this idea as a practical approach, this paper develops efficient algorithms for automatic recurrency detection, as well as camera view clustering via a rigidity check. Experiments on both simulated sequences and real data demonstrate the effectiveness of the method. Since this paper offers a novel perspective on rethinking structure-from-motion, we hope it will inspire other new problems in the field. Comment: To appear in CVPR 2018.
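    One ingredient, the pairwise rigidity check, is easy to sketch: if two frames capture the deforming object in (near-)identical shape states, they are related by an ordinary rigid two-view geometry, so the inlier ratio of a RANSAC fundamental-matrix fit over their tracked points can serve as a rigidity score for clustering views. The sketch below uses OpenCV as a stand-in; the function name and score are assumptions, not the paper's exact algorithm.

```python
import cv2
import numpy as np

def rigidity_score(pts_i, pts_j, thresh=1.0):
    """Fraction of point tracks consistent with a rigid two-view geometry.

    pts_i, pts_j : (N, 2) float32 arrays of the same tracked 2D points
                   observed in frames i and j. A high inlier ratio suggests
                   the object was in the same (repeated) shape state, so the
                   pair behaves like two views of a rigid scene.
    """
    if len(pts_i) < 8:            # need >= 8 correspondences to estimate F
        return 0.0
    F, inliers = cv2.findFundamentalMat(
        pts_i, pts_j, cv2.FM_RANSAC, thresh, 0.999)
    if F is None or inliers is None:
        return 0.0
    return float(inliers.sum()) / len(pts_i)
```

    Frames whose pairwise scores exceed a cutoff can then be clustered together and handed, unchanged, to a standard rigid-SfM pipeline.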

    Matching and recovering 3D people from multiple views

    This paper introduces an approach to simultaneously match and recover 3D people from multiple calibrated cameras. To this end, we present an affinity measure between 2D detections across different views that enforces geometric consistency while accounting for uncertainty. This similarity is then exploited by a novel multi-view matching algorithm to cluster the detections, which is robust against partial observations as well as bad detections and does not assume any prior on the number of people in the scene. The multi-view correspondences are then used to efficiently infer the 3D pose of each body by means of a 3D pictorial structure model in combination with physico-geometric constraints. Our algorithm is thoroughly evaluated on challenging scenarios where several human bodies perform different activities involving complex motions, producing large occlusions in some views and noisy observations. We outperform state-of-the-art results in terms of matching and 3D reconstruction.
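    To illustrate the flavor of such an affinity (a hedged guess, not the paper's exact measure): with calibrated cameras, the fundamental matrix between two views follows directly from their projection matrices, and the symmetric epipolar distance between two 2D detections yields a geometric affinity that a clustering step can consume.

```python
import numpy as np

def fundamental_from_projections(P1, P2):
    """F such that x2^T F x1 = 0, from two 3x4 projection matrices."""
    # Camera 1 center: right null vector of P1.
    _, _, Vt = np.linalg.svd(P1)
    C1 = Vt[-1]                       # homogeneous 4-vector, P1 @ C1 = 0
    e2 = P2 @ C1                      # epipole of camera 1 in image 2
    e2_x = np.array([[0, -e2[2], e2[1]],
                     [e2[2], 0, -e2[0]],
                     [-e2[1], e2[0], 0]])
    return e2_x @ P2 @ np.linalg.pinv(P1)

def epipolar_affinity(x1, x2, F, sigma=10.0):
    """Affinity in (0, 1] from the symmetric epipolar distance between
    detections x1 and x2, given as (x, y) pixel coordinates."""
    p1 = np.array([*x1, 1.0])
    p2 = np.array([*x2, 1.0])
    l2 = F @ p1                       # epipolar line of x1 in image 2
    l1 = F.T @ p2                     # epipolar line of x2 in image 1
    d = (abs(p2 @ l2) / np.hypot(*l2[:2])
         + abs(p1 @ l1) / np.hypot(*l1[:2]))
    return np.exp(-d / sigma)
```

    Detections of the same person in two views score near 1 while mismatched detections decay toward 0; sigma is an assumed bandwidth parameter, not taken from the paper.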

    Semantic Validation in Structure from Motion

    The Structure from Motion (SfM) problem in computer vision is the process of recovering the 3D structure of a scene from a series of projective measurements computed from a collection of 2D images taken from different perspectives. SfM consists of three main steps: feature detection and matching, camera motion estimation, and recovery of the 3D structure from the estimated intrinsic and extrinsic parameters and the features. A problem encountered in SfM is that scenes lacking texture, or with repetitive features, can cause erroneous feature matching between frames. Semantic segmentation offers a route to validate and correct SfM models by labelling pixels in the input images with a deep convolutional neural network. The semantic and geometric properties associated with classes in the scene can then be exploited to apply prior constraints to each class of object. The SfM pipeline COLMAP and the semantic segmentation pipeline DeepLab were used. These, along with a planar reconstruction of the dense model, were used to determine erroneous points that may be occluded from the calculated camera position, given the semantic label, and thus the prior constraint, of the reconstructed plane. Herein, semantic segmentation is integrated into SfM to apply priors on the 3D point cloud, given the object detections in the 2D input images. Additionally, the semantic labels of matched keypoints are compared and semantically inconsistent points are discarded. Furthermore, the semantic labels on the input images are used to remove objects associated with motion from the output SfM models. The proposed approach is evaluated on a dataset of 1102 images of a repetitive architectural scene. This project offers a novel method for improved validation of 3D SfM models.
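    The keypoint-label consistency check is straightforward to illustrate. Below is a minimal sketch with assumed names (the paper builds on COLMAP and DeepLab; this simply indexes two precomputed segmentation label maps): a match is kept only when both endpoints carry the same semantic label.

```python
import numpy as np

def filter_matches_by_label(kps1, kps2, matches, seg1, seg2):
    """Discard matches whose endpoints have inconsistent semantic labels.

    kps1, kps2 : (N, 2) keypoint pixel coordinates per image.
    matches    : iterable of (i, j) index pairs into kps1 / kps2.
    seg1, seg2 : (H, W) integer label maps from a semantic segmentation net.
    """
    kept = []
    for i, j in matches:
        x1, y1 = np.round(kps1[i]).astype(int)
        x2, y2 = np.round(kps2[j]).astype(int)
        if seg1[y1, x1] == seg2[y2, x2]:      # same class in both images
            kept.append((i, j))
    return kept

# The same label maps can also veto whole classes: matches falling on
# movable objects (e.g. "car", "person") can be dropped before reconstruction.
```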

    6D Visual Odometry with Dense Probabilistic Egomotion Estimation

    Proceedings of the International Conference on Computer Vision Theory and Applications, 361-365, 2013, Barcelona, Spain. We present a novel approach to 6D visual odometry for vehicles with calibrated stereo cameras. A dense probabilistic egomotion (5D) method is combined with robust stereo feature-based approaches and Extended Kalman Filtering (EKF) techniques to provide high-quality estimates of the vehicle's angular and linear velocities. Experimental results show that the proposed method compares favorably with state-of-the-art approaches, mainly in the estimation of the angular velocities, where significant improvements are achieved.
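    To make the fusion step concrete, here is a deliberately simplified sketch (all structure assumed, not the paper's filter): a 6D state of linear and angular velocity with a random-walk prediction, updated by velocity measurements from the two estimators. With these linear models it collapses to a plain Kalman filter; the paper's actual motion and measurement models are where the "extended" part matters.

```python
import numpy as np

class VelocityFilter:
    """Minimal (E)KF over a 6D state [v; w]: linear and angular velocity.

    Illustrates only how two velocity estimates (dense egomotion, stereo
    features) can be fused as separate measurement sources with their own
    noise levels; hypothetical, not the paper's implementation.
    """
    def __init__(self, q=1e-3):
        self.x = np.zeros(6)          # state estimate [v; w]
        self.P = np.eye(6)            # state covariance
        self.Q = q * np.eye(6)        # process noise (random walk)

    def predict(self):
        # Random walk: the state stays, the uncertainty grows.
        self.P = self.P + self.Q

    def update(self, z, R):
        """Fuse a 6D velocity measurement z with covariance R (H = I)."""
        S = self.P + R                            # innovation covariance
        K = self.P @ np.linalg.inv(S)             # Kalman gain
        self.x = self.x + K @ (z - self.x)
        self.P = (np.eye(6) - K) @ self.P
```

    Each frame one would call predict() and then update() once per source, giving each estimator its own measurement covariance R to reflect its relative reliability.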

    New approach to calculating the fundamental matrix

    Estimating the fundamental matrix (F) determines the epipolar geometry and establishes a geometric relation between two images of the same scene, or between video frames. The literature offers many techniques for robust estimation, such as RANSAC (random sample consensus), least median of squares (LMedS), and M-estimators. This article compares different detectors (Harris, FAST, SIFT, and SURF) in terms of the number of detected points, the number of correct matches, and the speed of computing F. Our method first extracts descriptors with SURF, chosen over the other detectors for its robustness; it then applies a uniqueness threshold to retain the best points, normalizes them, and ranks them according to a weighting function over the different image regions; finally, F is estimated with an eight-point M-estimator technique, and the average error and the computation speed of F are measured. Experiments on real images with various viewpoint changes (e.g., rotation, lighting, and moving objects) show good results in terms of the computation speed of the fundamental matrix and an acceptable average error, suggesting that the technique is usable in real-time applications.
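    A hedged end-to-end sketch of such a pipeline in OpenCV follows. SIFT stands in for SURF (which requires the non-free opencv-contrib build), the ratio test plays the role of the uniqueness threshold, and LMedS replaces the specific eight-point M-estimator described above.

```python
import cv2
import numpy as np

def estimate_F(img1, img2, ratio=0.7):
    """Detect features, match with a uniqueness (ratio) test, estimate F,
    and report the mean symmetric epipolar error in pixels."""
    det = cv2.SIFT_create()           # SURF needs opencv-contrib (non-free)
    k1, d1 = det.detectAndCompute(img1, None)
    k2, d2 = det.detectAndCompute(img2, None)

    # Uniqueness threshold: the best match must clearly beat the second best.
    knn = cv2.BFMatcher().knnMatch(d1, d2, k=2)
    good = [m for m, n in knn if m.distance < ratio * n.distance]

    p1 = np.float32([k1[m.queryIdx].pt for m in good])
    p2 = np.float32([k2[m.trainIdx].pt for m in good])
    F, mask = cv2.findFundamentalMat(p1, p2, cv2.FM_LMEDS)

    # Mean symmetric epipolar distance over the inliers.
    inl = mask.ravel().astype(bool)
    h1 = np.hstack([p1[inl], np.ones((inl.sum(), 1))])
    h2 = np.hstack([p2[inl], np.ones((inl.sum(), 1))])
    l2, l1 = h1 @ F.T, h2 @ F         # epipolar lines in image 2 and image 1
    d = (np.abs(np.sum(h2 * l2, 1)) / np.hypot(l2[:, 0], l2[:, 1])
         + np.abs(np.sum(h1 * l1, 1)) / np.hypot(l1[:, 0], l1[:, 1]))
    return F, d.mean()
```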