A Light Touch Approach to Teaching Transformers Multi-view Geometry
Transformers are powerful visual learners, in large part due to their
conspicuous lack of manually-specified priors. This flexibility can be
problematic in tasks that involve multiple-view geometry, due to the
near-infinite possible variations in 3D shapes and viewpoints (requiring
flexibility), and the precise nature of projective geometry (obeying rigid
laws). To resolve this conundrum, we propose a "light touch" approach, guiding
visual Transformers to learn multiple-view geometry but allowing them to break
free when needed. We achieve this by using epipolar lines to guide the
Transformer's cross-attention maps, penalizing attention values outside the
epipolar lines and encouraging higher attention along these lines since they
contain geometrically plausible matches. Unlike previous methods, our proposal
does not require any camera pose information at test-time. We focus on
pose-invariant object instance retrieval, where standard Transformer networks
struggle, due to the large differences in viewpoint between query and retrieved
images. Experimentally, our method outperforms state-of-the-art approaches at
object retrieval, without needing pose information at test-time.
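The geometric idea behind the epipolar guidance described above can be sketched as follows. This is a minimal illustration, not the authors' loss: it assumes a fundamental matrix `F` is available at training time, and the names `epipolar_line`, `point_line_distance`, `epipolar_mask` and the pixel `threshold` are hypothetical. The resulting binary mask marks which key locations lie near the epipolar line of a query location, which is the kind of target a cross-attention map could be encouraged to follow.

```python
import math

def epipolar_line(F, x):
    """Line l = F @ [x, y, 1] in the second image, returned as (a, b, c)."""
    px, py = x
    return tuple(F[r][0] * px + F[r][1] * py + F[r][2] for r in range(3))

def point_line_distance(line, p):
    """Perpendicular distance from point p to the line a*x + b*y + c = 0."""
    a, b, c = line
    return abs(a * p[0] + b * p[1] + c) / math.hypot(a, b)

def epipolar_mask(F, query_xy, key_xys, threshold=1.5):
    """1.0 for key points within `threshold` pixels of the epipolar line, else 0.0.

    The threshold value is an assumption chosen for illustration only.
    """
    line = epipolar_line(F, query_xy)
    return [1.0 if point_line_distance(line, k) <= threshold else 0.0
            for k in key_xys]

# Toy setup (assumed): pure horizontal translation with identity intrinsics,
# so F equals the skew-symmetric matrix of t = (1, 0, 0).
F = [[0.0, 0.0, 0.0],
     [0.0, 0.0, -1.0],
     [0.0, 1.0, 0.0]]
mask = epipolar_mask(F, (10.0, 5.0), [(0.0, 5.0), (30.0, 5.5), (12.0, 9.0)])
```

In this toy setup the epipolar line of the query is the horizontal line y = 5, so the first two key points fall inside the mask and the third does not. A soft penalty on attention mass outside the mask would be one way to use it, again as an assumption rather than the published formulation.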
Structure from Recurrent Motion: From Rigidity to Recurrency
This paper proposes a new method for Non-Rigid Structure-from-Motion (NRSfM)
from a long monocular video sequence observing a non-rigid object performing
recurrent and possibly repetitive dynamic action. Departing from the
traditional idea of using a linear low-order or low-rank shape model for the task
of NRSfM, our method exploits the property of shape recurrency (i.e., many
deforming shapes tend to repeat themselves in time). We show that recurrency is
in fact a generalized rigidity. Based on this, we reduce NRSfM problems to
rigid ones provided that a certain recurrency condition is satisfied. Given such
a reduction, standard rigid-SfM techniques are directly applicable (without any
change) to the reconstruction of non-rigid dynamic shapes. To implement this
idea as a practical approach, this paper develops efficient algorithms for
automatic recurrency detection, as well as camera view clustering via a
rigidity-check. Experiments on both simulated sequences and real data
demonstrate the effectiveness of the method. Since this paper offers a novel
perspective on rethinking structure-from-motion, we hope it will inspire other
new problems in the field.
Comment: To appear in CVPR 201
Matching and recovering 3D people from multiple views
© 2022 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
This paper introduces an approach to simultaneously match and recover 3D people from multiple calibrated cameras. To this end, we present an affinity measure between 2D detections across different views that enforces an uncertainty geometric consistency. This similarity is then exploited by a novel multi-view matching algorithm to cluster the detections, being robust against partial observations as well as bad detections and without assuming any prior about the number of people in the scene. After that, the multi-view correspondences are used in order to efficiently infer the 3D pose of each body by means of a 3D pictorial structure model in combination with physico-geometric constraints. Our algorithm is thoroughly evaluated on challenging scenarios where several human bodies are performing different activities which involve complex motions, producing large occlusions in some views and noisy observations. We outperform state-of-the-art results in terms of matching and 3D reconstruction.
Peer Reviewed. Postprint (author's final draft)
Semantic Validation in Structure from Motion
The Structure from Motion (SfM) challenge in computer vision is the process
of recovering the 3D structure of a scene from a series of projective
measurements that are calculated from a collection of 2D images, taken from
different perspectives. SfM consists of three main steps: feature detection and
matching, camera motion estimation, and recovery of 3D structure from estimated
intrinsic and extrinsic parameters and features.
A problem encountered in SfM is that scenes lacking texture or with
repetitive features can cause erroneous feature matching between frames.
Semantic segmentation offers a route to validate and correct SfM models by
labelling pixels in the input images with the use of a deep convolutional
neural network. The semantic and geometric properties associated with classes
in the scene can be taken advantage of to apply prior constraints to each class
of object. The SfM pipeline COLMAP and the semantic segmentation pipeline
DeepLab were used. Their outputs, together with a planar reconstruction of the
dense model, were used to identify erroneous points that may be occluded from
the calculated camera position, given the semantic label and thus the prior
constraint of the reconstructed plane. Herein, semantic segmentation is
integrated into SfM to
apply priors on the 3D point cloud, given the object detection in the 2D input
images. Additionally, the semantic labels of matched keypoints are compared and
points with inconsistent semantic labels are discarded. Furthermore, semantic
labels on input images are used for the removal of objects associated with
motion in the output SfM models. The proposed approach is evaluated on a
dataset of 1102 images of a repetitive architecture scene. This project offers
a novel method for improved validation of 3D SfM models.
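The label-consistency check on matched keypoints, together with the removal of classes associated with motion, can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `filter_matches_by_label`, the per-keypoint label lists, and the set of dynamic classes are hypothetical, and the sketch does not use the COLMAP or DeepLab APIs.

```python
def filter_matches_by_label(matches, labels_a, labels_b,
                            dynamic_classes=frozenset({"car", "person"})):
    """Keep matches whose keypoints share a semantic label that is not dynamic.

    matches:  list of (index_in_image_a, index_in_image_b) pairs
    labels_a: semantic label of each keypoint in image A
    labels_b: semantic label of each keypoint in image B
    dynamic_classes: labels treated as moving objects (an assumed set)
    """
    kept = []
    for i, j in matches:
        same_label = labels_a[i] == labels_b[j]
        is_static = labels_a[i] not in dynamic_classes
        if same_label and is_static:
            kept.append((i, j))
    return kept

# Toy labels, as if sampled from a segmentation mask at each keypoint.
labels_a = ["building", "building", "sky", "car"]
labels_b = ["building", "tree", "sky", "car"]
matches = [(0, 0), (1, 1), (2, 2), (3, 3)]
kept = filter_matches_by_label(matches, labels_a, labels_b)
```

Here the match (1, 1) is dropped because its labels disagree (building against tree), and (3, 3) is dropped because cars are treated as moving objects, leaving (0, 0) and (2, 2).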
6D Visual Odometry with Dense Probabilistic Egomotion Estimation
Proceedings of the International Conference on Computer Vision Theory and Applications, 361-365, 2013, Barcelona, Spain.
We present a novel approach to 6D visual odometry for vehicles with calibrated stereo cameras. A dense probabilistic egomotion (5D) method is combined with robust stereo feature based approaches and Extended Kalman Filtering (EKF) techniques to provide high quality estimates of the vehicle’s angular and linear velocities. Experimental results show that the proposed method compares favorably with state-of-the-art approaches, mainly in the estimation of the angular velocities, where significant improvements are achieved.
New approach to calculating the fundamental matrix
Estimating the fundamental matrix (F) determines the epipolar geometry and establishes a geometric relation between two images of the same scene or between video frames. Many robust estimation techniques have been proposed in the literature, such as RANSAC (random sample consensus), least median of squares (LMedS), and M-estimators. This article compares several detectors (Harris, FAST, SIFT, and SURF) in terms of the number of detected points, the number of correct matches, and the computation speed of F. Our method first extracts descriptors with SURF, chosen over the other detectors for its robustness. It then sets a uniqueness threshold to retain the best points, normalizes these points, and ranks them according to a weighting function over the different image regions. Finally, F is estimated with an eight-point M-estimator, and the average error and the computation time of F are measured. Experiments on real images with various changes (for example rotation, lighting, and moving objects) show good results in terms of the computation speed of the fundamental matrix and an acceptable average error. The results suggest that the technique is usable in real-time applications.
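The point normalization step mentioned in the abstract can be sketched as follows. The abstract does not say which normalization is used, so this sketch assumes the common Hartley-style scheme (centroid moved to the origin, mean distance from the origin scaled to the square root of 2); the function name `normalize_points` is hypothetical.

```python
import math

def normalize_points(points):
    """Translate points to zero centroid and scale to mean distance sqrt(2).

    Returns the normalized points and the 3x3 similarity transform T such
    that [x', y', 1] = T @ [x, y, 1] for every input point.
    """
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    mean_dist = sum(math.hypot(x - cx, y - cy) for x, y in points) / n
    if mean_dist == 0.0:
        raise ValueError("all points coincide; normalization is undefined")
    s = math.sqrt(2.0) / mean_dist
    T = [[s, 0.0, -s * cx],
         [0.0, s, -s * cy],
         [0.0, 0.0, 1.0]]
    normalized = [(s * (x - cx), s * (y - cy)) for x, y in points]
    return normalized, T

# Corners of a 2x2 square: centroid (1, 1), every corner at distance sqrt(2).
pts = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)]
norm_pts, T = normalize_points(pts)
```

If the points in the first image are normalized with T1 and those in the second with T2, a matrix estimated from the normalized coordinates is mapped back to pixel coordinates as F = T2^T F_norm T1, which follows from substituting the two transforms into the epipolar constraint.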