Retrieval and Registration of Long-Range Overlapping Frames for Scalable Mosaicking of In Vivo Fetoscopy
Purpose: The standard clinical treatment of Twin-to-Twin Transfusion Syndrome
consists of the photocoagulation of undesired anastomoses located on the
placenta, which are responsible for blood transfer between the two twins. While
being the standard of care procedure, fetoscopy suffers from a limited
field-of-view of the placenta resulting in missed anastomoses. To facilitate
the task of the clinician, building a global map of the placenta providing a
larger overview of the vascular network is highly desired. Methods: To overcome
the challenging visual conditions inherent to in vivo sequences (low contrast,
obstructions or presence of artifacts, among others), we propose the following
contributions: (i) robust pairwise registration is achieved by aligning the
orientation of the image gradients, and (ii) difficulties regarding long-range
consistency (e.g. due to the presence of outliers) are tackled via a bag-of-words
strategy, which identifies overlapping frames of the sequence to be registered
regardless of their respective location in time. Results: In addition to visual
difficulties, in vivo sequences are characterised by the intrinsic absence of
gold standard. We present mosaics that qualitatively motivate our methodological
choices and demonstrate their promise. We also demonstrate
semi-quantitatively, via visual inspection of registration results, the
efficacy of our registration approach in comparison to two standard baselines.
Conclusion: This paper proposes the first approach for the construction of
mosaics of placenta in in vivo fetoscopy sequences. Robustness to visual
challenges during registration and long-range temporal consistency are
proposed, offering first positive results on in vivo data for which standard
mosaicking techniques are not applicable.
Comment: Accepted for publication in the International Journal of Computer Assisted Radiology and Surgery (IJCARS).
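The gradient-orientation alignment used for pairwise registration can be sketched as follows. This is an illustrative toy (exhaustive integer-translation search over a contrast-invariant orientation score, `numpy` only), not the authors' implementation; the function names are invented for the example:

```python
import numpy as np

def gradient_orientation_score(a, b, eps=1e-6):
    """Similarity of two same-size patches based on the alignment of their
    gradient orientations; invariant to monotonic contrast changes."""
    gy_a, gx_a = np.gradient(a.astype(float))
    gy_b, gx_b = np.gradient(b.astype(float))
    ang_a = np.arctan2(gy_a, gx_a)
    ang_b = np.arctan2(gy_b, gx_b)
    # weight by gradient magnitude so flat, low-contrast regions contribute little
    w = np.sqrt(gx_a**2 + gy_a**2) * np.sqrt(gx_b**2 + gy_b**2)
    # cos(2*dtheta) treats opposite gradient directions as aligned
    return np.sum(w * np.cos(2 * (ang_a - ang_b))) / (np.sum(w) + eps)

def register_translation(fixed, moving, max_shift=5):
    """Exhaustive search over integer shifts maximizing orientation alignment."""
    best, best_shift = -np.inf, (0, 0)
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            shifted = np.roll(np.roll(moving, dy, axis=0), dx, axis=1)
            s = gradient_orientation_score(fixed, shifted)
            if s > best:
                best, best_shift = s, (dy, dx)
    return best_shift
```

A real pipeline would of course search richer transformations (similarity or homography) and use a coarse-to-fine scheme, but the orientation-based score is the key contrast-robust ingredient.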
Pentagon-Match (PMatch): Identification of View-Invariant Planar Feature for Local Feature Matching-Based Homography Estimation
In computer vision, finding correct point correspondence among images plays
an important role in many applications, such as image stitching, image
retrieval, and visual localization. Most research focuses on matching local
features before a sampling method such as RANSAC is employed to verify the
initial matching results via repeated fitting of a global transformation
between the images. However, incorrect matches may still exist.
Thus, a novel sampling scheme, Pentagon-Match (PMatch), is proposed in this
work to verify the correctness of initially matched keypoints using pentagons
randomly sampled from them. By ensuring shape and location of these pentagons
are view-invariant with various evaluations of the cross-ratio (CR), incorrect
keypoint matches can be identified easily with a homography estimated from
correctly matched pentagons. Experimental results show that highly accurate
estimation of homography can be obtained efficiently for planar scenes of the
HPatches dataset, based on keypoint matching results provided by LoFTR.
Besides, accurate outlier identification for the above matching results and a
possible extension of the approach to multi-plane situations are also
demonstrated.
Comment: arXiv admin note: text overlap with arXiv:2211.0300
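The projective invariance that such cross-ratio tests rely on can be checked numerically. The minimal sketch below (hypothetical helper names, `numpy` only) computes the cross-ratio of four collinear points and shows it is preserved under an arbitrary homography:

```python
import numpy as np

def cross_ratio(p1, p2, p3, p4):
    """Cross-ratio of four collinear 2-D points: (|p1p3||p2p4|)/(|p1p4||p2p3|).
    Invariant under any projective transformation of the plane."""
    d = lambda a, b: np.linalg.norm(np.asarray(a, float) - np.asarray(b, float))
    return (d(p1, p3) * d(p2, p4)) / (d(p1, p4) * d(p2, p3))

def apply_h(H, p):
    """Apply a 3x3 homography to a 2-D point in homogeneous coordinates."""
    q = H @ np.array([p[0], p[1], 1.0])
    return q[:2] / q[2]
```

PMatch extends this idea from four collinear points to the vertices of randomly sampled pentagons, whose cross-ratio configuration must agree across views for a match to be accepted.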
Algorithms for trajectory integration in multiple views
This thesis addresses the problem of deriving a coherent and accurate localization
of moving objects from partial visual information when data are generated by cameras
placed at different view angles with respect to the scene. The framework is built around
applications of scene monitoring with multiple cameras. Firstly, we demonstrate how a
geometric-based solution exploits the relationships between corresponding feature points
across views and improves accuracy in object location. Then, we improve the estimation
of objects' locations with geometric transformations that account for lens distortions.
Additionally, we study the integration of the partial visual information generated by each
individual sensor and their combination into one single frame of observation that considers
object association and data fusion. Our approach is fully image-based, only relies on 2D
constructs and does not require any complex computation in 3D space. We exploit the
continuity and coherence in objects' motion when crossing cameras' fields of view. Additionally,
we work under the assumption of a planar ground plane and a wide baseline (i.e.
cameras' viewpoints are far apart). The main contributions are: i) the development of a
framework for distributed visual sensing that accounts for inaccuracies in the geometry
of multiple views; ii) the reduction of trajectory mapping errors using a statistical-based
homography estimation; iii) the integration of a polynomial method for correcting inaccuracies
caused by the cameras' lens distortion; iv) a global trajectory reconstruction
algorithm that associates and integrates the fragments of trajectories generated by each camera.
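The core of contributions ii) and iv) — mapping per-camera trajectory fragments into a single ground-plane frame via homographies and then associating fragments across cameras — can be sketched minimally as follows. The function names and the proximity-based association rule are illustrative, not the thesis implementation:

```python
import numpy as np

def to_ground_plane(H, points):
    """Map an Nx2 array of image points into a common ground-plane frame
    via the image-to-plane homography H."""
    pts = np.hstack([np.asarray(points, float), np.ones((len(points), 1))])
    mapped = (H @ pts.T).T
    return mapped[:, :2] / mapped[:, 2:3]

def merge_trajectories(frag_a, frag_b, max_gap=1.0):
    """Associate two ground-plane fragments when the end of one lies within
    max_gap (ground-plane units) of the start of the other."""
    gap = np.linalg.norm(frag_a[-1] - frag_b[0])
    if gap <= max_gap:
        return np.vstack([frag_a, frag_b])
    return None
```

In the thesis, the homographies themselves are estimated statistically and corrected for lens distortion before this fusion step, which is what keeps the mapping errors small enough for simple association to work.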
Detecting shadows and low-lying objects in indoor and outdoor scenes using homographies
Many computer vision applications apply background suppression techniques for the detection and segmentation of moving objects in a scene. While these algorithms tend to work well in controlled conditions, they often fail when applied to unconstrained real-world environments. This paper describes a system that detects and removes erroneously segmented foreground regions that are close to a ground plane. These regions include shadows, changing background objects and other low-lying objects such as leaves and rubbish. The system uses a set-up of two or more cameras and requires no 3D reconstruction or depth analysis of the regions. Therefore, a strong camera calibration of the set-up is not necessary. A geometric constraint called a homography is exploited to determine if foreground points are on or above the ground plane. The system takes advantage of the fact that regions in images off the homography plane will not correspond after a homography transformation. Experimental results using real-world scenes from a pedestrian tracking application illustrate the effectiveness of the proposed approach.
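The homography constraint exploited here can be sketched in a few lines: a foreground point observed in two views is labelled ground-level (shadow or low-lying clutter) when its transfer error under the inter-view ground-plane homography is small, since only points on the plane correspond under that homography. Function names and the pixel tolerance are illustrative, not the paper's values:

```python
import numpy as np

def transfer_error(H, p1, p2):
    """Distance in view 2 between the homography transfer H(p1) and the
    observed corresponding point p2."""
    q = H @ np.array([p1[0], p1[1], 1.0])
    q = q[:2] / q[2]
    return np.linalg.norm(q - np.asarray(p2, float))

def on_ground_plane(H, p1, p2, tol=2.0):
    """True if the point pair is consistent with the ground-plane homography,
    i.e. the transfer error is below tol pixels."""
    return transfer_error(H, p1, p2) <= tol
```

Points belonging to upright objects (e.g. pedestrians) violate the constraint and produce large transfer errors, which is what lets the system separate them from shadows without any depth reconstruction.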
Semantics and Planar Geometry for self-supervised Road Scene Understanding
In this thesis we leverage domain knowledge, specifically of road scenes, to provide a self-supervision signal, reduce the labelling requirements, improve the convergence of training and introduce interpretable parameters based on vastly simplified models. Specifically, we chose to research the value of applying domain knowledge to the popular tasks of semantic segmentation and relative pose estimation towards better understanding road scenes. In particular we leverage semantic and geometric scene understanding separately in the first two contributions and then seek to combine them in the third contribution.
Firstly, we show that hierarchical structure in class labels for training networks for tasks such as semantic segmentation can be useful for boosting performance and accelerating training. Moreover, we present a hierarchical loss implementation which differentiates between minor and serious errors, and evaluate our method on the Vistas road scene dataset.
Secondly, for the task of self-supervised monocular relative pose estimation, we propose a ground-relative formulation for network output which roots our problem in a locally planar geometry. Current self-supervised methods generally require over-parameterised training of both a pose and depth network, and our method entirely replaces the need for depth estimation, while obtaining competitive results on the KITTI visual odometry dataset, dramatically simplifying the problem.
Thirdly, we combine semantics with our geometric formulation by extracting the road plane with semantic segmentation and robustly fitting homographies to fine-scale correspondences between coarsely aligned image pairs. We show that with aid from our geometric knowledge and a known analytical method, we can decompose these homographies into camera-relative pose, providing a self-supervision signal that significantly improves our visual odometry performance at both training and test time. In particular, we form a non-differentiable module which computes real-time pseudo-labels, avoiding training complexity, and additionally allowing for test-time performance boosting, helping to tackle the bias present in deep learning methods.
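The plane-induced homography underlying the third contribution can be written as H = K (R + t nᵀ/d) K⁻¹ for a plane nᵀX = d expressed in the first camera's frame, with the second camera related by X₂ = R X₁ + t; decomposing an estimated H recovers the camera-relative pose used as a self-supervision signal. The sketch below constructs H from a known pose and verifies it against direct projection; it assumes a pinhole model with illustrative numbers, and is not the thesis code:

```python
import numpy as np

def project(K, X):
    """Pinhole projection of a 3-D point X (camera frame) to pixel coordinates."""
    x = K @ X
    return x[:2] / x[2]

def planar_homography(K, R, t, n, d):
    """Homography induced by the plane n.X = d (first-camera frame) between
    two calibrated views related by X2 = R X1 + t:
        H = K (R + t n^T / d) K^{-1}, normalized so H[2,2] = 1."""
    H = K @ (R + np.outer(t, n) / d) @ np.linalg.inv(K)
    return H / H[2, 2]

def warp(H, p):
    """Apply a 3x3 homography to a pixel coordinate."""
    q = H @ np.array([p[0], p[1], 1.0])
    return q[:2] / q[2]
```

Going the other way — from a fitted H back to (R, t/d, n) — is the classical homography decomposition problem, which admits a closed-form solution up to a small set of physically disambiguable candidates.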
Viewpoint Invariant Dense Matching for Visual Geolocalization
In this paper we propose a novel method for image matching based on dense local features and tailored for visual geolocalization. Dense local feature matching is robust against changes in illumination and occlusions, but not against viewpoint shifts, which are a fundamental aspect of geolocalization. Our method, called GeoWarp, directly embeds invariance to viewpoint shifts in the process of extracting dense features. This is achieved via a trainable module which learns from the data an invariance that is meaningful for the task of recognizing places. We also devise a new self-supervised loss and two new weakly supervised losses to train this module using only unlabeled data and weak labels. GeoWarp is implemented efficiently as a re-ranking method that can be easily embedded into pre-existing visual geolocalization pipelines. Experimental validation on standard geolocalization benchmarks demonstrates that GeoWarp boosts the accuracy of state-of-the-art retrieval architectures. The code and trained models are available at https://github.com/gmberton/geo_war
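The re-ranking pattern that GeoWarp plugs into can be sketched generically: a global-retrieval stage produces a shortlist of candidates, which is then re-scored with a more expensive dense-feature comparison. The sketch below uses plain cosine similarity over flattened feature maps; it is a generic illustration of the pipeline shape, not GeoWarp's trainable warping module:

```python
import numpy as np

def rerank(query_feat, shortlist_feats):
    """Re-rank a retrieval shortlist by cosine similarity between the query's
    dense feature map and each candidate's, both flattened to vectors.
    Returns candidate indices, best match first."""
    q = query_feat.ravel() / np.linalg.norm(query_feat)
    scores = [float(q @ (f.ravel() / np.linalg.norm(f))) for f in shortlist_feats]
    return np.argsort(scores)[::-1]
```

In GeoWarp itself, both feature maps would first be warped by the learned viewpoint-invariance module before comparison, which is what makes the re-scoring robust to viewpoint shifts.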