Scale-Adaptive Neural Dense Features: Learning via Hierarchical Context Aggregation
How do computers and intelligent agents view the world around them? Feature
extraction and representation constitutes one of the basic building blocks towards
answering this question. Traditionally, this has been done with carefully
engineered hand-crafted techniques such as HOG, SIFT or ORB. However, there is
no "one size fits all" approach that satisfies all requirements. In recent
years, the rising popularity of deep learning has resulted in a myriad of
end-to-end solutions to many computer vision problems. These approaches, while
successful, tend to lack scalability and can't easily exploit information
learned by other systems. Instead, we propose SAND features, a dedicated deep
learning solution to feature extraction capable of providing hierarchical
context information. This is achieved by employing sparse relative labels
indicating relationships of similarity/dissimilarity between image locations.
The nature of these labels results in an almost infinite set of dissimilar
examples to choose from. We demonstrate how the selection of negative examples
during training can be used to modify the feature space and vary its
properties. To demonstrate the generality of this approach, we apply the
proposed features to a multitude of tasks, each requiring different properties.
This includes disparity estimation, semantic segmentation, self-localisation
and SLAM. In all cases, we show how incorporating SAND features results in
better or comparable results to the baseline, whilst requiring little to no
additional training. Code can be found at:
https://github.com/jspenmar/SAND_features
Comment: CVPR201
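The sparse relative labels described above can be read as a pixel-level contrastive objective: known correspondences are pulled together, while negatives are sampled from the vast pool of dissimilar locations, and biasing that sampling is what shapes the feature space. The following is only an illustrative sketch of such a loss, not the paper's actual formulation; the function name, the hinge form, and the uniform negative sampling are all assumptions.

```python
import numpy as np

def pixel_contrastive_loss(feat_a, feat_b, pos_pairs, n_neg=8, margin=1.0, rng=None):
    """Toy pixel-level contrastive (hinge) loss over two dense feature maps.

    feat_a, feat_b : (H, W, D) feature maps of two views.
    pos_pairs      : list of ((ya, xa), (yb, xb)) known correspondences,
                     i.e. the sparse "similar" labels.
    Negatives are drawn uniformly from feat_b; changing this sampling
    distribution (e.g. favouring nearby pixels) would change the learned
    feature space, mirroring the abstract's observation.
    """
    rng = rng or np.random.default_rng(0)
    H, W, _ = feat_b.shape
    loss = 0.0
    for (ya, xa), (yb, xb) in pos_pairs:
        anchor = feat_a[ya, xa]
        # pull the true correspondence together (squared distance)
        loss += np.sum((anchor - feat_b[yb, xb]) ** 2)
        # push randomly sampled dissimilar locations beyond the margin
        for _ in range(n_neg):
            yn, xn = rng.integers(0, H), rng.integers(0, W)
            if (yn, xn) == (yb, xb):
                continue
            d = np.sqrt(np.sum((anchor - feat_b[yn, xn]) ** 2))
            loss += max(0.0, margin - d) ** 2
    return loss / max(len(pos_pairs), 1)
```

In a real system this would be a differentiable loss over CNN outputs; the NumPy version only makes the positive/negative structure explicit.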
Improving the matching of deformable objects by learning to detect keypoints
We propose a novel learned keypoint detection method to increase the number
of correct matches for the task of non-rigid image correspondence. By
leveraging true correspondences acquired by matching annotated image pairs with
a specified descriptor extractor, we train an end-to-end convolutional neural
network (CNN) to find keypoint locations that are more appropriate to the
considered descriptor. For that, we apply geometric and photometric warpings to
images to generate a supervisory signal, allowing the optimization of the
detector. Experiments demonstrate that our method enhances the Mean Matching
Accuracy of numerous descriptors when used in conjunction with our detection
method, while outperforming the state-of-the-art keypoint detectors on real
images of non-rigid objects by 20 percentage points. We also apply our method to the complex
real-world task of object retrieval where our detector performs on par with the
finest keypoint detectors currently available for this task. The source code
and trained models are publicly available at
https://github.com/verlab/LearningToDetect_PRL_2023
Comment: This is the accepted version of the paper to appear at Pattern
Recognition Letters (PRL). The final journal version will be available at
https://doi.org/10.1016/j.patrec.2023.08.01
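The warping-based supervision described above hinges on one idea: when the warp between two images is known, a detection in one image has a ground-truth location in the other, so detections can be labelled automatically. The sketch below illustrates that labelling step only; the function names, the planar-homography assumption, and the 3 px tolerance are illustrative choices, not the paper's pipeline.

```python
import numpy as np

def warp_points(pts, H):
    """Apply a 3x3 homography H to an (N, 2) array of (x, y) points."""
    ph = np.hstack([pts, np.ones((len(pts), 1))])
    q = ph @ H.T
    return q[:, :2] / q[:, 2:3]

def supervision_from_warp(det_pts, warped_det_pts, H, tol=3.0):
    """Label detections using the known warp as a supervisory signal.

    det_pts        : (N, 2) keypoints detected in the original image.
    warped_det_pts : (M, 2) keypoints detected in the warped image.
    A keypoint counts as a positive if some detection in the warped image
    lies within `tol` pixels of its ground-truth warped location.
    Returns a boolean mask of length N.
    """
    gt = warp_points(det_pts, H)
    # pairwise distances between ground-truth locations and warped detections
    d = np.linalg.norm(gt[:, None, :] - warped_det_pts[None, :, :], axis=2)
    return d.min(axis=1) <= tol
```

Training a detector would then reward keypoints whose mask entry is True (repeatable under the warp) and penalize the rest; photometric warps need no relabelling since they leave locations unchanged.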
Hierarchical structure-and-motion recovery from uncalibrated images
This paper addresses the structure-and-motion problem, which requires finding
camera motion and 3D structure from point matches. A new pipeline, dubbed
Samantha, is presented that departs from the prevailing sequential paradigm
and embraces instead a hierarchical approach. This method has several
advantages, such as a provably lower computational complexity, which is
necessary to achieve true scalability, and better error containment, leading to
more stability and less drift. Moreover, a practical autocalibration procedure
allows images to be processed without ancillary information. Experiments with real
data assess the accuracy and the computational efficiency of the method.
Comment: Accepted for publication in CVI
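The complexity advantage of the hierarchical paradigm can be made concrete with a toy cost model, which is an assumption for illustration only, not the paper's analysis: if refining a model of k cameras costs O(k), a sequential pipeline pays that cost after every new image, while a balanced hierarchical pipeline merges partial models tree-fashion, paying roughly n per level over log2(n) levels.

```python
import math

def sequential_cost(n):
    """Toy cost of sequential SfM: each newly registered image triggers a
    refinement over the k cameras reconstructed so far -> O(n^2) total."""
    return sum(k for k in range(1, n + 1))

def hierarchical_cost(n):
    """Toy cost of hierarchical SfM: partial models are merged along a
    balanced binary tree; each of the ~log2(n) levels refines n cameras
    in total -> O(n log n)."""
    if n <= 1:
        return n
    return n * math.ceil(math.log2(n))
```

Under this model, 100 images cost 5050 refinement units sequentially but only 700 hierarchically, and the gap widens with n; the real pipeline's claim of "provably lower computational complexity" is of this asymptotic kind.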
Visual SLAM in Changing Environments
This thesis investigates the problem of Visual Simultaneous Localization and Mapping (vSLAM) in
changing environments. The vSLAM problem is to sequentially estimate the pose of a device with
mounted cameras in a map generated based on images taken with those cameras. vSLAM algorithms
face two main challenges in changing environments: moving objects and temporal appearance
changes. Moving objects cause problems in pose estimation if they are mistaken for static objects.
Moving objects also cause problems for loop closure detection (LCD), which is the problem of
detecting whether a previously visited place has been revisited. The same moving object observed
in two different places may cause false loop closures to be detected. Temporal appearance changes
such as those brought about by time of day or weather changes cause long-term data association
errors for LCD. These cause difficulties in recognizing previously visited places after they have
undergone appearance changes. Focus is placed on LCD, which turns out to be the part of vSLAM
that changing environments affect the most. In addition, several techniques and algorithms for
Visual Place Recognition (VPR) in challenging conditions that could be used in the context of
LCD are surveyed and the performance of two state-of-the-art modern VPR algorithms in changing
environments is assessed in an experiment in order to measure their applicability for LCD. The
most severe performance degrading appearance changes are found to be those caused by change in
season and illumination. Several algorithms and techniques that perform well in loop closure related
tasks in specific environmental conditions are identified as a result of the survey. Finally, a limited
experiment on the Nordland dataset implies that the tested VPR algorithms are usable as is or can
be modified for use in long-term LCD. As a part of the experiment, a new simple neighborhood
consistency check was also developed, evaluated, and found to be effective at reducing false positives
output by the tested VPR algorithms.
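A neighborhood consistency check of the kind the thesis describes can be sketched as follows; this is a generic illustration of the idea, not the thesis's actual implementation, and the function name, window size, and voting rule are assumptions. The intuition: a true loop closure implies that temporally adjacent query frames should match database frames that are correspondingly adjacent, so isolated matches are likely false positives.

```python
def neighborhood_consistent(matches, i, window=2, tol=3):
    """Simple neighbourhood consistency check for a loop-closure candidate.

    matches : list where matches[j] is the database frame index that query
              frame j best matched under the VPR algorithm, or None.
    The candidate at query frame i is accepted only if at least half of its
    temporal neighbours matched database frames whose offset from matches[i]
    is consistent (within `tol`) with the temporal offset j - i.
    """
    if matches[i] is None:
        return False
    votes, total = 0, 0
    for j in range(i - window, i + window + 1):
        if j == i or j < 0 or j >= len(matches) or matches[j] is None:
            continue
        total += 1
        # for a real revisit, matches[j] should drift with j at the same rate
        if abs((matches[j] - matches[i]) - (j - i)) <= tol:
            votes += 1
    return total > 0 and votes >= total / 2
```

For example, a query sequence matching database frames 10, 11, 12, 13, 14 passes the check at its midpoint, while a sequence of scattered matches fails, filtering exactly the isolated false positives the experiment targets.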