
    Scale-Adaptive Neural Dense Features: Learning via Hierarchical Context Aggregation

    How do computers and intelligent agents view the world around them? Feature extraction and representation constitute one of the basic building blocks towards answering this question. Traditionally, this has been done with carefully engineered, hand-crafted techniques such as HOG, SIFT or ORB. However, there is no "one size fits all" approach that satisfies all requirements. In recent years, the rising popularity of deep learning has resulted in a myriad of end-to-end solutions to many computer vision problems. These approaches, while successful, tend to lack scalability and cannot easily exploit information learned by other systems. Instead, we propose SAND features, a dedicated deep learning solution to feature extraction capable of providing hierarchical context information. This is achieved by employing sparse relative labels indicating relationships of similarity/dissimilarity between image locations. The nature of these labels results in an almost infinite set of dissimilar examples to choose from. We demonstrate how the selection of negative examples during training can be used to modify the feature space and vary its properties. To demonstrate the generality of this approach, we apply the proposed features to a multitude of tasks, each requiring different properties, including disparity estimation, semantic segmentation, self-localisation and SLAM. In all cases, we show that incorporating SAND features yields results better than or comparable to the baseline, whilst requiring little to no additional training. Code can be found at: https://github.com/jspenmar/SAND_features
    Comment: CVPR201
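The abstract describes training with similarity/dissimilarity labels between image locations and choosing negative examples to shape the feature space. Below is a minimal sketch of that idea, assuming a generic hinge-style contrastive loss and two hypothetical negative-sampling strategies: global (anywhere in the image) or local (within a `radius` of the anchor pixel). The names `contrastive_loss` and `sample_negatives` and the `margin` value are illustrative assumptions, not the authors' formulation or code.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple

Vector = Sequence[float]
Pixel = Tuple[int, int]  # (row, col)


def l2(a: Vector, b: Vector) -> float:
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def contrastive_loss(anchor: Vector, positive: Vector,
                     negatives: List[Vector], margin: float = 1.0) -> float:
    """Hypothetical hinge-style loss: pull the positive in, push negatives past `margin`."""
    loss = l2(anchor, positive) ** 2
    if negatives:
        # Only negatives closer than `margin` contribute; the term is averaged.
        loss += sum(max(0.0, margin - l2(anchor, n)) ** 2 for n in negatives) / len(negatives)
    return loss


def sample_negatives(anchor: Pixel, height: int, width: int, k: int,
                     radius: Optional[int] = None,
                     rng: Optional[random.Random] = None) -> List[Pixel]:
    """Sample up to k negative pixels (assumed strategy).

    radius=None -> global sampling over the whole image;
    radius=r    -> local sampling in a (2r+1)x(2r+1) window around the anchor.
    """
    if rng is None:
        rng = random.Random(0)  # fixed seed for reproducibility
    r0, c0 = anchor
    if radius is None:
        rows, cols = range(height), range(width)
    else:
        rows = range(max(0, r0 - radius), min(height, r0 + radius + 1))
        cols = range(max(0, c0 - radius), min(width, c0 + radius + 1))
    # The anchor itself is never a negative.
    candidates = [(r, c) for r in rows for c in cols if (r, c) != anchor]
    return rng.sample(candidates, min(k, len(candidates)))
```

Under these assumptions, restricting negatives to a local window asks the features to discriminate between nearby pixels, while global sampling mostly asks them to separate distant regions; this is one way the choice of negatives could change the properties of the learned feature space.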

    Improving the matching of deformable objects by learning to detect keypoints

    We propose a novel learned keypoint detection method to increase the number of correct matches for the task of non-rigid image correspondence. By leveraging true correspondences acquired by matching annotated image pairs with a specified descriptor extractor, we train an end-to-end convolutional neural network (CNN) to find keypoint locations that are better suited to the considered descriptor. To this end, we apply geometric and photometric warpings to images to generate a supervisory signal, allowing the optimization of the detector. Experiments demonstrate that our detector improves the Mean Matching Accuracy of numerous descriptors used in conjunction with it, while outperforming state-of-the-art keypoint detectors on real images of non-rigid objects by 20 percentage points (p.p.). We also apply our method to the complex real-world task of object retrieval, where our detector performs on par with the best keypoint detectors currently available for this task. The source code and trained models are publicly available at https://github.com/verlab/LearningToDetect_PRL_2023
    Comment: This is the accepted version of the paper to appear at Pattern Recognition Letters (PRL). The final journal version will be available at https://doi.org/10.1016/j.patrec.2023.08.01
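Mean Matching Accuracy (MMA) is commonly defined as the fraction of putative matches whose matched keypoint lies within a pixel threshold of its ground-truth correspondence, averaged over image pairs. The sketch below assumes that definition; the ground-truth mapping `gt_warp` (for instance, derived from annotated correspondences) and the threshold value are assumptions, and the paper's exact evaluation protocol may differ.

```python
import math
from typing import Callable, Sequence, Tuple

Point = Tuple[float, float]
Match = Tuple[Point, Point]      # (keypoint in image A, matched keypoint in image B)
Warp = Callable[[Point], Point]  # assumed ground-truth mapping from image A to image B


def matching_accuracy(matches: Sequence[Match], gt_warp: Warp, threshold: float) -> float:
    """Fraction of matches whose B-side keypoint is within `threshold` px of the ground truth."""
    if not matches:
        return 0.0
    correct = 0
    for pa, pb in matches:
        gx, gy = gt_warp(pa)  # where pa should land in image B
        if math.hypot(pb[0] - gx, pb[1] - gy) <= threshold:
            correct += 1
    return correct / len(matches)


def mean_matching_accuracy(pairs: Sequence[Tuple[Sequence[Match], Warp]],
                           threshold: float) -> float:
    """Average the per-pair matching accuracy over all image pairs."""
    if not pairs:
        return 0.0
    return sum(matching_accuracy(m, w, threshold) for m, w in pairs) / len(pairs)
```

MMA is often reported for a range of thresholds (for example, 1 to 10 pixels), which amounts to calling `mean_matching_accuracy` once per threshold.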

    Hierarchical structure-and-motion recovery from uncalibrated images

    This paper addresses the structure-and-motion problem, which requires finding camera motion and 3D structure from point matches. A new pipeline, dubbed Samantha, is presented that departs from the prevailing sequential paradigm and instead embraces a hierarchical approach. This method has several advantages, such as provably lower computational complexity, which is necessary to achieve true scalability, and better error containment, leading to more stability and less drift. Moreover, a practical autocalibration procedure allows images to be processed without ancillary information. Experiments with real data assess the accuracy and the computational efficiency of the method.
    Comment: Accepted for publication in CVI
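A hierarchical alternative to sequential structure-and-motion can be pictured as a binary merge tree over images: leaves are single images, and each internal node stands for merging two partial reconstructions. The sketch below builds such a tree by greedy agglomerative clustering on a hypothetical pairwise `affinity` (for example, the number of point matches between two images). The single-linkage criterion is an assumption, not Samantha's actual clustering rule, and the reconstruction and merging steps themselves are omitted.

```python
from typing import Dict, FrozenSet, Tuple, Union

Tree = Union[int, Tuple["Tree", "Tree"]]  # leaf = image index, node = (left, right)


def build_merge_tree(n_images: int, affinity: Dict[FrozenSet[int], float]) -> Tree:
    """Greedy agglomerative clustering of images into a binary merge tree."""
    if n_images < 1:
        raise ValueError("need at least one image")
    members: Dict[int, Tuple[int, ...]] = {i: (i,) for i in range(n_images)}
    trees: Dict[int, Tree] = {i: i for i in range(n_images)}
    next_id = n_images
    while len(members) > 1:
        best = None  # (linkage, cluster_a, cluster_b)
        ids = sorted(members)
        for idx, a in enumerate(ids):
            for b in ids[idx + 1:]:
                # Single linkage (assumption): strongest affinity between any two members.
                link = max(affinity.get(frozenset((x, y)), 0.0)
                           for x in members[a] for y in members[b])
                if best is None or link > best[0]:
                    best = (link, a, b)
        _, a, b = best
        # Each merge stands for joining two partial reconstructions.
        members[next_id] = members.pop(a) + members.pop(b)
        trees[next_id] = (trees.pop(a), trees.pop(b))
        next_id += 1
    (root,) = trees.values()
    return root


def depth(tree: Tree) -> int:
    """Number of merge levels from the root down to the deepest leaf."""
    return 0 if isinstance(tree, int) else 1 + max(depth(tree[0]), depth(tree[1]))
```

Single linkage can produce chain-like trees (depth close to the number of images), which essentially reduce to the sequential paradigm; a practical hierarchical pipeline would likely need a criterion that favours balanced merges to keep the complexity advantage.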

    Visual SLAM in changing environments

    This thesis investigates the problem of Visual Simultaneous Localization and Mapping (vSLAM) in changing environments. The vSLAM problem is to sequentially estimate the pose of a device with mounted cameras in a map generated from images taken with those cameras. vSLAM algorithms face two main challenges in changing environments: moving objects and temporal appearance changes. Moving objects cause problems in pose estimation if they are mistaken for static objects. They also cause problems for loop closure detection (LCD), which is the problem of detecting whether a previously visited place has been revisited: the same moving object observed in two different places may cause false loop closures to be detected. Temporal appearance changes, such as those brought about by the time of day or the weather, cause long-term data association errors for LCD, making it difficult to recognize previously visited places after they have undergone appearance changes. Focus is placed on LCD, which turns out to be the part of vSLAM most affected by changing environments. In addition, several techniques and algorithms for Visual Place Recognition (VPR) in challenging conditions that could be used in the context of LCD are surveyed, and the performance of two modern state-of-the-art VPR algorithms in changing environments is assessed experimentally to measure their applicability to LCD. The most severe performance-degrading appearance changes are found to be those caused by changes in season and illumination. The survey identifies several algorithms and techniques that perform well in loop-closure-related tasks under specific environmental conditions. Finally, a limited experiment on the Nordland dataset suggests that the tested VPR algorithms are usable as is or can be modified for use in long-term LCD.
    As part of the experiment, a new, simple neighborhood consistency check was also developed, evaluated, and found to be effective at reducing false positives output by the tested VPR algorithms.
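The abstract does not detail how the neighborhood consistency check works. One plausible form, sketched here purely as an assumption, exploits the fact that a query sequence and a reference sequence of the same route (as in Nordland) advance together: a match from query frame i to reference frame j is kept only if enough neighboring query frames i+d match reference frames near j+d. The names `matches`, `k`, `tol` and `min_support` are hypothetical.

```python
from typing import List


def neighborhood_support(matches: List[int], i: int, k: int = 2, tol: int = 2) -> float:
    """Fraction of neighbors of query i whose match agrees with i's match (assumed check).

    matches[q] is the index of the best-matching reference frame for query frame q.
    """
    j = matches[i]
    agree = total = 0
    for offset in range(-k, k + 1):
        q = i + offset
        if offset == 0 or not 0 <= q < len(matches):
            continue  # skip the query itself and out-of-range neighbors
        total += 1
        # A consistent neighbor should match roughly the same offset in the reference.
        if abs(matches[q] - (j + offset)) <= tol:
            agree += 1
    return agree / total if total else 0.0


def filter_matches(matches: List[int], k: int = 2, tol: int = 2,
                   min_support: float = 0.5) -> List[int]:
    """Indices of query frames whose matches pass the hypothetical consistency check."""
    return [i for i in range(len(matches))
            if neighborhood_support(matches, i, k, tol) >= min_support]
```

For example, with `matches = [10, 11, 12, 50, 14, 15, 16]`, the isolated jump to reference frame 50 at query 3 finds no agreeing neighbors and is rejected, while the remaining matches are kept. Using a support fraction rather than requiring every neighbor to agree keeps a single outlier from also rejecting its correct neighbors.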