
    Keyframe-based monocular SLAM: design, survey, and future directions

    Extensive research in the field of monocular SLAM over the past fifteen years has yielded workable systems that have found their way into various applications in robotics and augmented reality. Although filter-based monocular SLAM systems were common at one time, the more efficient keyframe-based solutions are becoming the de facto methodology for building a monocular SLAM system. The objective of this paper is threefold: first, the paper serves as a guideline for those seeking to design their own monocular SLAM system according to specific environmental constraints. Second, it presents a survey covering the various keyframe-based monocular SLAM systems in the literature, detailing the components of their implementation and critically assessing the specific strategies adopted in each proposed solution. Third, the paper provides insight into the direction of future research in this field, addressing the major limitations still facing monocular SLAM; namely, illumination changes, initialization, highly dynamic motion, poorly textured scenes, repetitive textures, map maintenance, and failure recovery.

    Non-rigid Point Cloud Registration with Neural Deformation Pyramid

    Non-rigid point cloud registration is a key component in many computer vision and computer graphics applications. The high complexity of the unknown non-rigid motion makes this task a challenging problem. In this paper, we break down this problem via hierarchical motion decomposition. Our method, called Neural Deformation Pyramid (NDP), represents non-rigid motion using a pyramid architecture. Each pyramid level, represented by a Multi-Layer Perceptron (MLP), takes as input a sinusoidally encoded 3D point and outputs its motion increment from the previous level. The sinusoidal function starts with a low input frequency that gradually increases as the pyramid level goes down. This allows a multi-level rigid-to-nonrigid motion decomposition and also speeds up the solving by 50 times compared to the existing MLP-based approach. Our method achieves advanced partial-to-partial non-rigid point cloud registration results on the 4DMatch/4DLoMatch benchmark under both no-learned and supervised settings. Comment: Code: https://github.com/rabbityl/DeformationPyrami
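    The coarse-to-fine pyramid described above can be sketched, purely illustratively, as follows. This is not the authors' implementation: the helper names (sinusoidal_encode, TinyMLP, pyramid_deform), the untrained random weights, and the frequency-doubling schedule are all assumptions made for a minimal, runnable outline of how per-level motion increments accumulate.

```python
import numpy as np

def sinusoidal_encode(points, freq):
    """Encode 3D points with sine/cosine at a single frequency (illustrative)."""
    scaled = points * freq
    return np.concatenate([np.sin(scaled), np.cos(scaled)], axis=-1)

class TinyMLP:
    """A minimal two-layer perceptron with random weights (stand-in for one trained pyramid level)."""
    def __init__(self, in_dim, out_dim=3, hidden=16, seed=0):
        rng = np.random.default_rng(seed)
        self.w1 = rng.normal(0.0, 0.1, (in_dim, hidden))
        self.w2 = rng.normal(0.0, 0.1, (hidden, out_dim))
    def __call__(self, x):
        return np.tanh(x @ self.w1) @ self.w2

def pyramid_deform(points, n_levels=4, base_freq=1.0):
    """Accumulate per-level motion increments, coarse (low frequency) to fine (high frequency)."""
    warped = points.copy()
    for level in range(n_levels):
        freq = base_freq * (2.0 ** level)  # input frequency grows as the pyramid level goes down
        mlp = TinyMLP(in_dim=6, seed=level)  # sin+cos of 3 coordinates -> 6 inputs
        warped = warped + mlp(sinusoidal_encode(warped, freq))
    return warped

pts = np.random.default_rng(1).normal(size=(100, 3))
out = pyramid_deform(pts)
```

In the actual method each level's MLP is optimized rather than random, so early (low-frequency) levels capture near-rigid motion and later levels add fine non-rigid detail.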

    Towards Robust Visual Localization in Challenging Conditions

    Visual localization is a fundamental problem in computer vision, with a multitude of applications in robotics, augmented reality and structure-from-motion. The basic problem is to determine, from one or more images, the position and orientation of the camera that captured them relative to some model of the environment. Current visual localization approaches typically work well when the images to be localized are captured under conditions similar to those of the mapping images. However, when the environment exhibits large changes in visual appearance, due to e.g. variations in weather, seasons, day-night or viewpoint, the traditional pipelines break down. The reason is that the local image features used are based on low-level pixel-intensity information, which is not invariant to these transformations: when the environment changes, a different set of keypoints will be detected and their descriptors will differ, making long-term visual localization a challenging problem. In this thesis, five papers are included, which present work towards solving the problem of long-term visual localization. Two of the articles present ideas for how semantic information may be included to aid in the localization process: one approach relies only on the semantic information for visual localization, and the other shows how the semantics can be used to detect outlier feature correspondences. The third paper considers how the output from a monocular depth-estimation network can be utilized to extract features that are less sensitive to viewpoint changes. The fourth article is a benchmark paper, where we present three new benchmark datasets aimed at evaluating localization algorithms in the context of long-term visual localization. Lastly, the fifth article considers how to perform convolutions on spherical imagery, which in the future might be applied to learning local image features for the localization problem.

    Survey on Leveraging Uncertainty Estimation Towards Trustworthy Deep Neural Networks: The Case of Reject Option and Post-training Processing

    Although neural networks (especially deep neural networks) have achieved better-than-human performance in many fields, their real-world deployment is still questionable due to the lack of awareness about the limitations of their knowledge. To incorporate such awareness into a machine learning model, prediction with a reject option (also known as selective classification or classification with abstention) has been proposed in the literature. In this paper, we present a systematic review of prediction with the reject option in the context of various neural networks. To the best of our knowledge, this is the first study focusing on this aspect of neural networks. Moreover, we discuss different novel loss functions related to the reject option and the post-training processing (if any) of network output for generating suitable measurements of the model's knowledge awareness. Finally, we address the application of the reject option in reducing prediction time for real-time problems, and present a comprehensive summary of the techniques related to the reject option in the context of an extensive variety of neural networks. Our code is available on GitHub: https://github.com/MehediHasanTutul/Reject_option
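    A minimal sketch of the reject-option idea the survey covers: abstain whenever the model's confidence falls below a threshold. The simple max-probability rule and the function name predict_with_reject are illustrative assumptions, not a method from the paper; real systems combine this with the calibrated scores and specialized loss functions the survey discusses.

```python
import numpy as np

def predict_with_reject(probs, threshold=0.8):
    """Return the argmax class when the top softmax probability reaches the
    threshold; otherwise return -1 to signal abstention (reject)."""
    conf = probs.max(axis=1)
    preds = probs.argmax(axis=1)
    return np.where(conf >= threshold, preds, -1)

# The middle sample is too uncertain, so the classifier abstains on it.
probs = np.array([[0.9, 0.1], [0.55, 0.45], [0.2, 0.8]])
decisions = predict_with_reject(probs)
```

Rejected samples can then be routed to a human or a slower, more accurate model, which is also how the reject option reduces average prediction time in real-time settings.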

    Enabling technology for non-rigid registration during image-guided neurosurgery

    In the context of image processing, non-rigid registration is an operation that attempts to align two or more images using spatially varying transformations. Non-rigid registration finds application in medical image processing to account for the deformations in the soft tissues of the imaged organs. During image-guided neurosurgery, non-rigid registration has the potential to assist in locating critical brain structures and improve identification of the tumor boundary. Robust non-rigid registration methods combine estimation of tissue displacement based on image intensities with spatial regularization using biomechanical models of brain deformation. In practice, the use of such registration methods during neurosurgery is complicated by a number of issues: construction of the biomechanical model used in the registration from the image data, high computational demands of the application, and difficulties in assessing the registration results. In this dissertation we develop methods and tools that address some of these challenges, and provide components essential for the intra-operative application of a previously validated physics-based non-rigid registration method.

    First, we study the problem of image-to-mesh conversion, which is required for constructing the biomechanical model of the brain used during registration. We develop and analyze a number of methods suitable for solving this problem, and evaluate them using application-specific quantitative metrics. Second, we develop a high-performance implementation of the non-rigid registration algorithm and study the use of geographically distributed Grid resources for speculative registration computations. Using the high-performance implementation running on remote computing resources, we are able to deliver the results of registration within the time constraints of the neurosurgery. Finally, we present a method that estimates the local alignment error between two images of the same subject. We assess the utility of this method using multiple sources of ground truth to evaluate its potential to support speculative computations of non-rigid registration.

    Graph Neural Network Flavour Tagging and Boosted Higgs Measurements at the LHC

    This thesis presents investigations into the challenges of, and potential improvements to, b-jet identification (b-tagging) at the ATLAS experiment at the Large Hadron Collider (LHC). The presence of b-jets is a key signature of many interesting physics processes such as the production of Higgs bosons, which preferentially decay to a pair of b-quarks. In this thesis, a particular focus is placed on the high transverse momentum regime, which is a critical region in which to study the Higgs boson and the wider Standard Model, but also a region within which b-tagging becomes increasingly difficult. As b-tagging relies on the accurate reconstruction of charged particle trajectories (tracks), the tracking performance is investigated and potential improvements are assessed. Track reconstruction becomes increasingly difficult at high transverse momentum due to the increased multiplicity and collimation of tracks, and also due to the presence of displaced tracks from the decay of a long-flying b-hadron. The investigations reveal that the quality selections applied during track reconstruction are suboptimal for b-hadron decay tracks inside high transverse momentum b-jets, motivating future studies into the optimisation of these selections. Two novel approaches are developed to improve b-tagging performance. Firstly, an algorithm which is able to classify the origin of tracks is used to select a more optimal set of tracks for input to the b-tagging algorithms. Secondly, a graph neural network (GNN) jet flavour tagging algorithm has been developed. This algorithm directly accepts jets and tracks as inputs, making a break from previous algorithms which relied on the outputs of intermediate taggers. The model is trained to simultaneously predict the jet flavour, track origins, and the spatial track-pair compatibility, and demonstrates marked improvements in b-tagging performance at both low and high transverse momenta. The closely related task of c-jet identification also benefits from this approach. Analysis of high transverse momentum H → bb decays, where the Higgs boson is produced in association with a vector boson, was performed using 139 fb−1 of 13 TeV proton-proton collision data from Run 2 of the LHC. This analysis provided first measurements of the V H, H → bb process in two high transverse momentum regions, and is described with a particular focus on the background modelling studies performed by the author.

    Study on Segmentation and Global Motion Estimation in Object Tracking Based on Compressed Domain

    Object tracking is a necessary procedure for many real-time applications, but it is a challenging one because of sequences with abrupt motion, occlusion, cluttered backgrounds, and camera shake. In many video processing systems, the presence of moving objects limits the accuracy of Global Motion Estimation (GME); conversely, inaccurate global motion parameter estimates degrade the performance of motion segmentation. In the proposed method, we introduce a procedure for simultaneous object segmentation and GME from the block-based motion vector (MV) field: the motion vectors are first refined using the spatial and temporal correlation of motion, and an initial segmentation is then produced from the motion vector differences remaining after global motion estimation.
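    The interplay the abstract describes, GME biased by moving objects and segmentation depending on GME, can be sketched with a simple iterative scheme: fit an affine global-motion model to the block motion vectors, treat blocks with large residuals as foreground (object) blocks, and refit on the rest. The function fit_global_motion, the six-parameter affine model, and the fixed residual threshold are illustrative assumptions, not the paper's method.

```python
import numpy as np

def fit_global_motion(positions, mvs, n_iters=3, thresh=1.0):
    """Fit a 6-parameter affine global-motion model mv ~ [x, y, 1] @ P to block
    motion vectors, iteratively excluding blocks whose residual exceeds the
    threshold (these likely belong to independently moving objects)."""
    mask = np.ones(len(positions), dtype=bool)
    for _ in range(n_iters):
        X = np.hstack([positions[mask], np.ones((mask.sum(), 1))])
        P, *_ = np.linalg.lstsq(X, mvs[mask], rcond=None)  # P has shape (3, 2)
        Xa = np.hstack([positions, np.ones((len(positions), 1))])
        resid = np.linalg.norm(Xa @ P - mvs, axis=1)
        mask = resid < thresh  # inliers ~ background, outliers ~ moving objects
    return P, mask

# Synthetic MV field on a 10x10 block grid: pure camera translation (1.0, 0.5)
# plus a small moving object whose blocks carry a different motion vector.
pos = np.array([[i, j] for i in range(10) for j in range(10)], dtype=float)
mv = np.tile([1.0, 0.5], (100, 1))
mv[:5] = [5.0, 5.0]  # the five blocks covering the foreground object
P, inliers = fit_global_motion(pos, mv)
```

The final mask is the motion segmentation (background versus object blocks), and the last row of P recovers the camera translation, mirroring the mutual refinement of segmentation and GME described above.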