
    Keyframe-Based Visual-Inertial SLAM Using Nonlinear Optimization

    Abstract—The fusion of visual and inertial cues has become popular in robotics due to the complementary nature of the two sensing modalities. While most fusion strategies to date rely on filtering schemes, the visual robotics community has recently turned to non-linear optimization approaches for tasks such as visual Simultaneous Localization And Mapping (SLAM), following the discovery that this brings significant advantages in quality of performance and computational complexity. Following this trend, we present a novel approach to tightly integrate visual measurements with readings from an Inertial Measurement Unit (IMU) in SLAM. An IMU error term is integrated with the landmark reprojection error in a fully probabilistic manner, resulting in a joint non-linear cost function to be optimized. Employing the powerful concept of 'keyframes', we partially marginalize old states to maintain a bounded-size optimization window, ensuring real-time operation. Comparing against both vision-only and loosely-coupled visual-inertial algorithms, our experiments confirm the benefits of tight fusion in terms of accuracy and robustness.
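    A hedged sketch of such a joint cost function (the notation below is ours, a generic form consistent with the abstract rather than the paper's exact formulation): with reprojection residuals e_r, IMU residuals e_s between consecutive states, and information matrices W weighting each term, the optimization minimizes

        J(\mathbf{x}) \;=\; \sum_{k=1}^{K} \sum_{j \in \mathcal{J}(k)} {\mathbf{e}_r^{j,k}}^{\top} \mathbf{W}_r^{j,k}\, \mathbf{e}_r^{j,k} \;+\; \sum_{k=1}^{K-1} {\mathbf{e}_s^{k}}^{\top} \mathbf{W}_s^{k}\, \mathbf{e}_s^{k}

    where the first sum covers the reprojection errors of landmarks j visible in keyframe k, and the second sum covers the IMU error terms linking consecutive states; marginalizing old states keeps K bounded.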

    Polar Fusion Technique Analysis for Evaluating the Performances of Image Fusion of Thermal and Visual Images for Human Face Recognition

    This paper presents a comparative study of two different methods based on fusion and polar transformation of visual and thermal images. The investigation addresses the challenges of face recognition, which include pose variations, changes in facial expression, partial occlusions, variations in illumination, rotation through different angles, changes in scale, etc. To overcome these obstacles we have implemented and thoroughly examined two different fusion techniques through rigorous experimentation. In the first method, a log-polar transformation is applied to the images obtained by fusing the visual and thermal images, whereas in the second method fusion is applied to the log-polar transforms of the individual visual and thermal images. After this step, Principal Component Analysis (PCA) is applied to reduce the dimension of the fused images. Log-polar transformed images are capable of handling the complications introduced by scaling and rotation. The main objective of employing fusion is to produce a fused image that provides more detailed and reliable information, overcoming the drawbacks present in the individual visual and thermal face images. Finally, the reduced fused images are classified using a multilayer perceptron neural network. The database used for the experiments conducted here is the Object Tracking and Classification Beyond Visible Spectrum (OTCBVS) benchmark database of thermal and visual face images. The second method shows the better performance, with a maximum correct recognition rate of 95.71% and an average of 93.81%.

    Comment: Proceedings of the IEEE Workshop on Computational Intelligence in Biometrics and Identity Management (IEEE CIBIM 2011), Paris, France, April 11-15, 2011
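    As a rough illustration of the second pipeline (log-polar transform of each modality, pixel-level fusion, PCA, then an MLP), the following minimal Python sketch uses OpenCV and scikit-learn; the averaging fusion rule, image size, PCA dimension, and classifier settings are assumptions for illustration, not details from the paper:

        import cv2
        import numpy as np
        from sklearn.decomposition import PCA
        from sklearn.neural_network import MLPClassifier

        def log_polar(img, size=64):
            # Map the image into log-polar coordinates, which turns
            # rotation and scaling into translations.
            img = cv2.resize(img, (size, size))
            center = (size / 2, size / 2)
            return cv2.warpPolar(img, (size, size), center, size / 2,
                                 cv2.WARP_POLAR_LOG)

        def fuse(visual, thermal):
            # Assumed fusion rule: simple pixel averaging of the two
            # log-polar transformed modalities.
            return (log_polar(visual).astype(np.float32)
                    + log_polar(thermal).astype(np.float32)) / 2.0

        def train(X_vis, X_thr, y):
            # X_vis, X_thr: lists of grayscale face images; y: identity labels.
            feats = np.stack([fuse(v, t).ravel() for v, t in zip(X_vis, X_thr)])
            pca = PCA(n_components=50)          # dimensionality reduction
            feats = pca.fit_transform(feats)
            clf = MLPClassifier(hidden_layer_sizes=(100,), max_iter=500)
            clf.fit(feats, y)                   # multilayer perceptron classifier
            return pca, clf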

    BLOCK: Bilinear Superdiagonal Fusion for Visual Question Answering and Visual Relationship Detection

    Multimodal representation learning is attracting growing interest within the deep learning community. While bilinear models provide an interesting framework for finding subtle combinations of modalities, their number of parameters grows quadratically with the input dimensions, making their practical implementation within classical deep learning pipelines challenging. In this paper, we introduce BLOCK, a new multimodal fusion scheme based on the block-superdiagonal tensor decomposition. It leverages the notion of block-term ranks, which generalizes both the rank and the mode ranks of tensors, already used for multimodal fusion. It defines new ways of optimizing the trade-off between the expressiveness and the complexity of the fusion model, and is able to represent very fine interactions between modalities while maintaining powerful mono-modal representations. We demonstrate the practical interest of our fusion model by using BLOCK for two challenging tasks: Visual Question Answering (VQA) and Visual Relationship Detection (VRD), for which we design end-to-end learnable architectures for representing relevant interactions between modalities. Through extensive experiments, we show that BLOCK compares favorably with state-of-the-art multimodal fusion models on both VQA and VRD tasks. Our code is available at https://github.com/Cadene/block.bootstrap.pytorch
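    The reference implementation is at the repository above; the PyTorch sketch below only conveys the core idea of a block-superdiagonal bilinear interaction, with chunk count, ranks, and projection sizes chosen arbitrarily for illustration:

        import torch
        import torch.nn as nn

        class BlockFusion(nn.Module):
            # Sketch of block-superdiagonal bilinear fusion: the projected
            # inputs are split into `chunks` blocks, and each block pair on
            # the superdiagonal interacts through its own small core tensor,
            # so parameters grow with chunks * rank**3 rather than
            # quadratically with the input dimensions.
            def __init__(self, dim_x, dim_y, dim_out, chunks=10, rank=8):
                super().__init__()
                self.chunks, self.rank = chunks, rank
                self.proj_x = nn.Linear(dim_x, chunks * rank)
                self.proj_y = nn.Linear(dim_y, chunks * rank)
                # One small 3-way core tensor per diagonal block.
                self.cores = nn.Parameter(0.1 * torch.randn(chunks, rank, rank, rank))
                self.proj_out = nn.Linear(chunks * rank, dim_out)

            def forward(self, x, y):
                b = x.size(0)
                xs = self.proj_x(x).view(b, self.chunks, self.rank)
                ys = self.proj_y(y).view(b, self.chunks, self.rank)
                # Bilinear interaction within each block:
                # z[b,c,k] = sum_ij xs[b,c,i] * core[c,i,j,k] * ys[b,c,j]
                z = torch.einsum('bci,cijk,bcj->bck', xs, self.cores, ys)
                return self.proj_out(z.reshape(b, -1))

        # Usage: fuse a 2048-d visual feature with a 310-d question embedding.
        fusion = BlockFusion(dim_x=2048, dim_y=310, dim_out=512)
        out = fusion(torch.randn(4, 2048), torch.randn(4, 310))  # shape (4, 512)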

    A comparison of score, rank and probability-based fusion methods for video shot retrieval

    It is now accepted that the most effective video shot retrieval is based on indexing and retrieving clips using multiple, parallel modalities such as text matching, image matching and feature matching, and then combining or fusing these parallel retrieval streams in some way. In this paper we investigate a range of fusion methods: for combining multiple visual features (colour, edge and texture), for combining multiple visual examples in the query, and for combining multiple modalities (text and visual). Using three TRECVid collections and the TRECVid search task, we specifically compare fusion methods based on normalised score and rank that use either the average, weighted average or maximum of retrieval results from a discrete Jelinek-Mercer smoothed language model. We also compare these results with a simple probability-based combination of the language model results that assumes all features and visual examples are fully independent.
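    A minimal sketch of the score-based fusion variants described above (average, weighted average, maximum over min-max normalised scores, plus a product combination under the independence assumption); the normalisation and function names are our assumptions, not the paper's exact procedure:

        import numpy as np

        def minmax(scores):
            # Normalise one run's score vector to [0, 1].
            lo, hi = scores.min(), scores.max()
            return (scores - lo) / (hi - lo) if hi > lo else np.zeros_like(scores)

        def fuse_scores(runs, method='average', weights=None):
            # runs: array-like of shape (n_runs, n_docs), one score vector per
            # feature, visual example or modality.
            runs = np.stack([minmax(r) for r in runs])
            if method == 'average':
                return runs.mean(axis=0)                 # plain average
            if method == 'weighted':
                w = np.asarray(weights, dtype=float)[:, None]
                return (w * runs).sum(axis=0) / w.sum()  # weighted average
            if method == 'max':
                return runs.max(axis=0)                  # maximum score
            raise ValueError(method)

        def fuse_probabilities(prob_runs):
            # Probability-based combination assuming full independence:
            # multiply per-run relevance probabilities document-wise.
            return np.prod(np.stack(prob_runs), axis=0)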