
    Learning Multimodal Word Representation via Dynamic Fusion Methods

    Multimodal models have been shown to outperform text-based models at learning semantic word representations. Almost all previous multimodal models treat the representations from different modalities equally. However, information from different modalities contributes differently to the meaning of words. This motivates us to build a multimodal model that can dynamically fuse the semantic representations from different modalities according to different types of words. To that end, we propose three novel dynamic fusion methods that assign importance weights to each modality, in which the weights are learned under the weak supervision of word association pairs. Extensive experiments demonstrate that the proposed methods outperform strong unimodal baselines and state-of-the-art multimodal models. Comment: To appear in AAAI-1
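
The idea of weighting each modality before fusion can be illustrated with a minimal sketch. This is not the authors' method: the gate scores here are hypothetical fixed inputs (in the paper the weights are learned from word association pairs), and the names `softmax` and `fuse` are illustrative only.

```python
import math


def softmax(scores):
    """Turn raw gate scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse(text_vec, image_vec, gate_scores):
    """Scale each modality by its weight, then concatenate.

    gate_scores is a hypothetical pair (text score, image score); a real
    model would predict it per word instead of taking it as an argument.
    """
    w_text, w_image = softmax(gate_scores)
    return [w_text * x for x in text_vec] + [w_image * x for x in image_vec]


# Equal gate scores give each modality a weight of 0.5.
fused = fuse([1.0, 2.0], [3.0], [0.0, 0.0])
print(fused)
```

With unequal gate scores, the modality with the higher score dominates the fused vector, which is the behaviour the abstract describes for different word types.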

    Deep multimodal biometric recognition using contourlet derivative weighted rank fusion with human face, fingerprint and iris images

    The goal of a multimodal biometric recognition system is to make a decision by identifying a person's physiological and behavioural traits. However, the decision-making process of a biometric recognition system can be extremely complex because of high-dimensional unimodal features in the temporal domain. This paper presents a deep multimodal biometric system for human recognition using three traits: face, fingerprint and iris. To reduce the feature vector dimension in the temporal domain, pre-processing is first performed using the Contourlet Transform Model. Next, the Local Derivative Ternary Pattern model is applied to the pre-processed features; feature discrimination power is improved by obtaining the coefficients that have maximum variation across the pre-processed multimodal features, thereby improving recognition accuracy. Weighted Rank Level Fusion is then applied to the extracted multimodal features, efficiently combining the biometric matching scores from the several modalities (i.e. face, fingerprint and iris). Finally, a deep learning framework is presented for improving the recognition rate of the multimodal biometric system in the temporal domain. The results of the proposed multimodal biometric recognition framework were compared with other multimodal methods. In these comparisons, the multimodal face, fingerprint and iris fusion offers significant improvements in the recognition rate of the proposed multimodal biometric system.
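
Weighted rank-level fusion can be sketched generically as a weighted sum of per-modality ranks (a weighted Borda-style count). The abstract does not give the paper's exact formulation, so the function name, the example ranks and the weights below are all assumptions made for illustration.

```python
def weighted_rank_fusion(ranks_by_modality, weights):
    """Fuse per-modality rankings into one ordering.

    ranks_by_modality: {modality: {identity: rank}}, where rank 1 is best.
    weights: {modality: weight}; hypothetical values, not from the paper.
    Returns identities sorted from best (lowest fused rank) to worst.
    """
    # Only identities ranked by every modality can be fused.
    identities = set.intersection(
        *(set(ranks) for ranks in ranks_by_modality.values())
    )
    fused = {
        identity: sum(
            weights[modality] * ranks[identity]
            for modality, ranks in ranks_by_modality.items()
        )
        for identity in identities
    }
    # Break ties by identity name so the result is deterministic.
    return sorted(fused, key=lambda identity: (fused[identity], identity))


ranks = {
    "face": {"A": 1, "B": 2, "C": 3},
    "fingerprint": {"A": 2, "B": 1, "C": 3},
    "iris": {"A": 1, "B": 3, "C": 2},
}
weights = {"face": 0.3, "fingerprint": 0.3, "iris": 0.4}
print(weighted_rank_fusion(ranks, weights))
```

In this example the iris matcher is trusted slightly more than the other two, so its ranking has the largest influence on the fused order.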

    AgriColMap: Aerial-Ground Collaborative 3D Mapping for Precision Farming

    The combination of the aerial survey capabilities of Unmanned Aerial Vehicles (UAVs) with the targeted intervention abilities of agricultural Unmanned Ground Vehicles (UGVs) can significantly improve the effectiveness of robotic systems applied to precision agriculture. In this context, building and updating a common map of the field is an essential but challenging task. Maps built using robots of different types differ in size, resolution and scale, and the associated geolocation data may be inaccurate and biased, while the repetitiveness of both the visual appearance and the geometric structures found in agricultural contexts renders classical map merging techniques ineffective. In this paper we propose AgriColMap, a novel map registration pipeline that leverages a grid-based multimodal environment representation which includes a vegetation index map and a Digital Surface Model. We cast the data association problem between maps built from UAVs and UGVs as a multimodal, large displacement dense optical flow estimation. The dominant, coherent flows, selected using a voting scheme, are used as point-to-point correspondences to infer a preliminary non-rigid alignment between the maps. A final refinement is then performed by exploiting only meaningful parts of the registered maps. We evaluate our system using real-world data for 3 fields with different crop species. The results show that our method outperforms several state-of-the-art map registration and matching techniques by a large margin, and has a higher tolerance to large initial misalignments. We release an implementation of the proposed approach along with the acquired datasets with this paper. Comment: Published in IEEE Robotics and Automation Letters, 201

    Navite: A Neural Network System For Sensory-Based Robot Navigation

    A neural network system, NAVITE, for incremental trajectory generation and obstacle avoidance is presented. Unlike other approaches, the system is effective in unstructured environments. Multimodal information from visual and range data is used for obstacle detection and to eliminate uncertainty in the measurements. Optimal paths are computed without explicitly optimizing cost functions, therefore reducing computational expenses. Simulations of a planar mobile robot (including the dynamic characteristics of the plant) in obstacle-free and object avoidance trajectories are presented. The system can be extended to incorporate global map information into the local decision-making process. Defense Advanced Research Projects Agency (AFOSR 90-0083); Office of Naval Research (N00014-92-J-l309); Consejo Nacional de Ciencia y Tecnología (63l462