
    Fast and Efficient Zero-Learning Image Fusion

    We propose a real-time image fusion method using pre-trained neural networks. Our method generates a single image containing features from multiple sources. We first decompose images into a base layer representing large-scale intensity variations and a detail layer containing small-scale changes. We use visual saliency to fuse the base layers, and deep feature maps extracted from a pre-trained neural network to fuse the detail layers. We conduct ablation studies to analyze our method's parameters, such as decomposition filters, weight construction methods, and network depth and architecture. We then validate its effectiveness and speed on thermal, medical, and multi-focus fusion. We also apply it to multiple image inputs such as multi-exposure sequences. The experimental results demonstrate that our technique achieves state-of-the-art performance in visual quality, objective assessment, and runtime efficiency. Comment: 13 pages, 10 figures
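    A minimal sketch of the two-scale pipeline described above, assuming co-registered grayscale float images in [0, 1]. The Gaussian base filter, the blur-difference saliency proxy, and the local-contrast rule for the detail layers are illustrative assumptions; the paper fuses detail layers with deep feature maps from a pre-trained network, which is not reproduced here.

```python
# Illustrative two-scale fusion sketch (not the paper's exact method: the
# deep-feature detail fusion is replaced by a simple local-contrast proxy).
import numpy as np
from scipy.ndimage import gaussian_filter, uniform_filter

def decompose(img, sigma=5.0):
    """Split an image into a base layer (large-scale intensity) and a detail layer."""
    base = gaussian_filter(img, sigma)        # low-pass approximation
    detail = img - base                       # small-scale residual
    return base, detail

def saliency(img, sigma=5.0):
    """Crude visual-saliency proxy: deviation from a heavily blurred version."""
    return np.abs(img - gaussian_filter(img, sigma)) + 1e-8

def fuse(images, sigma=5.0):
    """Fuse a list of grayscale float images in [0, 1]."""
    bases, details, weights = [], [], []
    for img in images:
        b, d = decompose(img, sigma)
        bases.append(b)
        details.append(d)
        weights.append(saliency(img))
    w = np.stack(weights)
    w /= w.sum(axis=0, keepdims=True)         # per-pixel weight normalization
    fused_base = (w * np.stack(bases)).sum(axis=0)
    # pick, per pixel, the detail layer with the strongest local contrast
    contrast = np.stack([uniform_filter(d * d, 7) for d in details])
    idx = contrast.argmax(axis=0)
    fused_detail = np.take_along_axis(np.stack(details), idx[None], axis=0)[0]
    return np.clip(fused_base + fused_detail, 0.0, 1.0)
```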

    The bilateral solver for quality estimation based multi-focus image fusion

    In this work, a fast Bilateral Solver for Quality Estimation Based multi-focus Image Fusion method (BS-QEBIF) is proposed. The all-in-focus image is generated by pixel-wise summation of the multi-focus source images with their focus-level maps as weights. Since the visual quality of an image patch is highly correlated with its focus level, the focus-level maps are first obtained from visual quality scores as pre-estimations. These pre-estimations are not ideal, however, so the fast bilateral solver is adopted to smooth them while preserving the edges of the multi-focus source images. The edge-preserving smoothed results are used as the final focus-level maps. Moreover, this work provides a confidence-map solution for the unstable fusion in boundary regions where the focus level changes. Experiments were conducted on 25 pairs of source images. The proposed BS-QEBIF outperforms 13 other fusion methods both objectively and subjectively. The all-in-focus image produced by the proposed method preserves the details of the multi-focus source images and does not suffer from residual errors. Experimental results show that BS-QEBIF can handle the boundary regions where the focus level changes without any blocking, ringing, or blurring artifacts.
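    A minimal sketch of the quality-estimation-based weighting idea described above. Local Laplacian energy stands in for the visual quality score and plain Gaussian smoothing stands in for the fast bilateral solver, so this is only a rough approximation of the pipeline, not the paper's method.

```python
# Rough multi-focus fusion sketch: pre-estimate focus maps, smooth them, then
# form a weighted pixel-wise sum of the source images.
import numpy as np
from scipy.ndimage import laplace, uniform_filter, gaussian_filter

def focus_level(img, win=9):
    """Pre-estimate per-pixel focus from local Laplacian energy (a quality proxy)."""
    return uniform_filter(laplace(img) ** 2, win)

def fuse_multifocus(images, smooth_sigma=4.0):
    """Weighted pixel-wise sum of the source images using smoothed focus maps."""
    maps = np.stack([gaussian_filter(focus_level(im), smooth_sigma) for im in images])
    maps += 1e-12                              # avoid division by zero
    weights = maps / maps.sum(axis=0, keepdims=True)
    return (weights * np.stack(images)).sum(axis=0)
```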

    Multisource and Multitemporal Data Fusion in Remote Sensing

    The sharp and recent increase in the availability of data captured by different sensors, combined with their considerably heterogeneous natures, poses a serious challenge for the effective and efficient processing of remotely sensed data. Such an increase in remote sensing and ancillary datasets, however, opens up the possibility of utilizing multimodal datasets jointly to further improve the performance of processing approaches for the application at hand. Multisource data fusion has therefore received enormous attention from researchers worldwide for a wide variety of applications. Moreover, thanks to the revisit capability of several spaceborne sensors, the temporal information can be integrated with the spatial and/or spectral/backscattering information of the remotely sensed data, moving from a representation of 2D/3D data to 4D data structures, where the time variable adds new information as well as new challenges for information extraction algorithms. A huge number of research works are dedicated to multisource and multitemporal data fusion, but the fusion methods for different modalities have evolved along different paths in each research community. This paper brings together the advances in multisource and multitemporal data fusion across different research communities and provides a thorough, discipline-specific starting point for researchers at different levels (i.e., students, researchers, and senior researchers) wishing to conduct novel investigations on this challenging topic, by supplying sufficient detail and references.

    Learning Multi-Modal Word Representation Grounded in Visual Context

    Representing the semantics of words is a long-standing problem for the natural language processing community. Most methods compute word semantics from their textual context in large corpora. More recently, researchers have attempted to integrate perceptual and visual features. Most of these works consider the visual appearance of objects to enhance word representations, but they ignore the visual environment and context in which objects appear. We propose to unify text-based techniques with vision-based techniques by simultaneously leveraging textual and visual context to learn multimodal word embeddings. We explore various choices for what can serve as a visual context and present an end-to-end method to integrate visual context elements into a multimodal skip-gram model. We provide experiments and extensive analysis of the obtained results.
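    A hedged sketch of one way a visual-context term could be attached to a skip-gram objective, in the spirit of the abstract above. The class name, dimensions, and the MSE visual loss are assumptions, and negative sampling is omitted for brevity; this is not the paper's exact model.

```python
# Sketch of a skip-gram objective with an added visual-context regression term
# (illustrative; negative sampling and the paper's exact architecture are omitted).
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultimodalSkipGram(nn.Module):
    def __init__(self, vocab_size, dim=300, visual_dim=2048):
        super().__init__()
        self.in_embed = nn.Embedding(vocab_size, dim)    # target-word vectors
        self.out_embed = nn.Embedding(vocab_size, dim)   # context-word vectors
        self.visual_proj = nn.Linear(dim, visual_dim)    # maps words to visual space

    def forward(self, target, context, visual_feat, alpha=0.5):
        t = self.in_embed(target)                        # (batch, dim)
        c = self.out_embed(context)                      # (batch, dim)
        # textual term: score the observed (target, context) word pairs
        text_loss = -F.logsigmoid((t * c).sum(dim=-1)).mean()
        # visual term: regress the visual-context feature of the grounding image
        visual_loss = F.mse_loss(self.visual_proj(t), visual_feat)
        return text_loss + alpha * visual_loss
```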

    Multimodal Classification of Events in Social Media

    A large amount of the social media hosted on platforms like Flickr and Instagram is related to social events. The task of social event classification refers to the distinction between event-related and non-event-related content as well as the classification of event types (e.g. sports events, concerts, etc.). In this paper, we provide an extensive study of textual, visual, and multimodal representations for social event classification. We investigate the strengths and weaknesses of the modalities and study synergy effects between them. Experimental results obtained with our multimodal representation outperform state-of-the-art methods and provide a new baseline for future research. Comment: Preprint of accepted manuscript for the Elsevier Image and Vision Computing Journal (IMAVIS). The paper will be published by IMAVIS under DOI 10.1016/j.imavis.2015.12.00

    Latent Variable Algorithms for Multimodal Learning and Sensor Fusion

    Multimodal learning has lacked principled ways of combining information from different modalities and learning a low-dimensional manifold of meaningful representations. We study multimodal learning and sensor fusion from a latent variable perspective. We first present a regularized recurrent attention filter for sensor fusion. This algorithm can dynamically combine information from different types of sensors in a sequential decision-making task. Each sensor is paired with a modular neural network to maximize the utility of its own information. A gating modular neural network dynamically generates a set of mixing weights for the outputs of the sensor networks by balancing the utility of all sensors' information. We design a co-learning mechanism that encourages co-adaptation and independent learning of each sensor at the same time, and propose a regularization-based co-learning method. In the second part, we focus on recovering the manifold of the latent representation. We propose a co-learning approach using a probabilistic graphical model that imposes a structural prior on the generative model, the multimodal variational RNN (MVRNN), and derive a variational lower bound for its objective function. In the third part, we extend the Siamese structure to sensor fusion for robust acoustic event detection. We perform experiments to investigate the extracted latent representations; further work will be completed in the following months. Our experiments show that the recurrent attention filter can dynamically combine different sensor inputs according to the information carried in the inputs. We expect the MVRNN to identify latent representations that are useful for many downstream tasks such as speech synthesis, activity recognition, and control and planning. Both algorithms are general frameworks that can be applied to other tasks where different types of sensors are jointly used for decision making.
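    A minimal sketch of the gating idea described above: per-sensor modular networks plus a gating network that produces mixing weights. Layer sizes and names are illustrative, and the recurrence, attention, and co-learning regularization of the actual algorithm are not reproduced.

```python
# Gated sensor fusion sketch: each sensor gets its own small network; a gating
# network turns the raw inputs into per-sensor mixing weights.
import torch
import torch.nn as nn

class GatedSensorFusion(nn.Module):
    def __init__(self, sensor_dims, hidden=64, out_dim=10):
        super().__init__()
        self.sensor_nets = nn.ModuleList(
            nn.Sequential(nn.Linear(d, hidden), nn.ReLU()) for d in sensor_dims
        )
        self.gate = nn.Linear(sum(sensor_dims), len(sensor_dims))
        self.head = nn.Linear(hidden, out_dim)

    def forward(self, inputs):                 # inputs: list of (batch, d_i) tensors
        feats = torch.stack([net(x) for net, x in zip(self.sensor_nets, inputs)], dim=1)
        weights = torch.softmax(self.gate(torch.cat(inputs, dim=-1)), dim=-1)
        fused = (weights.unsqueeze(-1) * feats).sum(dim=1)   # (batch, hidden)
        return self.head(fused)
```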

    Linked Component Analysis from Matrices to High Order Tensors: Applications to Biomedical Data

    With the increasing availability of various sensor technologies, we now have access to large amounts of multi-block (also called multi-set, multi-relational, or multi-view) data that need to be jointly analyzed to explore their latent connections. Various component analysis methods have played an increasingly important role in the analysis of such coupled data. In this paper, we first provide a brief review of existing matrix-based (two-way) component analysis methods for the joint analysis of such data, with a focus on biomedical applications. Then, we discuss their important extensions and generalizations to multi-block multiway (tensor) data. We show how constrained multi-block tensor decomposition methods are able to extract similar or statistically dependent common features shared by all blocks by incorporating the multiway nature of the data. Special emphasis is given to the flexible common and individual feature analysis of multi-block data, with the aim of simultaneously extracting common and individual latent components with desired properties and types of diversity. Illustrative examples are given to demonstrate their effectiveness for biomedical data analysis. Comment: 20 pages, 11 figures, Proceedings of the IEEE, 201
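    A toy matrix-level illustration of separating components shared across blocks from block-specific components, assuming the blocks share the sample mode. This is a generic SVD-based sketch, not the constrained multi-block tensor decompositions discussed in the paper.

```python
# Toy common/individual decomposition for multi-block data sharing the sample mode.
import numpy as np

def common_individual(blocks, r_common=2, r_individual=2):
    """blocks: list of (n_samples, features_k) matrices sharing the sample mode."""
    # common subspace: leading left singular vectors of the concatenated blocks
    concat = np.concatenate(blocks, axis=1)
    U, _, _ = np.linalg.svd(concat, full_matrices=False)
    C = U[:, :r_common]                               # shared (common) subspace basis
    commons, individuals = [], []
    for X in blocks:
        common_part = C @ (C.T @ X)                   # projection onto common subspace
        resid = X - common_part
        Ur, Sr, Vr = np.linalg.svd(resid, full_matrices=False)
        individual_part = Ur[:, :r_individual] @ np.diag(Sr[:r_individual]) @ Vr[:r_individual]
        commons.append(common_part)
        individuals.append(individual_part)
    return commons, individuals
```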

    Scale-Invariant Structure Saliency Selection for Fast Image Fusion

    In this paper, we present a fast yet effective method for pixel-level, scale-invariant image fusion in the spatial domain based on scale-space theory. Specifically, we propose a scale-invariant structure saliency selection scheme based on the difference-of-Gaussian (DoG) pyramid of the images to build the weight (activity) maps. Thanks to the scale-invariant structure saliency selection, our method preserves both the details of small objects and the integrity of large objects in the images. In addition, our method is very efficient, involves no complex operations, and is easy to implement, so it can be used for fast fusion of high-resolution images. Experimental results demonstrate that the proposed method yields competitive or even better results compared with state-of-the-art image fusion methods in terms of both visual quality and objective evaluation metrics. Furthermore, the proposed method is very fast and can fuse high-resolution images in real time. Code is available at https://github.com/yiqingmy/Fusion
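    A minimal sketch of DoG-based saliency selection for pixel-level fusion, assuming co-registered grayscale float images. The scale set and the per-pixel argmax selection rule are illustrative simplifications of the full multi-octave pyramid scheme.

```python
# DoG structure-saliency fusion sketch: accumulate |DoG| responses across scales
# and select, per pixel, the source image with the largest saliency.
import numpy as np
from scipy.ndimage import gaussian_filter

def dog_saliency(img, sigmas=(1.0, 2.0, 4.0, 8.0)):
    """Accumulate absolute difference-of-Gaussian responses across scales."""
    blurs = [gaussian_filter(img, s) for s in sigmas]
    return sum(np.abs(b1 - b2) for b1, b2 in zip(blurs[:-1], blurs[1:]))

def fuse_by_saliency(images):
    """Pick, per pixel, the source image with the largest DoG structure saliency."""
    sal = np.stack([dog_saliency(im) for im in images])          # (N, H, W)
    idx = sal.argmax(axis=0)
    return np.take_along_axis(np.stack(images), idx[None], axis=0)[0]
```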

    EEG-based Intention Recognition from Spatio-Temporal Representations via Cascade and Parallel Convolutional Recurrent Neural Networks

    A Brain-Computer Interface (BCI) is a system empowering humans to communicate with or control the outside world using only brain intentions. Electroencephalography (EEG) based BCIs are promising solutions due to their convenient and portable instruments. Motor imagery EEG (MI-EEG) is one of the most widely studied types of EEG signal; it reveals a subject's movement intentions without actual actions. Despite the extensive research on MI-EEG in recent years, it is still challenging to interpret EEG signals effectively due to the heavy noise in EEG signals (e.g., low signal-to-noise ratio and incomplete EEG signals) and the difficulty of capturing the inconspicuous relationships between EEG signals and certain brain activities. Most existing works either treat EEG as chain-like sequences, neglecting complex dependencies between adjacent signals, or perform simple temporal averaging over EEG sequences. In this paper, we introduce both cascade and parallel convolutional recurrent neural network models for precisely identifying human intended movements by effectively learning compositional spatio-temporal representations of raw EEG streams. The proposed models capture the spatial correlations between physically neighboring EEG signals by converting the chain-like EEG sequences into a 2D mesh-like hierarchy. An LSTM-based recurrent network is then able to extract the subtle temporal dependencies of the EEG data streams. Extensive experiments on a large-scale MI-EEG dataset (108 subjects, 3,145,160 EEG records) demonstrate that both models achieve high accuracy of around 98.3% and outperform a set of baseline methods and the most recent deep-learning-based EEG recognition models, yielding a significant accuracy increase of 18% in the cross-subject validation scenario. Comment: 8 pages, 3 figures
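    A rough sketch of the cascade spatial-temporal idea described above: each EEG time step is arranged as a 2D electrode mesh, a small CNN extracts spatial features, and an LSTM models temporal dependencies. The mesh size, channel counts, and class count are assumptions, not the paper's configuration.

```python
# Cascade CNN + LSTM sketch for mesh-shaped EEG sequences (illustrative sizes).
import torch
import torch.nn as nn

class CascadeConvLSTM(nn.Module):
    def __init__(self, mesh_h=10, mesh_w=11, n_classes=5, hidden=128):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),                    # per-step spatial summary
        )
        self.lstm = nn.LSTM(64, hidden, batch_first=True)
        self.fc = nn.Linear(hidden, n_classes)

    def forward(self, x):                               # x: (batch, T, mesh_h, mesh_w)
        B, T, H, W = x.shape
        feats = self.cnn(x.reshape(B * T, 1, H, W)).reshape(B, T, 64)
        out, _ = self.lstm(feats)                       # temporal modelling
        return self.fc(out[:, -1])                      # classify from the last step
```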

    Discriminative Representation Combinations for Accurate Face Spoofing Detection

    Three discriminative representations for face presentation attack detection are introduced in this paper. First, we design a descriptor called the spatial pyramid coding micro-texture (SPMT) feature to characterize local appearance information. Second, we utilize SSD, a deep-learning object detection framework, to extract context cues and conduct end-to-end face presentation attack detection. Finally, we design a descriptor called the template face matched binocular depth (TFBD) feature to characterize the stereo structure of real and fake faces. For accurate presentation attack detection, we also design two kinds of representation combinations. First, we propose a decision-level cascade strategy to combine SPMT with SSD. Second, we use a simple score fusion strategy to combine face structure cues (TFBD) with local micro-texture features (SPMT). To demonstrate the effectiveness of our design, we evaluate the combination of SPMT and SSD on three public datasets, where it outperforms all other state-of-the-art methods. In addition, we evaluate the combination of SPMT and TFBD on our own dataset, where excellent performance is also achieved. Comment: To be published in Pattern Recognition
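    A minimal sketch of the two combination strategies described above, using placeholder scores in [0, 1]: a decision-level cascade in which a cheap texture score gates a more expensive detector, and a simple weighted score fusion. Thresholds and weights are illustrative, not the paper's tuned values.

```python
# Decision-level cascade and score-level fusion sketches with placeholder scores.
def cascade_decision(spmt_score, ssd_score, t_low=0.2, t_high=0.8):
    """Reject or accept confidently on the texture score, else defer to the detector."""
    if spmt_score < t_low:
        return "attack"
    if spmt_score > t_high:
        return "real"
    return "real" if ssd_score > 0.5 else "attack"

def score_fusion(spmt_score, tfbd_score, w=0.6, threshold=0.5):
    """Weighted sum of the texture (SPMT) and depth-structure (TFBD) scores."""
    fused = w * spmt_score + (1.0 - w) * tfbd_score
    return "real" if fused > threshold else "attack"
```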