Search CORE

1,064 research outputs found

Learning Multimodal Structures in Computer Vision

Author: Taalimi Ali
Publication venue: TRACE: Tennessee Research and Creative Exchange
Publication date: 01/08/2017
Field of study

A phenomenon or event can be received from various kinds of detectors or under different conditions. Each such acquisition framework is a modality of the phenomenon. Due to the relation between the modalities of multimodal phenomena, a single modality cannot fully describe the event of interest. Since several modalities report on the same event introduces new challenges comparing to the case of exploiting each modality separately. We are interested in designing new algorithmic tools to apply sensor fusion techniques in the particular signal representation of sparse coding which is a favorite methodology in signal processing, machine learning and statistics to represent data. This coding scheme is based on a machine learning technique and has been demonstrated to be capable of representing many modalities like natural images. We will consider situations where we are not only interested in support of the model to be sparse, but also to reflect a-priorily known knowledge about the application in hand. Our goal is to extract a discriminative representation of the multimodal data that leads to easily finding its essential characteristics in the subsequent analysis step, e.g., regression and classification. To be more precise, sparse coding is about representing signals as linear combinations of a small number of bases from a dictionary. The idea is to learn a dictionary that encodes intrinsic properties of the multimodal data in a decomposition coefficient vector that is favorable towards the maximal discriminatory power. We carefully design a multimodal representation framework to learn discriminative feature representations by fully exploiting, the modality-shared which is the information shared by various modalities, and modality-specific which is the information content of each modality individually. Plus, it automatically learns the weights for various feature components in a data-driven scheme. In other words, the physical interpretation of our learning framework is to fully exploit the correlated characteristics of the available modalities, while at the same time leverage the modality-specific character of each modality and change their corresponding weights for different parts of the feature in recognition

University of Tennessee, Knoxville: Trace

A Novel Multi-Focus Image Fusion Method Based on Stochastic Coordinate Coding and Local Density Peaks Clustering

Author
Publication venue: 'MDPI AG'
Publication date: 11/11/2016
Field of study

abstract: The multi-focus image fusion method is used in image processing to generate all-focus images that have large depth of field (DOF) based on original multi-focus images. Different approaches have been used in the spatial and transform domain to fuse multi-focus images. As one of the most popular image processing methods, dictionary-learning-based spare representation achieves great performance in multi-focus image fusion. Most of the existing dictionary-learning-based multi-focus image fusion methods directly use the whole source images for dictionary learning. However, it incurs a high error rate and high computation cost in dictionary learning process by using the whole source images. This paper proposes a novel stochastic coordinate coding-based image fusion framework integrated with local density peaks. The proposed multi-focus image fusion method consists of three steps. First, source images are split into small image patches, then the split image patches are classified into a few groups by local density peaks clustering. Next, the grouped image patches are used for sub-dictionary learning by stochastic coordinate coding. The trained sub-dictionaries are combined into a dictionary for sparse representation. Finally, the simultaneous orthogonal matching pursuit (SOMP) algorithm is used to carry out sparse representation. After the three steps, the obtained sparse coefficients are fused following the max L1-norm rule. The fused coefficients are inversely transformed to an image by using the learned dictionary. The results and analyses of comparison experiments demonstrate that fused images of the proposed method have higher qualities than existing state-of-the-art methods

ASU Digital Repository

Deep Image Translation with an Affinity-Based Change Prior for Unsupervised Multimodal Change Detection

Author: Anfinsen S. N.
Bianchi F. M.
Jenssen R.
Kampffmeyer M.
Luppino L. T.
Moser G.
Serpico S. B.
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2022
Field of study

Image translation with convolutional neural networks has recently been used as an approach to multimodal change detection. Existing approaches train the networks by exploiting supervised information of the change areas, which, however, is not always available. A main challenge in the unsupervised problem setting is to avoid that change pixels affect the learning of the translation function. We propose two new network architectures trained with loss functions weighted by priors that reduce the impact of change pixels on the learning objective. The change prior is derived in an unsupervised fashion from relational pixel information captured by domain-specific affinity matrices. Specifically, we use the vertex degrees associated with an absolute affinity difference matrix and demonstrate their utility in combination with cycle consistency and adversarial training. The proposed neural networks are compared with the state-of-the-art algorithms. Experiments conducted on three real data sets show the effectiveness of our methodology

Archivio istituzionale della ricerca - Università di Genova

A MEDICAL X-RAY IMAGE CLASSIFICATION AND RETRIEVAL SYSTEM

Author: Müller Henning
Zare Mohammad Reza
Publication venue: AIS Electronic Library (AISeL)
Publication date: 27/06/2016
Field of study

Medical image retrieval systems have gained high interest in the scientific community due to the advances in medical imaging technologies. The semantic gap is one of the biggest challenges in retrieval from large medical databases. This paper presents a retrieval system that aims at addressing this challenge by learning the main concept of every image in the medical database. The proposed system contains two modules: a classification/annotation and a retrieval module. The first module aims at classifying and subsequently annotating all medical images automatically. SIFT (Scale Invariant Feature Transform) and LBP (Local Binary Patterns) are two descriptors used in this process. Image-based and patch-based features are used as approaches to build a bag of words (BoW) using these descriptors. The impact on the classification performance is also evaluated. The results show that the classification accuracy obtained incorporating image-based integration techniques is higher than the accuracy obtained by other techniques. The retrieval module enables the search based on text, visual and multimodal queries. The text-based query supports retrieval of medical images based on categories, as it is carried out via the category that the images were annotated with, within the classification module. The multimodal query applies a late fusion technique on the retrieval results obtained from text-based and image-based queries. This fusion is used to enhance the retrieval performance by incorporating the advantages of both text-based and content-based image retrieval

Hes-so: ArODES Open Archive (University of Applied Sciences and Arts Western Switzerland / Haute école spécialisée de Suisse occidentale / FH Westschweiz)

AIS Electronic Library (AISeL)

Deep Image Translation With an Affinity-Based Change Prior for Unsupervised Multimodal Change Detection

Author: Anfinsen Stian Normann
Bianchi Filippo Maria
Jenssen Robert
Kampffmeyer Michael
Luppino Luigi Tommaso
Moser Gabriele
Serpico Sebastiano Bruno
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2021
Field of study

© 2021 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.Image translation with convolutional neural networks has recently been used as an approach to multimodal change detection. Existing approaches train the networks by exploiting supervised information of the change areas, which, however, is not always available. A main challenge in the unsupervised problem setting is to avoid that change pixels affect the learning of the translation function. We propose two new network architectures trained with loss functions weighted by priors that reduce the impact of change pixels on the learning objective. The change prior is derived in an unsupervised fashion from relational pixel information captured by domain-specific affinity matrices. Specifically, we use the vertex degrees associated with an absolute affinity difference matrix and demonstrate their utility in combination with cycle consistency and adversarial training. The proposed neural networks are compared with the state-of-the-art algorithms. Experiments conducted on three real data sets show the effectiveness of our methodology

arXiv.org e-Print Archive

Munin - Open Research Archive

NORA - Norwegian Open Research Archives