Structural similarity loss for learning to fuse multi-focus images
Convolutional neural networks have recently been used for multi-focus image fusion. However, some existing methods resort to adding Gaussian blur to focused images to simulate defocus, thereby generating data (with ground truth) for supervised learning. Moreover, they classify pixels as 'focused' or 'defocused' and use the classification results to construct the fusion weight maps, which necessitates a series of post-processing steps. In this paper, we present an end-to-end learning approach for directly predicting the fully focused output image from multi-focus input image pairs. The suggested approach uses a CNN architecture trained to perform fusion without the need for ground-truth fused images. The CNN computes its loss from the structural similarity index (SSIM), a metric that is widely accepted for fused image quality evaluation. In addition, the standard deviation of a local window of the image is used to automatically estimate the importance of each source image in the final fused image when designing the loss function. Our network can accept images of variable sizes and hence we are able to use real benchmark datasets, instead of simulated ones, to train our network. The model is a feed-forward, fully convolutional neural network that can process images of variable sizes at test time. Extensive evaluation on benchmark datasets shows that our method outperforms, or is comparable with, existing state-of-the-art techniques on both objective and subjective benchmarks.
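To make the loss concrete, here is a minimal PyTorch sketch (not the authors' code) of an SSIM-based fusion loss in which each source image's contribution is weighted by the standard deviation of a local window, as the abstract describes; the window size, stability constants, and exact weighting scheme are illustrative assumptions, and the inputs are assumed to be single-channel tensors scaled to [0, 1].

```python
import torch
import torch.nn.functional as F

def _local_stats(x, win=7):
    # Mean and variance over a win x win neighbourhood via average pooling.
    pad = win // 2
    mu = F.avg_pool2d(x, win, stride=1, padding=pad)
    var = F.avg_pool2d(x * x, win, stride=1, padding=pad) - mu * mu
    return mu, var.clamp(min=0)

def _ssim_map(x, y, win=7, c1=0.01 ** 2, c2=0.03 ** 2):
    # Standard SSIM map; constants assume intensities in [0, 1].
    pad = win // 2
    mu_x, var_x = _local_stats(x, win)
    mu_y, var_y = _local_stats(y, win)
    cov = F.avg_pool2d(x * y, win, stride=1, padding=pad) - mu_x * mu_y
    return ((2 * mu_x * mu_y + c1) * (2 * cov + c2)) / (
        (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2))

def fusion_ssim_loss(fused, src_a, src_b, win=7):
    # Local standard deviation acts as a proxy for focus: the sharper source
    # in each window receives the larger weight (assumed weighting scheme).
    _, var_a = _local_stats(src_a, win)
    _, var_b = _local_stats(src_b, win)
    w_a = var_a.sqrt() / (var_a.sqrt() + var_b.sqrt() + 1e-8)
    w_b = 1.0 - w_a
    ssim_a = _ssim_map(fused, src_a, win)
    ssim_b = _ssim_map(fused, src_b, win)
    return 1.0 - (w_a * ssim_a + w_b * ssim_b).mean()
```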
Subjectively optimised multi-exposure and multi-focus image fusion with compensation for camera shake
Multi-exposure image fusion algorithms are used to enhance the perceptual quality of an image captured by sensors of limited dynamic range. This is achieved by rendering a single scene based on multiple images captured at different exposure times. Similarly, multi-focus image fusion is used when the limited depth of focus at a selected focus setting of a camera leaves parts of an image out of focus. The solution adopted is to fuse a number of multi-focus images to create an image that is in focus throughout. In this paper we propose a single algorithm that can perform both multi-focus and multi-exposure image fusion. The algorithm takes a novel approach in which a set of unregistered multi-exposure/multi-focus images is first registered before being fused. The registration is performed by identifying matching key points in the constituent images using the Scale Invariant Feature Transform (SIFT); the RANdom SAmple Consensus (RANSAC) algorithm is used to identify inliers among the SIFT key points, removing outliers that would cause errors in the registration process. Finally, the Coherent Point Drift algorithm registers the images, preparing them for the subsequent fusion stage. For the fusion of images, a novel approach based on an improved Wavelet-Based Contourlet Transform (WBCT) is used. The experimental results show that the proposed algorithm is capable of producing HDR or multi-focus images by registering and fusing a set of multi-exposure or multi-focus images taken in the presence of camera shake.
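The registration front end described above can be sketched with standard OpenCV primitives. The snippet below is an illustration rather than the authors' implementation: it performs SIFT matching with a ratio test and RANSAC-based outlier rejection, and substitutes a homography warp for the Coherent Point Drift stage that the paper actually uses; inputs are assumed to be grayscale uint8 arrays.

```python
import cv2
import numpy as np

def register_pair(ref_gray, mov_gray):
    # Detect SIFT key points and descriptors in both images.
    sift = cv2.SIFT_create()
    kp1, des1 = sift.detectAndCompute(ref_gray, None)
    kp2, des2 = sift.detectAndCompute(mov_gray, None)

    # Ratio-test matching of SIFT descriptors.
    matcher = cv2.BFMatcher(cv2.NORM_L2)
    matches = [m for m, n in matcher.knnMatch(des2, des1, k=2)
               if m.distance < 0.75 * n.distance]

    src = np.float32([kp2[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    dst = np.float32([kp1[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)

    # RANSAC rejects outlier correspondences before estimating the transform.
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
    h, w = ref_gray.shape[:2]
    return cv2.warpPerspective(mov_gray, H, (w, h))
```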
A Novel Multi-Focus Image Fusion Method Based on Stochastic Coordinate Coding and Local Density Peaks Clustering
Multi-focus image fusion methods are used in image processing to generate all-in-focus images with a large depth of field (DOF) from the original multi-focus images. Different approaches have been used in the spatial and transform domains to fuse multi-focus images. As one of the most popular image processing methods, dictionary-learning-based sparse representation achieves great performance in multi-focus image fusion. Most existing dictionary-learning-based multi-focus image fusion methods use the whole source images directly for dictionary learning, which incurs a high error rate and high computational cost in the dictionary learning process. This paper proposes a novel stochastic coordinate coding-based image fusion framework integrated with local density peaks clustering. The proposed multi-focus image fusion method consists of three steps. First, the source images are split into small image patches, and the patches are classified into a few groups by local density peaks clustering. Next, the grouped image patches are used for sub-dictionary learning by stochastic coordinate coding, and the trained sub-dictionaries are combined into a dictionary for sparse representation. Finally, the simultaneous orthogonal matching pursuit (SOMP) algorithm is used to carry out sparse representation. The obtained sparse coefficients are then fused following the max-L1-norm rule, and the fused coefficients are inversely transformed into an image using the learned dictionary. The results and analyses of comparison experiments demonstrate that the fused images of the proposed method are of higher quality than those of existing state-of-the-art methods.
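The fusion rule at the heart of this pipeline can be illustrated with off-the-shelf components. In the sketch below, scikit-learn's MiniBatchDictionaryLearning and OMP encoder stand in for stochastic coordinate coding and SOMP, the density-peaks clustering into sub-dictionaries is omitted, and grayscale float inputs are assumed; it is a rough illustration of the max-L1 coefficient rule, not the paper's method.

```python
import numpy as np
from sklearn.decomposition import MiniBatchDictionaryLearning, sparse_encode
from sklearn.feature_extraction.image import extract_patches_2d, reconstruct_from_patches_2d

def fuse_sparse(img_a, img_b, patch=8, n_atoms=128):
    # Split both sources into overlapping patches and flatten them.
    pa = extract_patches_2d(img_a, (patch, patch)).reshape(-1, patch * patch)
    pb = extract_patches_2d(img_b, (patch, patch)).reshape(-1, patch * patch)

    # Learn a shared dictionary from a random subset of patches (for speed).
    train = np.vstack([pa, pb])
    sample = train[np.random.choice(len(train), min(5000, len(train)), replace=False)]
    D = MiniBatchDictionaryLearning(n_components=n_atoms, alpha=1.0).fit(sample).components_

    # Sparse-code each source and keep, per patch, the code with larger L1 norm.
    ca = sparse_encode(pa, D, algorithm='omp', n_nonzero_coefs=8)
    cb = sparse_encode(pb, D, algorithm='omp', n_nonzero_coefs=8)
    pick_a = np.abs(ca).sum(axis=1) >= np.abs(cb).sum(axis=1)
    fused_codes = np.where(pick_a[:, None], ca, cb)

    # Reconstruct patches from the fused codes and average the overlaps.
    fused_patches = (fused_codes @ D).reshape(-1, patch, patch)
    return reconstruct_from_patches_2d(fused_patches, img_a.shape)
```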
Exploit the Best of Both End-to-End and Map-Based Methods for Multi-Focus Image Fusion
Multi-focus image fusion is a technique that fuses images focused at different depth ranges to generate an all-in-focus image. Existing deep learning approaches to multi-focus image fusion can be categorized as end-to-end methods and decision-map-based methods. End-to-end methods can generate natural fusion near the focus-defocus boundaries (FDB), but their output is often inconsistent with the input in the areas far from the boundaries (FFB). Conversely, decision-map-based methods preserve the original images in the FFB areas but often generate artifacts near the FDB. In this paper, we propose a dual-branch network for multi-focus image fusion (DB-MFIF) to exploit the best of both worlds, achieving better results in both FDB and FFB areas, i.e. naturally sharper FDB areas and FFB areas more consistent with the inputs. In our DB-MFIF, an end-to-end branch and a decision-map-based branch mutually assist each other; to this end, two map-based loss functions are also proposed. Experiments show that our method surpasses existing algorithms on multiple datasets, both qualitatively and quantitatively, and achieves state-of-the-art performance. The code and model are available on GitHub: https://github.com/Zancelot/DB-MFIF
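A schematic of the dual-branch idea is sketched below in PyTorch: one branch regresses the fused image directly while the other predicts a decision map that blends the inputs. The layer sizes and the way the two heads share an encoder are assumptions for illustration, not the DB-MFIF architecture itself.

```python
import torch
import torch.nn as nn

def conv_block(cin, cout):
    return nn.Sequential(nn.Conv2d(cin, cout, 3, padding=1), nn.ReLU(inplace=True))

class DualBranchFusion(nn.Module):
    def __init__(self, ch=32):
        super().__init__()
        self.encoder = nn.Sequential(conv_block(2, ch), conv_block(ch, ch))
        # End-to-end branch: predicts the fused image directly.
        self.e2e_head = nn.Conv2d(ch, 1, 3, padding=1)
        # Decision-map branch: predicts a per-pixel blending weight.
        self.map_head = nn.Sequential(nn.Conv2d(ch, 1, 3, padding=1), nn.Sigmoid())

    def forward(self, src_a, src_b):
        feats = self.encoder(torch.cat([src_a, src_b], dim=1))
        fused_e2e = self.e2e_head(feats)
        dmap = self.map_head(feats)
        # Map-based fusion keeps original pixels away from the boundaries.
        fused_map = dmap * src_a + (1.0 - dmap) * src_b
        return fused_e2e, fused_map, dmap
```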
Perceptual Based Image Fusion with Applications to Hyperspectral Image Data
The development of new imaging sensors has created a need for image processing techniques that can fuse images from different sensors or multiple images produced by the same sensor. The methods presented here focus on combining image data from the Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) hyperspectral sensor into a single image, or a smaller subset of images, while maintaining the visual information necessary for human analysis. Three hierarchical multi-resolution image fusion techniques are implemented and tested using the AVIRIS image data and test images that contain various levels of correlated or uncorrelated noise. Two of the algorithms are published fusion methods that combine images from multiple sensors; the third was developed to fuse any co-registered image data. This new method uses the spatial frequency response (contrast sensitivity) of the human visual system to determine which parts of the input images contain the salient features that need to be preserved in the composite image(s). After analyzing the signal-to-noise ratios and visual aesthetics of the fused images, contrast-sensitivity-based fusion is shown to provide excellent fusion results and, in every case, clearly outperforms the other two methods. Finally, as an illustrative example of how the fusion techniques are independent of the hyperspectral application, they are applied to fusing multiple polarimetric images from a synthetic aperture radar to enhance automated targeting techniques.
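The hierarchical multi-resolution backbone that such methods share can be sketched as a Laplacian-pyramid fusion. In the snippet below, plain max-absolute coefficient selection stands in for the contrast-sensitivity-based saliency used in the thesis, and grayscale, co-registered float inputs are assumed.

```python
import cv2
import numpy as np

def laplacian_pyramid(img, levels=4):
    # Gaussian pyramid, then band-pass (Laplacian) levels plus the low-pass base.
    gauss = [img.astype(np.float32)]
    for _ in range(levels):
        gauss.append(cv2.pyrDown(gauss[-1]))
    return [gauss[i] - cv2.pyrUp(gauss[i + 1], dstsize=gauss[i].shape[1::-1])
            for i in range(levels)] + [gauss[-1]]

def pyramid_fuse(img_a, img_b, levels=4):
    pa, pb = laplacian_pyramid(img_a, levels), laplacian_pyramid(img_b, levels)
    # Detail bands: keep the larger-magnitude coefficient; base: average.
    fused = [np.where(np.abs(a) >= np.abs(b), a, b) for a, b in zip(pa[:-1], pb[:-1])]
    fused.append(0.5 * (pa[-1] + pb[-1]))
    # Collapse the fused pyramid back into a single image.
    out = fused[-1]
    for i in range(levels - 1, -1, -1):
        out = cv2.pyrUp(out, dstsize=fused[i].shape[1::-1]) + fused[i]
    return out
```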
Bridging the Gap between Multi-focus and Multi-modal: A Focused Integration Framework for Multi-modal Image Fusion
Multi-modal image fusion (MMIF) integrates valuable information from different modality images into a fused one. However, the fusion of multiple visible images with different focal regions together with infrared images is an unprecedented challenge in real MMIF applications. This is because the limited depth of focus of visible optical lenses impedes the simultaneous capture of all focal information within the same scene. To address this issue, in this paper we propose an MMIF framework for joint focused integration and modality information extraction. Specifically, a semi-sparsity-based smoothing filter is introduced to decompose the images into structure and texture components. Subsequently, a novel multi-scale operator is proposed to fuse the texture components, capable of detecting significant information by considering the pixel focus attributes and relevant data from the various modal images. Additionally, to achieve an effective capture of scene luminance and reasonable contrast maintenance, we consider the distribution of energy information in the structural components in terms of multi-directional frequency variance and information entropy. Extensive experiments on existing MMIF datasets, as well as on object detection and depth estimation tasks, consistently demonstrate that the proposed algorithm surpasses state-of-the-art methods in visual perception and quantitative evaluation. The code is available at https://github.com/ixilai/MFIF-MMIF
Comment: Accepted to the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) 202
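A much-simplified sketch of the structure/texture route is given below: a bilateral filter stands in for the semi-sparsity smoother, texture components are fused by maximum absolute response rather than the proposed multi-scale focus-aware operator, and structure components are blended by local energy. All three substitutions are assumptions for illustration only, and inputs are assumed to be co-registered grayscale float images in [0, 255].

```python
import cv2
import numpy as np

def decompose(img):
    # Edge-preserving smoothing gives the structure layer; the residual is texture.
    structure = cv2.bilateralFilter(img.astype(np.float32), d=9,
                                    sigmaColor=25, sigmaSpace=7)
    return structure, img.astype(np.float32) - structure

def fuse_focused_and_infrared(vis_a, vis_b, ir):
    parts = [decompose(x) for x in (vis_a, vis_b, ir)]
    structures = np.stack([s for s, _ in parts])
    textures = np.stack([t for _, t in parts])

    # Texture: keep, per pixel, the component with the strongest response.
    idx = np.abs(textures).argmax(axis=0)
    fused_tex = np.take_along_axis(textures, idx[None], axis=0)[0]

    # Structure: weight each source by its local energy (box-filtered square).
    energy = np.stack([cv2.boxFilter(s * s, -1, (15, 15)) for s in structures])
    weights = energy / (energy.sum(axis=0, keepdims=True) + 1e-8)
    fused_struct = (weights * structures).sum(axis=0)

    return fused_struct + fused_tex
```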
LATFormer: Locality-Aware Point-View Fusion Transformer for 3D Shape Recognition
Recently, 3D shape understanding has achieved significant progress due to the advances of deep learning models on various data formats such as images, voxels, and point clouds. Among them, point clouds and multi-view images are two complementary modalities of 3D objects, and learning representations by fusing both of them has proven to be fairly effective. While prior works typically focus on exploiting global features of the two modalities, herein we argue that more discriminative features can be derived by modeling "where to fuse". To investigate this, we propose a novel Locality-Aware Point-View Fusion Transformer (LATFormer) for 3D shape retrieval and classification. The core component of LATFormer is a module named Locality-Aware Fusion (LAF), which integrates the local features of correlated regions across the two modalities based on co-occurrence scores. We further propose to filter out scores with low values to obtain salient local co-occurring regions, which reduces redundancy in the fusion process. In our LATFormer, we use the LAF module to fuse the multi-scale features of the two modalities both bidirectionally and hierarchically to obtain more informative features. Comprehensive experiments on four popular 3D shape benchmarks covering 3D object retrieval and classification validate its effectiveness.
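The locality-aware fusion step can be pictured as a sparsified cross-attention between local point and view features. The PyTorch sketch below computes co-occurrence scores with a scaled dot product, suppresses low scores to keep only the salient co-occurring regions, and aggregates the surviving view features into the point branch; the score definition, keep ratio, and projection sizes are assumptions, not the published LAF module.

```python
import torch
import torch.nn as nn

class LocalityAwareFusion(nn.Module):
    def __init__(self, dim, keep_ratio=0.5):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)
        self.keep_ratio = keep_ratio

    def forward(self, point_feats, view_feats):
        # point_feats: (B, Np, C) local point features; view_feats: (B, Nv, C).
        scores = self.q(point_feats) @ self.k(view_feats).transpose(1, 2)
        scores = scores / point_feats.shape[-1] ** 0.5  # (B, Np, Nv)

        # Keep only the most salient co-occurring regions; mask out the rest.
        k = max(1, int(scores.shape[-1] * self.keep_ratio))
        thresh = scores.topk(k, dim=-1).values[..., -1:]
        scores = scores.masked_fill(scores < thresh, float('-inf'))

        # Aggregate the retained view regions into the point features.
        attn = torch.softmax(scores, dim=-1)
        return point_feats + attn @ self.v(view_feats)
```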