Decomposition-based and Interference Perception for Infrared and Visible Image Fusion in Complex Scenes
Infrared and visible image fusion has emerged as a prominent research topic in
computer vision. However, little attention has been paid to fusion in complex
scenes, and existing techniques therefore produce sub-optimal results when
confronted with real-world interference. To fill this gap, we propose a
decomposition-based, interference-perception image fusion method. Specifically,
we classify the pixels of the visible image according to the degree of
light-transmission scattering, and on that basis separate the detail and energy
information of the image. This refined decomposition helps the proposed model
identify more of the interfering pixels present in complex scenes. To strike a
balance between
denoising and detail preservation, we propose an adaptive denoising scheme for
fusing detail components. Meanwhile, we propose a new weighted fusion rule
that considers the distribution of image energy information across multiple
directions. Extensive experiments on complex-scene fusion, covering adverse
weather, noise, blur, overexposure, and fire, as well as downstream tasks
including semantic segmentation, object detection, salient object detection,
and depth estimation, consistently demonstrate the effectiveness and
superiority of the proposed method over recent representative methods.
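The abstract does not spell out the decomposition itself, so the following is only a minimal sketch of the general idea: transmission is approximated with a standard dark-channel prior (a stand-in for the paper's scattering-based classification, not its actual rule), and the energy/detail split uses a plain Gaussian base layer. The 0.6 threshold and all parameter values are illustrative assumptions.

```python
# Hedged sketch, NOT the authors' exact method: dark-channel-prior transmission
# as a proxy for "degree of scattering", plus a Gaussian energy/detail split.
import numpy as np
from scipy.ndimage import minimum_filter, gaussian_filter

def estimate_transmission(rgb, omega=0.95, patch=15):
    """Dark-channel-prior transmission estimate for an RGB image in [0, 1]."""
    dark = minimum_filter(rgb.min(axis=2), size=patch)  # per-pixel dark channel
    atmosphere = np.percentile(rgb, 99.9)               # crude airlight estimate
    return 1.0 - omega * dark / max(atmosphere, 1e-6)

def decompose(gray, sigma=5.0):
    """Split a grayscale image into an energy (base) and a detail component."""
    energy = gaussian_filter(gray, sigma)
    detail = gray - energy
    return energy, detail

rgb = np.random.rand(128, 128, 3)   # placeholder visible image
t = estimate_transmission(rgb)
interfered = t < 0.6                # pixels flagged as heavily scattered (assumed cutoff)
energy, detail = decompose(rgb.mean(axis=2))
```

In this reading, flagged pixels would be routed to the adaptive denoising branch for the detail component, while the energy component feeds the directional weighted fusion rule.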
SAMF: Small-Area-Aware Multi-focus Image Fusion for Object Detection
Existing multi-focus image fusion (MFIF) methods often fail to preserve the
uncertain transition region and detect small focus areas within large defocused
regions accurately. To address this issue, this study proposes a new
small-area-aware MFIF algorithm for enhancing object detection capability.
First, we enhance the pixel attributes within the small focus and boundary
regions, which are subsequently combined with visual saliency detection to
obtain the pre-fusion results used to discriminate the distribution of focused
pixels. To determine pixel focus accurately, we model the source image as a
combination of focused, defocused, and uncertain regions and propose a
three-region segmentation strategy. Finally, we design an effective pixel
selection rule to generate segmentation decision maps and obtain the final
fusion results. Experiments demonstrate that the proposed method can
accurately detect small and smooth focus areas while improving object detection
performance, outperforming existing methods in both subjective and objective
evaluations. The source code is available at https://github.com/ixilai/SAMF.
Comment: Accepted to the International Conference on Acoustics, Speech and Signal Processing (ICASSP) 202
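As a hedged sketch of the three-region idea (under our own simplifications, not SAMF's actual rules), per-pixel focus can be scored with local Laplacian energy, and pixels whose two scores are too close are assigned to the uncertain transition region rather than forced into a binary map; the margin value is an illustrative assumption.

```python
# Toy three-region segmentation: 0 = A focused, 1 = B focused, 2 = uncertain.
import numpy as np
from scipy.ndimage import laplace, uniform_filter

def focus_measure(gray, win=9):
    """Local energy of the Laplacian as a simple per-pixel focus score."""
    return uniform_filter(laplace(gray) ** 2, size=win)

def three_region_map(img_a, img_b, margin=0.1):
    """Label each pixel by comparing the two focus scores, with an uncertain band."""
    fa, fb = focus_measure(img_a), focus_measure(img_b)
    diff = fa - fb
    scale = np.abs(diff).max() + 1e-12
    decision = np.where(diff > 0, 0, 1)
    decision[np.abs(diff) / scale < margin] = 2  # too close to call: uncertain
    return decision

img_a = np.random.rand(64, 64)  # placeholder source images
img_b = np.random.rand(64, 64)
dmap = three_region_map(img_a, img_b)
```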
Physical Perception Network and an All-weather Multi-modality Benchmark for Adverse Weather Image Fusion
Multi-modality image fusion (MMIF) integrates the complementary information
from different modal images to provide a comprehensive and objective
interpretation of a scene. However, existing MMIF methods lack the ability to
resist different weather interferences in real-life scenarios, preventing them
from being useful in practical applications such as autonomous driving. To
bridge this research gap, we propose an all-weather MMIF model. Because deep
learning network designs are often viewed as a black box, which limits their
multitasking capability, we instead build on physically interpretable
components. For the deweathering module, we propose a physically-aware clear
feature prediction module based on an atmospheric scattering model that can
deduce variations in light transmittance from both scene illumination and
depth. For the fusion module, we utilize a
learnable low-rank representation model to decompose images into low-rank and
sparse components. This highly interpretable feature separation allows us to
better observe and understand images. Furthermore, we have established a
benchmark for MMIF research under extreme weather conditions. It encompasses
multiple scenes under three types of weather: rain, haze, and snow, with each
weather condition further subdivided into various impact levels. Extensive
fusion experiments under adverse weather demonstrate that the proposed
algorithm has excellent detail recovery and multi-modality feature extraction
capabilities.
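The deweathering module is said to build on an atmospheric scattering model; below is a minimal non-learnable sketch of that model, assuming the classic formulation I = J*t + A*(1 - t) with transmittance t = exp(-beta * d) derived from depth d. The airlight A and attenuation coefficient beta here are fixed illustrative constants, whereas the paper predicts the clear features with a learned module.

```python
# Classic atmospheric scattering model, inverted to predict the clear scene J.
import numpy as np

def transmittance_from_depth(depth, beta=0.8):
    """Beer-Lambert transmittance: deeper scene points are more attenuated."""
    return np.exp(-beta * depth)

def recover_clear_scene(observed, depth, airlight=0.9, beta=0.8, t_min=0.05):
    """Invert I = J*t + A*(1 - t); airlight and beta are assumed constants."""
    t = np.clip(transmittance_from_depth(depth, beta), t_min, 1.0)
    return (observed - airlight * (1.0 - t[..., None])) / t[..., None]

observed = np.random.rand(64, 64, 3)  # placeholder hazy frame
depth = np.random.rand(64, 64)        # placeholder normalized depth map
clear = recover_clear_scene(observed, depth)
```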
Dynamic Chunk Convolution for Unified Streaming and Non-streaming Conformer ASR
Recently, there has been an increasing interest in unifying streaming and
non-streaming speech recognition models to reduce development, training and
deployment cost. The best-known approaches rely on either window-based or
dynamic chunk-based attention strategy and causal convolutions to minimize the
degradation due to streaming. However, the performance gap between the
streaming mode and a full-contextual model trained independently still remains
relatively large. To address this, we propose a dynamic chunk-based convolution
replacing the causal convolution in a hybrid Connectionist Temporal
Classification (CTC)-Attention Conformer architecture. Additionally, we
demonstrate further improvements through initialization of weights from a
full-contextual model and parallelization of the convolution and self-attention
modules. We evaluate our models on the open-source VoxPopuli, LibriSpeech and
in-house conversational datasets. Overall, our proposed model reduces the
degradation of the streaming mode over the non-streaming full-contextual model
from 41.7% and 45.7% to 16.7% and 26.2% on the LibriSpeech test-clean and
test-other datasets respectively, while improving by a relative 15.5% WER over
the previous state-of-the-art unified model.
Comment: 5 pages, 3 figures, 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2023).
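A toy numpy illustration of the chunking idea follows, assuming a depthwise 1-D convolution whose future context is masked at each frame's chunk boundary while past context stays unrestricted. This sketches the general mechanism, not the paper's Conformer implementation; the chunk size of 4 and 5-tap kernel are arbitrary choices.

```python
# Dynamic chunk convolution sketch: frames see past context freely, but future
# context only up to the end of their own chunk, matching a chunked attention mask.
import numpy as np

def chunk_conv1d(x, kernel, chunk_size):
    """x: (T, D) features; kernel: (K, D) depthwise taps, K odd (centered)."""
    T, D = x.shape
    K = kernel.shape[0]
    half = K // 2
    y = np.zeros_like(x)
    for t in range(T):
        chunk_end = ((t // chunk_size) + 1) * chunk_size  # first frame we must NOT see
        for k in range(K):
            src = t - half + k
            if 0 <= src < T and src < chunk_end:          # mask future beyond chunk
                y[t] += kernel[k] * x[src]
    return y

x = np.random.randn(16, 8)             # 16 frames, 8 feature dims
kernel = np.random.randn(5, 8) / 5.0   # 5-tap depthwise kernel
y = chunk_conv1d(x, kernel, chunk_size=4)
```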
Bridging the Gap between Multi-focus and Multi-modal: A Focused Integration Framework for Multi-modal Image Fusion
Multi-modal image fusion (MMIF) integrates valuable information from
different modality images into a fused one. However, the fusion of multiple
visible images with different focal regions and infrared images is an
unprecedented challenge in real MMIF applications. This stems from the limited
depth of focus of visible optical lenses, which prevents the focal information
within the same scene from being captured simultaneously. To address this
issue, in this paper we propose an MMIF framework for joint focused
integration and modality information extraction. Specifically, a
semi-sparsity-based smoothing filter is introduced to decompose the images into
structure and texture components. Subsequently, a novel multi-scale operator is
proposed to fuse the texture components, capable of detecting significant
information by considering the pixel focus attributes and relevant data from
various modal images. Additionally, to achieve an effective capture of scene
luminance and reasonable contrast maintenance, we consider the distribution of
energy information in the structural components in terms of multi-directional
frequency variance and information entropy. Extensive experiments on existing
MMIF datasets, as well as the object detection and depth estimation tasks,
consistently demonstrate that the proposed algorithm can surpass the
state-of-the-art methods in visual perception and quantitative evaluation. The
code is available at https://github.com/ixilai/MFIF-MMIF.
Comment: Accepted to the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) 202
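Below is a minimal sketch of the structure-component fusion rule, under our own simplifications rather than the paper's exact operator: each structure image is scored by the variance of its finite differences along four directions plus its intensity entropy, and the fused structure is the score-weighted average of the two inputs.

```python
# Structure fusion weighted by multi-directional gradient variance plus entropy.
import numpy as np

def directional_variance(img):
    """Sum of variances of finite differences along horizontal, vertical, diagonals."""
    d = [np.diff(img, axis=0), np.diff(img, axis=1),
         img[1:, 1:] - img[:-1, :-1], img[1:, :-1] - img[:-1, 1:]]
    return sum(g.var() for g in d)

def entropy(img, bins=64):
    """Shannon entropy of the intensity histogram, in bits."""
    hist, _ = np.histogram(img, bins=bins)
    p = hist / max(hist.sum(), 1e-12)
    p = p[p > 0]
    return -(p * np.log2(p)).sum()

def fuse_structures(s_a, s_b):
    """Score-weighted average of the two structure components."""
    wa = directional_variance(s_a) + entropy(s_a)
    wb = directional_variance(s_b) + entropy(s_b)
    return (wa * s_a + wb * s_b) / (wa + wb + 1e-12)

fused = fuse_structures(np.random.rand(64, 64), np.random.rand(64, 64))
```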