Middle-level Fusion for Lightweight RGB-D Salient Object Detection
Most existing lightweight RGB-D salient object detection (SOD) models are based on either a two-stream or a single-stream structure. The former first uses two sub-networks to extract unimodal features from the RGB and depth images, respectively, and then fuses them for SOD, while the latter directly extracts multi-modal features from the input RGB-D images and then focuses on exploiting cross-level complementary information. However, two-stream models inevitably require more parameters, and single-stream models cannot fully exploit the cross-modal complementary information since they ignore the modality difference. To address these issues, we propose a middle-level fusion structure for designing a lightweight RGB-D SOD model: two sub-networks first extract low- and middle-level unimodal features, respectively, and the extracted middle-level unimodal features are then fused so that a subsequent sub-network can extract the corresponding high-level multi-modal features. Unlike existing models, this structure can effectively exploit the cross-modal complementary information while simultaneously reducing the network's parameters. Building on it, we design a novel lightweight SOD model that contains an information-aware multi-modal feature fusion (IMFF) module for effectively capturing the cross-modal complementary information, and a lightweight feature-level and decision-level feature fusion (LFDF) module for aggregating the feature-level and decision-level saliency information at different stages with fewer parameters. Our proposed model has only 3.9M parameters and runs at 33 FPS. Experimental results on several benchmark datasets verify the effectiveness and superiority of the proposed method over state-of-the-art methods.
Comment: 11 pages, 6 figures
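To make the middle-level fusion structure concrete, here is a minimal PyTorch-style sketch of the idea: two unimodal stems up to the middle level, a fusion step standing in for the paper's IMFF module, and a single shared high-level stream. All module names, layer choices, and channel sizes are our own illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class LowMidEncoder(nn.Module):
    """Extracts low- and middle-level unimodal features for one modality."""
    def __init__(self, in_ch):
        super().__init__()
        self.low = nn.Sequential(nn.Conv2d(in_ch, 16, 3, 2, 1), nn.ReLU())
        self.mid = nn.Sequential(nn.Conv2d(16, 32, 3, 2, 1), nn.ReLU())

    def forward(self, x):
        low = self.low(x)
        return low, self.mid(low)

class MiddleLevelFusionSOD(nn.Module):
    """Two unimodal stems up to middle level, then one shared high-level stream."""
    def __init__(self):
        super().__init__()
        self.rgb_stem = LowMidEncoder(3)    # RGB input
        self.depth_stem = LowMidEncoder(1)  # depth input
        # Fuse middle-level unimodal features (stand-in for the IMFF module).
        self.fuse = nn.Conv2d(64, 32, 1)
        # Shared sub-network extracting high-level multi-modal features.
        self.high = nn.Sequential(nn.Conv2d(32, 64, 3, 2, 1), nn.ReLU())
        self.head = nn.Conv2d(64, 1, 1)     # saliency prediction head

    def forward(self, rgb, depth):
        _, rgb_mid = self.rgb_stem(rgb)
        _, dep_mid = self.depth_stem(depth)
        fused = self.fuse(torch.cat([rgb_mid, dep_mid], dim=1))
        return self.head(self.high(fused))

model = MiddleLevelFusionSOD()
sal = model(torch.randn(1, 3, 224, 224), torch.randn(1, 1, 224, 224))
print(sal.shape)  # torch.Size([1, 1, 28, 28])
```

The parameter saving comes from sharing everything above the middle level: only the cheap low/mid stems are duplicated per modality, while the deeper (and heavier) layers process the already-fused representation once.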
CHITNet: A Complementary to Harmonious Information Transfer Network for Infrared and Visible Image Fusion
Current infrared and visible image fusion (IVIF) methods go to great lengths to excavate complementary features and design complex fusion strategies, which is extremely challenging. To this end, we rethink IVIF outside the box and propose a complementary-to-harmonious information transfer network (CHITNet). It transfers complementary information into harmonious information, integrating both the shared and the complementary features from the two modalities. Specifically, to sidestep directly aggregating complementary information in IVIF, we design a mutual information transfer (MIT) module that mutually represents features from the two modalities, roughly transferring complementary information into harmonious information. Then, a harmonious information acquisition supervised by source image (HIASSI) module is devised to further ensure the complementary-to-harmonious information transfer after MIT. Meanwhile, we also propose a structure information preservation (SIP) module to guarantee that the edge structure information of the source images is transferred to the fusion results. Moreover, a mutual promotion training paradigm (MPTP) with an interaction loss is adopted to facilitate better collaboration among MIT, HIASSI, and SIP. In this way, the proposed method generates fused images of higher quality. Extensive experimental results demonstrate the superiority of CHITNet over state-of-the-art algorithms in terms of visual quality and quantitative evaluations.
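The core of the approach is the mutual-representation step: each modality's features are re-expressed with help from the other, so complementary content drifts toward a shared (harmonious) representation. Below is a hedged PyTorch sketch of what such a step might look like; the convolutional projections and channel sizes are assumptions for illustration, not CHITNet's actual MIT layers.

```python
import torch
import torch.nn as nn

class MutualTransfer(nn.Module):
    """Re-represent each modality's features using information from both."""
    def __init__(self, ch):
        super().__init__()
        # One projection per direction: toward the infrared stream and
        # toward the visible stream, each seeing both modalities.
        self.to_ir = nn.Conv2d(2 * ch, ch, 3, padding=1)
        self.to_vis = nn.Conv2d(2 * ch, ch, 3, padding=1)

    def forward(self, f_ir, f_vis):
        both = torch.cat([f_ir, f_vis], dim=1)
        # Each output mixes the two modalities, nudging complementary
        # information into a shared, harmonious representation.
        return self.to_ir(both), self.to_vis(both)

mit = MutualTransfer(ch=32)
h_ir, h_vis = mit(torch.randn(1, 32, 64, 64), torch.randn(1, 32, 64, 64))
print(h_ir.shape, h_vis.shape)  # torch.Size([1, 32, 64, 64]) twice
```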
Deep learning-based image captioning for visually impaired people
Vision loss can affect people of all ages; severe or complete vision loss may occur when the eye, or the parts of the brain that process images, are damaged. In this paper, to assist blind users, deep learning algorithms are used to caption images so that a blind person can learn about the objects in a scene and their distance and position. Whenever an image is captured via the camera, the scene is recognized and a caption is predicted by the model; the prediction is then delivered to the user as audio output. In this way, a form of artificial vision for the blind can be achieved, helping them gain confidence while travelling alone.
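A minimal sketch of the capture-caption-speak pipeline described above. `caption_model` and `synthesize_speech` are hypothetical placeholders for any image-captioning network (e.g. a CNN encoder with an LSTM decoder) and any text-to-speech engine; the paper does not specify particular libraries.

```python
from PIL import Image

def describe_scene(image_path, caption_model, synthesize_speech):
    """Caption one captured frame and read the caption out loud."""
    image = Image.open(image_path).convert("RGB")
    caption = caption_model(image)  # e.g. "a person crossing the street"
    synthesize_speech(caption)      # deliver the description as audio
    return caption
```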