DRASIC: Distributed Recurrent Autoencoder for Scalable Image Compression
We propose a new architecture for distributed image compression from a group
of distributed data sources. The work is motivated by practical needs of
data-driven codec design, low power consumption, robustness, and data privacy.
The proposed architecture, which we refer to as Distributed Recurrent
Autoencoder for Scalable Image Compression (DRASIC), is able to train
distributed encoders and one joint decoder on correlated data sources. It
compresses substantially better than codecs trained separately. Meanwhile,
the performance of our distributed system with 10 distributed sources is
within 2 dB peak signal-to-noise ratio (PSNR) of the performance of a single
codec trained with all data sources. We experiment with distributed sources
of different correlations and show how our data-driven methodology closely
matches the Slepian-Wolf Theorem in Distributed Source Coding (DSC). To the
best of our knowledge, this is the first data-driven DSC framework for general
distributed code design with deep learning.
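To make the 2 dB figure above concrete, the following is a minimal sketch of the standard PSNR definition, assuming 8-bit images (peak value 255) stored as NumPy arrays. The function name `psnr` and its parameters are illustrative and are not taken from the DRASIC paper.

```python
import numpy as np


def psnr(reference, reconstruction, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two same-shaped images.

    Assumption: pixel values lie in [0, max_value] (255 for 8-bit images).
    """
    reference = np.asarray(reference, dtype=np.float64)
    reconstruction = np.asarray(reconstruction, dtype=np.float64)
    # Mean squared error over all pixels (and channels, if any).
    mse = np.mean((reference - reconstruction) ** 2)
    if mse == 0:
        # Identical images: PSNR is unbounded.
        return float("inf")
    return 10.0 * np.log10(max_value ** 2 / mse)
```

Because PSNR is logarithmic in the mean squared error, a 2 dB gap corresponds to an MSE ratio of 10^(0.2), roughly 1.58, regardless of the absolute quality level.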
Does Thermal Really Always Matter for RGB-T Salient Object Detection?
In recent years, RGB-T salient object detection (SOD) has attracted
continuous attention, because introducing thermal images makes it possible to
identify salient objects in environments such as low light. However, most of
the existing RGB-T SOD models focus on how to perform cross-modality feature
fusion, ignoring whether the thermal image really always matters in the SOD
task. Starting from the definition and nature of this task, this paper rethinks
the connotation of the thermal modality and proposes a network named TNet to
solve the RGB-T SOD task. In this paper, we introduce a global illumination estimation
module to predict the global illuminance score of the image, so as to regulate
the role played by the two modalities. In addition, considering the role of
thermal modality, we set up different cross-modality interaction mechanisms in
the encoding phase and the decoding phase. On the one hand, we introduce a
semantic constraint provider to enrich the semantics of thermal images in the
encoding phase, which makes the thermal modality more suitable for the SOD
task. On the other hand, we introduce a two-stage localization and
complementation module in the decoding phase to transfer object localization
cues and internal integrity cues in thermal features to the RGB modality.
Extensive experiments on three datasets show that the proposed TNet achieves
competitive performance compared with 20 state-of-the-art methods.
Comment: Accepted by IEEE Trans. Multimedia 2022, 13 pages, 9 figures
An underwater image enhancement by reducing speckle noise using modified anisotropic diffusion filter
Underwater images usually suffer from quality degradation, such as low contrast due to blurred details, color deviations, non-uniform lighting, and noise. Over the last few decades, much research has addressed the restoration and enhancement of degraded underwater images. In this paper, we propose a novel algorithm that uses a modified anisotropic diffusion filter with a dynamic color balancing strategy. The proposed algorithm combines effective noise reduction and an edge-preserving technique with dynamic color correction to make the lighting uniform and minimize speckle noise. Furthermore, we reanalyze the contributions and limitations of existing underwater image restoration and enhancement methods. Finally, we provide detailed objective evaluations across various underwater scenarios covering the challenges above, together with subjective studies, which show that the proposed method significantly improves image quality.
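As background for the edge-preserving noise reduction described above, here is a minimal sketch of the classic Perona-Malik anisotropic diffusion scheme on a grayscale image. It is not the modified filter or the color balancing strategy from the paper, whose details the abstract does not give. The function name `perona_malik` and the parameters `n_iter`, `kappa`, and `step` are illustrative assumptions.

```python
import numpy as np


def perona_malik(image, n_iter=10, kappa=30.0, step=0.2):
    """Classic Perona-Malik diffusion on a 2-D grayscale array.

    Assumptions (not from the paper): exponential conduction function,
    4-neighbour stencil, replicated borders, and step <= 0.25 for stability.
    """
    u = np.asarray(image, dtype=np.float64).copy()
    for _ in range(n_iter):
        # Replicate the border so differences across the image edge are zero.
        p = np.pad(u, 1, mode="edge")
        d_north = p[:-2, 1:-1] - u
        d_south = p[2:, 1:-1] - u
        d_west = p[1:-1, :-2] - u
        d_east = p[1:-1, 2:] - u
        # Conduction is close to 1 in flat regions (strong smoothing) and
        # close to 0 across large differences (edges are preserved).
        flux = sum(
            np.exp(-(d / kappa) ** 2) * d
            for d in (d_north, d_south, d_west, d_east)
        )
        u += step * flux
    return u
```

A larger `kappa` treats more intensity differences as noise and smooths them, while a smaller `kappa` preserves weaker edges at the cost of leaving more noise in place.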
Transformers in Small Object Detection: A Benchmark and Survey of State-of-the-Art
Transformers have rapidly gained popularity in computer vision, especially in
the field of object recognition and detection. Upon examining the outcomes of
state-of-the-art object detection methods, we noticed that transformers
consistently outperformed well-established CNN-based detectors in almost every
video or image dataset. While transformer-based approaches remain at the
forefront of small object detection (SOD) techniques, this paper aims to
explore the performance benefits offered by such extensive networks and
identify potential reasons for their SOD superiority. Small objects have been
identified as one of the most challenging object types in detection frameworks
due to their low visibility. We aim to investigate potential strategies that
could enhance transformers' performance in SOD. This survey presents a taxonomy
of over 60 research studies on developed transformers for the task of SOD,
spanning the years 2020 to 2023. These studies encompass a variety of detection
applications, including small object detection in generic images, aerial
images, medical images, active millimeter images, underwater images, and
videos. We also compile and present a list of 12 large-scale datasets suitable
for SOD that were overlooked in previous studies and compare the performance of
the reviewed studies using popular metrics such as mean Average Precision
(mAP), Frames Per Second (FPS), number of parameters, and more. Researchers can
keep track of newer studies on our web page, which is available at
https://github.com/arekavandi/Transformer-SOD.
Point-aware Interaction and CNN-induced Refinement Network for RGB-D Salient Object Detection
By integrating complementary information from the RGB image and the depth map,
the ability of salient object detection (SOD) for complex and challenging
scenes can be improved. In recent years, the important role of Convolutional
Neural Networks (CNNs) in feature extraction and cross-modality interaction has
been fully explored, but CNNs remain insufficient for modeling global
long-range dependencies within and across modalities. To this end, we introduce
a CNN-assisted Transformer architecture and propose a novel RGB-D SOD network
with Point-aware Interaction and CNN-induced Refinement (PICR-Net). On the one
hand, considering the prior correlation between RGB modality and depth
modality, an attention-triggered cross-modality point-aware interaction (CmPI)
module is designed to explore the feature interaction of different modalities
with positional constraints. On the other hand, in order to alleviate the block
effect and detail destruction problems naturally introduced by the Transformer,
we design a CNN-induced refinement (CNNR) unit for content refinement and
supplementation. Extensive experiments on five RGB-D SOD datasets show that the
proposed network achieves competitive results in both quantitative and
qualitative comparisons.
Comment: Accepted by ACM MM 202