Semantic-aware Texture-Structure Feature Collaboration for Underwater Image Enhancement
Underwater image enhancement has become an attractive topic as a significant
technology in marine engineering and aquatic robotics. However, the limited
number of datasets and imperfect hand-crafted ground truth weaken its
robustness to unseen scenarios, and hamper the application to high-level vision
tasks. To address the above limitations, we develop an efficient and compact
enhancement network in collaboration with a high-level semantic-aware
pretrained model, aiming to exploit its hierarchical feature representation as
an auxiliary for the low-level underwater image enhancement. Specifically, we
tend to characterize the shallow layer features as textures while the deep
layer features as structures in the semantic-aware model, and propose a
multi-path Contextual Feature Refinement Module (CFRM) to refine features in
multiple scales and model the correlation between different features. In
addition, a feature dominative network is devised to perform channel-wise
modulation on the aggregated texture and structure features for the adaptation
to different feature patterns of the enhancement network. Extensive experiments
on benchmarks demonstrate that the proposed algorithm achieves more appealing
results and outperforms state-of-the-art methods by large margins. We also
apply the proposed algorithm to the underwater salient object detection task to
reveal the favorable semantic-aware ability for high-level vision tasks. The
code is available at STSC.
Comment: Accepted by ICRA202
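The channel-wise modulation of aggregated texture and structure features described above can be pictured with a small sketch. Everything here (the additive aggregation, the `channel_modulate` helper, and the specific `gamma`/`beta` values) is an illustrative assumption, not the paper's actual feature dominative network.

```python
import numpy as np

def channel_modulate(features, gamma, beta):
    """Channel-wise modulation: scale and shift each feature channel.

    features: array of shape (C, H, W)
    gamma, beta: learned per-channel vectors of shape (C,)
    """
    return features * gamma[:, None, None] + beta[:, None, None]

# Stand-ins for shallow-layer (texture) and deep-layer (structure) features.
texture = np.ones((2, 4, 4))
structure = np.full((2, 4, 4), 2.0)
aggregated = texture + structure      # simple additive aggregation

gamma = np.array([0.5, 2.0])          # illustrative per-channel scales
beta = np.array([0.0, 1.0])           # illustrative per-channel shifts
out = channel_modulate(aggregated, gamma, beta)
```

The per-channel parameters let the enhancement network reweight texture-heavy and structure-heavy channels independently.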
Top-Quark Decay at Next-to-Next-to-Next-to-Leading Order in QCD
We present the first complete high-precision QCD corrections to the inclusive
decay width, the $W$-helicity fractions, and semi-inclusive distributions for
the top-quark decay process $t \to b + W$ at NNNLO in the strong coupling
constant $\alpha_s$. In particular, the pure NNNLO QCD correction decreases
the decay width relative to the previous NNLO result at the top-quark pole
mass scale, exceeding the error estimated by the usual scale-variation
prescription. After taking all sources of error into account, the resulting
uncertainty on the decay width meets the requirements of future colliders. On
the other hand, the NNNLO QCD effects on the helicity fractions are found to
be much smaller, at the level of one per-mille for the dominating fraction,
predestining them to act as precision observables for the top-quark decay
process.
Comment: 7 pages, 3 figures
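The NNNLO counting above refers to the perturbative expansion of the decay width in the strong coupling. Schematically (the coefficients $c_i$ here are placeholders for the computed corrections, not the paper's values):

```latex
\Gamma_t = \Gamma_t^{(0)}\left[1 + c_1\,\frac{\alpha_s}{\pi}
  + c_2\left(\frac{\alpha_s}{\pi}\right)^{2}
  + c_3\left(\frac{\alpha_s}{\pi}\right)^{3}
  + \mathcal{O}\!\left(\alpha_s^{4}\right)\right]
```

The NNNLO term is the $c_3\,(\alpha_s/\pi)^3$ contribution, and the scale-variation error is estimated by varying the renormalization scale inside $\alpha_s$.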
AdvMono3D: Advanced Monocular 3D Object Detection with Depth-Aware Robust Adversarial Training
Monocular 3D object detection plays a pivotal role in the field of autonomous
driving and numerous deep learning-based methods have made significant
breakthroughs in this area. Despite the advancements in detection accuracy and
efficiency, these models tend to fail when faced with adversarial attacks,
rendering them ineffective. Therefore, bolstering the adversarial robustness of 3D
detection models has become a crucial issue that demands immediate attention
and innovative solutions. To mitigate this issue, we propose a depth-aware
robust adversarial training method for monocular 3D object detection, dubbed
DART3D. Specifically, we first design an adversarial attack that iteratively
degrades the 2D and 3D perception capabilities of 3D object detection
models (IDP), which serves as the foundation for our subsequent defense mechanism. In
response to this attack, we propose an uncertainty-based residual learning
method for adversarial training. Our adversarial training approach capitalizes
on the inherent uncertainty, enabling the model to significantly improve its
robustness against adversarial attacks. We conducted extensive experiments on
the KITTI 3D datasets, demonstrating that DART3D surpasses direct adversarial
training (the most popular approach) under attack in 3D object detection
of the car category for the Easy, Moderate, and Hard settings, with
improvements of 4.415%, 4.112%, and 3.195%, respectively.
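The abstract describes the attack only at a high level. As a generic point of reference, an iterative gradient-sign (PGD-style) attack on a toy loss can be sketched as below; the `pgd_attack` helper, step sizes, and quadratic toy loss are assumptions for illustration, not the paper's IDP attack.

```python
import numpy as np

def pgd_attack(x, grad_fn, eps=0.1, alpha=0.02, steps=5):
    """Iterative gradient-sign attack (PGD-style) on input x.

    grad_fn returns the gradient of the detector's loss w.r.t. x.
    The perturbation is clipped to an L-infinity ball of radius eps.
    """
    x_adv = x.copy()
    for _ in range(steps):
        x_adv = x_adv + alpha * np.sign(grad_fn(x_adv))
        x_adv = np.clip(x_adv, x - eps, x + eps)
    return x_adv

# Toy example: loss = 0.5 * ||x||^2, so the gradient is x itself.
x = np.array([0.5, -0.3])
x_adv = pgd_attack(x, grad_fn=lambda v: v)
```

Adversarial training then minimizes the detection loss on such perturbed inputs instead of (or in addition to) the clean ones.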
Improving Misaligned Multi-modality Image Fusion with One-stage Progressive Dense Registration
Misalignments between multi-modality images pose challenges in image fusion,
manifesting as structural distortions and edge ghosts. Existing efforts
commonly resort to registering first and fusing later, typically employing two
cascaded stages for registration, i.e., coarse registration and fine
registration. Both stages directly estimate the respective target deformation
fields. In this paper, we argue that the separated two-stage registration is
not compact, and the direct estimation of the target deformation fields is not
accurate enough. To address these challenges, we propose a Cross-modality
Multi-scale Progressive Dense Registration (C-MPDR) scheme, which accomplishes
the coarse-to-fine registration exclusively using a one-stage optimization,
thus improving the fusion performance of misaligned multi-modality images.
Specifically, two pivotal components are involved, a dense Deformation Field
Fusion (DFF) module and a Progressive Feature Fine (PFF) module. The DFF
aggregates the predicted multi-scale deformation sub-fields at the current
scale, while the PFF progressively refines the remaining misaligned features.
Both work together to accurately estimate the final deformation fields. In
addition, we develop a Transformer-Conv-based Fusion (TCF) subnetwork that
considers local and long-range feature dependencies, allowing us to capture
more informative features from the registered infrared and visible images for
the generation of high-quality fused images. Extensive experimental analysis
demonstrates the superiority of the proposed method in the fusion of misaligned
cross-modality images.
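The DFF module's aggregation of multi-scale deformation sub-fields can be pictured as upsampling a coarse field and composing it with a finer residual. This minimal sketch (nearest-neighbour upsampling, additive composition, constant illustrative fields) is an assumption about the general coarse-to-fine scheme, not the module itself.

```python
import numpy as np

def upsample2x(field):
    """Nearest-neighbour 2x upsampling of a (2, H, W) deformation field.
    Displacements are doubled because the grid spacing halves."""
    return 2.0 * field.repeat(2, axis=1).repeat(2, axis=2)

def fuse_fields(coarse, fine):
    """Compose a half-resolution sub-field with a full-resolution residual."""
    return upsample2x(coarse) + fine

coarse = np.ones((2, 2, 2)) * 0.5   # half-resolution deformation sub-field
fine = np.ones((2, 4, 4)) * 0.25    # full-resolution residual sub-field
final = fuse_fields(coarse, fine)
```

Estimating small residuals at each scale, rather than the full target field at once, is what makes the progressive scheme easier to optimize.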
Hybrid-Supervised Dual-Search: Leveraging Automatic Learning for Loss-free Multi-Exposure Image Fusion
Multi-exposure image fusion (MEF) has emerged as a prominent solution to
address the limitations of digital imaging in representing varied exposure
levels. Despite its advancements, the field grapples with challenges, notably
the reliance on manual designs for network structures and loss functions, and
the constraints of utilizing simulated reference images as ground truths.
Consequently, current methodologies often suffer from color distortions and
exposure artifacts, further complicating the quest for authentic image
representation. In addressing these challenges, this paper presents a
Hybrid-Supervised Dual-Search approach for MEF, dubbed HSDS-MEF, which
introduces a bi-level optimization search scheme for automatic design of both
network structures and loss functions. More specifically, we harness a unique
dual-search mechanism rooted in a novel weighted structure refinement
architecture search. In addition, a hybrid-supervised contrast constraint
seamlessly guides and integrates with the search process, facilitating a more
adaptive and comprehensive search for optimal loss functions. We realize the
state-of-the-art performance in comparison to various competitive schemes,
yielding 10.61% and 4.38% improvements in Visual Information Fidelity (VIF)
for general and no-reference scenarios, respectively, while providing results
with high contrast, rich details, and colors.
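The weighted architecture search echoes the common differentiable-NAS relaxation, in which each layer's output is a softmax-weighted sum over candidate operations. This sketch is a generic illustration of that relaxation with made-up candidate ops; it is not HSDS-MEF's actual search space.

```python
import numpy as np

def softmax(a):
    e = np.exp(a - a.max())
    return e / e.sum()

def mixed_op(x, alphas, ops):
    """Differentiable relaxation of a discrete architecture choice:
    the layer output is a softmax-weighted sum over candidate ops."""
    w = softmax(alphas)
    return sum(wi * op(x) for wi, op in zip(w, ops))

ops = [lambda x: x, lambda x: 2 * x, lambda x: 0 * x]  # identity, scale, zero
alphas = np.array([0.0, 0.0, 0.0])  # uniform weights before training
y = mixed_op(np.array([3.0]), alphas, ops)
```

Because the weights are differentiable, the architecture parameters `alphas` can be optimized jointly with the network weights in a bi-level scheme, then discretized to the highest-weight op.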
Bilevel Fast Scene Adaptation for Low-Light Image Enhancement
Enhancing images in low-light scenes is a challenging yet widely studied
task in computer vision. Mainstream learning-based methods mainly
acquire the enhancement model by learning the data distribution of specific
scenes, causing poor adaptability (or even failure) when encountering real-world
scenarios never seen before. The main obstacle lies in
the modeling conundrum from distribution discrepancy across different scenes.
To remedy this, we first explore relationships between diverse low-light scenes
based on statistical analysis, i.e., the network parameters of the encoder
trained in different data distributions are close. We introduce the bilevel
paradigm to model the above latent correspondence from the perspective of
hyperparameter optimization. A bilevel learning framework is constructed to
endow the scene-irrelevant generality of the encoder towards diverse scenes
(i.e., freezing the encoder in the adaptation and testing phases). Further, we
define a reinforced bilevel learning framework to provide a meta-initialization
for scene-specific decoder to further ameliorate visual quality. Moreover, to
improve the practicability, we establish a Retinex-induced architecture with
adaptive denoising and apply our built learning framework to acquire its
parameters by using two training losses including supervised and unsupervised
forms. Extensive experimental evaluations on multiple datasets verify our
adaptability and competitive performance against existing state-of-the-art
works. The code and datasets will be available at
https://github.com/vis-opt-group/BL
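The frozen-encoder adaptation described above can be pictured with a toy scalar model: the encoder weight is held fixed while only the decoder weight is adapted by gradient descent on a scene's data. Everything here (the scalar weights, the squared-error objective, the synthetic scene) is an illustrative assumption, not the paper's Retinex-induced architecture.

```python
import numpy as np

def adapt_decoder(encoder_w, decoder_w, data, lr=0.05, steps=20):
    """Scene adaptation with a frozen encoder: only decoder_w is updated
    by gradient descent on a squared-error objective."""
    for x, y in data * steps:
        z = encoder_w * x                 # frozen encoder (never updated)
        pred = decoder_w * z
        grad = 2 * (pred - y) * z         # d/d(decoder_w) of (pred - y)^2
        decoder_w -= lr * grad
    return decoder_w

# Toy scene: targets follow y = 4 * x, with the encoder fixed at 2,
# so the adapted decoder weight should converge to 2.
data = [(1.0, 4.0), (2.0, 8.0)]
w = adapt_decoder(encoder_w=2.0, decoder_w=0.0, data=data)
```

Freezing the shared encoder preserves the scene-irrelevant representation learned in the bilevel phase, while the lightweight decoder adaptation accounts for scene-specific appearance.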