Holistic Dynamic Frequency Transformer for Image Fusion and Exposure Correction
Correcting exposure-related issues is pivotal to enhancing image quality and
has substantial implications for many computer vision tasks. Historically,
most methods have relied predominantly on spatial-domain recovery, paying
limited attention to the potential of the frequency domain. Additionally,
there has been no unified
perspective towards low-light enhancement, exposure correction, and
multi-exposure fusion, complicating and impeding the optimization of image
processing. In response to these challenges, this paper proposes a novel
methodology that leverages the frequency domain to improve and unify the
handling of exposure correction tasks. Our method introduces Holistic Frequency
Attention and Dynamic Frequency Feed-Forward Network, which replace
conventional correlation computation in the spatial domain. Together they form a
foundational building block that facilitates a U-shaped Holistic Dynamic
Frequency Transformer as a filter to extract global information and dynamically
select important frequency bands for image restoration. Complementing this, we
employ a Laplacian pyramid to decompose images into distinct frequency bands,
followed by multiple restorers, each tuned to recover specific frequency-band
information. The pyramid fusion allows a more detailed and nuanced image
restoration process. Ultimately, our structure unifies the three tasks of
low-light enhancement, exposure correction, and multi-exposure fusion, enabling
comprehensive treatment of all classical exposure errors. Benchmarking on
mainstream datasets for these tasks, our proposed method achieves
state-of-the-art results, paving the way for more sophisticated and unified
solutions in exposure correction.
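For concreteness, below is a minimal NumPy/OpenCV sketch of the Laplacian-pyramid decompose-then-fuse scaffold the abstract describes. The per-band restorers (the U-shaped Holistic Dynamic Frequency Transformer blocks themselves) are the paper's contribution and are not reproduced here; the function names and the number of levels are ours.

```python
import cv2
import numpy as np

def laplacian_pyramid(img: np.ndarray, levels: int = 3):
    """Split an image into band-pass (Laplacian) levels plus a low-frequency residual."""
    gaussians = [img.astype(np.float32)]
    for _ in range(levels):
        gaussians.append(cv2.pyrDown(gaussians[-1]))
    bands = []
    for i in range(levels):
        # detail lost between scale i and the coarser scale i+1
        up = cv2.pyrUp(gaussians[i + 1], dstsize=gaussians[i].shape[1::-1])
        bands.append(gaussians[i] - up)
    return bands, gaussians[-1]

def fuse_pyramid(bands, residual):
    """Invert the decomposition: upsample the residual and add the details back.
    In the paper, each band would first pass through its own frequency-band restorer."""
    img = residual
    for band in reversed(bands):
        img = cv2.pyrUp(img, dstsize=band.shape[1::-1]) + band
    return img
```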
Empowering Low-Light Image Enhancer through Customized Learnable Priors
Deep neural networks have achieved remarkable progress in enhancing low-light
images by improving their brightness and eliminating noise. However, most
existing methods construct end-to-end mapping networks heuristically,
neglecting the intrinsic prior of image enhancement task and lacking
transparency and interpretability. Although some unfolding solutions have been
proposed to relieve these issues, they rely on proximal operator networks that
deliver ambiguous and implicit priors. In this work, we propose a paradigm for
low-light image enhancement that explores the potential of customized learnable
priors to improve the transparency of the deep unfolding paradigm. Motivated by
the powerful feature representation capability of Masked Autoencoder (MAE), we
customize MAE-based illumination and noise priors and redevelop them from two
perspectives: 1) \textbf{structure flow}: we train the MAE from a normal-light
image to its illumination properties and then embed it into the proximal
operator design of the unfolding architecture; and 2) \textbf{optimization
flow}: we train MAE from a normal-light image to its gradient representation
and then employ it as a regularization term to constrain noise in the model
output. These designs improve the interpretability and representation
capability of the model. Extensive experiments on multiple low-light image
enhancement datasets demonstrate the superiority of our proposed paradigm over
state-of-the-art methods. Code is available at
https://github.com/zheng980629/CUE. Comment: Accepted by ICCV 2023.
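A hedged PyTorch sketch of the "optimization flow" idea: a frozen, MAE-style prior that maps an image to a clean gradient representation is used as a regularizer that penalizes noisy gradients in the enhanced output. The `grad_mae` interface, the finite-difference gradient operator, and all names are our assumptions, not the paper's code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def finite_diff_grad(x: torch.Tensor) -> torch.Tensor:
    """Horizontal/vertical finite differences of an NCHW batch,
    zero-padded to keep spatial size and stacked on the channel axis."""
    dx = F.pad(x[..., :, 1:] - x[..., :, :-1], (0, 1, 0, 0))
    dy = F.pad(x[..., 1:, :] - x[..., :-1, :], (0, 0, 0, 1))
    return torch.cat([dx, dy], dim=1)

def gradient_prior_loss(enhanced: torch.Tensor, grad_mae: nn.Module) -> torch.Tensor:
    """Penalize disagreement between the enhanced image's gradients and the
    gradients a frozen MAE-style prior predicts for a clean image. `grad_mae`
    stands in for the pretrained normal-light-to-gradient MAE and is assumed
    to return a tensor shaped like finite_diff_grad's output (2C channels)."""
    with torch.no_grad():
        clean_grad = grad_mae(enhanced)  # the prior's noise-free gradient estimate
    return F.l1_loss(finite_diff_grad(enhanced), clean_grad)

# Hypothetical use inside a training loop:
#   loss = recon_loss + lambda_reg * gradient_prior_loss(output, frozen_mae)
```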
Iterative Prompt Learning for Unsupervised Backlit Image Enhancement
We propose a novel unsupervised backlit image enhancement method, abbreviated
as CLIP-LIT, by exploring the potential of Contrastive Language-Image
Pre-Training (CLIP) for pixel-level image enhancement. We show that the
open-world CLIP prior not only aids in distinguishing between backlit and
well-lit images, but also in perceiving heterogeneous regions with different
luminance, facilitating the optimization of the enhancement network. Unlike
high-level and image manipulation tasks, directly applying CLIP to enhancement
tasks is non-trivial, owing to the difficulty in finding accurate prompts. To
solve this issue, we devise a prompt learning framework that first learns an
initial prompt pair by constraining the text-image similarity between the
prompt (negative/positive sample) and the corresponding image (backlit
image/well-lit image) in the CLIP latent space. Then, we train the enhancement
network based on the text-image similarity between the enhanced result and the
initial prompt pair. To further improve the accuracy of the initial prompt
pair, we iteratively fine-tune the prompt learning framework to reduce the
distribution gaps between the backlit images, enhanced results, and well-lit
images via rank learning, boosting the enhancement performance. Our method
alternates between updating the prompt learning framework and enhancement
network until visually pleasing results are achieved. Extensive experiments
demonstrate that our method outperforms state-of-the-art methods in terms of
visual quality and generalization ability, without requiring any paired data. Comment: Accepted to ICCV 2023 as Oral. Project page:
https://zhexinliang.github.io/CLIP_LIT_page
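A heavily simplified sketch of the prompt-initialization stage. The paper optimizes learnable token embeddings passed through CLIP's frozen text encoder and then refines them iteratively via rank learning; for brevity, this sketch instead optimizes a negative/positive prompt pair directly in CLIP's joint embedding space, and the loss and variable names are ours.

```python
import torch
import torch.nn.functional as F
import clip  # OpenAI CLIP: pip install git+https://github.com/openai/CLIP.git

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)
model.eval()  # CLIP stays frozen; only the prompt pair is trained

# Learnable stand-ins for the negative (backlit) / positive (well-lit) prompts,
# optimized directly in CLIP's 512-d joint embedding space (a simplification).
prompts = torch.randn(2, 512, device=device, requires_grad=True)
opt = torch.optim.Adam([prompts], lr=1e-3)

def prompt_init_loss(backlit_imgs, well_lit_imgs):
    """Cross-entropy that pulls backlit images toward prompt 0 and well-lit
    images toward prompt 1, mirroring the initialization constraint."""
    with torch.no_grad():
        feats = model.encode_image(torch.cat([backlit_imgs, well_lit_imgs]))
        feats = F.normalize(feats.float(), dim=-1)
    logits = feats @ F.normalize(prompts, dim=-1).t() / 0.07  # CLIP-style temperature
    labels = torch.cat([torch.zeros(len(backlit_imgs)),
                        torch.ones(len(well_lit_imgs))]).long().to(device)
    return F.cross_entropy(logits, labels)
```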
Endoscopic Vision Augmentation Using Multiscale Bilateral-Weighted Retinex for Robotic Surgery
Surgical vision in medical robotics is key to the success of minimally invasive surgery. Owing to the intrinsic limitations of the medical electronic endoscope on the surgical instrument, the surgical field suffers from poor clarity, uneven illumination, heavy smoke, and other problems, preventing surgeons from quickly and accurately perceiving and identifying structural information such as neurovascular structures and lesion locations within internal organs, which undoubtedly increases surgical risk and operating time. To address these problems, this paper proposes a multiscale Retinex method based on bilateral-filter weight analysis to process and analyze patient videos acquired during da Vinci robotic surgery. In subjective evaluation of the experimental results, surgeons unanimously agreed that the method substantially enhances the quality of the surgical field; objective evaluation further shows that the proposed method outperforms current image enhancement and restoration methods in computer vision.
Professor Xiongbiao Luo of the Department of Computer Science, School of Information Science and Technology, Xiamen University, is the first author of this paper. [Abstract] Endoscopic vision plays a significant role in minimally invasive surgical procedures. The visibility and maintenance of such direct in-situ vision is paramount not only for safety, by preventing inadvertent injury, but also to improve precision and reduce operating time. Unfortunately, endoscopic vision is unavoidably degraded due to illumination variations during surgery. This work aims to restore or augment such degraded visualization and quantitatively evaluate it during robotic surgery. A multiscale bilateral-weighted retinex method is proposed to remove non-uniform and highly directional illumination and enhance surgical vision, while an objective no-reference image visibility assessment method is defined in terms of sharpness, naturalness, and contrast, to quantitatively and objectively evaluate endoscopic visualization on surgical video sequences. The methods were validated on surgical data, with the experimental results showing that our method outperforms existing retinex approaches. In particular, the combined visibility was improved from 0.81 to 1.06, while three surgeons generally
agreed that the results were restored with much better visibility. The authors thank Dr. Stephen Pautler for facilitating the data acquisition, and Dr. A. Jonathan McLeod and Dr. Uditha Jayarathne for helpful discussions.
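A minimal sketch of multiscale retinex with bilateral (edge-preserving) illumination estimates in place of the usual Gaussian blur. The (diameter, sigma_color, sigma_space) triples are illustrative rather than the paper's, and the paper's bilateral weighting of scales is replaced by a plain average here.

```python
import cv2
import numpy as np

def multiscale_bilateral_retinex(img: np.ndarray,
                                 scales=((9, 50, 25), (17, 100, 50), (33, 200, 100))):
    """Multiscale retinex on a float32 1- or 3-channel image, where each
    illumination estimate comes from an edge-preserving bilateral filter."""
    x = img.astype(np.float32) + 1.0  # avoid log(0)
    out = np.zeros_like(x)
    for d, sigma_color, sigma_space in scales:
        illum = cv2.bilateralFilter(x, d, sigma_color, sigma_space)
        out += np.log(x) - np.log(np.maximum(illum, 1.0))  # log-reflectance
    out /= len(scales)
    # stretch back to a displayable 8-bit range
    out = (out - out.min()) / (out.max() - out.min() + 1e-6)
    return (out * 255).astype(np.uint8)
```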