CLIP Guided Image-perceptive Prompt Learning for Image Enhancement
Image enhancement is a significant research area in the fields of computer
vision and image processing. In recent years, many learning-based methods for
image enhancement have been developed, where the Look-up-table (LUT) has proven
to be an effective tool. In this paper, we delve into the potential of
Contrastive Language-Image Pre-Training (CLIP) Guided Prompt Learning,
proposing a simple structure called CLIP-LUT for image enhancement. We found
that the prior knowledge of CLIP can effectively discern the quality of
degraded images, which can provide reliable guidance. Specifically, we
first learn image-perceptive prompts that distinguish original from
target images using the CLIP model; meanwhile, we introduce a very simple
enhancement network that incorporates a simple baseline to predict the
weights of three different LUTs. The learned prompts then steer the
enhancement network through a loss function and improve the performance of
the model. We demonstrate that simply combining a straightforward method with
CLIP yields satisfactory results.
Comment: A trial work on image enhancement
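To make the LUT-blending idea concrete, here is a minimal NumPy sketch, not the authors' implementation: the three LUT curves, the toy image, and the baseline network's logits are all invented stand-ins. In CLIP-LUT the learned image-perceptive prompts would additionally supervise training through a CLIP-similarity loss, which is omitted here.

```python
import numpy as np

def softmax(z):
    # normalize the baseline network's raw scores into blending weights
    e = np.exp(z - z.max())
    return e / e.sum()

def apply_luts(image, luts, weights):
    # look every 8-bit pixel up in each 1-D LUT, then mix the three
    # enhanced versions with the predicted weights
    return sum(w * lut[image] for w, lut in zip(weights, luts))

x = np.arange(256) / 255.0
luts = [
    x,                                         # identity curve
    x ** 0.5,                                  # brightening gamma curve
    np.clip(1.5 * (x - 0.5) + 0.5, 0.0, 1.0),  # contrast stretch
]

rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(4, 4))      # toy 8-bit image
logits = rng.normal(size=3)                    # stand-in for the baseline net's output
weights = softmax(logits)
out = apply_luts(image, luts, weights)         # enhanced image in [0, 1]
```

Because the weights are a softmax and each curve maps into [0, 1], the blended output stays in [0, 1] regardless of what the baseline network predicts.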
A Large-scale Film Style Dataset for Learning Multi-frequency Driven Film Enhancement
Film, a classic image style, is culturally significant to the whole
photographic industry since it marks the birth of photography. However, film
photography is time-consuming and expensive, necessitating a more efficient
method for collecting film-style photographs. Numerous datasets that have
emerged in the field of image enhancement so far are not film-specific. In
order to facilitate film-based image stylization research, we construct
FilmSet, a large-scale and high-quality film style dataset. Our dataset
includes three different film types and more than 5000 in-the-wild high
resolution images. Inspired by the features of FilmSet images, we propose a
novel framework called FilmNet based on Laplacian Pyramid for stylizing images
across frequency bands and achieving film style outcomes. Experiments reveal
that our model outperforms state-of-the-art techniques.
Our dataset and code will be made publicly available.
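Since FilmNet stylizes images per frequency band, the Laplacian Pyramid decomposition it builds on can be sketched in a few lines of NumPy. This is the generic textbook construction under simple choices (binomial blur, nearest-neighbour upsampling), not the paper's exact operators; reconstruction is exact by design.

```python
import numpy as np

def _blur(x):
    # separable 5-tap binomial filter with reflect padding
    k = np.array([1.0, 4.0, 6.0, 4.0, 1.0]) / 16.0
    f = lambda r: np.convolve(np.pad(r, 2, mode="reflect"), k, mode="valid")
    return np.apply_along_axis(f, 1, np.apply_along_axis(f, 0, x))

def _up(x, shape):
    # nearest-neighbour 2x upsample, cropped to the target shape
    return np.repeat(np.repeat(x, 2, axis=0), 2, axis=1)[: shape[0], : shape[1]]

def build_pyramid(img, levels=3):
    # detail bands (highest frequencies first) plus a coarse residual
    gauss, bands = img, []
    for _ in range(levels):
        small = _blur(gauss)[::2, ::2]
        bands.append(gauss - _up(small, gauss.shape))
        gauss = small
    bands.append(gauss)
    return bands

def reconstruct(bands):
    # invert the decomposition: upsample and add the detail back per level
    img = bands[-1]
    for band in reversed(bands[:-1]):
        img = _up(img, band.shape) + band
    return img
```

A FilmNet-like model would transform each band before reconstruction; with the bands left untouched, the round trip reproduces the input exactly.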
High-Resolution Document Shadow Removal via A Large-Scale Real-World Dataset and A Frequency-Aware Shadow Erasing Net
Shadows often occur when we capture documents with casual equipment,
which affects the visual quality and readability of the digital copies.
Different from the algorithms for natural shadow removal, the algorithms in
document shadow removal need to preserve the details of fonts and figures in
high-resolution input. Previous works ignore this problem and remove the
shadows via approximate attention and small datasets, which might not work in
real-world situations. We handle high-resolution document shadow removal
directly via a larger-scale real-world dataset and a carefully designed
frequency-aware network. As for the dataset, we acquire over 7k pairs of
high-resolution (2462 x 3699) real-world document images captured under
various lighting conditions, which is 10 times larger than
existing datasets. As for the design of the network, we decouple the
high-resolution images in the frequency domain, where the low-frequency details
and high-frequency boundaries can be effectively learned via the carefully
designed network structure. Powered by our network and dataset, the proposed
method clearly shows a better performance than previous methods in terms of
visual quality and numerical results. The code, models, and dataset are
available at: https://github.com/CXH-Research/DocShadow-SD7K
Comment: Accepted by International Conference on Computer Vision 2023 (ICCV 2023)
ShaDocFormer: A Shadow-attentive Threshold Detector with Cascaded Fusion Refiner for document shadow removal
Document shadow is a common issue that arises when capturing documents using
mobile devices, significantly impacting their readability. Current methods
encounter various challenges including inaccurate detection of shadow masks and
estimation of illumination. In this paper, we propose ShaDocFormer, a
Transformer-based architecture that integrates traditional methodologies and
deep learning techniques to tackle the problem of document shadow removal. The
ShaDocFormer architecture comprises two components: the Shadow-attentive
Threshold Detector (STD) and the Cascaded Fusion Refiner (CFR). The STD module
employs a traditional thresholding technique and leverages the attention
mechanism of the Transformer to gather global information, thereby enabling
precise detection of shadow masks. The cascaded and aggregative structure of
the CFR module facilitates a coarse-to-fine restoration process for the entire
image. As a result, ShaDocFormer excels in accurately detecting and capturing
variations in both shadow and illumination, thereby enabling effective removal
of shadows. Extensive experiments demonstrate that ShaDocFormer outperforms
current state-of-the-art methods in both qualitative and quantitative
measurements.
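The abstract does not name the traditional thresholding technique inside the STD module; as an illustration only, here is Otsu's method, a classical global threshold that a shadow detector could use to propose a coarse mask before attention-based refinement:

```python
import numpy as np

def otsu_threshold(gray):
    # classical Otsu: pick the gray level that maximises the
    # between-class variance of the resulting two-class split
    hist, _ = np.histogram(gray, bins=256, range=(0, 256))
    total = gray.size
    sum_all = float(np.dot(np.arange(256), hist))
    w0 = 0
    sum0 = 0.0
    best_t, best_var = 0, -1.0
    for t in range(256):
        w0 += hist[t]
        if w0 == 0:
            continue
        w1 = total - w0
        if w1 == 0:
            break
        sum0 += t * hist[t]
        m0, m1 = sum0 / w0, (sum_all - sum0) / w1
        var = w0 * w1 * (m0 - m1) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return best_t

def shadow_mask(gray):
    # pixels at or below the threshold are treated as shadow candidates
    return gray <= otsu_threshold(gray)
```

A global threshold like this is exactly what fails on uneven illumination, which motivates combining it with the Transformer's global attention as the STD module does.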
Devignet: High-Resolution Vignetting Removal via a Dual Aggregated Fusion Transformer With Adaptive Channel Expansion
Vignetting commonly occurs as a degradation in images resulting from factors
such as lens design, improper lens hood usage, and limitations in camera
sensors. This degradation affects image details, color accuracy, and presents
challenges in computational photography. Existing vignetting removal algorithms
predominantly rely on ideal physics assumptions and hand-crafted parameters,
resulting in the ineffective removal of irregular vignetting and suboptimal
results. Moreover, the substantial lack of real-world vignetting datasets
hinders the objective and comprehensive evaluation of vignetting removal. To
address these challenges, we present Vigset, a pioneering dataset for
vignetting removal. Vigset includes 983 pairs of vignetting and
vignetting-free high-resolution real-world images captured under
various conditions. In addition, we introduce DeVigNet, a novel frequency-aware
Transformer architecture designed for vignetting removal. Through the Laplacian
Pyramid decomposition, we propose the Dual Aggregated Fusion Transformer to
handle global features and remove vignetting in the low-frequency domain.
Additionally, we propose the Adaptive Channel Expansion Module to enhance
details in the high-frequency domain. The experiments demonstrate that the
proposed model outperforms existing state-of-the-art methods. The code, models,
and dataset are available at \url{https://github.com/CXH-Research/DeVigNet}.
Comment: Accepted by AAAI Conference on Artificial Intelligence 2024 (AAAI 2024)
UWFormer: Underwater Image Enhancement via a Semi-Supervised Multi-Scale Transformer
Underwater images often exhibit poor quality, imbalanced coloration, and low
contrast due to the complex and intricate interaction of light, water, and
objects. Despite the significant contributions of previous underwater
enhancement techniques, there exist several problems that demand further
improvement: (i) Current deep learning methodologies depend on Convolutional
Neural Networks (CNNs) that lack multi-scale enhancement and also have limited
global perception fields. (ii) The scarcity of paired real-world underwater
datasets poses a considerable challenge, and the utilization of synthetic image
pairs risks overfitting. To address the aforementioned issues, this paper
presents a Multi-scale Transformer-based Network called UWFormer for enhancing
images at multiple frequencies via semi-supervised learning, in which we
propose a Nonlinear Frequency-aware Attention mechanism and a Multi-Scale
Fusion Feed-forward Network for low-frequency enhancement. Additionally, we
introduce a specialized underwater semi-supervised training strategy, proposing
a Subaqueous Perceptual Loss function to generate reliable pseudo labels.
Experiments using full-reference and non-reference underwater benchmarks
demonstrate that our method outperforms state-of-the-art methods in terms of
both quantitative metrics and visual quality.
DocDeshadower: Frequency-aware Transformer for Document Shadow Removal
The presence of shadows significantly impacts the visual quality of scanned
documents. However, the existing traditional techniques and deep learning
methods used for shadow removal have several limitations. These methods either
rely heavily on heuristics, resulting in suboptimal performance, or require
large datasets to learn shadow-related features. In this study, we propose the
DocDeshadower, a multi-frequency Transformer-based model built on Laplacian
Pyramid. DocDeshadower is designed to remove shadows at different frequencies
in a coarse-to-fine manner. To achieve this, we decompose the shadow image into
different frequency bands using Laplacian Pyramid. In addition, we introduce
two novel components to this model: the Attention-Aggregation Network and the
Gated Multi-scale Fusion Transformer. The Attention-Aggregation Network is
designed to remove shadows in the low-frequency part of the image, whereas the
Gated Multi-scale Fusion Transformer refines the entire image at a global scale
with its large perceptive field. Our extensive experiments demonstrate that
DocDeshadower outperforms the current state-of-the-art methods in both
qualitative and quantitative terms.
Hardware inspired neural network for efficient time-resolved biomedical imaging
Convolutional neural networks (CNNs) have shown exceptional performance for fluorescence lifetime imaging (FLIM). However, redundant parameters and complicated topologies make it challenging to implement such networks on embedded hardware to achieve real-time processing. We report a lightweight, quantized neural architecture that can offer fast FLIM imaging. The forward propagation is significantly simplified by replacing the matrix multiplications in each convolution layer with additions and by quantizing data with a low bit-width. We first used synthetic 3-D lifetime data with given lifetime ranges and photon counts to verify that correct average lifetimes can be obtained. Afterwards, human prostatic cancer cells incubated with gold nanoprobes were used to validate the feasibility of the network on real-world data. The quantized network yielded a 37.8% compression ratio without performance degradation. Clinical relevance - This neural network can be applied to diagnose cancer early based on fluorescence lifetime in a non-invasive way. This approach brings high accuracy and accelerates diagnostic processes for clinicians who are not experts in biomedical signal processing.
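The abstract says multiplications are replaced with additions under low-bit quantization but does not spell out the scheme. One common hardware-friendly option, shown below purely as an illustrative assumption (including the function names and the 4-bit exponent range), is power-of-two weight quantization, where each multiply-accumulate becomes a sign flip, a bit shift, and an add:

```python
import numpy as np

def quantize_pow2(w, bits=4):
    # round each weight to the nearest signed power of two; the exponent
    # range derived from `bits` is an assumption, not the paper's choice
    sign = np.sign(w)
    mag = np.where(w == 0, 1.0, np.abs(w))
    lim = 2 ** (bits - 1)
    exp = np.clip(np.round(np.log2(mag)), -lim, lim - 1).astype(int)
    return sign, exp, sign * 2.0 ** exp

def shift_dot(acts, sign, exp):
    # multiply-free dot product: each product act * weight becomes a
    # sign flip plus a left/right bit shift on integer activations
    total = 0
    for a, s, e in zip(acts, sign, exp):
        shifted = a << int(e) if e >= 0 else a >> -int(e)
        total += int(s) * shifted
    return total
```

On weights that already are powers of two the quantization is lossless, so the shift-based dot product matches an ordinary one, e.g. `shift_dot([3, 8, 1], ...)` for weights `[2.0, 0.5, -4.0]` reproduces 3*2 + 8*0.5 - 1*4 = 6.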
Fast analysis of time-domain fluorescence lifetime imaging via extreme learning machine
We present a fast and accurate analytical method for fluorescence lifetime imaging microscopy (FLIM) using the extreme learning machine (ELM). We used extensive metrics to evaluate ELM and existing algorithms. First, we compared these algorithms using synthetic datasets. The results indicate that ELM can obtain higher fidelity, even in low-photon conditions. Afterwards, we used ELM to retrieve lifetime components from human prostate cancer cells loaded with gold nanosensors, showing that ELM also outperforms iterative fitting and non-fitting algorithms. Compared with a computationally efficient neural network, ELM achieves comparable accuracy with less training and inference time. As there is no back-propagation process for ELM during the training phase, the training speed is much higher than that of existing neural network approaches. The proposed strategy is promising for edge computing with online training.
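The key property the abstract relies on, that ELM needs no back-propagation because only the output weights are trained, comes down to one least-squares solve. A minimal NumPy sketch (hyperparameters and the sine toy problem are invented for illustration):

```python
import numpy as np

class ELM:
    # single hidden layer: random fixed input weights, tanh activation,
    # and output weights solved in closed form (no back-propagation)
    def __init__(self, n_in, n_hidden, seed=0):
        rng = np.random.default_rng(seed)
        self.w = rng.normal(size=(n_in, n_hidden))
        self.b = rng.normal(size=n_hidden)
        self.beta = None

    def _hidden(self, X):
        return np.tanh(X @ self.w + self.b)

    def fit(self, X, y):
        # one least-squares solve via the Moore-Penrose pseudoinverse;
        # this is the entire "training" step
        self.beta = np.linalg.pinv(self._hidden(X)) @ y
        return self

    def predict(self, X):
        return self._hidden(X) @ self.beta
```

Because fitting is a single pseudoinverse solve rather than an iterative optimization, training cost scales with hidden-layer width and sample count, not with epochs, which is what makes online retraining at the edge plausible.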
The impact of metabolic overweight/obesity phenotypes on unplanned readmission risk in patients with COPD: a retrospective cohort study
Background: There is an inconsistent association between overweight/obesity and chronic obstructive pulmonary disease (COPD). Considering that different metabolic characteristics exist among individuals in the same body mass index (BMI) category, classifying overweight/obesity by metabolic status may facilitate the risk assessment of COPD. Our study aimed to explore the relationship between metabolic overweight/obesity phenotypes and unplanned readmission in patients with COPD.
Methods: We conducted a retrospective cohort study using the Nationwide Readmissions Database (NRD). According to metabolic overweight/obesity phenotypes, patients were classified into four groups: metabolically healthy non-overweight/obesity (MHNO), metabolically unhealthy non-overweight/obesity (MUNO), metabolically healthy with overweight/obesity (MHO), and metabolically unhealthy with overweight/obesity (MUO). The primary outcome was unplanned readmission to hospital within 30 days of discharge from the index hospitalization. Secondary outcomes included in-hospital mortality, length of stay (LOS), and total charges of readmission within 30 days.
Results: Among 1,445,890 patients admitted with COPD, 167,156 individuals had an unplanned readmission within 30 days. Patients with the MUNO phenotype [hazard ratio (HR), 1.049; 95% CI, 1.038–1.061; p < 0.001] and the MUO phenotype (HR, 1.061; 95% CI, 1.045–1.077; p < 0.001) had a higher readmission risk than patients with MHNO. However, in elderly patients (≥65 years), MHO was also associated with a higher readmission risk (HR, 1.032; 95% CI, 1.002–1.063; p = 0.039). In addition, the readmission risk of COPD patients with hyperglycemia or hypertension increased regardless of overweight/obesity (p < 0.001).
Conclusion: In patients with COPD, overweight/obesity alone had little effect on unplanned readmission, whereas metabolic abnormalities, regardless of overweight/obesity, were associated with an increased risk of unplanned readmission. Among the metabolic abnormalities, particular attention should be paid to hyperglycemia and hypertension. However, in elderly patients (≥65 years), overweight/obesity and metabolic abnormalities independently exacerbated adverse outcomes.