A Large-scale Film Style Dataset for Learning Multi-frequency Driven Film Enhancement
Film, a classic image style, is culturally significant to the whole
photographic industry since it marks the birth of photography. However, film
photography is time-consuming and expensive, necessitating a more efficient
method for collecting film-style photographs. Numerous datasets that have
emerged in the field of image enhancement so far are not film-specific. In
order to facilitate film-based image stylization research, we construct
FilmSet, a large-scale and high-quality film style dataset. Our dataset
includes three different film types and more than 5000 in-the-wild
high-resolution images. Inspired by the features of FilmSet images, we propose a
novel framework called FilmNet based on Laplacian Pyramid for stylizing images
across frequency bands and achieving film style outcomes. Experiments reveal
that the performance of our model is superior to state-of-the-art techniques.
Our dataset and code will be made publicly available.
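The band-splitting idea behind FilmNet can be illustrated with a plain Laplacian-style decomposition. This is a minimal sketch, not the authors' implementation: it uses Gaussian blurring with an assumed `sigma`, and the per-band `gains` are hypothetical stand-ins for the learned stylization.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def laplacian_bands(img, n_levels=3, sigma=2.0):
    """Split an image into n_levels band-pass layers plus a low-pass residual."""
    bands, current = [], img.astype(np.float64)
    for _ in range(n_levels):
        low = gaussian_filter(current, sigma)
        bands.append(current - low)   # high-frequency detail at this scale
        current = low
    return bands, current             # residual holds coarse tone and color

def stylize(img, gains, residual_gain=1.0, n_levels=3):
    """Re-weight each frequency band; gains != 1 shift texture vs. tone."""
    bands, residual = laplacian_bands(img, n_levels)
    out = residual * residual_gain
    for band, g in zip(bands, gains):
        out += g * band
    return out
```

Because the bands telescope, summing them with the residual reconstructs the input exactly; stylization then amounts to choosing per-band weights.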
High-Resolution Document Shadow Removal via A Large-Scale Real-World Dataset and A Frequency-Aware Shadow Erasing Net
Shadows often occur when we capture documents with casual equipment, which
degrades the visual quality and readability of the digital copies.
Different from the algorithms for natural shadow removal, the algorithms in
document shadow removal need to preserve the details of fonts and figures in
high-resolution input. Previous works ignore this problem and remove the
shadows via approximate attention and small datasets, which might not work in
real-world situations. We handle high-resolution document shadow removal
directly via a large-scale real-world dataset and a carefully designed
frequency-aware network. As for the dataset, we acquire over 7k pairs of
high-resolution (2462 x 3699) real-world document images with various
samples under different lighting conditions, which is 10 times larger than
existing datasets. As for the design of the network, we decouple the
high-resolution images in the frequency domain, where the low-frequency details
and high-frequency boundaries can be effectively learned via the carefully
designed network structure. Powered by our network and dataset, the proposed
method clearly shows a better performance than previous methods in terms of
visual quality and numerical results. The code, models, and dataset are
available at: https://github.com/CXH-Research/DocShadow-SD7K
Comment: Accepted by International Conference on Computer Vision 2023 (ICCV 2023).
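The frequency-domain decoupling described above can be sketched with a plain FFT split. The `cutoff` radius is an assumed parameter; the paper's network learns its decomposition rather than using a fixed mask.

```python
import numpy as np

def split_frequencies(img, cutoff=0.1):
    """Decouple an image into low- and high-frequency parts via the FFT.
    cutoff is the radius (as a fraction of the spectrum) kept as 'low'."""
    f = np.fft.fftshift(np.fft.fft2(img))
    h, w = img.shape
    yy, xx = np.ogrid[:h, :w]
    dist = np.sqrt((yy - h / 2) ** 2 + (xx - w / 2) ** 2)
    mask = dist <= cutoff * max(h, w)
    low = np.real(np.fft.ifft2(np.fft.ifftshift(f * mask)))
    high = img - low              # residual keeps fonts and sharp boundaries
    return low, high
```

By construction the two parts sum back to the input, so a network can restore illumination in `low` and edges in `high` independently.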
ShaDocFormer: A Shadow-attentive Threshold Detector with Cascaded Fusion Refiner for document shadow removal
Document shadow is a common issue that arises when capturing documents using
mobile devices, significantly impacting readability. Current methods
encounter various challenges including inaccurate detection of shadow masks and
estimation of illumination. In this paper, we propose ShaDocFormer, a
Transformer-based architecture that integrates traditional methodologies and
deep learning techniques to tackle the problem of document shadow removal. The
ShaDocFormer architecture comprises two components: the Shadow-attentive
Threshold Detector (STD) and the Cascaded Fusion Refiner (CFR). The STD module
employs a traditional thresholding technique and leverages the attention
mechanism of the Transformer to gather global information, thereby enabling
precise detection of shadow masks. The cascaded and aggregative structure of
the CFR module facilitates a coarse-to-fine restoration process for the entire
image. As a result, ShaDocFormer excels in accurately detecting and capturing
variations in both shadow and illumination, thereby enabling effective removal
of shadows. Extensive experiments demonstrate that ShaDocFormer outperforms
current state-of-the-art methods in both qualitative and quantitative
measurements.
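A traditional thresholding step of the kind the STD module builds on can be sketched with Otsu's method; this is an illustrative stand-in, not the paper's detector.

```python
import numpy as np

def otsu_threshold(gray):
    """Classic Otsu thresholding: pick the cut that maximizes
    between-class variance of the intensity histogram."""
    hist, edges = np.histogram(gray, bins=256, range=(0.0, 1.0))
    p = hist / hist.sum()
    bins = (edges[:-1] + edges[1:]) / 2
    w0 = np.cumsum(p)                      # weight of the dark class
    m = np.cumsum(p * bins)                # running class mean
    mg = m[-1]                             # global mean
    with np.errstate(divide="ignore", invalid="ignore"):
        var_between = (mg * w0 - m) ** 2 / (w0 * (1 - w0))
    var_between = np.nan_to_num(var_between)
    return bins[np.argmax(var_between)]

def shadow_mask(gray):
    """Pixels darker than the Otsu cut are treated as shadow candidates."""
    return gray < otsu_threshold(gray)
```

A coarse mask like this can then be refined globally, which is the role the abstract assigns to the Transformer attention and the CFR module.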
UWFormer: Underwater Image Enhancement via a Semi-Supervised Multi-Scale Transformer
Underwater images often exhibit poor quality, imbalanced coloration, and low
contrast due to the complex and intricate interaction of light, water, and
objects. Despite the significant contributions of previous underwater
enhancement techniques, there exist several problems that demand further
improvement: (i) Current deep learning methodologies depend on Convolutional
Neural Networks (CNNs) that lack multi-scale enhancement and have limited
global receptive fields. (ii) The scarcity of paired real-world underwater
datasets poses a considerable challenge, and the utilization of synthetic image
pairs risks overfitting. To address the aforementioned issues, this paper
presents a Multi-scale Transformer-based Network called UWFormer for enhancing
images at multiple frequencies via semi-supervised learning, in which we
propose a Nonlinear Frequency-aware Attention mechanism and a Multi-Scale
Fusion Feed-forward Network for low-frequency enhancement. Additionally, we
introduce a specialized underwater semi-supervised training strategy, proposing
a Subaqueous Perceptual Loss function to generate reliable pseudo labels.
Experiments using full-reference and non-reference underwater benchmarks
demonstrate that our method outperforms state-of-the-art methods in terms of
both quantitative metrics and visual quality.
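The pseudo-label gating idea can be sketched generically. The `scores` here stand in for whatever reliability measure the training strategy uses (for instance, a perceptual-loss-style distance); that mapping is an assumption of this sketch.

```python
import numpy as np

def select_pseudo_labels(teacher_outputs, scores, threshold):
    """Keep only teacher predictions whose quality score passes the gate;
    the kept pairs are then treated as labelled data for the student.
    Lower score = more reliable."""
    keep = scores <= threshold
    kept = [out for out, k in zip(teacher_outputs, keep) if k]
    return kept, keep
```

Filtering out unreliable teacher outputs is what keeps semi-supervised training from amplifying the synthetic-pair overfitting the abstract warns about.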
DocDeshadower: Frequency-aware Transformer for Document Shadow Removal
The presence of shadows significantly impacts the visual quality of scanned
documents. However, the existing traditional techniques and deep learning
methods used for shadow removal have several limitations. These methods either
rely heavily on heuristics, resulting in suboptimal performance, or require
large datasets to learn shadow-related features. In this study, we propose the
DocDeshadower, a multi-frequency Transformer-based model built on Laplacian
Pyramid. DocDeshadower is designed to remove shadows at different frequencies
in a coarse-to-fine manner. To achieve this, we decompose the shadow image into
different frequency bands using Laplacian Pyramid. In addition, we introduce
two novel components to this model: the Attention-Aggregation Network and the
Gated Multi-scale Fusion Transformer. The Attention-Aggregation Network is
designed to remove shadows in the low-frequency part of the image, whereas the
Gated Multi-scale Fusion Transformer refines the entire image at a global scale
with its large perceptive field. Our extensive experiments demonstrate that
DocDeshadower outperforms the current state-of-the-art methods in both
qualitative and quantitative terms.
DiffGAN-F2S: Symmetric and Efficient Denoising Diffusion GANs for Structural Connectivity Prediction from Brain fMRI
Mapping from functional connectivity (FC) to structural connectivity (SC) can
facilitate multimodal brain network fusion and discover potential biomarkers
for clinical implications. However, it is challenging to directly bridge the
reliable non-linear mapping relations between SC and functional magnetic
resonance imaging (fMRI). In this paper, a novel diffusion generative
adversarial network-based fMRI-to-SC (DiffGAN-F2S) model is proposed to predict
SC from brain fMRI in an end-to-end manner. To be specific, the proposed
DiffGAN-F2S leverages denoising diffusion probabilistic models (DDPMs) and
adversarial learning to efficiently generate high-fidelity SC through a few
steps from fMRI. By designing the dual-channel multi-head spatial attention
(DMSA) and graph convolutional modules, the symmetric graph generator first
captures global relations among directly and indirectly connected brain regions,
then models the local brain region interactions. It can uncover the complex
mapping relations between fMRI and structural connectivity. Furthermore, the
spatially connected consistency loss is devised to constrain the generator to
preserve global-local topological information for accurate intrinsic SC
prediction. Testing on the public Alzheimer's Disease Neuroimaging Initiative
(ADNI) dataset, the proposed model can effectively generate empirical
SC-preserved connectivity from four-dimensional imaging data and shows superior
performance in SC prediction compared with other related models. Furthermore,
the proposed model can identify the vast majority of important brain regions
and connections derived from the empirical method, providing an alternative way
to fuse multimodal brain networks and analyze clinical disease.
Comment: 12 pages.
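Since structural connectivity is undirected, a generator's raw output must be projected onto a symmetric matrix. A minimal sketch of that projection follows; the non-negativity clipping and zero diagonal are assumptions of this sketch, not details from the paper.

```python
import numpy as np

def to_symmetric_sc(raw):
    """Project a raw generator output onto a valid structural-connectivity
    matrix: symmetric, non-negative, zero diagonal."""
    sc = (raw + raw.T) / 2.0        # undirected connectivity is symmetric
    sc = np.clip(sc, 0.0, None)     # connection strengths are non-negative
    np.fill_diagonal(sc, 0.0)       # no self-connections
    return sc
```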
Forecasting the COVID-19 transmission in Italy based on the minimum spanning tree of dynamic region network
Background: Italy surpassed 1.5 million confirmed Coronavirus Disease 2019 (COVID-19) infections on November 26, as its death toll rose rapidly in the second wave of the COVID-19 outbreak, placing a heavy burden on hospitals. It is therefore necessary to forecast and provide early warning of potential COVID-19 outbreaks, which facilitates the timely implementation of appropriate control measures. However, real-time prediction of COVID-19 transmission and outbreaks is challenging because of the complexity intertwining both biological and social systems.
Methods: By mining dynamical information from region networks and short-term time series data, we developed a data-driven model, the minimum-spanning-tree-based dynamical network marker (MST-DNM), to quantitatively analyze and monitor the dynamical process of COVID-19 spreading. Specifically, we collected the historical daily case counts of COVID-19 infection in Italy from February 24, 2020 to November 28, 2020. When applied to the region network of Italy, the MST-DNM model can monitor the whole process of COVID-19 transmission and successfully identify early-warning signals. The interpretability and practical significance of our model are explained in detail in this study.
Results: The study of the dynamical changes of Italian region networks reveals the dynamics of COVID-19 transmission at the network level. It is noteworthy that the MST-DNM relies only on small samples rather than years of time series data. It therefore holds great potential for public surveillance of emerging infectious diseases.
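The core object of MST-DNM, a minimum spanning tree over a weighted region network, can be computed directly with SciPy. The toy weights below are hypothetical; in the model they would be derived from the dynamics of daily case counts between regions.

```python
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree

# Toy 4-region network: a zero entry means no edge between those regions.
weights = np.array([
    [0, 2, 0, 6],
    [2, 0, 3, 8],
    [0, 3, 0, 5],
    [6, 8, 5, 0],
], dtype=float)

mst = minimum_spanning_tree(weights)   # sparse matrix holding the kept edges
total_weight = mst.sum()               # scalar summary of the tree
```

A marker of this kind can be recomputed daily; an abrupt change in the tree's total weight is the sort of signal a dynamical network marker watches for.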
Generative AI for brain image computing and brain network computing: a review
Recent years have witnessed significant advances in brain imaging techniques that offer a non-invasive approach to mapping the structure and function of the brain. Concurrently, generative artificial intelligence (AI) has experienced substantial growth, using existing data to create new content with an underlying pattern similar to real-world data. The integration of these two domains, generative AI in neuroimaging, presents a promising avenue for exploring various fields of brain imaging and brain network computing, particularly in extracting spatiotemporal brain features and reconstructing the topological connectivity of brain networks. Therefore, this study reviews the advanced models, tasks, challenges, and prospects of brain imaging and brain network computing techniques, and intends to provide a comprehensive picture of current generative AI techniques in brain imaging. The review focuses on novel methodological approaches and applications of related new methods. It discusses the fundamental theories and algorithms of four classic generative models and provides a systematic survey and categorization of tasks, including co-registration, super-resolution, enhancement, classification, segmentation, cross-modality, brain network analysis, and brain decoding. It also highlights the challenges and future directions of the latest work, with the expectation that future research can benefit from it.
WMNN: Wearables-Based Multi-Column Neural Network for Human Activity Recognition.
In recent years, human activity recognition (HAR) technologies in e-health have attracted broad interest. In the literature, mainstream works focus on the body's spatial information (i.e., postures), which lacks the interpretation of key bioinformatics associated with movements, limiting its use in applications that require comprehensively evaluating the correctness of motion tasks. To address this issue, this article presents a Wearables-based Multi-column Neural Network (WMNN) for HAR based on multi-sensor fusion and deep learning. Here, the Tai Chi Eight Methods were utilized as an example, in which both postures and muscle activity strengths are significant. The work was validated by recruiting 14 subjects in total, and we experimentally show 96.9% and 92.5% accuracy for training and testing, respectively, over a total of 144 postures and corresponding muscle activities. The method is then provided with a human-machine interface (HMI), which returns motion suggestions (i.e., postures and muscle strength) to users. The results demonstrate that the proposed HAR technique can enhance users' self-training efficiency, potentially promoting the development of the HAR area.
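The multi-column idea, separate branches for posture (IMU) and muscle-activity (EMG) features fused before classification, can be sketched as a plain forward pass. All shapes and weights here are hypothetical; the actual WMNN architecture is not specified at this level in the abstract.

```python
import numpy as np

def multi_column_forward(imu_feat, emg_feat, w_imu, w_emg, w_head):
    """Two columns: one for posture (IMU) features, one for muscle-activity
    (EMG) features; their embeddings are concatenated before the classifier."""
    h_imu = np.maximum(imu_feat @ w_imu, 0.0)        # ReLU, column 1
    h_emg = np.maximum(emg_feat @ w_emg, 0.0)        # ReLU, column 2
    fused = np.concatenate([h_imu, h_emg], axis=-1)  # multi-sensor fusion
    logits = fused @ w_head
    e = np.exp(logits - logits.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)         # class probabilities
```

Keeping the columns separate until fusion is what lets the model report posture and muscle-strength feedback as distinct outputs in the HMI.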
Exploring Indoor White Spaces in Metropolises
It is a promising vision to utilize white spaces, i.e., vacant VHF and UHF TV channels, to satisfy skyrocketing wireless data demand in both outdoor and indoor scenarios. While most prior works have focused on exploring outdoor white spaces, the indoor story is largely open for investigation. Motivated by this observation, and by the fact that 70% of spectrum demand comes from indoor environments, we carry out a comprehensive study of indoor white spaces. We first present a large-scale measurement of outdoor and indoor TV spectrum occupancy in 30+ diverse locations in a typical metropolis, Hong Kong. Our measurement results confirm abundant white spaces available for exploration in a wide range of areas in metropolises. In particular, more than 50% and 70% of the TV spectrum are white spaces in outdoor and indoor scenarios, respectively. While there are substantially more white space
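Occupancy statistics like those above reduce to a per-channel vacancy rule. A minimal sketch, assuming a hypothetical noise-floor-plus-margin threshold (the abstract does not specify the actual detection criterion):

```python
import numpy as np

def white_space_fraction(power_dbm, noise_floor_dbm=-95.0, margin_db=10.0):
    """A channel counts as white space when its measured power stays below
    noise floor + margin; returns the vacant fraction of channels."""
    vacant = power_dbm < (noise_floor_dbm + margin_db)
    return vacant.mean()
```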