44 research outputs found
Prompt-based test-time real image dehazing: a novel pipeline
Existing methods attempt to improve models' generalization ability on
real-world hazy images by exploring well-designed training schemes (e.g.,
CycleGAN, prior loss). However, most of them need very complicated training
procedures to achieve satisfactory results. In this work, we present a totally
novel testing pipeline called Prompt-based Test-Time Dehazing (PTTD) to help
generate visually pleasing results of real-captured hazy images during the
inference phase. We experimentally find that given a dehazing model trained on
synthetic data, by fine-tuning the statistics (i.e., mean and standard
deviation) of encoding features, PTTD is able to narrow the domain gap,
boosting the performance of real image dehazing. Accordingly, we first apply a
prompt generation module (PGM) to generate a visual prompt, which is the source
of appropriate statistical perturbations for mean and standard deviation.
Then, we integrate a feature adaptation module (FAM) into existing dehazing
models for adjusting the original statistics with the guidance of the generated
prompt. Note that PTTD is model-agnostic and can be equipped with various
state-of-the-art dehazing models trained on synthetic hazy-clean pairs.
Extensive experimental results demonstrate that our PTTD is flexible and
achieves superior performance against state-of-the-art dehazing methods in
real-world scenarios. The source code of our PTTD will be made available at
https://github.com/cecret3350/PTTD-Dehazing.
Comment: update github link (https://github.com/cecret3350/PTTD-Dehazing)
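The idea of adjusting feature statistics can be illustrated with a minimal sketch. It is not the authors' FAM: the function name `adapt_statistics`, the additive perturbations `delta_mean` and `delta_std`, and the toy channel values are all assumptions made for illustration. It only shows how one channel of encoder features could be normalized and then re-scaled to a perturbed mean and standard deviation.

```python
import statistics


def adapt_statistics(feature, delta_mean, delta_std, eps=1e-5):
    """Shift the mean and standard deviation of one feature channel.

    feature: flat list of activations for a single channel (toy data).
    delta_mean, delta_std: hypothetical perturbations that a visual
    prompt would supply; the abstract does not say how they are computed.
    """
    mean = statistics.fmean(feature)
    std = statistics.pstdev(feature)
    target_mean = mean + delta_mean
    target_std = max(std + delta_std, 0.0)
    # Normalize to zero mean and unit std, then re-scale to the targets.
    return [target_std * (v - mean) / (std + eps) + target_mean for v in feature]


channel = [0.2, 0.4, 0.6, 0.8]
adapted = adapt_statistics(channel, delta_mean=0.1, delta_std=0.05)
```

In this toy case the adapted channel keeps its shape but has its mean raised by 0.1 and its standard deviation widened by roughly 0.05.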
Accurate and lightweight dehazing via multi-receptive-field non-local network and novel contrastive regularization
Recently, deep learning-based methods have dominated the image dehazing domain.
Although very competitive dehazing performance has been achieved with
sophisticated models, effective solutions for extracting useful features are
still under-explored. In addition, the non-local network, which has made a
breakthrough in many vision tasks, has not been appropriately applied to image
dehazing. Thus, a multi-receptive-field non-local network (MRFNLN) consisting
of the multi-stream feature attention block (MSFAB) and cross non-local block
(CNLB) is presented in this paper. We start with extracting richer features for
dehazing. Specifically, we design a multi-stream feature extraction (MSFE)
sub-block, which contains three parallel convolutions with different receptive
fields for extracting multi-scale features. Following MSFE, we employ an
attention sub-block to make the model
adaptively focus on important channels/regions. The MSFE and attention
sub-blocks constitute our MSFAB. Then, we design a cross non-local block
(CNLB), which can capture long-range dependencies beyond the query. Rather than
sharing the input source of the query branch, the key and value branches are
enhanced by fusing more preceding features. CNLB is computation-friendly because
it leverages a spatial pyramid down-sampling (SPDS) strategy to reduce computation
and memory consumption without sacrificing performance. Last but not least, a
novel detail-focused contrastive regularization (DFCR) is presented by
emphasizing the low-level details and ignoring the high-level semantic
information in the representation space. Comprehensive experimental results
demonstrate that the proposed MRFNLN model outperforms recent state-of-the-art
dehazing methods with fewer than 1.5 million parameters.
Comment: submitted to IEEE TCYB for possible publication
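Why down-sampling the key and value branches saves computation can be seen in a small sketch of dot-product attention. This is a simplified 1-D illustration, not the CNLB itself: plain average pooling stands in for spatial pyramid down-sampling, there are no learned projections, no fusion of preceding features, and the names `avg_pool` and `non_local` are hypothetical.

```python
import math


def avg_pool(seq, stride):
    """Average consecutive groups of `stride` vectors (stand-in for down-sampling)."""
    pooled = []
    for start in range(0, len(seq), stride):
        group = seq[start:start + stride]
        dim = len(group[0])
        pooled.append([sum(v[d] for v in group) / len(group) for d in range(dim)])
    return pooled


def softmax(scores):
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def non_local(queries, context, stride):
    """Each query attends to a down-sampled context instead of every position."""
    keys = values = avg_pool(context, stride)
    dim = len(queries[0])
    outputs, weights = [], []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in keys]
        w = softmax(scores)
        weights.append(w)
        outputs.append(
            [sum(wi * v[d] for wi, v in zip(w, values)) for d in range(len(values[0]))]
        )
    return outputs, weights


# Eight toy feature vectors; keys and values are pooled down to two positions.
features = [[float(i), float(i % 3)] for i in range(8)]
out, attn = non_local(features, features, stride=4)
```

With eight query positions and a stride of 4, the attention map has 8 x 2 entries rather than 8 x 8, which is the source of the savings in computation and memory.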
Highly sensitive magnetic properties and large linear magnetoresistance in antiferromagnetic CrxSe (0.875 ≤ x ≤ 1) single crystals
CrxSe (x ≤ 1) is a class of quasi-layered binary compounds with potential
applications in spintronics due to its intriguing antiferromagnetic properties.
In this work, CrxSe single crystals with high Cr content (x=0.87, 0.91 and
0.95) were grown, and their magnetic and transport properties were investigated
in detail. It is found that with a small increase in Cr content, the Néel
temperature (TN) of the samples increases dramatically from 147 K to 257 K,
accompanied by obvious changes in the magnetic anisotropy and hysteresis. The
phenomena of field-induced spin-flop transitions were unveiled in these alloys,
indicating their comparatively low anisotropy. The magnetoresistance (MR) of
the three compounds showed positive dependence at low temperatures and
particularly, non-saturated linear positive MR was observed in Cr0.91Se and
Cr0.95Se, with a large value of 16.2% achieved in Cr0.91Se (10 K, 9 T). The
calculated Fermi surface and MR showed that the quasi-linear MR is a product of
carrier compensation. Our work revealed highly sensitive magnetic and transport
properties in the Cr-Se compounds, which can lay a foundation for constructing
antiferromagnetic spintronic devices based on them.
A Practical Response Adaptive Block Randomization Design with Analytic Type I Error Protection
Response adaptive randomization is appealing in confirmatory adaptive
clinical trials from statistical, ethical, and pragmatic perspectives, in the
sense that subjects are more likely to be randomized to better performing
treatment groups based on accumulating data. The Doubly Adaptive Biased Coin
Design (DBCD) is a popular solution due to its asymptotic normal property of
final allocations, which further justifies its asymptotic type I error rate
control. As an alternative, we propose a Response Adaptive Block Randomization
(RABR) design with pre-specified randomization ratios for the control and
high-performing groups to robustly achieve desired final sample size per group
under different underlying responses, which is usually required in
industry-sponsored clinical studies. We show that the usual test statistic has
a controlled type I error rate. Our simulations further highlight the
advantages of the proposed design over the DBCD in terms of consistently
achieving final sample allocations and of power performance. We further apply
this design to a Phase III study evaluating the efficacy of two dosing regimens
of adjunctive everolimus in treating tuberous sclerosis complex, where no
dose-finding studies had previously been conducted in this indication.
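The abstract does not spell out the RABR algorithm, so the following is only a rough sketch of one way a block with pre-specified ratios could be built. It assumes that the control arm always receives a fixed share, and that the active arms receive pre-specified shares according to the rank of their observed response rates. The function `block_allocation`, the arm names, the ratios, and the response rates are all hypothetical.

```python
import random


def block_allocation(block_size, control_ratio, ranked_ratios, response_rates, rng):
    """Build one randomized block of treatment assignments.

    response_rates: observed response rate per active arm (made-up interim data).
    ranked_ratios: pre-specified ratios for the best, second best, ... active arm.
    Assumes the ratios divide the block size evenly.
    """
    # Rank active arms from best to worst observed response.
    ranked_arms = sorted(response_rates, key=response_rates.get, reverse=True)
    ratios = {"control": control_ratio}
    ratios.update(dict(zip(ranked_arms, ranked_ratios)))
    total = sum(ratios.values())
    block = []
    for arm, ratio in ratios.items():
        block.extend([arm] * round(block_size * ratio / total))
    rng.shuffle(block)  # randomize the order of assignments within the block
    return block


rng = random.Random(0)
block = block_allocation(
    block_size=12,
    control_ratio=4,
    ranked_ratios=[6, 2],
    response_rates={"dose_A": 0.30, "dose_B": 0.45},
    rng=rng,
)
```

Because the ratios are fixed in advance and only the ranking of arms depends on the data, the number of subjects per rank in each block is known before the trial starts, which is consistent with the abstract's emphasis on achieving the desired final sample size per group.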
GridFormer: Residual Dense Transformer with Grid Structure for Image Restoration in Adverse Weather Conditions
Image restoration in adverse weather conditions is a difficult task in
computer vision. In this paper, we propose a novel transformer-based framework
called GridFormer which serves as a backbone for image restoration under
adverse weather conditions. GridFormer is designed in a grid structure using a
residual dense transformer block, and it introduces two core designs. First, it
uses an enhanced attention mechanism in the transformer layer. The mechanism
includes stages of the sampler and compact self-attention to improve
efficiency, and a local enhancement stage to strengthen local information.
Second, we introduce a residual dense transformer block (RDTB) as the final
GridFormer layer. This design further improves the network's ability to learn
effective features from both preceding and current local features. The
GridFormer framework achieves state-of-the-art results on five diverse image
restoration tasks in adverse weather conditions, including image deraining,
dehazing, deraining & dehazing, desnowing, and multi-weather restoration. The
source code and pre-trained models will be released.
Comment: 17 pages, 12 figures
Deep Video Restoration for Under-Display Camera
Images or videos captured by the Under-Display Camera (UDC) suffer from
severe degradation, such as saturation degeneration and color shift. While
restoration for UDC has been a critical task, existing works of UDC restoration
focus only on images. UDC video restoration (UDC-VR) has not been explored in
the community. In this work, we first propose a GAN-based generation pipeline
to simulate the realistic UDC degradation process. With the pipeline, we build
the first large-scale UDC video restoration dataset called PexelsUDC, which
includes two subsets named PexelsUDC-T and PexelsUDC-P corresponding to
different displays for UDC. Using the proposed dataset, we conduct extensive
benchmark studies on existing video restoration methods and observe their
limitations on the UDC-VR task. To this end, we propose a novel
transformer-based baseline method that adaptively enhances degraded videos. The
key components of the method are a spatial branch with local-aware
transformers, a temporal branch with embedded temporal transformers, and a
spatial-temporal fusion module. These components drive the model to fully
exploit spatial and temporal information for UDC-VR. Extensive experiments show
that our method achieves state-of-the-art performance on PexelsUDC. The
benchmark and the baseline method, which will be made public, are expected to
promote the progress of UDC-VR in the community.
PromptVC: Flexible Stylistic Voice Conversion in Latent Space Driven by Natural Language Prompts
Style voice conversion aims to transform the style of source speech to a
desired style according to real-world application demands. However, current
style voice conversion approaches rely on pre-defined labels or reference
speech to control the conversion process, which limits style diversity or
falls short in the intuitiveness and interpretability of
style representation. In this study, we propose PromptVC, a novel style voice
conversion approach that employs a latent diffusion model to generate a style
vector driven by natural language prompts. Specifically, the style vector is
extracted by a style encoder during training, and then the latent diffusion
model is trained independently to sample the style vector from noise, with this
process being conditioned on natural language prompts. To improve style
expressiveness, we leverage HuBERT to extract discrete tokens and replace them
with the K-Means center embedding to serve as the linguistic content, which
minimizes residual style information. Additionally, we deduplicate the same
discrete token and employ a differentiable duration predictor to re-predict the
duration of each token, which can adapt the duration of the same linguistic
content to different styles. The subjective and objective evaluation results
demonstrate the effectiveness of our proposed system.
Comment: Submitted to ICASSP 202
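The token deduplication step can be sketched as follows, assuming that "deduplicate" means collapsing runs of identical consecutive tokens, which the abstract does not state explicitly. The function name `deduplicate` and the token IDs are made up for illustration. The run lengths stand in for the original durations that a duration predictor would later re-predict.

```python
from itertools import groupby


def deduplicate(tokens):
    """Collapse runs of identical consecutive tokens and record each run's length."""
    units, durations = [], []
    for token, run in groupby(tokens):
        units.append(token)
        durations.append(len(list(run)))
    return units, durations


# Hypothetical frame-level discrete tokens (e.g. K-Means cluster IDs).
frame_tokens = [17, 17, 17, 4, 4, 92, 17, 17]
units, durations = deduplicate(frame_tokens)
```

Here `units` holds the content sequence without repeats and `durations` holds how many frames each unit originally spanned; a token that recurs later in the sequence is kept as a separate unit.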