
    Prompt-based test-time real image dehazing: a novel pipeline

    Existing methods attempt to improve models' generalization ability on real-world hazy images by exploring well-designed training schemes (e.g., CycleGAN, prior loss). However, most of them require very complicated training procedures to achieve satisfactory results. In this work, we present a novel testing pipeline called Prompt-based Test-Time Dehazing (PTTD) that produces visually pleasing results on real-captured hazy images during the inference phase. We experimentally find that, given a dehazing model trained on synthetic data, fine-tuning the statistics (i.e., mean and standard deviation) of its encoding features allows PTTD to narrow the domain gap and boost the performance of real image dehazing. Accordingly, we first apply a prompt generation module (PGM) to generate a visual prompt, which is the source of appropriate statistical perturbations for the mean and standard deviation. We then insert a feature adaptation module (FAM) into existing dehazing models to adjust the original statistics under the guidance of the generated prompt. Note that PTTD is model-agnostic and can be equipped with various state-of-the-art dehazing models trained on synthetic hazy-clean pairs. Extensive experimental results demonstrate that PTTD is flexible and achieves superior performance compared with state-of-the-art dehazing methods in real-world scenarios. The source code of our PTTD will be made available at https://github.com/cecret3350/PTTD-Dehazing.
    Comment: update github link (https://github.com/cecret3350/PTTD-Dehazing)
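    The core operation described above — perturbing the channel-wise mean and standard deviation of encoder features using statistics derived from a visual prompt — resembles adaptive instance normalization. Below is a minimal PyTorch-style sketch of that idea; the function name and the exact way prompt statistics are combined are illustrative assumptions, not the released FAM implementation.

    import torch

    def adapt_feature_statistics(feat, prompt_feat, eps=1e-5):
        # feat, prompt_feat: encoder feature maps of shape (N, C, H, W).
        # Compute per-channel statistics of the original and prompt features.
        mu = feat.mean(dim=(2, 3), keepdim=True)
        sigma = feat.std(dim=(2, 3), keepdim=True) + eps
        mu_p = prompt_feat.mean(dim=(2, 3), keepdim=True)
        sigma_p = prompt_feat.std(dim=(2, 3), keepdim=True) + eps
        # Normalize the original features, then re-scale/shift with the
        # prompt-derived statistics (AdaIN-style statistics adaptation).
        return sigma_p * (feat - mu) / sigma + mu_p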

    Accurate and lightweight dehazing via multi-receptive-field non-local network and novel contrastive regularization

    Recently, deep learning-based methods have dominated the image dehazing domain. Although very competitive dehazing performance has been achieved with sophisticated models, effective solutions for extracting useful features are still under-explored. In addition, the non-local network, which has made breakthroughs in many vision tasks, has not been appropriately applied to image dehazing. Thus, a multi-receptive-field non-local network (MRFNLN) consisting of the multi-stream feature attention block (MSFAB) and the cross non-local block (CNLB) is presented in this paper. We start with extracting richer features for dehazing. Specifically, we design a multi-stream feature extraction (MSFE) sub-block, which contains three parallel convolutions with different receptive fields (i.e., 1×1, 3×3, 5×5) for extracting multi-scale features. Following MSFE, we employ an attention sub-block to make the model adaptively focus on important channels/regions. The MSFE and attention sub-blocks constitute our MSFAB. Then, we design a cross non-local block (CNLB), which can capture long-range dependencies beyond the query. Unlike the query branch, the key and value branches are enhanced by fusing more preceding features. CNLB is computation-friendly, leveraging a spatial pyramid down-sampling (SPDS) strategy to reduce computation and memory consumption without sacrificing performance. Last but not least, a novel detail-focused contrastive regularization (DFCR) is presented, which emphasizes low-level details and ignores high-level semantic information in the representation space. Comprehensive experimental results demonstrate that the proposed MRFNLN model outperforms recent state-of-the-art dehazing methods with fewer than 1.5 million parameters.
    Comment: submitted to IEEE TCYB for possible publication
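    The multi-stream feature extraction idea — three parallel convolutions with 1×1, 3×3, and 5×5 receptive fields whose outputs are fused — can be sketched as follows. The channel counts and the concatenate-then-1×1-fuse step are illustrative assumptions rather than the paper's exact MSFE sub-block.

    import torch
    import torch.nn as nn

    class MultiStreamFeatureExtraction(nn.Module):
        # A sketch of parallel multi-receptive-field convolutions with a 1x1 fusion.
        def __init__(self, channels):
            super().__init__()
            self.conv1 = nn.Conv2d(channels, channels, kernel_size=1)
            self.conv3 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
            self.conv5 = nn.Conv2d(channels, channels, kernel_size=5, padding=2)
            self.fuse = nn.Conv2d(3 * channels, channels, kernel_size=1)

        def forward(self, x):
            # Extract multi-scale features in parallel, then fuse them.
            multi_scale = torch.cat([self.conv1(x), self.conv3(x), self.conv5(x)], dim=1)
            return self.fuse(multi_scale)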

    Highly sensitive magnetic properties and large linear magnetoresistance in antiferromagnetic CrxSe (0.875 ≤ x ≤ 1) single crystals

    CrxSe (x ≤ 1) is a class of quasi-layered binary compounds with potential applications in spintronics due to its intriguing antiferromagnetic properties. In this work, CrxSe single crystals with high Cr content (x = 0.87, 0.91 and 0.95) were grown, and their magnetic and transport properties were investigated in detail. It is found that with a small increase in Cr content, the Néel temperature (TN) of the samples increases dramatically from 147 K to 257 K, accompanied by obvious changes in the magnetic anisotropy and hysteresis. Field-induced spin-flop transitions were unveiled in these alloys, indicating their comparatively low anisotropy. The magnetoresistance (MR) of the three compounds is positive at low temperatures; in particular, non-saturated linear positive MR was observed in Cr0.91Se and Cr0.95Se, with a large value of 16.2% achieved in Cr0.91Se (10 K, 9 T). The calculated Fermi surface and MR show that the quasi-linear MR is a product of carrier compensation. Our work reveals highly sensitive magnetic and transport properties in the Cr-Se compounds, which lays a foundation for constructing antiferromagnetic spintronic devices based on them.

    A Practical Response Adaptive Block Randomization Design with Analytic Type I Error Protection

    Response adaptive randomization is appealing in confirmatory adaptive clinical trials from statistical, ethical, and pragmatic perspectives, in the sense that subjects are more likely to be randomized to better-performing treatment groups based on accumulating data. The Doubly Adaptive Biased Coin Design (DBCD) is a popular solution due to the asymptotic normality of its final allocations, which further justifies its asymptotic type I error rate control. As an alternative, we propose a Response Adaptive Block Randomization (RABR) design with pre-specified randomization ratios for the control and high-performing groups, which robustly achieves the desired final sample size per group under different underlying responses, as is usually required in industry-sponsored clinical studies. We show that the usual test statistic has a controlled type I error rate. Our simulations further highlight the advantages of the proposed design over the DBCD in terms of consistently achieving the target final sample allocations and in terms of power. We further apply this design to a Phase III study evaluating the efficacy of two dosing regimens of adjunctive everolimus in treating tuberous sclerosis complex, for which no previous dose-finding studies exist in this indication.
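    One way to picture block-wise response-adaptive assignment: within each block, treatment arms are ranked by their observed response rates and given pre-specified ratio weights, while the control ratio stays fixed. The ranking rule, weights, and helper function below are illustrative assumptions for exposition only, not the RABR design's actual specification.

    import random

    def rabr_block(arms, responses, control_weight, ranked_weights, block_size):
        # arms: list of arm names; arms[0] is the control arm.
        # responses: dict mapping arm -> list of 0/1 outcomes observed so far.
        # control_weight / ranked_weights: pre-specified block randomization ratios,
        # with ranked_weights ordered from best- to worst-performing treatment arm.
        def rate(arm):
            obs = responses[arm]
            return sum(obs) / len(obs) if obs else 0.0

        # Rank treatment arms by the accumulating response data.
        treatments = sorted(arms[1:], key=rate, reverse=True)
        weights = {arms[0]: control_weight}
        for arm, w in zip(treatments, ranked_weights):
            weights[arm] = w
        # Assign one block of subjects according to the block ratios.
        return random.choices(list(weights), weights=list(weights.values()), k=block_size)

    # Illustrative usage: the better-performing dose receives the larger weight.
    block = rabr_block(
        arms=['control', 'dose_A', 'dose_B'],
        responses={'control': [0, 1], 'dose_A': [1, 1, 0], 'dose_B': [0, 0, 1]},
        control_weight=4, ranked_weights=[4, 2], block_size=10,
    )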

    GridFormer: Residual Dense Transformer with Grid Structure for Image Restoration in Adverse Weather Conditions

    Image restoration in adverse weather conditions is a difficult task in computer vision. In this paper, we propose a novel transformer-based framework called GridFormer, which serves as a backbone for image restoration under adverse weather conditions. GridFormer is designed in a grid structure using a residual dense transformer block, and it introduces two core designs. First, it uses an enhanced attention mechanism in the transformer layer. The mechanism includes a sampler stage and compact self-attention to improve efficiency, and a local enhancement stage to strengthen local information. Second, we introduce a residual dense transformer block (RDTB) as the final GridFormer layer. This design further improves the network's ability to learn effective features from both preceding and current local features. The GridFormer framework achieves state-of-the-art results on five diverse image restoration tasks in adverse weather conditions, including image deraining, dehazing, deraining & dehazing, desnowing, and multi-weather restoration. The source code and pre-trained models will be released.
    Comment: 17 pages, 12 figures
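    The residual dense connectivity underlying the RDTB — each layer consuming the concatenation of all preceding features, with a residual shortcut around the block — can be illustrated with the sketch below. Plain convolutions stand in for the transformer layers, so this shows only the connectivity pattern, not GridFormer's actual block.

    import torch
    import torch.nn as nn

    class ResidualDenseBlock(nn.Module):
        # Dense connections over preceding features plus a residual shortcut.
        def __init__(self, channels, growth, num_layers=4):
            super().__init__()
            self.layers = nn.ModuleList(
                nn.Conv2d(channels + i * growth, growth, kernel_size=3, padding=1)
                for i in range(num_layers)
            )
            self.fuse = nn.Conv2d(channels + num_layers * growth, channels, kernel_size=1)

        def forward(self, x):
            features = [x]
            for layer in self.layers:
                # Each layer sees all preceding features (dense connectivity).
                features.append(torch.relu(layer(torch.cat(features, dim=1))))
            # Fuse everything and add the residual shortcut.
            return x + self.fuse(torch.cat(features, dim=1))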

    Deep Video Restoration for Under-Display Camera

    Images and videos captured by an Under-Display Camera (UDC) suffer from severe degradation, such as saturation degeneration and color shift. While UDC restoration is a critical task, existing works focus only on images; UDC video restoration (UDC-VR) has not been explored in the community. In this work, we first propose a GAN-based generation pipeline to simulate the realistic UDC degradation process. With this pipeline, we build the first large-scale UDC video restoration dataset, called PexelsUDC, which includes two subsets named PexelsUDC-T and PexelsUDC-P corresponding to different displays for UDC. Using the proposed dataset, we conduct extensive benchmark studies on existing video restoration methods and observe their limitations on the UDC-VR task. To this end, we propose a novel transformer-based baseline method that adaptively enhances degraded videos. The key components of the method are a spatial branch with local-aware transformers, a temporal branch with embedded temporal transformers, and a spatial-temporal fusion module. These components drive the model to fully exploit spatial and temporal information for UDC-VR. Extensive experiments show that our method achieves state-of-the-art performance on PexelsUDC. The benchmark and the baseline method, which will be made public, are expected to promote progress on UDC-VR in the community.
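    A minimal sketch of how per-frame spatial features and aggregated temporal features might be combined in a spatial-temporal fusion step is given below; the gated fusion used here is a common choice assumed for illustration, not the paper's actual module.

    import torch
    import torch.nn as nn

    class SpatialTemporalFusion(nn.Module):
        # Gated fusion of spatial-branch and temporal-branch features for one frame.
        def __init__(self, channels):
            super().__init__()
            self.gate = nn.Sequential(
                nn.Conv2d(2 * channels, channels, kernel_size=3, padding=1),
                nn.Sigmoid(),
            )

        def forward(self, spatial_feat, temporal_feat):
            # Both inputs: (N, C, H, W) features for the current frame.
            g = self.gate(torch.cat([spatial_feat, temporal_feat], dim=1))
            return g * spatial_feat + (1.0 - g) * temporal_feat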

    PromptVC: Flexible Stylistic Voice Conversion in Latent Space Driven by Natural Language Prompts

    Style voice conversion aims to transform the style of source speech to a desired style according to real-world application demands. However, current style voice conversion approaches rely on pre-defined labels or reference speech to control the conversion process, which leads to limited style diversity or falls short in terms of the intuitiveness and interpretability of the style representation. In this study, we propose PromptVC, a novel style voice conversion approach that employs a latent diffusion model to generate a style vector driven by natural language prompts. Specifically, the style vector is extracted by a style encoder during training, and the latent diffusion model is then trained independently to sample the style vector from noise, with this process conditioned on natural language prompts. To improve style expressiveness, we leverage HuBERT to extract discrete tokens and replace them with their K-Means center embeddings to serve as the linguistic content, which minimizes residual style information. Additionally, we deduplicate consecutive identical discrete tokens and employ a differentiable duration predictor to re-predict the duration of each token, which adapts the duration of the same linguistic content to different styles. The subjective and objective evaluation results demonstrate the effectiveness of our proposed system.
    Comment: Submitted to ICASSP 202
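    The token deduplication step — collapsing runs of identical HuBERT tokens while recording how long each run lasted, so that a duration predictor can later re-predict durations for the target style — can be sketched as follows. The helper names and the K-Means center lookup are illustrative assumptions, not the PromptVC code.

    import torch

    def deduplicate_tokens(tokens):
        # tokens: sequence of discrete HuBERT token ids for one utterance.
        # Collapse consecutive duplicates and record each token's duration.
        unique, durations = [], []
        for t in tokens:
            t = int(t)
            if unique and t == unique[-1]:
                durations[-1] += 1
            else:
                unique.append(t)
                durations.append(1)
        return unique, durations

    def tokens_to_center_embeddings(unique_tokens, centers):
        # centers: (K, D) tensor of K-Means cluster centers; each token id indexes
        # its cluster center, which serves as the linguistic content embedding.
        return centers[torch.tensor(unique_tokens)]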