
    Learning Domain Invariant Prompt for Vision-Language Models

    Prompt learning is one of the most effective and trending ways to adapt powerful vision-language foundation models such as CLIP to downstream datasets, by tuning learnable prompt vectors with very few samples. However, although prompt learning achieves excellent performance on in-domain data, it still faces the major challenge of generalizing to unseen classes and domains. Some existing prompt learning methods tackle this issue by adaptively generating different prompts for different tokens or domains, but they neglect the ability of the learned prompts to generalize to unseen domains. In this paper, we propose MetaPrompt, a novel prompt learning paradigm that directly generates domain-invariant prompts that can be generalized to unseen domains. Specifically, a dual-modality prompt tuning network is proposed to generate prompts for inputs from both the image and text modalities. With a novel asymmetric contrastive loss, the representation from the original pre-trained vision-language model acts as supervision to enhance the generalization ability of the learned prompt. More importantly, we propose a meta-learning-based prompt tuning algorithm that explicitly constrains the task-specific prompt tuned for one domain or class to also achieve good performance in another domain or class. Extensive experiments on 11 datasets for base-to-new generalization and 4 datasets for domain generalization demonstrate that our method consistently and significantly outperforms existing methods. Comment: 12 pages, 6 figures, 5 tables.
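    The core mechanism the abstract describes, tuning a few learnable prompt vectors that are prepended to the frozen encoder's token embeddings, can be sketched as follows. This is a minimal numpy illustration; the dimensions and names are assumptions for exposition, not the paper's code.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)

    # Hypothetical dimensions: 4 learnable context vectors, 8 class-name tokens,
    # and a CLIP-like embedding width of 512.
    n_ctx, n_tok, d = 4, 8, 512

    # The learnable prompt vectors -- the only parameters updated during tuning.
    prompt_ctx = rng.normal(0.0, 0.02, size=(n_ctx, d))

    # Frozen token embeddings for a class name (e.g. "a photo of a dog"); in a
    # real pipeline these come from the frozen CLIP token embedder.
    class_tokens = rng.normal(0.0, 0.02, size=(n_tok, d))

    # The full prompt fed to the frozen text encoder is the concatenation
    # [ctx_1, ..., ctx_4, class-name tokens].
    full_prompt = np.concatenate([prompt_ctx, class_tokens], axis=0)
    assert full_prompt.shape == (n_ctx + n_tok, d)
    ```

    Because only `prompt_ctx` receives gradients, adaptation touches a few thousand parameters rather than the full model, which is what makes few-sample tuning feasible.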

    A semi-automatic deep learning model based on biparametric MRI scanning strategy to predict bone metastases in newly diagnosed prostate cancer patients

    Objective: To develop a semi-automatic model integrating radiomics, deep learning, and clinical features for bone metastasis (BM) prediction in prostate cancer (PCa) patients using biparametric MRI (bpMRI) images. Methods: A retrospective study included 414 PCa patients (BM, n=136; no BM, n=278) from two institutions (Center 1, n=318; Center 2, n=96) between January 2016 and December 2022. BM status on MRI scans was confirmed via PET-CT or ECT pre-treatment. Tumor areas on bpMRI images were delineated as the tumor's region of interest (ROI) using auto-delineation tumor models, evaluated with the Dice similarity coefficient (DSC). Samples were auto-sketched, refined, and used to train the ResNet BM prediction model. Clinical, radiomics, and deep learning data were synthesized into the ResNet-C model, evaluated using receiver operating characteristic (ROC) analysis. Results: The auto-segmentation model achieved a DSC of 0.607. Clinical BM prediction achieved an accuracy (ACC) of 0.650 and an area under the curve (AUC) of 0.713 in internal validation, and an ACC of 0.668 and AUC of 0.757 in the external cohort. The deep learning model yielded an ACC of 0.875 and AUC of 0.907 for the internal cohort, and an ACC of 0.833 and AUC of 0.862 for the external cohort. The radiomics model registered an ACC of 0.819 and AUC of 0.852 internally, and an ACC of 0.885 and AUC of 0.903 externally. ResNet-C demonstrated the highest performance, with an ACC of 0.902 and AUC of 0.934 for the internal cohort, and an ACC of 0.885 and AUC of 0.903 for the external cohort. Conclusion: The ResNet-C model, utilizing the bpMRI scanning strategy, accurately assesses bone metastasis status in newly diagnosed prostate cancer patients, facilitating precise treatment planning and improving patient prognoses.
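    The AUC values reported above can be computed from predicted scores with the rank-based (Mann-Whitney) formulation: the probability that a randomly chosen positive case outscores a randomly chosen negative one. A self-contained sketch with toy data (not the study's data):

    ```python
    def auc_score(labels, scores):
        """Rank-based AUC: fraction of positive/negative pairs where the
        positive case receives the higher score (ties count as half)."""
        pos = [s for y, s in zip(labels, scores) if y == 1]
        neg = [s for y, s in zip(labels, scores) if y == 0]
        wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
        return wins / (len(pos) * len(neg))

    # Toy example: 3 positives (BM) and 3 negatives (no BM).
    labels = [1, 1, 1, 0, 0, 0]
    scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]
    print(auc_score(labels, scores))  # 8 of 9 pairs ranked correctly, ~0.889
    ```

    A perfect ranking gives 1.0 and a random one about 0.5, which is why an AUC of 0.934 for ResNet-C indicates strong discrimination.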

    Dissecting Arbitrary-scale Super-resolution Capability from Pre-trained Diffusion Generative Models

    Diffusion-based Generative Models (DGMs) have achieved unparalleled performance in synthesizing high-quality visual content, opening up the opportunity to improve image super-resolution (SR) tasks. Recent solutions for these tasks often train architecture-specific DGMs from scratch, or require iterative fine-tuning and distillation of pre-trained DGMs, both of which demand considerable time and hardware investment. More seriously, since DGMs are established with a discrete pre-defined upsampling scale, they cannot meet the emerging requirements of arbitrary-scale super-resolution (ASSR), where a unified model adapts to arbitrary upsampling scales instead of a series of distinct models being prepared for each case. These limitations beg an intriguing question: can we identify the ASSR capability of existing pre-trained DGMs without distillation or fine-tuning? In this paper, we take a step towards resolving this matter by proposing Diff-SR, a first ASSR attempt based solely on pre-trained DGMs, without additional training effort. It is motivated by the finding that a simple methodology, which first injects a specific amount of noise into the low-resolution image before invoking a DGM's backward diffusion process, outperforms current leading solutions. The key insight is determining a suitable amount of noise to inject: too little leads to poor low-level fidelity, while too much degrades the high-level signature. Through a fine-grained theoretical analysis, we propose the Perceptual Recoverable Field (PRF), a metric that achieves the optimal trade-off between these two factors. Extensive experiments verify the effectiveness, flexibility, and adaptability of Diff-SR, demonstrating superior performance to state-of-the-art solutions in diverse ASSR settings.
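    The noise-injection step described above is the standard forward-diffusion map, x_t = sqrt(abar_t) x0 + sqrt(1 - abar_t) eps, applied to the (upsampled) low-resolution image at a chosen step t. A numpy sketch; the schedule values and shapes are illustrative assumptions, not the paper's configuration:

    ```python
    import numpy as np

    rng = np.random.default_rng(0)

    # Illustrative linear beta schedule over T steps (values are assumptions).
    T = 1000
    betas = np.linspace(1e-4, 0.02, T)
    alpha_bar = np.cumprod(1.0 - betas)  # cumulative signal-retention factor

    def inject_noise(x0, t, rng):
        """Forward-diffuse x0 to step t: x_t = sqrt(abar_t)*x0 + sqrt(1-abar_t)*eps."""
        eps = rng.standard_normal(x0.shape)
        return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps

    # Stand-in for an upsampled low-resolution image. A larger t injects more
    # noise: small t preserves low-level detail, large t erases the high-level
    # signature -- the trade-off PRF is designed to balance.
    x0 = rng.standard_normal((64, 64, 3))
    x_small_t = inject_noise(x0, 100, rng)  # mild noise
    x_large_t = inject_noise(x0, 900, rng)  # heavy noise
    ```

    The backward diffusion process is then run from step t down to 0, letting the pre-trained DGM hallucinate detail consistent with the retained signal.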

    ICOSLG-associated immunological landscape and diagnostic value in oral squamous cell carcinoma: a prospective cohort study

    Background: We previously reported that stromal cells regulate constitutive and inducible PD-L1 (B7-H1) expression and immune escape of oral squamous cell carcinoma. ICOSLG (B7-H2), a member of the B7 protein family, also participates in regulating T-cell activation for tissue homeostasis by binding to ICOS and inducing ICOS+ T-cell differentiation, as well as stimulating B-cell activation, while it appears to be abnormally expressed during carcinogenesis. Clarifying its heterogeneous clinical expression pattern and its immune landscape is a prerequisite for achieving the maximum response rate of ICOSLG-based immunotherapy in a specific population. Methods: This retrospective study included OSCC tissue samples (n = 105) to analyze the spatial distribution of ICOSLG. Preoperative peripheral blood samples (n = 104) and independent tissue samples (n = 10) of OSCC were collected to analyze changes in immunocytes (T cells, B cells, NK cells, and macrophages) according to ICOSLG level in different cellular contexts. Results: ICOSLG is ubiquitous in tumor cells (TCs), cancer-associated fibroblasts (CAFs), and tumor-infiltrating lymphocytes (TILs). Patients with high ICOSLG expression in TCs or TILs showed high TNM stage and lymph node metastasis, which predicted decreased overall or metastasis-free survival. This sub-cohort featured diminished CD4+ T cells and increased Foxp3+ cells at the invasive front in situ, and increased absolute numbers of CD3+CD4+ and CD8+ T cells in peripheral blood. ICOSLG also positively correlated with other immune checkpoint molecules (PD-L1, CSF1R, CTLA4, IDO1, IL10, PD1). Conclusion: Tumor cell-derived ICOSLG could be an efficient marker for OSCC patient stratification for precision immunotherapy.

    Open X-Embodiment: Robotic learning datasets and RT-X models

    Large, high-capacity models trained on diverse datasets have shown remarkable success in efficiently tackling downstream applications. In domains from NLP to computer vision, this has led to a consolidation of pretrained models, with general pretrained backbones serving as a starting point for many applications. Can such a consolidation happen in robotics? Conventionally, robotic learning methods train a separate model for every application, every robot, and even every environment. Can we instead train a "generalist" X-robot policy that can be adapted efficiently to new robots, tasks, and environments? In this paper, we provide datasets in standardized data formats and models to make it possible to explore this possibility in the context of robotic manipulation, alongside experimental results that provide an example of effective X-robot policies. We assemble a dataset from 22 different robots, collected through a collaboration between 21 institutions, demonstrating 527 skills (160,266 tasks). We show that a high-capacity model trained on this data, which we call RT-X, exhibits positive transfer and improves the capabilities of multiple robots by leveraging experience from other platforms. The project website is robotics-transformer-x.github.io.
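    A standardized cross-robot episode format of the kind the abstract mentions might, in spirit, look like the following. This is a hypothetical schema sketch for illustration only, not the actual Open X-Embodiment record specification:

    ```python
    from dataclasses import dataclass, field
    from typing import List

    @dataclass
    class Step:
        # Hypothetical per-timestep record: observation, action, instruction.
        image: bytes          # encoded camera frame
        action: List[float]   # e.g. end-effector delta pose + gripper command
        instruction: str      # natural-language task description

    @dataclass
    class Episode:
        robot_type: str       # e.g. "franka" -- the field that enables mixing
        steps: List[Step] = field(default_factory=list)

    # One episode from a hypothetical robot, in the shared format.
    ep = Episode(robot_type="franka")
    ep.steps.append(Step(image=b"", action=[0.0] * 7, instruction="pick up the cup"))
    ```

    The design point is that a shared schema across 22 robot types is what lets a single high-capacity policy consume heterogeneous data and realize positive transfer.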

    Conditional Lie-Bäcklund Symmetry Reductions and Exact Solutions of a Class of Reaction-Diffusion Equations

    The method of conditional Lie-Bäcklund symmetry is applied to solve a class of reaction-diffusion equations u_t + u_{xx} + Q(x)u_x^2 + P(x)u + R(x) = 0, which have a wide range of applications in physics, engineering, chemistry, biology, and financial mathematics. The resulting equations are either solved exactly or reduced to finite-dimensional dynamical systems. The exact solutions obtained in concrete examples possess extended forms of the separation of variables.
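    The reduction described here typically proceeds via a separation-of-variables ansatz whose linear span is left invariant by the conditional Lie-Bäcklund symmetry; schematically (a generic illustration of the technique, not a solution taken from the paper):

    ```latex
    % Seek solutions in a finite-dimensional linear space spanned by
    % functions of x:
    u(x,t) = \sum_{i=1}^{n} c_i(t)\, f_i(x),
    % so that substituting the ansatz into the reaction-diffusion equation
    % turns the PDE into an n-dimensional dynamical system for the
    % coefficients:
    \dot{c}_i(t) = F_i\bigl(c_1(t), \dots, c_n(t)\bigr),
    \qquad i = 1, \dots, n.
    ```

    Solving this finite-dimensional system then yields exact PDE solutions in extended separation-of-variables form, as the abstract states.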

    GMTS: GNN-based multi-scale transformer siamese network for remote sensing building change detection

    With the remarkable success of change detection (CD) in remote sensing images in the context of deep learning, many convolutional neural network (CNN) based methods have been proposed. To obtain better context modeling for remote sensing images and to capture more spatiotemporal characteristics, several attention-based and transformer (TR)-based methods have been proposed, and recent research has continued to innovate on TR-based methods. Most of them, however, require a huge amount of computation to achieve good results; using TR-based methods while keeping the overhead low therefore remains an open problem. Here, we propose GMTS, a GNN-based multi-scale transformer siamese network for remote sensing image change detection that maintains a low network overhead while effectively modeling context in the spatiotemporal domain. We also design a novel hybrid backbone to extract features; compared with current CNN backbones, it has a lower overhead and achieves better results. Further, we use high/low frequency (HiLo) attention to extract more detailed local features and the multi-scale pooling pyramid transformer (MPPT) module to focus on more global features. Finally, we leverage the context modeling capability of the TR in the spatiotemporal domain to optimize the extracted features. Our model has relatively few parameters compared with current TR-based methods while achieving a clear performance improvement, providing a good balance between efficiency and performance.
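    At its simplest, a siamese change-detection pipeline of the kind described above applies one shared feature extractor to both image epochs and compares the results per pixel. A minimal numpy sketch; the linear-map "extractor" and thresholds are stand-ins for exposition, not the GMTS backbone:

    ```python
    import numpy as np

    rng = np.random.default_rng(0)

    def shared_features(img, w):
        """Stand-in feature extractor: one linear map plus nonlinearity,
        applied with identical weights to both branches (siamese)."""
        return np.tanh(img @ w)

    h, wdt, c, d = 32, 32, 3, 8
    w = rng.normal(0.0, 0.1, size=(c, d))

    img_t1 = rng.random((h, wdt, c))   # epoch 1
    img_t2 = img_t1.copy()
    img_t2[8:16, 8:16] += 2.0          # simulated building change in epoch 2

    # Per-pixel feature difference, reduced to a score and thresholded
    # into a binary change map.
    diff = np.abs(shared_features(img_t1, w) - shared_features(img_t2, w))
    change_map = diff.mean(axis=-1) > 0.02
    ```

    Sharing weights between the two branches is the key design choice: identical content yields identical features, so any residual difference is attributable to change rather than to the extractor.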