38 research outputs found

    Domain-Specific Bias Filtering for Single Labeled Domain Generalization

    Full text link
    Conventional Domain Generalization (CDG) utilizes multiple labeled source datasets to train a generalizable model for unseen target domains. However, due to expensive annotation costs, the requirement of labeling all the source data is hard to meet in real-world applications. In this paper, we investigate a Single Labeled Domain Generalization (SLDG) task with only one source domain being labeled, which is more practical and challenging than the CDG task. A major obstacle in the SLDG task is the discriminability-generalization bias: the discriminative information in the labeled source dataset may contain domain-specific bias, constraining the generalization of the trained model. To tackle this challenging task, we propose a novel framework called Domain-Specific Bias Filtering (DSBF), which initializes a discriminative model with the labeled source data and then filters out its domain-specific bias with the unlabeled source data to improve generalization. We divide the filtering process into (1) feature extractor debiasing via k-means clustering-based semantic feature re-extraction and (2) classifier rectification through attention-guided semantic feature projection. DSBF unifies the exploration of the labeled and the unlabeled source data to enhance the discriminability and generalization of the trained model, resulting in a highly generalizable model. We further provide theoretical analysis to verify the proposed domain-specific bias filtering process. Extensive experiments on multiple datasets show the superior performance of DSBF in tackling both the challenging SLDG task and the CDG task. Comment: Accepted by the International Journal of Computer Vision (IJCV).
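    The abstract only names the two filtering steps; the following is a minimal, hypothetical sketch of how they might look using standard PyTorch and scikit-learn. All names (semantic_reextract, rectify_classifier, n_clusters) are illustrative assumptions, not the authors' released code.

```python
import torch
import torch.nn.functional as F
from sklearn.cluster import KMeans

def semantic_reextract(features: torch.Tensor, n_clusters: int = 32) -> torch.Tensor:
    """Step 1 sketch: k-means clustering-based semantic feature re-extraction.
    Features from labeled and unlabeled source data are clustered, and each sample
    is re-expressed through the centroids it is softly assigned to."""
    km = KMeans(n_clusters=n_clusters, n_init=10).fit(features.numpy())
    centroids = torch.from_numpy(km.cluster_centers_).float()        # (K, D)
    assign = F.softmax(-torch.cdist(features, centroids), dim=1)     # (N, K) soft assignment
    return assign @ centroids                                        # (N, D) re-extracted features

def rectify_classifier(weight: torch.Tensor, semantic_feats: torch.Tensor) -> torch.Tensor:
    """Step 2 sketch: attention-guided semantic feature projection. Class weight
    vectors attend over the semantic features and are projected onto them,
    filtering out directions present only in the biased labeled-domain features."""
    attn = F.softmax(weight @ semantic_feats.T / weight.shape[1] ** 0.5, dim=1)  # (C, N)
    return attn @ semantic_feats                                                  # (C, D) rectified weights

feats = torch.randn(500, 128)          # pooled source features (labeled + unlabeled), assumed
cls_weight = torch.randn(7, 128)       # weights of a hypothetical 7-class linear classifier
rectified = rectify_classifier(cls_weight, semantic_reextract(feats))
print(rectified.shape)                 # torch.Size([7, 128])
```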

    Modality-invariant and Specific Prompting for Multimodal Human Perception Understanding

    Full text link
    Understanding human perception presents a formidable multimodal challenge for computers, encompassing aspects such as sentiment tendencies and sense of humor. While various methods have recently been introduced to extract modality-invariant and modality-specific information from diverse modalities, with the goal of enhancing the efficacy of multimodal learning, few works emphasize this aspect in large language models. In this paper, we introduce a novel multimodal prompt strategy tailored for tuning large language models. Our method assesses the correlation among different modalities and isolates the modality-invariant and modality-specific components, which are then utilized for prompt tuning. This approach enables large language models to efficiently and effectively assimilate information from various modalities. Furthermore, our strategy is designed with scalability in mind, allowing the integration of features from any modality into pretrained large language models. Experimental results on public datasets demonstrate that our proposed method significantly improves performance compared to previous methods.
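    As a rough illustration of the invariant/specific split described above, here is a toy PyTorch sketch; the class name InvariantSpecificPrompts, the chosen modalities, and the prompt length are assumptions for illustration, not the paper's implementation.

```python
import torch
import torch.nn as nn

class InvariantSpecificPrompts(nn.Module):
    """Toy decomposition: a shared projector captures a modality-invariant part,
    per-modality projectors capture what remains, and both become prompt tokens."""
    def __init__(self, dim: int, prompt_len: int = 4):
        super().__init__()
        self.shared = nn.Linear(dim, dim)
        self.specific = nn.ModuleDict({m: nn.Linear(dim, dim) for m in ("text", "audio", "video")})
        self.to_prompt = nn.Linear(dim, prompt_len * dim)
        self.prompt_len, self.dim = prompt_len, dim

    def forward(self, feats):  # feats: dict mapping modality name -> (B, dim) features
        invariant = torch.stack([self.shared(f) for f in feats.values()]).mean(0)
        prompts = [self.to_prompt(invariant).view(-1, self.prompt_len, self.dim)]
        for name, f in feats.items():
            specific = self.specific[name](f) - invariant           # remove the shared part
            prompts.append(self.to_prompt(specific).view(-1, self.prompt_len, self.dim))
        # Concatenate along the token axis; these would be prepended to the LLM input.
        return torch.cat(prompts, dim=1)

feats = {m: torch.randn(2, 256) for m in ("text", "audio", "video")}
print(InvariantSpecificPrompts(256)(feats).shape)    # torch.Size([2, 16, 256])
```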

    M2ORT: Many-To-One Regression Transformer for Spatial Transcriptomics Prediction from Histopathology Images

    Full text link
    The advancement of Spatial Transcriptomics (ST) has facilitated the spatially-aware profiling of gene expression based on histopathology images. Although ST data offers valuable insights into the micro-environment of tumors, its acquisition remains expensive. Therefore, directly predicting ST expression from digital pathology images is desirable. Current methods usually adopt existing regression backbones for this task, which ignore the inherent multi-scale hierarchical structure of digital pathology images. To address this limitation, we propose M2ORT, a many-to-one regression Transformer that accommodates the hierarchical structure of pathology images through a decoupled multi-scale feature extractor. Different from traditional models that are trained with one-to-one image-label pairs, M2ORT accepts multiple pathology images of different magnifications at a time and jointly predicts the gene expression at their corresponding common ST spot, aiming to learn a many-to-one relationship through training. We have tested M2ORT on three public ST datasets, and the experimental results show that M2ORT achieves state-of-the-art performance with fewer parameters and floating-point operations (FLOPs). The code is available at: https://github.com/Dootmaan/M2ORT/
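    The many-to-one idea (several magnifications of the same spot mapping to one expression vector) can be sketched roughly as follows; the encoder design, fusion layer, and gene count are placeholders, not M2ORT's actual architecture.

```python
import torch
import torch.nn as nn

class ManyToOneRegressor(nn.Module):
    """Toy many-to-one regressor: one encoder per magnification level, a shared
    fusion transformer layer, and a regression head over the fused representation."""
    def __init__(self, n_levels: int = 3, dim: int = 128, n_genes: int = 250):
        super().__init__()
        self.encoders = nn.ModuleList(
            [nn.Sequential(nn.Conv2d(3, dim, 16, 16), nn.AdaptiveAvgPool2d(1), nn.Flatten())
             for _ in range(n_levels)]
        )
        self.fuse = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.head = nn.Linear(dim, n_genes)

    def forward(self, patches):                    # list of (B, 3, H, W), one per magnification
        tokens = torch.stack([enc(p) for enc, p in zip(self.encoders, patches)], dim=1)
        fused = self.fuse(tokens).mean(dim=1)      # aggregate the per-scale tokens
        return self.head(fused)                    # (B, n_genes) predicted ST expression

patches = [torch.randn(2, 3, 224, 224) for _ in range(3)]
print(ManyToOneRegressor()(patches).shape)         # torch.Size([2, 250])
```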

    Memory-Inspired Temporal Prompt Interaction for Text-Image Classification

    Full text link
    In recent years, large-scale pre-trained multimodal models (LMMs) have emerged to integrate the vision and language modalities, achieving considerable success in various natural language processing and computer vision tasks. The growing size of LMMs, however, results in a significant computational cost when fine-tuning these models for downstream tasks. Hence, prompt-based interaction strategies have been studied to align modalities more efficiently. In this context, we propose a novel prompt-based multimodal interaction strategy inspired by human memory, namely Memory-Inspired Temporal Prompt Interaction (MITP). Our proposed method involves two stages, as in human memory: the acquiring stage, and the consolidation and activation stage. We utilize temporal prompts on intermediate layers to imitate the acquiring stage, leverage similarity-based prompt interaction to imitate memory consolidation, and employ a prompt generation strategy to imitate memory activation. The main strength of our method is that prompt vectors interact on intermediate layers, enabling sufficient information exchange between modalities with compressed trainable parameters and memory usage. We achieve competitive results on several datasets with relatively small memory usage and 2.0M trainable parameters (about 1% of the pre-trained foundation model).
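    A hedged sketch of what similarity-based prompt interaction at one intermediate layer could look like is given below; the function name interact_prompts and the prompt shapes are assumptions for illustration, not the MITP implementation.

```python
import torch
import torch.nn.functional as F

def interact_prompts(vision_prompts: torch.Tensor, text_prompts: torch.Tensor):
    """Toy similarity-based interaction at one intermediate layer: each vision
    prompt absorbs the text prompts it is most similar to, and vice versa,
    before being passed on with the frozen backbone features to the next layer."""
    sim = F.softmax(vision_prompts @ text_prompts.transpose(-2, -1)
                    / vision_prompts.shape[-1] ** 0.5, dim=-1)        # (B, Pv, Pt)
    vision_updated = vision_prompts + sim @ text_prompts              # consolidation step
    text_updated = text_prompts + sim.transpose(-2, -1) @ vision_prompts
    return vision_updated, text_updated

v, t = torch.randn(2, 4, 256), torch.randn(2, 4, 256)
v2, t2 = interact_prompts(v, t)
print(v2.shape, t2.shape)    # torch.Size([2, 4, 256]) torch.Size([2, 4, 256])
```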

    Homologous haplotypes, expression, genetic effects and geographic distribution of the wheat yield gene TaGW2

    Get PDF
    BACKGROUND: TaGW2-6A, cloned in earlier research, strongly influences wheat grain width and thousand-kernel weight (TKW). Here, we mainly analyzed haplotypes of TaGW2-6B, their effects on TKW, and their interaction with haplotypes at TaGW2-6A. RESULTS: About 2.9 kb of the promoter sequences of TaGW2-6B and TaGW2-6D were cloned in 34 bread wheat cultivars. Eleven SNPs were detected in the promoter region of TaGW2-6B, forming four haplotypes, but no divergence was detected in the TaGW2-6D promoter or coding region. Three molecular markers, CAPS, dCAPS and ACAS, were developed to distinguish the TaGW2-6B haplotypes. Haplotype association analysis indicated that TaGW2-6B has a stronger influence than TaGW2-6A on TKW, and Hap-6B-1 was a favored haplotype increasing grain width and weight that had undergone strong positive selection in global wheat breeding. However, clear geographic differences in the distribution of TaGW2-6A haplotypes were found: Hap-6A-A was favored in Chinese, Australian and Russian cultivars, whereas Hap-6A-G was preferred in European, American and CIMMYT cultivars. This difference might be caused by a difference in flowering and maturity time between the two haplotypes, Hap-6A-A being the earlier type. Haplotype interaction analysis between TaGW2-6A and TaGW2-6B showed additive effects between the favored haplotypes, and Hap-6A-A/Hap-6B-1 was the best combination for increasing TKW. Relative expression analysis of the three TaGW2 homoeologous genes in 22 cultivars revealed that TaGW2-6A had the highest expression, TaGW2-6D was the least expressed during grain development, and TaGW2-6B was intermediate. Diversity of the three genes was negatively correlated with their effect on TKW. CONCLUSIONS: The genetic effects, expression patterns and historic changes of haplotypes at the three homoeologous TaGW2 genes influencing yield were dissected in wheat cultivars. Strong and consistent selection for favored haplotypes has occurred in global wheat breeding during the past century. This research also provides a valuable case for understanding the interaction of genes that control complex traits in polyploid species.

    Iteratively Coupled Multiple Instance Learning from Instance to Bag Classifier for Whole Slide Image Classification

    Full text link
    Whole Slide Image (WSI) classification remains challenging due to the extremely high resolution of WSIs and the absence of fine-grained labels. At present, WSI classification is usually formulated as a Multiple Instance Learning (MIL) problem when only slide-level labels are available. MIL methods involve a patch embedding process and a bag-level classification process, but they are prohibitively expensive to train end-to-end. Therefore, existing methods usually train them separately, or directly skip the training of the embedder. Such schemes hinder the patch embedder's access to slide-level labels, resulting in inconsistencies within the entire MIL pipeline. To overcome this issue, we propose a novel framework called Iteratively Coupled MIL (ICMIL), which bridges the loss back-propagation process from the bag-level classifier to the patch embedder. In ICMIL, we use category information from the bag-level classifier to guide the patch-level fine-tuning of the patch feature extractor. The refined embedder then generates better instance representations, yielding a more accurate bag-level classifier. By coupling the patch embedder and bag classifier at a low cost, our proposed framework enables information exchange between the two processes, benefiting the entire MIL classification model. We tested our framework on two datasets using three different backbones, and our experimental results demonstrate consistent performance improvements over state-of-the-art MIL methods. Code will be made available upon acceptance.
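    The coupling loop can be pictured roughly as below; this is a toy sketch under assumed components (a linear patch embedder, a mean-pool aggregator, a linear bag classifier), not the ICMIL implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MeanPoolAggregator(nn.Module):
    """Pools the patch embeddings of one slide into a single bag representation."""
    def forward(self, emb):                        # (n_patches, dim) -> (1, dim)
        return emb.mean(dim=0, keepdim=True)

def coupling_round(embedder, aggregator, bag_classifier, bags, bag_labels, lr=1e-4):
    """One toy coupling round: (1) fit the bag-level classifier on frozen patch
    embeddings, then (2) let the bag-level loss flow back into the patch embedder."""
    # Phase 1: train the bag classifier with the embedder frozen.
    opt_cls = torch.optim.Adam(bag_classifier.parameters(), lr=lr)
    for patches, label in zip(bags, bag_labels):
        with torch.no_grad():
            emb = embedder(patches)
        loss = F.cross_entropy(bag_classifier(aggregator(emb)), label)
        opt_cls.zero_grad(); loss.backward(); opt_cls.step()
    # Phase 2: freeze the classifier and fine-tune the embedder under its guidance.
    for p in bag_classifier.parameters():
        p.requires_grad_(False)
    opt_emb = torch.optim.Adam(embedder.parameters(), lr=lr)
    for patches, label in zip(bags, bag_labels):
        loss = F.cross_entropy(bag_classifier(aggregator(embedder(patches))), label)
        opt_emb.zero_grad(); loss.backward(); opt_emb.step()

embedder = nn.Linear(32, 64)                       # stands in for a patch feature extractor
aggregator = MeanPoolAggregator()
bag_classifier = nn.Linear(64, 2)
bags = [torch.randn(50, 32) for _ in range(4)]     # 4 toy slides, 50 patch vectors each
bag_labels = [torch.tensor([i % 2]) for i in range(4)]
coupling_round(embedder, aggregator, bag_classifier, bags, bag_labels)
```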

    Super-Resolution Based Patch-Free 3D Image Segmentation with High-Frequency Guidance

    Full text link
    High-resolution (HR) 3D images, such as medical Magnetic Resonance Imaging (MRI) and Computed Tomography (CT) scans, are widely used nowadays. However, segmentation of these 3D images remains a challenge due to their high spatial resolution and dimensionality relative to currently limited GPU memory. Therefore, most existing 3D image segmentation methods use patch-based models, which have low inference efficiency and ignore global contextual information. To address these problems, we propose a super-resolution (SR) based patch-free 3D image segmentation framework that can realize HR segmentation from a global low-resolution (LR) input. The framework contains two sub-tasks, of which semantic segmentation is the main task and super-resolution is an auxiliary task that helps rebuild the high-frequency information from the LR input. To further compensate for the information loss caused by the LR input, we propose a High-Frequency Guidance Module (HGM) and design an efficient selective cropping algorithm to crop an HR patch from the original image as restoration guidance. In addition, we propose a Task-Fusion Module (TFM) to exploit the interconnections between the segmentation and SR tasks, realizing joint optimization of the two tasks. At inference time, only the main segmentation task is needed, while the other modules can be removed for acceleration. Experimental results on two different datasets show that our framework has a four times higher inference speed compared to traditional patch-based methods, while its performance also surpasses other patch-based and patch-free models. Comment: Version #2 uploaded on Jul 10, 202
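    As a rough sketch of the dual-task design (a shared encoder, a segmentation head, and an auxiliary SR head that can be dropped at inference), the following toy PyTorch module is given; the layer sizes and module names are placeholders, not the paper's architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PatchFreeSegSR(nn.Module):
    """Toy dual-task network: a shared 3D encoder on the low-resolution volume,
    a segmentation head and an auxiliary super-resolution head, both upsampling
    back to the original high resolution."""
    def __init__(self, n_classes: int = 2, scale: int = 2, width: int = 16):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv3d(1, width, 3, padding=1), nn.ReLU(),
            nn.Conv3d(width, width, 3, padding=1), nn.ReLU(),
        )
        self.seg_head = nn.Conv3d(width, n_classes, 1)
        self.sr_head = nn.Conv3d(width, 1, 1)
        self.scale = scale

    def forward(self, lr_volume):                  # (B, 1, D, H, W) low-resolution input
        feats = self.encoder(lr_volume)
        up = lambda x: F.interpolate(x, scale_factor=self.scale,
                                     mode="trilinear", align_corners=False)
        return up(self.seg_head(feats)), up(self.sr_head(feats))  # HR segmentation, HR restoration

model = PatchFreeSegSR()
seg_logits, sr_out = model(torch.randn(1, 1, 32, 64, 64))
# Joint training objective: segmentation loss plus auxiliary reconstruction loss.
hr_seg_target = torch.randint(0, 2, (1, 64, 128, 128))
hr_volume = torch.randn(1, 1, 64, 128, 128)
loss = F.cross_entropy(seg_logits, hr_seg_target) + F.l1_loss(sr_out, hr_volume)
```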