Domain-Specific Bias Filtering for Single Labeled Domain Generalization
Conventional Domain Generalization (CDG) utilizes multiple labeled source
datasets to train a generalizable model for unseen target domains. However, due
to expensive annotation costs, the requirement of labeling all source data is
hard to meet in real-world applications. In this paper, we investigate a
Single Labeled Domain Generalization (SLDG) task with only one source domain
being labeled, which is more practical and challenging than the CDG task. A
major obstacle in the SLDG task is the discriminability-generalization bias:
the discriminative information in the labeled source dataset may contain
domain-specific bias, constraining the generalization of the trained model. To
tackle this challenging task, we propose a novel framework called
Domain-Specific Bias Filtering (DSBF), which initializes a discriminative model
with the labeled source data and then filters out its domain-specific bias with
the unlabeled source data for generalization improvement. We divide the
filtering process into (1) feature extractor debiasing via k-means
clustering-based semantic feature re-extraction and (2) classifier
rectification through attention-guided semantic feature projection. DSBF
unifies the exploration of the labeled and the unlabeled source data to enhance
both the discriminability and the generalization of the trained model. We
further provide theoretical analysis to verify
the proposed domain-specific bias filtering process. Extensive experiments on
multiple datasets show the superior performance of DSBF in tackling both the
challenging SLDG task and the CDG task.
Comment: Accepted by International Journal of Computer Vision (IJCV).
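Below is a minimal sketch, assuming scikit-learn and pre-extracted features, of what the k-means clustering-based semantic feature re-extraction in step (1) could look like; the function name and the centroid-replacement rule are illustrative assumptions, not DSBF's exact procedure.

```python
# Minimal sketch of k-means-based semantic feature re-extraction
# (illustrative assumption, not the paper's exact losses or schedule).
import numpy as np
from sklearn.cluster import KMeans

def reextract_semantic_features(unlabeled_feats: np.ndarray, n_classes: int):
    """Cluster unlabeled source features and return per-sample centroids.

    Replacing each feature with its cluster centroid is one simple way to
    suppress sample-specific (and hence domain-specific) variation while
    keeping the shared semantic structure.
    """
    km = KMeans(n_clusters=n_classes, n_init=10, random_state=0)
    assignments = km.fit_predict(unlabeled_feats)   # (N,) cluster indices
    centroids = km.cluster_centers_                 # (K, D)
    return centroids[assignments], assignments      # (N, D) semantic features

# Usage, with a hypothetical feature extractor `extractor`:
# feats = extractor(unlabeled_images); semantic, _ = reextract_semantic_features(feats, K)
```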
Modality-invariant and Specific Prompting for Multimodal Human Perception Understanding
Understanding human perceptions presents a formidable multimodal challenge
for computers, encompassing aspects such as sentiment tendencies and sense of
humor. While various methods have recently been introduced to extract
modality-invariant and specific information from diverse modalities, with the
goal of enhancing the efficacy of multimodal learning, few works emphasize this
aspect in large language models. In this paper, we introduce a novel multimodal
prompt strategy tailored for tuning large language models. Our method assesses
the correlation among different modalities and isolates the modality-invariant
and specific components, which are then utilized for prompt tuning. This
approach enables large language models to efficiently and effectively
assimilate information from various modalities. Furthermore, our strategy is
designed with scalability in mind, allowing the integration of features from
any modality into pretrained large language models. Experimental results on
public datasets demonstrate that our proposed method significantly improves
performance compared to previous methods.
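As an illustration of the decomposition idea, the PyTorch sketch below splits per-modality features into a shared (invariant) component and per-modality (specific) components that could serve as prompt vectors; the module names, dimensions, and the mean-based invariant estimate are our own assumptions rather than the paper's implementation.

```python
# Sketch: modality-invariant/specific decomposition for prompt tuning
# (assumed architecture, not the paper's implementation).
import torch
import torch.nn as nn

class InvariantSpecificPrompts(nn.Module):
    def __init__(self, feat_dim: int, prompt_dim: int, n_modalities: int):
        super().__init__()
        self.shared = nn.Linear(feat_dim, prompt_dim)    # invariant subspace
        self.private = nn.ModuleList(
            nn.Linear(feat_dim, prompt_dim) for _ in range(n_modalities)
        )

    def forward(self, feats):  # feats: list of (B, feat_dim) tensors
        # Invariant part: average of all modalities in the shared subspace.
        invariant = torch.stack([self.shared(f) for f in feats]).mean(0)  # (B, P)
        # Specific parts: one private projection per modality.
        specific = [proj(f) for proj, f in zip(self.private, feats)]      # (B, P) each
        # Prompts to prepend to the frozen LLM's input embeddings.
        return torch.stack([invariant, *specific], dim=1)                 # (B, 1+M, P)

prompts = InvariantSpecificPrompts(768, 4096, n_modalities=2)(
    [torch.randn(4, 768), torch.randn(4, 768)]
)
```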
M2ORT: Many-To-One Regression Transformer for Spatial Transcriptomics Prediction from Histopathology Images
The advancement of Spatial Transcriptomics (ST) has facilitated the
spatially-aware profiling of gene expressions based on histopathology images.
Although ST data offers valuable insights into the micro-environment of tumors,
its acquisition cost remains high. Therefore, directly predicting the ST
expressions from digital pathology images is desirable. Current methods usually
adopt existing regression backbones for this task, which ignore the inherent
multi-scale hierarchical data structure of digital pathology images. To address
this limitation, we propose M2ORT, a many-to-one regression Transformer that can
accommodate the hierarchical structure of the pathology images through a
decoupled multi-scale feature extractor. Different from traditional models that
are trained with one-to-one image-label pairs, M2ORT accepts multiple pathology
images of different magnifications at a time to jointly predict the gene
expressions at their corresponding common ST spot, aiming to learn a
many-to-one relationship through training. We have tested M2ORT on three public
ST datasets and the experimental results show that M2ORT can achieve
state-of-the-art performance with fewer parameters and floating-point
operations (FLOPs). The code is available at:
https://github.com/Dootmaan/M2ORT/
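The following schematic PyTorch sketch shows the many-to-one setup: several patches of the same ST spot at different magnifications are encoded separately (a stand-in for the decoupled multi-scale feature extractor) and jointly regressed to one gene expression vector. All module choices and shapes are simplified assumptions; see the repository above for the actual model.

```python
# Schematic many-to-one regression: multiple magnifications -> one ST spot
# (simplified stand-in for M2ORT's decoupled multi-scale Transformer).
import torch
import torch.nn as nn

class ManyToOneRegressor(nn.Module):
    def __init__(self, n_scales: int, embed_dim: int, n_genes: int):
        super().__init__()
        # One lightweight patch encoder per magnification ("decoupled" extraction).
        self.encoders = nn.ModuleList(
            nn.Sequential(nn.Conv2d(3, embed_dim, 16, 16), nn.Flatten(2))
            for _ in range(n_scales)
        )
        self.fuse = nn.TransformerEncoderLayer(embed_dim, nhead=8, batch_first=True)
        self.head = nn.Linear(embed_dim, n_genes)

    def forward(self, patches):  # list of (B, 3, H, W), one tensor per scale
        tokens = torch.cat([enc(p).transpose(1, 2) for enc, p in
                            zip(self.encoders, patches)], dim=1)  # (B, T, D)
        return self.head(self.fuse(tokens).mean(dim=1))           # (B, n_genes)

model = ManyToOneRegressor(n_scales=3, embed_dim=128, n_genes=250)
out = model([torch.randn(2, 3, 64, 64) for _ in range(3)])        # (2, 250)
```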
Memory-Inspired Temporal Prompt Interaction for Text-Image Classification
In recent years, large-scale pre-trained multimodal models (LMMs) have emerged
to integrate the vision and language modalities, achieving considerable
success in various natural language processing and computer vision tasks. The
growing size of LMMs, however, results in a significant computational cost for
fine-tuning these models for downstream tasks. Hence, prompt-based interaction
strategies have been studied to align modalities more efficiently. In this
context, we propose a novel prompt-based multimodal interaction strategy
inspired by human memory, namely Memory-Inspired Temporal Prompt Interaction
(MITP). Our proposed method involves two stages, as in human memory: the
acquiring stage, and the consolidation and activation stage. We utilize
temporal prompts on intermediate layers to imitate the acquiring stage,
leverage similarity-based prompt interaction to imitate memory consolidation,
and employ a prompt generation strategy to imitate memory activation. The main
strength of our approach is that the prompt vectors interact on intermediate
layers, enabling sufficient information exchange between modalities with
compressed trainable parameters and memory usage. We achieve competitive
results on several datasets with relatively small memory usage and 2.0M
trainable parameters (about 1% of the pre-trained foundation model).
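A compact sketch of what the similarity-based prompt interaction at one intermediate layer might look like, under our own assumptions about shapes; MITP's full acquiring/consolidation/activation pipeline is more involved.

```python
# Sketch of similarity-gated prompt interaction between two modalities
# at one intermediate layer (shapes and gating rule are assumptions).
import torch
import torch.nn.functional as F

def interact_prompts(p_vis: torch.Tensor, p_txt: torch.Tensor):
    """p_vis, p_txt: (B, n_prompts, D) temporal prompts of two modalities.

    Consolidation: each modality's prompts absorb the other's, weighted by
    cosine similarity. Activation: the mixed prompts are passed on as the
    next layer's input prompts.
    """
    sim = F.cosine_similarity(p_vis, p_txt, dim=-1).unsqueeze(-1)  # (B, n, 1)
    new_vis = p_vis + sim * p_txt   # text evidence gated into vision prompts
    new_txt = p_txt + sim * p_vis   # and vice versa
    return new_vis, new_txt

v, t = interact_prompts(torch.randn(2, 4, 768), torch.randn(2, 4, 768))
```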
Homologous haplotypes, expression, genetic effects and geographic distribution of the wheat yield gene TaGW2
BACKGROUND: TaGW2-6A, cloned in earlier research, strongly influences wheat grain width and thousand-kernel weight (TKW). Here, we mainly analyzed haplotypes of TaGW2-6B, their effects on TKW, and their interaction with haplotypes at TaGW2-6A.
RESULTS: About 2.9 kb of the promoter sequences of TaGW2-6B and TaGW2-6D were cloned in 34 bread wheat cultivars. Eleven SNPs were detected in the promoter region of TaGW2-6B, forming 4 haplotypes, but no divergence was detected in the TaGW2-6D promoter or coding region. Three molecular markers, including CAPS, dCAPS and ACAS, were developed to distinguish the TaGW2-6B haplotypes. Haplotype association analysis indicated that TaGW2-6B has a stronger influence than TaGW2-6A on TKW, and Hap-6B-1 was a favored haplotype increasing grain width and weight that had undergone strong positive selection in global wheat breeding. However, clear geographic differences in the distribution of TaGW2-6A haplotypes were found: Hap-6A-A was favored in Chinese, Australian and Russian cultivars, whereas Hap-6A-G was preferred in European, American and CIMMYT cultivars. This difference might be caused by a difference in flowering and maturity time between the two haplotypes, Hap-6A-A being the earlier type. Haplotype interaction analysis between TaGW2-6A and TaGW2-6B showed additive effects between the favored haplotypes, and Hap-6A-A/Hap-6B-1 was the best combination to increase TKW. Relative expression analysis of the three TaGW2 homoeologous genes in 22 cultivars revealed that TaGW2-6A showed the highest expression, TaGW2-6D the lowest during grain development, and TaGW2-6B was intermediate. Diversity of the three genes was negatively correlated with their effect on TKW.
CONCLUSIONS: Genetic effects, expression patterns and historic changes of haplotypes at the three homoeologous TaGW2 genes influencing yield were dissected in wheat cultivars. Strong and constant selection for favored haplotypes has occurred in global wheat breeding during the past century. This research also provides a valuable case for understanding the interaction of genes that control complex traits in polyploid species.
Iteratively Coupled Multiple Instance Learning from Instance to Bag Classifier for Whole Slide Image Classification
Whole Slide Image (WSI) classification remains challenging due to the
extremely high resolution of WSIs and the absence of fine-grained labels.
Presently, WSI classification is usually formulated as a Multiple Instance
Learning (MIL) problem when only slide-level labels are available. MIL methods
involve a patch embedding process and a bag-level classification process, but
they are prohibitively expensive to train end-to-end. Therefore, existing
methods usually train them separately, or directly skip the training of the
embedder. Such schemes
hinder the patch embedder's access to slide-level labels, resulting in
inconsistencies within the entire MIL pipeline. To overcome this issue, we
propose a novel framework called Iteratively Coupled MIL (ICMIL), which bridges
the loss back-propagation process from the bag-level classifier to the patch
embedder. In ICMIL, we use category information in the bag-level classifier to
guide the patch-level fine-tuning of the patch feature extractor. The refined
embedder then generates better instance representations for achieving a more
accurate bag-level classifier. By coupling the patch embedder and bag
classifier at a low cost, our proposed framework enables information exchange
between the two processes, benefiting the entire MIL classification model. We
tested our framework on two datasets using three different backbones, and our
experimental results demonstrate consistent performance improvements over
state-of-the-art MIL methods. Code will be made available upon acceptance.
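The loop below is a runnable, high-level sketch of one coupling round under our own simplifications: the bag classifier here is a linear layer over mean-pooled instance embeddings, whereas ICMIL proper uses the classifier's category information as a teacher when fine-tuning the embedder.

```python
# Simplified sketch of one ICMIL-style coupling round: train the bag
# classifier, then back-propagate its loss through the patch embedder.
import torch
import torch.nn as nn
import torch.nn.functional as F

def icmil_round(embedder, classifier, slides, labels, opt_cls, opt_emb):
    # Stage 1: freeze the embedder, fit the bag-level classifier.
    for patches, y in zip(slides, labels):
        with torch.no_grad():
            inst = embedder(patches)                       # (n_patches, D)
        loss = F.cross_entropy(classifier(inst.mean(0, keepdim=True)), y)
        opt_cls.zero_grad(); loss.backward(); opt_cls.step()
    # Stage 2: couple back -- propagate the bag-level loss through the
    # embedder so it also benefits from slide-level labels.
    for patches, y in zip(slides, labels):
        loss = F.cross_entropy(classifier(embedder(patches).mean(0, keepdim=True)), y)
        opt_emb.zero_grad(); loss.backward(); opt_emb.step()

embedder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 64))
classifier = nn.Linear(64, 2)
slides = [torch.randn(8, 3, 32, 32) for _ in range(4)]     # 4 toy "slides"
labels = [torch.tensor([i % 2]) for i in range(4)]
icmil_round(embedder, classifier, slides, labels,
            torch.optim.SGD(classifier.parameters(), lr=0.01),
            torch.optim.SGD(embedder.parameters(), lr=0.001))
```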
Super-Resolution Based Patch-Free 3D Image Segmentation with High-Frequency Guidance
High-resolution (HR) 3D images, such as medical Magnetic Resonance Imaging
(MRI) and Computed Tomography (CT) scans, are widely used nowadays. However,
segmentation of these 3D images remains a challenge due to their high spatial
resolution and dimensionality relative to currently limited GPU
memory. Therefore, most existing 3D image segmentation methods use patch-based
models, which have low inference efficiency and ignore global contextual
information. To address these problems, we propose a super-resolution (SR)
based patch-free 3D image segmentation framework that can realize HR
segmentation from a global low-resolution (LR) input. The framework
contains two sub-tasks, of which semantic segmentation is the main task and
super-resolution is an auxiliary task that aids in rebuilding the high-frequency
information from the LR input. To further compensate for the information loss
of the LR input, we propose a High-Frequency Guidance Module (HGM) and design
an efficient selective-cropping algorithm to crop an HR patch from the original
image as restoration guidance. In addition, we also propose a
Task-Fusion Module (TFM) to exploit the interconnections between the
segmentation and SR tasks, realizing joint optimization of the two. At inference time,
only the main segmentation task is needed, while other modules can be removed
for acceleration. The experimental results on two different datasets show that
our framework has a four times higher inference speed compared to traditional
patch-based methods, while its performance also surpasses other patch-based and
patch-free models.
Comment: Version #2 uploaded on Jul 10, 202
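To make the two-task layout concrete, here is a simplified PyTorch sketch: a shared encoder over the low-resolution volume feeds an HR segmentation head (main task) and an SR head (auxiliary task, dropped at inference). HGM and TFM are omitted, and all module choices are our own assumptions.

```python
# Simplified two-task layout: shared LR encoder, HR segmentation head,
# and auxiliary SR head removed at inference (assumed modules, not the
# paper's architecture).
import torch
import torch.nn as nn

class PatchFreeSegSR(nn.Module):
    def __init__(self, n_classes: int, scale: int = 2):
        super().__init__()
        self.encoder = nn.Sequential(nn.Conv3d(1, 16, 3, padding=1), nn.ReLU())
        self.seg_head = nn.Sequential(        # main task: HR segmentation
            nn.Upsample(scale_factor=scale, mode='trilinear', align_corners=False),
            nn.Conv3d(16, n_classes, 1),
        )
        self.sr_head = nn.Sequential(         # auxiliary task: HR restoration
            nn.Upsample(scale_factor=scale, mode='trilinear', align_corners=False),
            nn.Conv3d(16, 1, 1),
        )

    def forward(self, lr_volume, training: bool = True):
        feats = self.encoder(lr_volume)
        seg = self.seg_head(feats)
        # At inference, the SR branch is dropped for acceleration.
        return (seg, self.sr_head(feats)) if training else seg

model = PatchFreeSegSR(n_classes=3)
seg, sr = model(torch.randn(1, 1, 32, 64, 64))   # HR outputs at 2x scale
```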