Interpretation on Multi-modal Visual Fusion
In this paper, we present an analytical framework and a novel metric to shed
light on interpretation in the multi-modal vision community. Our approach
involves measuring the proposed semantic variance and feature similarity across
modalities and levels, and conducting semantic and quantitative analyses
through comprehensive experiments. Specifically, we investigate the consistency
and speciality of representations across modalities, the evolution rules within
each modality, and the collaboration logic used when optimizing a
multi-modality model. Our studies reveal several important findings, such as
the discrepancy in cross-modal features and the hybrid multi-modal cooperation
rule, which highlights consistency and speciality simultaneously for
complementary inference. Through our dissection and findings on multi-modal
fusion, we facilitate a rethinking of the rationale and necessity of
popular multi-modal vision fusion strategies. Furthermore, our work lays the
foundation for designing a trustworthy and universal multi-modal fusion model
for a variety of tasks in the future.
Comment: This version has been under review since 2023/3/
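As a rough illustration of one ingredient the abstract describes, the sketch below measures feature similarity between two modality branches at matched levels. This is not the paper's released code; the level-matched input lists, the global-average-pooling aggregation, and the choice of cosine similarity are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def cross_modal_similarity(feats_a: list[torch.Tensor],
                           feats_b: list[torch.Tensor]) -> list[float]:
    """Cosine similarity between level-matched features of two modalities.

    feats_a / feats_b: per-level feature maps of shape (B, C, H, W),
    e.g. activations from an RGB branch and a depth branch at the same depth.
    """
    sims = []
    for fa, fb in zip(feats_a, feats_b):
        # Global-average-pool each map to a (B, C) descriptor per level
        # (an illustrative choice; other aggregations are possible).
        va = fa.mean(dim=(2, 3))
        vb = fb.mean(dim=(2, 3))
        sims.append(F.cosine_similarity(va, vb, dim=1).mean().item())
    return sims
```

Tracking these per-level scores during training is one simple way to probe where cross-modal representations converge or stay modality-specific.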
DLUNet: Semi-supervised Learning based Dual-Light UNet for Multi-organ Segmentation
Manual ground-truth annotation of abdominal multi-organ CT is labor-intensive.
In order to make full use of CT data, we developed a semi-supervised learning
based dual-light UNet. In the training phase, it consists of two light UNets,
which make full use of labeled and unlabeled data simultaneously through
consistency-based learning. Moreover, separable convolutions and residual
concatenation are introduced into the light UNet to reduce the computational
cost. Further, a robust segmentation loss is applied to improve performance. In
the inference phase, only one light UNet is used, which requires low time cost
and less GPU memory. The average DSC of this method on the
validation set is 0.8718. The code is available at
https://github.com/laihaoran/Semi-SupervisednnUNet.
Comment: 13 pages, 3 figures
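A minimal PyTorch sketch of two ingredients the abstract names: a depthwise-separable convolution (to lighten the UNet) and a consistency loss between the two light UNets' predictions on unlabeled data. All names and the MSE-on-softmax formulation are illustrative assumptions, not the authors' released code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SeparableConv2d(nn.Module):
    """Depthwise conv followed by a 1x1 pointwise conv (fewer params/FLOPs
    than a standard convolution with the same receptive field)."""
    def __init__(self, in_ch: int, out_ch: int, k: int = 3):
        super().__init__()
        self.depthwise = nn.Conv2d(in_ch, in_ch, k, padding=k // 2,
                                   groups=in_ch)
        self.pointwise = nn.Conv2d(in_ch, out_ch, 1)

    def forward(self, x):
        return self.pointwise(self.depthwise(x))

def consistency_loss(logits_1: torch.Tensor,
                     logits_2: torch.Tensor) -> torch.Tensor:
    """Encourage the two light UNets to agree on unlabeled volumes.

    MSE between softmax predictions is one common choice for
    consistency-based semi-supervised segmentation (an assumption here).
    """
    return F.mse_loss(logits_1.softmax(dim=1), logits_2.softmax(dim=1))
```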
Expression levels of microRNAs are not associated with their regulatory activities
MicroRNAs (miRNAs) regulate their targets by triggering mRNA degradation or translational repression. The negative relationship between miRNAs and their targets suggests that the regulatory effect of a miRNA could be determined from the expression levels of its targets. Here, we investigated the relationship between miRNA activities determined by computational programs and miRNA expression levels, using data in which both mRNA and miRNA expression were measured in the same samples. We found that, contrary to intuitive expectation, miRNA activity shows only a very weak correlation with miRNA expression, which indicates complex regulatory mechanisms between miRNAs and their target genes.
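The sketch below illustrates the general shape of such an analysis, not the authors' pipeline: score a miRNA's activity in each sample from the (suppressed) expression of its predicted targets, then correlate that activity with the miRNA's own expression across samples. The activity definition, inputs, and use of Spearman correlation are assumptions for illustration.

```python
import numpy as np
from scipy.stats import spearmanr

def mirna_activity(mrna_expr: np.ndarray, target_idx: np.ndarray) -> np.ndarray:
    """Activity per sample: negative mean z-scored expression of predicted
    targets, since an active miRNA should suppress its targets.

    mrna_expr: (genes, samples) expression matrix.
    target_idx: row indices of the miRNA's predicted target genes.
    """
    z = (mrna_expr - mrna_expr.mean(axis=1, keepdims=True)) \
        / mrna_expr.std(axis=1, keepdims=True)
    return -z[target_idx].mean(axis=0)

def activity_expression_correlation(mirna_expr: np.ndarray,
                                    mrna_expr: np.ndarray,
                                    target_idx: np.ndarray) -> float:
    """Spearman correlation between a miRNA's expression and its inferred
    activity across matched samples; the abstract reports this to be weak."""
    rho, _ = spearmanr(mirna_expr, mirna_activity(mrna_expr, target_idx))
    return rho
```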
Dual Feature Augmentation Network for Generalized Zero-shot Learning
Zero-shot learning (ZSL) aims to infer novel classes without training samples
by transferring knowledge from seen classes. Existing embedding-based
approaches for ZSL typically employ attention mechanisms to locate attributes
on an image. However, these methods often ignore the complex entanglement among
different attributes' visual features in the embedding space. Additionally,
these methods employ a direct attribute prediction scheme for classification,
which does not account for the diversity of attributes in images of the same
category. To address these issues, we propose a novel Dual Feature Augmentation
Network (DFAN), which comprises two feature augmentation modules, one for
visual features and the other for semantic features. The visual feature
augmentation module explicitly learns attribute features and employs cosine
distance to separate them, thus enhancing attribute representation. In the
semantic feature augmentation module, we propose a bias learner to capture the
offset that bridges the gap between actual and predicted attribute values from
a dataset's perspective. Furthermore, we introduce two predictors to reconcile
the conflicts between local and global features. Experimental results on three
benchmarks demonstrate the marked advancement of our method over
state-of-the-art approaches. Our code is available at
https://github.com/Sion1/DFAN.
Comment: Accepted to BMVC202
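A minimal sketch of one idea the abstract names: using cosine distance to separate per-attribute visual features so they disentangle in the embedding space. This is an illustrative loss under assumed shapes, not the released DFAN code.

```python
import torch
import torch.nn.functional as F

def attribute_separation_loss(attr_feats: torch.Tensor) -> torch.Tensor:
    """Penalize pairwise cosine similarity between attribute features.

    attr_feats: (A, D) tensor, one feature vector per attribute.
    Shrinking the off-diagonal similarities pushes attribute
    representations apart on the unit hypersphere.
    """
    f = F.normalize(attr_feats, dim=1)                      # unit-norm rows
    sim = f @ f.t()                                         # (A, A) cosine sims
    off_diag = sim - torch.eye(sim.size(0), device=sim.device)
    return off_diag.abs().mean()
```

Such a term would be added to the main embedding objective with a weighting coefficient; the weight and where the attribute features come from are left open here.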
Autoencoder with Group-based Decoder and Multi-task Optimization for Anomalous Sound Detection
In industry, machine anomalous sound detection (ASD) is in great demand.
However, collecting enough abnormal samples is difficult due to the high cost,
which has spurred the rapid development of unsupervised ASD algorithms.
Autoencoder (AE) based methods have been widely used for unsupervised ASD, but
suffer from problems including the 'shortcut' issue, poor anti-noise ability,
and sub-optimal feature quality. To address these challenges, we propose a new
AE-based framework termed AEGM. Specifically, we first insert an auxiliary
classifier into the AE to enhance ASD in a multi-task learning manner. Then, we
design a group-based decoder structure, accompanied by an adaptive loss
function, to endow the model with domain-specific knowledge. Results on the
DCASE 2021 Task 2 development set show that our method achieves relative
improvements of 13.11% and 15.20% in average AUC over the official AE and
MobileNetV2 baselines, respectively, across the test sets of seven machines.
Comment: Submitted to the 2024 IEEE International Conference on Acoustics,
Speech, and Signal Processing (ICASSP 2024)
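A minimal sketch, under assumptions, of the multi-task idea the abstract describes: an autoencoder trained jointly with an auxiliary classifier (e.g., predicting a machine or section ID) so the latent features carry more than a reconstruction shortcut. Layer sizes, the loss weight, and the class target are illustrative, not the authors' AEGM configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AEWithAuxClassifier(nn.Module):
    def __init__(self, in_dim: int = 640, latent: int = 128,
                 n_classes: int = 7):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU(),
                                     nn.Linear(256, latent))
        self.decoder = nn.Sequential(nn.Linear(latent, 256), nn.ReLU(),
                                     nn.Linear(256, in_dim))
        self.classifier = nn.Linear(latent, n_classes)

    def forward(self, x):
        z = self.encoder(x)
        return self.decoder(z), self.classifier(z)

def multitask_loss(x, recon, logits, labels, alpha: float = 0.5):
    """Reconstruction + weighted auxiliary classification. At test time the
    anomaly score is typically the reconstruction error alone."""
    return F.mse_loss(recon, x) + alpha * F.cross_entropy(logits, labels)
```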