Interpretation on Multi-modal Visual Fusion
In this paper, we present an analytical framework and a novel metric to shed
light on interpretation in the multi-modal vision community. Our approach
involves measuring the proposed semantic variance and feature similarity across
modalities and levels, and conducting semantic and quantitative analyses
through comprehensive experiments. Specifically, we investigate the consistency
and speciality of representations across modalities, the evolution rules within
each modality, and the collaboration logic used when optimizing a
multi-modality model. Our studies reveal several important findings, such as
the discrepancy in cross-modal features and the hybrid multi-modal cooperation
rule, which highlights consistency and speciality simultaneously for
complementary inference. Through our dissection and findings on multi-modal
fusion, we facilitate a rethinking of the rationale and necessity of
popular multi-modal vision fusion strategies. Furthermore, our work lays the
foundation for designing a trustworthy and universal multi-modal fusion model
for a variety of tasks in the future.
Comment: This version has been under review since 2023/3/
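As a rough illustration of one ingredient the abstract describes, the sketch below measures feature similarity between two modality branches at matched levels. This is not the paper's released code; the level-matched input lists, the global-average-pooling aggregation, and the choice of cosine similarity are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def cross_modal_similarity(feats_a: list[torch.Tensor],
                           feats_b: list[torch.Tensor]) -> list[float]:
    """Cosine similarity between level-matched features of two modalities.

    feats_a / feats_b: per-level feature maps of shape (B, C, H, W),
    e.g. activations from an RGB branch and a depth branch at the same depth.
    """
    sims = []
    for fa, fb in zip(feats_a, feats_b):
        # Global-average-pool each map to a (B, C) descriptor per level
        # (an illustrative choice; other aggregations are possible).
        va = fa.mean(dim=(2, 3))
        vb = fb.mean(dim=(2, 3))
        sims.append(F.cosine_similarity(va, vb, dim=1).mean().item())
    return sims
```

Tracking these per-level scores during training is one simple way to probe where cross-modal representations converge or stay modality-specific.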
DLUNet: Semi-supervised Learning based Dual-Light UNet for Multi-organ Segmentation
Manual ground-truth annotation of abdominal multi-organ CT is labor-intensive.
In order to make full use of CT data, we developed a semi-supervised learning
based dual-light UNet. In the training phase, it consists of two light UNets,
which make full use of labeled and unlabeled data simultaneously through
consistency-based learning. Moreover, separable convolutions and residual
concatenation are introduced into the light UNet to reduce the computational
cost. Further, a robust segmentation loss is applied to improve performance. In
the inference phase, only one light UNet is used, which requires low time cost
and less GPU memory. The average DSC of this method on the
validation set is 0.8718. The code is available at
https://github.com/laihaoran/Semi-SupervisednnUNet.
Comment: 13 pages, 3 figures
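A minimal PyTorch sketch of two ingredients the abstract names: a depthwise-separable convolution (to lighten the UNet) and a consistency loss between the two light UNets' predictions on unlabeled data. All names and the MSE-on-softmax formulation are illustrative assumptions, not the authors' released code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SeparableConv2d(nn.Module):
    """Depthwise conv followed by a 1x1 pointwise conv (fewer params/FLOPs
    than a standard convolution with the same receptive field)."""
    def __init__(self, in_ch: int, out_ch: int, k: int = 3):
        super().__init__()
        self.depthwise = nn.Conv2d(in_ch, in_ch, k, padding=k // 2,
                                   groups=in_ch)
        self.pointwise = nn.Conv2d(in_ch, out_ch, 1)

    def forward(self, x):
        return self.pointwise(self.depthwise(x))

def consistency_loss(logits_1: torch.Tensor,
                     logits_2: torch.Tensor) -> torch.Tensor:
    """Encourage the two light UNets to agree on unlabeled volumes.

    MSE between softmax predictions is one common choice for
    consistency-based semi-supervised segmentation (an assumption here).
    """
    return F.mse_loss(logits_1.softmax(dim=1), logits_2.softmax(dim=1))
```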
Expression levels of microRNAs are not associated with their regulatory activities
MicroRNAs (miRNAs) regulate their targets by triggering mRNA degradation or translational repression. The negative relationship between miRNAs and their targets suggests that the regulatory effect of a miRNA could be determined from the expression levels of its targets. Here, we investigated the relationship between miRNA activities determined by computational programs and miRNA expression levels, using data in which both mRNA and miRNA expression were measured in the same samples. We found that, contrary to intuitive expectation, miRNA activity shows only a very weak correlation with miRNA expression, which indicates complex regulatory mechanisms between miRNAs and their target genes.
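The sketch below illustrates the general shape of such an analysis, not the authors' pipeline: score a miRNA's activity in each sample from the (suppressed) expression of its predicted targets, then correlate that activity with the miRNA's own expression across samples. The activity definition, inputs, and use of Spearman correlation are assumptions for illustration.

```python
import numpy as np
from scipy.stats import spearmanr

def mirna_activity(mrna_expr: np.ndarray, target_idx: np.ndarray) -> np.ndarray:
    """Activity per sample: negative mean z-scored expression of predicted
    targets, since an active miRNA should suppress its targets.

    mrna_expr: (genes, samples) expression matrix.
    target_idx: row indices of the miRNA's predicted target genes.
    """
    z = (mrna_expr - mrna_expr.mean(axis=1, keepdims=True)) \
        / mrna_expr.std(axis=1, keepdims=True)
    return -z[target_idx].mean(axis=0)

def activity_expression_correlation(mirna_expr: np.ndarray,
                                    mrna_expr: np.ndarray,
                                    target_idx: np.ndarray) -> float:
    """Spearman correlation between a miRNA's expression and its inferred
    activity across matched samples; the abstract reports this to be weak."""
    rho, _ = spearmanr(mirna_expr, mirna_activity(mrna_expr, target_idx))
    return rho
```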
Dual Feature Augmentation Network for Generalized Zero-shot Learning
Zero-shot learning (ZSL) aims to infer novel classes without training samples
by transferring knowledge from seen classes. Existing embedding-based
approaches for ZSL typically employ attention mechanisms to locate attributes
on an image. However, these methods often ignore the complex entanglement among
different attributes' visual features in the embedding space. Additionally,
these methods employ a direct attribute prediction scheme for classification,
which does not account for the diversity of attributes in images of the same
category. To address these issues, we propose a novel Dual Feature Augmentation
Network (DFAN), which comprises two feature augmentation modules, one for
visual features and the other for semantic features. The visual feature
augmentation module explicitly learns attribute features and employs cosine
distance to separate them, thus enhancing attribute representation. In the
semantic feature augmentation module, we propose a bias learner to capture the
offset that bridges the gap between actual and predicted attribute values from
a dataset's perspective. Furthermore, we introduce two predictors to reconcile
the conflicts between local and global features. Experimental results on three
benchmarks demonstrate the marked advancement of our method over
state-of-the-art approaches. Our code is available at
https://github.com/Sion1/DFAN.
Comment: Accepted to BMVC202
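A minimal sketch of one idea the abstract names: using cosine distance to separate per-attribute visual features so they disentangle in the embedding space. This is an illustrative loss under assumed shapes, not the released DFAN code.

```python
import torch
import torch.nn.functional as F

def attribute_separation_loss(attr_feats: torch.Tensor) -> torch.Tensor:
    """Penalize pairwise cosine similarity between attribute features.

    attr_feats: (A, D) tensor, one feature vector per attribute.
    Shrinking the off-diagonal similarities pushes attribute
    representations apart on the unit hypersphere.
    """
    f = F.normalize(attr_feats, dim=1)                      # unit-norm rows
    sim = f @ f.t()                                         # (A, A) cosine sims
    off_diag = sim - torch.eye(sim.size(0), device=sim.device)
    return off_diag.abs().mean()
```

Such a term would be added to the main embedding objective with a weighting coefficient; the weight and where the attribute features come from are left open here.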
Autoencoder with Group-based Decoder and Multi-task Optimization for Anomalous Sound Detection
In industry, machine anomalous sound detection (ASD) is in great demand.
However, collecting enough abnormal samples is difficult due to the high cost,
which has spurred the rapid development of unsupervised ASD algorithms.
Autoencoder (AE) based methods have been widely used for unsupervised ASD, but
suffer from problems including the 'shortcut' issue, poor anti-noise ability,
and sub-optimal feature quality. To address these challenges, we propose a new
AE-based framework termed AEGM. Specifically, we first insert an auxiliary
classifier into the AE to enhance ASD in a multi-task learning manner. Then, we
design a group-based decoder structure, accompanied by an adaptive loss
function, to endow the model with domain-specific knowledge. Results on the
DCASE 2021 Task 2 development set show that our method achieves relative
improvements of 13.11% and 15.20% in average AUC over the official AE and
MobileNetV2 baselines, respectively, across the test sets of seven machines.
Comment: Submitted to the 2024 IEEE International Conference on Acoustics,
Speech, and Signal Processing (ICASSP 2024)
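A minimal sketch, under assumptions, of the multi-task idea the abstract describes: an autoencoder trained jointly with an auxiliary classifier (e.g., predicting a machine or section ID) so the latent features carry more than a reconstruction shortcut. Layer sizes, the loss weight, and the class target are illustrative, not the authors' AEGM configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AEWithAuxClassifier(nn.Module):
    def __init__(self, in_dim: int = 640, latent: int = 128,
                 n_classes: int = 7):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU(),
                                     nn.Linear(256, latent))
        self.decoder = nn.Sequential(nn.Linear(latent, 256), nn.ReLU(),
                                     nn.Linear(256, in_dim))
        self.classifier = nn.Linear(latent, n_classes)

    def forward(self, x):
        z = self.encoder(x)
        return self.decoder(z), self.classifier(z)

def multitask_loss(x, recon, logits, labels, alpha: float = 0.5):
    """Reconstruction + weighted auxiliary classification. At test time the
    anomaly score is typically the reconstruction error alone."""
    return F.mse_loss(recon, x) + alpha * F.cross_entropy(logits, labels)
```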