66 research outputs found

A model-free feature selection technique of feature screening and random forest based recursive feature elimination

Author: Xia Siwei
Yang Yuehan
Publication date: 14/02/2023

In this paper, we propose a model-free feature selection method for ultra-high-dimensional data with a massive number of features. It is a two-phase procedure that combines the fused Kolmogorov filter with random forest based recursive feature elimination (RFE) to remove model limitations and reduce computational complexity. The method is fully nonparametric and can work with various types of datasets. It has several appealing characteristics, namely accuracy, model-freeness, and computational efficiency, and can be widely used in practical problems such as multiclass classification, nonparametric regression, and Poisson regression. We show that the proposed method is selection consistent and $L_2$ consistent under weak regularity conditions. We further demonstrate the superior performance of the proposed method over existing methods through simulations and real data examples.

arXiv.org e-Print Archive
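As an illustration of the screening phase only, the sketch below ranks features by the two-sample Kolmogorov-Smirnov distance between the two classes of a binary response, which is the basic Kolmogorov filter. It is a simplified stand-in, not the authors' method: as we understand it, the fused variant used in the paper is designed for more general responses (for example by slicing a continuous response), and the random forest based RFE phase is omitted entirely. The function names `ks_statistic` and `kolmogorov_filter` and the toy data are hypothetical.

```python
from typing import List, Sequence


def ks_statistic(a: Sequence[float], b: Sequence[float]) -> float:
    """Two-sample Kolmogorov-Smirnov statistic: max |F_a(x) - F_b(x)|."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    n_a, n_b = len(a_sorted), len(b_sorted)
    i = j = 0
    d = 0.0
    # The empirical CDFs only jump at observed values, so we evaluate them at
    # each distinct value in merged order (ties advance both samples). Once one
    # sample is exhausted its CDF is 1 and the gap can only shrink, so we stop.
    while i < n_a and j < n_b:
        x = min(a_sorted[i], b_sorted[j])
        while i < n_a and a_sorted[i] <= x:
            i += 1
        while j < n_b and b_sorted[j] <= x:
            j += 1
        d = max(d, abs(i / n_a - j / n_b))
    return d


def kolmogorov_filter(X: List[List[float]], y: List[int], keep: int) -> List[int]:
    """Return the indices of the `keep` features with the largest KS score.

    Hypothetical helper: assumes a binary response y with labels 0 and 1.
    """
    n_features = len(X[0])
    scores = []
    for j in range(n_features):
        class0 = [row[j] for row, label in zip(X, y) if label == 0]
        class1 = [row[j] for row, label in zip(X, y) if label == 1]
        scores.append(ks_statistic(class0, class1))
    # sorted() is stable (also with reverse=True), so ties keep feature order.
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return ranked[:keep]


# Toy data: feature 0 separates the classes perfectly; features 1 and 2 do not.
X = [[0.1, 5.0, 3.0], [0.2, 4.0, 1.0], [0.9, 5.1, 2.0], [1.0, 3.9, 2.5]]
y = [0, 0, 1, 1]
print(kolmogorov_filter(X, y, keep=1))  # [0]
```

Features whose class-conditional distributions differ most survive screening; in the two-phase procedure described above, the surviving features would then be handed to the RFE phase.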

Dense-ATOMIC: Towards Densely-connected ATOMIC with High Knowledge Coverage and Massive Multi-hop Paths

Author: Shen Xiangqing
Wu Siwei
Xia Rui
Publication date: 28/05/2023

ATOMIC is a large-scale commonsense knowledge graph (CSKG) containing everyday if-then knowledge triplets, i.e., {head event, relation, tail event}. Its one-hop annotation scheme made ATOMIC a set of independent bipartite graphs, which ignores the numerous links between events in different bipartite graphs and consequently limits knowledge coverage and multi-hop paths. In this work, we aim to construct Dense-ATOMIC with high knowledge coverage and massive multi-hop paths. We first normalize the events in ATOMIC to a consistent pattern. We then propose a CSKG completion method called Rel-CSKGC, which predicts the relation given the head event and the tail event of a triplet, and train a CSKG completion model on the existing triplets in ATOMIC. Finally, we use the model to complete the missing links in ATOMIC and thereby construct Dense-ATOMIC. Both automatic and human evaluations on an annotated subgraph of ATOMIC demonstrate the advantage of Rel-CSKGC over strong baselines. We further conduct extensive evaluations of Dense-ATOMIC in terms of statistics, human evaluation, and simple downstream tasks, all of which demonstrate Dense-ATOMIC's advantages in knowledge coverage and multi-hop paths. Both the Rel-CSKGC source code and Dense-ATOMIC are publicly available at https://github.com/NUSTM/Dense-ATOMIC.
Comment: Accepted by ACL 2023 Main Conference

arXiv.org e-Print Archive
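The completion step described above can be pictured as adding predicted edges between events that sit in different bipartite subgraphs, then counting the multi-hop paths this creates. The sketch below is hypothetical throughout: `toy_predictor` is a dictionary lookup standing in for a trained relation predictor such as Rel-CSKGC, it assumes the predictor may return `None` when no link should be added, it assumes events are already normalized, and the events are made-up examples using ATOMIC-style relation names.

```python
from collections import Counter
from typing import Callable, Iterable, List, Optional, Tuple

Triplet = Tuple[str, str, str]  # (head event, relation, tail event)


def complete_links(
    triplets: List[Triplet],
    candidate_pairs: Iterable[Tuple[str, str]],
    predict_relation: Callable[[str, str], Optional[str]],
) -> List[Triplet]:
    """Add a triplet for every candidate (head, tail) pair the predictor links."""
    seen = set(triplets)
    dense = list(triplets)
    for head, tail in candidate_pairs:
        relation = predict_relation(head, tail)
        if relation is not None and (head, relation, tail) not in seen:
            seen.add((head, relation, tail))
            dense.append((head, relation, tail))
    return dense


def count_two_hop_paths(triplets: List[Triplet]) -> int:
    """Count chains (h, r1, m) -> (m, r2, t), i.e. a tail event reused as a head."""
    unique = set(triplets)
    out_degree = Counter(head for head, _, _ in unique)
    return sum(out_degree[tail] for _, _, tail in unique)


# Toy data: two triplets from separate bipartite subgraphs, so no 2-hop paths.
original = [
    ("X buys a gift", "xIntent", "X wants to please Y"),
    ("X gives Y a gift", "oReact", "Y is happy"),
]
# Hypothetical stand-in for a trained relation predictor.
toy_predictions = {("X wants to please Y", "X gives Y a gift"): "xWant"}


def toy_predictor(head: str, tail: str) -> Optional[str]:
    return toy_predictions.get((head, tail))


dense = complete_links(
    original,
    [("X wants to please Y", "X gives Y a gift"), ("Y is happy", "X buys a gift")],
    toy_predictor,
)
print(count_two_hop_paths(original), count_two_hop_paths(dense))  # 0 2
```

A single predicted link turns a tail event into a head event and chains the two subgraphs together, which is the kind of multi-hop connectivity the abstract says one-hop annotation misses.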

VCD: Knowledge Base Guided Visual Commonsense Discovery in Images

Author: Shen Xiangqing
Song Yurun
Wu Siwei
Xia Rui
Publication date: 27/02/2024

Visual commonsense contains knowledge about object properties, relationships, and behaviors in visual data. Discovering visual commonsense can provide a more comprehensive understanding of images and enhance the reasoning and decision-making capabilities of computer vision systems. However, the visual commonsense defined in existing visual commonsense discovery studies is coarse-grained and incomplete. In this work, we draw inspiration from ConceptNet, a commonsense knowledge base from natural language processing, and systematically define the types of visual commonsense. Based on this, we introduce a new task, Visual Commonsense Discovery (VCD), which aims to extract fine-grained commonsense of different types contained within different objects in an image. We accordingly construct a dataset (VCDD) for VCD from Visual Genome and ConceptNet, featuring over 100,000 images and 14 million object-commonsense pairs. We further propose a generative model (VCDM) that integrates a vision-language model with instruction tuning to tackle VCD. Automatic and human evaluations demonstrate VCDM's proficiency in VCD, particularly in implicit commonsense discovery, where it outperforms GPT-4V. The value of VCD is further demonstrated by its application to two downstream tasks: visual commonsense evaluation and visual question answering. The data and code will be made available on GitHub.

arXiv.org e-Print Archive

Network Analysis-Based Approach for Exploring the Potential Diagnostic Biomarkers of Acute Myocardial Infarction

Author: Jiaqi Chen
Ling Yu
Siwei Zhang
Xia Chen
Publication venue: Frontiers Media SA
Publication date: 01/01/2016

Acute myocardial infarction (AMI) is a severe cardiovascular disease that poses a serious threat to human life. However, its specific diagnostic biomarkers have not been fully clarified, and candidate regulatory targets for AMI have not been identified. To explore potential diagnostic biomarkers and possible regulatory targets of AMI, we used a network analysis-based approach to analyze microarray expression profiles of peripheral blood from patients with AMI. Significant differentially expressed genes (DEGs) were screened with Limma, and a gene function regulatory network (GO-Tree) was constructed to obtain the inherent affiliations among significant function terms. A pathway action network was constructed, and the signal transfer relationships between pathway terms were mined to investigate the impact of core pathway terms in AMI. Subsequently, the transcription regulatory network of the DEGs was constructed. Weighted gene co-expression network analysis (WGCNA) was employed to identify significantly altered gene modules and hub genes between the two groups. We found that specific gene modules may provide better insight into potential diagnostic biomarkers of AMI. Our findings revealed NCF4, AQP9, NFIL3, DYSF, GZMA, TBX21, PRF1, and PTGDR, which were verified by RT-qPCR. TBX21 and PRF1 may be potential candidates for diagnostic biomarkers and possible regulatory targets in AMI.

Directory of Open Access Journals

Frontiers - Publisher Connector

PubMed Central
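WGCNA, mentioned in the abstract above, ranks genes by their connectivity in a soft-thresholded co-expression network. The sketch below assumes the standard unsigned WGCNA adjacency, `|cor(x_i, x_j)| ** beta`, with the commonly used value beta = 6, and treats all genes as a single module; module detection, Limma screening, and RT-qPCR validation are not shown. Gene names and expression values are hypothetical, and every profile must be non-constant so that the correlation is defined.

```python
import math
from typing import Dict, List


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation of two equal-length, non-constant profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def connectivity(expr: Dict[str, List[float]], beta: float = 6.0) -> Dict[str, float]:
    """Connectivity k_i = sum over j != i of |cor(x_i, x_j)| ** beta."""
    genes = list(expr)
    k = {g: 0.0 for g in genes}
    for i, gi in enumerate(genes):
        for gj in genes[i + 1:]:
            a_ij = abs(pearson(expr[gi], expr[gj])) ** beta  # soft-thresholded adjacency
            k[gi] += a_ij
            k[gj] += a_ij
    return k


def hub_gene(expr: Dict[str, List[float]], beta: float = 6.0) -> str:
    """Gene with the highest connectivity (hypothetical helper)."""
    k = connectivity(expr, beta)
    return max(k, key=k.get)


# Toy profiles: G2, G3, G4 are mutually uncorrelated and G1 is their sum,
# so G1 correlates with all three and should emerge as the hub.
expr = {
    "G1": [3.0, -1.0, -1.0, -1.0],
    "G2": [1.0, -1.0, 1.0, -1.0],
    "G3": [1.0, 1.0, -1.0, -1.0],
    "G4": [1.0, -1.0, -1.0, 1.0],
}
print(hub_gene(expr))  # G1
```

Raising correlations to a power suppresses weak links (here each correlation of about 0.577 becomes 1/27), so connectivity is dominated by a gene's strongest co-expression partners.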

MEMD-ABSA: A Multi-Element Multi-Domain Dataset for Aspect-Based Sentiment Analysis

Author: Cai Hongjie
Li Ke
Liu Shijie
Song Nan
Wang Zengzhi
Wu Siwei
Xia Rui
Xie Qiming
Yu Jianfei
Zhao Qiankun
Publication date: 29/06/2023

Aspect-based sentiment analysis (ABSA) is a long-standing research interest in the field of opinion mining, and in recent years researchers have gradually shifted their focus from simple ABSA subtasks to end-to-end multi-element ABSA tasks. However, the datasets currently used in this research are limited to individual elements of specific tasks, usually focus on in-domain settings, ignore implicit aspects and opinions, and are small in scale. To address these issues, we propose a large-scale Multi-Element Multi-Domain dataset (MEMD) that covers the four elements across five domains, including nearly 20,000 review sentences and 30,000 quadruples annotated with explicit and implicit aspects and opinions for ABSA research. We also evaluate generative and non-generative baselines on multiple ABSA subtasks under the open-domain setting, and the results show that open-domain ABSA, as well as mining implicit aspects and opinions, remain open challenges. The datasets are publicly released at https://github.com/NUSTM/MEMD-ABSA.

arXiv.org e-Print Archive
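One way to picture the annotation format above is a quadruple record in which an implicit aspect or opinion has no text span. The sketch below assumes the widely used aspect-category-opinion-sentiment quadruple (the abstract only says "four elements"), uses `None` to mark implicit elements, and scores predictions with exact-match quadruple F1, a common ABSA metric that we assume here rather than take from the paper. The class name, category labels, and sentence are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass(frozen=True)
class Quad:
    """One opinion quadruple; frozen so it is hashable and can live in a set."""
    aspect: Optional[str]   # None marks an implicit aspect
    category: str           # hypothetical label scheme, e.g. "laptop#price"
    opinion: Optional[str]  # None marks an implicit opinion
    sentiment: str          # e.g. "positive", "negative", "neutral"

    @property
    def is_implicit(self) -> bool:
        return self.aspect is None or self.opinion is None


def quad_f1(gold: List[Set[Quad]], pred: List[Set[Quad]]) -> float:
    """Micro F1 over sentences; a quadruple counts only if all four elements match."""
    tp = sum(len(g & p) for g, p in zip(gold, pred))
    if tp == 0:
        return 0.0
    precision = tp / sum(len(p) for p in pred)
    recall = tp / sum(len(g) for g in gold)
    return 2 * precision * recall / (precision + recall)


# Hypothetical sentence: "The battery lasts all day, but it is too pricey."
gold = {
    Quad("battery", "laptop#battery", "lasts all day", "positive"),
    Quad(None, "laptop#price", "too pricey", "negative"),  # implicit aspect
}
pred = {
    Quad("battery", "laptop#battery", "lasts all day", "positive"),
    Quad("pricey", "laptop#price", "too pricey", "negative"),  # wrongly made explicit
}
print(quad_f1([gold], [pred]))  # 0.5
```

Because matching is exact, a model that invents a span for an implicit aspect gets no credit for that quadruple, which is one reason implicit elements are hard to mine.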

CONVERT: Contrastive Graph Clustering with Reliable Augmentation

Author: Li Stan Z.
Liang Ke
Liu Xinwang
Liu Yue
Tan Cheng
Wang Siwei
Xia Jun
Yang Xihong
Zhou Sihang
Zhu En
Publication date: 17/08/2023

Contrastive graph node clustering via learnable data augmentation is an active research topic in unsupervised graph learning. Existing methods learn the sampling distribution of a pre-defined augmentation to generate data-driven augmentations automatically. Although these methods achieve promising clustering performance, we observe that they still rely on pre-defined augmentations, so the semantics of the augmented graph can easily drift. The reliability of the augmented view semantics for contrastive learning cannot be guaranteed, which limits model performance. To address these problems, we propose a novel CONtrastiVe Graph ClustEring network with Reliable AugmenTation (CONVERT). Specifically, the data augmentations are processed by the proposed reversible perturb-recover network, which distills reliable semantic information by recovering the perturbed latent embeddings. Moreover, to further guarantee the reliability of semantics, a novel semantic loss is presented that constrains the network by quantifying the perturbation and recovery. Lastly, a label-matching mechanism guides the model with clustering information by aligning the semantic labels with the selected high-confidence clustering pseudo labels. Extensive experimental results on seven datasets demonstrate the effectiveness of the proposed method. The code and appendix of CONVERT are released at https://github.com/xihongyang1999/CONVERT.

arXiv.org e-Print Archive
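The label-matching step above, aligning semantic labels with high-confidence pseudo labels, can be sketched as follows. This is one plausible reading, not the paper's implementation: it assumes each node has a soft cluster assignment, keeps nodes whose top assignment probability reaches a hypothetical threshold `tau`, and penalizes the semantic-label predictions on those nodes with a cross-entropy against the pseudo labels. The perturb-recover network and the semantic loss are not shown, and all names and numbers are hypothetical.

```python
import math
from typing import List, Tuple


def select_confident(assignments: List[List[float]], tau: float) -> List[Tuple[int, int]]:
    """Return (node, pseudo_label) pairs whose top cluster probability is >= tau."""
    selected = []
    for node, probs in enumerate(assignments):
        label = max(range(len(probs)), key=lambda c: probs[c])
        if probs[label] >= tau:
            selected.append((node, label))
    return selected


def label_matching_loss(semantic_probs: List[List[float]], selected: List[Tuple[int, int]]) -> float:
    """Mean cross-entropy of semantic-label predictions against the pseudo labels."""
    if not selected:
        return 0.0
    eps = 1e-12  # guards against log(0)
    total = sum(-math.log(semantic_probs[node][label] + eps) for node, label in selected)
    return total / len(selected)


# Soft cluster assignments and semantic-label predictions for 3 nodes, 2 clusters.
assignments = [[0.95, 0.05], [0.55, 0.45], [0.10, 0.90]]
semantic_probs = [[0.80, 0.20], [0.50, 0.50], [0.25, 0.75]]
selected = select_confident(assignments, tau=0.85)
print(selected)  # [(0, 0), (2, 1)]
print(label_matching_loss(semantic_probs, selected))
```

Filtering by confidence keeps the ambiguous node 1 out of the loss, so only pseudo labels that are likely to be correct steer the semantic labels.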

A Survey of Deep Graph Clustering: Taxonomy, Challenge, and Application

Author: Guo Xifeng
Li Stan Z.
Liang Ke
Liu Xinwang
Liu Yue
Tu Wenxuan
Wang Siwei
Xia Jun
Yang Xihong
Zhou Sihang
Publication date: 23/11/2022

Graph clustering, which aims to divide the nodes of a graph into several distinct clusters, is a fundamental and challenging task. In recent years, an increasing number of deep graph clustering methods have been proposed and have achieved promising performance. However, surveys of this area are scarce, and a summary of the field is urgently needed. Motivated by this, this paper presents the first comprehensive survey of deep graph clustering. First, the definition of deep graph clustering and the important baseline methods are introduced. Second, a taxonomy of deep graph clustering methods is proposed based on four criteria: graph type, network architecture, learning paradigm, and clustering method. Third, through careful analysis of existing works, the challenges and opportunities are summarized from five perspectives. Finally, the applications of deep graph clustering in four domains are presented. In addition, a collection of state-of-the-art deep graph clustering methods, including papers, code, and datasets, is available on GitHub. We hope this work will serve as a quick guide and help researchers overcome challenges in this vibrant field.
Comment: 13 pages, 13 figures

arXiv.org e-Print Archive