66 research outputs found
A model-free feature selection technique of feature screening and random forest based recursive feature elimination
In this paper, we propose a model-free feature selection method for
ultra-high dimensional data with a very large number of features. It is a
two-phase procedure that combines the fused Kolmogorov filter with random forest
based RFE to remove model limitations and reduce computational complexity. The
method is fully nonparametric and can work with various types of datasets. It
has several appealing characteristics, i.e., accuracy, model-freeness, and
computational efficiency, and can be widely used in practical problems, such as
multiclass classification, nonparametric regression, and Poisson regression,
among others. We show that the proposed method is selection consistent and
$L_2$ consistent under weak regularity conditions. We further demonstrate the
superior performance of the proposed method over other existing methods by
simulations and real data examples.
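As a rough illustration of the screening phase described in the abstract above, the sketch below ranks features by a Kolmogorov-Smirnov-type statistic between class-conditional distributions and keeps the top few. It is a simplified, assumed variant and not the authors' implementation: it handles only a categorical response, takes the maximum statistic over class pairs, and omits both the response slicing and fusion steps of the fused Kolmogorov filter and the second phase (random forest based RFE). The names `ks_statistic` and `kolmogorov_screen` are hypothetical.

```python
from itertools import combinations


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: sup_x |F_a(x) - F_b(x)|."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    best = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Advance both empirical CDFs past every observation <= x.
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        best = max(best, abs(i / len(a) - j / len(b)))
    return best


def kolmogorov_screen(X, y, keep):
    """Score each feature by its largest pairwise KS statistic across classes.

    X is a list of rows, y a list of class labels (assumed categorical).
    Returns (indices of the `keep` highest-scoring features, all scores).
    """
    classes = sorted(set(y))
    scores = []
    for col in range(len(X[0])):
        groups = {
            c: [row[col] for row, label in zip(X, y) if label == c]
            for c in classes
        }
        scores.append(
            max(ks_statistic(groups[c1], groups[c2])
                for c1, c2 in combinations(classes, 2))
        )
    ranked = sorted(range(len(scores)), key=lambda k: scores[k], reverse=True)
    return ranked[:keep], scores


# Toy data: feature 0 separates the classes, feature 1 does not.
X = [[0.1, 5], [0.2, 3], [0.3, 4], [1.1, 5], [1.2, 3], [1.3, 4]]
y = [0, 0, 0, 1, 1, 1]
kept, scores = kolmogorov_screen(X, y, keep=1)
```

In a two-phase pipeline of the kind the abstract describes, the indices in `kept` would then be passed to a wrapper method such as recursive feature elimination for the final selection.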
Dense-ATOMIC: Towards Densely-connected ATOMIC with High Knowledge Coverage and Massive Multi-hop Paths
ATOMIC is a large-scale commonsense knowledge graph (CSKG) containing
everyday if-then knowledge triplets, i.e., {head event, relation, tail event}.
The one-hop annotation manner made ATOMIC a set of independent bipartite
graphs, which ignored the numerous links between events in different bipartite
graphs and consequently caused shortages in knowledge coverage and multi-hop
paths. In this work, we aim to construct Dense-ATOMIC with high knowledge
coverage and massive multi-hop paths. The events in ATOMIC are normalized to a
consistent pattern at first. We then propose a CSKG completion method called
Rel-CSKGC to predict the relation given the head event and the tail event of a
triplet, and train a CSKG completion model based on existing triplets in
ATOMIC. We finally utilize the model to complete the missing links in ATOMIC
and accordingly construct Dense-ATOMIC. Both automatic and human evaluation on
an annotated subgraph of ATOMIC demonstrate the advantage of Rel-CSKGC over
strong baselines. We further conduct extensive evaluations on Dense-ATOMIC in
terms of statistics, human evaluation, and simple downstream tasks, all proving
Dense-ATOMIC's advantages in Knowledge Coverage and Multi-hop Paths. Both the
source code of Rel-CSKGC and Dense-ATOMIC are publicly available on
https://github.com/NUSTM/Dense-ATOMIC.
Comment: Accepted by ACL 2023 Main Conference
VCD: Knowledge Base Guided Visual Commonsense Discovery in Images
Visual commonsense contains knowledge about object properties, relationships,
and behaviors in visual data. Discovering visual commonsense can provide a more
comprehensive and richer understanding of images, and enhance the reasoning and
decision-making capabilities of computer vision systems. However, the visual
commonsense defined in existing visual commonsense discovery studies is
coarse-grained and incomplete. In this work, we draw inspiration from a
commonsense knowledge base ConceptNet in natural language processing, and
systematically define the types of visual commonsense. Based on this, we
introduce a new task, Visual Commonsense Discovery (VCD), aiming to extract
fine-grained commonsense of different types contained within different objects
in the image. We accordingly construct a dataset (VCDD) from Visual Genome and
ConceptNet for VCD, featuring over 100,000 images and 14 million
object-commonsense pairs. We furthermore propose a generative model (VCDM) that
integrates a vision-language model with instruction tuning to tackle VCD.
Automatic and human evaluations demonstrate VCDM's proficiency in VCD,
particularly outperforming GPT-4V in implicit commonsense discovery. The value
of VCD is further demonstrated by its application to two downstream tasks,
including visual commonsense evaluation and visual question answering. The data
and code will be made available on GitHub.
Network Analysis-Based Approach for Exploring the Potential Diagnostic Biomarkers of Acute Myocardial Infarction
Acute myocardial infarction (AMI) is a severe cardiovascular disease that is a serious threat to human life. However, specific diagnostic biomarkers have not been fully clarified, and candidate regulatory targets for AMI have not been identified. To explore the potential diagnostic biomarkers and possible regulatory targets of AMI, we used a network analysis-based approach to analyze microarray expression profiles of peripheral blood from patients with AMI. Significant differentially expressed genes (DEGs) were screened with Limma, and a gene function regulatory network (GO-Tree) was constructed to obtain the inherent affiliation of significant function terms. A pathway action network was constructed, and the signal transfer relationships between pathway terms were mined to investigate the impact of core pathway terms in AMI. Weighted gene co-expression network analysis (WGCNA) was employed to identify significantly altered gene modules and hub genes in the two groups. Subsequently, the transcription regulation network of DEGs was constructed. We found that specific gene modules may provide better insight into the potential diagnostic biomarkers of AMI. Our findings revealed the NCF4, AQP9, NFIL3, DYSF, GZMA, TBX21, PRF1 and PTGDR genes, which were verified by RT-qPCR. TBX21 and PRF1 may be potential candidate diagnostic biomarkers and possible regulatory targets in AMI.
MEMD-ABSA: A Multi-Element Multi-Domain Dataset for Aspect-Based Sentiment Analysis
Aspect-based sentiment analysis is a long-standing research interest in the
field of opinion mining, and in recent years, researchers have gradually
shifted their focus from simple ABSA subtasks to end-to-end multi-element ABSA
tasks. However, the datasets currently used in the research are limited to
individual elements of specific tasks, usually focusing on in-domain settings,
ignoring implicit aspects and opinions, and with a small data scale. To address
these issues, we propose a large-scale Multi-Element Multi-Domain dataset
(MEMD) that covers the four elements across five domains, including nearly
20,000 review sentences and 30,000 quadruples annotated with explicit and
implicit aspects and opinions for ABSA research. Meanwhile, we evaluate
generative and non-generative baselines on multiple ABSA subtasks under the
open domain setting, and the results show that open domain ABSA as well as
mining implicit aspects and opinions remain ongoing challenges to be addressed.
The datasets are publicly released at https://github.com/NUSTM/MEMD-ABSA
CONVERT: Contrastive Graph Clustering with Reliable Augmentation
Contrastive graph node clustering via learnable data augmentation is a
research hotspot in the field of unsupervised graph learning. The existing methods
learn the sampling distribution of a pre-defined augmentation to generate
data-driven augmentations automatically. Although promising clustering
performance has been achieved, we observe that these strategies still rely on
pre-defined augmentations, so the semantics of the augmented graph can easily
drift. The reliability of the augmented view semantics for contrastive learning
cannot be guaranteed, thus limiting the model performance. To address these
problems, we propose a novel CONtrastiVe Graph ClustEring network with Reliable
AugmenTation (CONVERT). Specifically, in our method, the data augmentations are
processed by the proposed reversible perturb-recover network. It distills
reliable semantic information by recovering the perturbed latent embeddings.
Moreover, to further guarantee the reliability of semantics, a novel semantic
loss is presented to constrain the network via quantifying the perturbation and
recovery. Lastly, a label-matching mechanism is designed to guide the model by
clustering information through aligning the semantic labels and the selected
high-confidence clustering pseudo labels. Extensive experimental results on
seven datasets demonstrate the effectiveness of the proposed method. We release
the code and appendix of CONVERT at https://github.com/xihongyang1999/CONVERT
on GitHub
A Survey of Deep Graph Clustering: Taxonomy, Challenge, and Application
Graph clustering, which aims to divide the nodes in the graph into several
distinct clusters, is a fundamental and challenging task. In recent years, deep
graph clustering methods have been increasingly proposed and achieved promising
performance. However, survey papers on this topic are scarce, and a summary of
the field is urgently needed. Motivated by this, this paper
presents the first comprehensive survey of deep graph clustering. Firstly, the
detailed definition of deep graph clustering and the important baseline methods
are introduced. Besides, the taxonomy of deep graph clustering methods is
proposed based on four different criteria including graph type, network
architecture, learning paradigm, and clustering method. In addition, through
the careful analysis of the existing works, the challenges and opportunities
from five perspectives are summarized. Finally, the applications of deep graph
clustering in four domains are presented. It is worth mentioning that a
collection of state-of-the-art deep graph clustering methods including papers,
codes, and datasets is available on GitHub. We hope this work will serve as a
quick guide and help researchers to overcome challenges in this vibrant field.
Comment: 13 pages, 13 figures