
    A model-free feature selection technique of feature screening and random forest based recursive feature elimination

    In this paper, we propose a model-free feature selection method for ultra-high dimensional data with a massive number of features. It is a two-phase procedure that combines the fused Kolmogorov filter with random forest based recursive feature elimination (RFE) to remove model limitations and reduce computational complexity. The method is fully nonparametric and can work with various types of datasets. It has several appealing characteristics, namely accuracy, model-freeness, and computational efficiency, and can be widely applied to practical problems such as multiclass classification, nonparametric regression, and Poisson regression. We show that the proposed method is selection consistent and $L_2$ consistent under weak regularity conditions. We further demonstrate the superior performance of the proposed method over existing methods through simulations and real data examples.
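    The screening phase can be illustrated with a minimal sketch of the basic (unfused) Kolmogorov filter for a binary response: each feature is scored by the two-sample Kolmogorov-Smirnov distance between its class-conditional empirical distributions, and only the top-ranked features are kept. The names `ks_statistic`, `kolmogorov_filter`, and `keep` are hypothetical. The fused variant used in the paper, which extends the score beyond a binary response, and the random forest RFE phase are not shown; this is not the authors' implementation.

```python
from bisect import bisect_right
from typing import Sequence


def ks_statistic(a: Sequence[float], b: Sequence[float]) -> float:
    """Two-sample Kolmogorov-Smirnov statistic: sup_x |F_a(x) - F_b(x)|."""
    sa, sb = sorted(a), sorted(b)
    best = 0.0
    # The difference of two empirical CDFs only changes at sample points,
    # so checking every pooled sample point finds the supremum.
    for x in sa + sb:
        fa = bisect_right(sa, x) / len(sa)
        fb = bisect_right(sb, x) / len(sb)
        best = max(best, abs(fa - fb))
    return best


def kolmogorov_filter(X: Sequence[Sequence[float]], y: Sequence[int], keep: int) -> list[int]:
    """Hypothetical screening step: rank features by KS distance between the
    class-0 and class-1 samples of each feature, and return the top `keep` indices."""
    n_features = len(X[0])
    scores = []
    for j in range(n_features):
        col0 = [row[j] for row, label in zip(X, y) if label == 0]
        col1 = [row[j] for row, label in zip(X, y) if label == 1]
        scores.append((ks_statistic(col0, col1), j))
    # Highest score first; ties broken by the lower feature index.
    scores.sort(key=lambda s: (-s[0], s[1]))
    return [j for _, j in scores[:keep]]


# Toy data (hypothetical): feature 0 separates the classes perfectly.
X = [[0.1, 5.0, 1.0], [0.2, 4.0, 2.0], [0.3, 6.0, 1.5],
     [0.9, 5.5, 1.2], [1.0, 4.5, 2.2], [1.1, 5.2, 1.8]]
y = [0, 0, 0, 1, 1, 1]
print(kolmogorov_filter(X, y, keep=1))  # → [0]
```

Because the score depends only on empirical distribution functions, it is unchanged by strictly monotone transformations of a feature, which is one reason such filters need no model assumptions.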

    Dense-ATOMIC: Towards Densely-connected ATOMIC with High Knowledge Coverage and Massive Multi-hop Paths

    ATOMIC is a large-scale commonsense knowledge graph (CSKG) containing everyday if-then knowledge triplets, i.e., {head event, relation, tail event}. Because it was annotated one hop at a time, ATOMIC consists of a set of independent bipartite graphs that ignore the numerous links between events in different bipartite graphs, which limits both its knowledge coverage and its multi-hop paths. In this work, we aim to construct Dense-ATOMIC with high knowledge coverage and massive multi-hop paths. We first normalize the events in ATOMIC to a consistent pattern. We then propose a CSKG completion method called Rel-CSKGC, which predicts the relation given the head event and the tail event of a triplet, and train a CSKG completion model on existing triplets in ATOMIC. Finally, we use the model to complete the missing links in ATOMIC and thereby construct Dense-ATOMIC. Both automatic and human evaluations on an annotated subgraph of ATOMIC demonstrate the advantage of Rel-CSKGC over strong baselines. We further conduct extensive evaluations of Dense-ATOMIC in terms of statistics, human evaluation, and simple downstream tasks, all of which demonstrate Dense-ATOMIC's advantages in knowledge coverage and multi-hop paths. The source code of Rel-CSKGC and Dense-ATOMIC is publicly available at https://github.com/NUSTM/Dense-ATOMIC. Comment: Accepted by ACL 2023 Main Conference
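    To make the multi-hop argument concrete, the sketch below counts two-hop paths (head, relation, intermediate event, relation, tail) in a list of triplets. The events are hypothetical, `xWant` and `oReact` are ATOMIC-style relation labels used only for illustration, the function `count_two_hop_paths` is a hypothetical helper, and the added triplet merely stands in for a link that a completion model such as Rel-CSKGC might predict. This is not the Dense-ATOMIC pipeline.

```python
from collections import defaultdict

# A triplet is (head event, relation, tail event).
Triplet = tuple[str, str, str]


def count_two_hop_paths(triplets: list[Triplet]) -> int:
    """Count paths h -r1-> m -r2-> t, where m is the tail of one triplet
    and the head of another."""
    out_edges: dict[str, list[tuple[str, str]]] = defaultdict(list)
    for head, rel, tail in triplets:
        out_edges[head].append((rel, tail))
    # Each triplet ending at m can be extended along every edge leaving m.
    return sum(len(out_edges.get(mid, [])) for _, _, mid in triplets)


# Hypothetical one-hop annotations: the tail "to wrap it" never reappears
# as a head, so the two bipartite pieces stay disconnected.
one_hop: list[Triplet] = [
    ("PersonX buys a gift", "xWant", "to wrap it"),
    ("PersonX wraps a gift", "oReact", "happy"),
]

# A hypothetical predicted link between two existing events joins the pieces.
dense = one_hop + [("PersonX buys a gift", "xWant", "PersonX wraps a gift")]

print(count_two_hop_paths(one_hop))  # → 0
print(count_two_hop_paths(dense))    # → 1
```

The path appears only because the linked tail is written in the same form as an existing head, which is why normalizing events to a consistent pattern comes before link completion.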

    VCD: Knowledge Base Guided Visual Commonsense Discovery in Images

    Visual commonsense contains knowledge about object properties, relationships, and behaviors in visual data. Discovering visual commonsense can provide a more comprehensive and richer understanding of images and enhance the reasoning and decision-making capabilities of computer vision systems. However, the visual commonsense defined in existing visual commonsense discovery studies is coarse-grained and incomplete. In this work, we draw inspiration from ConceptNet, a commonsense knowledge base used in natural language processing, and systematically define the types of visual commonsense. Based on this, we introduce a new task, Visual Commonsense Discovery (VCD), which aims to extract fine-grained commonsense of different types associated with different objects in an image. We accordingly construct a dataset (VCDD) for VCD from Visual Genome and ConceptNet, featuring over 100,000 images and 14 million object-commonsense pairs. We further propose a generative model (VCDM) that integrates a vision-language model with instruction tuning to tackle VCD. Automatic and human evaluations show that VCDM performs well on VCD, notably outperforming GPT-4V in implicit commonsense discovery. The value of VCD is further demonstrated by its application to two downstream tasks: visual commonsense evaluation and visual question answering. The data and code will be made available on GitHub.

    Network Analysis-Based Approach for Exploring the Potential Diagnostic Biomarkers of Acute Myocardial Infarction

    Acute myocardial infarction (AMI) is a severe cardiovascular disease that poses a serious threat to human life. However, its specific diagnostic biomarkers have not been fully clarified, and candidate regulatory targets for AMI have not been identified. To explore potential diagnostic biomarkers and possible regulatory targets of AMI, we used a network analysis-based approach to analyze microarray expression profiles of peripheral blood from patients with AMI. Significant differentially expressed genes (DEGs) were screened using Limma, and a gene function regulatory network (GO-Tree) was constructed to obtain the inherent affiliations among significant functional terms. A pathway action network was constructed, and the signal transfer relationships between pathway terms were mined to investigate the impact of core pathway terms in AMI. Weighted gene co-expression network analysis (WGCNA) was employed to identify significantly altered gene modules and hub genes in the two groups. Subsequently, the transcription regulatory network of DEGs was constructed. We found that specific gene modules may provide better insight into potential diagnostic biomarkers of AMI. We identified NCF4, AQP9, NFIL3, DYSF, GZMA, TBX21, PRF1 and PTGDR as candidate genes and verified them by RT-qPCR. TBX21 and PRF1 may be potential diagnostic biomarkers and possible regulatory targets in AMI.

    MEMD-ABSA: A Multi-Element Multi-Domain Dataset for Aspect-Based Sentiment Analysis

    Aspect-based sentiment analysis (ABSA) is a long-standing research interest in opinion mining, and in recent years researchers have gradually shifted their focus from simple ABSA subtasks to end-to-end multi-element ABSA tasks. However, the datasets currently used in this research are limited to individual elements of specific tasks, usually focus on in-domain settings, ignore implicit aspects and opinions, and are small in scale. To address these issues, we propose a large-scale Multi-Element Multi-Domain dataset (MEMD) that covers the four elements across five domains, including nearly 20,000 review sentences and 30,000 quadruples annotated with explicit and implicit aspects and opinions for ABSA research. We also evaluate generative and non-generative baselines on multiple ABSA subtasks under the open-domain setting, and the results show that open-domain ABSA and the mining of implicit aspects and opinions remain open challenges. The datasets are publicly released at https://github.com/NUSTM/MEMD-ABSA.

    CONVERT: Contrastive Graph Clustering with Reliable Augmentation

    Contrastive graph node clustering via learnable data augmentation is an active research topic in unsupervised graph learning. Existing methods learn the sampling distribution of a pre-defined augmentation to generate data-driven augmentations automatically. Although these methods achieve promising clustering performance, we observe that they still rely on pre-defined augmentations, so the semantics of the augmented graph can easily drift. As a result, the reliability of the augmented-view semantics for contrastive learning cannot be guaranteed, which limits model performance. To address these problems, we propose a novel CONtrastiVe Graph ClustEring network with Reliable AugmenTation (CONVERT). Specifically, the data augmentations are processed by the proposed reversible perturb-recover network, which distills reliable semantic information by recovering the perturbed latent embeddings. To further guarantee the reliability of semantics, a novel semantic loss is presented that constrains the network by quantifying the perturbation and recovery. Lastly, a label-matching mechanism is designed to guide the model with clustering information by aligning the semantic labels with the selected high-confidence clustering pseudo labels. Extensive experimental results on seven datasets demonstrate the effectiveness of the proposed method. We release the code and appendix of CONVERT at https://github.com/xihongyang1999/CONVERT.

    A Survey of Deep Graph Clustering: Taxonomy, Challenge, and Application

    Graph clustering, which aims to divide the nodes in a graph into several distinct clusters, is a fundamental and challenging task. In recent years, an increasing number of deep graph clustering methods have been proposed and have achieved promising performance. However, surveys of this area are scarce, and a summary of the field is urgently needed. Motivated by this, this paper presents the first comprehensive survey of deep graph clustering. First, the detailed definition of deep graph clustering and the important baseline methods are introduced. Next, a taxonomy of deep graph clustering methods is proposed based on four criteria: graph type, network architecture, learning paradigm, and clustering method. In addition, through careful analysis of existing works, challenges and opportunities are summarized from five perspectives. Finally, applications of deep graph clustering in four domains are presented. A collection of state-of-the-art deep graph clustering methods, including papers, code, and datasets, is available on GitHub. We hope this work will serve as a quick guide and help researchers overcome challenges in this vibrant field. Comment: 13 pages, 13 figures