78 research outputs found

    Network embedding and its applications

    Apart from the attributes attached to entities, the relationships among entities are also an important perspective that reveals the topological structure of entities in a complex system. A network (or graph), with nodes representing entities and links indicating relationships, has been widely used in sociology, biology, chemistry, medicine, the Internet, etc. However, traditional machine learning and data mining algorithms, designed for entities with attributes (i.e., data points in a vector space), cannot effectively and/or efficiently utilize the topological information of a network formed by relationships among entities. To fill this gap, Network Embedding (NE) has been proposed to embed a network into a low-dimensional vector space while preserving some topologies and/or properties, so that the resulting embeddings can facilitate various downstream machine learning and data mining tasks. Although there have been many successful NE methods, most of them are designed for embedding static plain networks. In fact, real-world networks often come with one or more additional properties such as node attributes and dynamic changes. The central research question of this thesis is "where and how can we apply NE for more realistic scenarios?". To this end, we propose three novel NE methods, each of which addresses the new challenges resulting from one type of more realistic network. In addition, we discuss the applications of NE with a focus on the drug-target interaction prediction problem. To be more specific, first, we investigate how to embed the attributed network, which can better describe a real-world complex system by adding node attributes to a network. Previous Attributed Network Embedding (ANE) methods cannot effectively embed attributed networks, especially when networks become sparse, and/or are not scalable to large-scale networks. 
To deal with these challenges, we propose a scalable ANE method to effectively and robustly embed attributed networks with different sparsities. Second, we study how to embed the dynamic network, which is often the case in real-world scenarios as real-world complex systems often evolve over time. Most previous Dynamic Network Embedding (DNE) methods try to capture the topological changes at or around the most affected nodes and accordingly update node embeddings. Unfortunately, this kind of approximation, although it can improve efficiency, cannot effectively preserve the global topology of a dynamic network at each timestep, because it does not consider the inactive sub-networks that receive accumulated topological changes propagated via the high-order proximity. To tackle this challenge, we propose a DNE method for better global topology preservation. Third, compared to static networks, dynamic networks have a unique characteristic called the degree of changes, which describes the rate of streaming edges between consecutive snapshots of an input dynamic network. The degree of changes could be very different for different dynamic networks. However, it remains unknown whether existing DNE methods remain effective across different degrees of changes, in particular for corresponding dynamic networks generated from the same dataset by different slicing settings. To answer this open question, we test several state-of-the-art DNE methods, and then further propose a DNE method that remains effective more robustly on dynamic networks with different degrees of changes. Fourth, regarding a specific application of NE to a real-world problem, we propose an NE-based Drug-Target Interaction (DTI) prediction method by additionally utilizing the two implicit networks which are extracted from a given DTI network but are ignored in previous DTI prediction methods. 
A case study indicates that the proposed method can predict novel DTIs

    Label Informed Contrastive Pretraining for Node Importance Estimation on Knowledge Graphs

    Node Importance Estimation (NIE) is the task of inferring importance scores of the nodes in a graph. Due to the availability of richer data and knowledge, recent NIE research has focused on knowledge graphs for predicting future or missing node importance scores. Existing state-of-the-art NIE methods train the model on the available labels, and they treat every node of interest equally before training. However, the nodes with higher importance often require or receive more attention in real-world scenarios, e.g., people may care more about the movies or webpages with higher importance. To this end, we introduce Label Informed ContrAstive Pretraining (LICAP) to the NIE problem to make the model better aware of the nodes with high importance scores. Specifically, LICAP is a novel type of contrastive learning framework that aims to fully utilize the continuous labels to generate contrastive samples for pretraining embeddings. For the NIE problem, LICAP adopts a novel sampling strategy called top nodes preferred hierarchical sampling, which first groups all nodes of interest into a top bin and a non-top bin based on node importance scores, and then divides the nodes within the top bin into several finer bins, also based on the scores. The contrastive samples are generated from those bins, and are then used to pretrain node embeddings of knowledge graphs via a newly proposed Predicate-aware Graph Attention Networks (PreGAT), so as to better separate the top nodes from non-top nodes, and to distinguish the top nodes within the top bin by keeping the relative order among finer bins. Extensive experiments demonstrate that the LICAP pretrained embeddings can further boost the performance of existing NIE methods and achieve new state-of-the-art performance regarding both regression and ranking metrics. The source code for reproducibility is available at https://github.com/zhangtia16/LICAP. Comment: Accepted by IEEE TNNL
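
The two-level grouping described in this abstract (a top bin versus a non-top bin, then finer bins inside the top bin) can be illustrated with a minimal Python sketch. This is not the LICAP implementation: the function name `hierarchical_bins`, the parameters `num_top` and `num_fine_bins`, and the choice of near-equal bin sizes are all assumptions made for illustration, and the generation of contrastive samples from the bins is not shown.

```python
from typing import Dict, List, Tuple


def hierarchical_bins(
    scores: Dict[str, float], num_top: int, num_fine_bins: int
) -> Tuple[List[List[str]], List[str]]:
    """Split nodes into finer top bins and one non-top bin by importance score.

    Hypothetical helper: names, parameters, and equal-size binning are
    illustrative assumptions, not taken from the LICAP source code.
    """
    if not 0 < num_top <= len(scores):
        raise ValueError("num_top must be between 1 and the number of nodes")
    if not 0 < num_fine_bins <= num_top:
        raise ValueError("num_fine_bins must be between 1 and num_top")

    # Rank nodes from most to least important.
    ranked = sorted(scores, key=lambda node: scores[node], reverse=True)
    top, non_top = ranked[:num_top], ranked[num_top:]

    # Cut the top bin into contiguous, near-equal finer bins so that
    # earlier bins always hold higher-scored nodes than later ones.
    base, extra = divmod(len(top), num_fine_bins)
    fine_bins: List[List[str]] = []
    start = 0
    for i in range(num_fine_bins):
        size = base + (1 if i < extra else 0)
        fine_bins.append(top[start:start + size])
        start += size
    return fine_bins, non_top


scores = {"a": 9.0, "b": 8.0, "c": 7.0, "d": 6.0, "e": 1.0, "f": 0.5}
fine_bins, non_top = hierarchical_bins(scores, num_top=4, num_fine_bins=2)
print(fine_bins)  # [['a', 'b'], ['c', 'd']]
print(non_top)    # ['e', 'f']
```

Keeping the finer bins in score order is what would let a later contrastive step preserve the relative order among them, as the abstract describes.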

    Fossil Image Identification using Deep Learning Ensembles of Data Augmented Multiviews

    Identification of fossil species is crucial to evolutionary studies. Recent advances in deep learning have shown promising prospects in fossil image identification. However, the quantity and quality of labeled fossil images are often limited due to fossil preservation, conditioned sampling, and expensive and inconsistent label annotation by domain experts, which pose great challenges to the training of deep learning based image classification models. To address these challenges, we follow the idea of the wisdom of crowds and propose a novel multiview ensemble framework, which collects multiple views of each fossil specimen image reflecting its different characteristics to train multiple base deep learning models and then makes final decisions via soft voting. We further develop the OGS method, which integrates original, gray, and skeleton views under this framework, to demonstrate its effectiveness. Experimental results on the fusulinid fossil dataset over five deep learning based milestone models show that OGS using three base models consistently outperforms the baseline using a single base model, and the ablation study verifies the usefulness of each selected view. Besides, OGS obtains superior or comparable performance compared to the method under the well-known bagging framework. Moreover, as the available training data decreases, the proposed framework achieves larger performance gains over the baseline. Furthermore, a consistency test with two human experts shows that OGS obtains the highest agreement with both the labels of the dataset and the two experts. Notably, this methodology is designed for general fossil identification and it is expected to see applications on other fossil datasets. The results suggest the potential application when the quantity and quality of labeled data are particularly restricted, e.g., to identify rare fossil images. Comment: preprint submitted to Methods in Ecology and Evolutio
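
The soft-voting step mentioned in this abstract can be sketched in a few lines of Python. The sketch assumes each base model (one per view) outputs a probability per class and that the views are averaged with equal weight; the abstract does not state the weighting, and the function name `soft_vote` and the example probabilities are hypothetical.

```python
from typing import List, Sequence, Tuple


def soft_vote(view_probs: Sequence[Sequence[float]]) -> Tuple[int, List[float]]:
    """Average per-class probabilities across views and pick the best class.

    Assumption: equal weight per view (plain soft voting); a weighted
    variant would be an equally plausible reading of the abstract.
    """
    if not view_probs:
        raise ValueError("at least one view is required")
    num_classes = len(view_probs[0])
    if any(len(probs) != num_classes for probs in view_probs):
        raise ValueError("all views must score the same number of classes")

    # Mean probability of each class over all views.
    avg = [
        sum(probs[c] for probs in view_probs) / len(view_probs)
        for c in range(num_classes)
    ]
    # The predicted class is the one with the highest mean probability.
    label = max(range(num_classes), key=lambda c: avg[c])
    return label, avg


# Made-up probabilities from three base models for one specimen.
original_view = [0.5, 0.4, 0.1]
gray_view = [0.2, 0.7, 0.1]
skeleton_view = [0.3, 0.5, 0.2]
label, avg = soft_vote([original_view, gray_view, skeleton_view])
print(label)  # 1
```

In this made-up example the original view alone would choose class 0, while the averaged vote chooses class 1, which illustrates how an ensemble can overrule a single base model.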

    GloDyNE: Global Topology Preserving Dynamic Network Embedding

    Learning low-dimensional topological representations of a network in dynamic environments is attracting much attention due to the time-evolving nature of many real-world networks. The main and common objective of Dynamic Network Embedding (DNE) is to efficiently update node embeddings while preserving network topology at each time step. The idea of most existing DNE methods is to capture the topological changes at or around the most affected nodes (instead of all nodes) and accordingly update node embeddings. Unfortunately, this kind of approximation, although it can improve efficiency, cannot effectively preserve the global topology of a dynamic network at each time step, because it does not consider the inactive sub-networks that receive accumulated topological changes propagated via the high-order proximity. To tackle this challenge, we propose a novel node selection strategy to diversely select the representative nodes over a network, which is coordinated with a new incremental learning paradigm of the Skip-Gram based embedding approach. Extensive experiments show that GloDyNE, with only a small fraction of nodes being selected, can achieve superior or comparable performance w.r.t. the state-of-the-art DNE methods in three typical downstream tasks. In particular, GloDyNE significantly outperforms other methods in the graph reconstruction task, which demonstrates its ability to preserve global topology. The source code is available at https://github.com/houchengbin/GloDyNE. Comment: Accepted by IEEE-TKDE 202

    houchengbin/Fossil-ID-Multiview-Deep-Ensembles: accepted by "Methods in Ecology and Evolution"

    This version of the source code can be used to reproduce the results in our paper (accepted by MEE journal). Please check the Supporting Information S2, which presents the best hyper-parameters used in the main experiments for reproducibility

    houchengbin/Fossil-Image-Identification: accepted by MEE journal

    This version of the source code can be used to reproduce the results in our paper (accepted by "Methods in Ecology and Evolution" journal). Please check the Supporting Information S2, which presents the best hyper-parameters used in the main experiments