Network embedding and its applications
Apart from the attributes attached to entities, the relationships among entities offer an important perspective that reveals the topological structure of a complex system. A network (or graph), with nodes representing entities and links indicating relationships, has been widely used in sociology, biology, chemistry, medicine, the Internet, etc. However, traditional machine learning and data mining algorithms, designed for entities with attributes (i.e., data points in a vector space), cannot effectively and/or efficiently utilize the topological information of a network formed by relationships among entities. To fill this gap, Network Embedding (NE) has been proposed to embed a network into a low-dimensional vector space while preserving some of its topology and/or properties, so that the resulting embeddings can facilitate various downstream machine learning and data mining tasks.
Although there have been many successful NE methods, most of them are designed for embedding static plain networks. In fact, real-world networks often come with one or more additional properties, such as node attributes and dynamic changes. The central research question of this thesis is "where and how can we apply NE to more realistic scenarios?" To this end, we propose three novel NE methods, each of which addresses the new challenges arising from one type of more realistic network. In addition, we discuss the applications of NE, with a focus on the drug-target interaction prediction problem.
More specifically, first, we investigate how to embed an attributed network, which describes a real-world complex system more faithfully by adding node attributes to a network. Previous Attributed Network Embedding (ANE) methods cannot effectively embed attributed networks, especially when the networks become sparse, and/or do not scale to large networks. To deal with these challenges, we propose a scalable ANE method that effectively and robustly embeds attributed networks of different sparsities.
Second, we study how to embed a dynamic network, a common case in practice because real-world complex systems often evolve over time. Most previous Dynamic Network Embedding (DNE) methods try to capture the topological changes at or around the most affected nodes and update node embeddings accordingly. Unfortunately, this kind of approximation, although it can improve efficiency, cannot effectively preserve the global topology of a dynamic network at each timestep, because it ignores the inactive sub-networks that receive accumulated topological changes propagated via high-order proximity. To tackle this challenge, we propose a DNE method for better global topology preservation.
Third, compared with static networks, dynamic networks have a unique characteristic called the degree of changes, which describes the rate of streaming edges between consecutive snapshots of an input dynamic network. The degree of changes can differ greatly across dynamic networks. However, it remains unknown whether existing DNE methods stay effective under different degrees of changes, in particular for dynamic networks generated from the same dataset by different slicing settings. To answer this open question, we test several state-of-the-art DNE methods, and then propose a DNE method that is more robustly effective on dynamic networks with different degrees of changes.
Fourth, regarding a specific application of NE to a real-world problem, we propose an NE-based Drug-Target Interaction (DTI) prediction method that additionally utilizes two implicit networks, which are extracted from a given DTI network but are ignored by previous DTI prediction methods. A case study indicates that the proposed method can predict novel DTIs
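To make the "degree of changes" idea in the abstract above concrete, here is a minimal Python sketch. It assumes one possible definition that is not taken from the thesis: the number of edges added or removed between two consecutive snapshots, divided by the number of edges in their union. The helper names `degree_of_changes` and `slice_stream`, and the toy edge stream, are hypothetical.

```python
from typing import FrozenSet, Iterable, List, Set, Tuple

Edge = Tuple[str, str]


def as_undirected(edges: Iterable[Edge]) -> Set[FrozenSet[str]]:
    """Store each edge as a frozenset so that (u, v) and (v, u) coincide."""
    return {frozenset(edge) for edge in edges}


def degree_of_changes(snapshots: List[List[Edge]]) -> List[float]:
    """Fraction of changed edges between each pair of consecutive snapshots.

    Assumed definition (illustrative only): edges added or removed,
    divided by the size of the union of both snapshots' edge sets.
    """
    rates = []
    for prev, curr in zip(snapshots, snapshots[1:]):
        old, new = as_undirected(prev), as_undirected(curr)
        union = old | new
        changed = old ^ new  # edges added or removed
        rates.append(len(changed) / len(union) if union else 0.0)
    return rates


def slice_stream(stream: List[Edge], step: int) -> List[List[Edge]]:
    """Build cumulative snapshots, adding `step` streaming edges per snapshot."""
    return [stream[:end] for end in range(step, len(stream) + 1, step)]


# The same toy edge stream, sliced with two different settings.
stream = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e"), ("e", "f"), ("f", "a")]
coarse = degree_of_changes(slice_stream(stream, 3))
fine = degree_of_changes(slice_stream(stream, 1))
print(coarse, fine)
```

Under this assumed definition, coarse slicing yields a few large changes and fine slicing yields many small ones, which is the kind of variation a robust DNE method would have to cope with.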
Label Informed Contrastive Pretraining for Node Importance Estimation on Knowledge Graphs
Node Importance Estimation (NIE) is a task of inferring importance scores of
the nodes in a graph. Due to the availability of richer data and knowledge,
recent NIE research has increasingly turned to knowledge graphs for
predicting future or missing node importance scores. Existing state-of-the-art
NIE methods train the model with the available labels, and they treat every
node of interest equally before training. However, the nodes with higher
importance often require or receive more attention in real-world scenarios,
e.g., people may care more about the movies or webpages with higher importance.
To this end, we introduce Label Informed ContrAstive Pretraining (LICAP) to the
NIE problem to make models better aware of the nodes with high importance scores.
Specifically, LICAP is a novel type of contrastive learning framework that aims
to fully utilize the continuous labels to generate contrastive samples for
pretraining embeddings. Considering the NIE problem, LICAP adopts a novel
sampling strategy called top nodes preferred hierarchical sampling to first
group all nodes of interest into a top bin and a non-top bin based on node
importance scores, and then divide the nodes within the top bin into several finer
bins also based on the scores. The contrastive samples are generated from those
bins, and are then used to pretrain node embeddings of knowledge graphs via a
newly proposed Predicate-aware Graph Attention Networks (PreGAT), so as to
better separate the top nodes from non-top nodes, and distinguish the top nodes
within the top bin by keeping the relative order among finer bins. Extensive
experiments demonstrate that the LICAP pretrained embeddings can further boost
the performance of existing NIE methods and achieve new state-of-the-art
performance on both regression and ranking metrics. The source code for
reproducibility is available at https://github.com/zhangtia16/LICAP
Comment: Accepted by IEEE TNNL
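The two-level binning described in the LICAP abstract can be sketched as follows. This is a rough Python illustration under stated assumptions, not the authors' implementation: the parameters `top_ratio` and `num_fine_bins`, the equal-size split of the top bin, and the triplet rule (anchor and positive from the same finer bin, negative from the non-top bin) are all hypothetical choices made for the example.

```python
import random
from typing import Dict, List, Tuple


def hierarchical_bins(
    scores: Dict[str, float], top_ratio: float = 0.2, num_fine_bins: int = 2
) -> Tuple[List[List[str]], List[str]]:
    """Split nodes into finer top bins (highest scores first) and a non-top bin."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    num_top = max(1, round(len(ranked) * top_ratio))
    top, non_top = ranked[:num_top], ranked[num_top:]
    # Assumption: finer bins are roughly equal-sized and keep the score order.
    size = max(1, -(-len(top) // num_fine_bins))  # ceiling division
    fine_bins = [top[i:i + size] for i in range(0, len(top), size)]
    return fine_bins, non_top


def sample_triplet(
    fine_bins: List[List[str]], non_top: List[str], rng: random.Random
) -> Tuple[str, str, str]:
    """Draw (anchor, positive, negative) under the assumed sampling rule."""
    candidates = [b for b in fine_bins if len(b) >= 2]
    chosen = rng.choice(candidates)
    anchor, positive = rng.sample(chosen, 2)
    negative = rng.choice(non_top)
    return anchor, positive, negative


# Toy importance scores: n0 is the most important node, n9 the least.
scores = {f"n{i}": float(10 - i) for i in range(10)}
fine_bins, non_top = hierarchical_bins(scores, top_ratio=0.4, num_fine_bins=2)
triplet = sample_triplet(fine_bins, non_top, random.Random(0))
print(fine_bins, non_top, triplet)
```

In this toy setting the four highest-scoring nodes form the top bin, which is then split into two ordered finer bins; how the real method weights or orders the resulting contrastive samples is left to the paper and its repository.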
Fossil Image Identification using Deep Learning Ensembles of Data Augmented Multiviews
Identification of fossil species is crucial to evolutionary studies. Recent
advances in deep learning have shown promising prospects in fossil image
identification. However, the quantity and quality of labeled fossil images are
often limited due to fossil preservation, conditioned sampling, and expensive
and inconsistent label annotation by domain experts, which pose great
challenges to the training of deep learning based image classification models.
To address these challenges, we follow the idea of the wisdom of crowds and
propose a novel multiview ensemble framework, which collects multiple views of
each fossil specimen image reflecting its different characteristics to train
multiple base deep learning models and then makes final decisions via soft
voting. We further develop the OGS method, which integrates original, gray, and
skeleton views under this framework, to demonstrate its effectiveness.
Experimental results on the fusulinid fossil dataset over five deep learning
based milestone models show that OGS using three base models consistently
outperforms the baseline using a single base model, and the ablation study
verifies the usefulness of each selected view. In addition, OGS obtains
superior or comparable performance compared to the method under the well-known
bagging framework. Moreover, as the available training data decreases, the
proposed framework achieves larger performance gains over the baseline.
Furthermore, a consistency test with two human experts shows that OGS obtains
the highest agreement with both the dataset labels and the two experts.
Notably, this methodology is designed for general fossil identification and it
is expected to see applications on other fossil datasets. The results suggest
the potential application when the quantity and quality of labeled data are
particularly restricted, e.g., to identify rare fossil images.
Comment: preprint submitted to Methods in Ecology and Evolution
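The soft-voting step mentioned in the abstract above can be illustrated with a short Python sketch. It assumes each base model (one per view) outputs a class-probability vector for the same specimen and that the views are weighted equally; the probability values and the function name `soft_vote` are made up for illustration and do not come from the paper.

```python
from typing import Dict, List, Tuple


def soft_vote(probs_by_view: Dict[str, List[float]]) -> Tuple[int, List[float]]:
    """Average class probabilities over all views and pick the arg-max class."""
    views = list(probs_by_view.values())
    num_classes = len(views[0])
    averaged = [sum(p[c] for p in views) / len(views) for c in range(num_classes)]
    label = max(range(num_classes), key=lambda c: averaged[c])
    return label, averaged


# Hypothetical per-view predictions for one specimen over three classes.
probs_by_view = {
    "original": [0.5, 0.4, 0.1],
    "gray": [0.2, 0.7, 0.1],
    "skeleton": [0.3, 0.6, 0.1],
}
label, averaged = soft_vote(probs_by_view)
print(label, averaged)
```

Here the model trained on the original view alone would favor class 0, while the averaged probabilities favor class 1, which shows how views that capture different characteristics can overturn a single model's decision.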
GloDyNE: Global Topology Preserving Dynamic Network Embedding
Learning low-dimensional topological representations of a network in dynamic
environments is attracting much attention due to the time-evolving nature of
many real-world networks. The main and common objective of Dynamic Network
Embedding (DNE) is to efficiently update node embeddings while preserving
network topology at each time step. The idea of most existing DNE methods is to
capture the topological changes at or around the most affected nodes (instead
of all nodes) and accordingly update node embeddings. Unfortunately, this kind
of approximation, although it can improve efficiency, cannot effectively preserve
the global topology of a dynamic network at each time step, because it does not
consider the inactive sub-networks that receive accumulated topological
changes propagated via high-order proximity. To tackle this challenge, we
propose a novel node selecting strategy to diversely select representative
nodes over a network, coordinated with a new incremental learning
paradigm for the Skip-Gram based embedding approach. Extensive experiments show
that GloDyNE, with only a small fraction of nodes selected, can already achieve
superior or comparable performance w.r.t. the state-of-the-art DNE methods in
three typical downstream tasks. In particular, GloDyNE significantly outperforms
other methods in the graph reconstruction task, which demonstrates its ability
to preserve global topology. The source code is available at
https://github.com/houchengbin/GloDyNE
Comment: Accepted by IEEE-TKDE 202
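As a loose illustration of selecting representative nodes diversely over a network, the Python sketch below spreads the selection across the graph by chunking a breadth-first ordering into groups and drawing one node per group. This grouping rule is a simple stand-in chosen for the example, not the strategy used in GloDyNE; the function names and the toy path graph are hypothetical.

```python
import random
from collections import deque
from typing import Dict, List


def bfs_order(adj: Dict[str, List[str]]) -> List[str]:
    """Visit every node breadth-first, restarting in each unvisited component."""
    seen, order = set(), []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        while queue:
            node = queue.popleft()
            order.append(node)
            for nbr in adj[node]:
                if nbr not in seen:
                    seen.add(nbr)
                    queue.append(nbr)
    return order


def select_representatives(
    adj: Dict[str, List[str]], num_groups: int, rng: random.Random
) -> List[str]:
    """Pick one node from each chunk of the BFS ordering (assumed grouping rule)."""
    order = bfs_order(adj)
    size = max(1, -(-len(order) // num_groups))  # ceiling division
    groups = [order[i:i + size] for i in range(0, len(order), size)]
    return [rng.choice(group) for group in groups]


# Toy path graph a-b-c-d-e-f stored as an adjacency list.
adj = {
    "a": ["b"], "b": ["a", "c"], "c": ["b", "d"],
    "d": ["c", "e"], "e": ["d", "f"], "f": ["e"],
}
selected = select_representatives(adj, num_groups=3, rng=random.Random(0))
print(selected)
```

Because each group covers a different region of the graph, the selected nodes are spread out rather than concentrated around the most recently changed edges; the selected nodes would then drive an incremental embedding update, which is outside the scope of this sketch.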
houchengbin/Fossil-ID-Multiview-Deep-Ensembles: accepted by "Methods in Ecology and Evolution"
This version of the source code can be used to reproduce the results in our paper (accepted by MEE journal). Please check the Supporting Information S2, which presents the best hyper-parameters used in the main experiments for reproducibility
houchengbin/Fossil-Image-Identification: accepted by MEE journal
This version of the source code can be used to reproduce the results in our paper (accepted by "Methods in Ecology and Evolution" journal). Please check the Supporting Information S2, which presents the best hyper-parameters used in the main experiments