    Graph Learning and Its Applications: A Holistic Survey

    Graph learning is a prevalent domain that seeks to learn the intricate relationships among nodes and the topological structure of graphs. These relationships distinguish graphs from conventional tabular data, as nodes reside in a non-Euclidean space and carry rich information to exploit. Over the years, graph learning has evolved from graph theory to graph data mining. With the advent of representation learning, it has achieved remarkable performance in diverse scenarios, including text, images, chemistry, and biology. Owing to its extensive application prospects, graph learning attracts considerable attention from the academic community. Although numerous works have been proposed to tackle different problems in graph learning, there is a need to survey this valuable body of work. While some researchers have recognized this need and produced impressive surveys on graph learning, they did not connect related objectives, methods, and applications coherently. As a result, and given the rapid expansion of graph learning, these surveys do not cover many current scenarios and challenging problems. Different from previous surveys on graph learning, we provide a holistic review that analyzes current works from the perspective of graph structure and discusses the latest applications, trends, and challenges in graph learning. Specifically, we begin by proposing a taxonomy based on the composition of graph data and then summarize the methods employed in graph learning. We then provide a detailed elucidation of mainstream applications. Finally, based on current technical trends, we propose future directions. Comment: 20 pages, 7 figures, 3 tables

    DHLP 1&2: Giraph based distributed label propagation algorithms on heterogeneous drug-related networks

    Background and Objective: Heterogeneous complex networks are large graphs consisting of different types of nodes and edges. Extracting knowledge from these networks is complicated. Moreover, the scale of these networks is steadily increasing, so scalable methods are required. Methods: In this paper, two distributed label propagation algorithms for heterogeneous networks, namely DHLP-1 and DHLP-2, are introduced. Biological networks are one type of heterogeneous complex network. As a case study, we measured the efficiency of the proposed DHLP-1 and DHLP-2 algorithms on a biological network consisting of drugs, diseases, and targets. The problem studied in this network is drug repositioning, but our algorithms can be used as general methods for heterogeneous networks beyond biological ones. Results: We compared the proposed algorithms with similar non-distributed algorithms, namely MINProp and Heter-LP. The experiments showed good performance of the algorithms in terms of running time and accuracy. Comment: Source code available for Apache Giraph on Hadoop
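    The DHLP-1/DHLP-2 update rules and their Giraph implementation are not given in this abstract, so the following is only a generic, single-machine sketch of iterative label propagation, the family of methods the paper distributes. It assumes a homogeneous, undirected graph and the common update F ← αPF + (1 − α)Y, where P is the row-normalized adjacency and Y holds one-hot seed labels; the function name and parameters are hypothetical, and a heterogeneous variant would propagate across several node and edge types instead.

```python
from typing import Dict, List, Tuple


def label_propagation(
    edges: List[Tuple[int, int]],
    num_nodes: int,
    seeds: Dict[int, int],
    num_classes: int,
    alpha: float = 0.8,
    iterations: int = 50,
) -> List[int]:
    """Generic label propagation: repeat F <- alpha * P F + (1 - alpha) * Y,
    where P is the row-normalized adjacency and Y holds one-hot seed labels."""
    neighbors: List[List[int]] = [[] for _ in range(num_nodes)]
    for u, v in edges:  # treat every edge as undirected
        neighbors[u].append(v)
        neighbors[v].append(u)

    # Y: one-hot rows for labeled seed nodes, zero rows for unlabeled nodes
    y = [[0.0] * num_classes for _ in range(num_nodes)]
    for node, label in seeds.items():
        y[node][label] = 1.0

    f = [row[:] for row in y]
    for _ in range(iterations):
        new_f = []
        for node in range(num_nodes):
            nbrs = neighbors[node]
            agg = [0.0] * num_classes
            for nb in nbrs:  # mean of neighbor scores = row-normalized P times F
                for c in range(num_classes):
                    agg[c] += f[nb][c] / len(nbrs)
            new_f.append([alpha * agg[c] + (1 - alpha) * y[node][c] for c in range(num_classes)])
        f = new_f

    # Predicted label = class with the highest propagated score
    return [max(range(num_classes), key=lambda c: row[c]) for row in f]


if __name__ == "__main__":
    # Path graph 0-1-2-3-4 with one seed of each class at the two ends
    print(label_propagation([(0, 1), (1, 2), (2, 3), (3, 4)], 5, {0: 0, 4: 1}, 2))
```

    Nodes 1 and 3 take the label of their nearer seed. Because each node's update only reads its neighbors' scores, the same step maps naturally onto Pregel-style vertex-centric frameworks such as Giraph.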

    The Evaluation of DyHATR Performance for Dynamic Heterogeneous Graphs

    Dynamic heterogeneous graphs can represent real-world networks. Predicting links in these graphs is more complicated than in static graphs. Until now, link prediction research has focused on static heterogeneous graphs or dynamic homogeneous graphs. A link prediction technique combining a temporal RNN and hierarchical attention, called DyHATR, has recently emerged. This method was claimed to work on dynamic heterogeneous graphs based on tests on four publicly available data sets (Twitter, Math-Overflow, Ecomm, and Alibaba). However, further analysis showed that these four data sets did not meet the criteria of dynamic heterogeneous graphs. In the present work, we evaluated the performance of DyHATR on dynamic heterogeneous graphs. We conducted experiments with DyHATR on the Yelp data set, represented as a dynamic heterogeneous graph consisting of homogeneous subgraphs. The results show that DyHATR can perform link prediction on dynamic heterogeneous graphs by simultaneously capturing heterogeneous information and evolutionary patterns and then using them to predict links. Compared to the baseline method, the accuracy achieved by DyHATR is competitive, although the results can still be improved.
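    The abstract describes representing Yelp as a dynamic heterogeneous graph made of subgraphs. One plausible way to build such a structure, assumed here rather than taken from the paper, is to bucket timestamped, typed edges into fixed-width time snapshots and keep one edge set per edge type inside each snapshot. The record layout, the `window` parameter, and the edge-type names below are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Hypothetical edge record: (source, target, edge_type, timestamp)
Edge = Tuple[str, str, str, int]
Snapshots = Dict[int, Dict[str, Set[Tuple[str, str]]]]


def build_snapshots(edges: List[Edge], window: int) -> Snapshots:
    """Bucket edges into time snapshots of width `window`; inside each
    snapshot, keep one subgraph (edge set) per edge type."""
    buckets: Dict[int, Dict[str, Set[Tuple[str, str]]]] = defaultdict(lambda: defaultdict(set))
    for src, dst, etype, ts in edges:
        buckets[ts // window][etype].add((src, dst))
    # Convert nested defaultdicts to plain dicts, ordered by snapshot index
    return {sid: dict(by_type) for sid, by_type in sorted(buckets.items())}


if __name__ == "__main__":
    edges = [
        ("u1", "b1", "reviews", 3),
        ("u1", "u2", "friend_of", 5),
        ("u2", "b1", "reviews", 12),
    ]
    for sid, subgraphs in build_snapshots(edges, window=10).items():
        print(sid, subgraphs)
```

    A model like DyHATR would then encode each edge-type subgraph within a snapshot and model how the snapshots evolve over time; the snapshot width is a design choice that trades temporal resolution against per-snapshot density.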

    You Only Transfer What You Share: Intersection-Induced Graph Transfer Learning for Link Prediction

    Link prediction is central to many real-world applications, but its performance may be hampered when the graph of interest is sparse. To alleviate issues caused by sparsity, we investigate a previously overlooked phenomenon: in many cases, a densely connected, complementary graph can be found for the original graph. The denser graph may share nodes with the original graph, which offers a natural bridge for transferring selective, meaningful knowledge. We identify this setting as Graph Intersection-induced Transfer Learning (GITL), which is motivated by practical applications in e-commerce and academic co-authorship prediction. We develop a framework to effectively leverage the structural prior in this setting. We first create an intersection subgraph using the nodes shared between the two graphs, then transfer knowledge from the source-enriched intersection subgraph to the full target graph. In the second step, we consider two approaches: a modified label propagation, and a multi-layer perceptron (MLP) model in a teacher-student regime. Experimental results on proprietary e-commerce datasets and open-source citation graphs show that the proposed workflow outperforms existing transfer learning baselines that do not explicitly utilize the intersection structure. Comment: Accepted in TMLR (https://openreview.net/forum?id=Nn71AdKyYH)
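    The first GITL step, building an intersection subgraph from the nodes the two graphs share, can be sketched with plain sets. This is an illustration under stated assumptions, not the authors' code: graphs are undirected edge sets with each edge stored once, node sets are inferred from edges (so isolated nodes are ignored), and the subgraph keeps edges from either graph whose endpoints are both shared, so the denser source graph contributes its structure. The function names are hypothetical.

```python
from typing import Set, Tuple

EdgeSet = Set[Tuple[int, int]]


def nodes_of(edges: EdgeSet) -> Set[int]:
    """Collect every endpoint that appears in the edge set."""
    return {n for edge in edges for n in edge}


def intersection_subgraph(source: EdgeSet, target: EdgeSet) -> Tuple[Set[int], EdgeSet]:
    """Return the shared nodes and the edges (from either graph) induced on them."""
    shared = nodes_of(source) & nodes_of(target)
    induced = {(u, v) for (u, v) in source | target if u in shared and v in shared}
    return shared, induced


if __name__ == "__main__":
    source = {(1, 2), (2, 3), (1, 3), (3, 4)}  # denser complementary graph
    target = {(1, 5), (2, 5), (3, 5)}          # sparse graph of interest
    shared, induced = intersection_subgraph(source, target)
    print(sorted(shared), sorted(induced))
```

    In this toy case every intersection edge comes from the source graph, which is the kind of extra structure the second step would then transfer to the full target graph, for example via label propagation or an MLP student.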

    A Survey of Imbalanced Learning on Graphs: Problems, Techniques, and Future Directions

    Graphs represent interconnected structures prevalent in a myriad of real-world scenarios. Effective graph analytics, such as graph learning methods, enables users to gain profound insights from graph data, underpinning various tasks including node classification and link prediction. However, these methods often suffer from data imbalance, a common issue in graph data where certain segments possess abundant data while others are scarce, leading to biased learning outcomes. This has motivated the emerging field of imbalanced learning on graphs, which aims to correct these skewed data distributions for more accurate and representative learning outcomes. In this survey, we present a comprehensive review of the literature on imbalanced learning on graphs. We begin by clearly defining the concept and related terminology, providing a solid foundation for readers. Following this, we propose two comprehensive taxonomies: (1) the problem taxonomy, which describes the forms of imbalance we consider, the associated tasks, and potential solutions; (2) the technique taxonomy, which details key strategies for addressing these imbalances and aids readers in selecting methods. Finally, we suggest prospective future directions for both problems and techniques in imbalanced learning on graphs, fostering further innovation in this critical area. Comment: The collection of awesome literature on imbalanced learning on graphs: https://github.com/Xtra-Computing/Awesome-Literature-ILoG

    Screening the stones of Venice: Mapping social perceptions of cultural significance through graph-based semi-supervised classification

    Mapping the cultural significance of heritage properties in urban environments from the perspective of the public has become an increasingly relevant process, as highlighted by the 2011 UNESCO Recommendation on the Historic Urban Landscape (HUL). With the ubiquitous use of social media and rapid developments in machine and deep learning, it has become feasible to collect and process massive amounts of information produced by online communities about their perceptions of heritage as social constructs. Moreover, such information is usually interconnected and embedded within specific socioeconomic and spatiotemporal contexts. This paper presents a methodological workflow for using semi-supervised learning with graph neural networks (GNN) to classify, summarize, and map cultural significance categories based on user-generated content on social media. Several GNN models were trained as an ensemble to incorporate the multi-modal (visual and textual) features and the contextual (temporal, spatial, and social) connections of social media data in an attributed multi-graph structure. The classification results of the different models were aligned and evaluated with the prediction confidence and agreement. Furthermore, message diffusion methods on graphs were proposed to aggregate the post labels onto their adjacent spatial nodes, which helps to map the cultural significance categories in their geographical contexts. The workflow is tested on data gathered from Venice as a case study, demonstrating the generation of social perception maps for this UNESCO World Heritage property. This research framework could also be applied in other cities worldwide, contributing to more socially inclusive heritage management processes. The proposed methodology also has the potential to diffuse any human-generated, location-based information onto spatial networks and temporal timelines, which could be beneficial for measuring the safety, vitality, and/or popularity of urban spaces.
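    The specific message diffusion methods are not detailed in this abstract. As a minimal sketch of the general idea of aggregating post labels onto adjacent spatial nodes, one simple rule, assumed here for illustration, gives each spatial node the mean of the class-probability vectors of the posts linked to it. The data layout, node identifiers, and function name are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def diffuse_labels(
    post_probs: Dict[str, List[float]],
    post_to_spatial: List[Tuple[str, str]],
) -> Dict[str, List[float]]:
    """Average the class-probability vectors of all posts adjacent to each spatial node."""
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = defaultdict(int)
    for post, node in post_to_spatial:
        probs = post_probs[post]
        if node not in sums:
            sums[node] = [0.0] * len(probs)
        for i, p in enumerate(probs):
            sums[node][i] += p
        counts[node] += 1
    # Divide each accumulated vector by the number of posts linked to that node
    return {node: [s / counts[node] for s in vec] for node, vec in sums.items()}


if __name__ == "__main__":
    probs = {"p1": [0.8, 0.2], "p2": [0.4, 0.6], "p3": [0.0, 1.0]}
    links = [("p1", "n1"), ("p2", "n1"), ("p3", "n2")]
    print(diffuse_labels(probs, links))  # n1 -> approximately [0.6, 0.4]
```

    Averaging keeps the result a valid probability distribution per spatial node; a weighted variant (for example by prediction confidence) would be an equally plausible choice.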

    Representation Learning for Texts and Graphs: A Unified Perspective on Efficiency, Multimodality, and Adaptability

    [...] This thesis is situated between natural language processing and graph representation learning and investigates selected connections between them. First, we introduce matrix embeddings as an efficient text representation sensitive to word order. [...] Experiments with ten linguistic probing tasks, 11 supervised, and five unsupervised downstream tasks reveal that vector and matrix embeddings have complementary strengths and that a jointly trained hybrid model outperforms both. Second, a popular pretrained language model, BERT, is distilled into matrix embeddings. [...] The results on the GLUE benchmark show that these models are competitive with other recent contextualized language models while being more efficient in time and space. Third, we compare three model types for text classification: bag-of-words, sequence-, and graph-based models. Experiments on five datasets show that, surprisingly, a wide multilayer perceptron on top of a bag-of-words representation is competitive with recent graph-based approaches, questioning the necessity of graphs synthesized from the text. [...] Fourth, we investigate the connection between text and graph data in document-based recommender systems for citations and subject labels. Experiments on six datasets show that using the title as side information improves the performance of autoencoder models. [...] We find that the meaning of item co-occurrence is crucial for the choice of input modalities and an appropriate model. Fifth, we introduce a generic framework for lifelong learning on evolving graphs in which new nodes, edges, and classes appear over time. [...] The results show that by reusing previous parameters in incremental training, it is possible to employ smaller history sizes with only a slight decrease in accuracy compared to training with the complete history. Moreover, weighting the binary cross-entropy loss function is crucial to mitigate class imbalance when detecting newly emerging classes. [...]
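    The abstract states that weighting the binary cross-entropy loss mitigates class imbalance but does not give the weighting scheme. One common choice, assumed here for illustration, scales the loss on positive examples by a factor such as the negative-to-positive ratio; this is the same idea as the `pos_weight` argument of PyTorch's `BCEWithLogitsLoss`. The plain-Python function below is a hypothetical sketch, not the thesis's implementation.

```python
import math
from typing import List


def weighted_bce(y_true: List[int], y_prob: List[float], pos_weight: float, eps: float = 1e-7) -> float:
    """Mean binary cross-entropy in which positive targets are scaled by pos_weight."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # clip probabilities to avoid log(0)
        total += -(pos_weight * y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)


if __name__ == "__main__":
    labels = [1, 0, 0, 0]        # one positive (e.g. a newly emerging class), three negatives
    probs = [0.5, 0.5, 0.5, 0.5]
    ratio = labels.count(0) / labels.count(1)  # negative-to-positive ratio
    print(weighted_bce(labels, probs, pos_weight=1.0))
    print(weighted_bce(labels, probs, pos_weight=ratio))
```

    With uniform predictions of 0.5, the unweighted loss is ln 2 and the weighted loss is 1.5 ln 2, because the single positive now contributes as much as the three negatives combined, so errors on the rare class are no longer drowned out.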