2,269 research outputs found

    A Comprehensive Survey of Graph Embedding: Problems, Techniques and Applications

    Full text link
    Graph is an important data representation which appears in a wide diversity of real-world scenarios. Effective graph analytics provides users a deeper understanding of what is behind the data, and thus can benefit a lot of useful applications such as node classification, node recommendation, link prediction, etc. However, most graph analytics methods suffer the high computation and space cost. Graph embedding is an effective yet efficient way to solve the graph analytics problem. It converts the graph data into a low dimensional space in which the graph structural information and graph properties are maximally preserved. In this survey, we conduct a comprehensive review of the literature in graph embedding. We first introduce the formal definition of graph embedding as well as the related concepts. After that, we propose two taxonomies of graph embedding which correspond to what challenges exist in different graph embedding problem settings and how the existing work address these challenges in their solutions. Finally, we summarize the applications that graph embedding enables and suggest four promising future research directions in terms of computation efficiency, problem settings, techniques and application scenarios.Comment: A 20-page comprehensive survey of graph/network embedding for over 150+ papers till year 2018. It provides systematic categorization of problems, techniques and applications. Accepted by IEEE Transactions on Knowledge and Data Engineering (TKDE). Comments and suggestions are welcomed for continuously improving this surve

    Learning Collective Behavior in Multi-relational Networks

    Get PDF
    With the rapid expansion of the Internet and WWW, the problem of analyzing social media data has received an increasing amount of attention in the past decade. The boom in social media platforms offers many possibilities to study human collective behavior and interactions on an unprecedented scale. In the past, much work has been done on the problem of learning from networked data with homogeneous topologies, where instances are explicitly or implicitly inter-connected by a single type of relationship. In contrast to traditional content-only classification methods, relational learning succeeds in improving classification performance by leveraging the correlation of the labels between linked instances. However, networked data extracted from social media, web pages, and bibliographic databases can contain entities of multiple classes and linked by various causal reasons, hence treating all links in a homogeneous way can limit the performance of relational classifiers. Learning the collective behavior and interactions in heterogeneous networks becomes much more complex. The contribution of this dissertation include 1) two classification frameworks for identifying human collective behavior in multi-relational social networks; 2) unsupervised and supervised learning models for relationship prediction in multi-relational collaborative networks. Our methods improve the performance of homogeneous predictive models by differentiating heterogeneous relations and capturing the prominent interaction patterns underlying the network structure. The work has been evaluated in various real-world social networks. We believe that this study will be useful for analyzing human collective behavior and interactions specifically in the scenario when the heterogeneous relationships in the network arise from various causal reasons

    Attention-based Graph Neural Network for Semi-supervised Learning

    Full text link
    Recently popularized graph neural networks achieve the state-of-the-art accuracy on a number of standard benchmark datasets for graph-based semi-supervised learning, improving significantly over existing approaches. These architectures alternate between a propagation layer that aggregates the hidden states of the local neighborhood and a fully-connected layer. Perhaps surprisingly, we show that a linear model, that removes all the intermediate fully-connected layers, is still able to achieve a performance comparable to the state-of-the-art models. This significantly reduces the number of parameters, which is critical for semi-supervised learning where number of labeled examples are small. This in turn allows a room for designing more innovative propagation layers. Based on this insight, we propose a novel graph neural network that removes all the intermediate fully-connected layers, and replaces the propagation layers with attention mechanisms that respect the structure of the graph. The attention mechanism allows us to learn a dynamic and adaptive local summary of the neighborhood to achieve more accurate predictions. In a number of experiments on benchmark citation networks datasets, we demonstrate that our approach outperforms competing methods. By examining the attention weights among neighbors, we show that our model provides some interesting insights on how neighbors influence each other

    COSINE: Compressive Network Embedding on Large-scale Information Networks

    Full text link
    There is recently a surge in approaches that learn low-dimensional embeddings of nodes in networks. As there are many large-scale real-world networks, it's inefficient for existing approaches to store amounts of parameters in memory and update them edge after edge. With the knowledge that nodes having similar neighborhood will be close to each other in embedding space, we propose COSINE (COmpresSIve NE) algorithm which reduces the memory footprint and accelerates the training process by parameters sharing among similar nodes. COSINE applies graph partitioning algorithms to networks and builds parameter sharing dependency of nodes based on the result of partitioning. With parameters sharing among similar nodes, COSINE injects prior knowledge about higher structural information into training process which makes network embedding more efficient and effective. COSINE can be applied to any embedding lookup method and learn high-quality embeddings with limited memory and shorter training time. We conduct experiments of multi-label classification and link prediction, where baselines and our model have the same memory usage. Experimental results show that COSINE gives baselines up to 23% increase on classification and up to 25% increase on link prediction. Moreover, time of all representation learning methods using COSINE decreases from 30% to 70%

    Link Prediction in Social Networks: the State-of-the-Art

    Full text link
    In social networks, link prediction predicts missing links in current networks and new or dissolution links in future networks, is important for mining and analyzing the evolution of social networks. In the past decade, many works have been done about the link prediction in social networks. The goal of this paper is to comprehensively review, analyze and discuss the state-of-the-art of the link prediction in social networks. A systematical category for link prediction techniques and problems is presented. Then link prediction techniques and problems are analyzed and discussed. Typical applications of link prediction are also addressed. Achievements and roadmaps of some active research groups are introduced. Finally, some future challenges of the link prediction in social networks are discussed.Comment: 38 pages, 13 figures, Science China: Information Science, 201

    A Survey on Embedding Dynamic Graphs

    Full text link
    Embedding static graphs in low-dimensional vector spaces plays a key role in network analytics and inference, supporting applications like node classification, link prediction, and graph visualization. However, many real-world networks present dynamic behavior, including topological evolution, feature evolution, and diffusion. Therefore, several methods for embedding dynamic graphs have been proposed to learn network representations over time, facing novel challenges, such as time-domain modeling, temporal features to be captured, and the temporal granularity to be embedded. In this survey, we overview dynamic graph embedding, discussing its fundamentals and the recent advances developed so far. We introduce the formal definition of dynamic graph embedding, focusing on the problem setting and introducing a novel taxonomy for dynamic graph embedding input and output. We further explore different dynamic behaviors that may be encompassed by embeddings, classifying by topological evolution, feature evolution, and processes on networks. Afterward, we describe existing techniques and propose a taxonomy for dynamic graph embedding techniques based on algorithmic approaches, from matrix and tensor factorization to deep learning, random walks, and temporal point processes. We also elucidate main applications, including dynamic link prediction, anomaly detection, and diffusion prediction, and we further state some promising research directions in the area.Comment: 41 pages, 10 figure

    Semi-Supervised Learning on Graphs Based on Local Label Distributions

    Full text link
    Most approaches that tackle the problem of node classification consider nodes to be similar, if they have shared neighbors or are close to each other in the graph. Recent methods for attributed graphs additionally take attributes of neighboring nodes into account. We argue that the class labels of the neighbors bear important information and considering them helps to improve classification quality. Two nodes which are similar based on class labels in their neighborhood do not need to be close-by in the graph and may even belong to different connected components. In this work, we propose a novel approach for the semi-supervised node classification. Precisely, we propose a new node embedding which is based on the class labels in the local neighborhood of a node. We show that this is a different setting from attribute-based embeddings and thus, we propose a new method to learn label-based node embeddings which can mirror a variety of relations between the class labels of neighboring nodes. Our experimental evaluation demonstrates that our new methods can significantly improve the prediction quality on real world data sets

    Network Representation Learning: From Traditional Feature Learning to Deep Learning

    Get PDF
    Network representation learning (NRL) is an effective graph analytics technique and promotes users to deeply understand the hidden characteristics of graph data. It has been successfully applied in many real-world tasks related to network science, such as social network data processing, biological information processing, and recommender systems. Deep Learning is a powerful tool to learn data features. However, it is non-trivial to generalize deep learning to graph-structured data since it is different from the regular data such as pictures having spatial information and sounds having temporal information. Recently, researchers proposed many deep learning-based methods in the area of NRL. In this survey, we investigate classical NRL from traditional feature learning method to the deep learning-based model, analyze relationships between them, and summarize the latest progress. Finally, we discuss open issues considering NRL and point out the future directions in this field

    Attentional Heterogeneous Graph Neural Network: Application to Program Reidentification

    Full text link
    Program or process is an integral part of almost every IT/OT system. Can we trust the identity/ID (e.g., executable name) of the program? To avoid detection, malware may disguise itself using the ID of a legitimate program, and a system tool (e.g., PowerShell) used by the attackers may have the fake ID of another common software, which is less sensitive. However, existing intrusion detection techniques often overlook this critical program reidentification problem (i.e., checking the program's identity). In this paper, we propose an attentional heterogeneous graph neural network model (DeepHGNN) to verify the program's identity based on its system behaviors. The key idea is to leverage the representation learning of the heterogeneous program behavior graph to guide the reidentification process. We formulate the program reidentification as a graph classification problem and develop an effective attentional heterogeneous graph embedding algorithm to solve it. Extensive experiments --- using real-world enterprise monitoring data and real attacks --- demonstrate the effectiveness of DeepHGNN across multiple popular metrics and the robustness to the normal dynamic changes like program version upgrades

    Anomaly Detection and Correction in Large Labeled Bipartite Graphs

    Full text link
    Binary classification problems can be naturally modeled as bipartite graphs, where we attempt to classify right nodes based on their left adjacencies. We consider the case of labeled bipartite graphs in which some labels and edges are not trustworthy. Our goal is to reduce noise by identifying and fixing these labels and edges. We first propose a geometric technique for generating random graph instances with untrustworthy labels and analyze the resulting graph properties. We focus on generating graphs which reflect real-world data, where degree and label frequencies follow power law distributions. We review several algorithms for the problem of detection and correction, proposing novel extensions and making observations specific to the bipartite case. These algorithms range from math programming algorithms to discrete combinatorial algorithms to Bayesian approximation algorithms to machine learning algorithms. We compare the performance of all these algorithms using several metrics and, based on our observations, identify the relative strengths and weaknesses of each individual algorithm.Comment: 36 pages, 4 figure
    • …
    corecore