5,505 research outputs found

    node2bits: Compact Time- and Attribute-aware Node Representations for User Stitching

    Identity stitching, the task of identifying and matching various online references (e.g., sessions over different devices and timespans) to the same user in real-world web services, is crucial for personalization and recommendations. However, traditional user stitching approaches, such as grouping or blocking, require quadratic pairwise comparisons between a massive number of user activities, thus posing both computational and storage challenges. Recent works, which are often application-specific, heuristically seek to reduce the number of comparisons, but they suffer from low precision and recall. To solve the problem in an application-independent way, we take a heterogeneous network-based approach in which users (nodes) interact with content (e.g., sessions, websites), and may have attributes (e.g., location). We propose node2bits, an efficient framework that represents multi-dimensional features of node contexts with binary hashcodes. node2bits leverages feature-based temporal walks to encapsulate short- and long-term interactions between nodes in heterogeneous web networks, and adopts SimHash to obtain compact, binary representations and avoid the quadratic complexity for similarity search. Extensive experiments on large-scale real networks show that node2bits outperforms traditional techniques and existing works that generate real-valued embeddings by up to 5.16% in F1 score on user stitching, while taking only up to 1.56% as much storage.
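
The SimHash step mentioned in this abstract can be illustrated with a minimal sketch. This is not the node2bits implementation: it only shows the generic random-hyperplane form of SimHash, and the names `make_hyperplanes`, `simhash`, and `hamming_distance`, as well as the Gaussian hyperplanes and the fixed seed, are assumptions made for illustration.

```python
import random


def make_hyperplanes(dim, num_bits, seed=0):
    """Draw num_bits random Gaussian hyperplanes of dimension dim (illustrative choice)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]


def simhash(vector, hyperplanes):
    """Return one bit per hyperplane: 1 if the projection is non-negative, else 0."""
    bits = []
    for plane in hyperplanes:
        dot = sum(v * w for v, w in zip(vector, plane))
        bits.append(1 if dot >= 0 else 0)
    return bits


def hamming_distance(a, b):
    """Count the positions at which two equal-length bit lists differ."""
    return sum(x != y for x, y in zip(a, b))


if __name__ == "__main__":
    planes = make_hyperplanes(dim=4, num_bits=16, seed=1)
    x = [1.0, 0.5, -2.0, 3.0]
    same_direction = [2 * v for v in x]
    opposite = [-v for v in x]
    # Vectors pointing the same way share a code; opposite vectors differ in every bit.
    print(hamming_distance(simhash(x, planes), simhash(same_direction, planes)))
    print(hamming_distance(simhash(x, planes), simhash(opposite, planes)))
```

Because each bit depends only on the sign of a projection, vectors with a small angle between them tend to agree on most bits. Similar items can therefore be grouped by their binary codes instead of being compared pair by pair, which is the property the abstract relies on to avoid quadratic similarity search.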

    Dynamic Node Embeddings from Edge Streams

    Networks evolve continuously over time with the addition, deletion, and changing of links and nodes. Such temporal networks (or edge streams) consist of a sequence of timestamped edges and are seemingly ubiquitous. Despite the importance of accurately modeling the temporal information, most embedding methods ignore it entirely or approximate the temporal network using a sequence of static snapshot graphs. In this work, we propose using the notion of temporal walks for learning dynamic embeddings from temporal networks. Temporal walks capture the temporally valid interactions (e.g., flow of information, spread of disease) in the dynamic network in a lossless fashion. Based on the notion of temporal walks, we describe a general class of embeddings called continuous-time dynamic network embeddings (CTDNEs) that completely avoid the issues and problems that arise when approximating the temporal network as a sequence of static snapshot graphs. Unlike previous work, CTDNEs learn dynamic node embeddings directly from the temporal network at the finest temporal granularity and thus use only temporally valid information. As such, CTDNEs naturally support online learning of the node embeddings in a streaming real-time fashion. Finally, the experiments demonstrate the effectiveness of this class of embedding methods that leverage temporal walks, as it achieves an average gain in AUC of 11.9% across all methods and graphs. Comment: IEEE Transactions on Emerging Topics in Computational Intelligence (TETIC
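
The idea of a temporal walk described in this abstract can be sketched as follows. This is a simplified illustration, not the CTDNE algorithm: it assumes directed edges given as hypothetical `(source, target, timestamp)` tuples, treats a walk as temporally valid when edge timestamps never decrease, and picks the next edge uniformly at random. The function names are invented for this example.

```python
import random
from collections import defaultdict


def build_temporal_adjacency(edges):
    """Index timestamped directed edges (u, v, t) by their source node."""
    adjacency = defaultdict(list)
    for u, v, t in edges:
        adjacency[u].append((v, t))
    return adjacency


def temporal_walk(adjacency, start_edge, max_length, rng):
    """Walk from start_edge, following only edges whose timestamp is not earlier
    than the timestamp of the edge just traversed."""
    u, v, t = start_edge
    walk = [u, v]
    current, current_time = v, t
    while len(walk) < max_length:
        # Keep only the outgoing edges that respect time order.
        candidates = [(w, tw) for w, tw in adjacency.get(current, []) if tw >= current_time]
        if not candidates:
            break
        current, current_time = rng.choice(candidates)
        walk.append(current)
    return walk


if __name__ == "__main__":
    edges = [("a", "b", 1), ("b", "c", 2), ("c", "d", 3), ("b", "a", 0)]
    adjacency = build_temporal_adjacency(edges)
    # The edge b -> a at time 0 is never taken after arriving at b at time 1.
    print(temporal_walk(adjacency, ("a", "b", 1), max_length=10, rng=random.Random(0)))
```

A walk over a static snapshot could step from `b` back to `a` even though that edge occurred earlier in time. The timestamp filter rules this out, which is the sense in which the sampled sequences use only temporally valid information.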

    Tree Structure-Aware Graph Representation Learning via Integrated Hierarchical Aggregation and Relational Metric Learning

    While Graph Neural Networks (GNNs) have shown superiority in learning node representations of homogeneous graphs, leveraging GNNs on heterogeneous graphs remains a challenging problem. The dominating reason is that a GNN learns node representations by aggregating neighbors' information regardless of node types. Some work has been proposed to alleviate this issue by exploiting relations or meta-paths to sample neighbors with distinct categories, then using an attention mechanism to learn different importance for different categories. However, one limitation is that the learned representations for different types of nodes should own different feature spaces, while all the above work still projects node representations into one feature space. Moreover, after exploring massive heterogeneous graphs, we identify a fact that multiple nodes with the same type always connect to a node with another type, which reveals the many-to-one schema, a.k.a. the hierarchical tree structure. But all the above work cannot preserve such tree structure, since the exact multi-hop path correlation from neighbors to the target node would be erased through aggregation. Therefore, to overcome the limitations of the literature, we propose T-GNN, a tree structure-aware graph neural network model for graph representation learning. Specifically, the proposed T-GNN consists of two modules: (1) the integrated hierarchical aggregation module and (2) the relational metric learning module. The integrated hierarchical aggregation module aims to preserve the tree structure by combining GNN with Gated Recurrent Unit to integrate the hierarchical and sequential neighborhood information on the tree structure to node representations. The relational metric learning module aims to preserve the heterogeneity by embedding each type of nodes into a type-specific space with distinct distribution based on similarity metrics. Comment: accepted by ICDM 2020 as regular paper

    Heterogeneous Graph Attention Network

    Graph neural network, as a powerful graph representation technique based on deep learning, has shown superior performance and attracted considerable research interest. However, graph neural networks have not been fully explored for heterogeneous graphs, which contain different types of nodes and links. The heterogeneity and rich semantic information bring great challenges for designing a graph neural network for heterogeneous graphs. Recently, one of the most exciting advancements in deep learning is the attention mechanism, whose great potential has been well demonstrated in various areas. In this paper, we first propose a novel heterogeneous graph neural network based on the hierarchical attention, including node-level and semantic-level attentions. Specifically, the node-level attention aims to learn the importance between a node and its meta-path based neighbors, while the semantic-level attention is able to learn the importance of different meta-paths. With the learned importance from both node-level and semantic-level attention, the importance of node and meta-path can be fully considered. Then the proposed model can generate node embedding by aggregating features from meta-path based neighbors in a hierarchical manner. Extensive experimental results on three real-world heterogeneous graphs not only show the superior performance of our proposed model over the state-of-the-arts, but also demonstrate its potentially good interpretability for graph analysis. Comment: 10 pages

    SaC2Vec: Information Network Representation with Structure and Content

    Network representation learning (also known as information network embedding) has been the central piece of research in social and information network analysis for the last couple of years. An information network can be viewed as a linked structure of a set of entities. A set of linked web pages and documents, or a set of users in a social network, are common examples of information networks. Network embedding learns low dimensional representations of the nodes, which can further be used for downstream network mining applications such as community detection or node clustering. Information network representation techniques traditionally use only the link structure of the network. But in real world networks, nodes come with additional content such as textual descriptions or associated images. This content is semantically correlated with the network structure and hence using the content along with the topological structure of the network can facilitate the overall network representation. In this paper, we propose SaC2Vec, a network representation technique that exploits both the structure and content. We convert the network into a multi-layered graph and use random walk and language modeling technique to generate the embedding of the nodes. Our approach is simple and computationally fast, yet able to use the content as a complement to structure and vice-versa. We also generalize the approach for networks having multiple types of content in each node. Experimental evaluations on four real world publicly available datasets show the merit of our approach compared to state-of-the-art algorithms in the domain. Comment: 10 Pages, Submitted to a conference for publication

    A Joint Named-Entity Recognizer for Heterogeneous Tag-sets Using a Tag Hierarchy

    We study a variant of domain adaptation for named-entity recognition where multiple, heterogeneously tagged training sets are available. Furthermore, the test tag-set is not identical to any individual training tag-set. Yet, the relations between all tags are provided in a tag hierarchy, covering the test tags as a combination of training tags. This setting occurs when various datasets are created using different annotation schemes. This is also the case of extending a tag-set with a new tag by annotating only the new tag in a new dataset. We propose to use the given tag hierarchy to jointly learn a neural network that shares its tagging layer among all tag-sets. We compare this model to combining independent models and to a model based on the multitasking approach. Our experiments show the benefit of the tag-hierarchy model, especially when facing non-trivial consolidation of tag-sets. Comment: Accepted at ACL 201

    Semi-Supervised Learning on Graphs Based on Local Label Distributions

    Most approaches that tackle the problem of node classification consider nodes to be similar if they have shared neighbors or are close to each other in the graph. Recent methods for attributed graphs additionally take attributes of neighboring nodes into account. We argue that the class labels of the neighbors bear important information and considering them helps to improve classification quality. Two nodes which are similar based on class labels in their neighborhood do not need to be close-by in the graph and may even belong to different connected components. In this work, we propose a novel approach for semi-supervised node classification. Precisely, we propose a new node embedding which is based on the class labels in the local neighborhood of a node. We show that this is a different setting from attribute-based embeddings and thus, we propose a new method to learn label-based node embeddings which can mirror a variety of relations between the class labels of neighboring nodes. Our experimental evaluation demonstrates that our new methods can significantly improve the prediction quality on real world data sets.

    Deep Learning on Graphs: A Survey

    Deep learning has been shown to be successful in a number of domains, ranging from acoustics and images to natural language processing. However, applying deep learning to the ubiquitous graph data is non-trivial because of the unique characteristics of graphs. Recently, substantial research efforts have been devoted to applying deep learning methods to graphs, resulting in beneficial advances in graph analysis techniques. In this survey, we comprehensively review the different types of deep learning methods on graphs. We divide the existing methods into five categories based on their model architectures and training strategies: graph recurrent neural networks, graph convolutional networks, graph autoencoders, graph reinforcement learning, and graph adversarial methods. We then provide a comprehensive overview of these methods in a systematic manner mainly by following their development history. We also analyze the differences and compositions of different methods. Finally, we briefly outline the applications in which they have been used and discuss potential future research directions. Comment: Accepted by Transactions on Knowledge and Data Engineering. 24 pages, 11 figures

    A Comprehensive Survey on Graph Neural Networks

    Deep learning has revolutionized many machine learning tasks in recent years, ranging from image classification and video processing to speech recognition and natural language understanding. The data in these tasks are typically represented in the Euclidean space. However, there is an increasing number of applications where data are generated from non-Euclidean domains and are represented as graphs with complex relationships and interdependency between objects. The complexity of graph data has imposed significant challenges on existing machine learning algorithms. Recently, many studies on extending deep learning approaches for graph data have emerged. In this survey, we provide a comprehensive overview of graph neural networks (GNNs) in data mining and machine learning fields. We propose a new taxonomy to divide the state-of-the-art graph neural networks into four categories, namely recurrent graph neural networks, convolutional graph neural networks, graph autoencoders, and spatial-temporal graph neural networks. We further discuss the applications of graph neural networks across various domains and summarize the open source codes, benchmark data sets, and model evaluation of graph neural networks. Finally, we propose potential research directions in this rapidly growing field. Comment: Minor revision (updated tables and references)

    A Survey on Dynamic Network Embedding

    Real-world networks are composed of diverse interacting and evolving entities, while most existing research simply characterizes them as particular static networks, without consideration of the evolution trend in dynamic networks. Recently, significant progress has been made in tracking the properties of dynamic networks, exploiting changes of entities and links in the network to devise network embedding techniques. Compared to widely proposed static network embedding methods, dynamic network embedding endeavors to encode nodes as low-dimensional dense representations that effectively preserve the network structures and the temporal dynamics, which is beneficial to multifarious downstream machine learning tasks. In this paper, we conduct a systematic survey on dynamic network embedding. Specifically, basic concepts of dynamic network embedding are described. Notably, we propose a novel taxonomy of existing dynamic network embedding techniques for the first time, including matrix factorization based, Skip-Gram based, autoencoder based, neural networks based and other embedding methods. Additionally, we carefully summarize the commonly used datasets and a wide variety of subsequent tasks that dynamic network embedding can benefit. Afterwards and primarily, we suggest several challenges that the existing algorithms face and outline possible directions to facilitate future research, such as dynamic embedding models, large-scale dynamic networks, heterogeneous dynamic networks, dynamic attributed networks, task-oriented dynamic network embedding and more embedding spaces. Comment: 25 pages