2,898 research outputs found

    Representation Learning for Attributed Multiplex Heterogeneous Network

    Network embedding (or graph embedding) has been widely used in many real-world applications. However, existing methods mainly focus on networks with single-typed nodes/edges and cannot scale well to handle large networks. Many real-world networks consist of billions of nodes and edges of multiple types, and each node is associated with different attributes. In this paper, we formalize the problem of embedding learning for the Attributed Multiplex Heterogeneous Network and propose a unified framework to address this problem. The framework supports both transductive and inductive learning. We also give a theoretical analysis of the proposed framework, showing its connection with previous work and proving its better expressiveness. We conduct systematic evaluations of the proposed framework on four different genres of challenging datasets: Amazon, YouTube, Twitter, and Alibaba. Experimental results demonstrate that with the learned embeddings from the proposed framework, we can achieve statistically significant improvements (e.g., 5.99-28.23% lift by F1 scores; p<<0.01, t-test) over previous state-of-the-art methods for link prediction. The framework has also been successfully deployed on the recommendation system of a worldwide leading e-commerce company, Alibaba Group. Results of the offline A/B tests on product recommendation further confirm the effectiveness and efficiency of the framework in practice.
    Comment: Accepted to KDD 2019. Website: https://sites.google.com/view/gatn

    Unsupervised Structural Embedding Methods for Efficient Collective Network Mining

    How can we align accounts of the same user across social networks? Can we identify the professional role of an email user from their patterns of communication? Can we predict the medical effects of chemical compounds from their atomic network structure? Many problems in graph data mining, including all of the above, are defined on multiple networks. The central element to all of these problems is cross-network comparison, whether at the level of individual nodes or entities in the network or at the level of entire networks themselves. To perform this comparison meaningfully, we must describe the entities in each network expressively in terms of patterns that generalize across the networks. Moreover, because the networks in question are often very large, our techniques must be computationally efficient. In this thesis, we propose scalable unsupervised methods that embed nodes in vector space by mapping nodes with similar structural roles in their respective networks, even if they come from different networks, to similar parts of the embedding space. We perform network alignment by matching nodes across two or more networks based on the similarity of their embeddings, and refine this process by reinforcing the consistency of each node’s alignment with those of its neighbors. By characterizing the distribution of node embeddings in a graph, we develop graph-level feature vectors that are highly effective for graph classification. With principled sparsification and randomized approximation techniques, we make all our methods computationally efficient and able to scale to graphs with millions of nodes or edges. We demonstrate the effectiveness of structural node embeddings on industry-scale applications, and propose an extensive set of embedding evaluation techniques that lay the groundwork for further methodological development and application.
    PhD, Computer Science & Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies
    http://deepblue.lib.umich.edu/bitstream/2027.42/162895/1/mheimann_1.pd

    Joint Use of Node Attributes and Proximity for Node Classification

    Node classification aims to infer unknown node labels from known labels and other node attributes. Standard approaches for this task assume homophily, whereby a node’s label is predicted from the labels of other nodes nearby in the network. However, there are also cases of networks where labels are better predicted from the individual attributes of each node rather than the labels of nearby nodes. Ideally, node classification methods should flexibly adapt to a range of settings wherein unknown labels are predicted either from labels of nearby nodes, or individual node attributes, or partly both. In this paper, we propose a principled approach, JANE, based on a generative probabilistic model that jointly weighs the role of attributes and node proximity via embeddings in predicting labels. Experiments on multiple network datasets demonstrate that JANE exhibits the desired combination of versatility and competitive performance compared to baselines.
    Peer reviewed

    Scalably Using Node Attributes and Graph Structure for Node Classification

    The task of node classification concerns a network where nodes are associated with labels, but labels are known only for some of the nodes. The task consists of inferring the unknown labels given the known node labels, the structure of the network, and other known node attributes. Common node classification approaches are based on the assumption that adjacent nodes have similar attributes and, therefore, that a node’s label can be predicted from the labels of its neighbors. While such an assumption is often valid (e.g., for political affiliation in social networks), it may not hold in some cases. In fact, nodes that share the same label may be adjacent but differ in their attributes, or may not be adjacent but have similar attributes. In this work, we present JANE (Jointly using Attributes and Node Embeddings), a novel and principled approach to node classification that flexibly adapts to a range of settings wherein unknown labels may be predicted from known labels of adjacent nodes in the network, other node attributes, or both. Our experiments on synthetic data highlight the limitations of benchmark algorithms and the versatility of JANE. Further, our experiments on seven real datasets, with sizes ranging from 2.5K to 1.5M nodes and edge homophily ranging from 0.86 to 0.29, show that JANE scales well to large networks while also demonstrating an improvement of up to 20% in accuracy compared to strong baseline algorithms.
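
    The abstract above reports edge homophily values without defining the measure. A minimal sketch, assuming the commonly used definition (the fraction of edges whose two endpoints share a label), may help in reading the 0.86 to 0.29 range. The function name `edge_homophily` and the toy graph are illustrative assumptions and are not taken from the paper.

```python
from typing import Hashable, Iterable, Mapping, Tuple


def edge_homophily(
    edges: Iterable[Tuple[Hashable, Hashable]],
    labels: Mapping[Hashable, Hashable],
) -> float:
    """Fraction of edges whose endpoints carry the same label.

    Assumed definition (not stated in the abstract): values near 1 mean
    neighbors mostly agree on labels, values near 0 mean they mostly differ.
    """
    edge_list = list(edges)
    if not edge_list:
        raise ValueError("edge homophily is undefined for a graph with no edges")
    same = sum(1 for u, v in edge_list if labels[u] == labels[v])
    return same / len(edge_list)


# Hypothetical toy graph: 4 nodes, 5 undirected edges, 2 label classes.
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
labels = {0: "a", 1: "a", 2: "b", 3: "b"}

# Only (0, 1) and (2, 3) connect same-label nodes, so 2 of 5 edges match.
print(edge_homophily(edges, labels))  # prints 0.4
```

    Under this definition, a dataset at 0.29 has most edges joining nodes with different labels, which is the kind of setting where predicting a node's label from its neighbors' labels alone would be expected to struggle.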