Understanding Coarsening for Embedding Large-Scale Graphs
A significant portion of today's data, e.g., social networks and web
connections, can be modeled by graphs. A proper analysis of graphs with
Machine Learning (ML) algorithms has the potential to yield far-reaching
insights into many areas of research and industry. However, the irregular
structure of graph data constitutes an obstacle for running ML tasks on graphs
such as link prediction, node classification, and anomaly detection. Graph
embedding is a compute-intensive process of representing a graph as a set of
vectors in a d-dimensional space, which in turn makes it amenable to ML tasks.
Many approaches have been proposed in the literature to improve the performance
of graph embedding, e.g., using distributed algorithms, accelerators, and
pre-processing techniques. Graph coarsening, which can be considered a
pre-processing step, is a structural approximation of a given large graph by
a smaller one. As the literature suggests, the cost of embedding significantly
decreases when coarsening is employed. In this work, we thoroughly analyze the
impact of coarsening quality on embedding performance in terms of both
speed and accuracy. Our experiments with a state-of-the-art, fast graph
embedding tool show that there is an interplay between the coarsening decisions
taken and the embedding quality.

Comment: 10 pages, 6 figures, submitted to 2020 IEEE International Conference
on Big Data
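To make the idea of coarsening concrete, below is a minimal sketch of one common coarsening strategy, heavy-edge matching, in which each node is greedily merged with its heaviest-weight unmatched neighbor and the matched pairs are contracted into supernodes. This is an illustrative scheme from the multilevel-partitioning literature, not necessarily the specific coarsening method evaluated in the paper; the function name and adjacency-dict representation are our own choices.

```python
def coarsen(adj):
    """One level of heavy-edge-matching coarsening (illustrative sketch).

    adj: symmetric weighted adjacency, {u: {v: weight}}.
    Returns (coarse_adj, mapping), where mapping[u] is u's supernode id.
    """
    mapping, matched = {}, set()
    next_id = 0
    for u in adj:
        if u in matched:
            continue
        # Match u with its heaviest unmatched neighbor, if one exists.
        candidates = [v for v in adj[u] if v not in matched and v != u]
        if candidates:
            v = max(candidates, key=lambda v: adj[u][v])
            mapping[u] = mapping[v] = next_id
            matched.update({u, v})
        else:
            mapping[u] = next_id  # unmatched node becomes its own supernode
            matched.add(u)
        next_id += 1

    # Contract: accumulate fine-edge weights onto coarse edges,
    # dropping edges that collapse into a single supernode.
    coarse = {}
    for u, nbrs in adj.items():
        cu = mapping[u]
        coarse.setdefault(cu, {})
        for v, w in nbrs.items():
            cv = mapping[v]
            if cu != cv:
                coarse[cu][cv] = coarse[cu].get(cv, 0) + w
    return coarse, mapping
```

Applied recursively, this roughly halves the graph at each level; an embedding computed on the coarsest graph can then be projected back and refined, which is where the reported cost savings come from.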
Representation Learning for Attributed Multiplex Heterogeneous Network
Network embedding (or graph embedding) has been widely used in many
real-world applications. However, existing methods mainly focus on networks
with single-typed nodes/edges and cannot scale well to handle large networks.
Many real-world networks consist of billions of nodes and edges of multiple
types, and each node is associated with different attributes. In this paper, we
formalize the problem of embedding learning for the Attributed Multiplex
Heterogeneous Network and propose a unified framework to address this problem.
The framework supports both transductive and inductive learning. We also
provide a theoretical analysis of the proposed framework, showing its
connection to previous work and proving its greater expressiveness. We conduct systematic
evaluations for the proposed framework on four different genres of challenging
datasets: Amazon, YouTube, Twitter, and Alibaba. Experimental results
demonstrate that with the learned embeddings from the proposed framework, we
can achieve statistically significant improvements (e.g., a 5.99-28.23% lift
in F1 score; p << 0.01, t-test) over previous state-of-the-art methods for link
prediction. The framework has also been successfully deployed on the
recommendation system of a worldwide leading e-commerce company, Alibaba Group.
Results of the offline A/B tests on product recommendation further confirm the
effectiveness and efficiency of the framework in practice.

Comment: Accepted to KDD 2019. Website: https://sites.google.com/view/gatn
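The data model the abstract describes, nodes of several types carrying attribute vectors, with edges partitioned into typed layers, can be sketched as a small container class. This is an illustrative structure of our own design (class and method names are hypothetical), not the paper's implementation; per-layer neighbor lookup is the primitive that typed random-walk embedding methods build on.

```python
from collections import defaultdict

class MultiplexGraph:
    """Sketch of an attributed multiplex heterogeneous network:
    each node has a type and an attribute dict; each edge lives
    in a layer identified by its edge type."""

    def __init__(self):
        self.node_type = {}
        self.node_attr = {}
        # One adjacency structure per edge type (layer).
        self.layers = defaultdict(lambda: defaultdict(set))

    def add_node(self, u, ntype, attrs):
        self.node_type[u] = ntype
        self.node_attr[u] = attrs

    def add_edge(self, u, v, etype):
        # Undirected edge within layer `etype`.
        self.layers[etype][u].add(v)
        self.layers[etype][v].add(u)

    def neighbors(self, u, etype):
        """Neighbors of u restricted to one edge type."""
        return self.layers[etype][u]
```

For example, in an e-commerce setting the same user and item nodes might be linked by separate "click" and "purchase" layers, and a walk that stays within one layer sees a different neighborhood than one that mixes layers.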