Easing Embedding Learning by Comprehensive Transcription of Heterogeneous Information Networks
Heterogeneous information networks (HINs) are ubiquitous in real-world
applications. In the meantime, network embedding has emerged as a convenient
tool to mine and learn from networked data. As a result, it is of interest to
develop HIN embedding methods. However, the heterogeneity in HINs introduces
not only rich information but also potentially incompatible semantics, which
poses special challenges to embedding learning in HINs. With the intention to
preserve the rich yet potentially incompatible information in HIN embedding, we
propose to study the problem of comprehensive transcription of heterogeneous
information networks. The comprehensive transcription of HINs also provides an
easy-to-use approach to unleash the power of HINs, since it requires no
additional supervision, expertise, or feature engineering. To cope with the
challenges in the comprehensive transcription of HINs, we propose the HEER
algorithm, which embeds HINs via edge representations that are further coupled
with properly-learned heterogeneous metrics. To corroborate the efficacy of
HEER, we conducted experiments on two large-scale real-world datasets with an
edge reconstruction task and multiple case studies. Experiment results
demonstrate the effectiveness of the proposed HEER model and the utility of
edge representations and heterogeneous metrics. The code and data are available
at https://github.com/GentleZhu/HEER.
Comment: 10 pages. In Proceedings of the 24th ACM SIGKDD International
Conference on Knowledge Discovery and Data Mining, London, United Kingdom,
ACM, 201
Relation Structure-Aware Heterogeneous Information Network Embedding
Heterogeneous information network (HIN) embedding aims to embed multiple
types of nodes into a low-dimensional space. Although most existing HIN
embedding methods consider heterogeneous relations in HINs, they usually employ
one single model for all relations without distinction, which inevitably
restricts the capability of network embedding. In this paper, we take the
structural characteristics of heterogeneous relations into consideration and
propose a novel Relation structure-aware Heterogeneous Information Network
Embedding model (RHINE). By exploring the real-world networks with thorough
mathematical analysis, we present two structure-related measures which can
consistently distinguish heterogeneous relations into two categories:
Affiliation Relations (ARs) and Interaction Relations (IRs). To respect the
distinctive characteristics of relations, in our RHINE, we propose different
models specifically tailored to handle ARs and IRs, which can better capture
the structures and semantics of the networks. At last, we combine and optimize
these models in a unified and elegant manner. Extensive experiments on three
real-world datasets demonstrate that our model significantly outperforms the
state-of-the-art methods in various tasks, including node clustering, link
prediction, and node classification.
LINE: Large-scale Information Network Embedding
This paper studies the problem of embedding very large information networks
into low-dimensional vector spaces, which is useful in many tasks such as
visualization, node classification, and link prediction. Most existing graph
embedding methods do not scale for real world information networks which
usually contain millions of nodes. In this paper, we propose a novel network
embedding method called the "LINE," which is suitable for arbitrary types of
information networks: undirected, directed, and/or weighted. The method
optimizes a carefully designed objective function that preserves both the local
and global network structures. An edge-sampling algorithm is proposed that
addresses the limitation of the classical stochastic gradient descent and
improves both the effectiveness and the efficiency of the inference. Empirical
experiments prove the effectiveness of the LINE on a variety of real-world
information networks, including language networks, social networks, and
citation networks. The algorithm is very efficient and is able to learn the
embedding of a network with millions of vertices and billions of edges in a few
hours on a typical single machine. The source code of the LINE is available
online.
Comment: WWW 201
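
The edge-sampling idea mentioned in the LINE abstract can be illustrated with a minimal sketch: instead of scaling each gradient step by the edge weight, edges are drawn with probability proportional to their weight and each draw is treated as a unit-weight edge. This is an illustration under stated assumptions, not the authors' implementation. The function name `sample_edges` and the toy graph are hypothetical, and Python's `random.choices` stands in for whatever sampler the released code uses.

```python
import random
from collections import Counter


def sample_edges(weighted_edges, num_samples, seed=0):
    """Draw edges with probability proportional to their weight.

    Each drawn edge would then be used as a unit-weight (binary) edge in
    one SGD step, so large weights show up as frequent draws rather than
    as large gradient multipliers.
    """
    rng = random.Random(seed)  # fixed seed only to make the sketch repeatable
    edges = [(u, v) for u, v, _ in weighted_edges]
    weights = [w for _, _, w in weighted_edges]
    return rng.choices(edges, weights=weights, k=num_samples)


# Hypothetical toy graph: (source, target, weight)
toy_edges = [("a", "b", 1.0), ("b", "c", 9.0)]
samples = sample_edges(toy_edges, 1000)
counts = Counter(samples)
print(counts)
```

In this toy graph the heavier edge ("b", "c") should be drawn roughly nine times as often as ("a", "b"), which is how the weight information is preserved without enlarging any single update.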