Sampling Online Social Networks via Heterogeneous Statistics
Most sampling techniques for online social networks (OSNs) are based on a
particular sampling method on a single graph, which is referred to as a
statistic. However, various realizing methods on different graphs could
possibly be used in the same OSN, and they may lead to different sampling
efficiencies, i.e., asymptotic variances. To utilize multiple statistics for
accurate measurements, we formulate a mixture sampling problem, through which
we construct a mixture unbiased estimator which minimizes asymptotic variance.
Given fixed sampling budgets for different statistics, we derive the optimal
weights to combine the individual estimators; given fixed total budget, we show
that a greedy allocation towards the most efficient statistics is optimal. In
practice, the sampling efficiencies of statistics can be quite different for
various targets and are unknown before sampling. To solve this problem, we
design a two-stage framework which adaptively spends a partial budget to test
different statistics and allocates the remaining budget to the inferred best
statistics. We show that our two-stage framework is a generalization of 1)
randomly choosing a statistic and 2) evenly allocating the total budget among
all available statistics, and our adaptive algorithm achieves higher efficiency
than these benchmark strategies both in theory and in experiments.
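The weighting and two-stage ideas in the abstract above can be illustrated with a minimal sketch. It is not the paper's algorithm: it assumes each statistic yields independent, identically distributed draws (real OSN samplers such as random walks produce correlated samples), it uses the standard inverse-variance weights for combining independent unbiased estimators, and the names `combine_inverse_variance`, `two_stage_estimate`, and `pilot_fraction` are hypothetical.

```python
import random
import statistics


def combine_inverse_variance(estimates, variances):
    """Combine independent unbiased estimates with inverse-variance weights.

    For independent estimators these weights minimize the variance of the
    weighted average; the resulting variance is 1 / sum(1 / variance_i).
    """
    inverse = [1.0 / v for v in variances]
    total = sum(inverse)
    weights = [w / total for w in inverse]
    combined = sum(w * e for w, e in zip(weights, estimates))
    return combined, weights, 1.0 / total


def two_stage_estimate(samplers, total_budget, pilot_fraction=0.2, rng=None):
    """Pilot every sampler, then give the remaining budget to the best one.

    Assumption: each sampler is a callable taking a random.Random and
    returning one unbiased draw of the target quantity.
    """
    rng = rng or random.Random(0)
    # Stage 1: split the pilot budget evenly (at least 2 draws each so a
    # sample variance can be computed).
    pilot_each = max(2, int(total_budget * pilot_fraction) // len(samplers))
    draws = [[s(rng) for _ in range(pilot_each)] for s in samplers]
    pilot_vars = [statistics.variance(d) for d in draws]

    # Stage 2: greedy allocation to the sampler with the lowest pilot variance.
    best = min(range(len(samplers)), key=lambda i: pilot_vars[i])
    remaining = max(0, total_budget - pilot_each * len(samplers))
    draws[best].extend(samplers[best](rng) for _ in range(remaining))

    # Combine the per-sampler means, weighting by the estimated variance
    # of each mean (pilot variance divided by the number of draws).
    means = [statistics.fmean(d) for d in draws]
    mean_vars = [max(v, 1e-12) / len(d) for v, d in zip(pilot_vars, draws)]
    combined, _, _ = combine_inverse_variance(means, mean_vars)
    return combined, best
```

Setting `pilot_fraction` to 1.0 spends the whole budget evenly across the samplers, which corresponds loosely to the even-allocation benchmark mentioned in the abstract.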
Easing Embedding Learning by Comprehensive Transcription of Heterogeneous Information Networks
Heterogeneous information networks (HINs) are ubiquitous in real-world
applications. In the meantime, network embedding has emerged as a convenient
tool to mine and learn from networked data. As a result, it is of interest to
develop HIN embedding methods. However, the heterogeneity in HINs introduces
not only rich information but also potentially incompatible semantics, which
poses special challenges to embedding learning in HINs. With the intention to
preserve the rich yet potentially incompatible information in HIN embedding, we
propose to study the problem of comprehensive transcription of heterogeneous
information networks. The comprehensive transcription of HINs also provides an
easy-to-use approach to unleash the power of HINs, since it requires no
additional supervision, expertise, or feature engineering. To cope with the
challenges in the comprehensive transcription of HINs, we propose the HEER
algorithm, which embeds HINs via edge representations that are further coupled
with properly-learned heterogeneous metrics. To corroborate the efficacy of
HEER, we conducted experiments on two large-scale real-world datasets with an
edge reconstruction task and multiple case studies. Experiment results
demonstrate the effectiveness of the proposed HEER model and the utility of
edge representations and heterogeneous metrics. The code and data are available
at https://github.com/GentleZhu/HEER.
Comment: 10 pages. In Proceedings of the 24th ACM SIGKDD International
Conference on Knowledge Discovery and Data Mining, London, United Kingdom,
ACM, 201
Relation Structure-Aware Heterogeneous Information Network Embedding
Heterogeneous information network (HIN) embedding aims to embed multiple
types of nodes into a low-dimensional space. Although most existing HIN
embedding methods consider heterogeneous relations in HINs, they usually employ
one single model for all relations without distinction, which inevitably
restricts the capability of network embedding. In this paper, we take the
structural characteristics of heterogeneous relations into consideration and
propose a novel Relation structure-aware Heterogeneous Information Network
Embedding model (RHINE). By exploring the real-world networks with thorough
mathematical analysis, we present two structure-related measures which can
consistently distinguish heterogeneous relations into two categories:
Affiliation Relations (ARs) and Interaction Relations (IRs). To respect the
distinctive characteristics of relations, in our RHINE, we propose different
models specifically tailored to handle ARs and IRs, which can better capture
the structures and semantics of the networks. Finally, we combine and optimize
these models in a unified and elegant manner. Extensive experiments on three
real-world datasets demonstrate that our model significantly outperforms the
state-of-the-art methods in various tasks, including node clustering, link
prediction, and node classification.