58,777 research outputs found

    Sampling Online Social Networks via Heterogeneous Statistics

    Full text link
    Most sampling techniques for online social networks (OSNs) are based on a particular sampling method on a single graph, which is referred to as a statistics. However, various realizing methods on different graphs could possibly be used in the same OSN, and they may lead to different sampling efficiencies, i.e., asymptotic variances. To utilize multiple statistics for accurate measurements, we formulate a mixture sampling problem, through which we construct a mixture unbiased estimator which minimizes asymptotic variance. Given fixed sampling budgets for different statistics, we derive the optimal weights to combine the individual estimators; given fixed total budget, we show that a greedy allocation towards the most efficient statistics is optimal. In practice, the sampling efficiencies of statistics can be quite different for various targets and are unknown before sampling. To solve this problem, we design a two-stage framework which adaptively spends a partial budget to test different statistics and allocates the remaining budget to the inferred best statistics. We show that our two-stage framework is a generalization of 1) randomly choosing a statistics and 2) evenly allocating the total budget among all available statistics, and our adaptive algorithm achieves higher efficiency than these benchmark strategies in theory and experiment

    Easing Embedding Learning by Comprehensive Transcription of Heterogeneous Information Networks

    Full text link
    Heterogeneous information networks (HINs) are ubiquitous in real-world applications. In the meantime, network embedding has emerged as a convenient tool to mine and learn from networked data. As a result, it is of interest to develop HIN embedding methods. However, the heterogeneity in HINs introduces not only rich information but also potentially incompatible semantics, which poses special challenges to embedding learning in HINs. With the intention to preserve the rich yet potentially incompatible information in HIN embedding, we propose to study the problem of comprehensive transcription of heterogeneous information networks. The comprehensive transcription of HINs also provides an easy-to-use approach to unleash the power of HINs, since it requires no additional supervision, expertise, or feature engineering. To cope with the challenges in the comprehensive transcription of HINs, we propose the HEER algorithm, which embeds HINs via edge representations that are further coupled with properly-learned heterogeneous metrics. To corroborate the efficacy of HEER, we conducted experiments on two large-scale real-words datasets with an edge reconstruction task and multiple case studies. Experiment results demonstrate the effectiveness of the proposed HEER model and the utility of edge representations and heterogeneous metrics. The code and data are available at https://github.com/GentleZhu/HEER.Comment: 10 pages. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, London, United Kingdom, ACM, 201

    Relation Structure-Aware Heterogeneous Information Network Embedding

    Full text link
    Heterogeneous information network (HIN) embedding aims to embed multiple types of nodes into a low-dimensional space. Although most existing HIN embedding methods consider heterogeneous relations in HINs, they usually employ one single model for all relations without distinction, which inevitably restricts the capability of network embedding. In this paper, we take the structural characteristics of heterogeneous relations into consideration and propose a novel Relation structure-aware Heterogeneous Information Network Embedding model (RHINE). By exploring the real-world networks with thorough mathematical analysis, we present two structure-related measures which can consistently distinguish heterogeneous relations into two categories: Affiliation Relations (ARs) and Interaction Relations (IRs). To respect the distinctive characteristics of relations, in our RHINE, we propose different models specifically tailored to handle ARs and IRs, which can better capture the structures and semantics of the networks. At last, we combine and optimize these models in a unified and elegant manner. Extensive experiments on three real-world datasets demonstrate that our model significantly outperforms the state-of-the-art methods in various tasks, including node clustering, link prediction, and node classification
    • …
    corecore