Sampling Online Social Networks via Heterogeneous Statistics
Most sampling techniques for online social networks (OSNs) are based on a
particular sampling method on a single graph, which is referred to as a
statistic. However, various realizing methods on different graphs could
possibly be used in the same OSN, and they may lead to different sampling
efficiencies, i.e., asymptotic variances. To utilize multiple statistics for
accurate measurements, we formulate a mixture sampling problem, through which
we construct a mixture unbiased estimator which minimizes asymptotic variance.
Given fixed sampling budgets for different statistics, we derive the optimal
weights to combine the individual estimators; given fixed total budget, we show
that a greedy allocation towards the most efficient statistics is optimal. In
practice, the sampling efficiencies of statistics can be quite different for
various targets and are unknown before sampling. To solve this problem, we
design a two-stage framework which adaptively spends a partial budget to test
different statistics and allocates the remaining budget to the inferred best
statistics. We show that our two-stage framework is a generalization of 1)
randomly choosing a statistic and 2) evenly allocating the total budget among
all available statistics, and our adaptive algorithm achieves higher efficiency
than these benchmark strategies in theory and experiment.
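A minimal sketch of the weighting idea, not the paper's derivation: assuming the individual estimators are independent, unbiased, and have known variances, the variance of their weighted sum is minimized by weights proportional to the inverse variances. The function names `optimal_weights` and `combine` are hypothetical.

```python
def optimal_weights(variances):
    """Inverse-variance weights for independent unbiased estimators (assumed setting)."""
    inverse = [1.0 / v for v in variances]
    total = sum(inverse)
    return [x / total for x in inverse]


def combine(estimates, variances):
    """Return the mixture estimate and its variance under the independence assumption."""
    weights = optimal_weights(variances)
    mixed = sum(w * e for w, e in zip(weights, estimates))
    mixed_variance = 1.0 / sum(1.0 / v for v in variances)
    return mixed, mixed_variance


if __name__ == "__main__":
    # Two hypothetical estimators of the same quantity with variances 1 and 3.
    print(combine([10.0, 14.0], [1.0, 3.0]))
```

The mixture variance is never larger than the smallest individual variance, which is why combining statistics can help even when one of them is much noisier than the others.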
Weighted Random Walk Sampling for Multi-Relational Recommendation
In the information overloaded web, personalized recommender systems are
essential tools to help users find most relevant information. The most
heavily-used recommendation frameworks assume user interactions that are
characterized by a single relation. However, for many tasks, such as
recommendation in social networks, user-item interactions must be modeled as a
complex network of multiple relations, not only a single relation. Recent
research on multi-relational factorization and hybrid recommender models has
shown that using extended meta-paths to capture additional information about
both users and items in the network can enhance the accuracy of recommendations
in such networks. Most of this work is focused on unweighted heterogeneous
networks, and to apply these techniques, weighted relations must be simplified
into binary ones. However, information associated with weighted edges, such as
user ratings, which may be crucial for recommendation, is lost in such
binarization. In this paper, we explore a random walk sampling method in which
the frequency of edge sampling is a function of edge weight, and apply this
to generate extended meta-paths in weighted heterogeneous networks. With this
sampling technique, we demonstrate improved performance on multiple data sets
both in terms of recommendation accuracy and model generation efficiency.
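A minimal sketch of a weight-sensitive random walk, assuming the simplest choice where the next edge is picked with probability proportional to its weight. The abstract only says the sampling frequency is "a function of edge weight", so this proportional rule, the dict-of-dicts graph layout, and the name `weighted_walk` are all assumptions.

```python
import random


def weighted_walk(graph, start, length, rng):
    """Walk up to `length` steps, choosing each edge with probability proportional to its weight."""
    walk = [start]
    node = start
    for _ in range(length):
        neighbours = graph.get(node)
        if not neighbours:
            break  # dead end: no outgoing edges
        targets = list(neighbours)
        weights = [neighbours[t] for t in targets]
        node = rng.choices(targets, weights=weights, k=1)[0]
        walk.append(node)
    return walk


if __name__ == "__main__":
    # Hypothetical user-item graph; edge weights play the role of ratings.
    graph = {
        "u1": {"i1": 5.0, "i2": 1.0},
        "i1": {"u1": 5.0, "u2": 4.0},
        "i2": {"u1": 1.0},
        "u2": {"i1": 4.0},
    }
    print(weighted_walk(graph, "u1", 6, random.Random(0)))
```

Highly rated edges are traversed more often, so the rating information that binarization would discard shapes which paths get sampled.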
Uniform sampling of steady states in metabolic networks: heterogeneous scales and rounding
The uniform sampling of convex polytopes is an interesting computational
problem with many applications in inference from linear constraints, but the
performances of sampling algorithms can be affected by ill-conditioning. This
is the case of inferring the feasible steady states in models of metabolic
networks, since they can show heterogeneous time scales. In this work we focus
on rounding procedures based on building an ellipsoid that closely matches the
sampling space, which can be used to define an efficient hit-and-run (HR) Markov
Chain Monte Carlo. In this way the uniformity of the sampling of the convex
space of interest is rigorously guaranteed, in contrast to non-Markovian methods.
We analyze and compare three rounding methods in order to sample the feasible
steady states of metabolic networks of three models of growing size up to
genomic scale. The first is based on principal component analysis (PCA), the
second on linear programming (LP) and finally we employ the Lovász ellipsoid
method (LEM). Our results show that a rounding procedure is mandatory for the
application of the HR in these inference problems and suggest that a combination
of LEM or LP with a subsequent PCA performs best. We finally compare the
distributions of the HR with that of two heuristics based on the Artificially
Centered hit-and-run (ACHR), gpSampler and optGpSampler. They show good
agreement with the results of the HR for the small network, while on
genome-scale models they present inconsistencies.
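A minimal sketch of plain hit-and-run on a polytope {x : A x <= b}, without any of the rounding steps the abstract compares. It assumes the polytope is bounded and that the starting point is strictly interior; the name `hit_and_run` and the list-of-rows layout are hypothetical.

```python
import math
import random


def hit_and_run(A, b, x0, n_steps, rng):
    """Sample points from {x : A x <= b}, assuming it is bounded and x0 is interior."""
    x = list(x0)
    samples = []
    for _ in range(n_steps):
        # Draw a direction uniformly on the unit sphere.
        d = [rng.gauss(0.0, 1.0) for _ in x]
        norm = math.sqrt(sum(v * v for v in d))
        d = [v / norm for v in d]
        # Find the segment {x + t d} that stays inside every constraint.
        lo, hi = -math.inf, math.inf
        for row, bi in zip(A, b):
            ad = sum(r * v for r, v in zip(row, d))
            slack = bi - sum(r * v for r, v in zip(row, x))
            if ad > 1e-12:
                hi = min(hi, slack / ad)
            elif ad < -1e-12:
                lo = max(lo, slack / ad)
        # Move to a uniformly chosen point on that segment.
        t = rng.uniform(lo, hi)
        x = [xi + t * di for xi, di in zip(x, d)]
        samples.append(x)
    return samples


if __name__ == "__main__":
    # Unit square 0 <= x, y <= 1 written as A x <= b.
    A = [[1, 0], [-1, 0], [0, 1], [0, -1]]
    b = [1, 0, 1, 0]
    print(hit_and_run(A, b, [0.5, 0.5], 5, random.Random(0)))
```

When the polytope is very elongated, most directions give a short segment and the chain mixes slowly, which is the ill-conditioning that a rounding transformation is meant to remove.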
Unbiased sampling of network ensembles
Sampling random graphs with given properties is a key step in the analysis of
networks, as random ensembles represent basic null models required to identify
patterns such as communities and motifs. An important requirement is that the
sampling process is unbiased and efficient. The main approaches are
microcanonical, i.e. they sample graphs that match the enforced constraints
exactly. Unfortunately, when applied to strongly heterogeneous networks (like
most real-world examples), the majority of these approaches become biased
and/or time-consuming. Moreover, the algorithms defined in the simplest cases,
such as binary graphs with given degrees, are not easily generalizable to more
complicated ensembles. Here we propose a solution to the problem via the
introduction of a "Maximize and Sample" ("Max & Sam" for short) method to
correctly sample ensembles of networks where the constraints are 'soft', i.e.
realized as ensemble averages. Our method is based on exact maximum-entropy
distributions and is therefore unbiased by construction, even for strongly
heterogeneous networks. It is also more computationally efficient than most
microcanonical alternatives. Finally, it works for both binary and weighted
networks with a variety of constraints, including combined degree-strength
sequences and full reciprocity structure, for which no alternative method
exists. Our canonical approach can in principle be turned into an unbiased
microcanonical one, via a restriction to the relevant subset. Importantly, the
analysis of the fluctuations of the constraints suggests that the
microcanonical and canonical versions of all the ensembles considered here are
not equivalent. We show various real-world applications and provide a code
implementing all our algorithms. MATLAB code is available at
http://www.mathworks.it/matlabcentral/fileexchange/46912-max-sam-package-zi
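A minimal sketch of the sampling step only, for the simplest case of binary undirected graphs with soft degree constraints. It assumes the hidden variables `x` have already been fitted (the maximization step is not shown) and uses the standard maximum-entropy link probability x_i x_j / (1 + x_i x_j). The function names are hypothetical and this is not the authors' MATLAB package.

```python
import random


def link_probability(xi, xj):
    """Maximum-entropy probability of a link between nodes with hidden variables xi and xj."""
    z = xi * xj
    return z / (1.0 + z)


def sample_graph(x, rng):
    """Draw each pair (i, j), i < j, independently with its link probability."""
    n = len(x)
    return [
        (i, j)
        for i in range(n)
        for j in range(i + 1, n)
        if rng.random() < link_probability(x[i], x[j])
    ]


def expected_degrees(x):
    """Ensemble-average degree of each node, i.e. the value of the soft constraint."""
    n = len(x)
    return [
        sum(link_probability(x[i], x[j]) for j in range(n) if j != i)
        for i in range(n)
    ]


if __name__ == "__main__":
    x = [0.5, 1.0, 2.0, 4.0]  # hypothetical fitted values
    print(expected_degrees(x))
    print(sample_graph(x, random.Random(0)))
```

Because every pair is drawn independently, each sample costs one pass over the node pairs and no rejection or edge swapping is needed.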
Exact Simulation for Fork-Join Networks with Heterogeneous Service
This paper considers a fork-join network with a group of heterogeneous servers in each service station, e.g. servers having different service rates. The main research interests are the properties of such fork-join networks in equilibrium, such as distributions of response times, maximum queue lengths and load carried by servers. This paper uses exact Monte-Carlo simulation methods to estimate the characteristics of heterogeneous fork-join networks in equilibrium, for which no explicit formulas are available. The algorithm developed is based on coupling from the past. The efficiency of the sampling algorithm is shown theoretically and via simulation.
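A minimal sketch of coupling from the past on a toy chain, not on a fork-join network: a reflecting random walk on {0, ..., n} that moves up with probability `p`. The chain is monotone, so it suffices to track the lowest and highest states. The names `step` and `cftp` and the toy chain itself are assumptions for illustration.

```python
import random


def step(state, u, n, p):
    """One monotone update: move up if u < p, otherwise down, reflecting at 0 and n."""
    return min(state + 1, n) if u < p else max(state - 1, 0)


def cftp(n, p, rng):
    """Return one exact draw from the stationary distribution of the toy chain."""
    us = []  # us[k] drives the step from time -(k+1) to time -k and is never redrawn
    horizon = 1
    while True:
        while len(us) < horizon:
            us.append(rng.random())
        lo, hi = 0, n
        # Run both extreme chains from time -horizon up to time 0.
        for k in range(horizon - 1, -1, -1):
            lo = step(lo, us[k], n, p)
            hi = step(hi, us[k], n, p)
        if lo == hi:
            return lo  # all starting states have coalesced
        horizon *= 2  # look further into the past, reusing the same randomness


if __name__ == "__main__":
    rng = random.Random(0)
    print([cftp(5, 0.4, rng) for _ in range(10)])
```

Reusing the stored random numbers when the horizon is extended is what makes the output exact; drawing fresh numbers for the already simulated steps would bias the sample.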
Study of Heterogeneous Academic Networks
Academic networks are derived from scholarly data. They are heterogeneous in the sense that different types of nodes are involved, such as papers and authors. This dissertation studies such heterogeneous networks for measuring academic influence and learning vector representations of authors. Academic influence has traditionally been measured by the citation count and metrics derived from it. PageRank-based algorithms have been used to give higher weight to citations from more influential papers. A better approach is to add authors into the citation network so that the importance of authors and papers is evaluated recursively within the same framework. Based on such heterogeneous academic networks, we propose a new algorithm for ranking authors. Tested on two large networks, we find that our method outperforms the other 10 methods in terms of the number of award winners among top-ranked authors. We further improve the method by finding and dealing with the long reference issue. Moreover, we find the mutual citation issue in paper networks and the self citation issue in author networks. Our new method can reduce the impact of the above three issues and identify more rising stars. To learn efficient author representations from heterogeneous academic networks, we propose a new embedding method called Stratified Embedding for Heterogeneous Networks (SEHN) based on Skip-Gram Negative Sampling (SGNS). We conduct random walks to generate the traces that represent the structure of the network, then separate the traces into different layers so that each layer contains the nodes of one type only. Such stratification improves embeddings that are derived from the mixed traces by a large margin. SEHN improves the state-of-the-art Metapath2vec by up to 24% at a certain point. The efficacy of stratification is also demonstrated on two classic network embedding algorithms, DeepWalk and Node2vec. The results are validated in two heterogeneous networks.
We also demonstrate that SEHN outperforms the embedding of homogeneous author networks that are induced from their corresponding heterogeneous networks.
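A minimal sketch of one possible reading of the stratification step: each mixed random-walk trace is split into per-type sub-traces that keep the original visiting order. The abstract does not spell out the exact procedure, so the function `stratify`, the `node_type` callback, and the rule of dropping single-node sub-traces are assumptions.

```python
from collections import defaultdict


def stratify(traces, node_type):
    """Split mixed traces into layers so that each layer holds nodes of one type only."""
    layers = defaultdict(list)
    for trace in traces:
        per_type = defaultdict(list)
        for node in trace:
            per_type[node_type(node)].append(node)
        for kind, sub_trace in per_type.items():
            if len(sub_trace) > 1:  # a lone node yields no context pairs for skip-gram
                layers[kind].append(sub_trace)
    return dict(layers)


if __name__ == "__main__":
    # Hypothetical trace alternating authors ("a...") and papers ("p...").
    traces = [["a1", "p1", "a2", "p2", "a3"]]
    print(stratify(traces, lambda node: node[0]))
```

Each layer can then be passed to a skip-gram trainer on its own, so that the context of an author contains only other authors.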