On sampling social networking services
This article aims at summarizing the existing methods for sampling social
networking services and proposing a faster confidence interval for related
sampling methods. It also includes comparisons of common network sampling
techniques.
Multiscale mixing patterns in networks
Assortative mixing in networks is the tendency for nodes with the same
attributes, or metadata, to link to each other. It is a property often found in
social networks manifesting as a higher tendency of links occurring between
people with the same age, race, or political belief. Quantifying the level of
assortativity or disassortativity (the preference of linking to nodes with
different attributes) can shed light on the factors involved in the formation
of links and contagion processes in complex networks. It is common practice to
measure the level of assortativity according to the assortativity coefficient,
or modularity in the case of discrete-valued metadata. This global value is the
average level of assortativity across the network and may not be a
representative statistic when mixing patterns are heterogeneous. For example, a
social network spanning the globe may exhibit local differences in mixing
patterns as a consequence of differences in cultural norms. Here, we introduce
an approach to localise this global measure so that we can describe the
assortativity, across multiple scales, at the node level. Consequently we are
able to capture and qualitatively evaluate the distribution of mixing patterns
in the network. We find that for many real-world networks the distribution of
assortativity is skewed, overdispersed and multimodal. Our method provides a
clearer lens through which we can more closely examine mixing patterns in
networks. Comment: 11 pages, 7 figures
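As a point of reference for the global measure this abstract starts from, the following is a minimal sketch of the standard assortativity coefficient for a discrete node attribute on an undirected graph. It is not the localised, multiscale method the paper introduces. The function name `attribute_assortativity` and the edge-list and dictionary inputs are assumptions made for illustration.

```python
from collections import defaultdict


def attribute_assortativity(edges, attr):
    """Global assortativity coefficient for a discrete attribute.

    edges: list of (u, v) pairs of an undirected graph (hypothetical input format).
    attr:  dict mapping each node to its attribute value.
    Returns r = (sum_i e_ii - sum_i a_i^2) / (1 - sum_i a_i^2).
    """
    m = len(edges)
    e = defaultdict(float)
    for u, v in edges:
        # Count every undirected edge in both directions so that e is symmetric
        # and its entries sum to 1.
        e[(attr[u], attr[v])] += 1 / (2 * m)
        e[(attr[v], attr[u])] += 1 / (2 * m)

    # a[i] is the fraction of edge ends attached to nodes of type i.
    a = defaultdict(float)
    for (i, _j), w in e.items():
        a[i] += w

    trace = sum(w for (i, j), w in e.items() if i == j)
    expected = sum(x * x for x in a.values())
    # Undefined (division by zero) when every node has the same attribute value.
    return (trace - expected) / (1 - expected)


same = attribute_assortativity([(0, 1), (2, 3)], {0: "a", 1: "a", 2: "b", 3: "b"})
mixed = attribute_assortativity([(0, 1), (2, 3)], {0: "a", 1: "b", 2: "a", 3: "b"})
```

Because the result is a single number for the whole graph, two regions with opposite mixing patterns can cancel out, which is the limitation the abstract describes.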
Provable and practical approximations for the degree distribution using sublinear graph samples
The degree distribution is one of the most fundamental properties used in the
analysis of massive graphs. There is a large literature on graph sampling,
where the goal is to estimate properties (especially the degree distribution)
of a large graph through a small, random sample. The degree distribution
estimation poses a significant challenge, due to its heavy-tailed nature and
the large variance in degrees.
We design a new algorithm, SADDLES, for this problem, using recent
mathematical techniques from the field of sublinear algorithms. The SADDLES
algorithm gives provably accurate outputs for all values of the degree
distribution. For the analysis, we define two fatness measures of the degree
distribution, called the -index and the -index. We prove that SADDLES is
sublinear in the graph size when these indices are large. A corollary of this
result is a provably sublinear algorithm for any degree distribution bounded
below by a power law.
We deploy our new algorithm on a variety of real datasets and demonstrate its
excellent empirical behavior. In all instances, we get extremely accurate
approximations for all values in the degree distribution by observing at most
of the vertices. This is a major improvement over the state-of-the-art
sampling algorithms, which typically sample more than of the vertices to
give comparable results. We also observe that the and -indices of real
graphs are large, validating our theoretical analysis. Comment: Longer version of the WWW 2018 submission
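To make the estimation problem concrete, here is a minimal sketch of the naive baseline: sample vertices uniformly at random and use their degrees to estimate the complementary cumulative degree distribution. This is not the SADDLES algorithm, whose details are not given in the abstract. The function name `estimate_ccdf` and the representation of the graph as a plain list of degrees are assumptions made for illustration.

```python
import random


def estimate_ccdf(degrees, sample_size, seed=0):
    """Estimate the fraction of vertices with degree >= d from a uniform sample.

    degrees: list with one degree per vertex (hypothetical input format).
    Returns a function ccdf(d) computed from the sampled degrees only.
    """
    rng = random.Random(seed)
    # Uniform vertex sampling with replacement.
    sample = [rng.choice(degrees) for _ in range(sample_size)]

    def ccdf(d):
        return sum(1 for x in sample if x >= d) / sample_size

    return ccdf


ccdf = estimate_ccdf([3, 3, 3, 3, 3], sample_size=10)
```

With a heavy-tailed degree distribution, a small uniform sample rarely contains the highest-degree vertices, so this baseline tends to be inaccurate in the tail. That is the difficulty the abstract points to.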
Exploring the assortativity-clustering space of a network's degree sequence
Nowadays there is a multitude of measures designed to capture different
aspects of network structure. To be able to say if the structure of a certain
network is expected or not, one needs a reference model (null model). One
frequently used null model is the ensemble of graphs with the same set of
degrees as the original network. In this paper we argue that this ensemble can
be more than just a null model -- it also carries information about the
original network and factors that affect its evolution. By mapping out this
ensemble in the space of some low-level network structure -- in our case those
measured by the assortativity and clustering coefficients -- one can for
example study how close to the valid region of the parameter space the observed
networks are. Such analysis suggests which quantities are actively optimized
during the evolution of the network. We use four very different biological
networks to exemplify our method. Among other things, we find that high
clustering might be a force in the evolution of protein interaction networks.
We also find that all four networks are conspicuously robust to both random
errors and targeted attacks.
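The null model mentioned here, the ensemble of graphs sharing one degree sequence, is commonly explored with degree-preserving edge swaps. The sketch below shows that generic technique under stated assumptions. It is not taken from the paper, and it makes no claim to sample the ensemble uniformly. The function name `degree_preserving_rewire` and its parameters are hypothetical, and the input is assumed to be a simple undirected graph with at least two edges.

```python
import random


def degree_preserving_rewire(edges, n_swaps, seed=0, max_tries=10000):
    """Randomise a simple undirected graph while keeping every node's degree.

    Each accepted swap replaces edges (a, b) and (c, d) with (a, d) and (c, b).
    Swaps that would create a self-loop or a duplicate edge are rejected.
    """
    rng = random.Random(seed)
    edges = [tuple(e) for e in edges]
    present = {frozenset(e) for e in edges}
    done = tries = 0
    while done < n_swaps and tries < max_tries:
        tries += 1
        i, j = rng.sample(range(len(edges)), 2)
        (a, b), (c, d) = edges[i], edges[j]
        if len({a, b, c, d}) < 4:
            continue  # shared endpoint: the swap could create a self-loop
        if frozenset((a, d)) in present or frozenset((c, b)) in present:
            continue  # the swap would duplicate an existing edge
        present -= {frozenset((a, b)), frozenset((c, d))}
        present |= {frozenset((a, d)), frozenset((c, b))}
        edges[i], edges[j] = (a, d), (c, b)
        done += 1
    return edges


def degree_sequence(edges):
    """Return a dict mapping each node to its degree."""
    deg = {}
    for u, v in edges:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    return deg


ring = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0)]
rewired = degree_preserving_rewire(ring, n_swaps=20)
```

Computing assortativity and clustering on many such rewired graphs would give one rough picture of the region of that space reachable with the observed degree sequence.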