Estimating and Sampling Graphs with Multidimensional Random Walks
Estimating characteristics of large graphs via sampling is a vital part of
the study of complex networks. Current sampling methods such as (independent)
random vertex and random walks are useful but have drawbacks. Random vertex
sampling may require too many resources (time, bandwidth, or money). Random
walks, which normally require fewer resources per sample, can suffer from large
estimation errors in the presence of disconnected or loosely connected graphs.
In this work we propose a new $m$-dimensional random walk that uses $m$
dependent random walkers. We show that the proposed sampling method, which we
call Frontier sampling, exhibits all of the nice sampling properties of a
regular random walk. At the same time, our simulations over large real world
graphs show that, in the presence of disconnected or loosely connected
components, Frontier sampling exhibits lower estimation errors than regular
random walks. We also show that Frontier sampling is more suitable than random
vertex sampling to sample the tail of the degree distribution of the graph.
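The abstract does not spell out the procedure, so the following is only a minimal sketch of Frontier sampling as it is commonly described: keep a list of $m$ walkers, pick one walker with probability proportional to the degree of its current vertex, and move it along a uniformly chosen incident edge. The function name `frontier_sampling`, the adjacency-list representation `adj` (a dict mapping each vertex to a list of neighbors), and the uniform choice of starting vertices are assumptions made for illustration.

```python
import random


def frontier_sampling(adj, m, steps, rng=None):
    """Sketch of an m-dimensional (Frontier) random walk.

    adj   : dict mapping each vertex to a list of its neighbors (undirected graph)
    m     : number of dependent walkers
    steps : number of edges to sample
    Returns the list of sampled edges (u, v).
    """
    rng = rng or random.Random()
    # Assumption: walkers start at vertices drawn uniformly at random
    # among vertices that have at least one neighbor.
    candidates = [v for v, nbrs in adj.items() if nbrs]
    frontier = [rng.choice(candidates) for _ in range(m)]

    sampled_edges = []
    for _ in range(steps):
        # Pick a walker with probability proportional to its vertex degree.
        weights = [len(adj[u]) for u in frontier]
        i = rng.choices(range(m), weights=weights, k=1)[0]
        u = frontier[i]
        # Move that walker along a uniformly chosen incident edge.
        v = rng.choice(adj[u])
        frontier[i] = v
        sampled_edges.append((u, v))
    return sampled_edges
```

Because each walker stays inside the component it started in, using several walkers gives a graph with disconnected components a chance of being covered, which a single random walk cannot do.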
On sampling nodes in a network
Random walks are an important tool in many graph mining applications, including estimating graph parameters, sampling portions of the graph, and extracting dense communities. In this paper we consider the problem of sampling nodes from a large graph according to a prescribed distribution by using random walk as the basic primitive. Our goal is to obtain algorithms that make a small number of queries to the graph but output a node that is sampled according to the prescribed distribution. Focusing on the uniform distribution case, we study the query complexity of three algorithms and show a near-tight bound expressed in terms of parameters of the graph such as the average degree and the mixing time. Both theoretically and empirically, we show that some algorithms are preferable in practice to the others. We also extend our study to the problem of sampling nodes according to some polynomial function of their degrees; this has implications for designing efficient algorithms for applications such as triangle counting.
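The abstract does not name the three algorithms it studies. As an illustration of how a random walk can be corrected to target the uniform distribution, here is a minimal sketch of the standard Metropolis-Hastings walk; whether it is one of the three is not stated here. The function name `mh_uniform_walk`, the adjacency-list input `adj`, and the fixed step count are assumptions for the example.

```python
import random


def mh_uniform_walk(adj, start, steps, rng=None):
    """Metropolis-Hastings random walk whose stationary distribution is uniform.

    adj   : dict mapping each vertex to a list of its neighbors
            (connected undirected graph)
    start : starting vertex
    steps : number of proposal steps (should be on the order of the mixing time)
    Returns the vertex at which the walk ends.
    """
    rng = rng or random.Random()
    u = start
    for _ in range(steps):
        # Propose a uniformly random neighbor of the current vertex.
        v = rng.choice(adj[u])
        # Accept with probability min(1, deg(u) / deg(v)); this cancels the
        # bias of a plain random walk toward high-degree vertices.
        if rng.random() < min(1.0, len(adj[u]) / len(adj[v])):
            u = v
    return u
```

Each step inspects the neighbor list of the current vertex and the degree of the proposed one, so the number of steps is a rough proxy for the number of queries made to the graph.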
2.5K-Graphs: from Sampling to Generation
Understanding network structure and having access to realistic graphs play a
central role in computer and social networks research. In this paper, we
propose a complete and practical methodology for generating graphs that
resemble a real graph of interest. The metrics of the original topology that
we aim to match are the joint degree distribution (JDD) and the
degree-dependent average clustering coefficient ($\bar{c}(k)$). We start by
developing efficient estimators for these two metrics based on a node sample
collected via either independence sampling or random walks. Then, we process
the output of the estimators to ensure that the target properties are
realizable. Finally, we propose an efficient algorithm for generating
topologies that have the exact target JDD and a $\bar{c}(k)$ close to the
target. Extensive simulations using real-life graphs show that the graphs
generated by our methodology are similar to the original graph with respect
to not only the two target metrics but also a wide range of other topological
metrics; furthermore, our generator is orders of magnitude faster than
state-of-the-art techniques.
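To make the two target metrics concrete, the sketch below computes them exactly on a fully known graph; it is not the paper's sample-based estimators, which the abstract does not describe. The function names, the dict-of-sets representation `adj`, and the choice to skip vertices of degree below 2 when averaging clustering are assumptions made for illustration.

```python
from collections import Counter, defaultdict
from itertools import combinations


def joint_degree_counts(adj):
    """Count edges by the (smaller, larger) degrees of their endpoints.

    adj : dict mapping each vertex to a set of neighbors (undirected graph,
          vertex labels must be orderable so each edge is counted once).
    """
    deg = {v: len(nbrs) for v, nbrs in adj.items()}
    jdd = Counter()
    for u, nbrs in adj.items():
        for v in nbrs:
            if u < v:  # visit each undirected edge once
                jdd[tuple(sorted((deg[u], deg[v])))] += 1
    return jdd


def clustering_by_degree(adj):
    """Average local clustering coefficient of the vertices of each degree k."""
    sums, counts = defaultdict(float), Counter()
    for u, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            continue  # assumption: clustering is undefined below degree 2
        # Number of edges among the neighbors of u (triangles through u).
        links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        sums[k] += 2.0 * links / (k * (k - 1))
        counts[k] += 1
    return {k: sums[k] / counts[k] for k in counts}
```

For a triangle on vertices 0, 1, 2 with a pendant vertex 3 attached to 0, the JDD has two edges joining degrees 2 and 3, one joining 2 and 2, and one joining 1 and 3, while the average clustering is 1 at degree 2 and 1/3 at degree 3.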