2,086 research outputs found
Network Sampling: From Static to Streaming Graphs
Network sampling is integral to the analysis of social, information, and
biological networks. Since many real-world networks are massive in size,
continuously evolving, and/or distributed in nature, the network structure is
often sampled in order to facilitate study. For these reasons, a more thorough
and complete understanding of network sampling is critical to support the field
of network science. In this paper, we outline a framework for the general
problem of network sampling, by highlighting the different objectives,
population and units of interest, and classes of network sampling methods. In
addition, we propose a spectrum of computational models for network sampling
methods, ranging from the traditionally studied model based on the assumption
of a static domain to a more challenging model that is appropriate for
streaming domains. We design a family of sampling methods based on the concept
of graph induction that generalize across the full spectrum of computational
models (from static to streaming) while efficiently preserving many of the
topological properties of the input graphs. Furthermore, we demonstrate how
traditional static sampling algorithms can be modified for graph streams for
each of the three main classes of sampling methods: node, edge, and
topology-based sampling. Our experimental results indicate that our proposed
family of sampling methods more accurately preserves the underlying properties
of the graph for both static and streaming graphs. Finally, we study the impact
of network sampling algorithms on the parameter estimation and performance
evaluation of relational classification algorithms
Systematic Topology Analysis and Generation Using Degree Correlations
We present a new, systematic approach for analyzing network topologies. We
first introduce the dK-series of probability distributions specifying all
degree correlations within d-sized subgraphs of a given graph G. Increasing
values of d capture progressively more properties of G at the cost of more
complex representation of the probability distribution. Using this series, we
can quantitatively measure the distance between two graphs and construct random
graphs that accurately reproduce virtually all metrics proposed in the
literature. The nature of the dK-series implies that it will also capture any
future metrics that may be proposed. Using our approach, we construct graphs
for d=0,1,2,3 and demonstrate that these graphs reproduce, with increasing
accuracy, important properties of measured and modeled Internet topologies. We
find that the d=2 case is sufficient for most practical purposes, while d=3
essentially reconstructs the Internet AS- and router-level topologies exactly.
We hope that a systematic method to analyze and synthesize topologies offers a
significant improvement to the set of tools available to network topology and
protocol researchers.Comment: Final versio
Anonymizing Social Graphs via Uncertainty Semantics
Rather than anonymizing social graphs by generalizing them to super
nodes/edges or adding/removing nodes and edges to satisfy given privacy
parameters, recent methods exploit the semantics of uncertain graphs to achieve
privacy protection of participating entities and their relationship. These
techniques anonymize a deterministic graph by converting it into an uncertain
form. In this paper, we propose a generalized obfuscation model based on
uncertain adjacency matrices that keep expected node degrees equal to those in
the unanonymized graph. We analyze two recently proposed schemes and show their
fitting into the model. We also point out disadvantages in each method and
present several elegant techniques to fill the gap between them. Finally, to
support fair comparisons, we develop a new tradeoff quantifying framework by
leveraging the concept of incorrectness in location privacy research.
Experiments on large social graphs demonstrate the effectiveness of our
schemes
2.5K-Graphs: from Sampling to Generation
Understanding network structure and having access to realistic graphs plays a
central role in computer and social networks research. In this paper, we
propose a complete, and practical methodology for generating graphs that
resemble a real graph of interest. The metrics of the original topology we
target to match are the joint degree distribution (JDD) and the
degree-dependent average clustering coefficient (). We start by
developing efficient estimators for these two metrics based on a node sample
collected via either independence sampling or random walks. Then, we process
the output of the estimators to ensure that the target properties are
realizable. Finally, we propose an efficient algorithm for generating
topologies that have the exact target JDD and a close to the
target. Extensive simulations using real-life graphs show that the graphs
generated by our methodology are similar to the original graph with respect to,
not only the two target metrics, but also a wide range of other topological
metrics; furthermore, our generator is order of magnitudes faster than
state-of-the-art techniques
- …