Beyond Triangles: A Distributed Framework for Estimating 3-profiles of Large Graphs
We study the problem of approximating the 3-profile of a large graph.
3-profiles are generalizations of triangle counts that specify the number of
times each three-vertex graph appears as an induced subgraph of a large graph. Our
algorithm uses the novel concept of 3-profile sparsifiers: sparse graphs that
can be used to approximate the full 3-profile counts for a given large graph.
Further, we study the problem of estimating local and ego 3-profiles, two
graph quantities that characterize the local neighborhood of each vertex of a
graph.
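To make these definitions concrete, here is a minimal Python sketch. All function names are hypothetical and do not come from the paper. It counts the 3-profile by brute force over vertex triples. It computes an ego 3-profile as the 3-profile of the subgraph induced by a vertex's neighbours, which is our assumed reading of "ego". It also shows one way a sparsifier can be used: keep each edge independently with probability p, count on the sparse graph, and undo the expected loss of edges by solving a small triangular linear system. That unbiasing step is a standard construction assumed here for illustration, not necessarily the estimator used in the paper.

```python
import random
from itertools import combinations
from math import comb


def three_profile(nodes, edges):
    """Exact 3-profile [n0, n1, n2, n3]: how many 3-vertex subsets induce k edges.

    Brute force over all triples, O(n^3); only meant for small examples.
    """
    edge_set = {frozenset(e) for e in edges}
    counts = [0, 0, 0, 0]
    for a, b, c in combinations(nodes, 3):
        k = sum(frozenset(pair) in edge_set for pair in ((a, b), (a, c), (b, c)))
        counts[k] += 1
    return counts


def ego_three_profile(v, edges):
    """Assumed ego 3-profile: 3-profile of the subgraph induced by v's neighbours."""
    nbrs = {x for e in edges if v in e for x in e if x != v}
    sub_edges = [e for e in edges if set(e) <= nbrs]
    return three_profile(list(nbrs), sub_edges)


def sparsify(edges, p, rng):
    """Assumed sparsifier: keep each edge independently with probability p."""
    return [e for e in edges if rng.random() < p]


def unbias(sparse_counts, p):
    """Undo the expected effect of edge sampling on a 3-profile.

    A triple with i edges keeps j of them with probability
    comb(i, j) * p**j * (1 - p)**(i - j), so E[sparse] = M @ true with M
    triangular; back-substitution from k = 3 down to k = 0 solves for `true`.
    """
    est = [0.0] * 4
    for j in range(3, -1, -1):
        leak = sum(comb(i, j) * p**j * (1 - p) ** (i - j) * est[i] for i in range(j + 1, 4))
        est[j] = (sparse_counts[j] - leak) / p**j
    return est


def estimate_three_profile(nodes, edges, p, seed=0):
    """Sparsify, count exactly on the sparse graph, then unbias."""
    rng = random.Random(seed)
    return unbias(three_profile(nodes, sparsify(edges, p, rng)), p)


nodes = [0, 1, 2, 3]
edges = [(0, 1), (1, 2), (2, 0), (2, 3)]
print(three_profile(nodes, edges))  # [0, 1, 2, 1]
print(ego_three_profile(2, edges))  # [0, 1, 0, 0]
print(estimate_three_profile(nodes, edges, p=0.5, seed=1))
```

The estimate is unbiased only in expectation. Individual estimates can be fractional or even negative, and their variance grows as p shrinks. The brute-force counter is there for clarity only: the point of a sparsifier is to run the expensive counting step on far fewer edges.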
Our algorithm is distributed and operates as a vertex program over the
GraphLab PowerGraph framework. We introduce the concept of edge pivoting, which
allows us to collect 2-hop information without maintaining an explicit
2-hop neighborhood list at each vertex. This enables the computation of all
the local 3-profiles in parallel with minimal communication.
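The edge-pivoting idea can be illustrated with a small sequential Python sketch. The decomposition below is our own derivation under stated assumptions, not the paper's PowerGraph vertex program. We assume a simple undirected graph and that every vertex knows the global vertex and edge counts n and m. The name `local_three_profiles` is hypothetical. Each edge (u, v) computes its common-neighbour count c_uv = |N(u) ∩ N(v)| from the two 1-hop lists and pushes a few degree-based terms to both endpoints. Each vertex then assembles its local counts n_k(v), the number of 3-vertex subsets containing v that induce k edges, without ever building its 2-hop neighbourhood.

```python
from collections import defaultdict


def local_three_profiles(nodes, edges):
    """Local 3-profile [n0, n1, n2, n3] for every vertex of a simple graph.

    n_k(v) counts 3-vertex subsets containing v that induce k edges. All
    per-vertex sums are filled in by iterating over edges ("pivots"), so no
    vertex ever materialises its 2-hop neighbourhood.
    """
    adj = {v: set() for v in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    n, m = len(nodes), len(edges)

    tri2 = defaultdict(int)      # sum of c_uv over incident edges = 2 * triangles at v
    path_end = defaultdict(int)  # induced 2-edge paths in which v is an end vertex
    one_near = defaultdict(int)  # single-edge triples whose edge is incident to v
    nbr_deg = defaultdict(int)   # sum of (deg(u) - 1) over neighbours u of v

    for u, v in edges:
        c = len(adj[u] & adj[v])  # the pivot: common neighbours of u and v
        du, dv = len(adj[u]), len(adj[v])
        # Push the same edge-level quantities to both endpoints.
        for x, dx, dy in ((u, du, dv), (v, dv, du)):
            tri2[x] += c
            path_end[x] += dy - 1 - c       # third vertex adjacent only to the other end
            one_near[x] += n - dx - dy + c  # third vertex adjacent to neither end
            nbr_deg[x] += dy - 1

    profiles = {}
    for v in nodes:
        d = len(adj[v])
        t = tri2[v] // 2
        n3 = t
        n2 = (d * (d - 1) // 2 - t) + path_end[v]  # v in the middle + v at an end
        one_far = m - d - nbr_deg[v] + t            # edges with no endpoint in N[v]
        n1 = one_near[v] + one_far
        n0 = (n - 1) * (n - 2) // 2 - n1 - n2 - n3
        profiles[v] = [n0, n1, n2, n3]
    return profiles


nodes = [0, 1, 2, 3]
edges = [(0, 1), (1, 2), (2, 0), (2, 3)]
profiles = local_three_profiles(nodes, edges)
print(profiles[0])  # [0, 1, 1, 1]
print(profiles[3])  # [0, 1, 2, 0]
```

Summing n_k(v) over all vertices counts every triple three times. That gives a cheap consistency check against a global 3-profile.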
We test our implementation in several experiments scaling up to … cores
on Amazon EC2. We find that our algorithm can estimate the 3-profile of a
graph in approximately the same time as triangle counting. For the harder
problem of ego 3-profiles, we introduce an algorithm that can estimate the
profiles of hundreds of thousands of vertices in parallel, in the timescale of
minutes.
Comment: To appear in part at KDD'1
Network Sampling: From Static to Streaming Graphs
Network sampling is integral to the analysis of social, information, and
biological networks. Since many real-world networks are massive in size,
continuously evolving, and/or distributed in nature, the network structure is
often sampled in order to facilitate study. For these reasons, a thorough
understanding of network sampling is critical to the field
of network science. In this paper, we outline a framework for the general
problem of network sampling that highlights the different objectives,
populations and units of interest, and classes of network sampling methods. In
addition, we propose a spectrum of computational models for network sampling
methods, ranging from the traditionally studied model based on the assumption
of a static domain to a more challenging model that is appropriate for
streaming domains. We design a family of sampling methods based on the concept
of graph induction that generalize across the full spectrum of computational
models (from static to streaming) while efficiently preserving many of the
topological properties of the input graphs. Furthermore, we demonstrate how
traditional static sampling algorithms can be modified for graph streams for
each of the three main classes of sampling methods: node, edge, and
topology-based sampling. Our experimental results indicate that our proposed
family of sampling methods more accurately preserves the underlying properties
of the graph for both static and streaming graphs. Finally, we study the impact
of network sampling algorithms on the parameter estimation and performance
evaluation of relational classification algorithms.
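The abstract does not spell out its sampling algorithms. The following Python sketch therefore illustrates only the general idea of graph induction. The function names and the reservoir scheme are our assumptions, not the paper's methods. In the static version, uniformly sampled edges determine a node set, and the induction step adds back every edge among those nodes. In the streaming version, a fixed-size node reservoir is maintained in one pass, and an arriving edge is kept whenever both of its endpoints are currently sampled.

```python
import random


def induced_edge_sample(edges, target_nodes, seed=0):
    """Static edge sampling followed by graph induction (illustrative sketch).

    Edges are drawn uniformly without replacement and their endpoints collected
    until at least `target_nodes` nodes are sampled; the induction step then
    adds every edge of the input whose endpoints were both sampled.
    """
    rng = random.Random(seed)
    order = list(edges)
    rng.shuffle(order)
    sampled = set()
    for u, v in order:
        if len(sampled) >= target_nodes:
            break
        sampled.update((u, v))
    induced = [(u, v) for u, v in edges if u in sampled and v in sampled]
    return sampled, induced


def streaming_induced_sample(stream, capacity, seed=0):
    """One pass over an edge stream with a fixed-size node reservoir.

    Nodes enter the reservoir by reservoir sampling over first appearances;
    an arriving edge is kept only if both endpoints are currently sampled
    (partial induction: edges seen before a node entered are never recovered).
    """
    rng = random.Random(seed)
    reservoir, members, kept = [], set(), set()
    seen = set()  # simplification: a memory-bounded sampler would avoid this set
    for u, v in stream:
        for x in (u, v):
            if x in seen:
                continue
            seen.add(x)
            if len(reservoir) < capacity:
                reservoir.append(x)
                members.add(x)
            else:
                j = rng.randrange(len(seen))  # classic reservoir-sampling step
                if j < capacity:
                    evicted = reservoir[j]
                    reservoir[j] = x
                    members.discard(evicted)
                    members.add(x)
                    # Drop edges that lost an endpoint.
                    kept = {e for e in kept if evicted not in e}
        if u in members and v in members:
            kept.add((u, v))
    return members, kept


edges = [(0, 1), (1, 2), (2, 0), (2, 3), (3, 4), (4, 5)]
print(induced_edge_sample(edges, target_nodes=3, seed=7))
print(streaming_induced_sample(edges, capacity=3, seed=7))
```

Induction lets edge-based samplers recover local structure, such as triangles, that plain independent edge sampling tends to break. The streaming variant can only induce edges that arrive after both endpoints are in the reservoir.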
Analysis of a large-scale weighted network of one-to-one human communication
We construct a connected network of 3.9 million nodes from mobile phone call
records, which can be regarded as a proxy for the underlying human
communication network at the societal level. We assign two weights to each edge
to reflect the strength of social interaction: the aggregate call
duration and the cumulative number of calls placed between the individuals over
a period of 18 weeks. We present a detailed analysis of this weighted network
by examining its degree, strength, and weight distributions, as well as its
topological assortativity and weighted assortativity, clustering and weighted
clustering, together with correlations between these quantities. We give an
account of motif intensity and coherence distributions and compare them to a
randomized reference system. We also use the concept of link overlap to measure
the number of common neighbors any two adjacent nodes have, which serves as a
useful local measure for identifying the interconnectedness of communities. We
report a positive correlation between the overlap and weight of a link, thus
providing strong quantitative evidence for the weak ties hypothesis, a central
concept in social network analysis. The percolation properties of the network
are found to depend on the type and order of removed links, and they can help us
understand how the local structure of the network manifests itself at the
global level. We hope that our results will contribute to modeling weighted
large-scale social networks, and believe that the systematic approach followed
here can be adopted to study other weighted networks.
Comment: 25 pages, 17 figures, 2 tables
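Several of the weighted measures mentioned above are straightforward to compute. The Python sketch below makes the following assumptions, none of which come from the abstract:

- Link overlap uses the common normalized definition O_ij = n_ij / ((k_i - 1) + (k_j - 1) - n_ij), where n_ij is the number of common neighbours of i and j and k_i is the degree of i.
- Motif intensity is the geometric mean of a motif's link weights.
- Coherence is the ratio of that geometric mean to the arithmetic mean.

The percolation helper removes links in an order given by a key function (for example, weakest first) and tracks the fraction of nodes in the largest connected component. All function names are hypothetical, and nothing here reproduces the paper's data or results.

```python
from collections import defaultdict
from math import prod


def strengths(wedges):
    """Node strength s_i: the sum of the weights of the links attached to i."""
    s = defaultdict(float)
    for u, v, w in wedges:
        s[u] += w
        s[v] += w
    return dict(s)


def link_overlap(wedges):
    """Assumed definition O_ij = n_ij / ((k_i - 1) + (k_j - 1) - n_ij) per link.

    n_ij is the number of common neighbours; a link between two degree-1 nodes
    has an undefined ratio and is assigned 0.0 here (our convention).
    """
    adj = defaultdict(set)
    for u, v, _ in wedges:
        adj[u].add(v)
        adj[v].add(u)
    overlap = {}
    for u, v, _ in wedges:
        common = len(adj[u] & adj[v])
        denom = (len(adj[u]) - 1) + (len(adj[v]) - 1) - common
        overlap[(u, v)] = common / denom if denom > 0 else 0.0
    return overlap


def motif_intensity_coherence(weights):
    """Assumed definitions: intensity = geometric mean of the motif's weights;
    coherence = geometric / arithmetic mean, in (0, 1], 1 meaning equal weights."""
    g = prod(weights) ** (1 / len(weights))
    return g, g / (sum(weights) / len(weights))


def largest_component_curve(nodes, wedges, key):
    """Fraction of nodes in the largest component after removing the first k
    links in ascending `key` order, for k = 0 .. len(wedges).

    Links are re-added in reverse removal order with union-find, so the whole
    curve needs one sort and one pass instead of a component search per step.
    """
    order = sorted(wedges, key=key)
    parent = {v: v for v in nodes}
    size = {v: 1 for v in nodes}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    largest = 1
    curve = [largest]  # every link removed: only isolated nodes remain
    for u, v, _ in reversed(order):
        ru, rv = find(u), find(v)
        if ru != rv:
            if size[ru] < size[rv]:
                ru, rv = rv, ru
            parent[rv] = ru  # union by size
            size[ru] += size[rv]
            largest = max(largest, size[ru])
        curve.append(largest)
    curve.reverse()  # curve[k] now describes the graph after k removals
    return [c / len(nodes) for c in curve]


wedges = [(0, 1, 5.0), (1, 2, 1.0), (2, 0, 2.0), (2, 3, 1.0)]
print(link_overlap(wedges))  # {(0, 1): 1.0, (1, 2): 0.5, (2, 0): 0.5, (2, 3): 0.0}
print(largest_component_curve([0, 1, 2, 3], wedges, key=lambda e: e[2]))
# [1.0, 1.0, 0.75, 0.5, 0.25]  (weakest links removed first)
```

Passing `key=lambda e: -e[2]` instead removes the strongest links first. That enables the kind of removal-order comparison the percolation analysis describes.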