An efficient and principled method for detecting communities in networks
A fundamental problem in the analysis of network data is the detection of
network communities, groups of densely interconnected nodes, which may be
overlapping or disjoint. Here we describe a method for finding overlapping
communities based on a principled statistical approach using generative network
models. We show how the method can be implemented using a fast, closed-form
expectation-maximization algorithm that allows us to analyze networks of
millions of nodes in reasonable running times. We test the method both on
real-world networks and on synthetic benchmarks and find that it gives results
competitive with previous methods. We also show that the same approach can be
used to extract nonoverlapping community divisions via a relaxation method, and
demonstrate that the algorithm is competitively fast and accurate for the
nonoverlapping problem. Comment: 14 pages, 5 figures, 1 table
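As an illustration of what a closed-form expectation-maximization loop for overlapping communities can look like, here is a minimal sketch. The abstract does not give the update equations, so the model below is an assumption: each node i has a non-negative weight theta[i][z] in community z, and the expected number of edges between i and j is the sum over z of theta[i][z] * theta[j][z]. The function name and parameters are hypothetical.

```python
import random
from math import sqrt

def em_overlapping_communities(edges, n, k, iterations=200, seed=0):
    """Fit node-community weights theta[i][z] by EM (assumed Poisson edge model)."""
    rng = random.Random(seed)
    theta = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    # Visit every undirected edge in both directions, like a symmetric adjacency matrix.
    directed = [(i, j) for i, j in edges] + [(j, i) for i, j in edges]
    for _ in range(iterations):
        # E-step: split each edge among communities in proportion to theta_iz * theta_jz.
        k_iz = [[0.0] * k for _ in range(n)]
        for i, j in directed:
            weights = [theta[i][z] * theta[j][z] for z in range(k)]
            total = sum(weights)
            if total == 0:
                continue  # guard against numerical underflow
            for z in range(k):
                k_iz[i][z] += weights[z] / total
        # M-step: closed-form update, normalized per community.
        col = [sum(k_iz[i][z] for i in range(n)) for z in range(k)]
        theta = [
            [k_iz[i][z] / sqrt(col[z]) if col[z] > 0 else 0.0 for z in range(k)]
            for i in range(n)
        ]
    return theta
```

Under this assumed model, a node belongs to every community in which its weight is non-negligible, which is how overlap arises. Each iteration costs time proportional to the number of edges times the number of communities.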
Fundamental structures of dynamic social networks
Social systems are in a constant state of flux with dynamics spanning from
minute-by-minute changes to patterns present on the timescale of years.
Accurate models of social dynamics are important for understanding spreading of
influence or diseases, formation of friendships, and the productivity of teams.
While there has been much progress on understanding complex networks over the
past decade, little is known about the regularities governing the
micro-dynamics of social networks. Here we explore the dynamic social network
of a densely-connected population of approximately 1000 individuals and their
interactions in the network of real-world person-to-person proximity measured
via Bluetooth, as well as their telecommunication networks, online social media
contacts, geo-location, and demographic data. These high-resolution data allow
us to observe social groups directly, rendering community detection
unnecessary. Starting from 5-minute time slices we uncover dynamic social
structures expressed on multiple timescales. On the hourly timescale, we find
that gatherings are fluid, with members coming and going, but organized via a
stable core of individuals. Each core represents a social context. Cores
exhibit a pattern of recurring meetings across weeks and months, each with
varying degrees of regularity. Taken together, these findings provide a
powerful simplification of the social network, where cores represent
fundamental structures expressed with strong temporal and spatial regularity.
Using this framework, we explore the complex interplay between social and
geospatial behavior, documenting how the formation of cores is preceded by
coordination behavior in the communication networks, and demonstrating that
social behavior can be predicted with high precision. Comment: Main Manuscript:
16 pages, 4 figures. Supplementary Information: 39 pages, 34 figures
Partitioning networks into cliques: a randomized heuristic approach
In the context of community detection in social networks, a community can be defined in the strict sense that everybody within it knows everybody else. We consider the corresponding community detection problem: we search for a partitioning of a network into the minimum number of non-overlapping cliques, such that the cliques cover all vertices. This problem is called the clique covering problem (CCP) and is one of the classical NP-hard problems. For CCP, we propose a randomized heuristic approach. To construct a high-quality solution to CCP, we present an iterated greedy (IG) algorithm. IG can also be combined with a heuristic used to determine how far the algorithm is from the optimum in the worst case. Randomized local search (RLS) for maximum independent set was proposed to find such a bound. The experimental results of IG and the bounds obtained by RLS indicate that IG is a very suitable technique for solving CCP in real-world graphs. In addition, we summarize our basic rigorous results, which were developed for the analysis of IG and the understanding of its behavior on several relevant graph classes.
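The iterated greedy idea can be sketched as follows. This is a minimal illustration under assumptions, not the authors' implementation: the abstract does not specify the greedy rule or the reordering scheme, so the sketch uses first-fit assignment and reorders vertices block by block, where each block is one clique of the current solution. The function names and the adjacency format (a dict mapping each vertex to the set of its neighbors) are hypothetical.

```python
import random

def greedy_clique_partition(adj, order):
    """Place each vertex, in the given order, into the first clique it is fully adjacent to."""
    cliques = []
    for v in order:
        for clique in cliques:
            if all(u in adj[v] for u in clique):
                clique.append(v)
                break
        else:
            cliques.append([v])  # no existing clique fits, so open a new one
    return cliques

def iterated_greedy(adj, iterations=100, seed=0):
    """Repeat the greedy pass on orderings derived from the current solution."""
    rng = random.Random(seed)
    order = list(adj)
    rng.shuffle(order)
    best = greedy_clique_partition(adj, order)
    for _ in range(iterations):
        # Keep the vertices of each clique contiguous and shuffle only the blocks.
        # With such an ordering the first-fit pass cannot produce more cliques.
        blocks = [list(c) for c in best]
        rng.shuffle(blocks)
        order = [v for block in blocks for v in block]
        candidate = greedy_clique_partition(adj, order)
        if len(candidate) <= len(best):
            best = candidate
    return best
```

Accepting candidates of equal size lets the search drift across plateaus of equally good solutions, which is what gives later iterations a chance to merge cliques.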
Approximate Closest Community Search in Networks
Recently, there has been significant interest in the study of the community
search problem in social and information networks: given one or more query
nodes, find densely connected communities containing the query nodes. However,
most existing studies do not address the "free rider" issue, that is, nodes far
away from query nodes and irrelevant to them are included in the detected
community. Some state-of-the-art models have attempted to address this issue,
but not only are their formulated problems NP-hard, they do not admit any
approximations without restrictive assumptions, which may not always hold in
practice.
In this paper, given an undirected graph G and a set of query nodes Q, we
study community search using the k-truss based community model. We formulate
our problem of finding a closest truss community (CTC), as finding a connected
k-truss subgraph with the largest k that contains Q, and has the minimum
diameter among such subgraphs. We prove this problem is NP-hard. Furthermore,
it is NP-hard to approximate the problem within a factor $(2-\varepsilon)$, for
any $\varepsilon > 0$. However, we develop a greedy algorithmic framework,
which first finds a CTC containing Q, and then iteratively removes the furthest
nodes from Q, from the graph. The method achieves a 2-approximation to the
optimal solution. To further improve the efficiency, we make use of a compact
truss index and develop efficient algorithms for k-truss identification and
maintenance as nodes get eliminated. In addition, using bulk deletion
optimization and local exploration strategies, we propose two more efficient
algorithms. One of them trades some approximation quality for efficiency while
the other is a very efficient heuristic. Extensive experiments on 6 real-world
networks show the effectiveness and efficiency of our community model and
search algorithms.
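The greedy framework described above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the paper's algorithm: it takes k as a parameter instead of searching for the largest k, recomputes the k-truss from scratch instead of using a truss index, and uses the largest distance from any query node as a stand-in for the diameter. A k-truss is taken here to mean a subgraph in which every edge lies in at least k-2 triangles. All function names are hypothetical.

```python
from collections import deque

def build_adj(edges):
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def k_truss(edges, k):
    """Peel edges that lie in fewer than k-2 triangles until none remain."""
    adj = build_adj(edges)
    changed = True
    while changed:
        changed = False
        for u in list(adj):
            for v in list(adj[u]):
                if v in adj[u] and len(adj[u] & adj[v]) < k - 2:
                    adj[u].discard(v)
                    adj[v].discard(u)
                    changed = True
    return {frozenset((u, v)) for u in adj for v in adj[u]}

def bfs_dist(adj, source):
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def shrink_around_query(edges, query, k):
    """Repeatedly drop the node furthest from the query and keep the tightest k-truss seen."""
    current = k_truss(edges, k)
    best, best_radius = None, float("inf")
    while current:
        adj = build_adj(current)
        if any(q not in adj for q in query):
            break  # a query node fell out of the truss
        dists = [bfs_dist(adj, q) for q in query]
        comp = set(dists[0])
        if any(q not in comp for q in query):
            break  # query nodes are no longer connected
        far = {v: max(d[v] for d in dists) for v in comp}
        radius = max(far.values())
        comp_edges = {e for e in current if e <= comp}
        if radius < best_radius:
            best, best_radius = comp_edges, radius
        worst = max(far, key=far.get)
        current = k_truss({e for e in comp_edges if worst not in e}, k)
    return best
```

For example, on two 4-cliques that share a single node, querying a node of the first clique with k = 4 discards the second clique, because its nodes are the furthest from the query.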
A Replica Inference Approach to Unsupervised Multi-Scale Image Segmentation
We apply a replica inference based Potts model method to unsupervised image
segmentation on multiple scales. This approach was inspired by the statistical
mechanics problem of "community detection" and its phase diagram. Specifically,
the problem is cast as identifying tightly bound clusters ("communities" or
"solutes") against a background or "solvent". Within our multiresolution
approach, we compute information theory based correlations among multiple
solutions ("replicas") of the same graph over a range of resolutions.
Significant multiresolution structures are identified by replica correlations
as manifest in information theory overlaps. With the aid of these correlations
as well as thermodynamic measures, the phase diagram of the corresponding Potts
model is analyzed both at zero and finite temperatures. Optimal parameters
corresponding to a sensible unsupervised segmentation correspond to the "easy
phase" of the Potts model. Our algorithm is fast and shown to be at least as
accurate as the best algorithms to date and to be especially suited to the
detection of camouflaged images. Comment: 26 pages, 22 figures
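One way to quantify agreement between two replicas is normalized mutual information (NMI) between their cluster labelings. The abstract speaks only of "information theory overlaps" without naming a measure, so the choice of NMI here is an assumption, and the function names are hypothetical. Each replica is represented as a list giving the cluster label of every node or pixel.

```python
from collections import Counter
from math import log

def entropy(labels):
    """Shannon entropy of a labeling, in nats."""
    n = len(labels)
    return -sum(c / n * log(c / n) for c in Counter(labels).values())

def normalized_mutual_information(a, b):
    """NMI = 2 I(A;B) / (H(A) + H(B)); 1 for identical partitions, 0 for independent ones."""
    n = len(a)
    joint = Counter(zip(a, b))
    count_a, count_b = Counter(a), Counter(b)
    mutual = sum(
        c / n * log(c * n / (count_a[x] * count_b[y]))
        for (x, y), c in joint.items()
    )
    total_entropy = entropy(a) + entropy(b)
    # Two trivial single-cluster partitions agree perfectly by convention.
    return 1.0 if total_entropy == 0 else 2 * mutual / total_entropy
```

Averaging such a score over all pairs of replicas at a given resolution gives a single number per resolution; high values would indicate that independent solutions converge on the same structure.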