Communities and beyond: mesoscopic analysis of a large social network with complementary methods
Community detection methods have so far been tested mostly on small empirical
networks and on synthetic benchmarks. Much less is known about their
performance on large real-world networks, which nonetheless are a significant
target for application. We analyze the performance of three state-of-the-art
community detection methods by using them to identify communities in a large
social network constructed from mobile phone call records. We find that all
methods detect communities that are meaningful in some respects but fall short
in others, and that there often is a hierarchical relationship between
communities detected by different methods. Our results suggest that community
detection methods could be useful in studying the general mesoscale structure
of networks, as opposed to only trying to identify dense structures.
Comment: 11 pages, 10 figures. V2: typos corrected, one sentence added. V3:
revised version, Appendix added. V4: final published version
A Replica Inference Approach to Unsupervised Multi-Scale Image Segmentation
We apply a replica inference based Potts model method to unsupervised image
segmentation on multiple scales. This approach was inspired by the statistical
mechanics problem of "community detection" and its phase diagram. Specifically,
the problem is cast as identifying tightly bound clusters ("communities" or
"solutes") against a background or "solvent". Within our multiresolution
approach, we compute information theory based correlations among multiple
solutions ("replicas") of the same graph over a range of resolutions.
Significant multiresolution structures are identified by replica correlations
as manifest in information theory overlaps. With the aid of these correlations
as well as thermodynamic measures, the phase diagram of the corresponding Potts
model is analyzed both at zero and finite temperatures. The optimal parameters
for a sensible unsupervised segmentation lie in the "easy phase" of the Potts
model. Our algorithm is fast and is shown to be at least as accurate as the
best algorithms to date and to be especially suited to the detection of
camouflaged images.
Comment: 26 pages, 22 figures
Metrics for Graph Comparison: A Practitioner's Guide
Comparison of graph structure is a ubiquitous task in data analysis and
machine learning, with diverse applications in fields such as neuroscience,
cyber security, social network analysis, and bioinformatics, among others.
Discovery and comparison of structures such as modular communities, rich clubs,
hubs, and trees in data in these fields yields insight into the generative
mechanisms and functional properties of the graph.
Often, two graphs are compared via a pairwise distance measure, with a small
distance indicating structural similarity and vice versa. Common choices
include spectral distances (also known as $\lambda$ distances) and distances
based on node affinities. However, there has as yet been no comparative study
of the efficacy of these distance measures in discerning between common graph
topologies and different structural scales.
In this work, we compare commonly used graph metrics and distance measures,
and demonstrate their ability to discern between common topological features
found in both random graph models and empirical datasets. We put forward a
multi-scale picture of graph structure, in which the effect of global and local
structure upon the distance measures is considered. We make recommendations on
the applicability of different distance measures to empirical graph data
problems based on this multi-scale view. Finally, we introduce the Python
library NetComp, which implements the graph distances used in this work.
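As a rough illustration of the spectral distances mentioned in this abstract, the sketch below compares two small graphs by the Euclidean distance between their sorted adjacency eigenvalues. It does not use NetComp; the function names (`symmetric_eigenvalues`, `adjacency_matrix`, `spectral_distance`) are hypothetical, the choice of the adjacency spectrum and the Euclidean norm is an assumption, and both graphs are assumed to have the same number of nodes. The eigenvalues come from a plain-Python cyclic Jacobi iteration so that the example has no dependencies; a numerical library would normally be used instead.

```python
import math


def symmetric_eigenvalues(matrix, max_sweeps=50, tol=1e-18):
    """Eigenvalues of a small symmetric matrix via cyclic Jacobi rotations."""
    n = len(matrix)
    a = [row[:] for row in matrix]  # work on a copy
    for _ in range(max_sweeps):
        # Stop once the squared off-diagonal mass is negligible.
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                # Rotation angle that zeroes the (p, q) entry.
                theta = 0.5 * math.atan2(2.0 * a[p][q], a[q][q] - a[p][p])
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):  # rotate columns p and q
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp - s * akq
                    a[k][q] = s * akp + c * akq
                for k in range(n):  # rotate rows p and q
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk - s * aqk
                    a[q][k] = s * apk + c * aqk
    return sorted(a[i][i] for i in range(n))


def adjacency_matrix(n_nodes, edges):
    """Dense adjacency matrix of an undirected, unweighted graph."""
    a = [[0.0] * n_nodes for _ in range(n_nodes)]
    for u, v in edges:
        a[u][v] = a[v][u] = 1.0
    return a


def spectral_distance(adj_a, adj_b):
    """Euclidean distance between sorted adjacency spectra (equal sizes assumed)."""
    if len(adj_a) != len(adj_b):
        raise ValueError("this sketch only compares graphs of equal size")
    ev_a = symmetric_eigenvalues(adj_a)
    ev_b = symmetric_eigenvalues(adj_b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(ev_a, ev_b)))


path = adjacency_matrix(3, [(0, 1), (1, 2)])
triangle = adjacency_matrix(3, [(0, 1), (1, 2), (0, 2)])
print(spectral_distance(path, triangle))
```

A small distance would indicate similar spectra; identical graphs give a distance of zero.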
Entity Ranking on Graphs: Studies on Expert Finding
Today's web search engines try to offer services for finding various kinds of information in addition to simple web pages, such as showing locations or answering simple fact queries. Understanding the association of named entities and documents is one of the key steps towards such semantic search tasks. This paper addresses the ranking of entities and models it in a graph-based relevance propagation framework. In particular, we study the problem of expert finding as an example of an entity ranking task. Entity containment graphs are introduced that represent the relationship between text fragments on the one hand and their contained entities on the other hand. The paper shows how these graphs can be used to propagate relevance information from the pre-ranked text fragments to their entities. We use this propagation framework to model existing approaches to expert finding based on the entity's indegree and extend them by recursive relevance propagation based on a probabilistic random walk over the entity containment graphs. Experiments on the TREC expert search task compare the retrieval performance of the different graph and propagation models.
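The two propagation styles described in this abstract can be sketched roughly as follows. This is an illustration under stated assumptions, not the paper's model: the function names, the `restart` parameter, the uniform transition probabilities, and the toy data are all hypothetical. The entity containment graph is represented as a mapping from each text fragment to the entities it contains, and every fragment is assumed to contain at least one entity.

```python
def one_step_scores(fragment_scores, contains):
    """Weighted-indegree style: an entity sums the relevance of its fragments."""
    scores = {}
    for fragment, entities in contains.items():
        for entity in entities:
            scores[entity] = scores.get(entity, 0.0) + fragment_scores[fragment]
    return scores


def random_walk_scores(fragment_scores, contains, rounds=20, restart=0.2):
    """Recursive propagation: a walker alternates fragment -> entity -> fragment.

    With probability `restart` (an assumed parameter) the walker jumps back to
    a fragment drawn in proportion to its initial relevance.
    """
    total = sum(fragment_scores.values())
    start = {f: s / total for f, s in fragment_scores.items()}

    # Reverse index: entity -> fragments that mention it.
    mentioned_in = {}
    for fragment, entities in contains.items():
        for entity in entities:
            mentioned_in.setdefault(entity, []).append(fragment)

    fragment_prob = dict(start)
    entity_prob = {}
    for _ in range(rounds):
        # Fragment -> entity step, uniform over contained entities.
        entity_prob = {e: 0.0 for e in mentioned_in}
        for fragment, p in fragment_prob.items():
            for entity in contains[fragment]:
                entity_prob[entity] += p / len(contains[fragment])
        # Entity -> fragment step, mixed with the restart distribution.
        fragment_prob = {f: restart * start[f] for f in start}
        for entity, p in entity_prob.items():
            for fragment in mentioned_in[entity]:
                fragment_prob[fragment] += (1 - restart) * p / len(mentioned_in[entity])
    return entity_prob


# Toy data: three pre-ranked fragments and the candidate experts they mention.
fragment_scores = {"d1": 0.6, "d2": 0.3, "d3": 0.1}
contains = {"d1": ["alice"], "d2": ["alice", "bob"], "d3": ["bob"]}

print(one_step_scores(fragment_scores, contains))
print(random_walk_scores(fragment_scores, contains))
```

The one-step variant only looks at directly containing fragments, while the walk also lets relevance flow between entities that share fragments.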
Statistical significance of communities in networks
Nodes in real-world networks are usually organized in local modules. These
groups, called communities, are intuitively defined as sub-graphs with a larger
density of internal connections than of external links. In this work, we
introduce a new measure aimed at quantifying the statistical significance of
single communities. Extreme and Order Statistics are used to predict the
statistics associated with individual clusters in random graphs. These
distributions allow us to define the significance of a community as the probability
that a generic clustering algorithm finds such a group in a random graph. The
method is successfully applied in the case of real-world networks for the
evaluation of the significance of their communities.
Comment: 9 pages, 8 figures, 2 tables. The software to calculate the C-score
can be found at http://filrad.homelinux.org/cscor
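The intuitive definition used in this abstract (a sub-graph with a larger density of internal connections than of external links) can be checked with a few lines of code. The sketch below is only that check; it does not compute the C-score, and the function name and the toy graph are hypothetical. It assumes an undirected simple graph and a community with at least two nodes that does not cover the whole graph.

```python
def internal_external_density(edges, community, n_nodes):
    """Return (internal density, external density) for one candidate community."""
    community = set(community)
    size = len(community)
    # Edges with both endpoints inside the community.
    internal = sum(1 for u, v in edges if u in community and v in community)
    # Edges with exactly one endpoint inside the community.
    external = sum(1 for u, v in edges if (u in community) != (v in community))
    max_internal = size * (size - 1) / 2   # possible pairs inside
    max_external = size * (n_nodes - size)  # possible pairs across the boundary
    return internal / max_internal, external / max_external


# Toy graph: two triangles joined by a single edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
d_in, d_out = internal_external_density(edges, {0, 1, 2}, n_nodes=6)
print(d_in, d_out, d_in > d_out)
```

A group passing this check is only a candidate community; assessing whether it could have arisen by chance in a random graph is the question the abstract's significance measure addresses.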
Fluid Communities: A Competitive, Scalable and Diverse Community Detection Algorithm
We introduce a community detection algorithm (Fluid Communities) based on the
idea of fluids interacting in an environment, expanding and contracting as a
result of that interaction. Fluid Communities is based on the propagation
methodology, which represents the state-of-the-art in terms of computational
cost and scalability. While being highly efficient, Fluid Communities is able
to find communities in synthetic graphs with an accuracy close to the current
best alternatives. Additionally, Fluid Communities is the first
propagation-based algorithm capable of identifying a variable number of
communities in a network. To illustrate the relevance of the algorithm, we
evaluate the diversity of the communities found by Fluid Communities, and find
them to be significantly different from the ones found by alternative methods.
Comment: Accepted at the 6th International Conference on Complex Networks and
Their Applications