Fast Multi-Scale Community Detection based on Local Criteria within a Multi-Threaded Algorithm
Many systems can be described using graphs, or networks. Detecting
communities in these networks can provide information about the underlying
structure and functioning of the original systems. Yet this detection is a
complex task and a large amount of work was dedicated to it in the past decade.
One important feature is that communities can be found at several scales, or
levels of resolution, indicating several levels of organisation. Therefore,
solutions to the community structure may not be unique. Also, networks tend to
be large and hence require efficient processing. In this work, we present a new
algorithm for the fast detection of communities across scales using a local
criterion. We exploit the local aspect of the criterion to enable parallel
computation and improve the algorithm's efficiency further. The algorithm is
tested against large generated multi-scale networks and experiments demonstrate
its efficiency and accuracy.
Comment: arXiv admin note: text overlap with arXiv:1204.100
Detecting highly overlapping community structure by greedy clique expansion
In complex networks it is common for each node to belong to several
communities, implying a highly overlapping community structure. Recent advances
in benchmarking indicate that existing community assignment algorithms that are
capable of detecting overlapping communities perform well only when the extent
of community overlap is kept to modest levels. To overcome this limitation, we
introduce a new community assignment algorithm called Greedy Clique Expansion
(GCE). The algorithm identifies distinct cliques as seeds and expands these
seeds by greedily optimizing a local fitness function. We perform extensive
benchmarks on synthetic data to demonstrate that GCE's good performance is
robust across diverse graph topologies. Significantly, GCE is the only
algorithm to perform well on these synthetic graphs, in which every node
belongs to multiple communities. Furthermore, when put to the task of
identifying functional modules in protein interaction data, and college dorm
assignments in Facebook friendship data, we find that GCE performs
competitively.
Comment: 10 pages, 7 figures. Implementation source and binaries available at
http://sites.google.com/site/greedycliqueexpansion
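The seed-and-expand step described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation. The graph is assumed to be a dict mapping each node to a set of neighbours, the seed is assumed to be given (clique finding is omitted), and the fitness k_in / (k_in + k_out)^alpha with its parameter alpha is an assumed choice of local fitness function. The names fitness and expand_seed are hypothetical.

```python
def fitness(graph, community, alpha=1.0):
    """Assumed local fitness: k_in / (k_in + k_out) ** alpha.

    k_in counts internal edge endpoints (each internal edge twice),
    k_out counts edges leaving the community.
    """
    k_in = k_out = 0
    for u in community:
        for v in graph[u]:
            if v in community:
                k_in += 1
            else:
                k_out += 1
    total = k_in + k_out
    return k_in / total ** alpha if total else 0.0


def expand_seed(graph, seed, alpha=1.0):
    """Greedily add the neighbouring node that most improves the fitness."""
    community = set(seed)
    while True:
        # Candidate nodes: neighbours of the community not yet inside it.
        frontier = {v for u in community for v in graph[u]} - community
        best, best_fit = None, fitness(graph, community, alpha)
        for v in sorted(frontier):
            f = fitness(graph, community | {v}, alpha)
            if f > best_fit:
                best, best_fit = v, f
        if best is None:  # no candidate improves the fitness: stop
            return community
        community.add(best)
```

On two triangles joined by a single edge, expanding from one triangle stops at that triangle, because adding the bridging node lowers the assumed fitness.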
Correlation-Based Community Detection
Mining community structures from complex networks is an important problem
across a variety of fields. Many existing community detection methods detect
communities by optimizing a community evaluation function. However, most
of these functions have high values even on random graphs and may fail to
detect small communities in large-scale networks (the so-called resolution
limit problem). In this paper, we introduce two novel node-centric community
evaluation functions by connecting correlation analysis with community
detection. We will further show that the correlation analysis can provide a
novel theoretical framework which unifies some existing evaluation functions in
the context of a correlation-based optimization problem. In this framework, we
can mitigate the resolution limit problem and eliminate the influence of random
fluctuations by selecting the right correlation function. Furthermore, we
introduce three key properties used in association rule mining into the context
of community detection to help us choose the appropriate correlation function.
Based on our introduced correlation functions, we propose a community detection
algorithm called CBCD. Our proposed algorithm outperforms existing
state-of-the-art algorithms on both synthetic benchmark networks and real-world
networks.
Ensemble-Based Discovery of Disjoint, Overlapping and Fuzzy Community Structures in Networks
Though much work has been done on ensemble clustering in data mining, the
application of ensemble methods to community detection in networks is in its
infancy. In this paper, we propose two ensemble methods: ENDISCO and MEDOC.
ENDISCO performs disjoint community detection. In contrast, MEDOC performs
disjoint, overlapping, and fuzzy community detection and represents the first
ever ensemble method for fuzzy and overlapping community detection. We run
extensive experiments with both algorithms against both synthetic and several
real-world datasets for which community structures are known. We show that
ENDISCO and MEDOC both beat the best-known existing standalone community
detection algorithms (though we emphasize that they leverage them). In the case
of disjoint community detection, we show that both ENDISCO and MEDOC beat an
existing ensemble community detection algorithm both in terms of multiple
accuracy measures and run-time. We further show that our ensemble algorithms
can help explore core-periphery structure of network communities, identify
stable communities in dynamic networks and help solve the "degeneracy of
solutions" problem, generating robust results.
Efficiently Detecting Overlapping Communities through Seeding and Semi-Supervised Learning
Seeding then expanding is a commonly used scheme to discover overlapping
communities in a network. Most seeding methods are either too complex to scale
to large networks or too simple to select high-quality seeds, and the
non-principled functions used by most expanding methods lead to poor
performance when applied to diverse networks. This paper proposes a new method
that transforms a network into a corpus where each edge is treated as a
document, and all nodes of the network are treated as terms of the corpus. An
effective seeding method is also proposed that selects seeds as a training set,
then a principled expanding method based on semi-supervised learning is applied
to classify edges. We compare our new algorithm with four other community
detection algorithms on a wide range of synthetic and empirical networks.
Experimental results show that the new algorithm can significantly improve
clustering performance in most cases. Furthermore, the time complexity of the
new algorithm is linear in the number of edges, and this low complexity makes
the new algorithm scalable to large networks.
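The network-to-corpus transformation mentioned in this abstract can be illustrated with a short sketch. The abstract does not say which terms each edge-document contains, so this sketch assumes, purely for illustration, that the document for edge (u, v) holds both endpoints plus the union of their neighbourhoods. The function name edges_to_corpus and the dict-of-sets graph representation are likewise assumptions.

```python
def edges_to_corpus(graph):
    """Map each undirected edge (u, v) to a sorted list of node 'terms'.

    Assumption: the terms of an edge-document are its endpoints and
    all of their neighbours. The source does not specify this.
    """
    corpus = {}
    for u in graph:
        for v in graph[u]:
            if u < v:  # visit each undirected edge once
                corpus[(u, v)] = sorted({u, v} | graph[u] | graph[v])
    return corpus
```

Under this assumption, the number of documents equals the number of edges, and building the corpus touches each edge once.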
Real-Time Community Detection in Large Social Networks on a Laptop
For a broad range of research, governmental and commercial applications it is
important to understand the allegiances, communities and structure of key
players in society. One promising direction towards extracting this information
is to exploit the rich relational data in digital social networks (the social
graph). As social media data sets are very large, most approaches make use of
distributed computing systems for this purpose. Distributing graph processing
requires solving many difficult engineering problems, which has led some
researchers to look at single-machine solutions that are faster and easier to
maintain. In this article, we present a single-machine real-time system for
large-scale graph processing that allows analysts to interactively explore
graph structures. The key idea is that the aggregate actions of large numbers
of users can be compressed into a data structure that encapsulates user
similarities while being robust to noise and queryable in real-time. We achieve
single machine real-time performance by compressing the neighbourhood of each
vertex using minhash signatures and facilitate rapid queries through Locality
Sensitive Hashing. These techniques reduce query times from hours using
industrial desktop machines operating on the full graph to milliseconds on
standard laptops. Our method allows exploration of strongly associated regions
(i.e. communities) of large graphs in real-time on a laptop. It has been
deployed in software that is actively used by social network analysts and
offers another channel for media owners to monetise their data, helping them to
continue to provide free services that are valued by billions of people
globally.
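The two techniques named in this abstract, minhash signatures of vertex neighbourhoods and Locality Sensitive Hashing, can be sketched generically as below. This is not the deployed system. It assumes integer vertex ids smaller than the chosen prime, a simple (a * x + b) mod p hash family, and a banding scheme for LSH. All function names and parameters are hypothetical.

```python
import random


def make_hashers(num_hashes, seed=0, prime=2_147_483_647):
    """Build num_hashes functions x -> (a * x + b) mod prime."""
    rng = random.Random(seed)
    params = [(rng.randrange(1, prime), rng.randrange(0, prime))
              for _ in range(num_hashes)]
    return [lambda x, a=a, b=b: (a * x + b) % prime for a, b in params]


def minhash_signature(neighbours, hashers):
    """Compress a non-empty neighbour set into one minimum per hash."""
    return [min(h(v) for v in neighbours) for h in hashers]


def estimate_jaccard(sig_a, sig_b):
    """Fraction of matching positions estimates the Jaccard similarity."""
    matches = sum(x == y for x, y in zip(sig_a, sig_b))
    return matches / len(sig_a)


def lsh_band_keys(signature, bands):
    """Split a signature into bands; equal keys mark candidate pairs."""
    rows = len(signature) // bands
    return [(i, tuple(signature[i * rows:(i + 1) * rows]))
            for i in range(bands)]
```

Vertices whose signatures share at least one band key would be treated as candidate neighbours, so a query only compares against vertices in matching buckets rather than the whole graph.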
Local Partition in Rich Graphs
Local graph partitioning is a key graph mining tool that allows researchers
to identify small groups of interrelated nodes (e.g. people) and their
connective edges (e.g. interactions). Because local graph partitioning is
primarily focused on the network structure of the graph (vertices and edges),
it often fails to consider the additional information contained in the
attributes. In this paper we propose (i) a scalable algorithm to improve
local graph partitioning by taking into account both the network structure of
the graph and the attribute data and (ii) an application of the proposed local
graph partitioning algorithm (AttriPart) to predict the evolution of local
communities (LocalForecasting). Experimental results show that our proposed
AttriPart algorithm finds up to 1.6 times denser local partitions, while
running approximately 43 times faster than traditional local partitioning
techniques (PageRank-Nibble). In addition, our LocalForecasting algorithm shows
a significant improvement in the number of nodes and edges correctly predicted
over baseline methods.
Comment: Under KDD 2018 review
GenPerm: A Unified Method for Detecting Non-overlapping and Overlapping Communities
The detection of non-overlapping and overlapping communities is essentially the
same problem. However, current algorithms focus either on finding overlapping
or non-overlapping communities. We present a generalized framework that can
identify both non-overlapping and overlapping communities, without any prior
input about the network or its community distribution. To do so, we introduce a
vertex-based metric, GenPerm, that quantifies how strongly a vertex belongs to
each of its constituent communities. Our community detection algorithm is based
on maximizing the GenPerm over all the vertices in the network. We demonstrate,
through experiments over synthetic and real-world networks, that GenPerm is
more effective than other metrics in evaluating community structure. Further,
we show that due to its vertex-centric property, GenPerm can be used to draw
several inferences beyond community detection, such as core-periphery analysis
and message spreading. Our algorithm for maximizing GenPerm outperforms six
state-of-the-art algorithms in accurately predicting the ground-truth labels.
Finally, we discuss the problem of resolution limit in overlapping communities
and demonstrate that maximizing GenPerm can mitigate this problem.
Comment: This paper (final version) is accepted in IEEE Transactions on
Knowledge and Data Engineering (TKDE). 13 figures, 6 tables
Identification of Overlapping Communities by Locally Calculating Community-Changing Resolution Levels
An algorithm for the detection of overlapping natural communities in networks
was proposed by Lancichinetti, Fortunato, and Kertesz (LFK) last year. The LFK
algorithm constructs natural communities of (in principle) all nodes of a graph
by maximising the local fitness of communities. The resulting modules can
overlap. The generation of communities can easily be repeated for many values
of resolution, thus allowing different views on the network at different
resolutions. We implemented the main idea of the LFK algorithm (to generate
natural communities of each node of a network) in a different way. We start
with a value of the resolution parameter that is high enough for each node to
be its own natural community. As soon as the resolution is reduced, each node
acquires other nodes as members of its community, i.e. natural communities
grow. For each community found at a certain resolution level we calculate the
next lower resolution where a node is added. After adding a node to a community
of a seed node we check whether it is also the natural community of a node that
we have already analysed. In this case, we can stop expanding the seed node's
community. We tested our algorithm on a small benchmark graph and on a network
of about 500 papers in information science (weighted with the Salton index of
bibliographic coupling). In our tests, this approach results in characteristic
ranges of resolution where a large resolution change does not lead to a growth
of the natural community. Such stable modules were also obtained by applying
the LFK algorithm, but since we determine communities for all resolution values
in one run, our approach is faster than the LFK reference. Our algorithm also
reveals the hierarchical structure of the graph more easily.
Comment: 10 pages, 12 figures, also presented as "A local algorithm to get
overlapping communities at all resolution levels in one run" in a poster
session at ASONAM conference, Odense, Denmark, August 201
A Survey of Community Search Over Big Graphs
With the rapid development of information technologies, various big graphs
are prevalent in many real applications (e.g., social media and knowledge
bases). An important component of these graphs is the network community.
Essentially, a community is a group of vertices which are densely connected
internally. Community retrieval can be used in many real applications, such as
event organization, friend recommendation, and so on. Consequently, how to
efficiently find high-quality communities from big graphs is an important
research topic in the era of big data. Recently, a large body of research
work, called community search, has been proposed. These works aim to provide
efficient solutions for searching high-quality communities from large networks
in real-time. Nevertheless, these works focus on different types of graphs and
formulate communities in different manners, and thus it is desirable to have a
comprehensive review of these works.
In this survey, we conduct a thorough review of existing community search
works. Moreover, we analyze and compare the quality of communities under their
models, and the performance of different solutions. Furthermore, we point out
new research directions. This survey not only helps researchers gain a
better understanding of existing community search solutions, but also helps
practitioners choose suitable solutions.