A General Optimization Technique for High Quality Community Detection in Complex Networks
Recent years have witnessed the development of a large body of algorithms for
community detection in complex networks. Most of them are based upon the
optimization of objective functions, among which modularity is the most common,
though a number of alternatives have been suggested in the scientific
literature. We present here an effective general search strategy for the
optimization of various objective functions for community detection purposes.
When applied to modularity, on both real-world and synthetic networks, our
search strategy substantially outperforms the best existing algorithms in terms
of final scores of the objective function; for description length, its
performance is on par with the original Infomap algorithm. The execution time
of our algorithm is comparable to that of non-greedy alternatives in the literature,
and networks of up to 10,000 nodes can be analyzed in time spans ranging from
minutes to a few hours on average workstations, making our approach readily
applicable to tasks which require the quality of partitioning to be as high as
possible, and are not limited by strict time constraints. Finally, based on the
most effective of the available optimization techniques, we compare the
performance of modularity and code length as objective functions, in terms of
the quality of the partitions one can achieve by optimizing them. To this end,
we evaluated the ability of each objective function to reconstruct the
underlying structure of a large set of synthetic and real-world networks.
Comment: Main text: 14 pages, 4 figures, 1 table. Supplementary information: 19
pages, 8 figures, 5 tables.
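Modularity is the main objective function discussed above. As a point of reference, here is a minimal Python sketch of how the modularity of a given partition can be evaluated. It is not the authors' search strategy, only the score such a strategy would compute repeatedly. The function name `modularity` and the edge-list/dict input format are illustrative assumptions, and the graph is assumed undirected, unweighted and free of self-loops.

```python
from collections import defaultdict


def modularity(edges, communities):
    """Newman-Girvan modularity of a partition of an undirected, unweighted graph.

    edges: list of (u, v) pairs, each edge listed once, no self-loops.
    communities: dict mapping every node to a community label.
    Computes Q = sum_c [L_c / m - (d_c / 2m)^2], where L_c is the number of
    edges inside community c and d_c is the total degree of its nodes.
    """
    m = len(edges)
    if m == 0:
        raise ValueError("modularity is undefined for a graph without edges")
    intra = defaultdict(int)       # L_c: edges with both endpoints in c
    degree_sum = defaultdict(int)  # d_c: sum of degrees of nodes in c
    for u, v in edges:
        cu, cv = communities[u], communities[v]
        degree_sum[cu] += 1
        degree_sum[cv] += 1
        if cu == cv:
            intra[cu] += 1
    return sum(intra[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)


# Two triangles joined by the single edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
split = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
print(round(modularity(edges, split), 4))  # 0.3571, i.e. 5/14
```

Putting every node in a single community gives Q = 0, which is why optimizers look for partitions that push Q above zero.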
Bi-Objective Community Detection (BOCD) in Networks using Genetic Algorithm
Community detection has attracted considerable research effort from fields as
diverse as physics, mathematics and computer science. In this paper I propose a
Bi-Objective Genetic Algorithm for community detection that maximizes both
modularity and community score. The results obtained on both benchmark and
real-life data sets are then compared with those of other algorithms using the
modularity and NMI (normalized mutual information) performance metrics. The
results show that the BOCD algorithm successfully detects community structure in
both real-life and synthetic data sets and improves upon the performance of
previous techniques.
Comment: 11 pages, 3 figures, 3 tables. arXiv admin note: substantial text
overlap with arXiv:0906.061
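The abstract scores partitions against known community structure with a mutual-information metric; normalized mutual information (NMI) is the usual choice in this literature. The sketch below computes NMI between two partitions using only the Python standard library. The name `nmi` and the arithmetic-mean normalization 2I(A;B)/(H(A)+H(B)) are assumptions for illustration (other normalizations exist), and this is not the BOCD code.

```python
import math
from collections import Counter


def nmi(labels_a, labels_b):
    """Normalized mutual information between two partitions of the same nodes.

    labels_a, labels_b: sequences where position i is node i's community label.
    Uses the arithmetic-mean normalization NMI = 2 I(A;B) / (H(A) + H(B)).
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("partitions must cover the same nodes")
    n = len(labels_a)
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    joint = Counter(zip(labels_a, labels_b))  # co-occurrence counts n_ab

    def entropy(counts):
        return -sum(c / n * math.log(c / n) for c in counts.values())

    h_a, h_b = entropy(count_a), entropy(count_b)
    if h_a + h_b == 0:
        return 1.0  # both partitions are a single community, hence identical
    mutual = sum(
        c / n * math.log(c * n / (count_a[a] * count_b[b]))
        for (a, b), c in joint.items()
    )
    return 2 * mutual / (h_a + h_b)


print(round(nmi([0, 0, 1, 1], [1, 1, 0, 0]), 6))  # 1.0 (same partition, labels swapped)
print(round(nmi([0, 0, 1, 1], [0, 1, 0, 1]), 6))  # 0.0 (independent partitions)
```

Because NMI depends only on which nodes are grouped together, it ignores label names, which is what a comparison against a reference partition needs.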
Evaluating Overfit and Underfit in Models of Network Community Structure
A common data mining task on networks is community detection, which seeks an
unsupervised decomposition of a network into structural groups based on
statistical regularities in the network's connectivity. Although many methods
exist, the No Free Lunch theorem for community detection implies that each
makes some kind of tradeoff, and no algorithm can be optimal on all inputs.
Thus, different algorithms will over- or underfit on different inputs, finding
more, fewer, or just different communities than is optimal, and evaluation
methods that use a metadata partition as ground truth will produce misleading
conclusions about general accuracy. Here, we present a broad evaluation of
over- and underfitting in community detection, comparing the behavior of 16
state-of-the-art community detection algorithms on a novel and structurally
diverse corpus of 406 real-world networks. We find that (i) algorithms vary
widely both in the number of communities they find and in their corresponding
composition, given the same input, (ii) algorithms can be clustered into
distinct high-level groups based on similarities of their outputs on real-world
networks, and (iii) these differences induce wide variation in accuracy on link
prediction and link description tasks. We introduce a new diagnostic for
evaluating overfitting and underfitting in practice, and use it to roughly
divide community detection methods into general and specialized learning
algorithms. Across methods and inputs, Bayesian techniques based on the
stochastic block model and a minimum description length approach to
regularization represent the best general learning approach, but can be
outperformed under specific circumstances. These results introduce both a
theoretically principled approach to evaluating over- and underfitting in models
of network community structure and a realistic benchmark by which new methods
may be evaluated and compared.
Comment: 22 pages, 13 figures, 3 tables
Median evidential c-means algorithm and its application to community detection
Median clustering is of great value for partitioning relational data. In this
paper, we propose a new prototype-based clustering method, called Median
Evidential C-Means (MECM), which extends median c-means and median fuzzy c-means
within the theoretical framework of belief functions. The median variant relaxes
the requirement that the objects be embedded in a metric space, but constrains
the prototypes to be objects of the original data set. Owing to these
properties, MECM can be applied to graph clustering problems. A community
detection scheme for social networks based on MECM is investigated; the
resulting credal partitions of graphs, which are more refined than crisp and
fuzzy ones, provide a better understanding of the graph structures. An initial
prototype-selection scheme based on evidential semi-centrality is presented to
avoid local premature convergence, and an evidential modularity function is
defined to choose the optimal number of communities. Finally, experiments on
synthetic and real data sets illustrate the performance of MECM and show how it
differs from other methods.
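MECM builds on median c-means. The sketch below shows only that crisp median c-means baseline on a precomputed dissimilarity matrix. It illustrates the key property stated above: prototypes are restricted to objects of the data set, so no metric embedding is needed. It does not implement belief functions, credal partitions, evidential semi-centrality or evidential modularity, and the function name and its parameters are hypothetical.

```python
def median_c_means(dist, init, max_iter=100):
    """Crisp median c-means (k-medoids style) on a dissimilarity matrix.

    dist: square list of lists; dist[i][j] is the dissimilarity of objects i, j.
    init: list of distinct object indices used as the initial prototypes.
    Prototypes are always objects of the data set, so no metric embedding
    of the objects is needed. Returns (prototypes, labels).
    """
    n, c = len(dist), len(init)
    protos = list(init)

    def assign(prototypes):
        # Each object joins the cluster of its nearest prototype.
        return [min(range(c), key=lambda k: dist[i][prototypes[k]]) for i in range(n)]

    for _ in range(max_iter):
        labels = assign(protos)
        new_protos = []
        for k in range(c):
            members = [i for i in range(n) if labels[i] == k]
            if not members:
                new_protos.append(protos[k])  # keep the prototype of an empty cluster
                continue
            # New prototype: the member with the smallest total dissimilarity
            # to the other members of its cluster (the cluster "median").
            new_protos.append(min(members, key=lambda p: sum(dist[p][q] for q in members)))
        if new_protos == protos:
            break
        protos = new_protos
    return protos, assign(protos)


# Six objects on a line; dissimilarity = absolute difference of positions.
pos = [0, 1, 2, 10, 11, 12]
dist = [[abs(a - b) for b in pos] for a in pos]
print(median_c_means(dist, init=[0, 1]))  # ([1, 4], [0, 0, 0, 1, 1, 1])
```

For graph clustering, `dist` could hold, for example, shortest-path lengths between nodes, since only pairwise dissimilarities are required.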
Searching for network modules
When analyzing complex networks, a key target is to uncover their modular
structure, which means searching for a family of modules, namely node subsets
each spanning a subnetwork more densely connected than the average. This work
proposes a novel type of objective function for graph clustering, in the form
of a multilinear polynomial whose coefficients are determined by network
topology. It may be thought of as a potential function, to be maximized, taking
its values on fuzzy clusterings or families of fuzzy subsets of nodes over
which every node distributes a unit membership. When suitably parametrized,
this potential is shown to attain its maximum when every node concentrates its
entire unit membership on some module. The output is thus a partition, while the
original discrete optimization problem is turned into a continuous version that
allows alternative search strategies to be devised. Since the problem instance
is a pseudo-Boolean function assigning real-valued cluster scores to node
subsets, modularity maximization is used to exemplify a so-called quadratic
form: the scores of singletons and pairs fully determine the scores of larger
clusters, and the resulting multilinear polynomial potential function has
degree 2. After considering further quadratic instances, different from
modularity and obtained by interpreting network topology in alternative ways, a
greedy local-search strategy for the continuous framework is analytically
compared with an existing greedy agglomerative procedure for the discrete case.
Finally, overlap is discussed in terms of multiple runs, i.e. several local
searches with different initializations.
Comment: 10 pages
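To make the degree-2 case concrete, the following sketch evaluates one plausible quadratic potential for modularity on a fuzzy membership matrix. It assumes the cluster score of a node subset S is (1/2m) times the sum of B_ij over i, j in S, with B_ij = A_ij - k_i k_j/(2m). Squared terms are replaced by linear singleton terms to keep the polynomial multilinear, so on crisp memberships (0 or 1) the value coincides with modularity. The exact parametrization used in the paper may differ, and all names here are illustrative.

```python
def modularity_matrix(n, edges):
    """Return B with B[i][j] = A_ij - k_i k_j / 2m, plus 2m (undirected, unweighted)."""
    adj = [[0] * n for _ in range(n)]
    for u, v in edges:
        adj[u][v] += 1
        adj[v][u] += 1
    deg = [sum(row) for row in adj]
    two_m = sum(deg)
    B = [[adj[i][j] - deg[i] * deg[j] / two_m for j in range(n)] for i in range(n)]
    return B, two_m


def potential(B, two_m, P):
    """Degree-2 multilinear potential evaluated on fuzzy memberships.

    P[i][c] >= 0 is node i's membership in cluster c, with sum_c P[i][c] == 1.
    Singleton terms B_ii * p_ic are linear and pair terms B_ij * p_ic * p_jc
    (i != j) are bilinear, so no variable appears squared.
    """
    n, k = len(P), len(P[0])
    total = 0.0
    for c in range(k):
        for i in range(n):
            total += B[i][i] * P[i][c]
            for j in range(n):
                if j != i:
                    total += B[i][j] * P[i][c] * P[j][c]
    return total / two_m


edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
B, two_m = modularity_matrix(6, edges)
crisp = [[1, 0]] * 3 + [[0, 1]] * 3   # each node puts its whole unit on one cluster
fuzzy = [[0.5, 0.5]] * 6              # each node splits its unit evenly
print(round(potential(B, two_m, crisp), 4))  # 0.3571 (= 5/14, the modularity)
print(round(potential(B, two_m, fuzzy), 4))  # -0.0867 (= -17/196)
```

In this small example the evenly split memberships score well below the crisp two-triangle partition. That is consistent with, though of course no proof of, the abstract's claim that a suitably parametrized potential peaks at crisp assignments.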
A maximal clique based multiobjective evolutionary algorithm for overlapping community detection
Detecting community structure has become an important technique for studying complex networks. Although many community detection algorithms have been proposed, most of them focus on separated communities, where each node can belong to only one community. However, in many real-world networks, communities often overlap with each other. Developing overlapping community detection algorithms thus becomes necessary. Along this avenue, this paper proposes a maximal clique based multiobjective evolutionary algorithm for overlapping community detection. In this algorithm, a new representation scheme based on the introduced maximal-clique graph is presented. Since the maximal-clique graph uses the set of maximal cliques of the original graph as its nodes, and two maximal cliques are allowed to share nodes of the original graph, overlap is an intrinsic property of the maximal-clique graph. Owing to this property, the new representation scheme allows multiobjective evolutionary algorithms to handle the overlapping community detection problem in a way similar to that of separated community detection, so that the optimization problems are simplified. As a result, the proposed algorithm can detect overlapping community structure with higher partition accuracy and lower computational cost than existing algorithms. Experiments on both synthetic and real-world networks validate the effectiveness and efficiency of the proposed algorithm.
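The abstract does not state when two nodes of the maximal-clique graph are joined, so the sketch below assumes two maximal cliques are adjacent whenever they share at least one original node. It enumerates maximal cliques with the Bron-Kerbosch algorithm (with pivoting), builds that clique graph, and maps a crisp partition of the cliques back to node communities, showing how overlap arises without any special handling. It covers only the representation, not the multiobjective evolutionary algorithm, and all function names are hypothetical.

```python
from itertools import combinations


def adjacency(edges):
    """Adjacency sets for an undirected graph given as (u, v) pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def maximal_cliques(adj):
    """Enumerate maximal cliques with the Bron-Kerbosch algorithm (with pivoting)."""
    cliques = []

    def expand(r, p, x):
        if not p and not x:
            cliques.append(frozenset(r))  # r cannot be extended: it is maximal
            return
        pivot = max(p | x, key=lambda u: len(adj[u] & p))
        for v in list(p - adj[pivot]):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return cliques


def maximal_clique_graph(cliques):
    """Assumed rule: two maximal cliques are adjacent if they share a node."""
    return {(a, b) for a, b in combinations(range(len(cliques)), 2)
            if cliques[a] & cliques[b]}


def to_overlapping_communities(cliques, clique_labels):
    """Map a crisp partition of the cliques to (possibly overlapping) node sets."""
    communities = {}
    for idx, label in enumerate(clique_labels):
        communities.setdefault(label, set()).update(cliques[idx])
    return list(communities.values())


# Two triangles that share node 2.
adj = adjacency([(0, 1), (1, 2), (0, 2), (2, 3), (3, 4), (2, 4)])
cliques = maximal_cliques(adj)
print(sorted(sorted(c) for c in cliques))  # [[0, 1, 2], [2, 3, 4]]
print(maximal_clique_graph(cliques))       # {(0, 1)}
# Giving each clique its own label yields two communities overlapping at node 2.
print(to_overlapping_communities(cliques, [0, 1]))
```

Any non-overlapping partitioning method applied to the clique graph therefore yields overlapping node communities for free, which is the simplification the abstract points to.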
- …