Identifying Overlapping and Hierarchical Thematic Structures in Networks of Scholarly Papers: A Comparison of Three Approaches
We implemented three recently proposed approaches to the identification of
overlapping and hierarchical substructures in graphs and applied the
corresponding algorithms to a network of 492 information-science papers coupled
via their cited sources. The thematic substructures obtained and overlaps
produced by the three hierarchical cluster algorithms were compared to a
content-based categorisation, which we based on the interpretation of titles
and keywords. We defined sets of papers dealing with three topics located on
different levels of aggregation: h-index, webometrics, and bibliometrics. We
identified these topics with branches in the dendrograms produced by the three
cluster algorithms and compared the overlapping topics they detected with one
another and with the three pre-defined paper sets. We discuss the advantages
and drawbacks of applying the three approaches to paper networks in research
fields.
Comment: 18 pages, 9 figures
Detecting highly overlapping community structure by greedy clique expansion
In complex networks it is common for each node to belong to several
communities, implying a highly overlapping community structure. Recent advances
in benchmarking indicate that existing community assignment algorithms that are
capable of detecting overlapping communities perform well only when the extent
of community overlap is kept to modest levels. To overcome this limitation, we
introduce a new community assignment algorithm called Greedy Clique Expansion
(GCE). The algorithm identifies distinct cliques as seeds and expands these
seeds by greedily optimizing a local fitness function. We perform extensive
benchmarks on synthetic data to demonstrate that GCE's good performance is
robust across diverse graph topologies. Significantly, GCE is the only
algorithm to perform well on these synthetic graphs, in which every node
belongs to multiple communities. Furthermore, when put to the task of
identifying functional modules in protein interaction data, and college dorm
assignments in Facebook friendship data, we find that GCE performs
competitively.
Comment: 10 pages, 7 Figures. Implementation source and binaries available at
http://sites.google.com/site/greedycliqueexpansion
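The seed-and-expand idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say which local fitness function is used, so the form k_in / (k_in + k_out)^alpha is an assumption here, and the names `build_graph`, `fitness`, `expand_seed` and the parameter `alpha` are hypothetical. Finding the clique seeds themselves is left out; the seed is passed in by hand.

```python
from itertools import combinations


def build_graph(edges):
    """Build an undirected adjacency map {node: set(neighbours)} from edge pairs."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph


def fitness(graph, community, alpha=1.0):
    """Assumed local fitness: k_in / (k_in + k_out) ** alpha.

    k_in counts edge endpoints inside the community (each internal edge
    is counted twice); k_out counts edges leaving the community.
    """
    k_in = k_out = 0
    for u in community:
        for v in graph[u]:
            if v in community:
                k_in += 1
            else:
                k_out += 1
    total = k_in + k_out
    return k_in / total ** alpha if total else 0.0


def expand_seed(graph, seed, alpha=1.0):
    """Greedily add the neighbouring node that most increases the fitness.

    Stops as soon as no neighbouring node improves the fitness.
    """
    community = set(seed)
    current = fitness(graph, community, alpha)
    while True:
        frontier = {v for u in community for v in graph[u]} - community
        best_node, best_fit = None, current
        for v in sorted(frontier):  # sorted only to make ties deterministic
            candidate = fitness(graph, community | {v}, alpha)
            if candidate > best_fit:
                best_node, best_fit = v, candidate
        if best_node is None:
            return community
        community.add(best_node)
        current = best_fit


# Toy graph: two 4-cliques (0-3 and 4-7) joined by the single edge (3, 4).
toy_edges = (list(combinations(range(4), 2))
             + list(combinations(range(4, 8), 2))
             + [(3, 4)])
toy_graph = build_graph(toy_edges)
print(sorted(expand_seed(toy_graph, {0, 1, 2})))  # prints [0, 1, 2, 3]
```

On this toy graph the triangle seed grows to its full 4-clique and then stops, because absorbing node 4 would lower the fitness from 12/13 to 14/17.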
Semi-Supervised Overlapping Community Finding based on Label Propagation with Pairwise Constraints
Algorithms for detecting communities in complex networks are generally
unsupervised, relying solely on the structure of the network. However, these
methods can often fail to uncover meaningful groupings that reflect the
underlying communities in the data, particularly when those structures are
highly overlapping. One way to improve the usefulness of these algorithms is by
incorporating additional background information, which can be used as a source
of constraints to direct the community detection process. In this work, we
explore the potential of semi-supervised strategies to improve algorithms for
finding overlapping communities in networks. Specifically, we propose a new
method, based on label propagation, for finding communities using a limited
number of pairwise constraints. Evaluations on synthetic and real-world
datasets demonstrate the potential of this approach for uncovering meaningful
community structures in cases where each node can potentially belong to more
than one community.
Comment: Fix table
Seeding for pervasively overlapping communities
In some social and biological networks, the majority of nodes belong to
multiple communities. It has recently been shown that a number of the
algorithms that are designed to detect overlapping communities do not perform
well in such highly overlapping settings. Here, we consider one class of these
algorithms, those which optimize a local fitness measure, typically by using a
greedy heuristic to expand a seed into a community. We perform synthetic
benchmarks which indicate that an appropriate seeding strategy becomes
increasingly important as the extent of community overlap increases. We find
that distinct cliques provide the best seeds. We find further support for this
seeding strategy with benchmarks on a Facebook network and the yeast
interactome.
Comment: 8 pages
Obtaining Communities with a Fitness Growth Process
The study of community structure has been a hot topic of research in
recent years. But, while successfully applied in several areas, the concept lacks
a general and precise definition. Features such as the hierarchical structure and
heterogeneity of complex networks make it difficult to unify the idea of
community and its evaluation. The global functional known as modularity is
probably the most used technique in this area. Nevertheless, its limits have
been studied in depth. Local techniques such as those of Lancichinetti et al. and
Palla et al. arose as an answer to the resolution limit and degeneracies of
modularity.
Here we start from the algorithm by Lancichinetti et al. and propose a unique
growth process for a fitness function that, while being local, finds a
community partition that covers the whole network, updating the scale parameter
dynamically. We test the quality of our results by using a set of benchmarks of
heterogeneous graphs. We discuss alternative measures for evaluating the
community structure and, in the light of them, infer possible explanations for
the better performance of local methods compared to global ones in these cases.
Link Clustering with Extended Link Similarity and EQ Evaluation Division
Link Clustering (LC) is a relatively new method for detecting overlapping communities in networks. The basic principle of LC is to derive a transform matrix whose elements are composed of the link similarity of neighbor links based on the Jaccard distance calculation; then it applies hierarchical clustering to the transform matrix and uses a measure of partition density on the resulting dendrogram to determine the cut level for best community detection. However, the original link clustering method does not consider the link similarity of non-neighbor links, and the partition density tends to divide the communities into many small communities. In this paper, an Extended Link Clustering method (ELC) for overlapping community detection is proposed. The improved method employs a new link similarity, Extended Link Similarity (ELS), to produce a denser transform matrix, and uses the maximum value of EQ (an extended measure of quality of modularity) as a means to optimally cut the dendrogram for better partitioning of the original network space. Since ELS uses more link information, the resulting transform matrix provides a superior basis for clustering and analysis. Further, using the EQ value to find the best level for the hierarchical clustering dendrogram division, we obtain communities that are more sensible and reasonable than the ones obtained by the partition density evaluation. Experiments on five real-world networks and artificially generated networks show that the ELC method achieves higher EQ and In-group Proportion (IGP) values. Additionally, the communities are more realistic than those generated by either the original LC method or the classical CPM method.
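The two ingredients of the original LC method named in this abstract, the Jaccard-based similarity of neighbor links and the partition density used to choose the dendrogram cut, can be sketched as follows. This is a hedged illustration of the standard definitions as commonly stated for link clustering (inclusive neighborhoods for the similarity), not the ELS or EQ measures proposed in the paper, which the abstract does not define. The function names and the toy graph are hypothetical.

```python
def link_similarity(graph, e1, e2):
    """Jaccard similarity of two links that share exactly one node.

    For links (i, k) and (j, k), compare the inclusive neighbourhoods of
    the unshared endpoints i and j. Links with no shared node get 0.0,
    which is the gap the extended method in the abstract sets out to fill.
    """
    shared = set(e1) & set(e2)
    if len(shared) != 1:
        return 0.0
    (i,) = set(e1) - shared
    (j,) = set(e2) - shared
    n_i = graph[i] | {i}  # inclusive neighbourhood of i
    n_j = graph[j] | {j}  # inclusive neighbourhood of j
    return len(n_i & n_j) / len(n_i | n_j)


def partition_density(link_communities):
    """Partition density D = (2 / M) * sum_c m_c (m_c - (n_c - 1)) / ((n_c - 2)(n_c - 1)).

    Each community is a list of links; m_c is its number of links and n_c
    the number of nodes those links touch. Communities with n_c == 2
    contribute 0 by convention.
    """
    total_links = sum(len(links) for links in link_communities)
    if total_links == 0:
        return 0.0
    weighted = 0.0
    for links in link_communities:
        m_c = len(links)
        n_c = len({node for link in links for node in link})
        if n_c > 2:
            weighted += m_c * (m_c - (n_c - 1)) / ((n_c - 2) * (n_c - 1))
    return 2.0 * weighted / total_links


# Toy graph: triangle 0-1-2 with a pendant node 3 attached to node 2.
toy_graph = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
print(link_similarity(toy_graph, (0, 1), (0, 2)))  # prints 0.75
print(link_similarity(toy_graph, (0, 2), (2, 3)))  # prints 0.25
print(partition_density([[(0, 1), (0, 2), (1, 2)], [(2, 3)]]))  # prints 0.75
```

In a full pipeline these similarities would feed a hierarchical clustering of the links, and the partition density would be evaluated at each dendrogram level to pick the cut.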
Bi-Objective Community Detection (BOCD) in Networks using Genetic Algorithm
A great deal of research effort has gone into community detection across
fields such as physics, mathematics and computer science.
In this paper I propose a Bi-Objective Genetic Algorithm for community
detection which maximizes modularity and community score. The results
obtained for both benchmark and real life data sets are then compared with other
algorithms using the modularity and NMI performance metrics. The results show
that the BOCD algorithm is capable of successfully detecting community
structure in both real life and synthetic datasets, as well as improving upon
the performance of previous techniques.
Comment: 11 pages, 3 Figures, 3 Tables. arXiv admin note: substantial text
overlap with arXiv:0906.061
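Modularity, one of the two objectives named in this abstract, can be computed for a non-overlapping partition as in the sketch below. It uses the standard Newman-Girvan form Q = sum over communities of (l_c / m - (d_c / 2m)^2) and is only an illustration of that objective. The genetic algorithm and the community score are not reproduced, since the abstract does not describe them, and the function name and toy graph are hypothetical.

```python
def modularity(graph, partition):
    """Newman-Girvan modularity of a partition of an undirected, unweighted graph.

    graph: adjacency map {node: set(neighbours)}.
    partition: iterable of disjoint node sets covering the graph.
    For each community, l_c is its number of internal edges, d_c the sum
    of the degrees of its nodes, and m the total number of edges.
    """
    m = sum(len(neighbours) for neighbours in graph.values()) / 2
    if m == 0:
        return 0.0
    q = 0.0
    for community in partition:
        # Every internal edge is seen from both endpoints, hence the division by 2.
        l_c = sum(1 for u in community for v in graph[u] if v in community) / 2
        d_c = sum(len(graph[u]) for u in community)
        q += l_c / m - (d_c / (2 * m)) ** 2
    return q


# Toy graph: triangles 0-1-2 and 3-4-5 joined by the bridge (2, 3).
toy_graph = {
    0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
    3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4},
}
two_triangles = [{0, 1, 2}, {3, 4, 5}]
everything = [{0, 1, 2, 3, 4, 5}]
print(modularity(toy_graph, two_triangles) > modularity(toy_graph, everything))  # prints True
```

Splitting the toy graph into its two triangles gives Q = 5/14, while putting every node in one community gives Q = 0, so a modularity-maximizing search would prefer the split.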