Identifying Overlapping and Hierarchical Thematic Structures in Networks of Scholarly Papers: A Comparison of Three Approaches
We implemented three recently proposed approaches to the identification of
overlapping and hierarchical substructures in graphs and applied the
corresponding algorithms to a network of 492 information-science papers coupled
via their cited sources. The thematic substructures obtained and overlaps
produced by the three hierarchical cluster algorithms were compared to a
content-based categorisation, which we based on the interpretation of titles
and keywords. We defined sets of papers dealing with three topics located on
different levels of aggregation: h-index, webometrics, and bibliometrics. We
identified these topics with branches in the dendrograms produced by the three
cluster algorithms and compared the overlapping topics they detected with one
another and with the three pre-defined paper sets. We discuss the advantages
and drawbacks of applying the three approaches to paper networks in research
fields.
Comment: 18 pages, 9 figure
Searching for network modules
When analyzing complex networks a key target is to uncover their modular
structure, which means searching for a family of modules, namely node subsets
each spanning a subnetwork more densely connected than the average. This work
proposes a novel type of objective function for graph clustering, in the form
of a multilinear polynomial whose coefficients are determined by network
topology. It may be thought of as a potential function, to be maximized, taking
its values on fuzzy clusterings or families of fuzzy subsets of nodes over
which every node distributes a unit membership. When suitably parametrized,
this potential is shown to attain its maximum when every node concentrates its
entire unit membership on some module. The output is thus a partition, while the
original discrete optimization problem is turned into a continuous version
that allows alternative search strategies to be conceived. The instance of the problem
being a pseudo-Boolean function assigning real-valued cluster scores to node
subsets, modularity maximization is employed to exemplify a so-called quadratic
form, in that the scores of singletons and pairs also fully determine the
scores of larger clusters, while the resulting multilinear polynomial potential
function has degree 2. After considering further quadratic instances, different
from modularity and obtained by interpreting network topology in alternative
manners, a greedy local-search strategy for the continuous framework is
analytically compared with an existing greedy agglomerative procedure for the
discrete case. Overlapping is finally discussed in terms of multiple runs, i.e.
several local searches with different initializations.
Comment: 10 page
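The abstract above uses modularity as its example of a quadratic cluster score, one in which the scores of singletons and pairs fix the score of every larger cluster. A minimal sketch of that idea, assuming the standard Newman-Girvan modularity for an undirected, unweighted graph without self-loops (the function name `modularity` and the edge-list input format are illustrative choices, not taken from the paper):

```python
from itertools import product


def modularity(edges, communities):
    """Newman-Girvan modularity of a partition (illustrative sketch).

    edges:       list of (u, v) pairs, undirected, no duplicates or self-loops
    communities: list of node sets forming a partition of the nodes
    """
    m = len(edges)
    degree = {}
    adjacent = set()
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
        adjacent.add((u, v))
        adjacent.add((v, u))

    q = 0.0
    for block in communities:
        # A cluster's score is a sum of pairwise terms only, which is
        # what makes the objective quadratic (degree 2).
        for i, j in product(block, repeat=2):
            a_ij = 1.0 if (i, j) in adjacent else 0.0
            q += a_ij - degree[i] * degree[j] / (2.0 * m)
    return q / (2.0 * m)
```

For two triangles joined by a single edge, splitting the graph into the two triangles gives a modularity of 5/14, while placing all six nodes in one cluster gives 0.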
Link communities reveal multiscale complexity in networks
Networks have become a key approach to understanding systems of interacting
objects, unifying the study of diverse phenomena including biological organisms
and human society. One crucial step when studying the structure and dynamics of
networks is to identify communities: groups of related nodes that correspond to
functional subunits such as protein complexes or social spheres. Communities in
networks often overlap such that nodes simultaneously belong to several groups.
Meanwhile, many networks are known to possess hierarchical organization, where
communities are recursively grouped into a hierarchical structure. However, the
fact that many real networks have communities with pervasive overlap, where
each and every node belongs to more than one group, has the consequence that a
global hierarchy of nodes cannot capture the relationships between overlapping
groups. Here we reinvent communities as groups of links rather than nodes and
show that this unorthodox approach successfully reconciles the antagonistic
organizing principles of overlapping communities and hierarchy. In contrast to
the existing literature, which has entirely focused on grouping nodes, link
communities naturally incorporate overlap while revealing hierarchical
organization. We find relevant link communities in many networks, including
major biological networks such as protein-protein interaction and metabolic
networks, and show that a large social network contains hierarchically
organized community structures spanning inner-city to regional scales while
maintaining pervasive overlap. Our results imply that link communities are
fundamental building blocks that reveal overlap and hierarchical organization
in networks to be two aspects of the same phenomenon.
Comment: Main text and supplementary informatio
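To illustrate why grouping links rather than nodes yields overlap for free, here is a small sketch. It assumes a link partition is already given (each link belongs to exactly one community) and derives the node communities from it; the function names `node_communities` and `membership_counts` are hypothetical and not taken from the paper.

```python
def node_communities(link_communities):
    """Map each link community to the set of nodes its links touch."""
    result = []
    for links in link_communities:
        nodes = set()
        for u, v in links:
            nodes.update((u, v))
        result.append(nodes)
    return result


def membership_counts(link_communities):
    """Count how many node communities each node belongs to."""
    counts = {}
    for nodes in node_communities(link_communities):
        for n in nodes:
            counts[n] = counts.get(n, 0) + 1
    return counts
```

In a bow-tie graph made of two triangles that share node 2, each link sits in exactly one triangle, yet node 2 ends up in both node communities.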
Element-centric clustering comparison unifies overlaps and hierarchy
Clustering is one of the most universal approaches for understanding complex
data. A pivotal aspect of clustering analysis is quantitatively comparing
clusterings; clustering comparison is the basis for many tasks such as
clustering evaluation, consensus clustering, and tracking the temporal
evolution of clusters. In particular, the extrinsic evaluation of clustering
methods requires comparing the uncovered clusterings to planted clusterings or
known metadata. Yet, as we demonstrate, existing clustering comparison measures
have critical biases which undermine their usefulness, and no measure
accommodates both overlapping and hierarchical clusterings. Here we unify the
comparison of disjoint, overlapping, and hierarchically structured clusterings
by proposing a new element-centric framework: elements are compared based on
the relationships induced by the cluster structure, as opposed to the
traditional cluster-centric philosophy. We demonstrate that, in contrast to
standard clustering similarity measures, our framework does not suffer from
critical biases and naturally provides unique insights into how the clusterings
differ. We illustrate the strengths of our framework by revealing new insights
into the organization of clusters in two applications: the improved
classification of schizophrenia based on the overlapping and hierarchical
community structure of fMRI brain networks, and the disentanglement of various
social homophily factors in Facebook social networks. The universality of
clustering suggests far-reaching impact of our framework throughout all areas
of science.
Link Clustering with Extended Link Similarity and EQ Evaluation Division.
Link Clustering (LC) is a relatively new method for detecting overlapping communities in networks. The basic principle of LC is to derive a transform matrix whose elements are the link similarities of neighbor links, computed with the Jaccard distance; it then applies hierarchical clustering to the transform matrix and uses a measure of partition density on the resulting dendrogram to determine the cut level for best community detection. However, the original link clustering method does not consider the link similarity of non-neighbor links, and the partition density tends to divide the communities into many small communities. In this paper, an Extended Link Clustering method (ELC) for overlapping community detection is proposed. The improved method employs a new link similarity, Extended Link Similarity (ELS), to produce a denser transform matrix, and uses the maximum value of EQ (an extended measure of quality of modularity) as a means to optimally cut the dendrogram for better partitioning of the original network space. Since ELS uses more link information, the resulting transform matrix provides a superior basis for clustering and analysis. Further, using the EQ value to find the best level for the hierarchical clustering dendrogram division, we obtain communities that are more sensible and reasonable than the ones obtained by the partition density evaluation. Experimentation on five real-world networks and artificially-generated networks shows that the ELC method achieves higher EQ and In-group Proportion (IGP) values. Additionally, communities are more realistic than those generated by either the original LC method or the classical CPM method.
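The two ingredients of the original LC method named above, Jaccard similarity between neighbor links and partition density, can be sketched as follows. This assumes the usual formulation of link clustering (inclusive neighborhoods for the similarity; partition density averaged over link communities, weighted by link count). It does not implement ELS or EQ, whose definitions the abstract does not give, and all function names are illustrative.

```python
def inclusive_neighbors(edges):
    """Neighbor set of every node, including the node itself."""
    nbrs = {}
    for u, v in edges:
        nbrs.setdefault(u, {u}).add(v)
        nbrs.setdefault(v, {v}).add(u)
    return nbrs


def link_similarity(nbrs, e1, e2):
    """Jaccard similarity of two links that share exactly one node.

    Returns 0.0 for non-neighbor links (and for identical links),
    which is the gap in the original method that the abstract points out.
    """
    shared = set(e1) & set(e2)
    if len(shared) != 1:
        return 0.0
    (k,) = shared
    i = e1[0] if e1[1] == k else e1[1]  # endpoint of e1 that is not shared
    j = e2[0] if e2[1] == k else e2[1]  # endpoint of e2 that is not shared
    return len(nbrs[i] & nbrs[j]) / len(nbrs[i] | nbrs[j])


def partition_density(link_communities):
    """Partition density of a link partition (list of lists of links)."""
    total_links = sum(len(links) for links in link_communities)
    acc = 0.0
    for links in link_communities:
        m_c = len(links)
        n_c = len({node for link in links for node in link})
        if n_c > 2:  # a single-link community contributes 0 by convention
            acc += m_c * (m_c - (n_c - 1)) / ((n_c - 2) * (n_c - 1))
    return 2.0 * acc / total_links
```

On a bow-tie graph (two triangles sharing one node), two links inside the same triangle have similarity 1.0, two links from different triangles that meet at the shared node have similarity 0.2, and cutting the dendrogram into the two triangles gives a partition density of 1.0.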