A framework for community detection in heterogeneous multi-relational networks
There has been a surge of interest in community detection in homogeneous
single-relational networks which contain only one type of nodes and edges.
However, many real-world systems are naturally described as heterogeneous
multi-relational networks which contain multiple types of nodes and edges. In
this paper, we propose a new method for detecting communities in such networks.
Our method is based on optimizing the composite modularity, which is a new
modularity proposed for evaluating partitions of a heterogeneous
multi-relational network into communities. Our method is parameter-free,
scalable, and suitable for various networks with general structure. We
demonstrate that it outperforms the state-of-the-art techniques in detecting
pre-planted communities in synthetic networks. Applied to a real-world Digg
network, it successfully detects meaningful communities. Comment: 27 pages, 10 figures
Clustering and Community Detection in Directed Networks: A Survey
Networks (or graphs) appear as dominant structures in diverse domains,
including sociology, biology, neuroscience and computer science. In most of the
aforementioned cases graphs are directed - in the sense that there is
directionality on the edges, making the semantics of the edges non-symmetric.
An interesting feature that real networks present is the clustering or
community structure property, under which the graph topology is organized into
modules commonly called communities or clusters. The essence here is that nodes
of the same community are highly similar while on the contrary, nodes across
communities present low similarity. Revealing the underlying community
structure of directed complex networks has become a crucial and
interdisciplinary topic with a plethora of applications. Consequently, a wealth
of recent research addresses the mining of directed graphs, with clustering
being the primary method and tool for community
detection and evaluation. The goal of this paper is to offer an in-depth review
of the methods presented so far for clustering directed networks along with the
relevant necessary methodological background and also related applications. The
survey commences by offering a concise review of the fundamental concepts and
methodological base on which graph clustering algorithms capitalize. Then we
present the relevant work along two orthogonal classifications. The first one
is mostly concerned with the methodological principles of the clustering
algorithms, while the second one approaches the methods from the viewpoint
of the properties of a good cluster in a directed network. Further, we
present methods and metrics for evaluating graph clustering results,
demonstrate interesting application domains and provide promising future
research directions. Comment: 86 pages, 17 figures. Physics Reports Journal (To Appear)
Evaluating Overfit and Underfit in Models of Network Community Structure
A common data mining task on networks is community detection, which seeks an
unsupervised decomposition of a network into structural groups based on
statistical regularities in the network's connectivity. Although many methods
exist, the No Free Lunch theorem for community detection implies that each
makes some kind of tradeoff, and no algorithm can be optimal on all inputs.
Thus, different algorithms will overfit or underfit on different inputs, finding
more, fewer, or just different communities than is optimal, and evaluation
methods that use a metadata partition as a ground truth will produce misleading
conclusions about general accuracy. Here, we present a broad evaluation of
overfitting and underfitting in community detection, comparing the behavior of 16
state-of-the-art community detection algorithms on a novel and structurally
diverse corpus of 406 real-world networks. We find that (i) algorithms vary
widely both in the number of communities they find and in their corresponding
composition, given the same input, (ii) algorithms can be clustered into
distinct high-level groups based on similarities of their outputs on real-world
networks, and (iii) these differences induce wide variation in accuracy on link
prediction and link description tasks. We introduce a new diagnostic for
evaluating overfitting and underfitting in practice, and use it to roughly
divide community detection methods into general and specialized learning
algorithms. Across methods and inputs, Bayesian techniques based on the
stochastic block model and a minimum description length approach to
regularization represent the best general learning approach, but can be
outperformed under specific circumstances. These results introduce both a
theoretically principled approach to evaluate overfitting and underfitting in models
of network community structure and a realistic benchmark by which new methods
may be evaluated and compared. Comment: 22 pages, 13 figures, 3 tables
Structure of Heterogeneous Networks
Heterogeneous networks play a key role in the evolution of communities and
the decisions individuals make. These networks link different types of
entities, for example, people and the events they attend. Network analysis
algorithms usually project such networks onto simple graphs composed of
entities of a single type. In the process, they conflate relations between
entities of different types and lose important structural information. We
develop a mathematical framework that can be used to compactly represent and
analyze heterogeneous networks that combine multiple entity and link types. We
generalize Bonacich centrality, which measures connectivity between nodes by
the number of paths between them, to heterogeneous networks and use this
measure to study network structure. Specifically, we extend the popular
modularity-maximization method for community detection to use this centrality
metric. We also rank nodes based on their connectivity to other nodes. One
advantage of this centrality metric is that it has a tunable parameter we can
use to set the length scale of interactions. Studying how rankings change
with this parameter allows us to identify important nodes in the network. We
apply the proposed method to analyze the structure of several heterogeneous
networks. We show that exploiting additional sources of evidence corresponding
to links between, as well as among, different entity types yields new insights
into network structure.
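The path-counting idea behind Bonacich centrality can be illustrated with a minimal sketch. It covers only the standard single-type form, c = (I - beta*A)^(-1) * A * 1, not the heterogeneous generalization the abstract describes. The function name `bonacich_centrality`, the dense adjacency-list input, and the fixed-point iteration are assumptions made for illustration. The parameter `beta` plays the role of the tunable length scale: each extra step in a walk is down-weighted by a factor of `beta`.

```python
def bonacich_centrality(adj, beta, tol=1e-12, max_iter=10_000):
    """Bonacich centrality for a dense adjacency matrix (list of lists).

    Iterates c = A*1 + beta*A*c, whose fixed point is (I - beta*A)^(-1) * A * 1.
    Entry i sums the walks that start at node i, with a walk of length k
    weighted by beta**(k - 1). Convergence requires beta < 1 / (largest
    eigenvalue of A).
    """
    n = len(adj)
    degree = [sum(row) for row in adj]  # A*1: walks of length 1
    c = degree[:]
    for _ in range(max_iter):
        nxt = [
            degree[i] + beta * sum(adj[i][j] * c[j] for j in range(n))
            for i in range(n)
        ]
        if max(abs(a - b) for a, b in zip(nxt, c)) < tol:
            return nxt
        c = nxt
    raise ValueError("no convergence: beta is probably too large")


# Hypothetical toy input: a three-node path 0 - 1 - 2.
path = [[0, 1, 0],
        [1, 0, 1],
        [0, 1, 0]]
local_scores = bonacich_centrality(path, beta=0.0)   # reduces to degree
longer_scores = bonacich_centrality(path, beta=0.5)  # longer walks contribute
```

Comparing the rankings obtained for several values of `beta` is one simple way to see how node importance shifts as longer walks are taken into account.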
The Community Structure of R&D Cooperation in Europe. Evidence from a social network perspective
The focus of this paper is on pre-competitive R&D cooperation across Europe, as captured by R&D joint ventures funded by the European Commission in the time period 1998-2002, within the 5th Framework Program. The cooperations in this Framework Program give rise to a bipartite network with 72,745 network edges between 25,839 actors (representing organizations that include firms, universities, research organizations and public agencies) and 9,490 R&D projects. With this construction, participating actors are linked only through joint projects.
In this paper, we describe the community identification problem based on the concept of modularity, use the recently introduced label-propagation algorithm to identify communities in the network, and differentiate the identified communities by developing community-specific profiles using social network analysis and geographic visualization techniques. We expect the results to enrich our picture of the European Research Area by providing new insights into the global and local structures of R&D cooperation across Europe.
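The label-propagation step mentioned above can be sketched as follows. This is a generic asynchronous variant for an undirected graph, not the exact procedure or the bipartite handling used in the paper. The function name `label_propagation`, the neighbor-dictionary input, the fixed random seed, and the rule that a node keeps its current label when that label is already among the most frequent ones are all assumptions for illustration.

```python
import random
from collections import Counter


def label_propagation(neighbors, seed=0, max_sweeps=100):
    """Asynchronous label propagation on an undirected graph.

    `neighbors` maps each node to an iterable of adjacent nodes. Every node
    starts with its own label and repeatedly adopts the label that is most
    frequent among its neighbors. Ties are broken at random, except that a
    node keeps its current label if it is already one of the most frequent.
    Returns a list of communities (sets of nodes).
    """
    rng = random.Random(seed)
    labels = {node: node for node in neighbors}
    nodes = list(neighbors)
    for _ in range(max_sweeps):
        rng.shuffle(nodes)  # visit nodes in a fresh random order each sweep
        changed = False
        for node in nodes:
            if not neighbors[node]:
                continue  # isolated nodes keep their own label
            counts = Counter(labels[nb] for nb in neighbors[node])
            best = max(counts.values())
            candidates = sorted(lab for lab, cnt in counts.items() if cnt == best)
            if labels[node] in candidates:
                continue
            labels[node] = rng.choice(candidates)
            changed = True
        if not changed:
            break  # every node already agrees with its neighborhood
    communities = {}
    for node, lab in labels.items():
        communities.setdefault(lab, set()).add(node)
    return list(communities.values())


# Hypothetical toy input: two disconnected triangles.
toy = {
    "a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b"],
    "x": ["y", "z"], "y": ["x", "z"], "z": ["x", "y"],
}
groups = label_propagation(toy)
```

On real networks the result can depend on the visiting order and on tie-breaking, so it is common to run the procedure several times with different seeds and compare the resulting partitions.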
Motif-based communities in complex networks
Community definitions usually focus on edges, inside and between the
communities. However, the high density of edges within a community determines
correlations between nodes going beyond nearest-neighbours, which are
indicated by the presence of motifs. We show how motifs can be used to define
general classes of nodes, including communities, by extending the mathematical
expression of Newman-Girvan modularity. We then construct a general framework
and apply it to some synthetic and real networks.
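For reference, the standard Newman-Girvan modularity that this abstract takes as its starting point is usually written as below for an undirected, unweighted network. The symbols follow the common convention and are not taken from the abstract: A_{ij} is the adjacency matrix, k_i the degree of node i, m the number of edges, and c_i the community of node i. The motif-based extension itself is not reproduced here.

```latex
Q = \frac{1}{2m} \sum_{i,j} \left( A_{ij} - \frac{k_i k_j}{2m} \right) \delta(c_i, c_j)
```

The term k_i k_j / (2m) is the expected number of edges between i and j in a random graph with the same degrees, so Q measures how much denser the communities are than chance would predict.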