33 research outputs found

Community detection algorithms: a comparative analysis

Author: Fortunato Santo
Lancichinetti Andrea
Publication venue: 'American Physical Society (APS)'
Publication date: 01/01/2009

Uncovering the community structure exhibited by real networks is a crucial step towards an understanding of complex systems that goes beyond the local organization of their constituents. Many algorithms have been proposed so far, but none of them has been subjected to strict tests to evaluate their performance. Most of the sporadic tests performed so far involved small networks with known community structure and/or artificial graphs with a simplified structure that is very uncommon in real systems. Here we test several methods against a recently introduced class of benchmark graphs, with heterogeneous distributions of degree and community size. The methods are also tested against the benchmark by Girvan and Newman and on random graphs. Our analysis shows that three recent algorithms, introduced by Rosvall and Bergstrom, Blondel et al., and Ronhovde and Nussinov, respectively, perform excellently, with the additional advantage of low computational complexity, which enables one to analyze large systems. Comment: 12 pages, 8 figures. The software to compute the values of our general normalized mutual information is available at http://santo.fortunato.googlepages.com/inthepress

arXiv.org e-Print Archive

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

PORTO Publications Open Repository TOrino
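
Benchmark comparisons like the one in the entry above typically score a detected partition against the planted one with normalized mutual information (NMI). The Python sketch below computes the standard NMI for two non-overlapping partitions, normalized by the mean of the two entropies. It is an illustrative assumption, not the authors' software: their "general" NMI also handles overlapping communities, which this function does not. The function name is hypothetical.

```python
import math
from collections import Counter


def normalized_mutual_information(labels_a, labels_b):
    """NMI between two hard partitions given as equal-length label lists.

    Computes I(A;B) / ((H(A) + H(B)) / 2) with natural logarithms; the
    base cancels in the ratio.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("partitions must cover the same nodes")
    n = len(labels_a)
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    joint = Counter(zip(labels_a, labels_b))

    def entropy(counts):
        return -sum(c / n * math.log(c / n) for c in counts.values())

    h_a, h_b = entropy(count_a), entropy(count_b)
    if h_a + h_b == 0:
        # Both partitions put every node in a single cluster: identical.
        return 1.0
    # p(a,b) * log(p(a,b) / (p(a) p(b))) with p = count / n.
    mutual = sum(
        c / n * math.log(c * n / (count_a[a] * count_b[b]))
        for (a, b), c in joint.items()
    )
    return 2 * mutual / (h_a + h_b)
```

For example, normalized_mutual_information([0, 0, 1, 1], [1, 1, 0, 0]) is 1 up to rounding, since relabeling clusters does not change the partition.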

Robustness of journal rankings by network flows with different amounts of memory

Author: Bohlin Ludvig
Esquivel Alcides Viamontes
Lancichinetti Andrea
Rosvall Martin
Publication date: 09/04/2015

As the number of scientific journals has multiplied, journal rankings have become increasingly important for scientific decisions. From submissions and subscriptions to grants and hirings, researchers, policy makers, and funding agencies make important decisions influenced by journal rankings such as the ISI journal impact factor. Typically, the rankings are derived from the citation network between a selection of journals and unavoidably depend on this selection. However, little is known about how robust rankings are to the selection of included journals. Here we compare the robustness of three journal rankings based on network flows induced on citation networks. They model pathways of researchers navigating scholarly literature, stepping between journals and remembering their previous steps to different degrees: zero-step memory, as in the impact factor; one-step memory, as in Eigenfactor; and two-step memory. These correspond to zero-, first-, and second-order Markov models of citation flow between journals. We conclude that higher-order Markov models perform better and are more robust to the selection of journals. However, the performance gain of the second-order Markov model comes at the cost of requiring more citation data over a longer time period. Comment: 9 pages, 5 figures

arXiv.org e-Print Archive

CiteSeerX

Publikationer från Umeå universitet

Digitala Vetenskapliga Arkivet - Academic Archive On-line
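
The zero- and first-order flow models in the journal-ranking entry can be sketched in a few lines of Python, assuming a small citation matrix in which citations[i][j] counts citations from journal i to journal j. Zero-order flow is taken proportional to citations received; first-order flow is the stationary distribution of a random walk along citations, computed by power iteration with uniform teleportation (alpha = 0.85) and uniform redistribution from journals that cite nothing. These are simplifying assumptions that do not reproduce the exact impact factor or Eigenfactor definitions, and the function names are hypothetical.

```python
def zero_order_ranking(citations):
    """Flow proportional to citations received (no memory of earlier steps).

    citations[i][j] = number of citations from journal i to journal j.
    """
    n = len(citations)
    received = [sum(citations[i][j] for i in range(n)) for j in range(n)]
    total = sum(received)
    return [r / total for r in received]


def first_order_ranking(citations, alpha=0.85, tol=1e-12, max_iter=1000):
    """Stationary flow of a walker that follows citations (one-step memory).

    With probability 1 - alpha the walker teleports to a uniformly chosen
    journal; journals that cite nothing spread their flow uniformly.
    """
    n = len(citations)
    out = [sum(row) for row in citations]
    flow = [1.0 / n] * n
    for _ in range(max_iter):
        dangling = sum(flow[i] for i in range(n) if out[i] == 0)
        new = [(1 - alpha) / n + alpha * dangling / n] * n
        for i in range(n):
            if out[i] == 0:
                continue
            for j in range(n):
                if citations[i][j]:
                    # Move flow along i -> j in proportion to citation counts.
                    new[j] += alpha * flow[i] * citations[i][j] / out[i]
        if sum(abs(a - b) for a, b in zip(new, flow)) < tol:
            return new
        flow = new
    return flow
```

A second-order model would use pairs (previous journal, current journal) as states, which requires citation-pathway data rather than a journal-to-journal matrix; it is not shown here.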

Identifying modular flows on multilayer networks reveals highly overlapping organization in social systems

Author: Arenas Alex
De Domenico Manlio
Lancichinetti Andrea
Rosvall Martin
Publication venue: 'American Physical Society (APS)'
Publication date: 13/08/2014

Unveiling the community structure of networks is a powerful methodology to comprehend interconnected systems across the social and natural sciences. To identify different types of functional modules in interaction data aggregated in a single network layer, researchers have developed many powerful methods. For example, flow-based methods have proven useful for identifying modular dynamics in weighted and directed networks that capture constraints on flow in the systems they represent. However, many networked systems consist of agents or components that exhibit multiple layers of interactions. Inevitably, representing this intricate network of networks as a single aggregated network leads to information loss and may obscure the actual organization. Here we propose a method based on compression of network flows that can identify modular flows in non-aggregated multilayer networks. Our numerical experiments on synthetic networks show that the method can accurately identify modules that cannot be identified in aggregated networks or by analyzing the layers separately. We capitalize on our findings and reveal the community structure of two multilayer collaboration networks: scientists affiliated with the Pierre Auger Observatory and scientists publishing work on networks on the arXiv. Compared to conventional aggregated methods, the multilayer method reveals smaller modules with more overlap that better capture the actual organization.

arXiv.org e-Print Archive

Publikationer från Umeå universitet

Archivio della ricerca - Fondazione Bruno Kessler

Directory of Open Access Journals

Digitala Vetenskapliga Arkivet - Academic Archive On-line
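
The "compression of network flows" in the multilayer entry builds on the map equation, which scores a partition by the expected description length, in bits, of a random walk on the network. As a hedged single-layer sketch (the multilayer method in the abstract is not reproduced here), the Python function below evaluates the standard two-level map equation for an undirected, unweighted graph, with node visit rates equal to degree / 2m. Function and variable names are hypothetical.

```python
import math
from collections import defaultdict


def plogp(p):
    return p * math.log2(p) if p > 0 else 0.0


def map_equation(edges, partition):
    """Two-level map equation L(M) in bits for an undirected, unweighted graph.

    L = plogp(q) - 2 * sum_i plogp(q_i) - sum_a plogp(p_a)
        + sum_i plogp(q_i + p_i),
    where p_a is node a's visit rate, p_i the total visit rate of module i,
    q_i the rate of exiting module i, and q = sum_i q_i.
    edges: list of (u, v) pairs; partition: dict node -> module label.
    """
    two_m = 2 * len(edges)
    degree = defaultdict(int)
    exit_count = defaultdict(int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
        if partition[u] != partition[v]:
            # A walker crosses this link in either direction with rate 1/2m.
            exit_count[partition[u]] += 1
            exit_count[partition[v]] += 1
    p = {node: k / two_m for node, k in degree.items()}
    modules = {partition[node] for node in degree}
    q = {mod: exit_count[mod] / two_m for mod in modules}
    p_mod = defaultdict(float)
    for node, pn in p.items():
        p_mod[partition[node]] += pn
    q_total = sum(q.values())
    return (
        plogp(q_total)
        - 2 * sum(plogp(q[mod]) for mod in modules)
        - sum(plogp(pn) for pn in p.values())
        + sum(plogp(q[mod] + p_mod[mod]) for mod in modules)
    )
```

A partition that better captures modular flow yields a shorter code length: for two triangles joined by a single link, splitting the triangles gives about 2.32 bits versus about 2.56 bits for a single module.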

Mapping bilateral information interests using the activity of Wikipedia editors

Author: Bohlin Ludvig
Karimi Fariba
Lancichinetti Andrea
Rosvall Martin
Samoilenko Anna
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2015

We live in a global village where electronic communication has eliminated the geographical barriers of information exchange. The road is now open to worldwide convergence of information interests, shared values, and understanding. Nevertheless, interests still vary between countries around the world. This raises important questions about what today's world map of information interests actually looks like and what factors create barriers to information exchange between countries. To quantitatively construct a world map of information interests, we devise a scalable statistical model that identifies countries with similar information interests and measures the countries' bilateral similarities. From the similarities we connect countries in a global network and find that countries can be mapped into 18 clusters with similar information interests. Through regression we find that language and religion best explain the strength of the bilateral ties and the formation of clusters. Our findings provide a quantitative basis for further studies to better understand the complex interplay between shared interests and conflict on a global scale. The methodology can also be extended to track changes over time and capture important trends in global information exchange. Comment: 11 pages, 3 figures. In Palgrave Communications 1 (2015)

arXiv.org e-Print Archive

Publikationer från Umeå universitet

Digitala Vetenskapliga Arkivet - Academic Archive On-line

Benchmarks for testing community detection algorithms on directed and weighted graphs with overlapping communities

Author: Andrea Lancichinetti
Santo Fortunato
Publication venue: 'American Physical Society (APS)'
Publication date: 01/01/2009

Many complex networks display a mesoscopic structure with groups of nodes sharing many links with the other nodes in their group and comparatively few with nodes of different groups. This feature is known as community structure and encodes valuable information about the organization and the function of the nodes. Many algorithms have been proposed, but it is not yet clear how they should be tested. We recently proposed a general class of undirected and unweighted benchmark graphs, with heterogeneous distributions of node degree and community size. Increasing attention has recently been devoted to developing algorithms that take into account the direction and the weight of links, and these require suitable benchmark graphs for testing. In this paper we extend the basic ideas behind our previous benchmark to generate directed and weighted networks with built-in community structure. We also consider the possibility that nodes belong to more than one community, a feature occurring in real systems such as social networks. As a practical application, we show how modularity optimization performs on our new benchmark. Comment: 9 pages, 13 figures. Final version published in Physical Review E. The code to create the benchmark graphs can be freely downloaded from http://santo.fortunato.googlepages.com/inthepress

arXiv.org e-Print Archive

Crossref

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)
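
The benchmark entry defines communities as groups whose nodes share many links inside and comparatively few outside. A quick check of that property on a generated directed, weighted graph is each node's fraction of link weight that falls outside its communities. The Python sketch below assumes that a link counts as internal when its endpoints share at least one community (so overlapping nodes are handled) and that both incoming and outgoing links count toward a node's total; these conventions and the function name are assumptions, not the benchmark code's definitions.

```python
from collections import defaultdict


def mixing_fractions(edges, memberships):
    """Per-node fraction of link weight that leaves the node's communities.

    edges: iterable of (source, target, weight) for a directed weighted graph.
    memberships: dict node -> set of community labels (overlaps allowed).
    A link is internal if its endpoints share at least one community.
    Incoming and outgoing links both count toward each endpoint's total.
    """
    total = defaultdict(float)
    external = defaultdict(float)
    for u, v, w in edges:
        shared = memberships[u] & memberships[v]
        for node in (u, v):
            total[node] += w
            if not shared:
                external[node] += w
    return {node: external[node] / total[node] for node in total}
```

Averaging these fractions over all nodes gives a single number to compare with the mixing level a benchmark was asked to produce; lower values mean better-separated communities.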

A high-reproducibility and high-accuracy method for automated topic classification

Author: Acuna Daniel
Amaral Luís A. Nunes
Körding Konrad
Lancichinetti Andrea
Sirer M. Irmak
Wang Jane X.
Publication date: 03/02/2014

Much of human knowledge sits in large databases of unstructured text. Leveraging this knowledge requires algorithms that extract and record metadata on unstructured text documents. Assigning topics to documents enables intelligent search, statistical characterization, and meaningful classification. Latent Dirichlet allocation (LDA) is the state of the art in topic classification. Here, we perform a systematic theoretical and numerical analysis demonstrating that current optimization techniques for LDA often yield results that do not accurately infer the most suitable model parameters. Adapting approaches for community detection in networks, we propose a new algorithm that combines high reproducibility, high accuracy, and high computational efficiency. We apply it to a large set of documents in the English Wikipedia and reveal its hierarchical structure. Our algorithm promises to make "big data" text analysis systems more reliable. Comment: 23 pages, 24 figures

arXiv.org e-Print Archive

Directory of Open Access Journals

Finding Statistically Significant Communities in Networks

Author: Fortunato Santo
Lancichinetti Andrea
Radicchi Filippo
Ramasco José J.
Publication venue: Public Library of Science
Publication date: 01/01/2011

Community structure is one of the main structural features of networks, revealing both their internal organization and the similarity of their elementary units. Despite the large variety of methods proposed to detect communities in graphs, there is a great need for multipurpose techniques able to handle different types of datasets and the subtleties of community structure. In this paper we present OSLOM (Order Statistics Local Optimization Method), the first method capable of detecting clusters in networks while accounting for edge directions, edge weights, overlapping communities, hierarchies, and community dynamics. It is based on the local optimization of a fitness function expressing the statistical significance of clusters with respect to random fluctuations, which is estimated with tools of extreme and order statistics. OSLOM can be used alone or as a refinement procedure for partitions/covers delivered by other techniques. We have also implemented sequential algorithms combining OSLOM with other fast techniques, so that the community structure of very large networks can be uncovered. Our method performs comparably to the best existing algorithms on artificial benchmark graphs. Several applications to real networks are also shown. OSLOM is implemented in freely available software (http://www.oslom.org), and we believe it will be a valuable tool in the analysis of networks.

arXiv.org e-Print Archive

Public Library of Science (PLOS)

Directory of Open Access Journals

PubMed Central

Digital.CSIC

PORTO Publications Open Repository TOrino
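
OSLOM's fitness measures how unlikely a cluster's connections are under random fluctuations. As a much-simplified illustration of that idea (not OSLOM's actual score, which further processes such probabilities with order statistics), the Python sketch below computes the probability that a node with a given degree has at least a given number of links into a group, if its link stubs were drawn at random without replacement from all available stubs: a hypergeometric null in the spirit of the configuration model. Parameter and function names are hypothetical.

```python
from math import comb


def hypergeom_tail(links_in, degree, stubs_group, stubs_total):
    """P(X >= links_in) for X ~ Hypergeometric(stubs_total, stubs_group, degree).

    X counts how many of a node's `degree` stubs land on the group's
    `stubs_group` stubs when drawn without replacement from `stubs_total`.
    Assumes degree <= stubs_total.
    """
    denominator = comb(stubs_total, degree)
    upper = min(degree, stubs_group)
    # math.comb returns 0 when asked for more items than available.
    numerator = sum(
        comb(stubs_group, x) * comb(stubs_total - stubs_group, degree - x)
        for x in range(links_in, upper + 1)
    )
    return numerator / denominator
```

A small tail probability suggests the node is more tightly attached to the group than chance alone would predict.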

Combinatorial approach to modularity

Author: Andrea Lancichinetti
Filippo Radicchi
José J. Ramasco
Publication venue: 'American Physical Society (APS)'
Publication date: 01/01/2010

Communities are clusters of nodes with a higher-than-average density of internal connections. Their detection is highly relevant to understanding the structure and hierarchies present in a network. Modularity has become a standard tool in community detection, providing both a way to evaluate partitions and, by maximizing it, a method to find communities. In this work, we study modularity from a combinatorial point of view. Like the definition of modularity itself, our analysis relies on the configuration model, a technique that, given a graph, produces a series of randomized copies keeping the degree sequence invariant. We develop an approach that enumerates the null-model partitions and can be used to calculate the probability distribution function of the modularity. Our theory allows a deep inquiry into several interesting features of modularity, such as its resolution limit and the statistics of the partitions that maximize it. Additionally, the study of the probability of extremes of modularity in random graph partitions opens the way to a definition of the statistical significance of network partitions. Comment: 8 pages, 4 figures

arXiv.org e-Print Archive

Crossref

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

PORTO Publications Open Repository TOrino
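
The modularity and configuration-model ideas in the last entry can be illustrated in Python. The first function computes the Newman-Girvan modularity Q = sum_c [l_c / m - (d_c / 2m)^2] of a partition, where l_c is the number of links inside community c and d_c its total degree; the second produces a randomized copy with the same degree sequence by shuffling link stubs, which may create self-loops and multi-edges. Sampling Q for a fixed partition on such copies gives an empirical null distribution; this Monte Carlo approach is a stand-in for, not an implementation of, the combinatorial enumeration the paper develops. Function names are hypothetical.

```python
import random
from collections import defaultdict


def modularity(edges, partition):
    """Newman-Girvan modularity Q = sum_c [l_c / m - (d_c / 2m)^2].

    edges: list of (u, v) pairs of an undirected graph (multi-edges and
    self-loops allowed; a self-loop adds 2 to its node's degree).
    partition: dict node -> community label.
    """
    m = len(edges)
    internal = defaultdict(int)    # l_c: links with both ends in community c
    degree_sum = defaultdict(int)  # d_c: total degree of community c
    for u, v in edges:
        degree_sum[partition[u]] += 1
        degree_sum[partition[v]] += 1
        if partition[u] == partition[v]:
            internal[partition[u]] += 1
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)


def configuration_model(edges, rng):
    """Randomized copy built by shuffling link stubs; every degree is kept."""
    stubs = [node for edge in edges for node in edge]
    rng.shuffle(stubs)
    return list(zip(stubs[::2], stubs[1::2]))
```

For two triangles joined by one link, with each triangle as a community, modularity gives 5/14 ≈ 0.357. Evaluating the same partition on many outputs of configuration_model(edges, random.Random(seed)) shows how often chance alone reaches that value.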