Topological Feature Based Classification

Author: Peel Leto
Publication date: 01/01/2011

There has been a lot of interest in developing algorithms to extract clusters or communities from networks. This work proposes a method, based on blockmodelling, for leveraging communities and other topological features for use in a predictive classification task. Motivated by the issues faced by the field of community detection and inspired by recent advances in Bayesian topic modelling, the presented model automatically discovers topological features relevant to a given classification task. In this way, rather than attempting to identify some universal best set of clusters for an undefined goal, the aim is to find the best set of clusters for a particular purpose. Using this method, topological features can be validated and assessed within a given context by their predictive performance. The proposed model differs from other relational and semi-supervised learning models in that it identifies topological features to explain the classification decision. In a demonstration on a number of real networks, the predictive capability of the topological features is shown to rival the performance of content-based relational learners. Additionally, the model is shown to outperform graph-based semi-supervised methods on directed and approximately bipartite networks.
Comment: Awarded 3rd Best Student Paper at 14th International Conference on Information Fusion 201

arXiv.org e-Print Archive

CiteSeerX

Maastricht University Research Portal

UCL Discovery

Evaluating Overfit and Underfit in Models of Network Community Structure

Author: Clauset Aaron
Ghasemian Amir
Hosseinmardi Homa
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2019

A common data mining task on networks is community detection, which seeks an unsupervised decomposition of a network into structural groups based on statistical regularities in the network's connectivity. Although many methods exist, the No Free Lunch theorem for community detection implies that each makes some kind of tradeoff, and no algorithm can be optimal on all inputs. Thus, different algorithms will over- or underfit on different inputs, finding more, fewer, or just different communities than is optimal, and evaluation methods that use a metadata partition as a ground truth will produce misleading conclusions about general accuracy. Here, we present a broad evaluation of over- and underfitting in community detection, comparing the behavior of 16 state-of-the-art community detection algorithms on a novel and structurally diverse corpus of 406 real-world networks. We find that (i) algorithms vary widely both in the number of communities they find and in their corresponding composition, given the same input, (ii) algorithms can be clustered into distinct high-level groups based on similarities of their outputs on real-world networks, and (iii) these differences induce wide variation in accuracy on link prediction and link description tasks. We introduce a new diagnostic for evaluating overfitting and underfitting in practice, and use it to roughly divide community detection methods into general and specialized learning algorithms. Across methods and inputs, Bayesian techniques based on the stochastic block model and a minimum description length approach to regularization represent the best general learning approach, but can be outperformed under specific circumstances. These results introduce both a theoretically principled approach to evaluate over- and underfitting in models of network community structure and a realistic benchmark by which new methods may be evaluated and compared.
Comment: 22 pages, 13 figures, 3 table

arXiv.org e-Print Archive

Crossref

Multilayer Networks

Author: Arenas Alexandre
Barthelemy Marc
Gleeson James P.
Kivelä Mikko
Moreno Yamir
Porter Mason A.
Publication venue: 'Oxford University Press (OUP)'
Publication date: 03/03/2014

In most natural and engineered systems, a set of entities interact with each other in complicated patterns that can encompass multiple types of relationships, change in time, and include other types of complications. Such systems include multiple subsystems and layers of connectivity, and it is important to take such "multilayer" features into account to try to improve our understanding of complex systems. Consequently, it is necessary to generalize "traditional" network theory by developing (and validating) a framework and associated tools to study multilayer systems in a comprehensive fashion. The origins of such efforts date back several decades and arose in multiple disciplines, and now the study of multilayer networks has become one of the most important directions in network science. In this paper, we discuss the history of multilayer networks (and related concepts) and review the exploding body of work on such networks. To unify the disparate terminology in the large body of recent work, we discuss a general framework for multilayer networks, construct a dictionary of terminology to relate the numerous existing concepts to each other, and provide a thorough discussion that compares, contrasts, and translates between related notions such as multilayer networks, multiplex networks, interdependent networks, networks of networks, and many others. We also survey and discuss existing data sets that can be represented as multilayer networks. We review attempts to generalize single-layer-network diagnostics to multilayer networks. We also discuss the rapidly expanding research on multilayer-network models and notions like community structure, connected components, tensor decompositions, and various types of dynamical processes on multilayer networks. We conclude with a summary and an outlook.
Comment: Working paper; 59 pages, 8 figure

arXiv.org e-Print Archive

CiteSeerX

Exact ICL maximization in a non-stationary temporal extension of the stochastic block model for dynamic networks

Author: Corneli Marco
Latouche Pierre
Rossi Fabrice
Publication venue: 'Elsevier BV'
Publication date: 01/06/2016

The stochastic block model (SBM) is a flexible probabilistic tool that can be used to model interactions between clusters of nodes in a network. However, it does not account for interactions of time-varying intensity between clusters. The extension of the SBM developed in this paper addresses this shortcoming through a temporal partition: assuming interactions between nodes are recorded on fixed-length time intervals, the inference procedure associated with the proposed model clusters the nodes of the network and the time intervals simultaneously. The number of clusters of nodes and of time intervals, as well as the cluster memberships, are obtained by maximizing an exact integrated complete-data likelihood, relying on a greedy search approach. Experiments on simulated and real data are carried out to assess the proposed methodology.

arXiv.org e-Print Archive

Crossref

HAL-Paris1

Link Prediction in Complex Networks: A Survey

Author: Linyuan Lü
Tao Zhou
Publication venue: 'Elsevier BV'
Publication date: 04/10/2010

Link prediction in complex networks has attracted increasing attention from both the physics and computer science communities. The algorithms can be used to extract missing information, identify spurious interactions, evaluate network evolving mechanisms, and so on. This article summarizes recent progress on link prediction algorithms, emphasizing the contributions from physical perspectives and approaches, such as the random-walk-based methods and the maximum likelihood methods. We also introduce three typical applications: reconstruction of networks, evaluation of network evolving mechanisms, and classification of partially labelled networks. Finally, we introduce some applications and outline future challenges of link prediction algorithms.
Comment: 44 pages, 5 figure

arXiv.org e-Print Archive

Crossref

Structured Review of the Evidence for Effects of Code Duplication on Software Quality

Author: Hordijk Wiebe
Ponisio María Laura
Wieringa Roel
Publication venue: Centre for Telematics and Information Technology, University of Twente
Publication date: 01/01/2009

This report presents the detailed steps and results of a structured review of code clone literature. The aim of the review is to investigate the evidence for the claim that code duplication has a negative effect on code changeability. This report contains only those details of the review for which there was not enough space in the companion paper published at a conference (Hordijk, Ponisio et al. 2009 - Harmfulness of Code Duplication - A Structured Review of the Evidence).

University of Twente Research Information