Search CORE

11,968 research outputs found

Comparative Study for Inference of Hidden Classes in Stochastic Block Models

Author: Dempster A P
Florent Krzakala
Goldenberg A
Jörg Reichardt
Lenka Zdeborová
Mossel E
Neeman J
Sly A
Newman M E J
Onsjö M
Pan Zhang
Sen P
Weiss Y
Yedidia J S
Publication venue: 'IOP Publishing'
Publication date: 13/07/2012
Field of study

Inference of hidden classes in the stochastic block model is a classical problem with important applications. The most commonly used methods for this problem involve naïve mean-field approaches or heuristic spectral methods. Recently, belief propagation was proposed for this problem. In this contribution we perform a comparative study of the three methods on synthetically created networks. We show that belief propagation performs much better than the naïve mean-field and spectral approaches. The advantage holds for accuracy, for computational efficiency, and for the tendency to overfit the data. Comment: 8 pages, 5 figures AIGM1

arXiv.org e-Print Archive
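
The spectral baseline mentioned in this abstract can be illustrated with a minimal sketch, assuming Python with NumPy. It samples a two-group planted-partition stochastic block model and splits the nodes by the sign of the leading eigenvector of the modularity matrix, which is one common spectral heuristic. The helper names (`sample_sbm`, `spectral_bisection`, `accuracy`) and the parameter values are illustrative assumptions, not taken from the paper. Neither belief propagation nor naïve mean field is shown.

```python
import numpy as np

def sample_sbm(sizes, p_in, p_out, rng):
    """Sample an undirected planted-partition SBM; returns (adjacency, labels)."""
    labels = np.repeat(np.arange(len(sizes)), sizes)
    n = labels.size
    same = labels[:, None] == labels[None, :]
    probs = np.where(same, p_in, p_out)
    upper = np.triu(rng.random((n, n)) < probs, k=1)  # sample each pair once
    adj = (upper | upper.T).astype(float)             # symmetric, zero diagonal
    return adj, labels

def spectral_bisection(adj):
    """Split nodes by the sign of the leading eigenvector of the modularity matrix."""
    degrees = adj.sum(axis=1)
    two_m = degrees.sum()
    modularity = adj - np.outer(degrees, degrees) / two_m
    _, eigvecs = np.linalg.eigh(modularity)  # eigenvalues in ascending order
    leading = eigvecs[:, -1]
    return (leading > 0).astype(int)

def accuracy(pred, truth):
    """Fraction of correctly labelled nodes, up to swapping the two group names."""
    agree = np.mean(pred == truth)
    return max(agree, 1.0 - agree)

rng = np.random.default_rng(0)
adj, labels = sample_sbm([100, 100], p_in=0.3, p_out=0.02, rng=rng)
pred = spectral_bisection(adj)
print(f"accuracy: {accuracy(pred, labels):.2f}")
```

With these dense, well-separated groups the split should be nearly perfect. The choice of inference method starts to matter when `p_in` moves toward `p_out` or the graph becomes sparser.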

Inference of hidden structures in complex physical systems by multi-scale clustering

Author: A Cardillo
A Lancichinetti
A Montanari
AA Abin
Andrea Montanari
C Dasgupta
CA Angell
CL Henley
D Hu
DA Keen
Dandan Hu
DJ Sordelet
DS Bassett
F Cerina
G Bianconi
G Petri
G Tarjus
Greg Ver Steeg
H Dandan
HW Sheng
J Reichardt
J Saida
J Villain
J-P Bouchaud
J. Dana Honeycutt
JL Finney
JM Kumpula
L Berthier
L Wang
M Meilă
M Mezard
M Mitchell
M Mosayebi
M Rosvall
Manlio De Domenico
MEJ Newman
O Melchert
P Holme
P Ronhovde
PG Wolynes
PJ Steinhardt
R Monasson
Richard K. Darst
RK Darst
RL McGreevy
S Fortunato
S Karmakar
S Wiseman
S. Kirkpatrick
SY Wang
T Nakamura
TR Kirkpatrick
V Gudkov
V Lubchenko
VD Blondel
W Kob
WH Zachariasen
Z Nussinov
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 14/01/2016
Field of study

We survey the application of a relatively new branch of statistical physics, "community detection", to data mining. In particular, we focus on the diagnosis of materials and on automated image segmentation. Community detection is the task of partitioning a complex system of many elements into optimally decoupled subsets, or communities, of such elements. We review a multiresolution variant that is used to ascertain structures at different spatial and temporal scales. Significant patterns are obtained by examining the correlations between different independent solvers. Like other combinatorial optimization problems in the NP complexity class, community detection exhibits several phases. Typically, illuminating orders are revealed by choosing parameters that lead to extremal information-theoretic correlations. Comment: 25 pages, 16 figures; a review of earlier work

arXiv.org e-Print Archive
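
As a hedged illustration of the multiresolution idea, the sketch below assumes Python with NumPy and a Potts-type energy with a resolution parameter gamma, H = -1/2 * sum over i != j of [A_ij - gamma * (1 - A_ij)] * delta(s_i, s_j). It minimizes H with a simple greedy local-move solver. It then measures how well independent solver runs ("replicas") agree, using normalized mutual information (NMI). The exact Hamiltonian and solvers used in the review may differ. The function names, the greedy solver, and the ring-of-cliques test graph are assumptions for illustration.

```python
import numpy as np

def potts_energy(adj, labels, gamma):
    """H = -1/2 * sum_{i != j} [A_ij - gamma * (1 - A_ij)] * delta(s_i, s_j)."""
    same = labels[:, None] == labels[None, :]
    np.fill_diagonal(same, False)
    weights = adj - gamma * (1.0 - adj)
    return -0.5 * float(np.sum(weights * same))

def greedy_potts(adj, gamma, rng, sweeps=50):
    """Greedy local moves starting from singletons; every accepted move lowers H."""
    n = adj.shape[0]
    weights = adj - gamma * (1.0 - adj)
    np.fill_diagonal(weights, 0.0)
    labels = np.arange(n)
    for _ in range(sweeps):
        moved = False
        for i in rng.permutation(n):
            best = labels[i]
            best_gain = weights[i, labels == best].sum()
            for c in np.unique(labels[adj[i] > 0]):  # neighbours' communities
                gain = weights[i, labels == c].sum()
                if gain > best_gain:
                    best, best_gain = c, gain
            if best_gain < 0:  # an empty community (gain 0) is better
                best = labels.max() + 1
            if best != labels[i]:
                labels[i] = best
                moved = True
        if not moved:
            break
    return np.unique(labels, return_inverse=True)[1]  # relabel as 0..k-1

def nmi(a, b):
    """Normalized mutual information 2 I(A;B) / (H(A) + H(B)) of two labelings."""
    _, a_idx = np.unique(a, return_inverse=True)
    _, b_idx = np.unique(b, return_inverse=True)
    joint = np.zeros((a_idx.max() + 1, b_idx.max() + 1))
    np.add.at(joint, (a_idx, b_idx), 1.0)
    joint /= a_idx.size
    pa, pb = joint.sum(axis=1), joint.sum(axis=0)
    nz = joint > 0
    mi = np.sum(joint[nz] * np.log(joint[nz] / np.outer(pa, pb)[nz]))
    ha, hb = -np.sum(pa * np.log(pa)), -np.sum(pb * np.log(pb))
    if ha + hb == 0:
        return 1.0  # both labelings put everything in one community
    return float(2.0 * mi / (ha + hb))

def replica_agreement(adj, gamma, n_replicas=4, seed=0):
    """Mean pairwise NMI between independent greedy runs at one resolution."""
    rng = np.random.default_rng(seed)
    runs = [greedy_potts(adj, gamma, rng) for _ in range(n_replicas)]
    scores = [nmi(runs[i], runs[j])
              for i in range(n_replicas) for j in range(i + 1, n_replicas)]
    return float(np.mean(scores))

def ring_of_cliques(n_cliques=4, size=8):
    """Hypothetical test graph: cliques joined in a ring by single edges."""
    n = n_cliques * size
    adj = np.zeros((n, n))
    for c in range(n_cliques):
        block = slice(c * size, (c + 1) * size)
        adj[block, block] = 1.0
        nxt = ((c + 1) % n_cliques) * size
        adj[c * size, nxt] = adj[nxt, c * size] = 1.0
    np.fill_diagonal(adj, 0.0)
    return adj

adj = ring_of_cliques()
for gamma in (0.05, 0.5, 2.0):
    print(f"gamma={gamma}: mean replica NMI = {replica_agreement(adj, gamma):.2f}")
```

Scanning gamma and looking for values where replicas agree most is a simplified version of the parameter-selection idea described in the abstract. A real analysis would use a finer gamma grid, more replicas, and a stronger solver.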

Evaluating Overfit and Underfit in Models of Network Community Structure

Author: Clauset Aaron
Ghasemian Amir
Hosseinmardi Homa
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2019
Field of study

A common data mining task on networks is community detection, which seeks an unsupervised decomposition of a network into structural groups based on statistical regularities in the network's connectivity. Although many methods exist, the No Free Lunch theorem for community detection implies that each makes some kind of tradeoff, and no algorithm can be optimal on all inputs. Thus, different algorithms will over- or underfit on different inputs, finding more, fewer, or simply different communities than is optimal, and evaluation methods that use a metadata partition as a ground truth will produce misleading conclusions about general accuracy. Here, we present a broad evaluation of over- and underfitting in community detection, comparing the behavior of 16 state-of-the-art community detection algorithms on a novel and structurally diverse corpus of 406 real-world networks. We find that (i) algorithms vary widely both in the number of communities they find and in their corresponding composition, given the same input, (ii) algorithms can be clustered into distinct high-level groups based on similarities of their outputs on real-world networks, and (iii) these differences induce wide variation in accuracy on link prediction and link description tasks. We introduce a new diagnostic for evaluating overfitting and underfitting in practice, and use it to roughly divide community detection methods into general and specialized learning algorithms. Across methods and inputs, Bayesian techniques based on the stochastic block model and a minimum description length approach to regularization represent the best general learning approach, but can be outperformed under specific circumstances. These results introduce both a theoretically principled approach to evaluating over- and underfitting in models of network community structure and a realistic benchmark by which new methods may be evaluated and compared. Comment: 22 pages, 13 figures, 3 tables

arXiv.org e-Print Archive
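
The link-prediction idea in this abstract can be made concrete with a simplified sketch, assuming Python with NumPy. It holds out 10% of the edges of a synthetic SBM graph. It then scores each held-out edge and each true non-edge by the training-graph edge density between the two nodes' groups, and reports the AUC. This block-density scoring rule, the function names, and the parameters are assumptions for illustration. The paper's actual diagnostic and its 16 algorithms are not reproduced here.

```python
import numpy as np

def sample_sbm(sizes, p_in, p_out, rng):
    """Undirected planted-partition SBM; returns (adjacency, labels)."""
    labels = np.repeat(np.arange(len(sizes)), sizes)
    same = labels[:, None] == labels[None, :]
    upper = np.triu(rng.random(same.shape) < np.where(same, p_in, p_out), k=1)
    return (upper | upper.T).astype(float), labels

def hold_out_edges(adj, frac, rng):
    """Remove a random fraction of edges; return training graph and held-out pairs."""
    iu, ju = np.triu_indices_from(adj, k=1)
    edge_ids = np.flatnonzero(adj[iu, ju] > 0)
    held = rng.choice(edge_ids, size=int(frac * edge_ids.size), replace=False)
    train = adj.copy()
    train[iu[held], ju[held]] = 0.0
    train[ju[held], iu[held]] = 0.0
    return train, iu[held], ju[held]

def block_density(train, labels):
    """Edge density between every pair of groups in the training graph."""
    k = labels.max() + 1
    onehot = np.eye(k)[labels]                         # n x k membership matrix
    edges = onehot.T @ train @ onehot                  # ordered edge counts
    sizes = onehot.sum(axis=0)
    pairs = np.outer(sizes, sizes) - np.diag(sizes)    # ordered pairs with i != j
    return np.divide(edges, pairs, out=np.zeros_like(edges), where=pairs > 0)

def link_prediction_auc(adj, train, held_i, held_j, labels):
    """AUC of block-density scores: held-out edges versus true non-edges."""
    dens = block_density(train, labels)
    iu, ju = np.triu_indices_from(adj, k=1)
    non = adj[iu, ju] == 0
    pos = dens[labels[held_i], labels[held_j]]
    neg = dens[labels[iu[non]], labels[ju[non]]]
    wins = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (wins + 0.5 * ties) / (pos.size * neg.size)

rng = np.random.default_rng(0)
adj, truth = sample_sbm([100, 100], 0.3, 0.02, rng)
train, hi, hj = hold_out_edges(adj, 0.1, rng)
n = adj.shape[0]
for name, labels in [("one group", np.zeros(n, dtype=int)),
                     ("planted", truth),
                     ("singletons", np.arange(n))]:
    print(name, round(float(link_prediction_auc(adj, train, hi, hj, labels)), 3))
```

Both extremes score an AUC of exactly 0.5 here: a single group (underfitting) and one group per node (overfitting). In both cases every candidate pair receives the same score, while the planted partition does better. This gives the intuition for using held-out links to detect over- and underfitting.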

Spectral redemption: clustering sparse networks

Author: Krzakala Florent
Moore Cristopher
Mossel Elchanan
Neeman Joe
Sly Allan
Zdeborová Lenka
Zhang Pan
Publication venue: 'Proceedings of the National Academy of Sciences'
Publication date: 23/08/2013
Field of study

Spectral algorithms are classic approaches to clustering and community detection in networks. However, for sparse networks the standard versions of these algorithms are suboptimal, in some cases failing completely to detect communities even when other algorithms, such as belief propagation, can do so. Here we introduce a new class of spectral algorithms based on a non-backtracking walk on the directed edges of the graph. The spectrum of this operator is much better behaved than that of the adjacency matrix or other commonly used matrices, maintaining a strong separation between the bulk eigenvalues and the eigenvalues relevant to community structure even in the sparse case. We show that our algorithm is optimal for graphs generated by the stochastic block model, detecting communities all the way down to the theoretical limit. We also show the spectrum of the non-backtracking operator for some real-world networks, illustrating its advantages over traditional spectral clustering. Comment: 11 pages, 6 figures. Clarified to what extent our claims are rigorous, and to what extent they are conjectures; also added an interpretation of the eigenvectors of the 2n-dimensional version of the non-backtracking matrix

arXiv.org e-Print Archive
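
The non-backtracking operator described in this abstract can be written down directly. B is indexed by directed edges, with B[(u->v), (w->x)] = 1 if v = w and x != u, and 0 otherwise. The sketch below assumes Python with NumPy. It builds B for a small graph and compares its leading eigenvalue with that of the 2n x 2n matrix [[0, D - I], [-I, A]], where D is the degree matrix and A the adjacency matrix. By the Ihara-Bass identity, B has the eigenvalues of this matrix plus +1 and -1, each with multiplicity m - n. I assume this is the "2n-dimensional version" the comment refers to. The function names are hypothetical, and the quadratic loop is meant only for small graphs.

```python
import itertools
import numpy as np

def non_backtracking_matrix(edges):
    """2m x 2m matrix on directed edges: (u->v) feeds (v->x) whenever x != u."""
    directed = [(u, v) for u, v in edges] + [(v, u) for u, v in edges]
    index = {e: k for k, e in enumerate(directed)}
    B = np.zeros((len(directed), len(directed)))
    for (u, v), k in index.items():
        for (w, x), l in index.items():
            if v == w and x != u:  # continue the walk without backtracking
                B[k, l] = 1.0
    return B

def reduced_matrix(edges, n):
    """2n x 2n matrix [[0, D - I], [-I, A]] sharing B's non-trivial eigenvalues."""
    A = np.zeros((n, n))
    for u, v in edges:
        A[u, v] = A[v, u] = 1.0
    D = np.diag(A.sum(axis=1))
    I = np.eye(n)
    return np.block([[np.zeros((n, n)), D - I], [-I, A]])

edges = list(itertools.combinations(range(4), 2))  # complete graph K4
B = non_backtracking_matrix(edges)
B2 = reduced_matrix(edges, 4)
lead_B = np.linalg.eigvals(B).real.max()
lead_B2 = np.linalg.eigvals(B2).real.max()
print(B.shape, round(lead_B, 6), round(lead_B2, 6))
```

In K4 every node has degree 3, so each directed edge continues to d - 1 = 2 others, and the leading eigenvalue of both matrices is 2. The reduced matrix is the cheaper object to diagonalize on large sparse graphs.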

Latent tree models

Author: Zwiernik Piotr
Publication venue
Publication date: 02/08/2017
Field of study

Latent tree models are graphical models defined on trees, in which only a subset of variables is observed. They were first discussed by Judea Pearl as tree-decomposable distributions to generalise star-decomposable distributions such as the latent class model. Latent tree models, or their submodels, are widely used in phylogenetic analysis, network tomography, computer vision, causal modeling, and data clustering. They also include other well-known classes of models, such as hidden Markov models, the Brownian motion tree model, the Ising model on a tree, and many popular models used in phylogenetics. This article offers a concise introduction to the theory of latent tree models. We emphasise the role of tree metrics in the structural description of this model class, in designing learning algorithms, and in understanding the fundamental limits of what can be learned and when.

arXiv.org e-Print Archive
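
Tree metrics, which this abstract emphasises, can be characterised by the four-point condition. A metric d is a tree metric if and only if, for every four points i, j, k, l, the two largest of d(i,j)+d(k,l), d(i,k)+d(j,l) and d(i,l)+d(j,k) are equal; the smallest sum then identifies the quartet's split. The sketch below assumes Python with NumPy. It checks the condition and recovers a quartet split. The function names and example distances are illustrative and are not taken from the article.

```python
import itertools
import numpy as np

def four_point_ok(d, tol=1e-9):
    """True if the metric d satisfies the four-point condition on every quartet."""
    n = d.shape[0]
    for i, j, k, l in itertools.combinations(range(n), 4):
        sums = sorted([d[i, j] + d[k, l], d[i, k] + d[j, l], d[i, l] + d[j, k]])
        if abs(sums[2] - sums[1]) > tol:  # the two largest sums must coincide
            return False
    return True

def quartet_split(d, i, j, k, l):
    """Return the pairing with the smallest distance sum, i.e. the tree's split."""
    options = {
        ((i, j), (k, l)): d[i, j] + d[k, l],
        ((i, k), (j, l)): d[i, k] + d[j, l],
        ((i, l), (j, k)): d[i, l] + d[j, k],
    }
    return min(options, key=options.get)

def from_pairs(n, pairs):
    """Build a symmetric distance matrix from {(i, j): distance}."""
    d = np.zeros((n, n))
    for (i, j), value in pairs.items():
        d[i, j] = d[j, i] = value
    return d

# Leaves 0 and 1 hang off one internal node, 2 and 3 off another; leaf edge
# lengths are 1, 2, 1, 3 and the internal edge has length 2.
tree = from_pairs(4, {(0, 1): 3, (0, 2): 4, (0, 3): 6, (1, 2): 5, (1, 3): 7, (2, 3): 4})
# Shortest-path distances on the 4-cycle 0-1-2-3-0 with unit edges: not a tree metric.
cycle = from_pairs(4, {(0, 1): 1, (1, 2): 1, (2, 3): 1, (0, 3): 1, (0, 2): 2, (1, 3): 2})

print(four_point_ok(tree), quartet_split(tree, 0, 1, 2, 3))
print(four_point_ok(cycle))
```

For the tree, the three sums are 7, 11 and 11, so the condition holds and the split {0, 1} | {2, 3} is recovered. For the cycle, the sums are 2, 4 and 2, so it fails. Quartet-based reconstruction methods build on this kind of test.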

A survey of statistical network models

Author: Alice X. Zheng
Anna Goldenberg
Edoardo M. Airoldi
Stephen E. Fienberg
Publication venue
Publication date: 01/01/2009
Field of study

Networks are ubiquitous in science and have become a focal point for discussion in everyday life. Formal statistical models for the analysis of network data have emerged as a major topic of interest in diverse areas of study, and most of these involve a form of graphical representation. Probability models on graphs date back to 1959. Along with empirical studies in social psychology and sociology from the 1960s, these early works generated an active network community and a substantial literature in the 1970s. This effort moved into the statistical literature in the late 1970s and 1980s, and the past decade has seen a burgeoning network literature in statistical physics and computer science. The growth of the World Wide Web and the emergence of online networking communities such as Facebook, MySpace, and LinkedIn, along with a host of more specialized professional network communities, have intensified interest in the study of networks and network data. Our goal in this review is to provide the reader with an entry point to this burgeoning literature. We begin with an overview of the historical development of statistical network modeling and then introduce a number of examples that have been studied in the network literature. Our subsequent discussion focuses on a number of prominent static and dynamic network models and their interconnections. We emphasize formal model descriptions, and pay special attention to the interpretation of parameters and their estimation. We end with a description of some open problems and challenges for machine learning and statistics. Comment: 96 pages, 14 figures, 333 references

arXiv.org e-Print Archive

CiteSeerX