Non-parametric Bayesian modeling of complex networks

Authors: Morten Mørup, Mikkel N. Schmidt
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2013

Modeling structure in complex networks using Bayesian non-parametrics makes it possible to specify flexible model structures and infer the adequate model complexity from the observed data. This paper provides a gentle introduction to non-parametric Bayesian modeling of complex networks. Using an infinite mixture model as a running example, we go through the steps of deriving the model as an infinite limit of a finite parametric model, inferring the model parameters by Markov chain Monte Carlo, and checking the model's fit and predictive performance. We explain how advanced non-parametric models for complex networks can be derived and point out relevant literature.
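As a rough illustration of the "infinite limit of a finite parametric model" idea, the sketch below draws a node partition from a Chinese restaurant process, which is the standard partition prior obtained when the number of components in a finite Dirichlet mixture goes to infinity. The abstract does not name this construction, so treating it as the prior here is an assumption, and the function name `sample_crp_partition` and the parameter `alpha` are hypothetical.

```python
import random


def sample_crp_partition(num_nodes, alpha, seed=0):
    """Draw cluster labels for num_nodes items from a Chinese restaurant process.

    alpha is the concentration parameter (assumed name): larger values
    favor opening new clusters.
    """
    rng = random.Random(seed)
    assignments = []    # cluster label of each node, in arrival order
    cluster_sizes = []  # number of nodes currently in each cluster
    for _ in range(num_nodes):
        # Existing clusters are weighted by size; a new cluster gets weight alpha.
        weights = cluster_sizes + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(cluster_sizes):
            cluster_sizes.append(1)  # open a new cluster
        else:
            cluster_sizes[k] += 1    # join an existing cluster
        assignments.append(k)
    return assignments, cluster_sizes


if __name__ == "__main__":
    labels, sizes = sample_crp_partition(num_nodes=20, alpha=1.0)
    print(labels)
    print(sizes)
```

The number of clusters is not fixed in advance; it grows with the data, which is the sense in which model complexity is inferred rather than specified.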

arXiv.org e-Print Archive

Crossref

Online Research Database In Technology

A network approach to topic models

Authors: Eduardo G. Altmann, Martin Gerlach, Tiago P. Peixoto
Publication venue: American Association for the Advancement of Science (AAAS)
Publication date: 04/07/2018

One of the main computational and scientific challenges in the modern age is to extract useful information from unstructured texts. Topic models are a popular machine-learning approach that infers the latent topical structure of a collection of documents. Despite their success, in particular that of the most widely used variant, Latent Dirichlet Allocation (LDA), and their numerous applications in sociology, history, and linguistics, topic models are known to suffer from severe conceptual and practical problems, e.g. a lack of justification for the Bayesian priors, discrepancies with statistical properties of real texts, and the inability to properly choose the number of topics. Here we obtain a fresh view on the problem of identifying topical structures by relating it to the problem of finding communities in complex networks. This is achieved by representing text corpora as bipartite networks of documents and words. By adapting existing community-detection methods, specifically a stochastic block model (SBM) with non-parametric priors, we obtain a more versatile and principled framework for topic modeling (e.g., it automatically detects the number of topics and hierarchically clusters both the words and documents). The analysis of artificial and real corpora demonstrates that our SBM approach leads to better topic models than LDA in terms of statistical model selection. More importantly, our work shows how to formally relate methods from community detection and topic modeling, opening the possibility of cross-fertilization between these two fields.

Comment: 22 pages, 10 figures, code available at https://topsbm.github.io
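The bipartite representation mentioned in the abstract can be sketched as a weighted edge list in which each edge links a document to a word and the weight counts occurrences. This is only a minimal illustration under stated assumptions: the whitespace tokenizer, the toy corpus, and the function name `build_bipartite_edges` are all hypothetical, and this is not the authors' code.

```python
from collections import Counter


def build_bipartite_edges(corpus):
    """Map {doc_id: text} to a Counter of (doc_id, word) -> occurrence count.

    Document nodes and word nodes form the two sides of the bipartite
    network; tokenization here is a plain lowercase whitespace split.
    """
    edges = Counter()
    for doc_id, text in corpus.items():
        for word in text.lower().split():
            edges[(doc_id, word)] += 1
    return edges


if __name__ == "__main__":
    toy_corpus = {
        "d1": "network community network",
        "d2": "topic model community",
    }
    for (doc_id, word), count in sorted(build_bipartite_edges(toy_corpus).items()):
        print(doc_id, word, count)
```

A community-detection method such as an SBM would then be fitted to this edge list; that step is outside the scope of the sketch.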

arXiv.org e-Print Archive

MPG.PuRe

Approximating predictive probabilities of Gibbs-type priors

Authors: Julyan Arbel, Stefano Favaro
Publication date: 24/03/2020

Gibbs-type random probability measures, or Gibbs-type priors, are arguably the most "natural" generalization of the celebrated Dirichlet prior. Among them the two parameter Poisson-Dirichlet prior certainly stands out for the mathematical tractability and interpretability of its predictive probabilities, which made it the natural candidate in several applications. Given a sample of size $n$, in this paper we show that the predictive probabilities of any Gibbs-type prior admit a large $n$ approximation, with an error term vanishing as $o(1/n)$, which maintains the same desirable features as the predictive probabilities of the two parameter Poisson-Dirichlet prior.

Comment: 22 pages, 6 figures. Added posterior simulation study, corrected typo
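For reference, the predictive probabilities of the two parameter Poisson-Dirichlet prior that the abstract uses as its benchmark have a simple closed form: with $n$ observations in $k$ clusters of sizes $n_1, \dots, n_k$, a new cluster has probability $(\theta + \sigma k)/(\theta + n)$ and existing cluster $j$ has probability $(n_j - \sigma)/(\theta + n)$. The sketch below computes these exact values. It does not implement the paper's approximation for general Gibbs-type priors, and the function and parameter names are assumptions chosen for illustration.

```python
def pitman_yor_predictive(cluster_sizes, sigma, theta):
    """Exact predictive probabilities under the two parameter Poisson-Dirichlet prior.

    cluster_sizes: sizes n_1..n_k of the clusters observed so far.
    sigma: discount parameter, 0 <= sigma < 1.
    theta: strength parameter, theta > -sigma.
    Returns (probability of a new cluster, list of probabilities per existing cluster).
    """
    n = sum(cluster_sizes)
    k = len(cluster_sizes)
    p_new = (theta + sigma * k) / (theta + n)
    p_existing = [(n_j - sigma) / (theta + n) for n_j in cluster_sizes]
    return p_new, p_existing


if __name__ == "__main__":
    p_new, p_existing = pitman_yor_predictive([3, 1, 1], sigma=0.5, theta=1.0)
    print(p_new, p_existing)
```

Setting `sigma=0` recovers the Dirichlet-prior case, where the probability of a new cluster is $\theta/(\theta + n)$.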

arXiv.org e-Print Archive

Hal - Université Grenoble Alpes

INRIA a CCSD electronic archive server

Institutional Research Information System University of Turin