
    A network approach to topic models

    One of the main computational and scientific challenges in the modern age is to extract useful information from unstructured texts. Topic models are a popular machine-learning approach that infers the latent topical structure of a collection of documents. Despite their success (in particular that of the most widely used variant, Latent Dirichlet Allocation, or LDA) and numerous applications in sociology, history, and linguistics, topic models are known to suffer from severe conceptual and practical problems, e.g. a lack of justification for the Bayesian priors, discrepancies with statistical properties of real texts, and the inability to properly choose the number of topics. Here we obtain a fresh view on the problem of identifying topical structures by relating it to the problem of finding communities in complex networks. This is achieved by representing text corpora as bipartite networks of documents and words. By adapting existing community-detection methods, specifically a stochastic block model (SBM) with non-parametric priors, we obtain a more versatile and principled framework for topic modeling (e.g., it automatically detects the number of topics and hierarchically clusters both the words and documents). The analysis of artificial and real corpora demonstrates that our SBM approach leads to better topic models than LDA in terms of statistical model selection. More importantly, our work shows how to formally relate methods from community detection and topic modeling, opening the possibility of cross-fertilization between these two fields.
    Comment: 22 pages, 10 figures, code available at https://topsbm.github.io
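    The bipartite representation mentioned in this abstract can be illustrated with a minimal sketch. This is not the authors' code (that is linked above); the function name `bipartite_edges`, the toy corpus, and the whitespace tokenization are assumptions made only for illustration, and the SBM inference step itself is not shown.

```python
from collections import Counter


def bipartite_edges(docs):
    """Map (document index, word) -> number of times the word occurs in that document.

    Each key is an edge of the bipartite document-word network and the
    count is the edge multiplicity. Hypothetical helper for illustration.
    """
    edges = Counter()
    for doc_index, text in enumerate(docs):
        # Assumption: lowercase whitespace tokenization stands in for real preprocessing.
        for word in text.lower().split():
            edges[(doc_index, word)] += 1
    return edges


# Toy corpus (made up for this example).
docs = ["the cat sat", "the dog sat down", "cat and dog"]
edges = bipartite_edges(docs)

# Node degrees on each side of the bipartite network.
doc_degree = Counter()
word_degree = Counter()
for (doc_index, word), count in edges.items():
    doc_degree[doc_index] += count   # document length in tokens
    word_degree[word] += count       # corpus-wide word frequency
```

    A community-detection method would then group the document nodes and the word nodes of this network; the word groups play the role of topics.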

    Impact of environmental inputs on reverse-engineering approach to network structures

    Background: Uncovering complex network structures from a biological system is one of the main topics in systems biology. The network structures can be inferred by the dynamical Bayesian network or Granger causality, but neither technique has seriously taken into account the impact of environmental inputs. Results: Taking into account the natural rhythmic dynamics of biological data, we propose a systems biology approach to reveal the impact of environmental inputs on network structures. We first represent the environmental inputs by a harmonic oscillator and combine them with Granger causality to identify environmental inputs and then uncover the causal network structures. We also generalize it to multiple harmonic oscillators to represent various exogenous influences. This systems approach is extensively tested with toy models and successfully applied to a real biological network of microarray data of the flowering genes of the model plant Arabidopsis thaliana. The aim is to identify those genes that are directly affected by the presence of sunlight and uncover the interactive network structures associated with flowering metabolism. Conclusion: We demonstrate that environmental inputs are crucial for correctly inferring network structures. The harmonic causal method proves to be a powerful technique to detect environmental inputs and uncover network structures, especially when the biological data exhibit periodic oscillations.
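    The idea of combining a harmonic input with Granger causality can be sketched on simulated data. This is a simplified illustration, not the authors' harmonic causal method: it assumes a single lag, a known period, and a log-ratio of residual sums of squares as the causality measure. All variable names and simulation parameters are made up.

```python
import numpy as np


def rss(design, target):
    """Residual sum of squares of an ordinary least-squares fit."""
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    resid = target - design @ coef
    return float(resid @ resid)


# Simulated toy system (assumption): x is driven by its own past, by y's past,
# and by a periodic environmental input with a known period.
rng = np.random.default_rng(0)
n, period = 400, 24.0
t = np.arange(n)
drive = np.sin(2 * np.pi * t / period)
x = np.zeros(n)
y = np.zeros(n)
for i in range(1, n):
    y[i] = 0.5 * y[i - 1] + rng.normal(scale=0.1)
    x[i] = 0.3 * x[i - 1] + 0.4 * y[i - 1] + drive[i] + rng.normal(scale=0.1)

target = x[1:]
sin_t = np.sin(2 * np.pi * t[1:] / period)
cos_t = np.cos(2 * np.pi * t[1:] / period)

# Nested regression designs: intercept + own past, then extra regressors.
own = np.column_stack([np.ones(n - 1), x[:-1]])
own_harm = np.column_stack([own, sin_t, cos_t])
with_y = np.column_stack([own, y[:-1]])
full = np.column_stack([with_y, sin_t, cos_t])

rss_full = rss(full, target)
# How much y's past helps predict x once the harmonic input is accounted for.
gain_from_y = np.log(rss(own_harm, target) / rss_full)
# How much the harmonic (environmental) terms help once y's past is included.
gain_from_harm = np.log(rss(with_y, target) / rss_full)
```

    A larger log-ratio means the added regressors reduce the prediction error more; in practice one would assess significance with a statistical test rather than read the raw values.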

    Principles of Modeling in Information Communication Systems and Networks

    The authors present in this entry chapter the basic rubrics of models, modeling, and simulation, an understanding of which is indispensable for the comprehension of subsequent chapters of this text on the all-important topic of modeling and simulation in Information Communication Systems and Networks (ICSN). A good example is the analysis of simulation results of traffic models as a tool for investigating network behavioral patterns as they affect the transmitted content (Atayero et al., 2013). The various classifications of models are discussed, for example classification based on the degree of semblance to the original object (i.e. isomorphism). Various fundamental terminologies, without the knowledge of which the concepts of models and modeling cannot be properly understood, are explained. Model structures are highlighted and discussed. The methodological basis of formalizing complex system structures is presented. The concept of the componential approach to modeling is presented, and the necessary stages of mathematical model formation are examined and explained. The chapter concludes with a presentation of the concept of simulation vis-à-vis information communication systems and networks.