17,170 research outputs found

    Nonparametric Bayesian models for learning network coupling relationships

    University of Technology, Sydney. Faculty of Engineering and Information Technology. The traditional machine learning setting assumes that the data are independent and identically distributed (i.i.d.), an idealization, much like a perfect vacuum, that seldom holds in practical applications. Non-i.i.d. learning (Cao, Ou, Yu & Wei 2010; Cao, Ou & Yu 2012; Cao 2014) has therefore emerged as a powerful tool for describing real-world phenomena, because it accounts for more of the factors at play. One critical factor in non-i.i.d. learning is the relations among the data, ranging from feature information and node partitioning to the correlation of outcomes; these are referred to as coupling relations. In our work, we aim to uncover these coupling relations with nonparametric Bayesian relational models: the data points are assumed to be coupled with each other, and it is this coupling that we investigate. Coupling relations are widespread in real-world applications, for example hidden structure learning in social networks for link prediction and structure understanding, fraud detection in stock market transactions, and protein interaction modelling in biology. In this thesis, we are particularly interested in learning and inference on relational data, so as to discover the coupling relations between the corresponding data points. On the modelling side, we focus on the framework of the mixed-membership stochastic blockmodel, in which a membership indicator represents the community a node belongs to for one relation, and a mixed-membership distribution represents the histogram of all the communities a node belongs to.
More specifically, we model the coupling relation from three different aspects: 1) the coupling of the mixed-membership distributions across time, which is reflected in the sticky phenomenon between the mixed-membership distributions at two consecutive time points; 2) the coupling of the membership indicators, which is described with a Copula function; 3) the coupling between the node information and the mixed-membership distribution, which is achieved by a newly proposed transform for integrating the node information. As these three aspects describe the critical parts of the nodes’ interaction with the communities, we hope that the complex hidden structures can thus be well studied. In all of the above extensions, we place a nonparametric Bayesian prior (mainly the Hierarchical Dirichlet Process) on the number of communities, instead of fixing it as in the classical models. In this way, the complexity of our model can grow with the data size: the more data we have, the more communities the model can use to account for them. This appealing property enables our models to fit the data better. Moreover, the convenient formulation of the Hierarchical Dirichlet Process brings benefits such as conjugate priors. Thus, this nonparametric Bayesian prior introduces new elements to the learning of coupling relations. Across these varying backgrounds and scenarios, our proposed models and frameworks for learning the coupling relations are shown to outperform state-of-the-art methods, via literature explanation and empirical results. The outcomes have been successively accepted by top journals. Therefore, nonparametric Bayesian models for learning coupling relations present high research value and would still be attractive opportunities for further exploration and exploit

    The Discrete Infinite Logistic Normal Distribution

    We present the discrete infinite logistic normal distribution (DILN), a Bayesian nonparametric prior for mixed membership models. DILN is a generalization of the hierarchical Dirichlet process (HDP) that models correlation structure between the weights of the atoms at the group level. We derive a representation of DILN as a normalized collection of gamma-distributed random variables, and study its statistical properties. We consider applications to topic modeling and derive a variational inference algorithm for approximate posterior inference. We study the empirical performance of the DILN topic model on four corpora, comparing performance with the HDP and the correlated topic model (CTM). To deal with large-scale data sets, we also develop an online inference algorithm for DILN and compare with online HDP and online LDA on the Nature magazine, which contains approximately 350,000 articles. Comment: This paper will appear in Bayesian Analysis. A shorter version of this paper appeared at AISTATS 2011, Fort Lauderdale, FL, US
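
As a rough illustration of the normalized-gamma representation mentioned in the abstract above, the following Python sketch draws one group's weights by normalizing independent gamma variables. The truncation to three atoms, the base proportions `p`, the concentration `beta`, the log-scale offsets `w`, and the function name are all illustrative assumptions, not values or code from the paper. Using exp(w_k) as the gamma scale is only one plausible way to let a latent vector tilt the weights.

```python
import math
import random

def normalized_gamma_weights(p, beta, w, rng):
    """Draw group-level weights by normalizing independent gamma variables.

    Assumption for this sketch: Z_k ~ Gamma(shape=beta * p_k, scale=exp(w_k)).
    """
    z = [rng.gammavariate(beta * pk, math.exp(wk)) for pk, wk in zip(p, w)]
    total = sum(z)
    return [zk / total for zk in z]

rng = random.Random(0)          # fixed seed so the sketch is repeatable
p = [0.5, 0.3, 0.2]             # hypothetical top-level atom proportions
beta = 5.0                      # hypothetical concentration parameter
w_flat = [0.0, 0.0, 0.0]        # equal scales: a Dirichlet(beta * p) draw
w_tilted = [0.0, 0.0, 3.0]      # larger scale tends to favor the third atom

weights_flat = normalized_gamma_weights(p, beta, w_flat, rng)
weights_tilted = normalized_gamma_weights(p, beta, w_tilted, rng)
print(weights_flat, weights_tilted)
```

When all entries of `w` are equal, normalizing the gamma draws gives a Dirichlet(beta * p) sample. Letting the entries of `w` vary, and be correlated across atoms, is the kind of extra structure the abstract attributes to DILN.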

    STATISTICAL METHODS FOR MIXED FREQUENCY DATA SAMPLING MODELS

    The MIDAS models are developed to handle different sampling frequencies in one regression model, preserving information in the higher sampling frequency. Time averaging has been the traditional parametric approach to handling mixed sampling frequencies. However, it ignores information potentially embedded in the high-frequency (HF) data. MIDAS regression models provide a concise way to utilize the additional information in HF variables. While a parametric MIDAS model provides a parsimonious way to summarize information in HF data, nonparametric models maintain more flexibility at the expense of computational complexity. Moreover, one parametric form may not necessarily be appropriate for all cross-sectional subjects. This thesis proposes two new methods designed for mixed frequency data. The first part of this thesis proposes a specification test to choose between time averaging and MIDAS models. If time averaging is enough for the given mixed frequency data, there is no need to use complicated nonlinear mixed frequency models. In such a case, a specification test that justifies the use of the simplest model, time averaging, is useful. We propose a specification test revised from a DWH-type test. In particular, a set of instrumental variables is proposed and theoretically validated when the frequency ratio is large. As a result, our method tends to be more powerful than existing methods, as confirmed through simulations. The second part of the thesis provides a new way to identify groups in a panel data setting involving mixed frequencies. A flexible MIDAS model is proposed using a nonparametric approach. This nonparametric MIDAS model is further extended to a panel setting using a penalized regression idea. The estimated parameters can then be clustered using traditional clustering methods. 
The proposed clustering algorithm delivers reasonable clustering results both in theory and in simulations, without requiring prior knowledge about the true group membership information. An empirical application is presented to examine the panel MIDAS model
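
To make the contrast between time averaging and a parametric MIDAS aggregation in the abstract above concrete, here is a minimal Python sketch. It assumes the exponential Almon lag polynomial, a common weighting scheme in the MIDAS literature; the abstract does not say which weight function the thesis uses. The data, the parameter values, and the function names are hypothetical.

```python
import math

def exp_almon_weights(m, theta1, theta2):
    """Normalized exponential Almon weights for m high-frequency lags."""
    raw = [math.exp(theta1 * j + theta2 * j * j) for j in range(1, m + 1)]
    total = sum(raw)
    return [r / total for r in raw]

def aggregate(hf_block, weights):
    """Weighted sum of the high-frequency observations in one low-frequency period."""
    return sum(w * x for w, x in zip(weights, hf_block))

# Hypothetical data: three monthly observations in one quarter (frequency ratio m = 3).
monthly = [1.0, 2.0, 6.0]

flat = exp_almon_weights(3, 0.0, 0.0)        # theta = (0, 0): equal weights
decaying = exp_almon_weights(3, -1.0, 0.0)   # most weight on the first lag

time_avg = aggregate(monthly, flat)          # plain time averaging
midas_agg = aggregate(monthly, decaying)     # MIDAS-style weighted aggregate
print(time_avg, midas_agg)
```

With both parameters set to zero the weights are equal, so time averaging is a special case of this weighting scheme. That nesting is what makes a test of time averaging against a MIDAS alternative meaningful.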

    Beta-Negative Binomial Process and Exchangeable Random Partitions for Mixed-Membership Modeling

    The beta-negative binomial process (BNBP), an integer-valued stochastic process, is employed to partition a count vector into a latent random count matrix. As the marginal probability distribution of the BNBP that governs the exchangeable random partitions of grouped data has not yet been developed, current inference for the BNBP has to truncate the number of atoms of the beta process. This paper introduces an exchangeable partition probability function to explicitly describe how the BNBP clusters the data points of each group into a random number of exchangeable partitions, which are shared across all the groups. A fully collapsed Gibbs sampler is developed for the BNBP, leading to a novel nonparametric Bayesian topic model that is distinct from existing ones, with simple implementation, fast convergence, good mixing, and state-of-the-art predictive performance. Comment: in Neural Information Processing Systems (NIPS) 2014. 9 pages + 3 page appendi