Nonparametric Bayesian models for learning network coupling relationships
University of Technology, Sydney. Faculty of Engineering and Information Technology.
Traditional machine learning assumes that the data are independent and identically distributed (i.i.d.), an idealization comparable to a perfect vacuum that seldom holds in practical applications. Non-i.i.d. learning (Cao, Ou, Yu & Wei 2010)(Cao, Ou & Yu 2012)(Cao 2014) has therefore emerged as a powerful tool for describing fundamental real-world phenomena, since it accommodates more factors in the modelling. One critical factor in non-i.i.d. learning is the relations among the data, ranging from feature information and node partitioning to the correlation of outcomes; these are referred to as the coupling relation. In our work, we aim to uncover this coupling relation with nonparametric Bayesian relational models: the data points are assumed to be coupled with each other, and it is this coupling relation that we investigate. Coupling relations are widely observed in real-world applications, for example hidden structure learning in social networks for link prediction and structure understanding, fraud detection in transactional stock markets, and protein interaction modelling in biology.
In this thesis, we are particularly interested in learning and inference on relational data, with the goal of discovering the coupling relation between the corresponding points. On the modelling side, we focus on the framework of the mixed-membership stochastic blockmodel, in which the membership indicator denotes the community a node belongs to in a given relation, and the mixed-membership distribution denotes the histogram over all communities a node belongs to. More specifically, we model the coupling relation from three different aspects: 1) the coupling of mixed-membership distributions across time, reflected in the sticky phenomenon between the mixed-membership distributions at two consecutive time points; 2) the coupling of membership indicators, in which a Copula function is used to describe the coupling relation; 3) the coupling between node information and the mixed-membership distribution, achieved through a newly proposed transform that integrates the node information. As these three aspects describe the critical parts of the nodes’ interaction with the communities, we expect that complex hidden structures can be studied well in this way. In all of the above extensions, we place a nonparametric Bayesian prior (mainly the Hierarchical Dirichlet Process) on the number of communities, instead of fixing it as in previous classical models. In this way, the complexity of our model can grow with the data size: when more data are available, the model can use a larger number of communities to account for them. This appealing property enables our models to fit the data better. Moreover, the convenient formalization of the Hierarchical Dirichlet Process offers benefits such as conjugate priors. Thus, this nonparametric Bayesian prior introduces new elements to the learning of coupling relations.
Across these varying backgrounds and scenarios, our proposed models and frameworks for learning coupling relations are shown to outperform state-of-the-art methods, both through discussion of the literature and through empirical results. The outcomes have been successively accepted by top journals. Nonparametric Bayesian models for learning coupling relations therefore have high research value and remain an attractive opportunity for further exploration and exploitation.
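The abstract's remark that a nonparametric prior lets the number of communities grow with the data can be illustrated with a small simulation. The sketch below is not the thesis's model: it uses the Chinese restaurant process (the partition induced by a single Dirichlet process, not the Hierarchical Dirichlet Process), and the function name `crp_table_sizes` and the concentration value are assumptions chosen for illustration.

```python
import random


def crp_table_sizes(n_customers, alpha, seed=0):
    """Simulate a Chinese restaurant process; return the list of table sizes.

    Illustrative stand-in for a nonparametric prior over communities:
    each "table" plays the role of a community, `alpha` is the
    concentration parameter (hypothetical value chosen by the caller).
    """
    rng = random.Random(seed)
    tables = []  # tables[k] = number of customers seated at table k
    for i in range(n_customers):
        # Customer i opens a new table with probability alpha / (i + alpha),
        # otherwise joins table k with probability size_k / (i + alpha).
        u = rng.random() * (i + alpha)
        if u < alpha:
            tables.append(1)
            continue
        u -= alpha
        for k, size in enumerate(tables):
            if u < size:
                tables[k] += 1
                break
            u -= size
        else:
            # Guard against floating-point round-off at the upper edge.
            tables[-1] += 1
    return tables


if __name__ == "__main__":
    for n in (10, 100, 1000, 10000):
        print(n, "data points ->", len(crp_table_sizes(n, alpha=2.0)), "clusters")
```

Because each customer consumes exactly one random number, a longer run with the same seed extends a shorter one, so the number of occupied tables never decreases as more data arrive. No community count is fixed in advance.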
The Discrete Infinite Logistic Normal Distribution
We present the discrete infinite logistic normal distribution (DILN), a
Bayesian nonparametric prior for mixed membership models. DILN is a
generalization of the hierarchical Dirichlet process (HDP) that models
correlation structure between the weights of the atoms at the group level. We
derive a representation of DILN as a normalized collection of gamma-distributed
random variables, and study its statistical properties. We consider
applications to topic modeling and derive a variational inference algorithm for
approximate posterior inference. We study the empirical performance of the DILN
topic model on four corpora, comparing performance with the HDP and the
correlated topic model (CTM). To deal with large-scale data sets, we also
develop an online inference algorithm for DILN and compare with online HDP and
online LDA on the Nature magazine, which contains approximately 350,000
articles.
Comment: This paper will appear in Bayesian Analysis. A shorter version of
this paper appeared at AISTATS 2011, Fort Lauderdale, FL, US.
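The "normalized collection of gamma-distributed random variables" mentioned in the abstract can be sketched with a finite truncation. The code below is only an illustration under stated assumptions: the truncation to four atoms, the one-factor Gaussian used to correlate the log-scales, and all names and numeric values are hypothetical, and the exact parameterization should be checked against the paper. With all log-scales set to zero, the draw reduces to a finite Dirichlet with parameter `concentration * base_weights`.

```python
import math
import random


def normalized_gamma_weights(base_weights, concentration, log_scales, rng):
    """Draw group-level weights by normalizing independent gamma variables.

    Each atom k gets a gamma draw with shape concentration * base_weights[k]
    and scale exp(log_scales[k]); the draws are then normalized to sum to 1.
    """
    draws = [
        rng.gammavariate(concentration * p, math.exp(w))  # (shape, scale)
        for p, w in zip(base_weights, log_scales)
    ]
    total = sum(draws)
    return [z / total for z in draws]


def correlated_log_scales(loadings, noise_sd, rng):
    """One-factor Gaussian: atoms whose loadings share a sign move together."""
    factor = rng.gauss(0.0, 1.0)
    return [a * factor + rng.gauss(0.0, noise_sd) for a in loadings]


if __name__ == "__main__":
    rng = random.Random(0)
    base = [0.4, 0.3, 0.2, 0.1]         # hypothetical truncated top-level weights
    loadings = [1.0, 1.0, -1.0, -1.0]   # hypothetical correlation structure
    for group in range(3):
        w = correlated_log_scales(loadings, noise_sd=0.1, rng=rng)
        print(group, normalized_gamma_weights(base, 5.0, w, rng))
```

The per-atom scale is what lets the group-level weights of related atoms rise and fall together, a correlation that a plain Dirichlet draw cannot express.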
STATISTICAL METHODS FOR MIXED FREQUENCY DATA SAMPLING MODELS
MIDAS models were developed to handle different sampling frequencies in one regression model while preserving the information in the higher sampling frequency. Time averaging has been the traditional parametric approach to handling mixed sampling frequencies. However, it ignores information potentially embedded in the high-frequency (HF) data. MIDAS regression models provide a concise way to utilize the additional information in HF variables. While a parametric MIDAS model provides a parsimonious way to summarize information in HF data, nonparametric models maintain more flexibility at the expense of computational complexity. Moreover, one parametric form may not be appropriate for all cross-sectional subjects. This thesis proposes two new methods designed for mixed frequency data.
The first part of this thesis proposes a specification test to choose between time averaging and MIDAS models. If time averaging is sufficient for the given mixed frequency data, there is no need to use complicated nonlinear mixed frequency models. In such a case, a specification test that justifies the use of the simplest model, time averaging, is useful. We propose a specification test adapted from a DWH-type test. In particular, a set of instrumental variables is proposed and theoretically validated when the frequency ratio is large. As a result, our method tends to be more powerful than existing methods, as confirmed by the simulations.
The second part of the thesis provides a new way to identify groups in a panel data setting involving mixed frequencies. A flexible MIDAS model is proposed using a nonparametric approach. This nonparametric MIDAS model is then extended to a panel setting using a penalized regression approach. The estimated parameters can then be clustered using traditional clustering methods. The proposed clustering algorithm delivers reasonable clustering results both in theory and in simulations, without requiring prior knowledge of the true group membership. An empirical application is presented to examine the panel MIDAS model.
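The contrast between time averaging and a parametric MIDAS weighting, which the abstract's specification test is meant to choose between, can be made concrete with a small sketch. The exponential Almon lag polynomial used here is a common parametric choice in the MIDAS literature, but the abstract does not say which weighting the thesis uses, so it is an assumption; the function names and the data values are hypothetical.

```python
import math


def exp_almon_weights(m, theta1, theta2):
    """Exponential Almon weights over m high-frequency lags, summing to 1.

    With theta1 = theta2 = 0 every lag gets weight 1/m, which is exactly
    time averaging.
    """
    raw = [math.exp(theta1 * j + theta2 * j * j) for j in range(1, m + 1)]
    total = sum(raw)
    return [r / total for r in raw]


def aggregate(high_freq_block, weights):
    """Collapse one low-frequency period's HF observations into one regressor."""
    return sum(w * x for w, x in zip(weights, high_freq_block))


if __name__ == "__main__":
    block = [1.0, 2.0, 3.0, 4.0]                 # hypothetical HF observations
    flat = exp_almon_weights(4, 0.0, 0.0)        # time averaging
    decaying = exp_almon_weights(4, -0.5, 0.0)   # weight decays with the lag index
    print("time averaging:", aggregate(block, flat))
    print("exp. Almon:    ", aggregate(block, decaying))
```

Since time averaging is the special case with both parameters at zero, a test of whether the flat weights suffice is a natural way to justify the simpler model before fitting a nonlinear one.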
Beta-Negative Binomial Process and Exchangeable Random Partitions for Mixed-Membership Modeling
The beta-negative binomial process (BNBP), an integer-valued stochastic
process, is employed to partition a count vector into a latent random count
matrix. As the marginal probability distribution of the BNBP that governs the
exchangeable random partitions of grouped data has not yet been developed,
current inference for the BNBP has to truncate the number of atoms of the beta
process. This paper introduces an exchangeable partition probability function
to explicitly describe how the BNBP clusters the data points of each group into
a random number of exchangeable partitions, which are shared across all the
groups. A fully collapsed Gibbs sampler is developed for the BNBP, leading to a
novel nonparametric Bayesian topic model that is distinct from existing ones,
with simple implementation, fast convergence, good mixing, and state-of-the-art
predictive performance.
Comment: in Neural Information Processing Systems (NIPS) 2014. 9 pages + 3
page appendix