2,820 research outputs found
Infinite Author Topic Model based on Mixed Gamma-Negative Binomial Process.
Incorporating the side information of text corpus, i.e., authors, time
stamps, and emotional tags, into the traditional text mining models has gained
significant interests in the area of information retrieval, statistical natural
language processing, and machine learning. One branch of these works is the
so-called Author Topic Model (ATM), which incorporates the authors's interests
as side information into the classical topic model. However, the existing ATM
needs to predefine the number of topics, which is difficult and inappropriate
in many real-world settings. In this paper, we propose an Infinite Author Topic
(IAT) model to resolve this issue. Instead of assigning a discrete probability
on fixed number of topics, we use a stochastic process to determine the number
of topics from the data itself. To be specific, we extend a gamma-negative
binomial process to three levels in order to capture the
author-document-keyword hierarchical structure. Furthermore, each document is
assigned a mixed gamma process that accounts for the multi-author's
contribution towards this document. An efficient Gibbs sampling inference
algorithm with each conditional distribution being closed-form is developed for
the IAT model. Experiments on several real-world datasets show the capabilities
of our IAT model to learn the hidden topics, authors' interests on these topics
and the number of topics simultaneously.Comment: 10 pages, 5 figures, submitted to KDD conferenc
A Bayesian nonparametric model for multi-label learning
© 2017, The Author(s). Multi-label learning has become a significant learning paradigm in the past few years due to its broad application scenarios and the ever-increasing number of techniques developed by researchers in this area. Among existing state-of-the-art works, generative statistical models are characterized by their good generalization ability and robustness on large number of labels through learning a low-dimensional label embedding. However, one issue of this branch of models is that the number of dimensions needs to be fixed in advance, which is difficult and inappropriate in many real-world settings. In this paper, we propose a Bayesian nonparametric model to resolve this issue. More specifically, we extend a Gamma-negative binomial process to three levels in order to capture the label-instance-feature structure. Furthermore, a mixing strategy for Gamma processes is designed to account for the multiple labels of an instance. The mixed process also leads to a difficulty in model inference, so an efficient Gibbs sampling inference algorithm is then developed to resolve this difficulty. Experiments on several real-world datasets show the performance of the proposed model on multi-label learning tasks, comparing with three state-of-the-art models from the literature
Leveraging Node Attributes for Incomplete Relational Data
Relational data are usually highly incomplete in practice, which inspires us
to leverage side information to improve the performance of community detection
and link prediction. This paper presents a Bayesian probabilistic approach that
incorporates various kinds of node attributes encoded in binary form in
relational models with Poisson likelihood. Our method works flexibly with both
directed and undirected relational networks. The inference can be done by
efficient Gibbs sampling which leverages sparsity of both networks and node
attributes. Extensive experiments show that our models achieve the
state-of-the-art link prediction results, especially with highly incomplete
relational data.Comment: Appearing in ICML 201
Compound Markov counting processes and their applications to modeling infinitesimally over-dispersed systems
We propose an infinitesimal dispersion index for Markov counting processes.
We show that, under standard moment existence conditions, a process is
infinitesimally (over-) equi-dispersed if, and only if, it is simple
(compound), i.e. it increases in jumps of one (or more) unit(s), even though
infinitesimally equi-dispersed processes might be under-, equi- or
over-dispersed using previously studied indices. Compound processes arise, for
example, when introducing continuous-time white noise to the rates of simple
processes resulting in Levy-driven SDEs. We construct multivariate
infinitesimally over-dispersed compartment models and queuing networks,
suitable for applications where moment constraints inherent to simple processes
do not hold.Comment: 26 page
- …