Priors for Random Count Matrices Derived from a Family of Negative Binomial Processes
We define a family of probability distributions for random count matrices
with a potentially unbounded number of rows and columns. The three
distributions we consider are derived from the gamma-Poisson, gamma-negative
binomial, and beta-negative binomial processes. Because the models lead to
closed-form Gibbs sampling update equations, they are natural candidates for
nonparametric Bayesian priors over count matrices. A key aspect of our analysis
is the recognition that, although the random count matrices within the family
are defined by a row-wise construction, their columns can be shown to be i.i.d.
This fact is used to derive explicit formulas for drawing all the columns at
once. Moreover, by analyzing these matrices' combinatorial structure, we
describe how to sequentially construct a column-i.i.d. random count matrix one
row at a time, and derive the predictive distribution of a new row count vector
with previously unseen features. We describe the similarities and differences
between the three priors, and argue that the greater flexibility of the gamma-
and beta-negative binomial processes, especially their ability to model
over-dispersed, heavy-tailed count data, makes them well suited to a wide
variety of real-world applications. As an example of our framework, we
construct a naive-Bayes text classifier that assigns a count vector to one of
several categories, each represented by an existing random count matrix. The classifier
supports an unbounded number of features, and unlike most existing methods, it
does not require a predefined finite vocabulary to be shared by all the
categories, and needs neither feature selection nor parameter tuning. Both the
gamma- and beta-negative binomial processes are shown to significantly
outperform the gamma-Poisson process for document categorization, with
comparable performance to other state-of-the-art supervised text classification
algorithms.
Comment: To appear in Journal of the American Statistical Association (Theory
and Methods). 31 pages + 11-page supplement, 5 figures.
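As a concrete illustration of the kind of object the paper studies, the sketch below draws a gamma-Poisson random count matrix by a naive finite-truncation route: each of K_trunc candidate features gets a gamma-distributed rate, every entry is Poisson given its column's rate, and empty columns are discarded so the number of active columns is random. The hyperparameters gamma0, c, and K_trunc are illustrative assumptions, and the truncation stands in for the paper's exact column-i.i.d. construction, which samples all columns at once without truncation.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_gamma_poisson_matrix(n_rows, gamma0=5.0, c=1.0, K_trunc=200):
    """Finite-K truncation of a gamma-Poisson process count matrix.

    Feature k gets a rate r_k ~ Gamma(gamma0 / K_trunc, 1/c); entry (j, k)
    is Poisson(r_k).  Columns whose total count is zero are dropped, so the
    number of occupied columns is itself random and grows with n_rows.
    """
    r = rng.gamma(shape=gamma0 / K_trunc, scale=1.0 / c, size=K_trunc)
    counts = rng.poisson(lam=r, size=(n_rows, K_trunc))
    return counts[:, counts.sum(axis=0) > 0]   # keep occupied columns only

M = sample_gamma_poisson_matrix(n_rows=10)
print(M.shape)   # (10, K) with K random
```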
Document Informed Neural Autoregressive Topic Models with Distributional Prior
We address two challenges in topic models: (1) Context information around
words helps in determining their actual meaning, e.g., "networks" used in the
contexts "artificial neural networks" vs. "biological neuron networks".
Generative topic models infer topic-word distributions, taking little or no
context into account. Here, we extend a neural autoregressive topic
model to exploit the full context information around words in a document in a
language modeling fashion. The proposed model is named iDocNADE. (2) Due to
the small number of word occurrences (i.e., lack of context) in short text and
data sparsity in a corpus of few documents, the application of topic models is
challenging on such texts. Therefore, we propose a simple and efficient way of
incorporating external knowledge into neural autoregressive topic models: we
use embeddings as a distributional prior. The proposed variants are named
DocNADEe and iDocNADEe.
We present novel neural autoregressive topic model variants that consistently
outperform state-of-the-art generative topic models in terms of generalization,
interpretability (topic coherence) and applicability (retrieval and
classification) over 7 long-text and 8 short-text datasets from diverse
domains.
Comment: AAAI 2019. arXiv admin note: substantial text overlap with
arXiv:1808.0379
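To make the "embeddings as a distributional prior" idea concrete, here is a minimal sketch of a DocNADEe-style forward pass under assumed details: a unidirectional autoregressive pass (iDocNADE additionally conditions on the words after position i), a sigmoid hidden layer, and an aggregate of the preceding words' embeddings added to the hidden pre-activation with a mixing weight lam. All sizes, the weight lam, and the random matrices standing in for trained parameters and pretrained embeddings are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

V, H = 1000, 50                      # vocabulary and hidden sizes (illustrative)
W = rng.normal(0.0, 0.01, (H, V))    # trainable word-to-hidden weights
U = rng.normal(0.0, 0.01, (V, H))    # hidden-to-output weights
b, c = np.zeros(V), np.zeros(H)      # output and hidden biases
E = rng.normal(0.0, 0.01, (H, V))    # stand-in for pretrained embeddings
lam = 0.5                            # prior mixing weight (hypothetical)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def log_softmax(x):
    x = x - x.max()
    return x - np.log(np.exp(x).sum())

def doc_log_likelihood(doc):
    """Autoregressive log p(doc) = sum_i log p(v_i | v_<i).  The embedding
    aggregate of the preceding words is added to the hidden pre-activation,
    acting as a distributional prior on the document representation."""
    ll = 0.0
    acc_w = np.zeros(H)              # sum of W[:, v_k] for k < i
    acc_e = np.zeros(H)              # sum of E[:, v_k] for k < i
    for v in doc:
        h = sigmoid(c + acc_w + lam * acc_e)
        ll += log_softmax(b + U @ h)[v]
        acc_w += W[:, v]
        acc_e += E[:, v]
    return ll

print(doc_log_likelihood([3, 17, 256, 3]))
```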
Bivariate Beta-LSTM
Long Short-Term Memory (LSTM) captures long-term dependencies through a cell
state maintained by the input and forget gate structures, each of which models
a gate output as a value in [0,1] through a sigmoid function. However, because
the sigmoid function is smooth and unimodal, a sigmoid gate is not flexible
enough to represent multi-modality or skewness. In addition, previous models do
not model the correlation between the gates, which would offer a new way to
impose an inductive bias on the relationship between the previous and current inputs.
This paper proposes a new gate structure based on the bivariate Beta
distribution. The proposed gate structure enables probabilistic modeling of the
gates within the LSTM cell, so that modelers can customize the cell state flow
with priors and distributions. Moreover, we theoretically show that it admits a
higher upper bound on the gradient than the sigmoid function, and we
empirically observe that the bivariate Beta gate structure provides higher
gradient values during training. We demonstrate the effectiveness of the
bivariate Beta gate structure on sentence classification, image
classification, polyphonic
music modeling, and image caption generation.
Comment: AAAI 2020
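As a sketch of how gates could be drawn from a bivariate Beta distribution rather than computed by a sigmoid, the snippet below uses one standard construction of correlated Beta pairs: three reparameterized Gamma draws whose shape parameters are predicted from the input and hidden state, where the shared third draw induces correlation between the input and forget gates. The linear parameterization and the shared-Gamma construction are illustrative assumptions, not necessarily the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
D_in, D_h = 8, 16   # input and hidden sizes (illustrative)

# One linear map predicts the three Gamma shape parameters that define a
# correlated (input, forget) gate pair; softplus keeps the shapes positive.
to_shapes = torch.nn.Linear(D_in + D_h, 3 * D_h)

def beta_gates(x, h):
    a1, a2, a3 = F.softplus(to_shapes(torch.cat([x, h], dim=-1))).chunk(3, dim=-1)
    g1 = torch.distributions.Gamma(a1, torch.ones_like(a1)).rsample()
    g2 = torch.distributions.Gamma(a2, torch.ones_like(a2)).rsample()
    g3 = torch.distributions.Gamma(a3, torch.ones_like(a3)).rsample()
    i_gate = g1 / (g1 + g3)   # marginally Beta(a1, a3)
    f_gate = g2 / (g2 + g3)   # marginally Beta(a2, a3), correlated via g3
    return i_gate, f_gate     # plug into c_t = f * c_{t-1} + i * tanh(...)

x, h = torch.randn(1, D_in), torch.randn(1, D_h)
i_gate, f_gate = beta_gates(x, h)
print(i_gate.min().item() >= 0.0 and f_gate.max().item() <= 1.0)   # True
```

Because the Gamma draws use rsample(), gradients flow through the sampled gates, which is what lets such a cell be trained end to end like an ordinary LSTM.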