Syntactic Topic Models
The syntactic topic model (STM) is a Bayesian nonparametric model of language
that discovers latent distributions of words (topics) that are both
semantically and syntactically coherent. The STM models dependency parsed
corpora where sentences are grouped into documents. It assumes that each word
is drawn from a latent topic chosen by combining document-level features and
the local syntactic context. Each document has a distribution over latent
topics, as in topic models, which provides the semantic consistency. Each
element in the dependency parse tree also has a distribution over the topics of
its children, as in latent-state syntax models, which provides the syntactic
consistency. These distributions are convolved so that the topic of each word
is likely under both its document and syntactic context. We derive a fast
posterior inference algorithm based on variational methods. We report
qualitative and quantitative studies on both synthetic data and hand-parsed
documents. We show that the STM is a more predictive model of language than
current models based only on syntax or only on topics.
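
As an illustration of how a word's topic can be made likely under both its document and its syntactic context, the sketch below combines two distributions over topics by a pointwise product followed by renormalization. This is only one reading of "convolved" in the abstract, not the authors' implementation; the function name `combine_topic_distributions` and the numbers in `theta` and `pi_parent` are hypothetical.

```python
def combine_topic_distributions(doc_topics, parent_transitions):
    """Combine two distributions over the same topics.

    Assumption: the combination is a pointwise product that is then
    renormalized, so a topic only gets high probability when it is
    plausible under BOTH the document and the syntactic parent.
    """
    if len(doc_topics) != len(parent_transitions):
        raise ValueError("both distributions must cover the same topics")
    weights = [d * p for d, p in zip(doc_topics, parent_transitions)]
    total = sum(weights)
    if total == 0:
        raise ValueError("the two distributions share no support")
    return [w / total for w in weights]


# Hypothetical values: a document-level distribution over three topics and
# the distribution over child topics given the parent word's topic.
theta = [0.7, 0.2, 0.1]
pi_parent = [0.1, 0.3, 0.6]

combined = combine_topic_distributions(theta, pi_parent)
print(combined)
```

In this toy case the document strongly favours the first topic while the parent favours the third, and the combined distribution tempers both preferences.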
Exact ICL maximization in a non-stationary temporal extension of the stochastic block model for dynamic networks
The stochastic block model (SBM) is a flexible probabilistic tool that can be
used to model interactions between clusters of nodes in a network. However, it
does not account for interactions of time-varying intensity between clusters.
The extension of the SBM developed in this paper addresses this shortcoming
through a temporal partition: assuming interactions between nodes are recorded
on fixed-length time intervals, the inference procedure associated with the
model we propose allows the nodes of the network and the time intervals to be
clustered simultaneously. The number of clusters of nodes and of time
intervals, as well as the cluster memberships, are obtained by maximizing an exact
integrated complete-data likelihood, relying on a greedy search approach.
Experiments on simulated and real data are carried out in order to assess the
proposed methodology.
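
To make the idea of clustering nodes and time intervals simultaneously more concrete, the sketch below aggregates interaction counts into blocks indexed by (source node cluster, target node cluster, time cluster), given fixed cluster labels. It does not implement the ICL criterion or the greedy search described in the abstract; the data layout `counts[u][i][j]`, the function name `block_counts`, and the toy labels are all assumptions made for illustration.

```python
from collections import defaultdict


def block_counts(counts, node_labels, time_labels):
    """Sum interactions within each (node cluster, node cluster, time cluster).

    counts[u][i][j] is assumed to hold the number of interactions from
    node i to node j recorded during the fixed-length interval u.
    Self-interactions (i == j) are ignored in this sketch.
    """
    totals = defaultdict(int)
    for u, interval in enumerate(counts):
        for i, row in enumerate(interval):
            for j, n in enumerate(row):
                if i != j and n:
                    key = (node_labels[i], node_labels[j], time_labels[u])
                    totals[key] += n
    return dict(totals)


# Hypothetical toy data: 3 nodes observed over 3 time intervals.
counts = [
    [[0, 2, 0], [1, 0, 0], [0, 0, 0]],  # interval 0
    [[0, 3, 1], [0, 0, 0], [0, 0, 0]],  # interval 1
    [[0, 0, 0], [0, 0, 5], [0, 0, 0]],  # interval 2
]
node_labels = [0, 0, 1]  # nodes 0 and 1 share a cluster
time_labels = [0, 0, 1]  # intervals 0 and 1 share a cluster

totals = block_counts(counts, node_labels, time_labels)
print(totals)
```

A search procedure of the kind the abstract mentions would repeatedly change labels and re-evaluate a criterion computed from block totals like these; that step is deliberately left out here.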