Syntactic Topic Models
The syntactic topic model (STM) is a Bayesian nonparametric model of language that discovers latent distributions of words (topics) that are both semantically and syntactically coherent. The STM models dependency-parsed corpora in which sentences are grouped into documents. It assumes that each word is drawn from a latent topic chosen by combining document-level features with the local syntactic context. As in topic models, each document has a distribution over latent topics, which provides semantic consistency. As in latent-state syntax models, each node in the dependency parse tree also has a distribution over the topics of its children, which provides syntactic consistency. These distributions are convolved so that the topic of each word is likely under both its document and its syntactic context. We derive a fast posterior inference algorithm based on variational methods. We report qualitative and quantitative studies on both synthetic data and hand-parsed documents. We show that the STM is a more predictive model of language than current models based only on syntax or only on topics.
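The following is a minimal sketch of how a document's topic distribution and a parent node's distribution over child topics could be combined so that a word's topic is likely under both. It assumes the combination is an elementwise product followed by renormalization. It also uses a fixed number of topics (three), even though the STM itself is nonparametric. The tree, `theta_d`, `pi`, `pi_root`, and the helper names are hypothetical illustrations. Only the generative step is shown, not variational inference.

```python
import random

def combine(theta_d, pi_parent):
    # Elementwise product of the document's topic weights and the parent's
    # distribution over child topics, renormalized to sum to 1.
    prod = [t * p for t, p in zip(theta_d, pi_parent)]
    total = sum(prod)
    return [x / total for x in prod]

def sample_topics(children, root, theta_d, pi, pi_root, rng):
    # Draw a topic for every node of a dependency tree; each parent is
    # assigned before its children, whose weights then depend on it.
    topics = {}
    stack = [(root, pi_root)]
    while stack:
        node, pi_parent = stack.pop()
        weights = combine(theta_d, pi_parent)
        z = rng.choices(range(len(weights)), weights=weights)[0]
        topics[node] = z
        for child in children.get(node, []):
            stack.append((child, pi[z]))
    return topics

# Hypothetical parameters for K = 3 topics.
theta_d = [0.7, 0.2, 0.1]          # this document mostly uses topic 0
pi = [[0.1, 0.1, 0.8],             # children of a topic-0 parent favour topic 2
      [0.4, 0.4, 0.2],
      [0.3, 0.6, 0.1]]
pi_root = [1 / 3, 1 / 3, 1 / 3]    # root treated as having a uniform "parent"

print([round(w, 3) for w in combine(theta_d, pi[0])])  # [0.412, 0.118, 0.471]

children = {"ate": ["dog", "bone"], "dog": ["the"]}
topics = sample_topics(children, "ate", theta_d, pi, pi_root, random.Random(0))
print(topics)
```

In the printed weights, topic 1 gets little mass because neither the document nor the syntactic context favours it. Topics 0 and 2 each keep substantial weight because one of the two sources supports them.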
Identification of probabilistic cellular automata
The identification of probabilistic cellular automata (PCA) is studied using a new two-stage neighborhood detection algorithm. It is shown that a binary probabilistic cellular automaton (BPCA) can be described by an integer-parameterized polynomial corrupted by noise. Searching for the correct neighborhood of a BPCA is then equivalent to selecting, from a large initial term set, the correct terms that constitute the polynomial model of the BPCA. It is proved that the contribution values for the correct terms can be calculated independently of the contribution values for the noise terms. This allows the neighborhood detection technique previously developed for deterministic rules to be applied with a larger cutoff value, which discards the majority of spurious terms and produces an initial presearch for the BPCA neighborhood. A multiobjective genetic algorithm (GA) search with integer constraints is then run to refine the reduced neighborhood and to identify the polynomial rule that is equivalent to the probabilistic rule with the largest probability. A probability table representing the BPCA can then be determined from the identified neighborhood and the deterministic rule. The new algorithm is tested on a large set of one-dimensional (1D), two-dimensional (2D), and three-dimensional (3D) BPCA rules. Simulation results demonstrate the efficiency of the new method.
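Here is a small 1D sketch of the ideas in this abstract: a deterministic rule written as an integer polynomial, a BPCA built by corrupting it with noise, and a probability table estimated from a simulated trajectory. The example rule (elementary rule 90) and the noise model (an independent flip with probability `eps`) are assumptions. It also skips the paper's two-stage detection and GA search and uses the full three-cell neighborhood directly. All function names are hypothetical.

```python
import random
from itertools import product

def rule90_poly(left, centre, right):
    # Integer polynomial for rule 90 (left XOR right). The centre cell has a
    # zero coefficient, so it lies outside the rule's true neighborhood.
    return left + right - 2 * left * right

def step(cells, eps, rng):
    # One synchronous update on a ring: apply the deterministic rule, then
    # flip the result with probability eps (the "noise").
    n = len(cells)
    out = []
    for i in range(n):
        y = rule90_poly(cells[i - 1], cells[i], cells[(i + 1) % n])
        if rng.random() < eps:
            y = 1 - y
        out.append(y)
    return out

def estimate_table(history):
    # Empirical P(next state = 1 | left, centre, right) from a trajectory.
    ones = {c: 0 for c in product((0, 1), repeat=3)}
    total = {c: 0 for c in product((0, 1), repeat=3)}
    for prev, nxt in zip(history, history[1:]):
        n = len(prev)
        for i in range(n):
            c = (prev[i - 1], prev[i], prev[(i + 1) % n])
            total[c] += 1
            ones[c] += nxt[i]
    return {c: ones[c] / total[c] for c in total if total[c]}

rng = random.Random(0)
eps = 0.1
cells = [rng.randint(0, 1) for _ in range(200)]
history = [cells]
for _ in range(100):
    history.append(step(history[-1], eps, rng))

table = estimate_table(history)
# The most probable next state for each pattern recovers the deterministic rule.
recovered = {c: int(p > 0.5) for c, p in table.items()}
for c in sorted(table):
    print(c, round(table[c], 3), recovered[c])
```

Patterns that differ only in the centre cell should end up with nearly equal probabilities. That is the empirical sign that the centre cell is a spurious term for this rule.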
A stochastic algorithm for probabilistic independent component analysis
The decomposition of a sample of images on a relevant subspace is a recurrent problem in many fields, from computer vision to medical image analysis. In this paper, we propose a new learning principle and implementation for the generative decomposition model generally known as noisy ICA (independent component analysis), based on the SAEM algorithm, a versatile stochastic approximation of the standard EM algorithm. We demonstrate the applicability of the method on a wide range of decomposition models and illustrate the developments with experimental results on various data sets.
Comment: Published in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org), at http://dx.doi.org/10.1214/11-AOAS499.
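The sketch below illustrates the three SAEM steps (simulation, stochastic approximation of sufficient statistics, maximization) on a deliberately tiny noisy-ICA-style model with a single source: `x = a*s + sigma*eps`, with `s` uniform on {-1, +1} and isotropic Gaussian noise. The binary source is an assumption, chosen because its posterior can be sampled exactly, so no MCMC step is needed. The model, the step-size schedule, and the function names are hypothetical and are not the paper's implementation.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def saem_binary_source(xs, a0, sigma2_0, n_iter=100, burn_in=20, seed=0):
    # Toy model: x = a * s + sigma * eps, s uniform on {-1, +1}, eps ~ N(0, I).
    rng = random.Random(seed)
    n, d = len(xs), len(xs[0])
    a, sigma2 = list(a0), sigma2_0
    s3 = sum(dot(x, x) for x in xs) / n  # mean ||x||^2, free of the latent s
    s1 = [0.0] * d                       # running approximation of mean s * x
    for k in range(1, n_iter + 1):
        # Simulation: draw each s_i exactly from p(s_i | x_i; a, sigma2).
        new_s1 = [0.0] * d
        for x in xs:
            s = 1.0 if rng.random() < sigmoid(2.0 * dot(a, x) / sigma2) else -1.0
            for j in range(d):
                new_s1[j] += s * x[j] / n
        # Stochastic approximation: full steps during burn-in, then decaying.
        gamma = 1.0 if k <= burn_in else (k - burn_in) ** -0.7
        s1 = [old + gamma * (new - old) for old, new in zip(s1, new_s1)]
        # Maximization: s_i^2 = 1, so a = s1 and sigma^2 = (s3 - ||s1||^2) / d.
        a = list(s1)
        sigma2 = max((s3 - dot(s1, s1)) / d, 1e-8)
    return a, sigma2

data_rng = random.Random(1)
true_a, true_sigma = [2.0, -1.0], 0.5
xs = []
for _ in range(2000):
    s = data_rng.choice([-1.0, 1.0])
    xs.append([ai * s + true_sigma * data_rng.gauss(0.0, 1.0) for ai in true_a])

a_hat, sigma2_hat = saem_binary_source(xs, a0=[1.0, 0.0], sigma2_0=1.0)
print([round(v, 2) for v in a_hat], round(math.sqrt(sigma2_hat), 2))
```

Because `a` and `-a` give the same likelihood, the estimate is identified only up to sign. With several sources and non-binary source distributions, the same three-step loop applies, but the simulation step typically needs an MCMC sampler instead of exact draws.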