
    Syntactic Topic Models

    The syntactic topic model (STM) is a Bayesian nonparametric model of language that discovers latent distributions of words (topics) that are both semantically and syntactically coherent. The STM models dependency-parsed corpora in which sentences are grouped into documents. It assumes that each word is drawn from a latent topic chosen by combining document-level features and the local syntactic context. Each document has a distribution over latent topics, as in topic models, which provides the semantic consistency. Each element in the dependency parse tree also has a distribution over the topics of its children, as in latent-state syntax models, which provides the syntactic consistency. These distributions are convolved so that the topic of each word is likely under both its document and its syntactic context. We derive a fast posterior inference algorithm based on variational methods. We report qualitative and quantitative studies on both synthetic data and hand-parsed documents. We show that the STM is a more predictive model of language than current models based only on syntax or only on topics.
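    As a rough illustration of the combination step described in the abstract, the sketch below (Python, with hypothetical variable names rather than the authors' code) draws a word's topic from the renormalized element-wise product of the document's topic distribution and its syntactic parent's child-topic distribution.

```python
# Minimal sketch of the STM's core generative idea (illustrative only):
# a word's topic is drawn from the renormalized product of the document's
# topic distribution and its syntactic parent's child-topic distribution.
import numpy as np

rng = np.random.default_rng(0)

K, V = 5, 100                                   # number of topics, vocabulary size
topics = rng.dirichlet(np.ones(V), size=K)      # K topic-word distributions
doc_theta = rng.dirichlet(np.ones(K))           # document-level topic weights
parent_pi = rng.dirichlet(np.ones(K), size=K)   # per-topic child-topic weights

def draw_word(parent_topic):
    """Draw a (topic, word) pair for a child of a node with topic `parent_topic`."""
    # Combine document and syntactic context: element-wise product of the two
    # distributions, renormalized so it is a valid distribution over topics.
    combined = doc_theta * parent_pi[parent_topic]
    combined /= combined.sum()
    z = rng.choice(K, p=combined)     # latent topic for this word
    w = rng.choice(V, p=topics[z])    # word drawn from that topic's distribution
    return z, w

root_topic = rng.choice(K, p=doc_theta)
print(draw_word(root_topic))
```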

    Identification of probabilistic cellular automata

    The identification of probabilistic cellular automata (PCA) is studied using a new two-stage neighborhood detection algorithm. It is shown that a binary probabilistic cellular automaton (BPCA) can be described by an integer-parameterized polynomial corrupted by noise. Searching for the correct neighborhood of a BPCA is then equivalent to selecting, from a large initial term set, the correct terms that constitute the polynomial model of the BPCA. It is proved that the contribution values for the correct terms can be calculated independently of the contribution values for the noise terms. This allows the neighborhood detection technique developed for deterministic rules to be applied with a larger cutoff value, discarding the majority of spurious terms and producing an initial presearch for the BPCA neighborhood. A multiobjective genetic algorithm (GA) search with integer constraints is then used to refine the reduced neighborhood and to identify the polynomial rule that is equivalent to the probabilistic rule with the largest probability. A probability table representing the BPCA can then be determined from the identified neighborhood and the deterministic rule. The new algorithm is tested over a large set of one-dimensional (1D), two-dimensional (2D), and three-dimensional (3D) BPCA rules. Simulation results demonstrate the efficiency of the new method.
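    For intuition about the objects being identified, here is a minimal Python sketch of a one-dimensional BPCA driven by a probability table over three-cell neighborhoods; the table values are made up for illustration and are not taken from the paper.

```python
# Minimal sketch of a one-dimensional binary probabilistic cellular automaton
# (illustrative only; the probability table below is invented, not the paper's).
import numpy as np

rng = np.random.default_rng(1)

# Probability that the centre cell becomes 1 for each of the 8 possible
# (left, centre, right) neighbourhood configurations, indexed as a 3-bit number.
prob_table = np.array([0.1, 0.9, 0.2, 0.8, 0.3, 0.7, 0.4, 0.6])

def step(state):
    """Apply one synchronous BPCA update with periodic boundary conditions."""
    left = np.roll(state, 1)
    right = np.roll(state, -1)
    idx = 4 * left + 2 * state + right      # encode each neighbourhood as 0..7
    return (rng.random(state.size) < prob_table[idx]).astype(int)

state = rng.integers(0, 2, size=50)
for _ in range(20):
    state = step(state)
print(state)
```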

    A stochastic algorithm for probabilistic independent component analysis

    The decomposition of a sample of images on a relevant subspace is a recurrent problem in many fields, from computer vision to medical image analysis. In this paper we propose a new learning principle and implementation of the generative decomposition model generally known as noisy ICA (independent component analysis), based on the SAEM algorithm, a versatile stochastic approximation of the standard EM algorithm. We demonstrate the applicability of the method on a large range of decomposition models and illustrate the developments with experimental results on various data sets. Published in the Annals of Applied Statistics (http://www.imstat.org/aoas/), http://dx.doi.org/10.1214/11-AOAS499, by the Institute of Mathematical Statistics (http://www.imstat.org).
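    As a rough sketch of the setting, the code below writes down a noisy-ICA generative model and a generic SAEM-style loop (simulation of the latent sources, stochastic averaging of sufficient statistics, maximisation); the Gaussian source conditional, fixed noise variance, and step-size schedule are simplifications for illustration, not the paper's algorithm.

```python
# Noisy ICA: X = S A^T + noise, with latent sources S and mixing matrix A.
# Generic SAEM-style loop (simplified sketch, not the authors' implementation).
import numpy as np

rng = np.random.default_rng(2)

n, p, q = 200, 10, 3                         # samples, observed dim, sources
A_true = rng.normal(size=(p, q))
S_true = rng.laplace(size=(n, q))            # non-Gaussian sources
X = S_true @ A_true.T + 0.1 * rng.normal(size=(n, p))   # noisy mixtures

A = rng.normal(size=(p, q))                  # current mixing-matrix estimate
sigma2 = 1.0                                 # noise variance held fixed for brevity
s1 = np.zeros((q, q))                        # running statistic approximating E[S^T S]
s2 = np.zeros((q, p))                        # running statistic approximating E[S^T X]

for k in range(100):
    # Simulation step: draw sources from a Gaussian full conditional
    # (a simplification; with Laplace priors one would sample via MCMC).
    prec = A.T @ A / sigma2 + np.eye(q)
    cov = np.linalg.inv(prec)
    mean = X @ A @ cov / sigma2
    S = mean + rng.normal(size=(n, q)) @ np.linalg.cholesky(cov).T

    # Stochastic approximation step: blend new statistics into the running ones.
    gamma = 1.0 / (k + 1)
    s1 = (1 - gamma) * s1 + gamma * (S.T @ S)
    s2 = (1 - gamma) * s2 + gamma * (S.T @ X)

    # Maximisation step: least-squares update of A from the averaged statistics.
    A = np.linalg.solve(s1, s2).T

print(A.shape)
```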