
    Should We Learn Probabilistic Models for Model Checking? A New Approach and An Empirical Study

    Many automated system analysis techniques (e.g., model checking, model-based testing) rely on first obtaining a model of the system under analysis. System modeling is often done manually, which is widely considered a hindrance to adopting model-based system analysis and development techniques. To overcome this problem, researchers have proposed to automatically "learn" models from sample system executions and have shown that the learned models can sometimes be useful. Many questions remain to be answered, however. For instance, how much should we generalize from the observed samples, and how fast does learning converge? Would the analysis result based on the learned model be more accurate than the estimate we could have obtained by sampling many system executions within the same amount of time? In this work, we investigate existing algorithms for learning probabilistic models for model checking, propose an evolution-based approach for better controlling the degree of generalization, and conduct an empirical study to answer these questions. One of our findings is that the effectiveness of learning may sometimes be limited. Comment: 15 pages, plus 2 reference pages, accepted by FASE 2017 in ETAPS
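
    As a rough illustration of what "learning a probabilistic model from sample executions" can mean in the simplest case, the sketch below estimates a discrete-time Markov chain by frequency counting over observed traces. The state names and traces are made up, and the paper's evolution-based control of generalization is not reproduced.

```python
from collections import Counter, defaultdict

def estimate_markov_chain(traces):
    """Estimate transition probabilities of a DTMC from execution traces.

    Each trace is a sequence of observed states. This is plain frequency
    counting (maximum-likelihood estimation); the paper's approach
    additionally controls how much these counts are generalized.
    """
    counts = defaultdict(Counter)
    for trace in traces:
        for src, dst in zip(trace, trace[1:]):
            counts[src][dst] += 1
    model = {}
    for src, dsts in counts.items():
        total = sum(dsts.values())
        model[src] = {dst: n / total for dst, n in dsts.items()}
    return model

# Example: three short (hypothetical) system executions over abstract states.
traces = [["init", "req", "grant", "idle"],
          ["init", "req", "deny", "idle"],
          ["init", "req", "grant", "idle"]]
print(estimate_markov_chain(traces)["req"])  # {'grant': 0.666..., 'deny': 0.333...}
```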

    A general method for the statistical evaluation of typological distributions

    The distribution of linguistic structures in the world is the joint product of universal principles, inheritance from ancestor languages, language contact, social structures, and random fluctuation. This paper proposes a method for evaluating the relative significance of each factor, and in particular of universal principles, via regression modeling: statistical evidence for a universal principle is found if the odds that families have skewed responses (e.g., all or most members have postnominal relative clauses), as opposed to an opposite skewing or no skewing at all, are significantly higher under one condition (e.g., VO order) than under another, independently of other factors.
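
    A minimal sketch of the regression idea, on made-up family-level data and assuming pandas and statsmodels are available; the variable names and coding are illustrative, not the paper's actual dataset or model specification.

```python
import pandas as pd
import statsmodels.formula.api as smf

# One (hypothetical) row per language family: whether its members are skewed
# toward postnominal relative clauses (1) or not (0), and whether the family
# is predominantly VO (1) or OV (0).
data = pd.DataFrame({
    "skew_postnominal": [1, 1, 0, 1, 0, 1, 0, 0, 1, 1],
    "vo_order":         [1, 1, 0, 1, 0, 1, 1, 0, 1, 0],
})

# Logistic regression: do the odds of a postnominal skew differ by VO order?
model = smf.logit("skew_postnominal ~ vo_order", data=data).fit(disp=0)
print(model.params)      # log-odds coefficients
print(model.conf_int())  # confidence intervals
```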

    Correction of Uniformly Noisy Distributions to Improve Probabilistic Grammatical Inference Algorithms

    In this paper, we aim at correcting distributions of noisy samples in order to improve the inference of probabilistic automata. Rather than definitively removing corrupted examples before the learning process, we propose a technique, based on statistical estimates and linear regression, for correcting the probabilistic prefix tree automaton (PPTA). It requires human expertise to correct only a small sample of data, selected in order to estimate the noise level. This statistical information allows us to automatically correct the whole PPTA and then to infer models that generalize better. After a theoretical analysis of the noise impact, we present a large experimental study on several datasets.
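
    To make the PPTA concrete, here is a small illustrative sketch of how a probabilistic prefix tree automaton can be built from a sample of strings by counting node traversals and string endings; the paper's noise correction of these counts is not shown.

```python
from collections import defaultdict

class Node:
    def __init__(self):
        self.children = defaultdict(Node)
        self.pass_count = 0   # strings traversing this node
        self.end_count = 0    # strings ending at this node

def build_ppta(sample):
    """Build a probabilistic prefix tree automaton from a list of strings."""
    root = Node()
    for string in sample:
        node = root
        node.pass_count += 1
        for symbol in string:
            node = node.children[symbol]
            node.pass_count += 1
        node.end_count += 1
    return root

def transition_prob(node, symbol):
    # Relative frequency of emitting `symbol` from `node`; the paper instead
    # corrects these counts for noise before normalizing.
    child = node.children.get(symbol)
    return (child.pass_count / node.pass_count) if child else 0.0

ppta = build_ppta(["ab", "ab", "aab", "b"])
print(transition_prob(ppta, "a"))  # 0.75: three of four strings start with 'a'
```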

    Fragment Grammars: Exploring Computation and Reuse in Language

    Language relies on a division of labor between stored units and structure-building operations that combine the stored units into larger structures. This division of labor leads to a tradeoff: more structure-building means less need to store, while more storage means less need to compute structure. We develop a hierarchical Bayesian model called fragment grammar to explore the optimal balance between structure-building and reuse. The model is developed in the context of stochastic functional programming (SFP), in particular using a probabilistic variant of Lisp known as the Church programming language (Goodman, Mansinghka, Roy, Bonawitz, & Tenenbaum, 2008). We show how to formalize several probabilistic models of language structure using Church, and how fragment grammar generalizes one of them, adaptor grammars (Johnson, Griffiths, & Goldwater, 2007). We conclude with experimental data from adults and preliminary evaluations of the model on natural language corpus data.
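
    The storage-versus-computation tradeoff can be illustrated with a tiny sketch of the reuse mechanism behind adaptor grammars: a Chinese-restaurant-process cache over a base generator, so that previously built structures are increasingly reused rather than recomputed. This is a hypothetical Python analogue, not the Church code from the paper.

```python
import random

def crp_adaptor(base_sample, alpha=1.0):
    """Wrap a base generator with a Chinese-restaurant-process reuse cache."""
    cache = []  # previously generated (stored) structures
    def sample():
        n = len(cache)
        # With probability proportional to n, reuse a stored item; with
        # probability proportional to alpha, build a new one from scratch.
        if n and random.random() < n / (n + alpha):
            item = random.choice(cache)
        else:
            item = base_sample()
        cache.append(item)
        return item
    return sample

# Base generator: compose a "structure" from scratch each time (illustrative).
base = lambda: "".join(random.choice("ab") for _ in range(3))
adapted = crp_adaptor(base, alpha=1.0)
print([adapted() for _ in range(10)])  # reuse becomes more likely over time
```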

    Null Models For Cultural And Social Evolution

    Analogies between biological and cultural evolution date back to Darwin, yet the analogies have remained loose. Neutral evolution, known to be important in biology, has been proposed as a null model for cultural change, but has not been developed into tests for selection on cultural features. Using inference on time series of alternative word forms and grammatical constructions, I demonstrate a cultural analog of natural selection against a background of neutral evolution. Social evolution, on the other hand, implies selection in a social environment and therefore cannot be described with a neutral model. I propose a model of pure frequency-dependent selection as a generic null model for social evolution, and use the model to illustrate diverse effects of social selection. I derive a non-linear form of frequency-dependent selection from a mechanistic model of mate choice and show unintuitive consequences for evolutionary dynamics. I infer complex forms of frequency-dependent selection, including positive and negative frequency dependence at different frequencies, from data on the copying of baby names, the fashions of dog breeds, and the use of rare languages, and discuss the implications for cultural diversity.
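
    A minimal illustrative simulation of the two null models being contrasted, neutral copying versus frequency-dependent selection; the functional form of the bias term is made up for illustration and is not the mate-choice model derived in the thesis.

```python
import random

def step(pop, bias=0.0):
    """One generation of copying in a two-variant population.

    bias = 0 gives neutral drift; bias > 0 favours the majority variant
    (positive frequency dependence), bias < 0 favours the minority.
    """
    freq = sum(pop) / len(pop)              # frequency of variant 1
    p = freq + bias * freq * (1 - freq) * (2 * freq - 1)
    p = min(max(p, 0.0), 1.0)
    return [1 if random.random() < p else 0 for _ in pop]

pop = [1] * 50 + [0] * 50
for _ in range(100):
    pop = step(pop, bias=2.0)               # try bias=0.0 for the neutral model
print(sum(pop) / len(pop))                  # the majority variant tends to fix
```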

    Learning Semantic Correspondences in Technical Documentation

    We consider the problem of translating high-level textual descriptions to formal representations in technical documentation as part of an effort to model the meaning of such documentation. We focus specifically on the problem of learning translational correspondences between text descriptions and grounded representations in the target documentation, such as formal representations of functions or code templates. Our approach exploits the parallel nature of such documentation, or the tight coupling between high-level text and the low-level representations we aim to learn. Data is collected by mining technical documents for such parallel text-representation pairs, which we use to train a simple semantic parsing model. We report new baseline results on sixteen novel datasets, including the standard library documentation for nine popular programming languages across seven natural languages, and a small collection of Unix utility manuals. Comment: accepted to ACL 2017
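
    As a rough sketch of learning lexical correspondences from mined (description, representation) pairs, the snippet below builds a co-occurrence-based translation table; the pairs and signatures are invented, and the paper's actual semantic parsing model is more involved.

```python
from collections import Counter, defaultdict

# Toy mined pairs of (high-level description, formal representation);
# the names and signatures are made up for illustration.
pairs = [
    ("return the absolute value of a number", "abs(x)"),
    ("return the length of a string", "len(s)"),
    ("convert a string to lower case", "lower(s)"),
]

def tokens(rep):
    return rep.replace("(", " ").replace(")", " ").split()

# Co-occurrence counts between description words and representation tokens.
cooc = defaultdict(Counter)
for text, rep in pairs:
    for w in text.split():
        for t in tokens(rep):
            cooc[w][t] += 1

def translation_prob(word, token):
    counts = cooc.get(word, Counter())
    total = sum(counts.values())
    return counts[token] / total if total else 0.0

print(translation_prob("length", "len"))    # 0.5: "length" co-occurs with len, s
print(translation_prob("absolute", "len"))  # 0.0: never seen together
```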

    Scalable Text Mining with Sparse Generative Models

    The information age has brought a deluge of data. Much of this is in text form, insurmountable in scope for humans and incomprehensible in structure for computers. Text mining is an expanding field of research that seeks to utilize the information contained in vast document collections. General data mining methods based on machine learning face challenges with the scale of text data, posing a need for scalable text mining methods. This thesis proposes a solution to scalable text mining: generative models combined with sparse computation. A unifying formalization for generative text models is defined, bringing together research traditions that have used formally equivalent models but ignored parallel developments. This framework allows the use of methods developed for different processing tasks, such as retrieval and classification, yielding effective solutions across different text mining tasks. Sparse computation using inverted indices is proposed for inference on probabilistic models. This reduces the computational complexity of common text mining operations according to sparsity, yielding probabilistic models with the scalability of modern search engines. The proposed combination provides sparse generative models: a solution for text mining that is general, effective, and scalable. Extensive experiments on text classification and ranked retrieval datasets are conducted, showing that the proposed solution matches or outperforms the leading task-specific methods in effectiveness, with an order of magnitude decrease in classification time for Wikipedia article categorization with a million classes. The developed methods were further applied in two 2014 Kaggle data mining prize competitions with over a hundred competing teams, earning first and second places.
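
    A minimal sketch of the central idea, scoring a multinomial Naive Bayes classifier through an inverted index so that only classes with non-zero counts for the document's words are touched; the data and smoothing details are illustrative, and class priors are omitted.

```python
import math
from collections import Counter, defaultdict

# Toy training data: class label -> word counts (illustrative).
class_counts = {
    "sports":  Counter({"match": 4, "goal": 3, "team": 5}),
    "finance": Counter({"market": 5, "stock": 4, "team": 1}),
}

# Inverted index: word -> {class: count}, so scoring skips zero entries.
index = defaultdict(dict)
for label, counts in class_counts.items():
    for word, n in counts.items():
        index[word][label] = n

totals = {label: sum(c.values()) for label, c in class_counts.items()}
vocab = len(index)

def classify(doc_words, alpha=1.0):
    # Sparse scoring: accumulate only (word, class) pairs present in the index;
    # the smoothed zero-count term every class would get is added once per class.
    scores = defaultdict(float)
    for word in doc_words:
        for label, n in index.get(word, {}).items():
            scores[label] += math.log((n + alpha) / alpha)
    base = {label: len(doc_words) * math.log(alpha / (totals[label] + alpha * vocab))
            for label in class_counts}
    return max(class_counts, key=lambda l: scores[l] + base[l])

print(classify(["goal", "team", "match"]))  # sports
```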