740 research outputs found
Should We Learn Probabilistic Models for Model Checking? A New Approach and An Empirical Study
Many automated system analysis techniques (e.g., model checking, model-based
testing) rely on first obtaining a model of the system under analysis. System
modeling is often done manually, which is often considered as a hindrance to
adopt model-based system analysis and development techniques. To overcome this
problem, researchers have proposed to automatically "learn" models based on
sample system executions and shown that the learned models can be useful
sometimes. There are however many questions to be answered. For instance, how
much shall we generalize from the observed samples and how fast would learning
converge? Or, would the analysis result based on the learned model be more
accurate than the estimation we could have obtained by sampling many system
executions within the same amount of time? In this work, we investigate
existing algorithms for learning probabilistic models for model checking,
propose an evolution-based approach for better controlling the degree of
generalization and conduct an empirical study in order to answer the questions.
One of our findings is that the effectiveness of learning may sometimes be
limited.Comment: 15 pages, plus 2 reference pages, accepted by FASE 2017 in ETAP
A general method for the statistical evaluation of typological distributions
The distribution of linguistic structures in the world is the joint product of universal principles, inheritance from ancestor languages, language contact, social structures, and random fluctuation. This paper proposes a method for evaluating the relative significance of each factor — and in particular, of universal principles — via regression modeling: statistical evidence for universal principles is found if the odds for families to have skewed responses (e.g. all or most members have postnominal relative clauses) as opposed to having an opposite response skewing or no skewing at all, is significantly higher for some condition (e.g. VO order) than for another condition, independently of other factors
Correction of Uniformly Noisy Distributions to Improve Probabilistic Grammatical Inference Algorithms
International audienceIn this paper, we aim at correcting distributions of noisy samples in order to improve the inference of probabilistic automata. Rather than definitively removing corrupted examples before the learning process, we propose a technique, based on statisticalestimates and linear regression, for correcting the probabilistic prefix tree automaton (PPTA). It requires a human expertise to correct only a small sample of data, selected in order to estimate the noise level. This statistical information permits us to automatically correct the whole PPTA and then to infer better models from a generalization point of view. After a theoretical analysis of the noise impact, we present a large experimental study on several datasets
Fragment Grammars: Exploring Computation and Reuse in Language
Language relies on a division of labor between stored units and structure building operations which combine the stored units into larger structures. This division of labor leads to a tradeoff: more structure-building means less need to store while more storage means less need to compute structure. We develop a hierarchical Bayesian model called fragment grammar to explore the optimum balance between structure-building and reuse. The model is developed in the context of stochastic functional programming (SFP) and in particular using a probabilistic variant of Lisp known as the Church programming language (Goodman, Mansinghka, Roy, Bonawitz, & Tenenbaum, 2008). We show how to formalize several probabilistic models of language structure using Church, and how fragment grammar generalizes one of them---adaptor grammars (Johnson, Griffiths, & Goldwater, 2007). We conclude with experimental data with adults and preliminary evaluations of the model on natural language corpus data
Null Models For Cultural And Social Evolution
Analogies between biological and cultural evolution date back to Darwin, yet the analogies have remained loose. Neutral evolution, known to be important in biology, has been
proposed as a null model for cultural change, but has not developed into tests for selection on cultural features. Using inference in timeseries of alternative word forms and
grammatical constructions, I demonstrate a cultural analog of natural selection on a background of netural evolution. Social evolution, on the other hand, implies selection in a
social environment and therefore cannot be described with a neutral model. I propose a
model of pure frequency-dependent selection as a generic null model for social evolution,
and use the model to illustrate diverse effects of social selection. I derive a non-linear
form of frequency-dependent selection from a mechanistic model of mate choice and show
unintuitive consequences for evolutionary dynamics. I infer complex forms of frequency
dependent selection—including positive and negative frequency-dependent selection at different frequencies—from data regarding the copying of baby names, the fashions of dog
breeds, and the use of rare languages, and discuss the implications for cultural diversity
Learning Semantic Correspondences in Technical Documentation
We consider the problem of translating high-level textual descriptions to
formal representations in technical documentation as part of an effort to model
the meaning of such documentation. We focus specifically on the problem of
learning translational correspondences between text descriptions and grounded
representations in the target documentation, such as formal representation of
functions or code templates. Our approach exploits the parallel nature of such
documentation, or the tight coupling between high-level text and the low-level
representations we aim to learn. Data is collected by mining technical
documents for such parallel text-representation pairs, which we use to train a
simple semantic parsing model. We report new baseline results on sixteen novel
datasets, including the standard library documentation for nine popular
programming languages across seven natural languages, and a small collection of
Unix utility manuals.Comment: accepted to ACL-201
Scalable Text Mining with Sparse Generative Models
The information age has brought a deluge of data. Much of this is in text form, insurmountable in scope for humans and incomprehensible in structure for computers. Text mining is an expanding field of research that seeks to utilize the information contained in vast document collections. General data mining methods based on machine learning face challenges with the scale of text data, posing a need for scalable text mining methods.
This thesis proposes a solution to scalable text mining: generative models combined with sparse computation. A unifying formalization for generative text models is defined, bringing together research traditions that have used formally equivalent models, but ignored parallel developments. This framework allows the use of methods developed in different processing tasks such as retrieval and classification, yielding effective solutions across different text mining tasks. Sparse computation using inverted indices is proposed for inference on probabilistic models. This reduces the computational complexity of the common text mining operations according to sparsity, yielding probabilistic models with the scalability of modern search engines.
The proposed combination provides sparse generative models: a solution for text mining that is general, effective, and scalable. Extensive experimentation on text classification and ranked retrieval datasets are conducted, showing that the proposed solution matches or outperforms the leading task-specific methods in effectiveness, with a order of magnitude decrease in classification times for Wikipedia article categorization with a million classes. The developed methods were further applied in two 2014 Kaggle data mining prize competitions with over a hundred competing teams, earning first and second places
- …