
    Phylogenetic information complexity: Is testing a tree easier than finding it?

    Phylogenetic trees describe the evolutionary history of a group of present-day species from a common ancestor. These trees are typically reconstructed from aligned DNA sequence data. In this paper we analytically address the following question: is the amount of sequence data required to accurately reconstruct a tree significantly more than the amount required to test whether or not a candidate tree was the 'true' tree? By 'significantly', we mean that the two quantities behave the same way as a function of the number of species being considered. We prove that, for one type of model, the amount of information required is not significantly different, while for another type of model, the information required to test a tree is independent of the number of leaves, whereas the information required to reconstruct it grows with this number. Our results combine probabilistic and combinatorial arguments.
    Comment: 15 pages, 3 figures
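
    As a toy illustration of the difference between testing a candidate tree and finding one (this is not the paper's construction; all names are mine, and the model is the symmetric two-state CFN model on four leaves), the sketch below simulates binary characters on the quartet tree AB|CD and scores the three possible topologies by summed within-pair Hamming distance. Testing inspects only the candidate's score; finding compares all three.

```python
import random

def simulate_quartet(n_sites, p, rng):
    """Simulate binary characters on the quartet tree AB|CD under the
    symmetric two-state (CFN) model; every edge flips with probability p."""
    seqs = {t: [] for t in "ABCD"}
    for _ in range(n_sites):
        u = rng.randrange(2)                      # state at internal node u
        v = u ^ (rng.random() < p)                # internal edge u -> v
        seqs["A"].append(u ^ (rng.random() < p))  # pendant edges off u
        seqs["B"].append(u ^ (rng.random() < p))
        seqs["C"].append(v ^ (rng.random() < p))  # pendant edges off v
        seqs["D"].append(v ^ (rng.random() < p))
    return seqs

def hamming(x, y):
    return sum(a != b for a, b in zip(x, y))

def quartet_scores(seqs):
    """Four-point-style score for each quartet topology: the sum of the
    two within-pair Hamming distances (lower is better)."""
    d = lambda s, t: hamming(seqs[s], seqs[t])
    return {"AB|CD": d("A", "B") + d("C", "D"),
            "AC|BD": d("A", "C") + d("B", "D"),
            "AD|BC": d("A", "D") + d("B", "C")}

rng = random.Random(0)
scores = quartet_scores(simulate_quartet(n_sites=500, p=0.1, rng=rng))
print(scores["AB|CD"])               # testing: look at the candidate only
print(min(scores, key=scores.get))   # finding: search over all topologies
```

    Rerunning with smaller n_sites shows both tasks degrading as the data shrinks; the paper's question is how the required amount of data scales with the number of species for each task.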

    Monte Carlo Approaches to Parameterized Poker Squares

    This paper summarizes the variety of Monte Carlo approaches employed in the top three performing entries to the Parameterized Poker Squares NSG Challenge competition. In all cases, the AI players benefited from real-time machine learning and various Monte Carlo game-tree search techniques.
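
    The entries combined several distinct techniques; as a minimal common denominator, here is a sketch of flat Monte Carlo move selection against a hypothetical game interface (legal_moves, apply_move, and random_playout are assumptions, not the competition API). In Poker Squares, a move would place the next dealt card on the 5x5 grid and a playout's score would come from the parameterized point system.

```python
import random

def flat_monte_carlo(state, legal_moves, apply_move, random_playout, n_playouts=100):
    """Flat Monte Carlo move selection: estimate each legal move's value by
    the mean score of random playouts from the resulting state, then pick
    the move with the highest estimate."""
    def value(move):
        child = apply_move(state, move)
        return sum(random_playout(child) for _ in range(n_playouts)) / n_playouts
    return max(legal_moves(state), key=value)

# Toy demo of the interface: choose 3 numbers, final score = their sum.
rng = random.Random(0)
legal_moves = lambda s: [1, 2, 3]
apply_move = lambda s, m: s + [m]

def random_playout(s):
    while len(s) < 3:                 # finish the game with random moves
        s = apply_move(s, rng.choice(legal_moves(s)))
    return sum(s)

print(flat_monte_carlo([], legal_moves, apply_move, random_playout))  # -> 3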

    Parameterizing Random Test Data According to Equivalence Classes

    We are concerned with the problem of detecting bugs in machine learning applications. In the absence of sufficient real-world data, creating suitably large data sets for testing can be a difficult task. Random testing is one solution, but it may have limited effectiveness in cases in which a reliable test oracle does not exist, as is the case for the machine learning applications of interest. To address this problem, we have developed an approach to creating data sets called "parameterized random data generation". Our data generation framework allows us to isolate or combine different equivalence classes as desired, and then randomly generate large data sets using the properties of those equivalence classes as parameters. This allows us to take advantage of randomness while still retaining control over test case selection at the system testing level. We present our findings from using the approach to test two different machine learning ranking applications.
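
    A minimal sketch of the idea under assumed, illustrative equivalence classes (the classes and parameters below are mine, not the paper's): each class is a parameterized sampler, and a data set is generated by drawing values only from the selected classes, so classes can be isolated or combined at will while generation stays random.

```python
import random

# Hypothetical equivalence classes for numeric test data; each class is a
# parameterized sampler, so arbitrarily large data sets can be generated.
EQUIVALENCE_CLASSES = {
    "negative":   lambda rng: rng.uniform(-1e6, 0),
    "zero":       lambda rng: 0.0,
    "positive":   lambda rng: rng.uniform(0, 1e6),
    "duplicates": lambda rng: float(rng.randrange(10)),  # many repeated values
}

def generate_dataset(classes, n_rows, n_cols, seed=None):
    """Generate a random data set whose values are drawn only from the
    selected equivalence classes."""
    rng = random.Random(seed)
    samplers = [EQUIVALENCE_CLASSES[c] for c in classes]
    return [[rng.choice(samplers)(rng) for _ in range(n_cols)]
            for _ in range(n_rows)]

# Isolate one class, or combine several, for system-level test inputs:
only_dupes = generate_dataset(["duplicates"], n_rows=1000, n_cols=5, seed=42)
mixed      = generate_dataset(["negative", "zero", "positive"], 1000, 5, seed=42)
```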

    Approximating Likelihood Ratios with Calibrated Discriminative Classifiers

    In many fields of science, generalized likelihood ratio tests are established tools for statistical inference. At the same time, it has become increasingly common that a simulator (or generative model) is used to describe complex processes that tie parameters $\theta$ of an underlying theory and measurement apparatus to high-dimensional observations $\mathbf{x} \in \mathbb{R}^p$. However, simulators often do not provide a way to evaluate the likelihood function for a given observation $\mathbf{x}$, which motivates a new class of likelihood-free inference algorithms. In this paper, we show that likelihood ratios are invariant under a specific class of dimensionality reduction maps $\mathbb{R}^p \mapsto \mathbb{R}$. As a direct consequence, we show that discriminative classifiers can be used to approximate the generalized likelihood ratio statistic when only a generative model for the data is available. This leads to a new machine learning-based approach to likelihood-free inference that is complementary to Approximate Bayesian Computation and does not require a prior on the model parameters. Experimental results on artificial problems with known exact likelihoods illustrate the potential of the proposed method.
    Comment: 35 pages, 5 figures
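
    A minimal sketch of the classifier-based likelihood-ratio approximation on a one-dimensional toy problem with a known exact answer (the Gaussian setup and scikit-learn usage are mine, not the paper's code): with balanced classes, a calibrated classifier output s(x) ~ P(y=1 | x) gives p(x | theta_1) / p(x | theta_0) ~ s(x) / (1 - s(x)).

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from scipy.stats import norm

rng = np.random.default_rng(0)
n = 50_000
x0 = rng.normal(0.0, 1.0, n)   # samples from p(x | theta_0) = N(0, 1)
x1 = rng.normal(1.0, 1.0, n)   # samples from p(x | theta_1) = N(1, 1)

X = np.concatenate([x0, x1]).reshape(-1, 1)
y = np.concatenate([np.zeros(n), np.ones(n)])

clf = LogisticRegression().fit(X, y)     # s(x) ~ P(y=1 | x)

x = np.linspace(-3, 4, 8).reshape(-1, 1)
s = clf.predict_proba(x)[:, 1]
approx_ratio = s / (1 - s)               # classifier-based estimate
exact_ratio = norm.pdf(x[:, 0], 1, 1) / norm.pdf(x[:, 0], 0, 1)
print(np.round(approx_ratio, 2))
print(np.round(exact_ratio, 2))
```

    For two unit-variance Gaussians the exact log-ratio is linear in x, so logistic regression is well specified here and the two printed rows should agree closely; on harder problems the paper's calibration step matters.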