    Balanced binary trees in the Tamari lattice

    We show that the set of balanced binary trees is closed by interval in the Tamari lattice. We establish that the intervals [T0, T1], where T0 and T1 are balanced trees, are isomorphic as posets to a hypercube. We introduce tree patterns and synchronous grammars to obtain a functional equation for the generating series enumerating balanced tree intervals.

    The mean, variance and limiting distribution of two statistics sensitive to phylogenetic tree balance

    For two decades, the Colless index has been the most frequently used statistic for assessing the balance of phylogenetic trees. In this article, this statistic is studied under the Yule and uniform models of phylogenetic trees. The main tool of analysis is a coupling argument with another well-known index called the Sackin statistic. Asymptotics for the mean, variance and covariance of these two statistics are obtained, as well as their limiting joint distribution for large phylogenies. Under the Yule model, the limiting distribution arises as a solution of a functional fixed point equation. Under the uniform model, the limiting distribution is the Airy distribution. The cornerstone of this study is the fact that the probabilistic models for phylogenetic trees are strongly related to the random permutation and the Catalan models for binary search trees. Comment: Published at http://dx.doi.org/10.1214/105051606000000547 in the Annals of Applied Probability (http://www.imstat.org/aap/) by the Institute of Mathematical Statistics (http://www.imstat.org
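
The two statistics named in this abstract can be illustrated on a small example. The sketch below assumes the standard definitions (Colless: the sum over internal nodes of the absolute difference between the leaf counts of the two subtrees; Sackin: the sum of the depths of all leaves) and a hypothetical representation in which a tree is a nested 2-tuple and anything else is a leaf. The name `tree_balance` is illustrative and does not come from the paper.

```python
def tree_balance(tree):
    """Return (n_leaves, colless, sackin) for a nested-tuple binary tree.

    Assumed representation: an internal node is a 2-tuple (left, right);
    any non-tuple value is a leaf.
    """
    if not isinstance(tree, tuple):
        return 1, 0, 0  # a single leaf has no imbalance and depth 0
    left, right = tree
    n_left, colless_left, sackin_left = tree_balance(left)
    n_right, colless_right, sackin_right = tree_balance(right)
    n = n_left + n_right
    # Colless: add this node's imbalance |n_left - n_right|.
    colless = colless_left + colless_right + abs(n_left - n_right)
    # Sackin: every leaf below this node gets one level deeper, so add n.
    sackin = sackin_left + sackin_right + n
    return n, colless, sackin


caterpillar = ((("a", "b"), "c"), "d")   # maximally unbalanced, 4 leaves
balanced = (("a", "b"), ("c", "d"))      # perfectly balanced, 4 leaves

print(tree_balance(caterpillar))
print(tree_balance(balanced))
```

Both indices are computed in a single post-order pass, which makes their close relationship visible: each adds a per-node term that depends only on the leaf counts of the two subtrees.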

    Yule-generated trees constrained by node imbalance

    The Yule process generates a class of binary trees which is fundamental to population genetic models and other applications in evolutionary biology. In this paper, we introduce a family of sub-classes of ranked trees, called Omega-trees, which are characterized by imbalance of internal nodes. The degree of imbalance is defined by an integer 0 <= w. For caterpillars, the extreme case of unbalanced trees, w = 0. Under models of neutral evolution, for instance the Yule model, trees with small w are unlikely to occur by chance. Indeed, imbalance can be a signature of permanent selection pressure, as can be observed in the genealogies of certain pathogens. From a mathematical point of view, it is interesting to observe that the space of Omega-trees maintains several statistical invariants although it is drastically reduced in size compared to the space of unconstrained Yule trees. Using generating functions, we study here some basic combinatorial properties of Omega-trees. We focus on the distribution of the number of subtrees with two leaves. We show that the expectation and variance of this distribution match those for unconstrained trees already for very small values of w.
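
As a rough illustration of the unconstrained baseline mentioned in this abstract, the sketch below grows a tree by repeatedly splitting a uniformly chosen leaf (one common description of the Yule model for tree shapes) and counts the subtrees with two leaves. It does not implement the Omega-tree constraint, whose precise definition is not given in the abstract. The names `yule_tree` and `count_cherries` and the dictionary representation are assumptions made for this example.

```python
import random


def yule_tree(n_leaves, rng):
    """Grow a binary tree by repeatedly splitting a uniformly chosen leaf.

    Returns a dict mapping node id -> (left_id, right_id) for internal
    nodes and node id -> None for leaves (an assumed representation).
    """
    children = {0: None}   # start from a single leaf with id 0
    leaves = [0]
    next_id = 1
    while len(leaves) < n_leaves:
        i = rng.randrange(len(leaves))      # pick a leaf uniformly at random
        node = leaves[i]
        left, right = next_id, next_id + 1
        next_id += 2
        children[node] = (left, right)      # the chosen leaf becomes internal
        children[left] = None
        children[right] = None
        leaves[i] = left                    # replace it by its two new leaves
        leaves.append(right)
    return children


def count_cherries(children):
    """Count internal nodes whose two children are both leaves."""
    return sum(
        1
        for kids in children.values()
        if kids is not None
        and children[kids[0]] is None
        and children[kids[1]] is None
    )


rng = random.Random(0)  # fixed seed so that runs are repeatable
samples = [count_cherries(yule_tree(30, rng)) for _ in range(1000)]
print(sum(samples) / len(samples))  # empirical mean number of two-leaf subtrees
```

Averaging over many simulated trees gives an empirical estimate of the mean that a constrained family could be compared against.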

    Optimizing a Certified Proof Checker for a Large-Scale Computer-Generated Proof

    In recent work, we formalized the theory of optimal-size sorting networks with the goal of extracting a verified checker for the large-scale computer-generated proof that 25 comparisons are optimal when sorting 9 inputs, which required more than a decade of CPU time and produced 27 GB of proof witnesses. The checker uses an untrusted oracle based on these witnesses and is able to verify the smaller case of 8 inputs within a couple of days, but it did not scale to the full proof for 9 inputs. In this paper, we describe several non-trivial optimizations of the algorithm in the checker, obtained by appropriately changing the formalization and capitalizing on the symbiosis with an adequate implementation of the oracle. We provide experimental evidence of orders of magnitude improvements to both runtime and memory footprint for 8 inputs, and actually manage to check the full proof for 9 inputs. Comment: IMADA-preprint-c
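
For readers unfamiliar with sorting networks, the sketch below shows what it means for a fixed sequence of comparisons to sort every input. It uses the zero-one principle (a comparator network sorts all inputs if and only if it sorts all 0/1 inputs) to test a small network by brute force. This is not the certified checker described in the abstract, and it does not address optimality. The function names and the 4-input example network are assumptions chosen for illustration.

```python
from itertools import product


def apply_network(network, values):
    """Apply comparators (i, j) with i < j in order; each puts the smaller value at i."""
    v = list(values)
    for i, j in network:
        if v[i] > v[j]:
            v[i], v[j] = v[j], v[i]
    return v


def is_sorting_network(n, network):
    """Check all 2**n binary inputs, which suffices by the zero-one principle."""
    return all(
        apply_network(network, bits) == sorted(bits)
        for bits in product((0, 1), repeat=n)
    )


# A well-known 5-comparator network on 4 inputs (0-based channel indices).
network4 = [(0, 1), (2, 3), (0, 2), (1, 3), (1, 2)]

print(is_sorting_network(4, network4))       # the full network
print(is_sorting_network(4, network4[:-1]))  # the same network without its last comparator
```

The brute-force check costs 2^n network evaluations, so it is only practical for small n.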

    PMLB: A Large Benchmark Suite for Machine Learning Evaluation and Comparison

    The selection, development, or comparison of machine learning methods in data mining can be a difficult task based on the target problem and goals of a particular study. Numerous publicly available real-world and simulated benchmark datasets have emerged from different sources, but their organization and adoption as standards have been inconsistent. As such, selecting and curating specific benchmarks remains an unnecessary burden on machine learning practitioners and data scientists. The present study introduces an accessible, curated, and developing public benchmark resource to facilitate identification of the strengths and weaknesses of different machine learning methodologies. We compare meta-features among the current set of benchmark datasets in this resource to characterize the diversity of available data. Finally, we apply a number of established machine learning methods to the entire benchmark suite and analyze how datasets and algorithms cluster in terms of performance. This work is an important first step towards understanding the limitations of popular benchmarking suites and developing a resource that connects existing benchmarking standards to more diverse and efficient standards in the future. Comment: 14 pages, 5 figures, submitted for review to JML