Balanced binary trees in the Tamari lattice

Author: Giraudo Samuele
Publication date: 01/01/2010

We show that the set of balanced binary trees is closed by interval in the Tamari lattice. We establish that the intervals [T0, T1], where T0 and T1 are balanced trees, are isomorphic as posets to a hypercube. We introduce tree patterns and synchronous grammars to obtain a functional equation for the generating series enumerating balanced tree intervals.

arXiv.org e-Print Archive

HAL-Ecole des Ponts ParisTech

HAL - UPEC / UPEM
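
To make the notion of a balanced binary tree in the abstract above concrete, here is a minimal sketch. It assumes the usual height-balance condition (at every node, the heights of the left and right subtrees differ by at most 1); the abstract does not spell out the definition, and the tuple representation and function names are illustrative only. It enumerates all binary trees with n internal nodes by brute force and counts the balanced ones; it does not construct the Tamari order or its intervals.

```python
from functools import lru_cache

# Assumed representation: a binary tree is None (empty) or a pair (left, right).

@lru_cache(maxsize=None)
def all_trees(n):
    """Return a tuple of all binary trees with n internal nodes."""
    if n == 0:
        return (None,)
    trees = []
    for k in range(n):
        for left in all_trees(k):
            for right in all_trees(n - 1 - k):
                trees.append((left, right))
    return tuple(trees)

def height(tree):
    """Height of a tree; the empty tree has height 0."""
    if tree is None:
        return 0
    return 1 + max(height(tree[0]), height(tree[1]))

def is_balanced(tree):
    """Assumed definition: subtree heights differ by at most 1 at every node."""
    if tree is None:
        return True
    left, right = tree
    return (abs(height(left) - height(right)) <= 1
            and is_balanced(left)
            and is_balanced(right))

def count_balanced(n):
    """Number of balanced trees among all binary trees with n internal nodes."""
    return sum(1 for t in all_trees(n) if is_balanced(t))

counts = [count_balanced(n) for n in range(8)]
print(counts)
```

The enumeration grows with the Catalan numbers, so this is only practical for small n.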

The mean, variance and limiting distribution of two statistics sensitive to phylogenetic tree balance

Authors: Blum Michael G. B.; François Olivier; Janson Svante
Publication venue: 'Institute of Mathematical Statistics'
Publication date: 14/02/2007

For two decades, the Colless index has been the most frequently used statistic for assessing the balance of phylogenetic trees. In this article, this statistic is studied under the Yule and uniform models of phylogenetic trees. The main tool of analysis is a coupling argument with another well-known index called the Sackin statistic. Asymptotics for the mean, variance and covariance of these two statistics are obtained, as well as their limiting joint distribution for large phylogenies. Under the Yule model, the limiting distribution arises as a solution of a functional fixed point equation. Under the uniform model, the limiting distribution is the Airy distribution. The cornerstone of this study is the fact that the probabilistic models for phylogenetic trees are strongly related to the random permutation and the Catalan models for binary search trees. Comment: Published at http://dx.doi.org/10.1214/105051606000000547 in the Annals of Applied Probability (http://www.imstat.org/aap/) by the Institute of Mathematical Statistics (http://www.imstat.org

arXiv.org e-Print Archive

Crossref
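
As an illustration of the two statistics named in the abstract above, the following sketch computes them for small rooted binary trees. It assumes the standard definitions: the Colless index sums, over internal nodes, the absolute difference between the leaf counts of the two subtrees, and the Sackin statistic sums the depths of all leaves. The nested-tuple representation and function names are hypothetical and not taken from the paper.

```python
# Assumed representation: a rooted binary tree is either a leaf label
# (a string) or a pair (left, right) of subtrees.

def leaf_count(tree):
    """Number of leaves below (and including) this node."""
    if isinstance(tree, str):
        return 1
    return leaf_count(tree[0]) + leaf_count(tree[1])

def colless(tree):
    """Sum over internal nodes of |leaves(left) - leaves(right)|."""
    if isinstance(tree, str):
        return 0
    left, right = tree
    imbalance = abs(leaf_count(left) - leaf_count(right))
    return imbalance + colless(left) + colless(right)

def sackin(tree, depth=0):
    """Sum over leaves of their depth (number of edges from the root)."""
    if isinstance(tree, str):
        return depth
    return sackin(tree[0], depth + 1) + sackin(tree[1], depth + 1)

# Two four-leaf examples: a caterpillar and a fully balanced tree.
caterpillar = ((("a", "b"), "c"), "d")
balanced = (("a", "b"), ("c", "d"))

print(colless(caterpillar), sackin(caterpillar))
print(colless(balanced), sackin(balanced))
```

On these two examples the caterpillar scores higher than the balanced tree on both statistics, which is the sense in which they measure imbalance.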

Dynamic load balancing in parallel KD-tree k-means

Authors: Di Fatta Giuseppe; Pettinger David
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 30/06/2010

One of the most influential and popular data mining methods is the k-Means algorithm for cluster analysis. Techniques for improving the efficiency of k-Means have been explored in two main directions. First, the amount of computation can be significantly reduced by adopting geometrical constraints and an efficient data structure, notably a multidimensional binary search tree (KD-Tree). These techniques reduce the number of distance computations the algorithm performs at each iteration. A second direction is parallel processing, where data and computation loads are distributed over many processing nodes. However, little work has been done to provide a parallel formulation of the efficient sequential techniques based on KD-Trees. Such approaches are expected to have an irregular distribution of computation load and can suffer from load imbalance. This issue has so far limited the adoption of these efficient k-Means variants in parallel computing environments. In this work, we provide a parallel formulation of the KD-Tree based k-Means algorithm for distributed memory systems and address its load balancing issue. Three solutions have been developed and tested. Two approaches are based on a static partitioning of the data set, and a third solution incorporates a dynamic load balancing policy.

Central Archive at the University of Reading

Crossref

Yule-generated trees constrained by node imbalance

Authors: Disanto Filippo; Schlizio Anna; Wiehe Thomas
Publication date: 01/01/2013

The Yule process generates a class of binary trees which is fundamental to population genetic models and other applications in evolutionary biology. In this paper, we introduce a family of sub-classes of ranked trees, called Omega-trees, which are characterized by imbalance of internal nodes. The degree of imbalance is defined by an integer w >= 0. For caterpillars, the extreme case of unbalanced trees, w = 0. Under models of neutral evolution, for instance the Yule model, trees with small w are unlikely to occur by chance. Indeed, imbalance can be a signature of permanent selection pressure, as can be observed in the genealogies of certain pathogens. From a mathematical point of view, it is interesting to observe that the space of Omega-trees maintains several statistical invariants although it is drastically reduced in size compared to the space of unconstrained Yule trees. Using generating functions, we study here some basic combinatorial properties of Omega-trees. We focus on the distribution of the number of subtrees with two leaves. We show that the expectation and variance of this distribution match those for unconstrained trees already for very small values of w.

arXiv.org e-Print Archive

Kölner UniversitätsPublikationsServer

Optimizing a Certified Proof Checker for a Large-Scale Computer-Generated Proof

Authors: A Fouilhe; C Sternagel; DC Voorhis van; DE Knuth; E Contejean; L Cruz-Filipe; N Oury; P Letouzey; R O’Connor; R Thiemann; RW Floyd; X Leroy; Y Bertot
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2015

In recent work, we formalized the theory of optimal-size sorting networks with the goal of extracting a verified checker for the large-scale computer-generated proof that 25 comparisons are optimal when sorting 9 inputs, which required more than a decade of CPU time and produced 27 GB of proof witnesses. The checker uses an untrusted oracle based on these witnesses and is able to verify the smaller case of 8 inputs within a couple of days, but it did not scale to the full proof for 9 inputs. In this paper, we describe several non-trivial optimizations of the algorithm in the checker, obtained by appropriately changing the formalization and capitalizing on the symbiosis with an adequate implementation of the oracle. We provide experimental evidence of orders of magnitude improvements to both runtime and memory footprint for 8 inputs, and actually manage to check the full proof for 9 inputs. Comment: IMADA-preprint-c

arXiv.org e-Print Archive

Crossref

University of Southern Denmark Research Output
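
For readers unfamiliar with sorting networks, the sketch below shows what it means for a fixed sequence of comparators to sort its inputs. It relies on the zero-one principle (a comparator network sorts every input if and only if it sorts every input of 0s and 1s). This is not the verified checker described in the abstract: it only tests one given network and says nothing about optimality, which requires ruling out all smaller networks. The function names and the 4-input example network are illustrative assumptions.

```python
from itertools import product

def apply_network(network, values):
    """Run a comparator network on a sequence and return the resulting list.

    Each comparator (i, j) with i < j places the smaller value on wire i
    and the larger value on wire j.
    """
    values = list(values)
    for i, j in network:
        if values[i] > values[j]:
            values[i], values[j] = values[j], values[i]
    return values

def is_sorting_network(n, network):
    """Check all 2**n binary inputs (sufficient by the zero-one principle)."""
    return all(
        apply_network(network, bits) == sorted(bits)
        for bits in product((0, 1), repeat=n)
    )

# A well-known 5-comparator network on 4 wires, used here as an example.
NETWORK_4 = [(0, 1), (2, 3), (0, 2), (1, 3), (1, 2)]

print(is_sorting_network(4, NETWORK_4))
print(is_sorting_network(4, NETWORK_4[:-1]))
```

The second call drops the final comparator, which leaves inputs such as (0, 1, 0, 1) unsorted.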

PMLB: A Large Benchmark Suite for Machine Learning Evaluation and Comparison

Authors: La Cava William; Moore Jason H.; Olson Randal S.; Orzechowski Patryk; Urbanowicz Ryan J.
Publication date: 01/03/2017

The selection, development, or comparison of machine learning methods in data mining can be a difficult task based on the target problem and goals of a particular study. Numerous publicly available real-world and simulated benchmark datasets have emerged from different sources, but their organization and adoption as standards have been inconsistent. As such, selecting and curating specific benchmarks remains an unnecessary burden on machine learning practitioners and data scientists. The present study introduces an accessible, curated, and developing public benchmark resource to facilitate identification of the strengths and weaknesses of different machine learning methodologies. We compare meta-features among the current set of benchmark datasets in this resource to characterize the diversity of available data. Finally, we apply a number of established machine learning methods to the entire benchmark suite and analyze how datasets and algorithms cluster in terms of performance. This work is an important first step towards understanding the limitations of popular benchmarking suites and developing a resource that connects existing benchmarking standards to more diverse and efficient standards in the future. Comment: 14 pages, 5 figures, submitted for review to JML

arXiv.org e-Print Archive

Directory of Open Access Journals