Average Case Analysis of Leaf-Centric Binary Tree Sources
We study the average size of the minimal directed acyclic graph (DAG) with
respect to so-called leaf-centric binary tree sources as studied by Zhang,
Yang, and Kieffer. A leaf-centric binary tree source induces for every $n$ a probability distribution on all binary trees with $n$ leaves. We
generalize a result shown by Flajolet, Gourdon, Martínez and Devroye according
to which the average size of the minimal DAG of a binary tree that is produced
by the binary search tree model is
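The two objects in this setting, a tree drawn from the binary search tree model and its minimal DAG, can be sketched in Python. The sketch assumes trees are nested pairs with `None` for a leaf. It draws a tree from the binary search tree model, in which the left subtree of an n-leaf tree receives k leaves with k uniform on 1..n-1. It then builds the minimal DAG by hash-consing: every distinct subtree is stored once, so the DAG's node count equals the number of distinct subtrees. The names `bst_tree` and `dag_size` are hypothetical. This illustrates the definitions only and is not the authors' analysis.

```python
import random
from typing import Optional, Tuple

# Assumed representation: a leaf is None, an inner node is (left, right).
Tree = Optional[Tuple["Tree", "Tree"]]


def bst_tree(n_leaves: int, rng: random.Random) -> Tree:
    """Sample a binary tree with n_leaves leaves from the BST model:
    the left subtree gets k leaves, with k uniform on 1..n_leaves-1."""
    if n_leaves == 1:
        return None
    k = rng.randint(1, n_leaves - 1)  # randint is inclusive on both ends
    return (bst_tree(k, rng), bst_tree(n_leaves - k, rng))


def dag_size(t: Tree) -> int:
    """Number of nodes of the minimal DAG, i.e. of distinct subtrees,
    computed by hash-consing each subtree to an integer id."""
    ids: dict = {}

    def intern(s: Tree) -> int:
        # A subtree is identified by the ids of its two children.
        key = ("leaf",) if s is None else (intern(s[0]), intern(s[1]))
        if key not in ids:
            ids[key] = len(ids)
        return ids[key]

    intern(t)
    return len(ids)


if __name__ == "__main__":
    rng = random.Random(1)
    n, samples = 200, 100
    avg = sum(dag_size(bst_tree(n, rng)) for _ in range(samples)) / samples
    print(f"estimated average DAG size for n={n}: {avg:.1f}")
```

Averaging over samples gives only a Monte Carlo estimate for one n. It does not by itself establish the asymptotic order discussed in the abstract.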
The mean and variance of phylogenetic diversity under rarefaction
Phylogenetic diversity (PD) depends on sampling intensity, which complicates
the comparison of PD between samples of different depth. One approach to
dealing with differing sample depth for a given diversity statistic is to
rarefy, which means to take a random subset of a given size of the original
sample. Exact analytical formulae for the mean and variance of species richness
under rarefaction have existed for some time, but no such solution has existed
for PD. We have derived exact formulae for the mean and variance of PD under
rarefaction. We verify these formulae by comparing the exact mean and variance
with those calculated by repeated random (Monte Carlo) subsampling of a dataset
of stem counts of woody shrubs of Toohey Forest,
Queensland, Australia. We also demonstrate the application of the method using
two examples: identifying hotspots of mammalian diversity in Australasian
ecoregions, and characterising the human vaginal microbiome. There is a very
high degree of correspondence between the analytical and random subsampling
methods for calculating mean and variance of PD under rarefaction, although the
Monte Carlo method requires a large number of random draws to converge on the
exact solution for the variance. Rarefaction of mammalian PD of ecoregions in
Australasia to a common standard of 25 species reveals very different rank
orderings of ecoregions, indicating quite different hotspots of diversity than
those obtained for unrarefied PD. The application of these methods to the
vaginal microbiome shows that a classical score used to quantify bacterial
vaginosis is correlated with the shape of the rarefaction curve. The analytical
formulae for the mean and variance of PD under rarefaction are both exact and
more efficient than repeated subsampling. Rarefaction of PD allows for many
applications where comparisons of samples of different depth are required.
Comment: Final version to be published in Methods in Ecology and Evolution
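The mean can be illustrated with a short sketch. It assumes a tree given as a list of branches, each with a length and the set of species below it, and a subsample of m individuals drawn without replacement from N. By linearity of expectation, a branch b with N_b individuals below it contributes its length unless none of those individuals is drawn, which happens with probability C(N - N_b, m) / C(N, m). This is Hurlbert's classical rarefaction formula applied per branch. With one unit-length branch per species it reduces to rarefied species richness. The sketch covers the mean only, since the variance needs pairwise branch terms, and it is not necessarily the authors' formulation. An exhaustive enumeration of subsamples stands in for the Monte Carlo check. `expected_pd` and `brute_force_pd` are hypothetical names.

```python
from itertools import combinations
from math import comb

# Assumed representation: a branch is (length, frozenset of species below it);
# abundance maps species -> number of individuals in the full sample.


def expected_pd(branches, abundance, m):
    """Exact mean PD of a random subsample of m individuals (no replacement)."""
    n_total = sum(abundance.values())
    if not 0 <= m <= n_total:
        raise ValueError("subsample size must lie in 0..N")
    denom = comb(n_total, m)
    mean = 0.0
    for length, below in branches:
        n_b = sum(abundance.get(s, 0) for s in below)
        # Probability that none of the n_b individuals below the branch is drawn
        # (math.comb returns 0 when m exceeds n_total - n_b).
        p_absent = comb(n_total - n_b, m) / denom
        mean += length * (1.0 - p_absent)
    return mean


def brute_force_pd(branches, abundance, m):
    """Average PD over every m-subset of individuals (feasible only for tiny N)."""
    individuals = [s for s, count in abundance.items() for _ in range(count)]
    total, n_subsets = 0.0, 0
    for subset in combinations(range(len(individuals)), m):
        present = {individuals[i] for i in subset}
        # A branch counts if at least one drawn species lies below it.
        total += sum(length for length, below in branches if below & present)
        n_subsets += 1
    return total / n_subsets


if __name__ == "__main__":
    # Toy tree: species A on its own branch; B and C share a branch of length 2.
    branches = [(1.0, frozenset("A")), (2.0, frozenset("BC")),
                (1.0, frozenset("B")), (1.0, frozenset("C"))]
    abundance = {"A": 2, "B": 1, "C": 1}
    for m in range(1, 5):
        print(m, expected_pd(branches, abundance, m),
              brute_force_pd(branches, abundance, m))
```

For m = 2 both functions agree with the hand calculation 21/6 = 3.5, up to floating-point rounding. The exact formula costs one term per branch, whereas enumeration or random subsampling grows with the number of draws.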
Introduction to IND and recursive partitioning
This manual describes the IND package for learning tree classifiers from data. The package is an integrated C and C shell re-implementation of tree learning routines such as CART, C4, and various MDL and Bayesian variations. It includes routines for experiment control, interactive operation, and analysis of tree building. The manual introduces the system and its many options, gives a basic review of tree learning, contains a guide to the literature and a glossary, lists the manual pages for the routines, and gives instructions for installation.
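Recursive partitioning itself can be sketched in a few lines. This is a generic, hypothetical illustration, not IND's interface or its C code. It grows a classification tree greedily using Gini impurity, the split criterion CART uses for classification; C4 uses an information-theoretic criterion instead. Splits are numeric thresholds `x[f] <= t`. A node becomes a leaf when no split strictly reduces impurity or it has fewer than `min_size` examples. Pruning is omitted. The names `gini`, `best_split`, `grow`, `predict`, and `min_size` are assumptions, and class labels are assumed to be non-tuple values such as strings.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(X, y):
    """Return (feature, threshold) minimising weighted Gini, or None if no
    split strictly improves on the parent node."""
    n = len(y)
    best, best_score = None, gini(y)
    for f in range(len(X[0])):
        for thr in sorted({row[f] for row in X}):
            left = [y[i] for i in range(n) if X[i][f] <= thr]
            right = [y[i] for i in range(n) if X[i][f] > thr]
            if not left or not right:
                continue  # a split must separate the data
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best_score:
                best, best_score = (f, thr), score
    return best


def grow(X, y, min_size=2):
    """Grow a tree: internal nodes are (feature, threshold, left, right),
    leaves are the majority class label."""
    split = best_split(X, y) if len(y) >= min_size else None
    if split is None:
        return Counter(y).most_common(1)[0][0]
    f, thr = split
    li = [i for i in range(len(y)) if X[i][f] <= thr]
    ri = [i for i in range(len(y)) if X[i][f] > thr]
    return (f, thr,
            grow([X[i] for i in li], [y[i] for i in li], min_size),
            grow([X[i] for i in ri], [y[i] for i in ri], min_size))


def predict(tree, x):
    """Follow thresholds from the root down to a leaf label."""
    while isinstance(tree, tuple):
        f, thr, left, right = tree
        tree = left if x[f] <= thr else right
    return tree


if __name__ == "__main__":
    X = [[0.0], [1.0], [2.0], [3.0]]
    y = ["a", "a", "b", "b"]
    tree = grow(X, y)
    print(tree)                  # (0, 1.0, 'a', 'b')
    print(predict(tree, [2.5]))  # b
```

Greedy growth with a strict-improvement stopping rule keeps the sketch short. Practical learners typically grow deeper trees and then prune them back.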
- …