12 research outputs found

    A new balance index for phylogenetic trees

    Several indices that measure the degree of balance of a rooted phylogenetic tree have been proposed so far in the literature. In this work we define and study a new index of this kind, which we call the total cophenetic index: the sum, over all pairs of different leaves, of the depth of their least common ancestor. This index makes sense for arbitrary trees, can be computed in linear time, and has a larger range of values and a greater resolution power than other indices like Colless' or Sackin's. We compute its maximum and minimum values for arbitrary and binary trees, as well as exact formulas for its expected value for binary trees under the Yule and the uniform models of evolution. As a byproduct of this study, we obtain an exact formula for the expected value of the Sackin index under the uniform model, a result that seems to be new in the literature. Comment: 24 pages, 2 figures, preliminary version presented at the JBI 201
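
A minimal sketch of the definition above, not the authors' implementation. It assumes a rooted tree encoded as nested Python tuples whose leaves are non-tuple labels (an encoding chosen here purely for illustration), and the function name is hypothetical. It relies on the identity that the sum of least-common-ancestor depths over all leaf pairs equals the sum, over every internal node other than the root, of the number of leaf pairs below that node, which gives a single pass over the tree.

```python
from math import comb


def total_cophenetic_index(tree):
    """Sum, over all pairs of distinct leaves, of the depth of their LCA.

    `tree` is a nested tuple; anything that is not a tuple is a leaf.
    The root has depth 0. (Encoding and name are illustrative assumptions.)
    """
    total = 0

    def leaves_below(node, is_root):
        nonlocal total
        if not isinstance(node, tuple):
            return 1  # a leaf contributes one leaf and no pairs
        k = sum(leaves_below(child, False) for child in node)
        if not is_root:
            # Every pair of leaves below a non-root node has this node
            # on the path from the root to the pair's LCA.
            total += comb(k, 2)
        return k

    leaves_below(tree, True)
    return total


caterpillar = ((("a", "b"), "c"), "d")
balanced = (("a", "b"), ("c", "d"))
star = ("a", "b", "c", "d")

print(total_cophenetic_index(caterpillar))  # 4
print(total_cophenetic_index(balanced))     # 2
print(total_cophenetic_index(star))         # 0
```

On four leaves the caterpillar scores highest and the star tree lowest, so larger values indicate a less balanced tree. The recursion is only suitable for trees of modest depth.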

    On joint subtree distributions under two evolutionary models

    In population and evolutionary biology, hypotheses about microevolutionary and macroevolutionary processes are commonly tested by comparing the shape indices of empirical evolutionary trees with those predicted by neutral models. A key ingredient in this approach is the ability to compute and quantify distributions of various tree shape indices under random models of interest. As a step to meet this challenge, in this paper we investigate the joint distribution of cherries and pitchforks (that is, subtrees with two and three leaves) under two widely used null models: the Yule-Harding-Kingman (YHK) model and the proportional to distinguishable arrangements (PDA) model. Based on two novel recursive formulae, we propose a dynamic approach to numerically compute the exact joint distribution (and hence the marginal distributions) for trees of any size. We also obtain insights into the statistical properties of trees generated under these two models, including a constant correlation between the cherry and the pitchfork distributions under the YHK model, and the log-concavity and unimodality of the cherry distributions under both models. In addition, we show that there exists a unique change point for the cherry distributions between these two models.
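
A short sketch of the two statistics named above, assuming a rooted binary tree encoded as nested Python tuples with non-tuple leaf labels (an illustrative encoding, not taken from the paper; the function name is hypothetical). It counts the subtrees with exactly two leaves (cherries) and exactly three leaves (pitchforks) for a single tree. It does not reproduce the paper's recursive formulae for the joint distribution.

```python
def count_cherries_and_pitchforks(tree):
    """Return (cherries, pitchforks) for a rooted binary tree.

    `tree` is a nested tuple; anything that is not a tuple is a leaf.
    A cherry is a subtree with two leaves, a pitchfork one with three.
    The cherry inside a pitchfork is also counted as a cherry.
    """
    cherries = 0
    pitchforks = 0

    def leaves_below(node):
        nonlocal cherries, pitchforks
        if not isinstance(node, tuple):
            return 1
        k = sum(leaves_below(child) for child in node)
        if k == 2:
            cherries += 1
        elif k == 3:
            pitchforks += 1
        return k

    leaves_below(tree)
    return cherries, pitchforks


print(count_cherries_and_pitchforks(((("a", "b"), "c"), "d")))         # (1, 1)
print(count_cherries_and_pitchforks((("a", "b"), ("c", "d"))))         # (2, 0)
print(count_cherries_and_pitchforks(((("a", "b"), "c"), ("d", "e"))))  # (2, 1)
```

Counting by the number of leaves below each node works here because, in a binary tree, every subtree with three leaves has the pitchfork shape.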

    A balance index for phylogenetic trees based on rooted quartets

    We define a new balance index for rooted phylogenetic trees based on the symmetry of the evolutive history of every set of 4 leaves. This index makes sense for multifurcating trees and it can be computed in time linear in the number of leaves. We determine its maximum and minimum values for arbitrary and bifurcating trees, and we provide exact formulas for its expected value and variance on bifurcating trees under Ford's α-model and Aldous' β-model and on arbitrary trees under the α-γ-model. Peer reviewed. Postprint (author's final draft)

    A balance index for phylogenetic trees based on rooted quartets

    We define a new balance index for rooted phylogenetic trees based on the symmetry of the evolutive history of every set of 4 leaves. This index makes sense for multifurcating trees and it can be computed in time linear in the number of leaves. We determine its maximum and minimum values for arbitrary and bifurcating trees, and we provide exact formulas for its expected value and variance on bifurcating trees under Ford's α-model and Aldous' β-model and on arbitrary trees under the α-γ-model. Comment: 38 pages, 12 figures

    On cherry and pitchfork distributions of random rooted and unrooted phylogenetic trees

    Tree shape statistics are important for investigating evolutionary mechanisms mediating phylogenetic trees. As a step towards bridging shape statistics between rooted and unrooted trees, we present a comparison study on two subtree statistics, the numbers of cherries and of pitchforks, for the proportional to distinguishable arrangements (PDA) and the Yule-Harding-Kingman (YHK) models. Based on recursive formulas on the joint distribution of the number of cherries and that of pitchforks, it is shown that cherry distributions are log-concave for both rooted and unrooted trees under these two models. Furthermore, the mean number of cherries and that of pitchforks for unrooted trees converge respectively to those for rooted trees under the YHK model, while there exists a limiting gap of 1/4 for the PDA model. Finally, the total variation distances between the cherry distributions of rooted and those of unrooted trees converge for both models. Our results indicate that caution is required when conducting statistical analysis of tree shapes involving both rooted and unrooted trees.

    Stochastic Tree Models for Macroevolution: Development, Validation and Application

    Phylogenetic trees capture the relationships between species and can be investigated by morphological and/or molecular data. When focusing on macroevolution, one considers the large-scale history of life with evolutionary changes affecting a single species or the entire clade, leading to the enormous diversity of species observed today. One major problem of biology is the explanation of this biodiversity. Therefore, one may ask which kind of macroevolutionary processes have given rise to observable tree shapes or patterns of species distribution, which refers to the appearance of branching orders and time periods. Thus, with an increasing number of known species in the context of phylogenetic studies, testing hypotheses about evolution by analyzing the tree shape of the resulting phylogenetic trees became a matter of particular interest. Interest in using those reconstructed phylogenies for studying evolutionary processes has increased during the last decades. Many paleontologists (Raup et al., 1973; Gould et al., 1977; Gilinsky and Good, 1989; Nee, 2004) tried to describe such patterns of macroevolution by using models for growing trees. Those models describe stochastic processes to generate phylogenetic trees. Yule (1925) was the first to introduce such a model, the Equal Rate Markov (ERM) model, in the context of biological branching based on a continuous-time, uneven branching process. In the last decades, further dynamical models were proposed (Yule, 1925; Aldous, 1996; Nee, 2006; Rosen, 1978; Ford, 2005; Hernández-García et al., 2010) to address the investigation of tree shapes and hence capture the rules of macroevolutionary forces. A common model is Aldous' Branching (AB) model, which is known for generating trees with a structure similar to that of "real" trees. To infer those macroevolutionary forces, estimated trees are analyzed and compared to simulated trees generated by models. 
There are a few drawbacks of recent models, such as a missing biological motivation or generated tree shapes that do not fit well to those observed in empirical trees. The central aim of this thesis is the development and study of new biologically motivated approaches which might help to better understand or even discover biological forces which lead to the huge diversity of organisms. The first approach, called the age model, can be defined as a stochastic procedure which describes the growth of binary trees by an iterative stochastic attachment of leaves, similar to the ERM model. In contrast to the latter, the branching rate at each clade is no longer constant, but decreases in time, i.e., with the age. Thus, species involved in recent speciation events have a tendency to speciate again. The second model introduced is a branching process which mimics the evolution of species driven by innovations. The process involves a separation of time scales. Rare innovation events trigger rapid cascades of diversification where a feature combines with previously existing features. The model is called the innovation model. Three data sets of estimated phylogenetic trees are used to analyze and compare the tree shapes produced by the new growth models. A tree shape analysis considering a variety of imbalance measures is performed. Results show that simulated trees of both growth models fit well to the tree shape observed in real trees. In a further study, a likelihood analysis is performed in order to rank models with respect to their ability to explain observed tree shapes. Results show that the likelihoods of the age model and the AB model are clearly correlated under the trees in the databases when considering small and medium-sized trees with up to 19 leaves. For a data set representing phylogenetic trees of protein families, the age model outperforms the AB model. 
But for another data set, representing phylogenetic trees of species, the AB model performs slightly better. To support this observation, a further analysis using larger trees is necessary. But an exact computation of likelihoods for large trees implies a huge computational effort. Therefore, an efficient method for likelihood estimation is proposed and compared to the estimation using a naive sampling strategy. Nevertheless, both models describe the tree generation process in a way which is easy to interpret biologically. Another interesting field of research in biology is the coevolution between species. This is the interaction of species across groups such that the evolution of a species from one group can be triggered by a species from another group. The most prominent examples are systems of host species and their associated parasites. One problem is to reconcile the common history of both groups of species and to predict the associations between ancestral hosts and their parasites. To solve this problem, some algorithmic methods have been developed in recent years. But only a few host-parasite systems have been analyzed in sufficient detail, which makes an evaluation of these methods difficult. Within the scope of coevolution, the proposed age model is applied to the generation of cophylogenies to evaluate such host-parasite reconciliation methods. The presented age model as well as the innovation model produce tree shapes which are similar to the tree structures of estimated trees. Both models describe an evolutionary dynamics and might provide a further opportunity to infer macroevolutionary processes which lead to the biodiversity which can be observed today. Furthermore, the application of the age model in the context of coevolution, by generating a useful benchmark set of cophylogenies, is a first step towards systematic studies on evaluating reconciliation methods.

    Tree balance indices: a comprehensive survey

    Tree balance plays an important role in phylogenetics and other research areas, which is why several indices to measure tree balance have been introduced over the years. Nevertheless, a formal definition of what a balance index actually is and what makes it a useful measure of balance (or, in other cases, imbalance), has so far not been introduced in the literature. While the established indices all summarize the (im)balance of a tree in a single number, they vary in their definitions and underlying principles. It is the aim of the present manuscript to introduce formal definitions of balance and imbalance indices that classify desirable properties of such indices and to analyze and categorize established indices accordingly. In this regard, we review 19 established (im)balance indices from the literature, summarize their general, statistical and combinatorial properties (where known), prove numerous additional results and indicate directions for future research by making explicit open questions and gaps in the literature. We also prove that a few tree shape statistics that have been used to measure tree balance in the literature do not fulfill our definition of an (im)balance index, which might indicate that their properties are not as useful for practical purposes. Moreover, we show that five additional tree shape statistics from other contexts actually are tree (im)balance indices according to our definition. The manuscript is accompanied by the website containing fact sheets of the discussed indices. Moreover, we introduce the software package treebalance implemented in R that can be used to calculate all indices discussed. Contents: 1 Introduction; 2 Preliminaries; 3 Summary of tree balance indices; 4 Obtaining new balance indices from established indices; 5 Normalizing balance indices; 6 Related concepts; 7 Software; 8 Discussion and outlook; 9 Fact sheet