124 research outputs found

    Constructing liberal and conservative supertrees and exact solutions for reduced consensus problems

    This thesis studies two different approaches to extracting information from collections of phylogenetic trees: supertrees and reduced consensus. Supertree methods combine the phylogenetic information from multiple partially-overlapping trees into a larger phylogenetic tree called a supertree. Several supertree construction methods have been proposed to date, but most of these are not designed with any specific properties in mind. Recently, Cotton and Wilkinson proposed extensions of the majority-rule consensus tree method to the supertree setting that inherit many of the appealing properties of the former. We study a variant of one of Cotton and Wilkinson's methods, called majority-rule (+) supertrees. After proving that a key underlying problem for constructing majority-rule (+) supertrees is NP-hard, we develop a polynomial-size exact integer linear programming formulation of the problem. We then present a data reduction heuristic that identifies smaller subproblems that can be solved independently. While this technique is not guaranteed to produce optimal solutions, it can achieve substantial problem-size reduction. Finally, we report on a computational study of our approach on various real data sets, including the 121-taxon, 7-tree Seabirds data set of Kennedy and Page. The results indicate that our exact method is computationally feasible for moderately large inputs. For larger inputs, our data reduction heuristic makes it feasible to tackle problems that are well beyond the range of the basic integer programming approach. Comparisons between the results obtained by our heuristic and exact solutions indicate that the heuristic produces good answers. Our results also suggest that the majority-rule (+) approach, in both its basic form and with data reduction, yields biologically meaningful phylogenies. Generalizations of the strict and loose consensus methods to the supertree setting, recently introduced by McMorris and Wilkinson, are studied. 
The supertrees these methods produce are conservative in the sense that they only preserve information (in the form of splits) that is supported by at least one of the input trees and that is not contradicted by any of the input trees. Alternative, equivalent formulations of these supertrees are developed. These are used to prove the NP-completeness of the underlying optimization problems and to give exact integer linear programming solutions. For larger data sets, a divide-and-conquer approach is adopted, based on the structural properties of these supertrees. Experiments show that it is feasible to solve problems with several hundred taxa and several hundred trees in a reasonable amount of time. A rogue taxon in a collection of phylogenetic trees is one whose position varies drastically from tree to tree. The presence of such taxa can greatly reduce the resolution of the consensus tree (e.g., the majority-rule or strict consensus) for a collection. The reduced consensus approach aims to identify rogue taxa and to produce more informative consensus trees. Given a collection of phylogenetic trees over the same leaf set, the goal is to find a set of taxa whose removal maximizes the number of internal edges in the consensus tree of the collection. This problem is proven to be NP-hard for strict and majority-rule consensus. We describe exact integer linear programming formulations for computing reduced strict, majority and loose consensus trees. In experimental tests, our exact solutions show significant improvement over heuristic methods on several problem instances.
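The reduced consensus objective described in the abstract above can be illustrated on toy inputs. The following is a minimal sketch, not the thesis's integer linear programming formulation: it swaps in an exhaustive search, which is only practical for a handful of taxa. It assumes rooted trees written as nested tuples of string labels and counts nontrivial clusters as a stand-in for internal edges; the function names are hypothetical.

```python
from itertools import combinations


def leaves(tree):
    """Return the set of leaf labels of a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def clusters(tree, dropped=frozenset()):
    """Return the nontrivial clusters of `tree` after deleting `dropped` taxa."""
    kept = leaves(tree) - dropped
    found = set()

    def walk(node):
        below = leaves(node) - dropped
        # Only clusters with 2..n-1 taxa correspond to an internal edge.
        if 1 < len(below) < len(kept):
            found.add(below)
        if not isinstance(node, str):
            for child in node:
                walk(child)

    walk(tree)
    return found


def strict_consensus_clusters(trees, dropped=frozenset()):
    """Clusters shared by every tree once the `dropped` taxa are removed."""
    return set.intersection(*(clusters(t, dropped) for t in trees))


def best_removal(trees, max_dropped):
    """Brute-force search (an assumption, not the ILP) for the best taxa to drop."""
    taxa = sorted(leaves(trees[0]))
    best = (len(strict_consensus_clusters(trees)), frozenset())
    for k in range(1, max_dropped + 1):
        for combo in combinations(taxa, k):
            dropped = frozenset(combo)
            score = len(strict_consensus_clusters(trees, dropped))
            if score > best[0]:
                best = (score, dropped)
    return best
```

Calling `best_removal(trees, max_dropped)` on a small list of trees returns the number of consensus clusters together with the set of taxa whose removal achieves it. The search grows combinatorially with the number of taxa, which is consistent with the NP-hardness result mentioned in the abstract and is why an exact ILP or a heuristic is needed at realistic scale.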

    Consensus and Confusion in Molluscan Trees: Evaluating Morphological and Molecular Phylogenies

    Mollusks are the most morphologically disparate living animal phylum; they have diversified into all habitats and have a deep fossil record. Monophyly and identity of their eight living classes is undisputed, but relationships between these groups and patterns of their early radiation have remained elusive. Arguments about traditional morphological phylogeny focus on a small number of topological concepts but often without regard to proximity of the individual classes. In contrast, molecular studies have proposed a number of radically different, inherently contradictory, and controversial sister relationships. Here, we assembled a data set of 42 unique published trees describing molluscan interrelationships. We used these data to ask several questions about the state of resolution of molluscan phylogeny compared with a null model of the variation possible in random trees constructed from a monophyletic assemblage of eight terminals. Although 27 different unique trees have been proposed from morphological inference, the majority of these are not statistically different from each other. Within the available molecular topologies, only four studies to date have included the deep sea class Monoplacophora; but 36.4% of all trees are not significantly different. We also present supertrees derived from two data partitions and three methods, including all available molecular molluscan phylogenies, which will form the basis for future hypothesis testing. The supertrees presented here were not constructed to provide yet another hypothesis of molluscan relationships, but rather to algorithmically evaluate the relationships present in the disparate published topologies. Based on the totality of available evidence, certain patterns of relatedness among constituent taxa become clear. The internodal distance is consistently short between a few taxon pairs, particularly supporting the relatedness of Monoplacophora and the chitons, Polyplacophora. 
Other taxon pairs are rarely or never found in close proximity, such as the vermiform Caudofoveata and Bivalvia. Our results have specific utility for guiding constructive research planning to better test relationships in Mollusca as well as other problematic groups. Taxa with consistently proximate relationships should be the focus of a combined approach in a concerted assessment of potential genetic and anatomical homology, whereas unequivocally distant taxa will make the most constructive choices for exemplar selection in higher level phylogenomic analyses.
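The abstract above compares taxon pairs by internodal distance across many published trees but does not define the measure. The sketch below assumes one common reading: the number of internal nodes on the path joining two leaves of a rooted tree. Trees are nested tuples of string labels, and all function names are hypothetical.

```python
def path_to_leaf(tree, target, path=()):
    """Return the child indices leading from the root to `target`, or None."""
    if isinstance(tree, str):
        return path if tree == target else None
    for index, child in enumerate(tree):
        found = path_to_leaf(child, target, path + (index,))
        if found is not None:
            return found
    return None


def internodal_distance(tree, a, b):
    """Count the internal nodes on the path between leaves `a` and `b`."""
    if a == b:
        return 0
    path_a = path_to_leaf(tree, a)
    path_b = path_to_leaf(tree, b)
    if path_a is None or path_b is None:
        raise ValueError("both taxa must be present in the tree")
    shared = 0
    for step_a, step_b in zip(path_a, path_b):
        if step_a != step_b:
            break
        shared += 1
    # Edges below the most recent common ancestor; nodes = edges - 1.
    return (len(path_a) - shared) + (len(path_b) - shared) - 1


def mean_internodal_distance(trees, a, b):
    """Average the distance over trees that all contain both taxa."""
    values = [internodal_distance(t, a, b) for t in trees]
    return sum(values) / len(values)
```

Under this assumed definition, sister taxa have distance 1 (only their shared parent lies between them), so a pair whose mean stays close to 1 across a tree collection would count as consistently proximate.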

    Mega-phylogeny approach for comparative biology: an alternative to supertree and supermatrix approaches

    Background: Biology has increasingly recognized the necessity to build and utilize larger phylogenies to address broad evolutionary questions. Large phylogenies have facilitated the discovery of differential rates of molecular evolution between trees and herbs. They have helped us understand the diversification patterns of mammals as well as the patterns of seed evolution. In addition to these broad evolutionary questions there is increasing awareness of the importance of large phylogenies for addressing conservation issues such as biodiversity hotspots and response to global change. Two major classes of methods have been employed to accomplish the large tree-building task: supertrees and supermatrices. Although these methods are continually being developed, they have yet to be made fully accessible to comparative biologists, making extremely large trees rare. Results: Here we describe and demonstrate a modified supermatrix method termed mega-phylogeny that uses databased sequences as well as taxonomic hierarchies to make extremely large trees with denser matrices than supermatrices. The two major challenges facing large-scale supermatrix phylogenetics are assembling large data matrices from databases and reconstructing trees from those datasets. The mega-phylogeny approach addresses the former as the latter is accomplished by employing recently developed methods that have greatly reduced the run time of large phylogeny construction. We present an algorithm that requires relatively little human intervention. The implemented algorithm is demonstrated with a dataset and phylogeny for Asterales (within Campanulidae) containing 4954 species and 12,033 sites and an rbcL matrix for green plants (Viridiplantae) with 13,533 species and 1,401 sites. Conclusion: By examining much larger phylogenies, patterns emerge that were otherwise unseen. 
The phylogeny of Viridiplantae successfully reconstructs major relationships of vascular plants that previously required many more genes. These demonstrations underscore the importance of using large phylogenies to uncover important evolutionary patterns and we present a fast and simple method for constructing these phylogenies.

    An experimental study of Quartets MaxCut and other supertree methods

    Background: Supertree methods represent one of the major ways by which the Tree of Life can be estimated, but despite many recent algorithmic innovations, matrix representation with parsimony (MRP) remains the main algorithmic supertree method. Results: We evaluated the performance of several supertree methods based upon the Quartets MaxCut (QMC) method of Snir and Rao and showed that two of these methods usually outperform MRP and five other supertree methods that we studied, under many realistic model conditions. However, the QMC-based methods have scalability issues that may limit their utility on large datasets. We also observed that taxon sampling impacted supertree accuracy, with poor results obtained when all of the source trees were only sparsely sampled. Finally, we showed that the popular optimality criterion of minimizing the total topological distance of the supertree to the source trees is only weakly correlated with supertree topological accuracy. Therefore evaluating supertree methods on biological datasets is problematic. Conclusions: Our results show that supertree methods that improve upon MRP are possible, and that an effort should be made to produce scalable and robust implementations of the most accurate supertree methods. Also, because topological accuracy depends upon taxon sampling strategies, attempts to construct very large phylogenetic trees using supertree methods should consider the selection of source tree datasets, as well as supertree methods. Finally, since supertree topological error is only weakly correlated with the supertree's topological distance to its source trees, development and testing of supertree methods presents methodological challenges.
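The optimality criterion mentioned in the abstract above, the total topological distance of a supertree to its source trees, can be sketched as follows. This is an illustrative assumption rather than the study's implementation: it uses a Robinson-Foulds style count on rooted clusters (the study may use unrooted bipartitions or a normalized variant), restricts the supertree to the taxa it shares with each source tree before comparing, and represents trees as nested tuples of string labels. Function names are hypothetical.

```python
def leaf_set(tree):
    """Return the set of leaf labels of a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def restricted_clusters(tree, keep):
    """Nontrivial clusters of `tree` after restricting it to the taxa in `keep`."""
    found = set()

    def walk(node):
        below = leaf_set(node) & keep
        # Skip singletons, empty sets, and the cluster of all kept taxa.
        if 1 < len(below) < len(keep):
            found.add(below)
        if not isinstance(node, str):
            for child in node:
                walk(child)

    walk(tree)
    return found


def rf_distance(supertree, source):
    """Clusters found in exactly one of the two trees, on their shared taxa."""
    keep = leaf_set(source) & leaf_set(supertree)
    in_super = restricted_clusters(supertree, keep)
    in_source = restricted_clusters(source, keep)
    return len(in_super ^ in_source)


def total_distance(supertree, sources):
    """Sum of per-source distances: the criterion a supertree search minimizes."""
    return sum(rf_distance(supertree, s) for s in sources)
```

A candidate supertree with a lower `total_distance` fits its source trees more closely. As the abstract notes, a close fit to the source trees is only weakly correlated with accuracy relative to the true tree, so this score is a measure of agreement with the inputs and not of correctness.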

    DendroBlast: approximate phylogenetic trees in the absence of multiple sequence alignments

    The rapidly growing availability of genome information has created considerable demand for both fast and accurate phylogenetic inference algorithms. We present a novel method called DendroBLAST for reconstructing phylogenetic dendrograms/trees from protein sequences using BLAST. This method differs from other methods by incorporating a simple model of sequence evolution to test the effect of introducing sequence changes on the reliability of the bipartitions in the inferred tree. Using realistic simulated sequence data we demonstrate that this method produces phylogenetic trees that are more accurate than other commonly-used distance based methods though not as accurate as maximum likelihood methods from good quality multiple sequence alignments. In addition to tests on simulated data, we use DendroBLAST to generate input trees for a supertree reconstruction of the phylogeny of the Archaea. This independent analysis produces an approximate phylogeny of the Archaea that has both high precision and recall when compared to previously published analysis of the same dataset using conventional methods. Taken together these results demonstrate that approximate phylogenetic trees can be produced in the absence of multiple sequence alignments, and we propose that these trees will provide a platform for improving and informing downstream bioinformatic analysis. A web implementation of the DendroBLAST method is freely available for use at http://www.dendroblast.com/

    Reweaving the tapestry: a supertree of birds

    Supertrees are a useful method of constructing large-scale phylogenies by assembling numerous smaller phylogenies that have some, but not necessarily all, taxa in common. Birds are an obvious candidate for supertree construction as they are the most abundant land vertebrates on the planet and no comprehensive phylogeny of both extinct and extant species currently exists. In order to construct supertrees, primary analysis of characters is required. One such study, presented here, describes two new partial specimens belonging to the Primobucconidae from the Green River Formation of Wyoming (USA), which were assigned to the species Primobucco mcgrewi. Although incomplete, these specimens had preserved anatomical features not seen in other material. An attempt to further constrain their phylogenetic position was inconclusive, showing only that the Primobucconidae belong in a clade containing the extant Coraciiformes and related taxa. Over 700 such studies were used to construct a species-level supertree of Aves containing over 5000 taxa. The resulting tree shows the relationships between the main avian groups, with only a few novel clades, some of which can be explained by a lack of information regarding those taxa. The tree was constructed using a strict protocol which ensures robust, accurate and efficient data collection and processing; extending previous work by other authors. Before creating the species-level supertree the protocol was tested on the order Galliformes in order to determine the most efficient method of removing non-independent data. It was found that combining non-independent source trees via a “mini-supertree” analysis produced results more consistent with the input source data and, in addition, significantly reduced computational load. Another method for constructing large-scale trees is via a supermatrix, which is constructed from primary data collated into a single, large matrix. 
A molecular-only tree was constructed using both supertree and supermatrix methods, from the same data, again of the order Galliformes. Both methods performed equally well in producing trees that fit the source data. The two methods could be considered complementary rather than conflicting: the supertree took a long time to construct but was very quick to calculate, whereas the supermatrix was quicker to construct but took longer to calculate. Depending on the data at hand and the other factors involved, the choice of which method to use appears, from this small study, to be of little consequence. Finally, an updated species-level supertree of the Dinosauria was also constructed and used to look at diversification rates in order to elucidate the “Cretaceous explosion of terrestrial life”. Results from this study show that this apparent burst in diversity at the end of the Cretaceous is a sampling artefact and that, in fact, dinosaurs show most of their major diversification shifts in the first third of their history.

    Clann: investigating phylogenetic information through supertree analyses

    Summary: Clann has been developed in order to provide methods of investigating phylogenetic information through the application of supertrees. Availability: Clann has been precompiled for Linux, Apple Macintosh and Windows operating systems and is available from http://bioinf.may.ie/software/clann. Source code is available on request from the authors. Supplementary information: Clann has been written in the C programming language.

    Large Trees, Supertrees, and Diversification of the Grass Family

    Phylogenetic studies of grasses (Poaceae) are advanced in comparison with most other angiosperm families. However, few studies have attempted to build large phylogenetic trees of the family and use these for evaluating patterns of diversification or other macroevolutionary hypotheses. Two contrasting approaches can be used to generate large trees: supermatrix analyses and supertrees. In this paper, we evaluated the suitability of each of these methods for the study of patterns and processes of evolution in the grasses. We collected data from DDBJ/EMBL/GenBank to determine sequence availability and asked how far we are from a complete generic-level phylogenetic tree of the grasses. We generated almost complete tribal-level supertrees (39 tribes) with over 400 genera using MRP methods, described their major clades, assessed their accuracy, and used them for the study of diversification. We generated a proportional supertree, by modifying the original supertree, to remove sampling bias associated with the original supertree that may affect diversification statistics. We used methods that incorporate information on the topological distribution of taxon diversity from all internal nodes of the phylogenetic tree to show that the grasses have experienced significant variations in diversification rates (M statistic P-value

    Parallelizing superFine

    The estimation of the Tree of Life, a rooted binary tree representing how all extant species evolved from a common ancestor, is one of the grand challenges of modern biology. Research groups around the world are attempting to estimate evolutionary trees on particular sets of species (typically clades, or rooted subtrees), in the hope that a final "supertree" can be produced from these smaller estimated trees through the addition of a "scaffold" tree of randomly sampled taxa from the tree of life. However, supertree estimation is itself a computationally challenging problem, because the most accurate trees are produced by running heuristics for NP-hard problems. In this paper we report on a study in which we parallelize SuperFine, the currently most accurate and efficient supertree estimation method. We explore performance of these parallel implementations on simulated data-sets with 1000 taxa and biological data-sets with up to 2,228 taxa. Our study reveals aspects of SuperFine that limit the speed-ups that are possible through the type of outer-loop parallelism we exploit.