171 research outputs found

    Constructing majority-rule supertrees

    Background: Supertree methods combine the phylogenetic information from multiple partially-overlapping trees into a larger phylogenetic tree called a supertree. Several supertree construction methods have been proposed to date, but most of these are not designed with any specific properties in mind. Recently, Cotton and Wilkinson proposed extensions of the majority-rule consensus tree method to the supertree setting that inherit many of the appealing properties of the former. Results: We study a variant of one of Cotton and Wilkinson's methods, called majority-rule (+) supertrees. After proving that a key underlying problem for constructing majority-rule (+) supertrees is NP-hard, we develop a polynomial-size exact integer linear programming formulation of the problem. We then present a data reduction heuristic that identifies smaller subproblems that can be solved independently. While this technique is not guaranteed to produce optimal solutions, it can achieve substantial problem-size reduction. Finally, we report on a computational study of our approach on various real data sets, including the 121-taxon, 7-tree Seabirds data set of Kennedy and Page. Conclusions: The results indicate that our exact method is computationally feasible for moderately large inputs. For larger inputs, our data reduction heuristic makes it feasible to tackle problems that are well beyond the range of the basic integer programming approach. Comparisons between the results obtained by our heuristic and exact solutions indicate that the heuristic produces good answers. Our results also suggest that the majority-rule (+) approach, in both its basic form and with data reduction, yields biologically meaningful phylogenies.
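For orientation, the classic majority-rule consensus that Cotton and Wilkinson's methods extend can be sketched in a few lines. This is not the majority-rule (+) supertree method or the integer linear program from the abstract: it assumes all input trees share the same leaf set, and the nested-tuple tree encoding and the function names are illustrative choices only.

```python
from collections import Counter


def nontrivial_clusters(tree):
    """Return the nontrivial clusters of a rooted tree.

    A tree is a nested tuple whose leaves are strings (an assumed,
    illustrative encoding). A cluster is the leaf set below an internal
    node; the cluster of the root (the full leaf set) is omitted.
    """
    found = set()

    def walk(node):
        if isinstance(node, str):
            return frozenset([node])
        leaves = frozenset().union(*(walk(child) for child in node))
        found.add(leaves)
        return leaves

    found.discard(walk(tree))
    return found


def majority_rule_clusters(trees):
    """Keep every cluster that occurs in more than half of the trees."""
    counts = Counter()
    for tree in trees:
        counts.update(nontrivial_clusters(tree))
    return {c for c, n in counts.items() if 2 * n > len(trees)}


if __name__ == "__main__":
    trees = [
        ((("a", "b"), "c"), "d"),
        ((("a", "b"), "d"), "c"),
        ((("a", "c"), "b"), "d"),
    ]
    for cluster in sorted(majority_rule_clusters(trees), key=len):
        print(sorted(cluster))
```

The supertree setting is harder precisely because the input trees have different leaf sets, so clusters cannot simply be counted across trees in this way.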

    Constructing liberal and conservative supertrees and exact solutions for reduced consensus problems

    This thesis studies two different approaches to extracting information from collections of phylogenetic trees: supertrees and reduced consensus. Supertree methods combine the phylogenetic information from multiple partially-overlapping trees into a larger phylogenetic tree called a supertree. Several supertree construction methods have been proposed to date, but most of these are not designed with any specific properties in mind. Recently, Cotton and Wilkinson proposed extensions of the majority-rule consensus tree method to the supertree setting that inherit many of the appealing properties of the former. We study a variant of one of Cotton and Wilkinson's methods, called majority-rule (+) supertrees. After proving that a key underlying problem for constructing majority-rule (+) supertrees is NP-hard, we develop a polynomial-size exact integer linear programming formulation of the problem. We then present a data reduction heuristic that identifies smaller subproblems that can be solved independently. While this technique is not guaranteed to produce optimal solutions, it can achieve substantial problem-size reduction. Finally, we report on a computational study of our approach on various real data sets, including the 121-taxon, 7-tree Seabirds data set of Kennedy and Page. The results indicate that our exact method is computationally feasible for moderately large inputs. For larger inputs, our data reduction heuristic makes it feasible to tackle problems that are well beyond the range of the basic integer programming approach. Comparisons between the results obtained by our heuristic and exact solutions indicate that the heuristic produces good answers. Our results also suggest that the majority-rule (+) approach, in both its basic form and with data reduction, yields biologically meaningful phylogenies. Generalizations of the strict and loose consensus methods to the supertree setting, recently introduced by McMorris and Wilkinson, are studied.
The supertrees these methods produce are conservative in the sense that they only preserve information (in the form of splits) that is supported by at least one of the input trees and that is not contradicted by any of the input trees. Alternative, equivalent formulations of these supertrees are developed. These are used to prove the NP-completeness of the underlying optimization problems and to give exact integer linear programming solutions. For larger data sets, a divide-and-conquer approach is adopted, based on the structural properties of these supertrees. Experiments show that it is feasible to solve problems with several hundred taxa and several hundred trees in a reasonable amount of time. A rogue taxon in a collection of phylogenetic trees is one whose position varies drastically from tree to tree. The presence of such taxa can greatly reduce the resolution of the consensus tree (e.g., the majority-rule or strict consensus) for a collection. The reduced consensus approach aims to identify rogue taxa and to produce more informative consensus trees. Given a collection of phylogenetic trees over the same leaf set, the goal is to find a set of taxa whose removal maximizes the number of internal edges in the consensus tree of the collection. This problem is proven to be NP-hard for strict and majority-rule consensus. We describe exact integer linear programming formulations for computing reduced strict, majority and loose consensus trees. In experimental tests, our exact solutions show significant improvement over heuristic methods on several problem instances.
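The reduced consensus objective can be made concrete with a tiny brute-force search. This is only a sketch under stated assumptions: it enumerates removal sets exhaustively rather than using the thesis's integer linear programs, it handles the strict consensus only, it treats trees as rooted nested tuples (so the number of internal edges equals the number of nontrivial clusters), and all names, including the `max_removed` bound, are hypothetical.

```python
from itertools import combinations


def restricted_clusters(tree, keep):
    """Nontrivial clusters of a rooted nested-tuple tree restricted to `keep`."""
    found = set()

    def walk(node):
        if isinstance(node, str):
            return frozenset([node]) if node in keep else frozenset()
        leaves = frozenset().union(*(walk(child) for child in node))
        if len(leaves) >= 2:
            found.add(leaves)
        return leaves

    found.discard(walk(tree))  # drop the root cluster (all kept leaves)
    return found


def strict_consensus_size(trees, keep):
    """Number of nontrivial clusters shared by every restricted tree."""
    return len(set.intersection(*(restricted_clusters(t, keep) for t in trees)))


def best_removal(trees, taxa, max_removed):
    """Try every removal set of size <= max_removed; return (score, removed)."""
    taxa = frozenset(taxa)
    best = (strict_consensus_size(trees, taxa), frozenset())
    for k in range(1, max_removed + 1):
        for removed in combinations(sorted(taxa), k):
            score = strict_consensus_size(trees, taxa - set(removed))
            if score > best[0]:
                best = (score, frozenset(removed))
    return best


if __name__ == "__main__":
    # Taxon "x" jumps between positions, so it behaves like a rogue taxon.
    trees = [
        (((("a", "x"), "b"), "c"), "d"),
        ((("a", "b"), "c"), ("d", "x")),
    ]
    print(best_removal(trees, {"a", "b", "c", "d", "x"}, max_removed=1))
```

The exhaustive loop grows exponentially with the number of taxa allowed to be removed, which is consistent with the NP-hardness result and motivates exact formulations such as those described above.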

    Optimizing Phylogenetic Supertrees Using Answer Set Programming

    The supertree construction problem is about combining several phylogenetic trees with possibly conflicting information into a single tree whose leaves are all the leaves of the source trees and whose relationships between leaves are as consistent with the source trees as possible. This leads to an optimization problem that is computationally challenging, and typically heuristic methods, such as matrix representation with parsimony (MRP), are used. In this paper we consider the use of answer set programming to solve the supertree construction problem in terms of two alternative encodings. The first is based on an existing encoding of trees using substructures known as quartets, while the other, novel encoding captures the relationships present in trees through direct projections. We use these encodings to compute a genus-level supertree for the family of cats (Felidae). Furthermore, we compare our results to recent supertrees obtained by the MRP method. Comment: To appear in Theory and Practice of Logic Programming (TPLP), Proceedings of ICLP 201
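As background for the MRP baseline mentioned in the abstract, the matrix-building step of matrix representation with parsimony can be sketched as follows. This shows only the standard coding of source-tree clusters as binary characters (1 inside the cluster, 0 elsewhere in that tree, ? for taxa missing from the tree); the parsimony search itself is not shown, and this is not the paper's answer set programming encoding. The nested-tuple tree format and the function names are assumptions for illustration.

```python
def leaves_and_clusters(tree):
    """Return (leaf set, list of nontrivial clusters) of a nested-tuple tree."""
    found = []

    def walk(node):
        if isinstance(node, str):
            return frozenset([node])
        leaves = frozenset().union(*(walk(child) for child in node))
        found.append(leaves)
        return leaves

    all_leaves = walk(tree)
    return all_leaves, [c for c in found if c != all_leaves]


def mrp_matrix(trees):
    """Map each taxon to its row of the MRP matrix, one column per cluster."""
    taxa = sorted(set().union(*(leaves_and_clusters(t)[0] for t in trees)))
    rows = {taxon: "" for taxon in taxa}
    for tree in trees:
        tree_leaves, cluster_list = leaves_and_clusters(tree)
        for cluster in cluster_list:
            for taxon in taxa:
                if taxon not in tree_leaves:
                    rows[taxon] += "?"  # taxon absent from this source tree
                elif taxon in cluster:
                    rows[taxon] += "1"
                else:
                    rows[taxon] += "0"
    return rows


if __name__ == "__main__":
    source_trees = [(("a", "b"), "c"), (("b", "d"), "c")]
    for taxon, row in mrp_matrix(source_trees).items():
        print(taxon, row)
```

A parsimony program would then search for the tree that best explains this matrix, which is the heuristic step the abstract contrasts with exact optimization.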

    A higher-level MRP supertree of placental mammals

    BACKGROUND: The higher-level phylogeny of placental mammals has long been a phylogenetic Gordian knot, with disagreement about both the precise contents of, and relationships between, the extant orders. A recent MRP supertree that favoured 'outdated' hypotheses (notably, monophyly of both Artiodactyla and Lipotyphla) has been heavily criticised for including low-quality and redundant data. We apply a stringent data selection protocol designed to minimise these problems to a much-expanded data set of morphological, molecular and combined source trees, to produce a supertree that includes every family of extant placental mammals. RESULTS: The supertree is well-resolved and supports both polyphyly of Lipotyphla and paraphyly of Artiodactyla with respect to Cetacea. The existence of four 'superorders' – Afrotheria, Xenarthra, Laurasiatheria and Euarchontoglires – is also supported. The topology is highly congruent with recent (molecular) phylogenetic analyses of placental mammals, but is considerably more comprehensive, being the first phylogeny to include all 113 extant families without making a priori assumptions of suprafamilial monophyly. Subsidiary analyses reveal that the data selection protocol played a key role in the major changes relative to a previously published higher-level supertree of placentals. CONCLUSION: The supertree should provide a useful framework for hypothesis testing in phylogenetic comparative biology, and supports the idea that biogeography has played a crucial role in the evolution of placental mammals. Our results demonstrate the importance of minimising poor and redundant data when constructing supertrees.

    Axiomatic opportunities and obstacles for inferring a species tree from gene trees

    The reconstruction of a central tendency 'species tree' from a large number of conflicting gene trees is a central problem in systematic biology. Moreover, it becomes particularly problematic when taxon coverage is patchy, so that not all taxa are present in every gene tree. Here, we list four apparently desirable properties that a method for estimating a species tree from gene trees could have (the strongest property states that building a species tree from input gene trees and then pruning leaves gives a tree that is the same as, or more resolved than, the tree obtained by first removing the taxa from the input trees and then building the species tree). We show that while it is technically possible to simultaneously satisfy these properties when taxon coverage is complete, they cannot all be satisfied in the more general supertree setting. In part two, we discuss a concordance-based consensus method based on Baum's 'plurality clusters', and an extension to concordance supertrees. Comment: 19 pages, 2 figure

    Reweaving the tapestry: a supertree of birds

    Supertrees are a useful method of constructing large-scale phylogenies by assembling numerous smaller phylogenies that have some, but not necessarily all, taxa in common. Birds are an obvious candidate for supertree construction as they are the most abundant land vertebrates on the planet and no comprehensive phylogeny of both extinct and extant species currently exists. In order to construct supertrees, primary analysis of characters is required. One such study, presented here, describes two new partial specimens belonging to the Primobucconidae from the Green River Formation of Wyoming (USA), which were assigned to the species Primobucco mcgrewi. Although incomplete, these specimens had preserved anatomical features not seen in other material. An attempt to further constrain their phylogenetic position was inconclusive, showing only that the Primobucconidae belong in a clade containing the extant Coraciiformes and related taxa. Over 700 such studies were used to construct a species-level supertree of Aves containing over 5000 taxa. The resulting tree shows the relationships between the main avian groups, with only a few novel clades, some of which can be explained by a lack of information regarding those taxa. The tree was constructed using a strict protocol which ensures robust, accurate and efficient data collection and processing; extending previous work by other authors. Before creating the species-level supertree the protocol was tested on the order Galliformes in order to determine the most efficient method of removing non-independent data. It was found that combining non-independent source trees via a “mini-supertree” analysis produced results more consistent with the input source data and, in addition, significantly reduced computational load. Another method for constructing large-scale trees is via a supermatrix, which is constructed from primary data collated into a single, large matrix. 
A molecular-only tree was constructed using both supertree and supermatrix methods, from the same data, again of the order Galliformes. Both methods performed equally well in producing trees that fit the source data. The two methods could be considered complementary rather than conflicting: the supertree took a long time to construct but was very quick to calculate, whereas the supermatrix was quicker to construct but took longer to calculate. Dependent upon the data at hand and the other factors involved, the choice of which method to use appears, from this small study, to be of little consequence. Finally, an updated species-level supertree of the Dinosauria was also constructed and used to look at diversification rates in order to elucidate the “Cretaceous explosion of terrestrial life”. Results from this study show that this apparent burst in diversity at the end of the Cretaceous is a sampling artefact and that, in fact, dinosaurs show most of their major diversification shifts in the first third of their history.

    Consensus and Confusion in Molluscan Trees: Evaluating Morphological and Molecular Phylogenies

    Mollusks are the most morphologically disparate living animal phylum; they have diversified into all habitats and have a deep fossil record. Monophyly and identity of their eight living classes are undisputed, but relationships between these groups and patterns of their early radiation have remained elusive. Arguments about traditional morphological phylogeny focus on a small number of topological concepts but often without regard to proximity of the individual classes. In contrast, molecular studies have proposed a number of radically different, inherently contradictory, and controversial sister relationships. Here, we assembled a data set of 42 unique published trees describing molluscan interrelationships. We used these data to ask several questions about the state of resolution of molluscan phylogeny compared with a null model of the variation possible in random trees constructed from a monophyletic assemblage of eight terminals. Although 27 different unique trees have been proposed from morphological inference, the majority of these are not statistically different from each other. Within the available molecular topologies, only four studies to date have included the deep sea class Monoplacophora; but 36.4% of all trees are not significantly different. We also present supertrees derived from two data partitions and three methods, including all available molecular molluscan phylogenies, which will form the basis for future hypothesis testing. The supertrees presented here were not constructed to provide yet another hypothesis of molluscan relationships, but rather to algorithmically evaluate the relationships present in the disparate published topologies. Based on the totality of available evidence, certain patterns of relatedness among constituent taxa become clear. The internodal distance is consistently short between a few taxon pairs, particularly supporting the relatedness of Monoplacophora and the chitons, Polyplacophora.
Other taxon pairs are rarely or never found in close proximity, such as the vermiform Caudofoveata and Bivalvia. Our results have specific utility for guiding constructive research planning to better test relationships in Mollusca as well as other problematic groups. Taxa with consistently proximate relationships should be the focus of a combined approach in a concerted assessment of potential genetic and anatomical homology, whereas unequivocally distant taxa will make the most constructive choices for exemplar selection in higher-level phylogenomic analyses.