
    Inferring Species Trees from Incongruent Multi-Copy Gene Trees Using the Robinson-Foulds Distance

    We present a new method for inferring species trees from multi-copy gene trees. Our method is based on a generalization of the Robinson-Foulds (RF) distance to multi-labeled trees (mul-trees), i.e., gene trees in which multiple leaves can have the same label. Unlike most previous phylogenetic methods using gene trees, this method does not assume that gene tree incongruence is caused by a single, specific biological process, such as gene duplication and loss, deep coalescence, or lateral gene transfer. We prove that it is NP-hard to compute the RF distance between two mul-trees, but it is easy to calculate the generalized RF distance between a mul-tree and a singly-labeled tree. Motivated by this observation, we formulate the RF supertree problem for mul-trees (MulRF), which takes a collection of mul-trees and constructs a species tree that minimizes the total RF distance from the input mul-trees. We present a fast heuristic algorithm for the MulRF supertree problem. Simulation experiments demonstrate that the MulRF method produces more accurate species trees than gene tree parsimony methods when incongruence is caused by gene tree error, duplications and losses, and/or lateral gene transfer. Furthermore, the MulRF heuristic runs quickly on data sets containing hundreds of trees with up to a hundred taxa. Comment: 16 pages, 11 figures
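
    For orientation, the ordinary RF distance that the abstract generalizes can be sketched in a few lines. This is a minimal illustration for rooted, singly-labeled trees only; it is not the mul-tree generalization or the MulRF heuristic described above. The nested-tuple tree encoding and the function names `clades` and `rf_distance` are assumptions made for this example.

```python
def clades(tree):
    """Collect the non-trivial clades of a rooted tree.

    A tree is either a leaf label (str) or a tuple of subtrees
    (hypothetical encoding chosen for this sketch). Each clade is the
    frozenset of leaf labels below one internal node.
    """
    found = set()

    def leaves(node):
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(leaves(child) for child in node))
        found.add(below)
        return below

    all_leaves = leaves(tree)
    # The root clade contains every leaf and carries no information.
    found.discard(all_leaves)
    return found


def rf_distance(t1, t2):
    """Rooted RF distance: number of clades present in exactly one tree."""
    return len(clades(t1) ^ clades(t2))


t1 = ((("A", "B"), "C"), "D")
t2 = ((("A", "C"), "B"), "D")
print(rf_distance(t1, t2))
```

    Here the two trees share the clade {A, B, C} and differ in {A, B} versus {A, C}, so the distance counts those two unshared clades.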

    Algorithms for constructing more accurate and inclusive phylogenetic trees

    Despite the unprecedented outpouring of molecular sequence data in phylogenetics, the current understanding of the tree of life is still incomplete. The widespread applications of phylogenies, ranging from drug design to biodiversity conservation, repeatedly remind us of the need for more accurate and inclusive phylogenies. My thesis addresses some of the underlying challenges by presenting theoretical and empirical results, as well as algorithms, for a range of phylogenetic optimization problems. In the first part of this thesis, I develop a heuristic method for the NP-hard unrooted Robinson-Foulds (RF) supertree problem, and show that it yields more accurate supertrees than those obtained from Matrix Representation with Parsimony (MRP) and the rooted RF heuristic. In the second part, I present an RF-distance-based approach (MulRF) for inferring a species tree from multi-copy gene trees, through a generalization of the RF distance to multi-labeled trees. Through simulation, I show that this approach, which is independent of gene tree discordance mechanisms, produces more accurate species trees than existing methods when incongruence is caused by gene tree error, duplications and losses, and/or lateral gene transfer. Next, I perform a simulation study to evaluate the performance of Gene Tree Parsimony (GTP) under the duplication and duplication-and-loss cost models and compare it to the MulRF method. The objective is to study the effects of various types of sampling (e.g., gene tree and sequence sampling), gene tree error, and duplication and loss rates on the accuracy of the phylogenetic estimates produced by GTP and MulRF. Next, I present efficient error correction algorithms for gene tree reconciliation based on duplication, duplication and loss, and deep coalescence. Finally, I present NP-completeness proofs for two problems whose complexity was previously unknown.

    Exploring Complex Disease Gene Relationships Using Simultaneous Analysis

    The characterization of complex diseases remains a great challenge for biomedical researchers due to the myriad interactions of genetic and environmental factors. Adaptation of phylogenomic techniques to increasingly available genomic data provides an evolutionary perspective that may elucidate important unknown features of complex diseases. Here an automated method is presented that leverages publicly available genomic data and phylogenomic techniques. The approach is tested with nine genes implicated in the development of Alzheimer Disease, a complex neurodegenerative syndrome. The developed technique, an update to a previously described Perl script called “ASAP,” is implemented as a suite of Ruby scripts entitled “ASAP2.” It first compiles a list of sequence-similarity-based orthologues using PSI-BLAST and a recursive NCBI BLAST+ search strategy, then constructs maximum parsimony phylogenetic trees for each set of nucleotide and protein sequences, and finally calculates phylogenetic metrics (partitioned Bremer support values, combined branch scores, and Robinson-Foulds distance) to provide an empirical assessment of evolutionary conservation within a given genetic network. This study demonstrates the potential for using automated simultaneous phylogenetic analysis to uncover previously unknown relationships among disease-associated genes that may not have been apparent using traditional, single-gene methods. Furthermore, the results provide the first integrated evolutionary history of an Alzheimer Disease gene network and identify potentially important co-evolutionary clustering around components of oxidative stress pathways.

    Integrating phylogenomics, biogeography and systematics to explore the taxonomy and the rise of the ratsnakes

    Understanding the evolutionary processes that create the spectacular diversity of organisms, both in species numbers and form, is a primary goal for biologists. Global ratsnakes are a species-rich assemblage with high morphological and ecological diversity and a distribution that encompasses both the Old World (OW) and the New World (NW). To explore the mechanisms leading to the divergence of the ratsnakes, I tested hypotheses regarding the area of origin and global dispersal, and examined the patterns of diversification and trait evolution. Given adaptive radiation via ecological opportunity, a diversity-dependent diversification pattern and an early-burst pattern of trait evolution are expected, with rapid divergence triggered by the appearance of new resources, extinction of competitors, colonization of new areas or the appearance of key innovations. Thus, I tested whether the radiation of ratsnakes follows diversity-dependent diversification with an early burst in speciation and trait divergence, and whether the variation in diversification is associated with OW-NW dispersal or changes in traits. Further, trait convergence between OW and NW lineages was investigated to determine whether, given similar environmental conditions, rapid speciation via ecological opportunity is repeatable. Answering these questions requires a robust phylogenetic tree. Because of potential gene tree/species tree discordance, hundreds of loci sampled across the entire genome were generated using the anchored hybrid enrichment approach, and multi-species coalescent methods were used to build the species phylogeny.
Then, given this phylogenetic context, taxonomic changes were made to reflect named monophyletic groups, and divergence times and ancestral areas were estimated to 1) infer the processes leading to the current global distribution of ratsnakes, 2) assess the best-fitting diversification and trait evolution models, and 3) determine whether ecomorphological convergence occurs with adaptive regimes of traits on the phylogeny. Among all of the inferred species trees, the species tree generated in the program MPEST with summary statistics of posterior probability gene trees was selected for further analysis by comparing the extent of tree discordance and the gene tree errors. First, it was determined that the traditional ratsnake genera Gonyosoma and Coelognathus are excluded from the monophyletic ratsnake group, with the remaining monophyletic group defined as Coronellini. The reconstructed ancestral areas supported an origin of ratsnakes in the OW Eastern Palearctic, with a single dispersal to the NW via Beringia. Two subclades, each defined by a single genus, Lampropeltis and Elaphe, were found to have exclusively elevated species diversification and trait evolutionary rates. As the rate accelerations occurred only in the recently diverged lineages, colonization of the NW and rapid speciation of the NW lineages were decoupled. A general diversity-dependent radiation pattern in both OW and NW lineages was supported, with a recent sharp rise in diversification about 6.5 Ma, mainly within the genera Lampropeltis and Elaphe. Three morphological convergence events were detected among OW and NW lineages, corresponding to the previously defined morphological taxonomies (i.e., Elaphe and Pantherophis), indicating that, without a robust molecular phylogeny, morphological convergence positively misleads taxonomy.
This research demonstrates the advantages and challenges of phylogenetic inference using genome-scale datasets, highlights the importance of incorporating biogeographic history and trait evolution in studies of diversification, and indicates that oversimplified models are insufficient to describe the complexity of processes shaping the diversity in a species-rich assemblage.