
    Supertree-like methods for genome-scale species tree estimation

    A critical step in many biological studies is the estimation of evolutionary trees (phylogenies) from genomic data. Of particular interest is the species tree, which illustrates how a set of species evolved from a common ancestor. While species trees were previously estimated from a few regions of the genome (genes), it is now widely recognized that biological processes can cause the evolutionary histories of individual genes to differ from each other and from the species tree. This heterogeneity across the genome is phylogenetic signal that can be leveraged to estimate species evolution with greater accuracy. Hence, species tree estimation is expected to be greatly aided by current large-scale sequencing efforts, including the 5000 Insect Genomes Project, the 10000 Plant Genomes Project, the (~60000) Vertebrate Genomes Project, and the Earth BioGenome Project, which aims to assemble genomes (or at least genome-scale data) for 1.5 million eukaryotic species in the next ten years. To analyze these forthcoming datasets, species tree estimation methods must scale to thousands of species and tens of thousands of genes; however, many of the current leading methods, which are heuristics for NP-hard optimization problems, can be prohibitively expensive on datasets of this size. In this dissertation, we argue that new methods are needed to enable scalable and statistically rigorous species tree estimation pipelines; we then seek to address this challenge through the introduction of three supertree-like methods: NJMerge, TreeMerge, and FastMulRFS. For these methods, we present theoretical results (worst-case running time analyses and proofs of statistical consistency) as well as empirical results on simulated datasets (and a fungal dataset for FastMulRFS). Overall, these methods enable statistically consistent species tree estimation pipelines that achieve accuracy comparable to the dominant optimization-based approaches while dramatically reducing running time.

    Computing Robinson-Foulds supertree for two trees

    Supertree problems are important in phylogeny estimation. Supertree construction takes a set of input trees on subsets of species and aims to find a supertree containing all species, subject to some combinatorial or statistical criterion. As such, it can be used to combine trees estimated by different research projects, to construct species trees from gene trees that may not contain all species, or as a component of divide-and-conquer pipelines that improve the scalability of large-scale phylogeny estimation. Yet the most promising supertree methods, such as the popular Robinson-Foulds Supertree (RFS) methods, cannot guarantee an optimal solution and are also computationally intensive, as they are heuristics for NP-hard optimization problems. We present the first polynomial-time algorithm to exactly solve the RFS problem on two binary input trees, and prove that finding the Robinson-Foulds Supertree of three input trees is NP-hard. We present GreedyRFS, a greedy heuristic for the Robinson-Foulds Supertree problem that repeatedly applies our exact algorithm for RFS to pairs of trees until all the trees are merged into a single supertree. Our experiments show that GreedyRFS has better accuracy than FastRFS, the leading heuristic for RFS, when the number of input trees is small, which is the natural case for use within divide-and-conquer pipelines.
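
    As background for the abstract above, the following is a minimal sketch of the Robinson-Foulds (RF) distance that the RFS criterion is built on, for two trees on the same leaf set. It is not the exact two-tree RFS algorithm or GreedyRFS described in the abstract. The nested-tuple tree encoding and the function names are assumptions made for illustration only.

```python
from typing import FrozenSet, Set, Tuple, Union

# Hypothetical encoding: a leaf is a string, an internal node is a tuple of subtrees.
Tree = Union[str, Tuple["Tree", ...]]


def leaves(tree: Tree) -> FrozenSet[str]:
    """Return the set of leaf labels in the tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def bipartitions(tree: Tree) -> Set[FrozenSet[str]]:
    """Return the non-trivial bipartitions (splits) induced by the tree's edges.

    Each split is stored as the side that does NOT contain a fixed reference
    leaf, so the result does not depend on where the tree is rooted.
    """
    all_leaves = leaves(tree)
    ref = min(all_leaves)
    splits: Set[FrozenSet[str]] = set()

    def visit(node: Tree) -> FrozenSet[str]:
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(visit(child) for child in node))
        side = all_leaves - below if ref in below else below
        # Skip trivial splits that separate zero or one leaf from the rest.
        if 1 < len(side) < len(all_leaves) - 1:
            splits.add(side)
        return below

    visit(tree)
    return splits


def rf_distance(t1: Tree, t2: Tree) -> int:
    """Count the bipartitions found in exactly one of the two trees."""
    if leaves(t1) != leaves(t2):
        raise ValueError("RF distance here requires identical leaf sets")
    return len(bipartitions(t1) ^ bipartitions(t2))


t1 = ((("A", "B"), "C"), ("D", "E"))
t2 = ((("A", "C"), "B"), ("D", "E"))
print(rf_distance(t1, t2))  # 2: the splits ABC|DE agree, AB|CDE and AC|BDE do not
```

    In the supertree setting the input trees cover different subsets of species, so the distance is usually computed after restricting the candidate supertree to each input tree's leaf set; that restriction step is omitted here.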

    Phylogenomic systematics, bioacoustics, and morphology of frogs from Madagascar reveals that background noise drives the evolution of high frequency acoustic signaling

    Madagascar is considered a globally important biodiversity hotspot, with some of the highest rates of species endemism in the world, and it has been the focus of substantial research effort to understand the evolution of its distinctive biota. This is especially true for frogs: the microcontinent hosts an impressively diverse amphibian fauna of over 500 species, much of which has been described only relatively recently. Much of this accelerated taxonomic progress can be attributed to the combination of DNA barcoding and the widespread application of bioacoustics, which enable more efficient species identification and characterization of new lineages. The availability of bioacoustic data for the majority of species has revealed the incredible diversity of acoustic signals in Malagasy frogs, yet explanations for the evolution of distinct advertisement calls have been little explored. Acoustic signaling is important to frogs because it is the primary mechanism of communication and mate selection; it is therefore expected that acoustic communication, and the factors that drive variation in acoustic signals, should be under strong selection. Frogs of Boophis, the most species-rich genus in the family Mantellidae in Madagascar, communicate acoustically during a short, rainy breeding season and aggregate around water bodies in high abundance, making Boophis an ideal model system to address the evolution of advertisement calls. A potential explanation for signal diversity is the acoustic adaptation hypothesis, which predicts that natural and sexual selection drive optimization of signal transmission and perception across different habitats. In many organisms, the efficiency of acoustic signal transmission can be affected by habitat structure or background noise.
One prediction of the acoustic adaptation hypothesis is that environmental ambient noise, which is the background level of sound in the environment, will drive signal evolution if the noise is sufficiently similar to the signal and lessens the receiver’s perception of it. Because Boophis reproduce near water bodies, most often in stream habitats that might be noisy from the sound of rushing water, they present an excellent system to address acoustic interference. In Chapter 1, I describe a new species of Boophis that is morphologically cryptic with its sister species but differs remarkably in advertisement call, which underlines the importance of call variation in the genus. To contribute to taxonomic progress and the integration of bioacoustic data, I describe this species using multiple lines of evidence and suggest future research to understand the evolution of advertisement calls in the genus. I also develop the acoustic analysis pipeline for rapidly acquiring frequency traits from thousands of calls, which is used in Chapter 5. In this study, I also suggest that reproductive character displacement could be driving divergence in advertisement calls and that other important environmental factors could lead to broad patterns in the evolution of advertisement calls. In Chapter 2, because understanding signal evolution across a broad range of taxa requires a strongly supported phylogenetic hypothesis, I estimate a new multi-locus phylogeny using Sanger sequencing. The systematics of frogs from the family Mantellidae have had a long and turbulent history, and Mantellidae has been recognized as a family only relatively recently. Early researchers assigned Boophis tree frogs to the Asian genus Rhacophorus because of their strong similarities and breeding habitat in water bodies.
Furthermore, despite the numerous works that contributed to understanding the molecular phylogeny of Boophis, many aspects of their evolutionary relationships remain insufficiently supported and a complete Sanger multi-locus phylogeny had not been estimated. To understand the relationships among Boophis frogs, I estimated a multi-locus phylogeny from eight Sanger markers that includes as much diversity as possible. Despite these efforts, I could not adequately support the phylogenetic relationships among Boophis taxa, and adding additional Sanger markers would be unlikely to resolve these relationships. In Chapter 3, I aim to expand upon this dataset and develop a new sequencing technology to obtain thousands of markers affordably. The widespread use of high-throughput sequencing technologies to sequence large portions of organisms’ genomes has led to new and exciting challenges and questions that can be addressed with the massive increase in sequence data these methods provide. The objective of sequence capture is to sequence genomic regions, typically through hybridization-based capture from a previously designed set of known markers, and it benefits from the potential to acquire markers that are useful at all evolutionary time-scales. Therefore, I developed a sequence capture probe set called FrogCap to sequence ~15,000 genomic markers that can be used across the entire frog radiation. I compare the efficacy of the probe set on six phylogenetic scales and quantify the number of markers sequenced, depth of coverage, missing data, and parsimony informative sites, and I also compare differences between these measures across different types of data. The results from this chapter show that FrogCap is a very promising new sequence capture probe set that can be used across all frogs.
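
One of the measures listed above, the number of parsimony informative sites, can be illustrated with a short sketch. It uses the standard definition (a site is parsimony informative when at least two character states each occur in at least two sequences) and is not the pipeline used in the thesis. The function name, the list-of-strings alignment, and the choice to ignore gaps, `?`, and `N` are assumptions made for illustration.

```python
from collections import Counter
from typing import Sequence


def count_parsimony_informative_sites(
    alignment: Sequence[str], ignore: str = "-?N"
) -> int:
    """Count alignment columns with >= 2 states that each occur in >= 2 sequences.

    `alignment` is a list of equal-length aligned sequences (hypothetical input
    format). Characters in `ignore` are treated as missing data (an assumption).
    """
    if len({len(seq) for seq in alignment}) > 1:
        raise ValueError("all sequences must have the same aligned length")

    informative = 0
    for column in zip(*alignment):
        # Tally the observed states in this column, skipping missing data.
        counts = Counter(c.upper() for c in column if c.upper() not in ignore)
        # Keep the site if at least two states are each seen at least twice.
        if sum(1 for n in counts.values() if n >= 2) >= 2:
            informative += 1
    return informative


example = [
    "ACGTA",
    "ACGTT",
    "AGGCA",
    "AGGCT",
]
print(count_parsimony_informative_sites(example))  # 3 (columns 2, 4, and 5)
```

A column with a single variant sequence (a singleton) is variable but not parsimony informative under this definition, which is why the measure is reported separately from overall variability.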
In Chapter 4, I test the effectiveness of FrogCap by addressing the systematics of frogs from the family Mantellidae and comparing the FrogCap sequence capture results to those from transcriptomes. The phylogeny of Mantellidae remains unresolved and contentious, and the phylogenetic relationships among subfamilies and genera are not well delimited. Previously, the different groups in Madagascar were thought to be non-monophyletic and to belong to several different families. I address the weak phylogenetic support and also test the FrogCap probe set on Mantellidae frogs, comparing the probe set to transcriptomic sequencing of samples from the same groups. I find that both FrogCap and transcriptomes work similarly well for resolving these difficult and contentious relationships. FrogCap sequence capture also provided several advantages over the transcriptomic data: FrogCap sequences non-protein-coding markers from across the genome, which could be useful by providing sequence data from potentially neutrally evolving genomic regions and can serve as another line of phylogenetic evidence when compared to other data types. Transcriptomes also provide advantages through a larger amount of sequence data and less gene discordance, because transcriptome sequences are much longer and thus provide more resolution than shorter markers. I find that the FrogCap probe set is an effective tool for disentangling difficult phylogenetic problems, and that transcriptomic sequencing is less effective for phylogenetics. In Chapter 5, I estimate a new phylogeny for Boophis tree frogs using the FrogCap probe set and address whether acoustic interference leads to the evolution of higher frequency advertisement calls in Boophis frogs.
I integrate an unprecedented dataset incorporating data collected in all previous chapters: 300+ acoustic recordings from nearly every species in the genus; a new time-calibrated Boophis phylogeny built on a backbone of ~15,000 genomic markers acquired from the sequence capture data, combined with the data from Chapter 2 for full species sampling; and soft tissue computed tomography of 28 species to acquire detailed morphological information from the larynges of males. After finding that the sequence capture dataset provides strong statistical support for nearly every node in the tree, I time-calibrate the phylogeny and test the acoustic adaptation hypothesis. I find that Boophis tree frogs are evolving higher frequency acoustic signals in loud stream habitats, a result that holds after correcting for body size. These results are further supported by laryngeal measurements from the CT scans, where I find that the laryngeal morphology of loud stream frogs is decoupled from the predicted relationship to body size that is found in quiet stream frogs.