9 research outputs found

    Seeing the Tree of Life behind the phylogenetic forest


    Using Phylogenomic Data to Untangle the Patterns and Timescale of Flowering Plant Evolution

    Angiosperms are one of the most dominant groups on Earth, and have fundamentally changed global ecosystem patterns and function. Therefore, unravelling their evolutionary history is key to understanding how the world around us was formed, and how it might change in the future. In this thesis, I use genome-scale data to investigate the evolutionary patterns and timescale of angiosperms at multiple taxonomic levels, ranging from angiosperm-wide to genus-level data sets. I begin by using the largest combination of taxon and gene sampling thus far to provide a novel estimate for the timing of angiosperm origin in the Triassic period. Through a range of sensitivity analyses, I demonstrate that this estimate is robust to many important components of Bayesian molecular dating. I then explore tactics for phylogenomic dating using multiple molecular clocks. I evaluate methods for estimating the number and assignment of molecular clock models, and strategies for partitioning molecular clock models in analyses of multigene data sets. I also demonstrate the importance of critically evaluating the precision in age estimates from molecular dating analyses. Finally, I assess the utility of plastid data sets for resolving challenging phylogenetic relationships, focusing on Pimelea Banks & Sol. ex Gaertn. Through analysis of a multigene data set, sampled from many taxa, I provide an improved phylogeny for Pimelea and its close relatives. I then generate a plastome-scale data set for a representative sample of species to further refine the Pimelea phylogeny, and characterise discordant phylogenetic signals within their chloroplast genomes. The work in this thesis demonstrates the power of genome-scale data to address challenging phylogenetic questions, and the importance of critical evaluation of both methods and results. Future progress in our understanding of angiosperm evolution will depend on broader and denser taxon sampling, and the development of improved phylogenetic methods.

    Algorithms for Analysis of Heterogeneous Cancer and Viral Populations Using High-Throughput Sequencing Data

    Next-generation sequencing (NGS) technologies have advanced by giant leaps in recent years. Short-read samples now reach millions of reads, and the number of samples has grown enormously in the wake of the COVID-19 pandemic. These data can expose essential aspects of disease transmission and development and reveal the key to treatment. At the same time, single-cell sequencing has progressed from dozens to tens of thousands of cells per sample. These technological advances bring new challenges for computational biology and require the development of scalable, robust methods for a wide range of problems, from epidemiology to cancer studies. The first part of this work focuses on processing virus NGS data. It proposes algorithms that facilitate the initial data analysis steps by filtering genetically related sequencing data, and a tool for investigating intra-host virus diversity, which is vital for biomedical research and epidemiology. The second part addresses single-cell data in cancer studies. It develops evolutionary cancer models involving new quantitative parameters of cancer subclones to better understand the underlying processes of cancer development.

    Investigating Evolutionary History Using Phylogenomics

    Reconstructing the Tree of Life is one of the principal aims of evolutionary biology. The development of molecular phylogenetics has complemented palaeontology, biogeography, and archaeology in elucidating biological history. The development of molecular-clock analyses allowed evolutionary timescales to be estimated using nucleotide sequences and other products of the evolutionary process. Until recently, the twin challenges of molecular dating were obtaining sufficient data and developing robust methods. The former concern is now less important, as high-throughput sequencing technology allows entire genomes to be sampled. Genome-scale data enhance statistical power, but accompanying this wealth of data is a new suite of analytical challenges. One of these key challenges is analysing these data in synthesis with the palaeontological record without statistical overparameterisation. There are also aspects of the evolutionary process, such as among-lineage rate variation, that can affect the precision and accuracy of current methods. In this thesis, I first use the richest nucleotide sequence data set of insects available to estimate an authoritative insect evolutionary timescale that dates the origins and diversification of every major insect order. I then focus on molecular-clock methods by testing their performance in inferring evolutionary rates from time-structured data, common in the study of ancient DNA. I find that among-lineage rate variation and phylo-temporal clustering affect rate estimates. I also study data partitioning, a common technique used to optimise the analysis of multilocus data, in which independent parameters are applied across different subsets of the data. New data from the genomic revolution give biologists new opportunities to re-examine enduring questions about the evolutionary process. Here, I use phylogenetic tools to show that evolution leaves figurative fingerprints on genomes over millions of years.

    Universal pacemaker of genome evolution.

    A fundamental observation of comparative genomics is that the distribution of evolution rates across the complete sets of orthologous genes in pairs of related genomes remains virtually unchanged throughout the evolution of life, from bacteria to mammals. The most straightforward explanation for the conservation of this distribution appears to be that the relative evolution rates of all genes remain nearly constant, or in other words, that the evolutionary rates of different genes are strongly correlated within each evolving genome. This correlation could be explained by a model that we denoted the Universal PaceMaker (UPM) of genome evolution. The UPM model posits that the rate of evolution changes synchronously across genome-wide sets of genes in all evolving lineages. Alternatively, however, the correlation between the evolutionary rates of genes could be a simple consequence of a molecular clock (MC). We sought to differentiate between the MC and UPM models by fitting thousands of phylogenetic trees for bacterial and archaeal genes to supertrees that reflect the dominant trend of vertical descent in the evolution of archaea and bacteria and that were constrained according to the two models. The goodness of fit for the UPM model was better than the fit for the MC model, with overwhelming statistical significance, although, like the MC model, the UPM model is strongly overdispersed. Thus, the results of this analysis reveal a universal, genome-wide pacemaker of evolution that could have been in operation throughout the history of life.
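
    The contrast between the MC and UPM constraints can be illustrated with a toy least-squares fit in log space. This is only a sketch under stated assumptions, not the procedure used in the paper: the function name `fit_residuals`, the branch-length matrix, the divergence times, and the pacemaker values are all hypothetical. Under the MC constraint, every gene's branch lengths are taken to be proportional to fixed branch durations; under the UPM constraint, they are proportional to a shared per-branch pacemaker term that is free to differ from time.

```python
import math

def fit_residuals(lengths, times):
    """Toy fit: return (rss_mc, rss_upm) for a genes x branches matrix.

    lengths[g][b] is the (hypothetical) length of branch b in gene tree g;
    times[b] is the (hypothetical) duration of branch b under a clock.
    """
    logs = [[math.log(x) for x in row] for row in lengths]
    n_genes, n_br = len(logs), len(logs[0])
    log_t = [math.log(t) for t in times]

    # MC: log l[g][b] = a[g] + log t[b]; only the gene rate a[g] is free.
    rss_mc = 0.0
    for row in logs:
        a = sum(v - lt for v, lt in zip(row, log_t)) / n_br
        rss_mc += sum((v - a - lt) ** 2 for v, lt in zip(row, log_t))

    # UPM: log l[g][b] = a[g] + c[b]; the shared per-branch term c[b] is
    # also free, so the least-squares fit is the additive two-way model.
    row_mean = [sum(row) / n_br for row in logs]
    col_mean = [sum(logs[g][b] for g in range(n_genes)) / n_genes
                for b in range(n_br)]
    grand = sum(row_mean) / n_genes
    rss_upm = sum(
        (logs[g][b] - row_mean[g] - col_mean[b] + grand) ** 2
        for g in range(n_genes) for b in range(n_br)
    )
    return rss_mc, rss_upm

# Hypothetical data: three genes whose rates follow a shared pacemaker
# that deviates from clock time on every branch.
times = [1.0, 2.0, 4.0]
pacemaker = [1.5, 1.0, 6.0]
gene_rates = [0.1, 0.5, 2.0]
lengths = [[r * p for p in pacemaker] for r in gene_rates]

rss_mc, rss_upm = fit_residuals(lengths, times)
print(f"MC residual: {rss_mc:.3f}, UPM residual: {rss_upm:.3e}")
```

    In this toy setting the MC model is nested within the UPM model, so the UPM residual can never be larger; a meaningful comparison therefore has to account for the extra per-branch parameters rather than rely on the raw residuals alone.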