32 research outputs found

    Network-based methods for biological data integration in precision medicine

    The vast and continuously increasing volume of biomedical data produced over recent decades opens new opportunities for large-scale modelling of disease biology, facilitating a more comprehensive and integrative understanding of its processes. Nevertheless, this type of modelling requires highly efficient computational systems capable of handling such data volumes. Computational approaches commonly used in machine learning and data analysis, namely dimensionality reduction and network-based approaches, have been developed with the goal of effectively integrating biomedical data. Among these methods, network-based machine learning stands out for its biomedical interpretability. These methodologies provide a highly intuitive framework for the integration and modelling of biological processes. This PhD thesis explores how integrating complementary biomedical knowledge with patient-specific data can provide novel computational approaches for biomedical scenarios characterized by data scarcity. The primary focus is on studying how high-order graph analysis (i.e., community detection in multiplex and multilayer networks) may help elucidate the interplay of different types of data in contexts where statistical power is heavily impacted by small sample sizes, such as rare diseases and precision oncology. The thesis illustrates how network biology, among the several data integration approaches with the potential to achieve this task, can play a pivotal role in addressing this challenge given its advantages in molecular interpretability.
Through its insights and methodologies, it shows how network biology, and in particular models based on multilayer networks, helps bring the vision of precision medicine to these complex scenarios, providing a natural approach for discovering new biomedical relationships that overcomes the difficulties of studying cohorts with limited sample sizes (data-scarce scenarios). Examining the potential of current artificial intelligence (AI) and network biology applications to address data granularity issues in precision medicine, this PhD thesis presents research works, based on multilayer networks, that analyse two rare disease scenarios with specific data granularities, overcoming the classical constraints that hinder rare disease and precision oncology research. The first research article presents a personalized medicine study of the molecular determinants of severity in congenital myasthenic syndromes (CMS), a group of rare disorders of the neuromuscular junction (NMJ). The analysis of severity in rare diseases, despite its importance, is typically neglected because of limited data availability. In this study, modelling biomedical knowledge via multilayer networks made it possible to understand the functional implications of individual mutations in the cohort under study, as well as their relationships with the causal mutations of the disease and the different levels of severity observed. Moreover, the study presents experimental evidence of the role of a previously unsuspected gene in NMJ activity, validating the hypothetical role predicted using the newly introduced methodologies. The second research article focuses on the applicability of multilayer networks for gene prioritization.
Building on concepts for the analysis of different data granularities first introduced in the previous article, the research provides a methodology based on the persistence of network community structures across a range of modularity resolutions, yielding a new framework for gene prioritization for patient stratification. In summary, this PhD thesis presents major advances in the use of multilayer network-based approaches for applying precision medicine to data-scarce scenarios, exploring the potential of integrating extensive available biomedical knowledge with patient-specific data.

    Stratification of patient subgroups using high-dimensional and time-series observations

    Precision medicine and patient stratification are expanding as a result of innovations in high-throughput technologies applied to clinical medicine. Stratification can explain differences in disease trajectories and outcomes in heterogeneous cohorts. Thus, approaches employed for patient treatment can be tailored by taking into account individual variabilities and specificities. This thesis focuses on clustering approaches and how they can be applied to both single-time-point and time-series high-dimensional data for the identification of disease subtypes defined by distinct mechanisms, also called endotypes, in complex and/or heterogeneous diseases. Multiple carefully selected clustering strategies were compared to determine which would produce the most relevant stratification in terms of mathematical robustness and biological meaning, both of which were quantified using standardised methods. More specifically, this strategy was applied to time-series multi-omics data from a cohort of patients with acute pancreatitis, an inflammatory disease of the pancreas. Using these high-dimensional multi-omics data as well as routine lab and clinical measurements, the cohort was stratified into four subgroups. Findings from the analysis of acute pancreatitis data showed that two of the four subgroups could be detected in another syndrome, acute respiratory distress syndrome, suggesting that inflammatory signatures are comparable between diseases. With the aim of applying these principles to other diseases, and building on preliminary results from other studies suggesting that relevant subgroups might be identified, data from inflammatory bowel disease and Parkinson's disease cohorts were analysed. Results from our analyses confirmed that disease knowledge could be gained using this approach. Work from this thesis provides novel approaches for the application and evaluation of stratification methods.
Furthermore, results may constitute a basis for the development of tailored treatment approaches for acute pancreatitis, acute respiratory distress syndrome, inflammatory bowel disease and Parkinson’s disease. Also, the observation of commonalities between distinct inflammatory diseases will broaden the perspectives when analysing disease data and more specifically, in biomarker discovery and drug development processes

    Recent publications from the Alzheimer's Disease Neuroimaging Initiative: Reviewing progress toward improved AD clinical trials

    INTRODUCTION: The Alzheimer's Disease Neuroimaging Initiative (ADNI) has continued development and standardization of methodologies for biomarkers and has provided an increased depth and breadth of data available to qualified researchers. This review summarizes the over 400 publications using ADNI data during 2014 and 2015. METHODS: We used standard searches to find publications using ADNI data. RESULTS: (1) Structural and functional changes, including subtle changes to hippocampal shape and texture, atrophy in areas outside of hippocampus, and disruption to functional networks, are detectable in presymptomatic subjects before hippocampal atrophy; (2) In subjects with abnormal β-amyloid deposition (Aβ+), biomarkers become abnormal in the order predicted by the amyloid cascade hypothesis; (3) Cognitive decline is more closely linked to tau than Aβ deposition; (4) Cerebrovascular risk factors may interact with Aβ to increase white-matter (WM) abnormalities which may accelerate Alzheimer's disease (AD) progression in conjunction with tau abnormalities; (5) Different patterns of atrophy are associated with impairment of memory and executive function and may underlie psychiatric symptoms; (6) Structural, functional, and metabolic network connectivities are disrupted as AD progresses.
Models of prion-like spreading of Aβ pathology along WM tracts predict known patterns of cortical Aβ deposition and declines in glucose metabolism; (7) New AD risk and protective gene loci have been identified using biologically informed approaches; (8) Cognitively normal and mild cognitive impairment (MCI) subjects are heterogeneous and include groups typified not only by "classic" AD pathology but also by normal biomarkers, accelerated decline, and suspected non-Alzheimer's pathology; (9) Selection of subjects at risk of imminent decline on the basis of one or more pathologies improves the power of clinical trials; (10) Sensitivity of cognitive outcome measures to early changes in cognition has been improved and surrogate outcome measures using longitudinal structural magnetic resonance imaging may further reduce clinical trial cost and duration; (11) Advances in machine learning techniques such as neural networks have improved diagnostic and prognostic accuracy especially in challenges involving MCI subjects; and (12) Network connectivity measures and genetic variants show promise in multimodal classification and some classifiers using single modalities are rivaling multimodal classifiers. DISCUSSION: Taken together, these studies fundamentally deepen our understanding of AD progression and its underlying genetic basis, which in turn informs and improves clinical trial design.

    An investigation of genomic instability and its impact on cancer development and heterogeneity

    Genomic instability (GIN), a genomic state facilitating large scale chromosomal rearrangements, is a hallmark of cancer. GIN can contribute to oncogenesis by disrupting genes, and leading to copy number aberrations (CNAs), the gain or loss of genomic segments. In this thesis I describe two projects linked by the overarching theme of GIN, outlined below: Project 1: Copy-number aberrations (CNAs) contribute to clonal diversity within cancer, with clinical implications. Breast cancer is one such example, but the effect of CNAs on gene expression in intra-tumour subclonal populations has not been properly characterised. Due to sequencing technology limits and lack of computational methods, it is difficult to assess CNAs at a subclonal level. Here, I have benchmarked the ‘InferCNV’ computational method and used it to infer single cell CNA profiles from 14 primary breast cancer single cell RNA-sequencing (scRNA-seq) datasets. I reveal diverse intratumoural heterogeneity involving at least four subclonal populations per tumour. Finally, I identify subclones with expression/CNA profiles indicative of metastatic potential, involving differential regulation of metastasis associated genes such as MUCL1, BST2 and IGFBP5. Project 2: High-grade serous ovarian cancer (HGSOC) is characterised by widespread GIN. Drivers of GIN include deficient DNA repair and amplification of Cyclin E1, however no major cause is known for one third of tumours. Deregulation of repetitive elements may contribute to GIN in HGSOC. It is difficult to investigate repetitive elements from sequencing data as they map to multiple places within the genome. I have quantified repetitive RNA in 99 high-grade serous ovarian cancer (HGSOC) and matched control RNA-seq datasets to determine their potential contribution to GIN. I identified retrotransposons which are deregulated in HGSOC, which may have been active during cancer development. 
Some of these retrotransposons were enriched at structural variant breakpoints, indicating potential causality. Finally, I identified retrotransposon-associated structural variants in proximity to deregulated oncogenes implicated in homologous DNA repair, which may have modulated their expression and contributed to cancer development. In summary, I have explored both a cause (retrotransposons) and consequence (CNA-based heterogeneity) of GIN in cancer, and shown how GIN can contribute to the modulation of cancer-associated genes which influence cancer development and outcomes

    Moving Beyond Genome-wide Association Studies

    In the last two decades, thousands of genome-wide association studies (GWAS) have been published, describing hundreds of thousands of variant-trait associations across a diverse set of phenotypes. The ubiquity of these studies, however, does not mitigate their significant limitations, including the inability, in many cases, to illustrate the molecular mechanisms underlying these associations. To bridge this gap between association and biological function, a plethora of methodologies have been introduced that move beyond interrogation of the genome at the variant level. Transcriptome-wide association studies (TWAS) examine the association between imputed gene expression and traits of interest, and in doing so reduce the multiple testing burden that plagues GWAS while offering biological rationales for such associations. Many such methods have been introduced in the last five years; however, most do not account for the uncertainty in genotype that arises from imputation. We present a new Bayesian TWAS method, inspired by the BayesR framework, that explicitly models well- and poorly-imputed variants under differing assumptions, allowing for more flexibility in the training step where models to predict gene expression values are built. This method is compared to existing methods using simulated data, demonstrating improved accuracy and power in certain scenarios as well as conservation of Type I error. Predictive performance versus elastic net, which is utilized by PrediXcan, a popular state-of-the-art TWAS method, is measured using real RNA sequencing (RNA-seq) data generated by the Depression Genes and Network (DGN) consortium.
Chromosome conformation capture (3C) techniques have allowed for analysis of the spatial organization of chromatin within the cell nucleus, and the identification of regions that are in close 3-dimensional (3D) proximity provides insight into regulatory pathways that would be hidden from strictly 1-dimensional (1D) analyses such as GWAS or 1D epigenetic footprints similar to those generated by the ENCODE or Roadmap Epigenomics consortia. HiChIP and PLAC-seq (collectively referred to as HP) are emerging 3C technologies for studying genome-wide long-range chromatin interactions mediated by proteins of interest, enabling more sensitive and cost-efficient interrogation of protein-centric chromatin conformation compared to previous Hi-C methods. We present a stratified and weighted correlation metric, derived from normalized contact counts, for quantification of reproducibility in HP data. Our method is applied to multiple real datasets and is shown to outperform existing methods developed for data generated from Hi-C, a widely used genome-wide 3C technology. Furthermore, in a complex PLAC-seq dataset consisting of 11 samples from four types of human brain cells, our method demonstrates expected clustering of data that could not be reproduced using existing methods developed for Hi-C data. Continuing work in the arena of HP data analysis, we present HPTAD, a method for the identification of topologically associating domains (TADs) using HP data. TADs are contiguous regions of the genome characterized by a higher frequency of within-region interactions relative to between-region interactions; they are implicated in gene regulation and their disruption is associated with a variety of diseases, including cancer. We compare HPTAD to several publicly available tools used to identify TADs from Hi-C input data and demonstrate improved performance relative to “ground truth” TAD regions and boundaries in both mouse and human cell lines. 
Furthermore, we demonstrate excellent consistency between results obtained from biological replicates and also observe CTCF enrichment at TAD boundaries identified using HPTAD. Doctor of Philosophy
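
    The definition of a TAD given in this abstract (more frequent within-region than between-region interactions) can be illustrated with a minimal Python sketch. It is not HPTAD and does not reproduce any method from the thesis: the function name `within_between_ratio`, the toy 6-bin contact matrix and the simple ratio of mean contact counts are all assumptions made here for illustration.

```python
def within_between_ratio(contacts, start, end):
    """Ratio of mean within-region contacts to mean between-region contacts.

    `contacts` is a symmetric square matrix (list of lists) of contact counts
    between genomic bins; the candidate region is the half-open bin range
    [start, end). This is an illustrative score only, not the HPTAD method.
    """
    n = len(contacts)
    inside = range(start, end)
    outside = [j for j in range(n) if j < start or j >= end]
    # Contacts between distinct bins that both lie inside the region.
    within = [contacts[i][j] for i in inside for j in inside if i < j]
    # Contacts between a bin inside the region and a bin outside it.
    between = [contacts[i][j] for i in inside for j in outside]
    if not within or not between:
        raise ValueError("region must contain >= 2 bins and leave >= 1 bin outside")
    mean_within = sum(within) / len(within)
    mean_between = sum(between) / len(between)
    if mean_between == 0:
        return float("inf")
    return mean_within / mean_between


# Toy matrix (assumed data): bins 0-2 and bins 3-5 form two dense blocks.
contacts = [[10 if (i < 3) == (j < 3) else 1 for j in range(6)] for i in range(6)]

domain_like = within_between_ratio(contacts, 0, 3)  # region matching a block
straddling = within_between_ratio(contacts, 1, 5)   # region crossing the boundary
```

    On this toy matrix the region that coincides with a dense block scores well above 1, while a region that straddles the block boundary scores below 1, which is the contrast the TAD definition describes.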

    Quantifying Human Dietary Change over the Last 30,000 Years

    Dietary change has been linked to many aspects of human evolution over the last three million years, including tool use, brain size increase, aerobic capacity and gut biology. Furthermore, failure to adapt to dietary changes over the last 10,000 years has been implicated in a number of complex and chronic diseases including obesity, type II diabetes, some cancers and coronary heart disease. Such ‘diseases of modernity’ are more common in agrarian and industrial societies than among hunter-gatherers, and it has been argued that this is due to a mismatch between modern diets and the ancestral diets to which our metabolism should be optimised. The aims of this research have grown out of the qualitative studies that perpetuate narratives around human and hominin diets, particularly around the central theme of dietary mismatch and ‘paleo’-named diets. In this work, I investigate nutrient-level differences between modern post-industrial diets, modern hunter-gatherer diets, prehistoric (Palaeolithic, Neolithic and Bronze Age) diets reconstructed from archaeological data, clinical intervention diets, fad diets including The Paleo Diet, Keto Diet and Atkins Diet, fast food diets and milk. Using these data, I develop a hypothesis on the evolution of dietary choice. Modern diets are enriched for certain nutrients, for some of which we have strong taste avidities (e.g. sodium, sucrose, starch, certain fatty acids). By quantifying differences in inferred nutrient profiles between ancestral and modern diets, I examine the nutrients enriched in modern diets, the trajectories of nutrient composition change through time, what might be driving these changes, and why we have evolved taste preferences for some nutrients that in a modern setting are considered ‘unhealthy’. I also examine how nutrients correlate in ancestral foods and explore if avidities for nutrients enriched in modern diets would lead to healthy nutrient profiles in an ancestral setting

    Optimum Average Silhouette Width Clustering Methods

    Cluster analysis is the search for groups of similar instances in data. The two major problems in cluster analysis are how many clusters are present in the data and how the actual clustering solution can be found. We have developed a unified approach that estimates the number of clusters and the clustering solution jointly. This work covers the theory, methodology and algorithms of the newly proposed approach. Average silhouette width (ASW) is a well-known index for measuring clustering quality and for estimating the number of clusters. The index is in wide use across disciplines as standard practice for these tasks. In this work, clustering methodologies are proposed that estimate the number of clusters on the fly and produce the clustering for this estimated number by optimizing the ASW index. The performance of the ASW index for these two tasks is thoroughly investigated. ASW-based clustering functions are proposed for the two most popular clustering domains, i.e., hierarchical and non-hierarchical. The clustering solutions obtained from the proposed methods are compared with those of a range of clustering methods to evaluate their quality. The proposed methods' estimates of the number of clusters are compared against a wide spectrum of cluster estimation indices and methods. For this, large-scale studies of the estimation of the number of clusters have been conducted with well-reputed clustering methods to determine each method's estimation performance with different indices/methods for various kinds of clustering structures. Developing mathematical and theoretical aspects of clustering is a relatively new and challenging avenue. Recently this research domain has received considerable attention owing to the present need for, and importance of, a theory of clustering.
The purpose behind the theory development for clustering is to make the general nature of clustering more understandable without assuming particular data generating structures and independently from any clustering algorithm/functions. Lastly, a considerable amount of attention has been drawn towards the theory development of the ASW index in the latter part of the thesis
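
    The ASW index discussed in this abstract can be computed directly from its standard definition, s(i) = (b(i) - a(i)) / max(a(i), b(i)), averaged over all points. The Python sketch below is a minimal illustration: the function name, the toy one-dimensional data and the two hand-written candidate labelings are assumptions, and the thesis's own optimisation algorithms are not reproduced here. It only shows how comparing ASW values across candidate clusterings selects a number of clusters.

```python
from math import dist


def average_silhouette_width(points, labels):
    """Average of s(i) = (b - a) / max(a, b) over all points.

    a: mean distance from point i to the other members of its own cluster.
    b: smallest mean distance from point i to the members of any other cluster.
    Points in singleton clusters get s(i) = 0 (the usual convention).
    """
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("ASW is only defined for two or more clusters")

    total = 0.0
    for i, lab in enumerate(labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # singleton cluster contributes s(i) = 0
        a = sum(dist(points[i], points[j]) for j in own if j != i) / (len(own) - 1)
        b = min(
            sum(dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        if denom > 0:  # identical points everywhere would give 0/0; treat as 0
            total += (b - a) / denom
    return total / len(points)


# Toy data (assumed): two well-separated groups on a line.
points = [(0.0,), (0.2,), (0.4,), (5.0,), (5.2,), (5.4,)]

# Hand-written candidate clusterings for k = 2 and k = 3 (illustrative only).
candidates = {
    2: [0, 0, 0, 1, 1, 1],
    3: [0, 0, 1, 2, 2, 2],
}

scores = {k: average_silhouette_width(points, labs) for k, labs in candidates.items()}
best_k = max(scores, key=scores.get)  # number of clusters with the highest ASW
```

    In practice the candidate labelings would come from a clustering method run for each k, or, as the abstract describes, from a procedure that optimises ASW directly.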

    Untying Gordian knots: The evolution and biogeography of the large European apomictic polyploid Ranunculus auricomus plant complex

    Polyploidy, the presence of two or more full genomic complements, repeatedly occurs across the tree of life.
In plants, not only the economic but particularly the evolutionary importance is overwhelming. Polyploidization events, probably connected to key innovations (e.g., vessel elements or the carpel), occurred frequently in the evolutionary history of flowering plants, which are the most species-rich group in the plant kingdom (ca. 370,000 species) and contain 30–70% neopolyploids. Polyploidy and hybridization (i.e., allopolyploidy) are particularly considered to create biotypes with novel genomic compositions and to be key factors for subsequent speciation and macroevolution. In plants, both processes are frequently connected to apomixis, i.e., the reproduction via asexually-formed seeds. However, the enigmatic phenomenon of plant speciation accompanied by polyploidy and apomixis is still poorly understood despite tremendous progress in the field of genomics. The question of “What is a species?” is of highest priority for evolutionary biologists: Species are the fundamental units for biodiversity, and further evolutionary and ecological research relies on well-defined entities. Evolutionarily young plant species complexes offer a unique opportunity to study plant speciation and accompanying processes. They usually comprise a few sexual progenitor species, and numerous polyploid, partly apomictic, hybrid derivatives. In apomictic lineages, the lack of recombination and cross-fertilization can result in numerous clonal lineages with fixed morphological and ecological traits (agamospecies). Nevertheless, even recognizing and delimiting the sexual progenitors of species complexes is methodically challenging due to low genetic divergence, possible hybrid origins, ongoing gene flow, and/or incomplete lineage sorting (ILS). Integrative approaches using both genomic and morphometric data for disentangling the young progenitors are still lacking so far. The biogeography and evolution of those plant complexes is even more challenging. 
Apomicts frequently occupy larger areas or more northern regions compared to their sexual relatives, a phenomenon called geographical parthenogenesis (GP). GP patterns usually have a Pleistocene context because climatic range shifts in temperate to boreal zones offered frequent opportunities for interspecific hybridization, probably giving rise to apomixis in the Northern Hemisphere. The factors shaping GP patterns remain controversial. GP has been widely attributed to advantages of apomicts caused by polyploidy and uniparental reproduction, i.e., fixed levels of high heterozygosity leading to increased stress tolerance, and self-fertility leading to better colonizing capabilities. On the one hand, the complex interactions of genome-wide heterozygosity, ploidy, reproduction mode (sexual versus asexual), and climatic environmental factors shaping GP have not been studied sufficiently. On the other hand, potential disadvantages of sexual progenitors in fitness and genetic diversity due to their breeding system have received even less attention. Finally, alongside biogeography, the reticulate relationships and the genome composition and evolution of young, large polyploid plant species complexes have not yet been deciphered comprehensively. Besides challenges arising from the large number of polyploidization and hybridization events, bioinformatic analyses are also often hampered by missing information on progenitors, ploidy levels, and reproduction modes. The European apomictic polyploid Ranunculus auricomus (goldilocks buttercup) plant complex is well suited to studying all of the aforementioned issues. The majority of goldilocks buttercups probably arose from hybridization of a few sexual progenitors, leading to more than 800 described, morphologically highly diverse agamospecies. Sexuals are estimated to have speciated less than 1.0 million years ago, and agamospecies are probably much younger.
In this thesis, using R. auricomus as a model system, I examined the recalcitrant and hitherto poorly understood phylogenetic, genomic, and biogeographical relationships of young polyploid apomictic plant complexes. I developed a comprehensive theoretical and bioinformatic workflow, starting with analyzing the evolution of the sexual progenitor species, continuing with unraveling the reproduction modes and biogeography of apomictic polyploids, and ending with revealing the reticulate origins and the genome composition and evolution of the polyploid complex. Spanning up to 251 populations and 87 R. auricomus taxa Europe-wide, this work gathered data on 97,312 genomic loci (RADseq), 663 nuclear genes (target enrichment), and 71 plastid regions, as well as 1,474 leaf ploidy, 4,669 reproductive seed, 284 reproductive crossing (seed set), and 1,593 geometric morphometric measurements. First, phylogenomics based on RADseq, nuclear gene, and geometric morphometric data supported the lumping of the twelve described sexual morphospecies into five newly circumscribed progenitor species. These species represent clearly distinguishable main genetic lineages or clusters, which are both geographically well isolated and morphologically differentiated: R. cassubicifolius s.l., R. envalirensis s.l., R. flabellifolius, R. marsicus, and R. notabilis s.l. Mainly within-clade reticulate relationships, the absence of geographical isolation, and a lack of distinctive morphological characters led to this taxonomic treatment. Interestingly, allopatric speciation events took place ca. 0.83–0.58 million years ago during a period of severe climatic oscillations, and were probably triggered by vicariance processes of a widespread European forest-understory ancestor. Sexual species re-circumscriptions were additionally supported by population crossing experiments.
Besides inbreeding depression, outbreeding benefits, and sudden self-compatibility, crossings also revealed a lack of reproductive barriers among some of the formerly described morphospecies. Moreover, flow cytometric ploidy and reproductive data, RADseq data, and environmental data were combined into a genetically informed path analysis based on Generalized Linear Mixed Models (GLMMs). The analysis unveiled a complex European GP scenario, whereby diploids compared to polyploids showed significantly higher sexuality (percent of sexual seeds), more petals (petaloid nectary leaves), and up to three times lower genome-wide heterozygosity. Surprisingly, sexuality was positively associated with solar radiation and isothermality, and heterozygosity was positively related to temperature seasonality. The results fit the southern distribution of diploid sexuals and suggest a higher resistance of polyploid apomicts to more extreme climatic conditions. Finally, a self-developed, multidisciplinary workflow incorporating all previously gathered data demonstrated, for the first time, the predominantly allopolyploid origin, genome composition, and post-origin genome evolution of the R. auricomus complex. Taxa were organized in only three to five supported, north-south distributed clades or clusters, each usually containing diploid sexual progenitor species. Allopolyploidizations involved two to three different diploid sexual subgenomes per event. Only one autotetraploid event was detected. Allotetraploids were characterized by subgenome dominance and enormous post-origin evolution, i.e., Mendelian segregation of hybrid generations, back-crossing to parents, and/or gene flow due to facultative sexuality of apomicts. Four diploid sexual progenitors and a previously unknown, now extinct progenitor probably gave rise to the more than 800 taxa of the European R. auricomus complex.
Analyses also showed that the majority of analyzed polyploid agamospecies are non-monophyletic and that similar morphotypes probably originated multiple times. The lack of monophyly calls for a comprehensive taxonomic revision of the entire complex. In the General Discussion, I combine my thesis results with existing plant studies on diploid sexual and polyploid apomictic phylogenetics, biogeography, and genome composition and evolution of young species complexes. I explain the taxonomic conclusions and how species complexes link micro- and macroevolutionary processes. Finally, I give the conclusions of my thesis and an outlook on the project and the field of polyploid phylogenetics.
2021-10-2