1,260 research outputs found

    Human gut microbes’ transmission, persistence, and contribution to lactose tolerance

    Human genotypes and their environment interact to produce selectable phenotypes. How microbes of the human gut microbiome interact with their host genotype to shape phenotype is not fully understood. Microbiota that inhabit the human body are environmentally acquired, yet many are passed intergenerationally between related family members, raising the possibility that they could act like genes. Here, I present three studies aimed at better understanding how certain gut microbiota contribute to host phenotypes. In a first study, I assessed mother to child transmission in understudied populations. I collected stool samples from 386 mother-infant pairs in Gabon and Vietnam, which are relatively under-studied for microbiome dynamics, and in Germany. Using metagenomic sequencing I characterized microbial strain diversity. I found that 25-50% of strains detected in mother-infant pairs were shared, and that strain-sharing between unrelated individuals was rare overall. These observations indicate that vertical transmission of microbes is widespread in human populations. Second, to test whether strains acquired during infancy persist into adulthood (similar to human genes), I collected stool from an adolescent previously surveyed for microbiome diversity as an infant. This dataset represents the longest follow-up to date for the persistence of strains seeded in infancy. I observed two strains that had persisted in the gut despite over 10 years passing, as well as 5 additional strains shared between the subject and his parents. Taken together, the results of these first two studies suggest that gut microbial strains persist throughout life and transmit between host-generations, dynamics more similar to those of the host’s own genome than of their environment. Third, I tested whether gut microbes could confer a phenotype (lactose tolerance) to individuals lacking the necessary genotypes (lactase persistence). 
I assessed 784 women in Gabon, Vietnam and Germany for lactase persistence (genotype) and lactose tolerance (phenotype), and characterized their gut microbiomes through metagenomic sequencing. Despite lacking the lactase-persistence genotype, 13% of participants were lactose tolerant by clinical criteria; I termed this novel phenotype microbially-acquired lactose tolerance (MALT). Those with MALT harbored microbiomes enriched for Bifidobacteria, known lactose degraders. These results indicate that Bifidobacteria, which are passed intergenerationally, can confer a phenotype previously thought to be under only host genetic control. Taken together, my thesis work lends weight to the concept that specific microbes inhabiting the human gut have the potential to behave as epigenetic factors in evolution.

    Charting genomic heterogeneity in tumours : from bulk to single cell

    Tumours do not consist of a single homogeneous population but are complex heterogeneous systems that contain billions of ever-evolving cells, with no two tumours being the same. Tumour heterogeneity is present at three levels: 1) inter-patient heterogeneity; 2) intra-patient heterogeneity; and 3) intra-tumour heterogeneity (ITH). Understanding all levels of heterogeneity is crucial for patient prognosis and treatment choice. To this end, we aimed to improve our understanding of all three levels of tumour heterogeneity. In paper I we investigated the prevalence, type, length, and genomic distribution of 853,218 somatic copy number alterations (SCNAs) across 20,249 tumours belonging to 32 cancer types. Based on the 1) number of SCNAs; 2) percentage of the genome altered; and 3) average SCNA size, we found high levels of inter-patient heterogeneity, both between and within cancer types. We found that specific chromosomes were preferentially lost or gained depending on cancer type. Lastly, we detected co-alterations of key oncogenes and tumour suppressor genes (TSGs). Taken together, we provided a comprehensive analysis of SCNAs across many cancer types as a valuable resource for the community. In paper II we sought to elucidate intra-patient heterogeneity in non-small cell lung cancers (NSCLCs) and their matched brain metastases (BMs). We performed shallow whole-genome sequencing (WGS) on 51 primary NSCLCs and matched BMs, whole-exome sequencing on 40 of the pairs, multi-region sequencing of 15 BMs, and shallow WGS on an additional cohort of 115 BMs. We showed that there is significant intra-patient heterogeneity at the SCNA level, with BM samples showing, on average, more SCNAs than their matched NSCLCs. In contrast, multi-region sequencing of 15 BMs did not show significant ITH at the level of SCNAs. Finally, we identified putative metastatic driver SCNAs and single-nucleotide variants in key TSGs and oncogenes.
In paper III we aimed to assess the level of ITH in early localized prostate cancer. We performed organ-wide, multi-region, single-cell DNA sequencing on two prostate midsections. We found transient chromosomal instability (CIN) in both tumour and normal prostate tissue, evidenced by a large number of cells with unique chromosomal (arm) losses and/or gains. Furthermore, we found three distinct groups of cells within the prostate: 1) diploid cells; 2) pseudo-diploid cells; and 3) monster cells. We observed an enrichment of diploid cells in normal regions and pseudo-diploid cells in tumour-rich regions, while monster cells were equally distributed over the entire prostate, again suggesting that there were elevated CIN levels across the prostate. Lastly, we detected highly localized subclones that were exclusive to tumour-rich regions and harboured deletions in TSGs that are known to be frequently deleted in prostate cancer. Taken together, with this thesis, I have contributed to advancing the understanding of inter-patient, intra-patient, and intra-tumour heterogeneity.

    Multidisciplinary perspectives on Artificial Intelligence and the law

    This open access book presents an interdisciplinary, multi-authored, edited collection of chapters on Artificial Intelligence (‘AI’) and the Law. AI technology has come to play a central role in the modern data economy. Through a combination of increased computing power, the growing availability of data and the advancement of algorithms, AI has now become an umbrella term for some of the most transformational technological breakthroughs of this age. The importance of AI stems from both the opportunities that it offers and the challenges that it entails. While AI applications hold the promise of economic growth and efficiency gains, they also create significant risks and uncertainty. The potential and perils of AI have thus come to dominate modern discussions of technology and ethics – and although AI was initially allowed to largely develop without guidelines or rules, few would deny that the law is set to play a fundamental role in shaping the future of AI. As the debate over AI is far from over, the need for rigorous analysis has never been greater. This book thus brings together contributors from different fields and backgrounds to explore how the law might provide answers to some of the most pressing questions raised by AI. An outcome of the Católica Research Centre for the Future of Law and its interdisciplinary working group on Law and Artificial Intelligence, it includes contributions by leading scholars in the fields of technology, ethics and the law.

    Inter-individual variation of the human epigenome & applications

    Genome-wide association studies (GWAS) have led to the discovery of genetic variants influencing human phenotypes in health and disease. However, almost two decades later, most human traits still cannot be accurately predicted from common genetic variants. Moreover, genetic variants discovered via GWAS mostly map to the non-coding genome and have historically resisted interpretation via mechanistic models. The epigenome, by contrast, lies at the crossroads between genetics and the environment. Thus, there is great excitement about mapping epigenetic inter-individual variation, since its study may link environmental factors to human traits that remain unexplained by genetic variants. For instance, the environmental component of the epigenome may serve as a source of biomarkers for accurate, robust and interpretable phenotypic prediction on low-heritability traits, which cannot be attained by classical genetic-based models. Additionally, research on it may provide mechanisms of action for genetic associations at non-coding regions that mediate their effect via the epigenome. The aim of this thesis was to explore epigenetic inter-individual variation and to mitigate some of the methodological limitations that hinder its future valorisation. Chapter 1 is dedicated to the scope and aims of the thesis. It begins by describing historical milestones and basic concepts in human genetics, statistical genetics, the heritability problem and polygenic risk scores. It then moves towards epigenetics, covering the several dimensions it encompasses. It subsequently focuses on DNA methylation with topics like mitotic stability, epigenetic reprogramming, X-inactivation or imprinting. This is followed by concepts from epigenetic epidemiology such as epigenome-wide association studies (EWAS), epigenetic clocks, Mendelian randomization, methylation risk scores and methylation quantitative trait loci (mQTL).
The chapter ends by introducing the aims of the thesis. Chapter 2 focuses on stochastic epigenetic inter-individual variation resulting from processes occurring post-twinning, during embryonic development and early life. Specifically, it describes the discovery and characterisation of hundreds of variably methylated CpGs in the blood of healthy adolescent monozygotic (MZ) twins showing equivalent variation among co-twins and unrelated individuals (evCpGs) that could not be explained by measurement error on the DNA methylation microarray alone. DNA methylation levels at evCpGs were shown to be stable in the short term but susceptible to aging and epigenetic drift in the long term. The identified sites were significantly enriched at the clustered protocadherin loci, known for stochastic methylation in neurons in the context of embryonic neurodevelopment. Critically, evCpGs were capable of clustering technical and longitudinal replicates while differentiating young MZ twins. Thus, the discovered evCpGs can be considered a first prototype towards a universal epigenetic fingerprint, relevant to the discrimination of MZ twins for forensic purposes, which is currently impossible with standard DNA profiling. DNA methylation microarrays are the preferred technology for EWAS and mQTL mapping studies. However, their probe design inherently assumes that the assayed genomic DNA is identical to the reference genome, leading to genetic artifacts whenever this assumption is not fulfilled. Building upon the previous experience analysing microarray data, Chapter 3 covers the development and benchmarking of UMtools, an R package for the quantification and qualification of genetic artifacts on DNA methylation microarrays based on the unprocessed fluorescence intensity signals. These tools were used to assemble an atlas of genetic artifacts encountered on DNA methylation microarrays, including interactions between artifacts or with X-inactivation, imprinting and tissue-specific regulation.
Additionally, to distinguish artifacts from genuine epigenetic variation, a co-methylation-based approach was proposed. Overall, this study revealed that genetic artifacts continue to filter through into the reported literature since current methodologies to address them have overlooked this challenge. Furthermore, EWAS, mQTL and allele-specific methylation (ASM) mapping studies have all been employed to map epigenetic variation but require matching phenotypic/genotypic data and can only map specific components of epigenetic inter-individual variation. Inspired by the previously proposed co-methylation strategy, Chapter 4 describes a novel method to simultaneously map inter-haplotype, inter-cell and inter-individual variation without these requirements. Specifically, the binomial likelihood function-based bootstrap hypothesis test for co-methylation within reads (Binokulars) is a randomization test that can identify jointly regulated CpGs (JRCs) from pooled whole genome bisulfite sequencing (WGBS) data by relying solely on joint DNA methylation information available in reads spanning multiple CpGs. Binokulars was tested on pooled WGBS data in whole blood, sperm and combined, and benchmarked against EWAS and ASM. Our comparisons revealed that Binokulars can integrate a wide range of epigenetic phenomena under the same umbrella since it simultaneously discovered regions associated with imprinting, cell type- and tissue-specific regulation, mQTL, ageing or even unknown epigenetic processes. Finally, we verified examples of mQTL and polymorphic imprinting by employing another novel tool, JRC_sorter, to classify regions based on epigenotype models and non-pooled WGBS data in cord blood.
In the future, we envision how this cost-effective approach can be applied on larger pools to simultaneously highlight regions of interest in the methylome, a highly relevant task in the light of the post-GWAS era. Moving towards future applications of epigenetic inter-individual variation, Chapters 5 and 6 are dedicated to solving some of the methodological issues faced in translational epigenomics. Firstly, due to its simplicity and well-known properties, linear regression is the starting-point methodology when predicting a continuous outcome from a set of predictors. However, linear regression is incompatible with missing data, a common phenomenon and a huge threat to the integrity of data analysis in empirical sciences, including (epi)genomics. Chapter 5 describes the development of combinatorial linear models (cmb-lm), an imputation-free, CPU/RAM-efficient and privacy-preserving statistical method for linear regression prediction on datasets with missing values. Cmb-lm provide prediction errors that take into account the pattern of missing values in the incomplete data, even at extreme missingness. As a proof of concept, we tested cmb-lm in the context of epigenetic ageing clocks, one of the most popular applications of epigenetic inter-individual variation. Overall, cmb-lm offer a simple and flexible methodology with a wide range of applications that can provide a smooth transition towards the valorisation of linear models in the real world, where missing data is almost inevitable. Beyond microarrays, due to its high accuracy, reliability and sample multiplexing capabilities, massively parallel sequencing (MPS) is currently the preferred methodology to translate prediction models for traits of interest into practice. At the same time, tobacco smoking is a frequent habit sustained by more than 1.3 billion people in 2020 and a leading (and preventable) health risk factor in the modern world.
Predicting smoking habits from a persistent biomarker, such as DNA methylation, is not only relevant to account for self-reporting bias in public health and personalized medicine studies, but may also allow broadening forensic DNA phenotyping. Previously, a model to predict whether someone is a current, former, or never smoker had been published based solely on 13 CpGs from the hundreds of thousands included in the DNA methylation microarray. However, a matching lab tool with lower marker throughput and higher accuracy and sensitivity was missing for translating the model into practice. Chapter 6 describes the development of an MPS assay and data analysis pipeline to quantify DNA methylation on these 13 smoking-associated biomarkers for the prediction of smoking status. Though our systematic evaluation on DNA standards of known methylation levels revealed marker-specific amplification bias, our novel tool was still able to provide highly accurate and reproducible DNA methylation quantification and smoking habit prediction. Overall, our MPS assay allows the technological transfer of DNA methylation microarray findings and models to practical settings, one step closer towards future applications. Finally, Chapter 7 provides a general discussion on the results and topics discussed across Chapters 2-6. It begins by summarizing the main findings across the thesis, including proposals for follow-up studies. It then covers technical limitations pertaining to bisulfite conversion and DNA methylation microarrays, but also more general considerations such as restricted data access. This chapter ends by covering the outlook of this PhD thesis, including topics such as bisulfite-free methods, third-generation sequencing, single-cell methylomics, multi-omics and systems biology.

    Comparative genomics of recent adaptation in Candida pathogens

    Fungal infections pose a serious health threat, affecting >1,000 million people and causing ~1.5 million deaths each year. The problem is growing due to insufficient diagnostic and therapeutic options, an increased number of susceptible patients, the expansion of pathogens partly linked to climate change and the rise of antifungal drug resistance. Among other fungal pathogens, Candida species are a major cause of severe hospital-acquired infections, with high mortality in immunocompromised patients. Various Candida pathogens constitute a public health issue, which requires further efforts to develop new drugs, optimize currently available treatments and improve diagnostics. Given the high dynamism of Candida genomes, a promising strategy to improve current therapies and diagnostics is to understand the evolutionary mechanisms of adaptation to antifungal drugs and to the human host. Previous work using in vitro evolution, population genomics, selection inferences and Genome Wide Association Studies (GWAS) has partially clarified such recent adaptation, but various open questions remain. In the three research articles that make up this PhD thesis we addressed some of these gaps from the perspective of comparative genomics. First, we addressed methodological issues regarding the analysis of Candida genomes. Studying recent adaptation in these pathogens requires adequate bioinformatic tools for variant calling, filtering and functional annotation. Among other reasons, current methods are suboptimal due to their limited accuracy in identifying structural variants from short-read sequencing data. In addition, there is a need for easy-to-use, reproducible variant calling pipelines. To address these gaps we developed the “personalized Structural Variation detection” pipeline (perSVade), a framework to call, filter and annotate several variant types, including structural variants, directly from reads.
PerSVade enables accurate identification of structural variants in any species of interest, such as Candida pathogens. In addition, our tool automatically predicts the structural variant calling accuracy on simulated genomes, which informs about the reliability of the calling process. Furthermore, perSVade can be used to analyze single nucleotide polymorphisms and copy number-variants, so that it facilitates multi-variant, reproducible genomic studies. This tool will likely boost variant analyses in Candida pathogens and beyond. Second, we addressed open questions about recent adaptation in Candida, using perSVade for variant identification. On the one hand, we investigated the evolutionary mechanisms of drug resistance in Candida glabrata. For this, we used a large-scale in vitro evolution experiment to study adaptation to two commonly-used antifungals: fluconazole and anidulafungin. Our results show rapid adaptation to one or both drugs, with moderate fitness costs and through few mutations in a narrow set of genes. In addition, we characterize a novel role of ERG3 mutations in cross-resistance towards fluconazole in anidulafungin-adapted strains. These findings illuminate the mutational paths leading to drug resistance and cross-resistance in Candida pathogens. On the other hand, we reanalyzed ~2,000 public genomes and phenotypes to understand the signs of recent selection and drug resistance in six major Candida species: C. auris, C. glabrata, C. albicans, C. tropicalis, C. parapsilosis and C. orthopsilosis. We found hundreds of genes under recent selection, suggesting that clinical adaptation is diverse and complex. These involve species-specific but also convergently affected processes, such as cell adhesion, which could underlie conserved adaptive mechanisms. In addition, using GWAS we predicted known drivers of antifungal resistance alongside potentially novel players. 
Furthermore, our analyses reveal an important role of generally-overlooked structural variants, and suggest an unexpected involvement of (para)sexual recombination in the spread of resistance. Taken together, our findings provide novel insights on how Candida pathogens adapt to human-related environments and suggest candidate genes that deserve future attention. In summary, the results of this thesis improve our knowledge about the mechanisms of recent adaptation in Candida pathogens, which may enable improved therapeutic and diagnostic applications.

    PROBABILISTIC MODELING OF CHROMATIN INTERACTIONS

    Higher-order chromatin architecture plays an important role in mammalian transcriptional regulation. However, understanding the mechanisms and impact of complex chromatin contacts remains challenging. In the past decade, breakthroughs in experimental techniques like Chromatin Conformation Capture (3C) assays have enabled genome-wide detection of chromatin interactions at high resolution. This provides great opportunities for computational modeling to predict functional interactions and identify target genes of disease variants. In this thesis, I started with a critical assessment of an existing method for predicting enhancer-promoter interactions. I reported severe overfitting issues resulting from improper machine learning experimental design. I also found that the limited resolution of its training datasets hinders accurate assignment of single regulatory elements at interaction boundaries. In the second part, I developed a novel mathematical model to predict CTCF-mediated chromatin loops, which are the most prominent class of chromatin interactions, based on the biological hypothesis of loop extrusion. I showed that this model is capable of predicting CTCF loops measured by CTCF ChIA-PET data with high accuracy, using CTCF ChIP-seq alone as input. Furthermore, this model consistently predicts changes in chromatin interaction frequency caused by genetic perturbation of CTCF binding sites and by degradation of looping-related protein factors. In the last part, I applied this computational framework to a greater set of ChIA-PET data. The analysis result inspired the development of a simple and interpretable model for predicting enhancer-promoter interactions. I showed that this model outperforms existing methods on predicting CRISPRi hits that regulated gene expression. Overall, these approaches are applicable to diverse datasets to advance our understanding of chromatin interaction mechanisms as well as their implications in gene regulation and diseases.

    Molecular signals of arms race evolution between RNA viruses and their hosts

    Viruses are intracellular parasites that hijack their hosts’ cellular machinery to replicate themselves. This creates an evolutionary “arms race” between hosts and viruses, where the former develop mechanisms to restrict viral infection and the latter evolve ways to circumvent these molecular barriers. In this thesis, I explore examples of this virus-host molecular interplay, focusing on events in the evolutionary histories of both viruses and hosts. The thesis begins by examining how recombination, the exchange of genetic material between related viruses, expands the genomic diversity of the Sarbecovirus subgenus, which includes SARS-CoV, responsible for the 2002 SARS epidemic, and SARS-CoV-2, responsible for the COVID-19 pandemic. On the host side, I examine the evolutionary interaction between RNA viruses and two interferon-stimulated genes expressed in hosts. First, I show how the 2′-5′-oligoadenylate synthetase 1 (OAS1) gene of horseshoe bats (Rhinolophoidea), the reservoir host of sarbecoviruses, lost its anti-coronaviral activity at the base of this bat superfamily. By reconstructing the OAS1 protein of the Rhinolophoidea common ancestor, I validate the loss of antiviral function and highlight the implications of this event in the virus-host association between sarbecoviruses and horseshoe bat hosts. Second, I focus on the evolution of the human butyrophilin subfamily 3 member A3 (BTN3A3) gene, which restricts infection by avian influenza A viruses (IAV). The evolutionary analysis reveals that BTN3A3’s anti-IAV function was gained within the primates and that specific amino acid substitutions need to be acquired in IAVs’ NP protein to evade the human BTN3A3 activity. Gain of BTN3A3-evasion-conferring substitutions correlates with all major human IAV pandemics and epidemics, making these NP residues key markers for IAV transmissibility potential to humans.
In the final part of the thesis, I present a novel approach for evaluating dinucleotide compositional biases in virus genomes. An application of my metric to the Flaviviridae virus family uncovers how ancestral host shifts of these viruses correlate with adaptive shifts in their genomes’ dinucleotide representation. Collectively, the contents of this thesis extend our understanding of how viruses interact with their hosts along their intertwined evolution and provide insights into virus host switching and pandemic preparedness.
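
The abstract does not define the thesis's own dinucleotide metric, so the following is only a minimal sketch of the conventional baseline such approaches are usually compared against: the observed/expected dinucleotide odds ratio. The function name `dinucleotide_odds_ratios` and the choice to map U to T are illustrative assumptions, not the author's method.

```python
from collections import Counter


def dinucleotide_odds_ratios(seq: str) -> dict[str, float]:
    """Observed/expected frequency ratio for all 16 dinucleotides.

    Conventional odds ratio f(XY) / (f(X) * f(Y)) on a single strand.
    This is a generic baseline, NOT the metric proposed in the thesis.
    """
    seq = seq.upper().replace("U", "T")  # treat RNA genomes like DNA (assumption)
    n = len(seq)
    if n < 2:
        raise ValueError("sequence must contain at least two nucleotides")

    mono = Counter(seq)                                  # single-base counts
    di = Counter(seq[i:i + 2] for i in range(n - 1))     # overlapping pairs

    ratios = {}
    for x in "ACGT":
        for y in "ACGT":
            expected = (mono[x] / n) * (mono[y] / n)
            observed = di[x + y] / (n - 1)
            # Undefined when one of the bases never occurs in the sequence.
            ratios[x + y] = observed / expected if expected > 0 else float("nan")
    return ratios


if __name__ == "__main__":
    toy = "ACGT" * 50  # toy sequence, not real viral data
    for pair, ratio in sorted(dinucleotide_odds_ratios(toy).items()):
        print(pair, round(ratio, 3))
```

A ratio well below 1 marks an under-represented dinucleotide and a ratio well above 1 an over-represented one; ambiguous bases such as N are counted in the sequence length but otherwise ignored in this sketch.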

    The development of efficient hemi-autotrophic carbon fixation in Escherichia Coli

    Carbon fixation is vital to all life and, as by far its most prevalent variant, the Calvin-Benson-Bassham (CBB) cycle is vital to virtually all known terrestrial life. Mostly occurring in plants, it uses light energy to sequester atmospheric carbon dioxide (CO2) and convert it into biomass. As the most inefficient natural carboxylation process and the source of most documented biomass, even a small increase in its performance could have vast downstream effects. Such a development could assimilate the abundantly available atmospheric CO2 while generating minimal amounts of waste for any biosynthesized product. The Escherichia coli bacterium was previously shown to functionally express the CBB cycle upon the addition of phosphoribulokinase (PRK) and ribulose 1,5-bisphosphate carboxylase/oxygenase (RuBisCO). Further knock-outs that severed its energy metabolism from its carbon metabolism resulted in CO2-dependent biomass accumulation. This carbon fixation is driven by the energy independently generated in the TCA cycle from a supply of pyruvate. This unique, split metabolism was dubbed hemi-autotrophy. The hemi-autotrophic strain of E. coli serves as a model organism for the CBB cycle, while lacking the difficulties of light-dependent or multi-cellular organisms. A pyrophosphate-dependent 6-phosphofructokinase (PFP) originating from Methylococcus capsulatus Bath was characterised as catalyzing three reactions of the typical CBB cycle. Where PRK completes its catalysis with a dependency on the energy carrier adenosine triphosphate (ATP), PFP was shown to complete this reaction with the less energetic pyrophosphate (PPi) that is partially generated in its FBPase- and SBPase-equivalent reactions. Successful integration of this synthetic CBB cycle would conserve 33% of all ATP expended in the native CBB cycle. The hemi-autotrophic E. coli strain’s unique culturing requirements proved challenging, but methods with increased dependability were established.
Transformations without the relief of these conditions remain elusive, requiring pre-cultures in rich media and heterotrophic metabolism. Consecutive sub-culturing of the strain to improve its hampered growth characteristics resulted in only mild improvements. Despite the modest culturing characteristics observed and a relatively high chromosomal mutation rate, the strain did not demonstrate an increase in transformation efficiency. The attempted replacements of the plasmid-encoded prkA by pfp did not result in hemi-autotrophic growth in any of the constructs, despite modulation of their expression. Given the high mutation rates, it remains unknown whether the expression range of the significantly less efficient PFP was sufficient or whether the cytoplasmic availability of PPi remained below its functionally required concentration. The putative H+-pyrophosphatase pump (HPP), natively expressed as the second gene in the pfp-hpp operon, remains uncharacterised, but its co-expression did not manage to compensate for this deficiency either. Though native fbp was successfully knocked out, the essential inorganic pyrophosphatase gene of E. coli remains. Thorough analysis of the components in the CBB system led to several design improvements, and pathway modelling indicates that the proposed synthetic CBB cycle is a viable alternative to its natural variant. Thermodynamic feasibility of the synthetic pathway was confirmed, and kinetic analysis predicted that it would perform at reduced efficiencies while still indicating culture viability. Growth rates approximating those of the hemi-autotrophic strain were reproduced in a kinetic model of the central carbon metabolism that incorporated minimal assumptions. Modifying the model to support the synthetic CBB cycle indicated its viability at a nominal reduction in growth and suggested further directions of research for the system.
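
The 33% ATP saving quoted in the abstract above can be checked with a short calculation. This is only a sketch under an assumption the abstract does not state: the textbook CBB stoichiometry of 9 ATP per 3 CO2 fixed, of which 6 are spent by phosphoglycerate kinase and 3 by PRK. The constant and function names are hypothetical and chosen for illustration.

```python
# Assumed textbook CBB stoichiometry per 3 CO2 fixed (not given in the abstract).
ATP_PGK = 6  # phosphoglycerate kinase step: 2 ATP per CO2
ATP_PRK = 3  # phosphoribulokinase step: 1 ATP per CO2


def atp_saving_fraction(atp_pgk=ATP_PGK, atp_prk=ATP_PRK):
    """Fraction of native-cycle ATP saved if the PRK step no longer uses ATP."""
    total = atp_pgk + atp_prk
    return atp_prk / total


print(f"{atp_saving_fraction():.0%}")
```

Under these assumed figures, replacing the ATP-dependent PRK step with a PPi-dependent one removes 3 of 9 ATP, which is consistent with the 33% stated in the abstract.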

    POPULATION GENOMICS OF THE GALAPAGOS ISLAND MOCKINGBIRDS AND IMPLICATION FOR CONSERVATION

    Islands are considered natural laboratories for understanding the evolutionary process of speciation. The very first muses of Darwin’s insights into evolution by natural selection were the Galapagos mockingbirds (Mimus spp.), a monophyletic group of four endemic species. Three species are each restricted to a single island, whereas the fourth occurs on almost all the other islands of the archipelago. These birds, known for their limited long-distance flying capabilities, are considered terrestrial species and serve as a clear example of allopatric evolution occurring on islands. The aim of my PhD research has been to unveil the evolutionary history of the Galapagos mockingbird species and its conservation implications using a whole-genome approach. My research therefore focused on generating a de novo reference genome within this monophyletic group to establish an adequate framework for subsequent genome-wide analyses (Chapter 2), and on using it to unveil the natural history of contrasting Galapagos mockingbird populations across the archipelago (Chapter 3). My findings reveal that after the common ancestor of these species diverged, there was a systematic and directional spread of these species across the islands that is directly related to the age of the islands. The geological history of the islands and anthropogenic factors have had different impacts on the demography and genetic variability of these species. Typically, smaller populations are more inbred and have higher rates of non-synonymous mutations becoming fixed. However, despite their extremely small sizes, the populations on Darwin, Wolf, and Floreana islands have maintained stable population sizes over many generations, indicating that the accumulation of these mutations has not had any impact on the fitness of these populations.