761 research outputs found

    Genomics and spatial surveillance of Chagas disease and American visceral leishmaniasis

    The Trypanosomatidae are a family of parasitic protozoa that infect various animals and plants. Several species within the Trypanosoma and Leishmania genera also pose a major threat to human health. Among these are Trypanosoma cruzi and Leishmania infantum, aetiological agents of the highly debilitating and often deadly vector-borne zoonoses Chagas disease and American visceral leishmaniasis. Current treatment options are far from safe, only partially effective and rarely available in the impoverished regions of Latin America where these ‘neglected tropical diseases’ prevail. Wider-reaching, sustainable protection against T. cruzi and L. infantum might best be achieved by intercepting key routes of zoonotic transmission, but this prophylactic approach requires a better understanding of how these parasites disperse and evolve at various spatiotemporal scales. This dissertation addresses key questions around trypanosomatid parasite biology and spatial epidemiology based on high-resolution, geo-referenced DNA sequence datasets constructed from disease foci throughout Latin America: Which forms of genetic exchange occur in T. cruzi, and are exchange events frequent enough to significantly alter the distribution of important epidemiological traits? How do demographic histories, for example, the recent invasive expansion of L. infantum into the Americas, impact parasite population structure, and do structural changes pose a threat to public health? Can environmental variables predict parasite dispersal patterns at the landscape scale? Following the first chapter’s review of population genetic and genomic approaches in the study of trypanosomatid diseases in Latin America, Chapter 2 describes how reproductive polymorphism segregates T. cruzi populations in southern Ecuador. The study is the first to clearly demonstrate meiotic sex in this species, for decades thought to exchange genetic material only very rarely, and only by non-Mendelian means. T. 
cruzi subpopulations from the Ecuadorian study site exhibit all major hallmarks of sexual reproduction, including genome-wide Hardy-Weinberg allele frequencies, rapid decay of linkage disequilibrium with map distance and genealogies that fluctuate among chromosomes. The presence of sex promotes the transfer and transformation of genotypes underlying important epidemiological traits, posing great challenges to disease surveillance and the development of diagnostics and drugs. Chapter 3 demonstrates that mating events are also pivotal to L. infantum population structure in Brazil, where introduction bottlenecks have led to striking genetic discontinuities between sympatric strains. Genetic hybridization occurs genome-wide, including at a recently identified ‘miltefosine sensitivity locus’ that appears to be deleted from the majority of Brazilian L. infantum genomes. The study combines an array of genomic and phenotypic analyses to determine whether rapid population expansion or strong purifying selection has driven this prominent > 12 kb deletion to high abundance across Brazil. Results expose deletion size differences that covary with phylogenetic structure and suggest that deletion-carrying strains do not form a private monophyletic clade. These observations are inconsistent with the hypothesis that the deletion genotype rose to high prevalence simply as the result of a founder effect. Enzymatic assays show that loss of ecto-3’-nucleotidase gene function within the deleted locus is coupled to increased ecto-ATPase activity, raising the possibility that alternative metabolic strategies enhance L. infantum fitness in its introduced range. The study also uses demographic simulation modelling to determine whether L. infantum populations in the Americas have expanded from just one or multiple introduction events. Comparison of observed vs. 
simulated summary statistics using random forests suggests a single introduction from the Old World, but better spatial sampling coverage is required to rule out other demographic scenarios in a pattern-process modelling approach. Further sampling is also necessary to substantiate signs of convergent selection introduced above. Chapter 4 therefore develops a ‘genome-wide locus sequence typing’ (GLST) tool to summarize parasite genetic polymorphism at a fraction of genomic sequencing cost. Applied directly to the infection source (e.g., vector or host tissue), the method also avoids bias from cell purification and culturing steps typically involved prior to sequencing of trypanosomatid and other obligate parasite genomes. GLST scans genomic pilot data for hundreds of polymorphic sequence fragments whose thermodynamic properties permit simultaneous PCR amplification in a single reaction tube. For proof of principle, GLST is applied to metagenomic DNA extracts from various Chagas disease vector species collected in Colombia, Venezuela, and Ecuador. Epimastigote DNA from several T. cruzi reference clones is also analyzed. The method distinguishes 387 single-nucleotide polymorphisms (SNPs) in T. cruzi sub-lineage TcI and an additional 393 SNPs in non-TcI clones. Genetic distances calculated from these SNPs correlate with geographic distances among samples but also distinguish parasites from triatomines collected at common collection sites. The method thereby appears suitable for agent-based spatio-genetic (simulation) analyses left wanted by Chapter 3 – and further formulated in Chapter 5. The potential to survey parasite genetic diversity abundantly across landscapes compels deeper, more systematic exploration of how environmental variables influence the spread of disease. 
As environmental context is only marginally considered in the population genetic analyses of Chapters 2 – 4, Chapter 5 proposes a new, spatially explicit modelling framework to predict vector-borne parasite gene flow through heterogeneous environment. In this framework, remotely sensed environmental raster values are re-coded and merged into a composite ‘resistance surface’ that summarizes hypothesized effects of landscape features on parasite transmission among vectors and hosts. Parasite population genetic differentiation is then simulated on this surface and fitted to observed diversity patterns in order to evaluate original hypotheses on how environmental variables modulate parasite gene flow. The chapter thereby makes a maiden step from standard population genetic to ‘landscape genomic’ approaches in understanding the ecology and evolution of vector-borne disease. In summary, this dissertation first demonstrates the power of population genetics and genomics to understand fundamental biological properties of important protist parasites, then identifies areas where analytical tools are missing and creates new technical and conceptual frameworks to help fill these gaps. The general discussion (Chapter 6) also outlines several follow-up projects on the key finding of meiotic genetic signatures in T. cruzi. Exploiting recently developed T. cruzi genome-editing systems for the detection of meiotic gene expression and heterozygosis will help understand why and in which life cycle stage some parasite populations use sex and others do not. Long-read sequencing of parental and recombinant genomes will help understand the extent to which sex is diversifying T. cruzi phenotypes, especially virulence and drug resistance properties conferred by surface molecules with repetitive genetic bases intractable to short-read analysis. Chapter 6 also provides follow-up plans for all other research chapters. 
Emphasis is placed on advancing the complementarity, transferability and public health benefit of the many different methods and concepts employed in this work.
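
The abstract above cites genome-wide Hardy-Weinberg allele frequencies as one hallmark of sexual reproduction. As a minimal sketch of what such a check looks like at a single biallelic locus, the following compares observed genotype counts with Hardy-Weinberg expectations. The function name and the genotype counts are hypothetical illustrations, not taken from the dissertation, and the thesis's actual genome-wide procedure is not described in the abstract.

```python
def hardy_weinberg_chi2(n_aa, n_ab, n_bb):
    """Return (expected_counts, chi2) for one biallelic locus.

    n_aa, n_ab, n_bb are observed counts of the three genotypes.
    Hypothetical helper for illustration only.
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotypes observed")
    # Allele frequency of A estimated from the genotype counts.
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1.0 - p
    # Expected genotype counts under random mating: p^2, 2pq, q^2.
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    observed = (n_aa, n_ab, n_bb)
    # Pearson chi-square statistic; classes with zero expectation are skipped.
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    return expected, chi2


# Made-up counts: a locus with no heterozygotes deviates strongly,
# as might be expected under strictly clonal propagation.
expected_counts, chi2_stat = hardy_weinberg_chi2(50, 0, 50)
print(expected_counts, chi2_stat)
```

A statistic near zero is consistent with Hardy-Weinberg proportions at that locus; a genome-wide analysis would repeat this across many loci and correct for multiple testing.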

    Single nucleotide polymorphism discovery from expressed sequence tags in the waterflea Daphnia magna

    Background: Daphnia (Crustacea: Cladocera) plays a central role in standing aquatic ecosystems, has a well-known ecology and is widely used in population studies and environmental risk assessments. Daphnia magna is, especially in Europe, intensively used to study stress responses of natural populations to pollutants, climate change, and antagonistic interactions with predators and parasites, which have all been demonstrated to induce micro-evolutionary and adaptive responses. Although its ecology and evolutionary biology are intensively studied, little is known about the functional genomics underpinning phenotypic responses to environmental stressors. The aim of the present study was to find genes expressed in the presence of environmental stressors, and to target such genes for single nucleotide polymorphism (SNP) marker development. Results: We developed three expressed sequence tag (EST) libraries using clonal lineages of D. magna exposed to ecological stressors, namely fish predation, parasite infection and pesticide exposure. We used these newly developed ESTs and other Daphnia ESTs retrieved from NCBI GenBank to mine for SNP markers targeting synonymous as well as nonsynonymous genetic variation. We validated the developed SNPs in six natural populations of D. magna distributed at regional scale. Conclusions: A large proportion (47%) of the produced ESTs are Daphnia lineage-specific genes, which are potentially involved in responses to environmental stress rather than in general cellular functions and metabolic activities, or reflect the arthropod's aquatic lifestyle. The characterization of genes expressed under stress and the validation of their SNPs for population genetic study is important for identifying ecologically responsive genes in D. magna.

    Unraveling Plague Ecology Through Vector and Host Genetics

    The transmission of vector-borne diseases involves complex interactions between vectors and their host species. These complex host-parasite interactions can be difficult to study with traditional, field-based methods. My dissertation aims to use a population genomics approach to elucidate transmission pathways of plague among prairie dog colonies. Plague is a flea-borne, zoonotic disease caused by the bacterium Yersinia pestis. It is infamous for causing the Black Death (1347-1353), one of the most devastating pandemics in human history. Since its emergence in North America around 1900, plague has spread to native rodents, thus creating a sylvatic cycle. Prairie dogs (Cynomys spp.) are highly susceptible to the disease, experiencing >90% mortality during outbreaks. Further, prairie dogs exacerbate the spread of plague by acting as an amplifying host, initiating epizootic events. In the first chapter of my dissertation, I examine how the landscape influences the connectivity of black-tailed prairie dog colonies in order to better understand the role of prairie dogs in plague transmission. I found that slope and bodies of water explain effective dispersal better than geographic distance alone. My second chapter describes patterns of connectivity for Oropsylla hirsuta, the main flea species found on prairie dogs and a known vector for plague. I compare those patterns of vector-mediated plague transmission to the host (prairie dogs) and potential alternative hosts [Northern grasshopper mice (Onychomys leucogaster) and deer mice (Peromyscus maniculatus)] to uncover alternative modes of transmission. I found that the best performing model used patterns of connectivity for prairie dogs and deer mice to explain patterns of connectivity for O. hirsuta. My third chapter uses both neutral and putatively adaptive loci to characterize patterns of genetic variation for the threatened Utah prairie dog in order to improve recovery efforts for this threatened species. 
I found low species-wide genetic variation and high population divergence among sampling sites, which suggests that this species is highly vulnerable to the effects of genetic drift. Overall, this dissertation not only improves the conservation and management of prairie dogs in light of devastating plague outbreaks, but also provides a more general population genomics framework suitable for elucidating transmission pathways of wildlife diseases.

    Population genomics of the Asian tiger mosquito, Aedes albopictus. Insights into the recent worldwide invasion

    Aedes albopictus, the “Asian tiger mosquito,” is an aggressive biting mosquito native to Asia that has colonized all continents except Antarctica during the last ~30–40 years. The species is of great public health concern as it can transmit at least 26 arboviruses, including dengue, chikungunya, and Zika viruses. In this study, using double-digest Restriction site-Associated DNA (ddRAD) sequencing, we developed a panel of ~58,000 single nucleotide polymorphisms (SNPs) based on 20 worldwide Ae. albopictus populations representing both the invasive and the native range. We used this genomic-based approach to study the genetic structure and the differentiation of Ae. albopictus populations and to understand origin(s) and dynamics of the recent invasions. Our analyses indicated the existence of two major genetically differentiated population clusters, each one including both native and invasive populations. The detection of additional genetic structure within each major cluster supports that these SNPs can detect differentiation at a global and local scale, while the similar levels of genomic diversity between native and invasive range populations support the scenario of multiple invasions or colonization by a large number of propagules. Finally, our results revealed the possible source(s) of the recent invasion in the Americas, Europe, and Africa, a finding with important implications for vector-control strategies.

    Identifying barriers to gene flow and hierarchical conservation units from seascape genomics : a modelling framework applied to a marine predator

    The ongoing decline of large marine vertebrates must be urgently mitigated, particularly under increasing levels of climate change and other anthropogenic pressures. However, characterizing the connectivity among populations remains one of the greatest challenges for the effective conservation of an increasing number of endangered species. Achieving conservation targets requires an understanding of which seascape features influence dispersal and subsequent genetic structure. This is particularly challenging for adult-disperser species, and when distribution-wide sampling is difficult. Here, we developed a two-step modelling framework to investigate how seascape features drive the genetic connectivity of marine species without larval dispersal, to better guide the design of marine protected area networks and corridors. We applied this framework to the endangered grey reef shark, Carcharhinus amblyrhynchos, a reef-associated shark distributed across the tropical Indo-Pacific. In the first step, we developed a seascape genomic approach based on isolation-by-resistance models involving circuit theory applied to 515 shark samples, genotyped for 4991 nuclear single-nucleotide polymorphisms. We show that deep oceanic areas act as strong barriers to dispersal, while proximity to habitat facilitates dispersal. In the second step, we predicted the resulting genetic differentiation across the entire distribution range of the species, providing both local and global-scale conservation units for future management guidance. We found that grey reef shark populations are more fragmented than expected for such a mobile species, raising concerns about the resilience of isolated populations under high anthropogenic pressures. We recommend the use of this framework to identify barriers to gene flow and to help in the delineation of conservation units at different scales, together with its integration across multiple species when considering marine spatial planning.
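
Isolation-by-resistance models of the kind mentioned in the abstract above treat sampling sites as nodes in an electrical circuit and landscape cells as resistors, then use the effective resistance between two sites as a predictor of genetic differentiation. The sketch below computes effective resistance on a tiny graph in pure Python. The function name, node layout and resistance values are assumptions made for illustration; the study itself worked on raster seascapes with dedicated circuit-theory software, which this toy example does not reproduce.

```python
def effective_resistance(n_nodes, edges, source, sink):
    """Effective resistance between two nodes of a resistor network.

    edges is a list of (i, j, resistance) tuples. Hypothetical helper.
    """
    if source == sink:
        return 0.0
    # Build the graph Laplacian from edge conductances (1 / resistance).
    lap = [[0.0] * n_nodes for _ in range(n_nodes)]
    for i, j, res in edges:
        g = 1.0 / res
        lap[i][i] += g
        lap[j][j] += g
        lap[i][j] -= g
        lap[j][i] -= g
    # Ground the sink (drop its row and column) and inject one unit
    # of current at the source; the last column is the right-hand side.
    keep = [k for k in range(n_nodes) if k != sink]
    a = [[lap[r][c] for c in keep] + [1.0 if r == source else 0.0]
         for r in keep]
    m = len(keep)
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(m):
        pivot = max(range(col, m), key=lambda row: abs(a[row][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("singular system; is the graph connected?")
        a[col], a[pivot] = a[pivot], a[col]
        for row in range(m):
            if row != col:
                factor = a[row][col] / a[col][col]
                for c in range(col, m + 1):
                    a[row][c] -= factor * a[col][c]
    # With unit current, the source voltage equals the effective resistance.
    voltages = {node: a[idx][m] / a[idx][idx] for idx, node in enumerate(keep)}
    return voltages[source]


# Two reefs (nodes 0 and 2) joined by a shallow corridor through node 1
# and by a direct deep-water link given a high, made-up resistance.
edges = [(0, 1, 1.0), (1, 2, 1.0), (0, 2, 100.0)]
print(effective_resistance(3, edges, 0, 2))
```

Unlike a least-cost path, effective resistance falls as more alternative routes become available, which is one reason circuit theory is used for connectivity modelling.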

    Infection dynamics, dispersal, and adaptation: understanding the lack of recovery in a remnant frog population following a disease outbreak

    Emerging infectious diseases can cause dramatic declines in wildlife populations. Sometimes, these declines are followed by recovery, but many populations do not recover. Studying differential recovery patterns may yield important information for managing disease-afflicted populations and facilitating population recoveries. In the late 1980s, a chytridiomycosis outbreak caused multiple frog species in Australia's Wet Tropics to decline. Populations of some species (e.g., Litoria nannotis) subsequently recovered, while others (e.g., Litoria dayi) did not. We examined the population genetics and current infection status of L. dayi, to test several hypotheses regarding the failure of its populations to recover: (1) a lack of individual dispersal abilities has prevented recolonization of previously occupied locations, (2) a loss of genetic variation has resulted in limited adaptive potential, and (3) L. dayi is currently adapting to chytridiomycosis. We found moderate-to-high levels of gene flow and diversity (Fst range: <0.01-0.15; minor allele frequency (MAF): 0.192-0.245), which were similar to previously published levels for recovered L. nannotis populations. This suggests that dispersal ability and genetic diversity do not limit the ability of L. dayi to recolonize upland sites. Further, infection intensity and prevalence increased with elevation, suggesting that chytridiomycosis is still limiting the elevational range of L. dayi. Outlier tests comparing infected and uninfected individuals consistently identified 18 markers as putatively under selection, and several of those markers matched genes that were previously implicated in infection. This suggests that L. dayi has genetic variation for genes that affect infection dynamics and may be undergoing adaptation
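
The abstract above reports Fst and minor allele frequency (MAF) ranges as its measures of gene flow and diversity. For readers unfamiliar with these statistics, here is a minimal sketch of both at a single biallelic locus, using the textbook heterozygosity-based definition of Fst for two equally weighted populations. The abstract does not say which estimator the authors used, so this is an assumption, and the function names and input values are hypothetical.

```python
def minor_allele_frequency(n_ref, n_alt):
    """MAF from counts of the two alleles at one locus."""
    total = n_ref + n_alt
    if total == 0:
        raise ValueError("no alleles observed")
    freq_alt = n_alt / total
    return min(freq_alt, 1.0 - freq_alt)


def fst_two_populations(p1, p2):
    """Wright's F_ST = (H_T - H_S) / H_T for one biallelic locus.

    p1 and p2 are the frequencies of the same allele in two
    populations, which are weighted equally here.
    """
    p_bar = (p1 + p2) / 2.0
    # Expected heterozygosity of the pooled population.
    h_t = 2.0 * p_bar * (1.0 - p_bar)
    # Mean expected heterozygosity within populations.
    h_s = (2.0 * p1 * (1.0 - p1) + 2.0 * p2 * (1.0 - p2)) / 2.0
    if h_t == 0.0:
        return 0.0  # locus is monomorphic overall
    return (h_t - h_s) / h_t


# Made-up example: modest frequency differences give a low F_ST,
# the pattern usually read as substantial gene flow.
print(minor_allele_frequency(70, 30))
print(fst_two_populations(0.4, 0.6))
```

Published studies typically use sample-size-corrected estimators averaged over many loci, so values from this sketch are only a rough guide to how the statistic behaves.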

    Methods For Robust Quantification Of Rna Alternative Splicing In Heterogeneous Rna-Seq Datasets

    RNA alternative splicing is primarily responsible for transcriptome diversity and is relevant to human development and disease. However, current approaches to splicing quantification make simplifying assumptions which are violated when RNA sequencing data are heterogeneous. Influences from genetic and environmental background contribute to variability within a group of samples purported to represent the same biological condition. This work describes three methods which account for data heterogeneity when detecting differential RNA splicing between sample groups. First, a robust model is implemented for outlier detection within a group of purported replicates. Next, large RNA-seq datasets with high within-group variability are addressed with a statistical approach which retains power to detect changing splice junctions without sacrificing specificity. Finally, applying these tools to call sQTLs in GTEx tissues has identified splicing variations associated with risk loci for cardiovascular disease and anomalous skeletal development. Each of these methods correctly handles the properties of heterogeneous RNA-seq data to improve precision and reduce false discovery rate.

    Positive Selection in East Asians for an EDAR Allele that Enhances NF-κB Activation

    Genome-wide scans for positive selection in humans provide a promising approach to establish links between genetic variants and adaptive phenotypes. From this approach, lists of hundreds of candidate genomic regions for positive selection have been assembled. These candidate regions are expected to contain variants that contribute to adaptive phenotypes, but few of these regions have been associated with phenotypic effects. Here we present evidence that a derived nonsynonymous substitution (370A) in EDAR, a gene involved in ectodermal development, was driven to high frequency in East Asia by positive selection prior to 10,000 years ago. With an in vitro transfection assay, we demonstrate that 370A enhances NF-κB activity. Our results suggest that 370A is a positively selected functional genetic variant that underlies an adaptive human phenotype.

    Inter-individual variation of the human epigenome & applications

    Genome-wide association studies (GWAS) have led to the discovery of genetic variants influencing human phenotypes in health and disease. However, almost two decades later, most human traits can still not be accurately predicted from common genetic variants. Moreover, genetic variants discovered via GWAS mostly map to the non-coding genome and have historically resisted interpretation via mechanistic models. Alternatively, the epigenome lies in the cross-roads between genetics and the environment. Thus, there is great excitement towards the mapping of epigenetic inter-individual variation since its study may link environmental factors to human traits that remain unexplained by genetic variants. For instance, the environmental component of the epigenome may serve as a source of biomarkers for accurate, robust and interpretable phenotypic prediction on low-heritability traits that cannot be attained by classical genetic-based models. Additionally, its research may provide mechanisms of action for genetic associations at non-coding regions that mediate their effect via the epigenome. The aim of this thesis was to explore epigenetic inter-individual variation and to mitigate some of the methodological limitations faced towards its future valorisation. Chapter 1 is dedicated to the scope and aims of the thesis. It begins by describing historical milestones and basic concepts in human genetics, statistical genetics, the heritability problem and polygenic risk scores. It then moves towards epigenetics, covering the several dimensions it encompasses. It subsequently focuses on DNA methylation with topics like mitotic stability, epigenetic reprogramming, X-inactivation or imprinting. This is followed by concepts from epigenetic epidemiology such as epigenome-wide association studies (EWAS), epigenetic clocks, Mendelian randomization, methylation risk scores and methylation quantitative trait loci (mQTL). 
The chapter ends by introducing the aims of the thesis. Chapter 2 focuses on stochastic epigenetic inter-individual variation resulting from processes occurring post-twinning, during embryonic development and early life. Specifically, it describes the discovery and characterisation of hundreds of variably methylated CpGs in the blood of healthy adolescent monozygotic (MZ) twins showing equivalent variation among co-twins and unrelated individuals (evCpGs) that could not be explained only by measurement error on the DNA methylation microarray. DNA methylation levels at evCpGs were shown to be stable short-term but susceptible to aging and epigenetic drift in the long-term. The identified sites were significantly enriched at the clustered protocadherin loci, known for stochastic methylation in neurons in the context of embryonic neurodevelopment. Critically, evCpGs were capable of clustering technical and longitudinal replicates while differentiating young MZ twins. Thus, discovered evCpGs can be considered as a first prototype towards a universal epigenetic fingerprint, relevant in the discrimination of MZ twins for forensic purposes, currently impossible with standard DNA profiling. Besides, DNA methylation microarrays are the preferred technology for EWAS and mQTL mapping studies. However, their probe design inherently assumes that the assayed genomic DNA is identical to the reference genome, leading to genetic artifacts whenever this assumption is not fulfilled. Building upon the previous experience analysing microarray data, Chapter 3 covers the development and benchmarking of UMtools, an R-package for the quantification and qualification of genetic artifacts on DNA methylation microarrays based on the unprocessed fluorescence intensity signals. These tools were used to assemble an atlas on genetic artifacts encountered on DNA methylation microarrays, including interactions between artifacts or with X-inactivation, imprinting and tissue-specific regulation. 
Additionally, to distinguish artifacts from genuine epigenetic variation, a co-methylation-based approach was proposed. Overall, this study revealed that genetic artifacts continue to filter through into the reported literature since current methodologies to address them have overlooked this challenge. Furthermore, EWAS, mQTL and allele-specific methylation (ASM) mapping studies have all been employed to map epigenetic variation but require matching phenotypic/genotypic data and can only map specific components of epigenetic inter-individual variation. Inspired by the previously proposed co-methylation strategy, Chapter 4 describes a novel method to simultaneously map inter-haplotype, inter-cell and inter-individual variation without these requirements. Specifically, binomial likelihood function-based bootstrap hypothesis test for co-methylation within reads (Binokulars) is a randomization test that can identify jointly regulated CpGs (JRCs) from pooled whole genome bisulfite sequencing (WGBS) data by solely relying on joint DNA methylation information available in reads spanning multiple CpGs. Binokulars was tested on pooled WGBS data in whole blood, sperm and combined, and benchmarked against EWAS and ASM. Our comparisons revealed that Binokulars can integrate a wide range of epigenetic phenomena under the same umbrella since it simultaneously discovered regions associated with imprinting, cell type- and tissue-specific regulation, mQTL, ageing or even unknown epigenetic processes. Finally, we verified examples of mQTL and polymorphic imprinting by employing another novel tool, JRC_sorter, to classify regions based on epigenotype models and non-pooled WGBS data in cord blood. 
In the future, we envision how this cost-effective approach can be applied on larger pools to simultaneously highlight regions of interest in the methylome, a highly relevant task in the light of the post-GWAS era. Moving towards future applications of epigenetic inter-individual variation, Chapters 5 and 6 are dedicated to solving some of the methodological issues faced in translational epigenomics. Firstly, due to its simplicity and well-known properties, linear regression is the starting point methodology when performing prediction of a continuous outcome given a set of predictors. However, linear regression is incompatible with missing data, a common phenomenon and a huge threat to the integrity of data analysis in empirical sciences, including (epi)genomics. Chapter 5 describes the development of combinatorial linear models (cmb-lm), an imputation-free, CPU/RAM-efficient and privacy-preserving statistical method for linear regression prediction on datasets with missing values. Cmb-lm provide prediction errors that take into account the pattern of missing values in the incomplete data, even at extreme missingness. As a proof-of-concept, we tested cmb-lm in the context of epigenetic ageing clocks, one of the most popular applications of epigenetic inter-individual variation. Overall, cmb-lm offer a simple and flexible methodology with a wide range of applications that can provide a smooth transition towards the valorisation of linear models in the real world, where missing data is almost inevitable. Beyond microarrays, due to its high accuracy, reliability and sample multiplexing capabilities, massively parallel sequencing (MPS) is currently the preferred methodology to translate prediction models for traits of interest into practice. At the same time, tobacco smoking is a frequent habit sustained by more than 1.3 billion people in 2020 and a leading (and preventable) health risk factor in the modern world. 
Predicting smoking habits from a persistent biomarker, such as DNA methylation, is not only relevant to account for self-reporting bias in public health and personalized medicine studies, but may also allow broadening forensic DNA phenotyping. Previously, a model to predict whether someone is a current, former, or never smoker had been published based on only 13 CpGs from the hundreds of thousands included in the DNA methylation microarray. However, a matching lab tool with lower marker throughput, and higher accuracy and sensitivity was missing towards translating the model into practice. Chapter 6 describes the development of an MPS assay and data analysis pipeline to quantify DNA methylation on these 13 smoking-associated biomarkers for the prediction of smoking status. Though our systematic evaluation on DNA standards of known methylation levels revealed marker-specific amplification bias, our novel tool was still able to provide highly accurate and reproducible DNA methylation quantification and smoking habit prediction. Overall, our MPS assay allows the technological transfer of DNA methylation microarray findings and models to practical settings, one step closer towards future applications. Finally, Chapter 7 provides a general discussion on the results and topics discussed across Chapters 2-6. It begins by summarizing the main findings across the thesis, including proposals for follow-up studies. It then covers technical limitations pertaining to bisulfite conversion and DNA methylation microarrays, but also more general considerations such as restricted data access. This chapter ends by covering the outlook of this PhD thesis, including topics such as bisulfite-free methods, third-generation sequencing, single-cell methylomics, multi-omics and systems biology.
