    Exploring missing heritability in neurodevelopmental disorders: Learning from regulatory elements

    In this thesis, I aimed to explain part of the missing heritability in neurodevelopmental disorders using computational approaches. Alongside the investigation of a novel epilepsy syndrome and of the regulation of the gene involved, I investigated and prioritized genomic sequences implicated in gene regulation during the developmental stages of the human brain. The goal was to create an atlas of high-confidence non-coding regulatory elements that future studies can screen for genetic variants in genetically unexplained individuals with neurodevelopmental disorders of suspected genetic origin.

    Advanced sequencing technologies applied to human cytomegalovirus

    The betaherpesvirus human cytomegalovirus (HCMV) is a ubiquitous viral pathogen. It is the most common cause of congenital infection in infants and of opportunistic infections in immunocompromised patients worldwide. The large double-stranded DNA genome of HCMV (236 kb) contains several genes that exhibit a high degree of variation among strains within an otherwise highly conserved sequence. These hypervariable genes encode immune escape, tropism or regulatory factors that may affect virulence. Variation arising from these genes and from an evolutionary history of recombination between strains has been hypothesised to be linked to disease severity. To investigate this, the HCMV genome has been scrutinised in detail over the years using a variety of molecular techniques, most of which look at only one or a few of these genes at a time. The advent of high-throughput sequencing (HTS) technology 20 years ago began to enable more in-depth whole-genome analyses. My study extends this field by using both HTS and the more recently developed long-read nanopore technology to determine HCMV genome sequences directly from clinical samples. Firstly, I used an Illumina HTS pipeline to sequence HCMV strains directly from formalin-fixed, paraffin-embedded (FFPE) tissues. FFPE samples are a valuable repository for the study of relatively rare diseases, such as congenital HCMV (cCMV). However, formalin fixation induces DNA fragmentation and cross-linking, making this a challenging sample type for DNA sequencing. I successfully sequenced five whole HCMV genomes from FFPE tissues. Next, I developed a pipeline utilising the single-molecule, long-read sequencer from Oxford Nanopore Technologies (ONT) to sequence HCMV, initially from high-titre cell-cultured laboratory strains and then from clinical samples with high HCMV loads. Finally, I utilised a direct RNA sequencing protocol with the ONT sequencer to characterise novel HCMV transcripts produced during infection in cell culture, demonstrating the existence of transcript isoforms with multiple splice sites. Overall, my findings demonstrate how advanced sequencing technologies can be used to characterise the genome and transcriptome of a large DNA virus, and will facilitate future studies on HCMV prognostic factors, novel antiviral targets and vaccine development.

    Breeding Melons for Resistance to Viral and Fungal Diseases. Exploiting the Multi-Resistant Accession TGR-1551

    Cucurbits are the second most important horticultural family worldwide, second only to the Solanaceae. Traditionally, their cultivation has been concentrated in temperate regions across the globe. However, climate change, international trade, and intensive agricultural practices are contributing to the emergence of new viral and fungal diseases in regions where they were previously absent. It is therefore crucial to monitor the major production areas regularly, so that emerging viruses and fungi can be detected in each region and breeding programs adapted to the specific goals of each area. In melon (Cucumis melo), there is significant intraspecific variability that can serve as a source of resistance alleles against these pathogens. However, sources of resistance are often found within wild germplasm, typically originating from Africa or Asia and characterized by limited domestication. To make better use of these resistant accessions, the genetic control of the traits of interest must be studied, so that the regions associated with resistance can be located and molecular markers linked to them designed. This streamlines breeding programs focused on introgressing resistance while preserving the genetic background of the varieties of interest.
    In this doctoral thesis, a study of the incidence and genetic diversity of nine viral species potentially limiting cucurbit cultivation in southeastern Spain was conducted during the summer campaigns of 2019 and 2020. Viruses transmitted by aphids were more prevalent than those transmitted by whiteflies. Within the first group, watermelon mosaic virus (WMV), cucurbit aphid-borne yellows virus (CABYV), and cucumber mosaic virus (CMV) stood out, as they were detected in all the studied areas and crops, often in mixed infections. Moroccan watermelon mosaic virus (MWMV) and tomato leaf curl New Delhi virus (ToLCNDV) were also detected in some areas, but with lower infection percentages and typically in mixed infections with WMV. Phylogenetic analyses of the isolates found identified seven new molecular profiles of WMV as well as recombinant CMV isolates, which is consistent with results from other countries and highlights the extensive variability of these pathogens.
    Wild melon accessions preserved in various germplasm banks are a valuable resource for breeding programs against biotic stresses. The African accession TGR-1551 has previously been described as resistant to WMV, CYSDV (cucurbit yellow stunting disorder virus), CABYV, and the fungus Podosphaera xanthii (Px, races 1, 2, and 5), which causes powdery mildew in melon. Additionally, it is tolerant to whiteflies (Bemisia tabaci) and carries the Vat gene (virus aphid transmission), which limits virus transmission by aphids. This accession is therefore an excellent source of resistance alleles, and using it as a single donor parent can shorten breeding programs. Within the scope of this doctoral thesis, the QTLs associated with CYSDV resistance derived from this accession were mapped through the development of segregating mapping populations and the use of high-throughput genotyping technologies. Two QTLs for CYSDV resistance were detected on chromosome 5. The first, with a major effect and dominant inheritance, is associated with symptom development. The second, with a minor effect and also dominant inheritance, does not confer resistance by itself and is associated with viral load during infection. A similar strategy was employed to map and narrow down the QTLs for resistance against Px. In this case, resistance involves a dominant-recessive epistasis, with the recessive gene located on chromosome 12 and the dominant gene on chromosome 5, in the same region as the major CYSDV resistance QTL. Regarding resistance against WMV, previous studies conducted by the research […]
    This research was funded by the Spanish Ministerio de Ciencia e Innovación (MCIN/AEI/10.13039/501100011033), grant number PID2020-116055RB (C21 and C22), and by the Conselleria d’Educació, Investigació, Cultura i Esports de la Generalitat Valenciana, grant number PROMETEO/2021/072 (to promote excellence groups, cofinanced with FEDER funds). M.L. is a recipient of a predoctoral fellowship (PRE2018-083466) of the Spanish Ministerio de Ciencia, Innovación y Universidades co-financed with FSE funds.
    López Martín, M. (2023). Breeding Melons for Resistance to Viral and Fungal Diseases. Exploiting the Multi-Resistant Accession TGR-1551 [Tesis doctoral]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/20206

    Inter-individual variation of the human epigenome & applications

    Genome-wide association studies (GWAS) have led to the discovery of genetic variants influencing human phenotypes in health and disease. However, almost two decades later, most human traits still cannot be accurately predicted from common genetic variants. Moreover, genetic variants discovered via GWAS mostly map to the non-coding genome and have historically resisted interpretation via mechanistic models. The epigenome, by contrast, lies at the crossroads between genetics and the environment. Thus, there is great excitement about mapping epigenetic inter-individual variation, since its study may link environmental factors to human traits that remain unexplained by genetic variants. For instance, the environmental component of the epigenome may serve as a source of biomarkers for accurate, robust and interpretable phenotypic prediction of low-heritability traits, which cannot be attained by classical genetics-based models. Additionally, its study may provide mechanisms of action for genetic associations at non-coding regions that mediate their effect via the epigenome. The aim of this thesis was to explore epigenetic inter-individual variation and to mitigate some of the methodological limitations standing in the way of its future valorisation.
    Chapter 1 is dedicated to the scope and aims of the thesis. It begins by describing historical milestones and basic concepts in human genetics, statistical genetics, the heritability problem and polygenic risk scores. It then moves towards epigenetics, covering the several dimensions it encompasses. It subsequently focuses on DNA methylation, with topics such as mitotic stability, epigenetic reprogramming, X-inactivation and imprinting. This is followed by concepts from epigenetic epidemiology such as epigenome-wide association studies (EWAS), epigenetic clocks, Mendelian randomization, methylation risk scores and methylation quantitative trait loci (mQTL). The chapter ends by introducing the aims of the thesis.
    Chapter 2 focuses on stochastic epigenetic inter-individual variation resulting from processes occurring post-twinning, during embryonic development and early life. Specifically, it describes the discovery and characterisation of hundreds of variably methylated CpGs in the blood of healthy adolescent monozygotic (MZ) twins that show equivalent variation among co-twins and unrelated individuals (evCpGs) and that could not be explained by measurement error on the DNA methylation microarray alone. DNA methylation levels at evCpGs were shown to be stable in the short term but susceptible to aging and epigenetic drift in the long term. The identified sites were significantly enriched at the clustered protocadherin loci, known for stochastic methylation in neurons in the context of embryonic neurodevelopment. Critically, evCpGs were capable of clustering technical and longitudinal replicates while differentiating young MZ twins. Thus, the discovered evCpGs can be considered a first prototype of a universal epigenetic fingerprint, relevant for discriminating MZ twins for forensic purposes, which is currently impossible with standard DNA profiling.
    DNA methylation microarrays are the preferred technology for EWAS and mQTL mapping studies. However, their probe design inherently assumes that the assayed genomic DNA is identical to the reference genome, leading to genetic artifacts whenever this assumption is not fulfilled. Building upon the previous experience of analysing microarray data, Chapter 3 covers the development and benchmarking of UMtools, an R package for the quantification and qualification of genetic artifacts on DNA methylation microarrays based on the unprocessed fluorescence intensity signals. These tools were used to assemble an atlas of genetic artifacts encountered on DNA methylation microarrays, including interactions between artifacts or with X-inactivation, imprinting and tissue-specific regulation. Additionally, a co-methylation-based approach was proposed to distinguish artifacts from genuine epigenetic variation. Overall, this study revealed that genetic artifacts continue to filter through into the reported literature, since current methodologies to address them have overlooked this challenge.
    Furthermore, EWAS, mQTL and allele-specific methylation (ASM) mapping studies have all been employed to map epigenetic variation, but they require matching phenotypic/genotypic data and can only map specific components of epigenetic inter-individual variation. Inspired by the previously proposed co-methylation strategy, Chapter 4 describes a novel method to simultaneously map inter-haplotype, inter-cell and inter-individual variation without these requirements. Specifically, the binomial likelihood function-based bootstrap hypothesis test for co-methylation within reads (Binokulars) is a randomization test that can identify jointly regulated CpGs (JRCs) from pooled whole genome bisulfite sequencing (WGBS) data by relying solely on the joint DNA methylation information available in reads spanning multiple CpGs. Binokulars was tested on pooled WGBS data from whole blood, sperm and both combined, and benchmarked against EWAS and ASM. Our comparisons revealed that Binokulars can integrate a wide range of epigenetic phenomena under the same umbrella, since it simultaneously discovered regions associated with imprinting, cell type- and tissue-specific regulation, mQTL, ageing and even unknown epigenetic processes. Finally, we verified examples of mQTL and polymorphic imprinting by employing another novel tool, JRC_sorter, to classify regions based on epigenotype models and non-pooled WGBS data from cord blood. In the future, we envision that this cost-effective approach can be applied to larger pools to simultaneously highlight regions of interest in the methylome, a highly relevant task in the post-GWAS era.
    Moving towards future applications of epigenetic inter-individual variation, Chapters 5 and 6 are dedicated to solving some of the methodological issues faced in translational epigenomics. Firstly, due to its simplicity and well-known properties, linear regression is the usual starting point for predicting a continuous outcome from a set of predictors. However, linear regression is incompatible with missing data, a common phenomenon and a major threat to the integrity of data analysis in empirical sciences, including (epi)genomics. Chapter 5 describes the development of combinatorial linear models (cmb-lm), an imputation-free, CPU/RAM-efficient and privacy-preserving statistical method for linear regression prediction on datasets with missing values. Cmb-lm provide prediction errors that take into account the pattern of missing values in the incomplete data, even at extreme missingness. As a proof of concept, we tested cmb-lm in the context of epigenetic ageing clocks, one of the most popular applications of epigenetic inter-individual variation. Overall, cmb-lm offer a simple and flexible methodology with a wide range of applications that can provide a smooth transition towards the valorisation of linear models in the real world, where missing data is almost inevitable.
    Beyond microarrays, due to its high accuracy, reliability and sample multiplexing capabilities, massively parallel sequencing (MPS) is currently the methodology of choice for translating prediction models for traits of interest into practice. At the same time, tobacco smoking is a widespread habit, sustained by more than 1.3 billion people in 2020, and a leading (and preventable) health risk factor in the modern world. Predicting smoking habits from a persistent biomarker, such as DNA methylation, is not only relevant to account for self-reporting bias in public health and personalized medicine studies, but may also allow forensic DNA phenotyping to be broadened. Previously, a model to predict whether someone is a current, former, or never smoker had been published, based solely on 13 CpGs from the hundreds of thousands included in the DNA methylation microarray. However, a matching lab tool with lower marker throughput and higher accuracy and sensitivity was still missing for translating the model into practice. Chapter 6 describes the development of an MPS assay and data analysis pipeline to quantify DNA methylation at these 13 smoking-associated biomarkers for the prediction of smoking status. Though our systematic evaluation on DNA standards of known methylation levels revealed marker-specific amplification bias, our novel tool was still able to provide highly accurate and reproducible DNA methylation quantification and smoking habit prediction. Overall, our MPS assay allows the technological transfer of DNA methylation microarray findings and models to practical settings, one step closer to future applications.
    Finally, Chapter 7 provides a general discussion of the results and topics covered across Chapters 2-6. It begins by summarizing the main findings of the thesis, including proposals for follow-up studies. It then covers technical limitations pertaining to bisulfite conversion and DNA methylation microarrays, as well as more general considerations such as restricted data access. The chapter ends with the outlook of this PhD thesis, including topics such as bisulfite-free methods, third-generation sequencing, single-cell methylomics, multi-omics and systems biology.

    Computational Approaches to Drug Profiling and Drug-Protein Interactions

    Despite substantial increases in R&D spending within the pharmaceutical industry, de novo drug design has become a time-consuming endeavour. High attrition rates led to a long period of stagnation in drug approvals. Due to the extreme costs associated with introducing a drug to the market, locating and understanding the reasons for clinical failure is key to future productivity. As part of this PhD, three main contributions were made in this respect. First, the web platform LigNFam enables users to interactively explore similarity relationships between ‘drug-like’ molecules and the proteins they bind. Secondly, two deep-learning-based binding site comparison tools were developed, competing with the state of the art on benchmark datasets. The models can predict off-target interactions and potential candidates for target-based drug repurposing. Finally, the open-source ScaffoldGraph software was presented for the analysis of hierarchical scaffold relationships and has already been used in multiple projects, including integration into a virtual screening pipeline to increase the tractability of ultra-large screening experiments. Together, and with existing tools, the contributions made will aid in the understanding of drug-protein relationships, particularly in the fields of off-target prediction and drug repurposing, helping to design better drugs faster.

    Developing methods to assess evolutionary and functional equivalence of single nucleotide variants for improved clinical interpretation of human genetic variation

    With advancements in sequencing technology, there has been an unprecedented rise in human single nucleotide variant data in recent years. One of the key challenges within clinical genetics is distinguishing truly pathogenic variants from rare but benign ones. Many in silico tools have been developed with this aim, but they often over-predict pathogenicity, particularly for novel variants. Here, I demonstrate how a framework designed to identify functionally equivalent variants, using information from variants in known related genes, can help pathogenic variant interpretation. Using sequence alignments of human paralogues, the annotations of known pathogenic variants can be transferred to variants at the aligned positions. This Paralogue Annotation method is shown to be widely applicable exome-wide, with 71% of disease genes having at least one paralogue. As a classifier it performs more precisely than other contemporary variant predictors, with a precision of 94% or higher depending on the data. This, however, comes at the cost of limited sensitivity (17% and lower). Sensitivity was recovered when the framework was improved by aligning protein domains instead of whole gene sequences: the sensitivity was increased by 74% with a marginal 6% precision decrease. Expanding the framework to use structural protein alignments instead of sequence alignments has the potential to improve sensitivity further, but because structural data are currently limited, predicted protein models must be relied on, which requires further assumptions. In structural space, pathogenic variants across aligned models are statistically more likely to lie close together than benign and pathogenic variants. This framework can be used as a precise pathogenic variant classifier in sequence space, but overall, it can be used to search for variants functionally equivalent to variants of interest, a line of information not used by many.
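
The annotation-transfer idea in the abstract above can be illustrated with a minimal sketch. It is not the thesis implementation: the function names, the toy gapped sequences, and the confusion-matrix counts are all hypothetical, and the real method may apply additional filters (for example, checking residue conservation) that the abstract does not describe. The sketch only shows how a residue position in one gene maps through a pairwise alignment to a paralogue, and how precision and sensitivity are computed.

```python
from typing import Dict, Optional, Set


def residue_to_column(aligned: str) -> Dict[int, int]:
    """Map 1-based residue positions to 0-based alignment columns, skipping gaps."""
    mapping: Dict[int, int] = {}
    pos = 0
    for col, ch in enumerate(aligned):
        if ch != "-":
            pos += 1
            mapping[pos] = col
    return mapping


def transfer_annotation(
    aligned_query: str,
    aligned_paralogue: str,
    query_pos: int,
    paralogue_pathogenic: Set[int],
) -> Optional[int]:
    """Return the paralogue residue position aligned to query_pos if a known
    pathogenic variant sits there; otherwise return None.

    Hypothetical helper: assumes both strings are rows of the same alignment.
    """
    if len(aligned_query) != len(aligned_paralogue):
        raise ValueError("aligned sequences must have equal length")
    col = residue_to_column(aligned_query).get(query_pos)
    if col is None:
        return None
    # Invert the paralogue mapping: alignment column -> residue position.
    col_to_pos = {c: p for p, c in residue_to_column(aligned_paralogue).items()}
    paralogue_pos = col_to_pos.get(col)  # None if the paralogue has a gap here
    if paralogue_pos is not None and paralogue_pos in paralogue_pathogenic:
        return paralogue_pos
    return None


def precision(tp: int, fp: int) -> float:
    """Fraction of variants called pathogenic that truly are pathogenic."""
    return tp / (tp + fp)


def sensitivity(tp: int, fn: int) -> float:
    """Fraction of truly pathogenic variants that the classifier recovers."""
    return tp / (tp + fn)


if __name__ == "__main__":
    # Toy alignment (made up): query residue 4 aligns with paralogue residue 5.
    query, paralogue = "MKT-LRA", "MRTGL-A"
    print(transfer_annotation(query, paralogue, 4, {5}))
    print(transfer_annotation(query, paralogue, 5, {5}))
    # Made-up counts, chosen only to show a high-precision, low-sensitivity profile.
    print(round(precision(47, 3), 2), round(sensitivity(47, 229), 2))
```

A lookup that returns `None` means no annotation could be transferred, which is why a classifier of this kind can be precise yet miss most pathogenic variants.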