    Exploring missing heritability in neurodevelopmental disorders:Learning from regulatory elements

    In this thesis, I aimed to resolve part of the missing heritability in neurodevelopmental disorders using computational approaches. In addition to investigating a novel epilepsy syndrome and the regulation of the gene involved, I investigated and prioritized genomic sequences implicated in gene regulation during the developmental stages of the human brain. The goal was to create an atlas of high-confidence non-coding regulatory elements that future studies can assess for genetic variants in genetically unexplained individuals with neurodevelopmental disorders of suspected genetic origin.

    Towards personalized medicine for metastatic urothelial cancer

    Charting genomic heterogeneity in tumours : from bulk to single cell

    Tumours do not consist of a single homogeneous population but are complex heterogeneous systems that contain billions of ever-evolving cells, with no two tumours being the same. Tumour heterogeneity is present at three levels: 1) inter-patient heterogeneity; 2) intra-patient heterogeneity; and 3) intra-tumour heterogeneity (ITH). Understanding all levels of heterogeneity is crucial for patient prognosis and treatment choice. To this end, we aimed to improve our understanding of all three levels of tumour heterogeneity.
    In paper I we investigated the prevalence, type, length, and genomic distribution of 853,218 somatic copy number alterations (SCNAs) across 20,249 tumours belonging to 32 cancer types. Based on 1) the number of SCNAs; 2) the percentage of the genome altered; and 3) the average SCNA size, we found high levels of inter-patient heterogeneity, both between and within cancer types. We found that specific chromosomes were preferentially lost or gained depending on cancer type. Lastly, we detected co-alterations of key oncogenes and tumour suppressor genes (TSGs). Taken together, we provided a comprehensive analysis of SCNAs across many cancer types as a valuable resource for the community.
    In paper II we sought to elucidate intra-patient heterogeneity in non-small cell lung cancer (NSCLC) and matched brain metastases (BMs). We performed shallow whole-genome sequencing (WGS) on 51 primary NSCLCs and matched BMs, whole-exome sequencing on 40 of the pairs, multi-region sequencing of 15 BMs, and shallow WGS on an additional cohort of 115 BMs. We showed that there is significant intra-patient heterogeneity at the SCNA level, with BM samples showing, on average, more SCNAs than their matched NSCLC. In contrast, multi-region sequencing of 15 BMs did not show significant ITH at the level of SCNAs. Finally, we identified putative metastatic driver SCNAs and single-nucleotide variants in key TSGs and oncogenes.
    In paper III we aimed to assess the level of ITH in early localized prostate cancer. We performed organ-wide, multi-region, single-cell DNA sequencing on two prostate midsections. We found transient chromosomal instability (CIN) in both tumour and normal prostate tissue, evidenced by a large number of cells with unique chromosomal (arm) losses and/or gains. Furthermore, we found three distinct groups of cells within the prostate: 1) diploid cells; 2) pseudo-diploid cells; and 3) monster cells. We observed an enrichment of diploid cells in normal regions and of pseudo-diploid cells in tumour-rich regions, while monster cells were equally distributed over the entire prostate, again suggesting elevated CIN levels across the prostate. Lastly, we detected highly localized subclones that were exclusive to tumour-rich regions and harboured deletions in TSGs that are known to be frequently deleted in prostate cancer.
    Taken together, with this thesis I have contributed to advancing the understanding of inter-patient, intra-patient, and intra-tumour heterogeneity.

    Inter-individual variation of the human epigenome & applications

    Genome-wide association studies (GWAS) have led to the discovery of genetic variants influencing human phenotypes in health and disease. However, almost two decades later, most human traits still cannot be accurately predicted from common genetic variants. Moreover, genetic variants discovered via GWAS mostly map to the non-coding genome and have historically resisted interpretation via mechanistic models. The epigenome, in contrast, lies at the crossroads between genetics and the environment. Thus, there is great excitement about mapping epigenetic inter-individual variation, since its study may link environmental factors to human traits that remain unexplained by genetic variants. For instance, the environmental component of the epigenome may serve as a source of biomarkers for accurate, robust and interpretable phenotypic prediction of low-heritability traits that cannot be attained by classical genetics-based models. Additionally, its research may provide mechanisms of action for genetic associations at non-coding regions that mediate their effect via the epigenome. The aim of this thesis was to explore epigenetic inter-individual variation and to mitigate some of the methodological limitations that stand in the way of its future valorisation.
    Chapter 1 is dedicated to the scope and aims of the thesis. It begins by describing historical milestones and basic concepts in human genetics, statistical genetics, the heritability problem and polygenic risk scores. It then moves towards epigenetics, covering the several dimensions it encompasses. It subsequently focuses on DNA methylation, with topics such as mitotic stability, epigenetic reprogramming, X-inactivation and imprinting. This is followed by concepts from epigenetic epidemiology such as epigenome-wide association studies (EWAS), epigenetic clocks, Mendelian randomization, methylation risk scores and methylation quantitative trait loci (mQTL). The chapter ends by introducing the aims of the thesis.
    Chapter 2 focuses on stochastic epigenetic inter-individual variation resulting from processes occurring post-twinning, during embryonic development and early life. Specifically, it describes the discovery and characterisation of hundreds of variably methylated CpGs in the blood of healthy adolescent monozygotic (MZ) twins that show equivalent variation among co-twins and unrelated individuals (evCpGs) and that could not be explained by measurement error on the DNA methylation microarray alone. DNA methylation levels at evCpGs were shown to be stable in the short term but susceptible to ageing and epigenetic drift in the long term. The identified sites were significantly enriched at the clustered protocadherin loci, known for stochastic methylation in neurons in the context of embryonic neurodevelopment. Critically, evCpGs were capable of clustering technical and longitudinal replicates while differentiating young MZ twins. Thus, the discovered evCpGs can be considered a first prototype of a universal epigenetic fingerprint, relevant for the discrimination of MZ twins for forensic purposes, which is currently impossible with standard DNA profiling.
    DNA methylation microarrays are the preferred technology for EWAS and mQTL mapping studies. However, their probe design inherently assumes that the assayed genomic DNA is identical to the reference genome, leading to genetic artifacts whenever this assumption is not fulfilled. Building upon the previous experience of analysing microarray data, Chapter 3 covers the development and benchmarking of UMtools, an R package for the quantification and qualification of genetic artifacts on DNA methylation microarrays based on the unprocessed fluorescence intensity signals. These tools were used to assemble an atlas of genetic artifacts encountered on DNA methylation microarrays, including interactions between artifacts or with X-inactivation, imprinting and tissue-specific regulation. Additionally, to distinguish artifacts from genuine epigenetic variation, a co-methylation-based approach was proposed. Overall, this study revealed that genetic artifacts continue to filter through into the reported literature, since current methodologies to address them have overlooked this challenge.
    Furthermore, EWAS, mQTL and allele-specific methylation (ASM) mapping studies have all been employed to map epigenetic variation, but they require matching phenotypic or genotypic data and can only map specific components of epigenetic inter-individual variation. Inspired by the previously proposed co-methylation strategy, Chapter 4 describes a novel method to simultaneously map inter-haplotype, inter-cell and inter-individual variation without these requirements. Specifically, the binomial likelihood function-based bootstrap hypothesis test for co-methylation within reads (Binokulars) is a randomization test that can identify jointly regulated CpGs (JRCs) from pooled whole-genome bisulfite sequencing (WGBS) data by relying solely on the joint DNA methylation information available in reads spanning multiple CpGs. Binokulars was tested on pooled WGBS data from whole blood, sperm and both combined, and benchmarked against EWAS and ASM. Our comparisons revealed that Binokulars can integrate a wide range of epigenetic phenomena under the same umbrella, since it simultaneously discovered regions associated with imprinting, cell type- and tissue-specific regulation, mQTL, ageing and even unknown epigenetic processes. Finally, we verified examples of mQTL and polymorphic imprinting by employing another novel tool, JRC_sorter, to classify regions based on epigenotype models and non-pooled WGBS data from cord blood. In the future, we envision that this cost-effective approach can be applied to larger pools to simultaneously highlight regions of interest in the methylome, a highly relevant task in the post-GWAS era.
    Moving towards future applications of epigenetic inter-individual variation, Chapters 5 and 6 are dedicated to solving some of the methodological issues faced in translational epigenomics. Firstly, due to its simplicity and well-known properties, linear regression is the starting-point methodology when predicting a continuous outcome from a set of predictors. However, linear regression is incompatible with missing data, a common phenomenon and a major threat to the integrity of data analysis in the empirical sciences, including (epi)genomics. Chapter 5 describes the development of combinatorial linear models (cmb-lm), an imputation-free, CPU/RAM-efficient and privacy-preserving statistical method for linear regression prediction on datasets with missing values. Cmb-lm provide prediction errors that take into account the pattern of missing values in the incomplete data, even at extreme missingness. As a proof of concept, we tested cmb-lm in the context of epigenetic ageing clocks, one of the most popular applications of epigenetic inter-individual variation. Overall, cmb-lm offer a simple and flexible methodology with a wide range of applications that can provide a smooth transition towards the valorisation of linear models in the real world, where missing data is almost inevitable.
    Beyond microarrays, due to its high accuracy, reliability and sample multiplexing capabilities, massively parallel sequencing (MPS) is currently the methodology of choice for translating prediction models for traits of interest into practice. At the same time, tobacco smoking is a frequent habit, sustained by more than 1.3 billion people in 2020, and a leading (and preventable) health risk factor in the modern world. Predicting smoking habits from a persistent biomarker, such as DNA methylation, is not only relevant to account for self-reporting bias in public health and personalized medicine studies, but may also allow forensic DNA phenotyping to be broadened. Previously, a model to predict whether someone is a current, former, or never smoker had been published, based on only 13 CpGs from the hundreds of thousands included in the DNA methylation microarray. However, a matching lab tool with lower marker throughput and higher accuracy and sensitivity was missing for translating the model into practice. Chapter 6 describes the development of an MPS assay and data analysis pipeline to quantify DNA methylation at these 13 smoking-associated biomarkers for the prediction of smoking status. Though our systematic evaluation on DNA standards of known methylation levels revealed marker-specific amplification bias, our novel tool was still able to provide highly accurate and reproducible DNA methylation quantification and smoking habit prediction. Overall, our MPS assay allows the technological transfer of DNA methylation microarray findings and models to practical settings, one step closer to future applications.
    Finally, Chapter 7 provides a general discussion of the results and topics covered across Chapters 2-6. It begins by summarizing the main findings of the thesis, including proposals for follow-up studies. It then covers technical limitations pertaining to bisulfite conversion and DNA methylation microarrays, but also more general considerations such as restricted data access. The chapter ends with the outlook of this PhD thesis, including topics such as bisulfite-free methods, third-generation sequencing, single-cell methylomics, multi-omics and systems biology.

    Long-Molecule Assessment of Ribosomal DNA and RNA

    The genes encoding ribosomal RNA and their transcriptional products are essential for life, yet remain poorly understood. Even with the advent of long-range sequencing methodologies, rDNA loci are difficult to study and remain obscure, prompting the consideration of alternative methods for probing this critical region of the genome. The research outlined in this thesis utilises molecular combing, a fibre-stretching technique, to isolate DNA molecules measuring more than 5 Mbp in length. The capture of DNA molecules of this size should assist in exploring the architecture of entire rDNA clusters at the single-molecule level. By combining molecular combing with SNP-targeting probes, this study aims to distinguish and assess the arrangement of rDNA promoter variants, which have been shown to exhibit dramatically different environmental sensitivity. Additionally, through the application of Oxford Nanopore Technologies direct RNA sequencing, the work here has demonstrated the capture of near full-length rRNA primary transcripts, which will allow post-transcriptional modification to be assessed across the length of multiple coding subunits within a single molecule for the first time. Furthermore, an exploration of RNA modification profiles across sample types representative of different developmental stages has been conducted. This study predicts many sites to be differentially modified across these developmental conditions, several of which are known to be important for, if not crucial to, ribosome biogenesis and function. The work outlined in this thesis provides a framework for future studies to conduct long-molecule, genetic, and epitranscriptome profiling of this vital region of the genome and its dynamic response to a changing environment.

    Whole-genome sequencing of chronic lymphocytic leukemia identifies subgroups with distinct biological and clinical features.

    The value of genome-wide over targeted driver analyses for predicting clinical outcomes of cancer patients is debated. Here, we report the whole-genome sequencing of 485 chronic lymphocytic leukemia patients enrolled in clinical trials as part of the United Kingdom's 100,000 Genomes Project. We identify an extended catalog of recurrent coding and noncoding genetic mutations that represents a source for future studies, and provide the most complete high-resolution map of structural variants, copy number changes and global genome features including telomere length, mutational signatures and genomic complexity. We demonstrate the relationship of these features with clinical outcome and show that integration of 186 distinct recurrent genomic alterations defines five genomic subgroups that associate with response to therapy, refining conventional outcome prediction. While requiring independent validation, our findings highlight the potential of whole-genome sequencing to inform future risk stratification in chronic lymphocytic leukemia.