467 research outputs found

    Optimized design and data analysis of tag-based cytosine methylation assays

    Genome-wide, tag-based cytosine methylation analysis is optimized

    High-resolution genome-wide cytosine methylation profiling with simultaneous copy number analysis and optimization for limited cell numbers

    Many genome-wide assays involve the generation of a subset (or representation) of the genome following restriction enzyme digestion. The use of enzymes sensitive to cytosine methylation allows high-throughput analysis of this epigenetic regulatory process. We show that a dual-adapter approach allows us to generate genomic representations that include fragments of <200 bp in size, which was not previously possible with the standard single-adapter approach. By expanding the representation to smaller fragments using HpaII or MspI, we increase the representation by these isoschizomers to more than 1.32 million loci in the human genome, representing 98.5% of CpG islands and 91.1% of refSeq promoters. This advance allows the development of a new, high-resolution version of our HpaII-tiny fragment Enrichment by Ligation-mediated PCR (HELP) assay to study cytosine methylation. We also show that the MspI representation generates information about copy-number variation, that the assay can be used on as little as 10 ng of DNA, and that massively parallel sequencing can be used as an alternative to microarrays to read the output of the assay, making this a powerful discovery platform for studies of genomic and epigenomic abnormalities.

    Cytosine Methylation Dysregulation in Neonates Following Intrauterine Growth Restriction

    Perturbations of the intrauterine environment can affect fetal development during critical periods of plasticity, and can increase susceptibility to a number of age-related diseases (e.g., type 2 diabetes mellitus; T2DM), manifesting as late as decades later. We hypothesized that this biological memory is mediated by permanent alterations of the epigenome in stem cell populations, and focused our studies specifically on DNA methylation in CD34+ hematopoietic stem and progenitor cells from cord blood from neonates with intrauterine growth restriction (IUGR) and control subjects. Our epigenomic assays utilized a two-stage design involving genome-wide discovery followed by quantitative, single-locus validation. We found that the changes in cytosine methylation that occur in response to IUGR are of moderate degree and involve a restricted number of loci. We also identify specific loci that are targeted for dysregulation of DNA methylation, in particular the hepatocyte nuclear factor 4alpha (HNF4A) gene, a well-known diabetes candidate gene not previously associated with growth restriction in utero, and other loci encoding HNF4A-interacting proteins. Our results give insights into the potential contribution of epigenomic dysregulation in mediating the long-term consequences of IUGR, and demonstrate the value of this approach to studies of the fetal origin of adult disease.

    Integrative Model-based clustering of microarray methylation and expression data

    In many fields, researchers are interested in large and complex biological processes. Two important examples are gene expression and DNA methylation in genetics. One key problem is to identify aberrant patterns of these processes and discover biologically distinct groups. In this article we develop a model-based method for clustering such data. The basis of our method involves the construction of a likelihood for any given partition of the subjects. We introduce cluster-specific latent indicators that, along with some standard assumptions, impose a specific mixture distribution on each cluster. Estimation is carried out using the EM algorithm. The methods extend naturally to multiple data types of a similar nature, which leads to an integrated analysis over multiple data platforms, resulting in higher discriminating power. Comment: Published at http://dx.doi.org/10.1214/11-AOAS533 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org)
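
    As a rough illustration of mixture-model clustering fitted with the EM algorithm, the sketch below fits a generic two-component univariate Gaussian mixture. It is not the integrative model of the article: the function names, the initialisation, the variance floor and the simulated values are all assumptions made for this example.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of a univariate normal distribution."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def em_two_gaussians(values, n_iter=100):
    """Fit a two-component 1-D Gaussian mixture with EM (illustrative only)."""
    # Crude initialisation (an assumption): means at the data extremes,
    # both variances equal to the overall variance, equal weights.
    n = len(values)
    means = [min(values), max(values)]
    overall_mean = sum(values) / n
    overall_var = max(sum((v - overall_mean) ** 2 for v in values) / n, 1e-6)
    variances = [overall_var, overall_var]
    weights = [0.5, 0.5]
    resp = []
    for _ in range(n_iter):
        # E-step: posterior probability that each value belongs to each cluster.
        resp = []
        for v in values:
            dens = [weights[k] * normal_pdf(v, means[k], variances[k]) for k in range(2)]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M-step: re-estimate weights, means and variances from the soft labels.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            weights[k] = nk / n
            means[k] = sum(r[k] * v for r, v in zip(resp, values)) / nk
            var_k = sum(r[k] * (v - means[k]) ** 2 for r, v in zip(resp, values)) / nk
            variances[k] = max(var_k, 1e-6)  # floor keeps a component from collapsing
    # Hard cluster labels: the component with the larger posterior probability.
    labels = [0 if r[0] >= r[1] else 1 for r in resp]
    return means, variances, weights, labels


if __name__ == "__main__":
    # Simulated methylation-like values from two well-separated groups.
    rng = random.Random(0)
    data = [rng.gauss(0.2, 0.05) for _ in range(50)]
    data += [rng.gauss(0.8, 0.05) for _ in range(50)]
    fitted_means, _, fitted_weights, _ = em_two_gaussians(data)
    print(fitted_means, fitted_weights)
```

    The sketch assumes well-separated groups and a non-degenerate sample; a practical implementation would work with log-densities and monitor the likelihood for convergence.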

    Widespread Hypomethylation Occurs Early and Synergizes with Gene Amplification during Esophageal Carcinogenesis

    Although a combination of genomic and epigenetic alterations is implicated in the multistep transformation of normal squamous esophageal epithelium to Barrett esophagus, dysplasia, and adenocarcinoma, the combinatorial effect of these changes is unknown. By integrating genome-wide DNA methylation, copy number, and transcriptomic datasets obtained from endoscopic biopsies of neoplastic progression within the same individual, we are uniquely able to define the molecular events associated with progression of Barrett esophagus. We find that the previously reported global hypomethylation phenomenon in cancer has its origins at the earliest stages of epithelial carcinogenesis. Promoter hypomethylation synergizes with gene amplification and leads to significant upregulation of a chr4q21 chemokine cluster and other transcripts during Barrett neoplasia. In contrast, gene-specific hypermethylation is observed at a restricted number of loci and, in combination with hemi-allelic deletions, leads to downregulation of selected transcripts during multistep progression. We also observe that epigenetic regulation during epithelial carcinogenesis is not restricted to traditionally defined “CpG islands,” but may also occur through a mechanism of differential methylation outside of these regions. Finally, validation of novel upregulated targets (CXCL1 and 3, GATA6, and DMBT1) in a larger independent panel of samples confirms the utility of integrative analysis in cancer biomarker discovery.

    Diametrically opposite methylome-transcriptome relationships in high- and low-CpG promoter genes in postmitotic neural rat tissue

    DNA methylation can control some CpG-poor genes, but unbiased studies have not found a consistent genome-wide association with gene activity outside of CpG islands or shores, possibly due to the use of cell lines or limited bioinformatics analyses. We performed reduced representation bisulfite sequencing (RRBS) of rat dorsal root ganglia encompassing postmitotic primary sensory neurons (n = 5, r > 0.99; orthogonal validation p < 10^-19). The rat genome suggested a dichotomy of genes previously reported in other mammals: low CpG content (< 3.2%) promoter (LCP) genes and high CpG content (≥ 3.2%) promoter (HCP) genes. A genome-wide integrated methylome-transcriptome analysis showed that LCP genes were markedly hypermethylated when repressed, and hypomethylated when active, with a 40% difference in a broad region at the 5′ of the transcription start site (p < 10^-87 for -6000 bp to -2000 bp, p < 10^-73 for -2000 bp to +2000 bp; no difference in the gene body, p = 0.42). HCP genes had minimal TSS-associated methylation regardless of transcription status, but gene body methylation appeared to be lost in repressed HCP genes. Therefore, diametrically opposite methylome-transcriptome associations characterize LCP and HCP genes in postmitotic neural tissue in vivo.
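
    The LCP/HCP split at a 3.2% CpG content can be sketched as below. The abstract does not say how CpG content is computed, so this example assumes it is the percentage of dinucleotide positions in the promoter sequence that read CG. The function names and the example sequences are hypothetical.

```python
def cpg_content_percent(sequence):
    """Percentage of dinucleotide positions in `sequence` that are CpG.

    Assumed definition for illustration; the study may compute it differently.
    """
    seq = sequence.upper()
    if len(seq) < 2:
        raise ValueError("sequence must contain at least two bases")
    # Count every position i where bases i and i+1 spell "CG".
    cpg_count = sum(1 for i in range(len(seq) - 1) if seq[i:i + 2] == "CG")
    return 100.0 * cpg_count / (len(seq) - 1)


def classify_promoter(sequence, threshold_percent=3.2):
    """Label a promoter HCP (>= threshold) or LCP (< threshold)."""
    if cpg_content_percent(sequence) >= threshold_percent:
        return "HCP"
    return "LCP"


if __name__ == "__main__":
    # Toy sequences, not real promoters.
    for name, seq in [("cpg_rich", "ACGTACGTAC"), ("cpg_poor", "ATATATATAT")]:
        print(name, round(cpg_content_percent(seq), 2), classify_promoter(seq))
```

    Only the threshold and the direction of the comparison come from the abstract; everything else is an assumption of the sketch.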

    Inter-individual variation of the human epigenome & applications

    Genome-wide association studies (GWAS) have led to the discovery of genetic variants influencing human phenotypes in health and disease. However, almost two decades later, most human traits still cannot be accurately predicted from common genetic variants. Moreover, genetic variants discovered via GWAS mostly map to the non-coding genome and have historically resisted interpretation via mechanistic models. The epigenome, in contrast, lies at the crossroads between genetics and the environment. Thus, there is great excitement about mapping epigenetic inter-individual variation, since its study may link environmental factors to human traits that remain unexplained by genetic variants. For instance, the environmental component of the epigenome may serve as a source of biomarkers for accurate, robust and interpretable phenotypic prediction on low-heritability traits that cannot be attained by classical genetic-based models. Additionally, its research may provide mechanisms of action for genetic associations at non-coding regions that mediate their effect via the epigenome. The aim of this thesis was to explore epigenetic inter-individual variation and to mitigate some of the methodological limitations that stand in the way of its future valorisation. Chapter 1 is dedicated to the scope and aims of the thesis. It begins by describing historical milestones and basic concepts in human genetics, statistical genetics, the heritability problem and polygenic risk scores. It then moves towards epigenetics, covering the several dimensions it encompasses. It subsequently focuses on DNA methylation with topics like mitotic stability, epigenetic reprogramming, X-inactivation and imprinting. This is followed by concepts from epigenetic epidemiology such as epigenome-wide association studies (EWAS), epigenetic clocks, Mendelian randomization, methylation risk scores and methylation quantitative trait loci (mQTL).
The chapter ends by introducing the aims of the thesis. Chapter 2 focuses on stochastic epigenetic inter-individual variation resulting from processes occurring post-twinning, during embryonic development and early life. Specifically, it describes the discovery and characterisation of hundreds of variably methylated CpGs in the blood of healthy adolescent monozygotic (MZ) twins showing equivalent variation among co-twins and unrelated individuals (evCpGs) that could not be explained by measurement error on the DNA methylation microarray alone. DNA methylation levels at evCpGs were shown to be stable in the short term but susceptible to aging and epigenetic drift in the long term. The identified sites were significantly enriched at the clustered protocadherin loci, known for stochastic methylation in neurons in the context of embryonic neurodevelopment. Critically, evCpGs were capable of clustering technical and longitudinal replicates while differentiating young MZ twins. Thus, the discovered evCpGs can be considered a first prototype of a universal epigenetic fingerprint, relevant to the discrimination of MZ twins for forensic purposes, which is currently impossible with standard DNA profiling. In addition, DNA methylation microarrays are the preferred technology for EWAS and mQTL mapping studies. However, their probe design inherently assumes that the assayed genomic DNA is identical to the reference genome, leading to genetic artifacts whenever this assumption is not fulfilled. Building upon the previous experience analysing microarray data, Chapter 3 covers the development and benchmarking of UMtools, an R package for the quantification and qualification of genetic artifacts on DNA methylation microarrays based on the unprocessed fluorescence intensity signals. These tools were used to assemble an atlas of genetic artifacts encountered on DNA methylation microarrays, including interactions between artifacts or with X-inactivation, imprinting and tissue-specific regulation.
Additionally, to distinguish artifacts from genuine epigenetic variation, a co-methylation-based approach was proposed. Overall, this study revealed that genetic artifacts continue to filter through into the reported literature since current methodologies to address them have overlooked this challenge. Furthermore, EWAS, mQTL and allele-specific methylation (ASM) mapping studies have all been employed to map epigenetic variation but require matching phenotypic/genotypic data and can only map specific components of epigenetic inter-individual variation. Inspired by the previously proposed co-methylation strategy, Chapter 4 describes a novel method to simultaneously map inter-haplotype, inter-cell and inter-individual variation without these requirements. Specifically, the binomial likelihood function-based bootstrap hypothesis test for co-methylation within reads (Binokulars) is a randomization test that can identify jointly regulated CpGs (JRCs) from pooled whole genome bisulfite sequencing (WGBS) data by relying solely on joint DNA methylation information available in reads spanning multiple CpGs. Binokulars was tested on pooled WGBS data from whole blood, from sperm and from both combined, and benchmarked against EWAS and ASM. Our comparisons revealed that Binokulars can integrate a wide range of epigenetic phenomena under the same umbrella, since it simultaneously discovered regions associated with imprinting, cell type- and tissue-specific regulation, mQTL, ageing or even unknown epigenetic processes. Finally, we verified examples of mQTL and polymorphic imprinting by employing another novel tool, JRC_sorter, to classify regions based on epigenotype models and non-pooled WGBS data in cord blood.
In the future, we envision that this cost-effective approach can be applied to larger pools to simultaneously highlight regions of interest in the methylome, a highly relevant task in the light of the post-GWAS era. Moving towards future applications of epigenetic inter-individual variation, Chapters 5 and 6 are dedicated to solving some of the methodological issues faced in translational epigenomics. Firstly, due to its simplicity and well-known properties, linear regression is the starting-point methodology when predicting a continuous outcome from a set of predictors. However, linear regression is incompatible with missing data, a common phenomenon and a huge threat to the integrity of data analysis in empirical sciences, including (epi)genomics. Chapter 5 describes the development of combinatorial linear models (cmb-lm), an imputation-free, CPU/RAM-efficient and privacy-preserving statistical method for linear regression prediction on datasets with missing values. Cmb-lm provide prediction errors that take into account the pattern of missing values in the incomplete data, even at extreme missingness. As a proof-of-concept, we tested cmb-lm in the context of epigenetic ageing clocks, one of the most popular applications of epigenetic inter-individual variation. Overall, cmb-lm offer a simple and flexible methodology with a wide range of applications that can provide a smooth transition towards the valorisation of linear models in the real world, where missing data is almost inevitable. Beyond microarrays, due to its high accuracy, reliability and sample multiplexing capabilities, massively parallel sequencing (MPS) is currently the methodology of choice to translate prediction models for traits of interest into practice. At the same time, tobacco smoking is a frequent habit sustained by more than 1.3 billion people in 2020 and a leading (and preventable) health risk factor in the modern world.
Predicting smoking habits from a persistent biomarker, such as DNA methylation, is not only relevant to account for self-reporting bias in public health and personalized medicine studies, but may also allow broadening forensic DNA phenotyping. Previously, a model to predict whether someone is a current, former, or never smoker had been published, based solely on 13 CpGs from the hundreds of thousands included in the DNA methylation microarray. However, a matching laboratory tool with lower marker throughput and higher accuracy and sensitivity, needed to translate the model into practice, was missing. Chapter 6 describes the development of an MPS assay and data analysis pipeline to quantify DNA methylation on these 13 smoking-associated biomarkers for the prediction of smoking status. Though our systematic evaluation on DNA standards of known methylation levels revealed marker-specific amplification bias, our novel tool was still able to provide highly accurate and reproducible DNA methylation quantification and smoking habit prediction. Overall, our MPS assay allows the technological transfer of DNA methylation microarray findings and models to practical settings, one step closer towards future applications. Finally, Chapter 7 provides a general discussion on the results and topics discussed across Chapters 2-6. It begins by summarizing the main findings across the thesis, including proposals for follow-up studies. It then covers technical limitations pertaining to bisulfite conversion and DNA methylation microarrays, but also more general considerations such as restricted data access. This chapter ends by covering the outlook of this PhD thesis, including topics such as bisulfite-free methods, third-generation sequencing, single-cell methylomics, multi-omics and systems biology.
