
    Controlling for conservation in genome-wide DNA methylation studies

    BACKGROUND: A commonplace analysis in high-throughput DNA methylation studies is the comparison of methylation extent between different functional regions, computed by averaging methylation states within region types and then comparing averages between regions. For example, it has been reported that methylation is more prevalent in coding regions as compared to their neighboring introns or UTRs, leading to hypotheses about novel forms of epigenetic regulation. RESULTS: We have identified and characterized a bias present in these seemingly straightforward comparisons that results in the false detection of differences in methylation intensities across region types. This bias arises due to differences in conservation rates, rather than methylation rates, and is broadly present in the published literature. When controlling for conservation at coding start sites, the differences in DNA methylation rates disappear. Moreover, a re-evaluation of methylation rates at intron-exon junctions reveals that the magnitude of previously reported differences is greatly exaggerated. We introduce two correction methods to address this bias, an inference-based matrix completion algorithm and an averaging approach, tailored to address different underlying biological questions. We evaluate how analysis using these corrections affects the detection of differences in DNA methylation across functional boundaries. CONCLUSIONS: We report here on a bias in DNA methylation comparative studies that originates in conservation rate differences and manifests itself in the false discovery of differences in DNA methylation intensities and their extents. We have characterized this bias and its broad implications, and show how to control for it so as to enable the study of a variety of biological questions.

    Identification and correction of systematic error in high-throughput sequence data

    A feature common to all DNA sequencing technologies is the presence of base-call errors in the sequenced reads. The implications of such errors are application specific, ranging from minor informatics nuisances to major problems affecting biological inferences. Recently developed “next-gen” sequencing technologies have greatly reduced the cost of sequencing, but have been shown to be more error prone than previous technologies. Both position-specific (depending on the location in the read) and sequence-specific (depending on the sequence in the read) errors have been identified in Illumina and Life Technology sequencing platforms. We describe a new type of _systematic_ error that manifests as statistically unlikely accumulations of errors at specific genome (or transcriptome) locations. We characterize and describe systematic errors using overlapping paired reads from high-coverage data. We show that such errors occur in approximately 1 in 1000 base pairs, and that quality scores at systematic error sites do not account for the extent of errors. We identify motifs that are frequent at systematic error sites, and describe a classifier that distinguishes heterozygous sites from systematic error. Our classifier is designed to accommodate data from experiments in which the allele frequencies at heterozygous sites are not necessarily 0.5 (such as in the case of RNA-Seq). Systematic errors can easily be mistaken for heterozygous sites in individuals, or for SNPs in population analyses. Systematic errors are particularly problematic in low-coverage experiments, or in estimates of allele-specific expression from RNA-Seq data. Our characterization of systematic error has allowed us to develop a program, called SysCall, for identifying and correcting such errors. We conclude that correction of systematic errors is important to consider in the design and interpretation of high-throughput sequencing experiments.

    A diverse epigenetic landscape at human exons with implication for expression

    DNA methylation is an important epigenetic marker associated with gene expression regulation in eukaryotes. While promoter methylation is relatively well characterized, the role of intragenic DNA methylation remains unclear. Here, we investigated the relationship of DNA methylation at exons and flanking introns with gene expression and histone modifications generated from a human fibroblast cell line and primary B cells. Consistent with previous work, we found that intragenic methylation is positively correlated with gene expression and that exons are more highly methylated than their neighboring intronic environment. Intriguingly, in this study we identified a unique subset of hypomethylated exons that demonstrate significantly lower methylation levels than their surrounding introns. Furthermore, we observed a negative correlation between exon methylation and the density of the majority of histone modifications. Specifically, we demonstrate that hypomethylated exons at highly expressed genes are associated with open chromatin and have a characteristic histone code comprised of significantly high levels of histone markings. Overall, our comprehensive analysis of the human exome supports the presence of regulatory hypomethylated exons in protein-coding genes. In particular, our results reveal a previously unrecognized diverse and complex role of the epigenetic landscape within the gene body.

    MetMap Enables Genome-Scale Methyltyping for Determining Methylation States in Populations

    The ability to assay genome-scale methylation patterns using high-throughput sequencing makes it possible to carry out association studies to determine the relationship between epigenetic variation and phenotype. While bisulfite sequencing can determine a methylome at high resolution, cost inhibits its use in comparative and population studies. MethylSeq, based on sequencing of fragment ends produced by a methylation-sensitive restriction enzyme, is a method for methyltyping (survey of methylation states) and is a site-specific and cost-effective alternative to whole-genome bisulfite sequencing. Despite its advantages, the use of MethylSeq has been restricted by biases in MethylSeq data that complicate the determination of methyltypes. Here we introduce a statistical method, MetMap, that produces corrected site-specific methylation states from MethylSeq experiments and annotates unmethylated islands across the genome. MetMap integrates genome sequence information with experimental data in a statistically sound and cohesive Bayesian network. It infers the extent of methylation at individual CGs and across regions, and serves as a framework for comparative methylation analysis within and among species. We validated MetMap's inferences with direct bisulfite sequencing, showing that the methylation status of sites and islands is accurately inferred. We used MetMap to analyze MethylSeq data from four human neutrophil samples, identifying novel, highly unmethylated islands that are invisible to sequence-based annotation strategies. The combination of MethylSeq and MetMap is a powerful and cost-effective tool for determining genome-scale methyltypes suitable for comparative and association studies.

    A Distinct Gene Module for Dysfunction Uncoupled from Activation in Tumor-Infiltrating T Cells

    Reversing the dysfunctional T cell state that arises in cancer and chronic viral infections is the focus of therapeutic interventions; however, current therapies are effective in only some patients and some tumor types. To gain a deeper molecular understanding of the dysfunctional T cell state, we analyzed population and single-cell RNA profiles of CD8+ tumor-infiltrating lymphocytes (TILs) and used genetic perturbations to identify a distinct gene module for T cell dysfunction that can be uncoupled from T cell activation. This distinct dysfunction module is downstream of intracellular metallothioneins that regulate zinc metabolism and can be identified at single-cell resolution. We further identify Gata-3, a zinc-finger transcription factor in the dysfunctional module, as a regulator of dysfunction, and we use CRISPR-Cas9 genome editing to show that it drives a dysfunctional phenotype in CD8+ TILs. Our results open novel avenues for targeting dysfunctional T cell states while leaving activation programs intact.

    Purine synthesis promotes maintenance of brain tumor initiating cells in glioma

    Brain tumor initiating cells (BTICs), also known as cancer stem cells, hijack the high-affinity glucose uptake normally active in neurons to maintain energy demands. Here we link metabolic dysregulation in human BTICs to a nexus between MYC and de novo purine synthesis, mediating glucose-sustained anabolic metabolism. Inhibiting purine synthesis abrogated BTIC growth, self-renewal and in vivo tumor formation by depleting intracellular pools of purine nucleotides, supporting purine synthesis as a potential therapeutic point of fragility. In contrast, differentiated glioma cells were unaffected by the targeting of purine biosynthetic enzymes, suggesting selective dependence of BTICs. MYC coordinated the control of purine synthetic enzymes, supporting its role in metabolic reprogramming. Elevated expression of purine synthetic enzymes correlated with poor prognosis in glioblastoma patients. Collectively, our results suggest that stem-like glioma cells reprogram their metabolism to self-renew and fuel the tumor hierarchy, revealing potential BTIC cancer dependencies amenable to targeted therapy.

    Induction and transcriptional regulation of the co-inhibitory gene module in T cells

    Expression of co-inhibitory receptors, such as CTLA-4 and PD-1, on effector T cells is a key mechanism for ensuring immune homeostasis. Dysregulated co-inhibitory receptor expression on CD4+ T cells promotes autoimmunity, while sustained overexpression on CD8+ T cells promotes T cell dysfunction or exhaustion, leading to impaired ability to clear chronic viral infections and cancer. Here, we used RNA and protein expression profiling at single-cell resolution to identify a module of co-inhibitory receptors that includes not only several known co-inhibitory receptors (PD-1, Tim-3, Lag-3, and TIGIT), but also a number of novel surface receptors. We functionally validated two novel co-inhibitory receptors, Activated protein C receptor (Procr) and Podoplanin (Pdpn). The module of co-inhibitory receptors is co-expressed in both CD4+ and CD8+ T cells and is part of a larger co-inhibitory gene program that is shared by non-responsive T cells in multiple physiological contexts and is driven by the immunoregulatory cytokine IL-27. Computational analysis identified the transcription factors Prdm1 and c-Maf as cooperative regulators of the co-inhibitory module, which we validated experimentally. This molecular circuit underlies the co-expression of co-inhibitory receptors in T cells and identifies novel regulators of T cell function with the potential to regulate autoimmunity and tumor immunity.