79 research outputs found

    Identification and correction of systematic error in high-throughput sequence data

    A feature common to all DNA sequencing technologies is the presence of base-call errors in the sequenced reads. The implications of such errors are application-specific, ranging from minor informatics nuisances to major problems affecting biological inferences. Recently developed “next-gen” sequencing technologies have greatly reduced the cost of sequencing, but have been shown to be more error-prone than previous technologies. Both position-specific (depending on the location in the read) and sequence-specific (depending on the sequence in the read) errors have been identified in Illumina and Life Technology sequencing platforms. We describe a new type of _systematic_ error that manifests as statistically unlikely accumulations of errors at specific genome (or transcriptome) locations. We characterize and describe systematic errors using overlapping paired reads from high-coverage data. We show that such errors occur in approximately 1 in 1000 base pairs, and that quality scores at systematic error sites do not account for the extent of errors. We identify motifs that are frequent at systematic error sites, and describe a classifier that distinguishes heterozygous sites from systematic error. Our classifier is designed to accommodate data from experiments in which the allele frequencies at heterozygous sites are not necessarily 0.5 (such as in the case of RNA-Seq). Systematic errors can easily be mistaken for heterozygous sites in individuals, or for SNPs in population analyses. Systematic errors are particularly problematic in low-coverage experiments, or in estimates of allele-specific expression from RNA-Seq data. Our characterization of systematic error has allowed us to develop a program, called SysCall, for identifying and correcting such errors. We conclude that correction of systematic errors is important to consider in the design and interpretation of high-throughput sequencing experiments.

    MetMap Enables Genome-Scale Methyltyping for Determining Methylation States in Populations

    The ability to assay genome-scale methylation patterns using high-throughput sequencing makes it possible to carry out association studies to determine the relationship between epigenetic variation and phenotype. While bisulfite sequencing can determine a methylome at high resolution, cost inhibits its use in comparative and population studies. MethylSeq, based on sequencing of fragment ends produced by a methylation-sensitive restriction enzyme, is a method for methyltyping (survey of methylation states) and is a site-specific and cost-effective alternative to whole-genome bisulfite sequencing. Despite its advantages, the use of MethylSeq has been restricted by biases in MethylSeq data that complicate the determination of methyltypes. Here we introduce a statistical method, MetMap, that produces corrected site-specific methylation states from MethylSeq experiments and annotates unmethylated islands across the genome. MetMap integrates genome sequence information with experimental data in a statistically sound and cohesive Bayesian network. It infers the extent of methylation at individual CGs and across regions, and serves as a framework for comparative methylation analysis within and among species. We validated MetMap's inferences with direct bisulfite sequencing, showing that the methylation status of sites and islands is accurately inferred. We used MetMap to analyze MethylSeq data from four human neutrophil samples, identifying novel, highly unmethylated islands that are invisible to sequence-based annotation strategies. The combination of MethylSeq and MetMap is a powerful and cost-effective tool for determining genome-scale methyltypes suitable for comparative and association studies.

    Genome methylation in D. melanogaster is found at specific short motifs and is independent of DNMT2 activity

    Cytosine methylation in the genome of Drosophila melanogaster has been elusive and controversial: its location and function have not been established. We have used a novel and highly sensitive genome-wide cytosine methylation assay to detect and map genome methylation in stage 5 Drosophila embryos. The methylation we observe with this method is highly localized and strand-asymmetrical, limited to regions covering ∼1% of the genome, dynamic in early embryogenesis, and concentrated in specific 5-base sequence motifs that are CA- and CT-rich but depleted of guanine. Gene body methylation is associated with lower expression, and many genes containing methylated regions have developmental or transcriptional functions. The only known DNA methyltransferase in Drosophila is the DNMT2 homolog MT2, but lines deficient for MT2 retain genomic methylation, implying the presence of a novel methyltransferase. The association of methylation with lower expression of specific developmental genes at stage 5 raises the possibility that it participates in controlling gene expression during the maternal-zygotic transition.

    Novel Protein Kinase Signaling Systems Regulating Lifespan Identified by Small Molecule Library Screening Using Drosophila

    Protein kinase signaling cascades control most aspects of cellular function. The ATP binding domains of signaling protein kinases are the targets of most available inhibitors. These domains are highly conserved from mammals to flies. Herein we describe screening of a library of small molecule inhibitors of protein kinases for their ability to increase Drosophila lifespan. We developed an assay system that allowed screening using the small amounts of materials normally present in commercial chemical libraries. The studies identified 17 inhibitors, the majority of which targeted tyrosine kinases associated with the epidermal growth factor receptor (EGFR), platelet-derived growth factor (PDGF)/vascular endothelial growth factor (VEGF) receptors, G-protein coupled receptor (GPCR), Janus kinase (JAK)/signal transducer and activator of transcription (STAT), and the insulin and insulin-like growth factor (IGFI) receptors. Comparison of the protein kinase signaling effects of the inhibitors in vitro defined a consensus intracellular signaling profile that included decreased signaling by p38MAPK (p38), c-Jun N-terminal kinase (JNK) and protein kinase C (PKC). If confirmed, many of these kinases will be novel additions to the signaling cascades known to regulate metazoan longevity.

    Statin Treatment Increases Lifespan and Improves Cardiac Health in Drosophila by Decreasing Specific Protein Prenylation

    Statins such as simvastatin are 3-hydroxy-3-methylglutaryl coenzyme A (HMG-CoA) reductase inhibitors and standard therapy for the prevention and treatment of cardiovascular diseases in mammals. Here we show that simvastatin significantly increased the mean and maximum lifespan of Drosophila melanogaster (Drosophila) and enhanced cardiac function in aging flies by significantly reducing heart arrhythmias and increasing the contraction proportion of the contraction/relaxation cycle. These results appeared independent of internal changes in ubiquinone or juvenile hormone levels. Rather, they appeared to involve decreased protein prenylation. Simvastatin decreased the membrane association (prenylation) of specific small Ras GTPases in mice. Both farnesyl (L744832) and type 1 geranylgeranyl transferase (GGTI-298) inhibitors increased Drosophila lifespan. These data are the most direct evidence to date that decreased protein prenylation can increase cardiac health and lifespan in any metazoan species, and may explain the pleiotropic (non-cholesterol related) health effects of statins.

    Caloric Restriction Impacts Plasma Micrornas In Rhesus Monkeys

    Caloric restriction (CR) is one of the most robust interventions shown to delay aging in diverse species, including rhesus monkeys (Macaca mulatta). Identification of factors involved in CR brings a promise of translatability to human health and aging. Here, we show that CR induced a profound change in abundance of circulating microRNAs (miRNAs) linked to growth and insulin signaling pathways, suggesting that miRNAs are involved in CR's mechanisms of action in primates. Deep sequencing of plasma RNA extracts enriched for short species revealed a total of 243 unique species of miRNAs including 47 novel species. Approximately 70% of the plasma miRNAs detected were conserved between rhesus monkeys and humans. CR induced or repressed 24 known and 10 novel miRNA species. Regression analysis revealed correlations between bodyweight, adiposity, and insulin sensitivity for 10 of the CR-regulated known miRNAs. Sequence alignment and target identification for these 10 miRNAs identify a role in signaling downstream of the insulin receptor. The highly abundant miR-125a-5p correlated positively with adiposity and negatively with insulin sensitivity and was negatively regulated by CR. Putative target pathways of CR-associated miRNAs were highly enriched for growth and insulin signaling that have previously been implicated in delayed aging. Clustering analysis further pointed to CR-induced miRNA regulation of ribosomal, mitochondrial, and spliceosomal pathways. These data are consistent with a model where CR recruits miRNA-based homeostatic mechanisms to coordinate a program of delayed aging.