    Identification and correction of systematic error in high-throughput sequence data

    A feature common to all DNA sequencing technologies is the presence of base-call errors in the sequenced reads. The implications of such errors are application specific, ranging from minor informatics nuisances to major problems affecting biological inferences. Recently developed “next-gen” sequencing technologies have greatly reduced the cost of sequencing, but have been shown to be more error prone than previous technologies. Both position specific (depending on the location in the read) and sequence specific (depending on the sequence in the read) errors have been identified in Illumina and Life Technology sequencing platforms. We describe a new type of _systematic_ error that manifests as statistically unlikely accumulations of errors at specific genome (or transcriptome) locations. We characterize and describe systematic errors using overlapping paired reads from high-coverage data. We show that such errors occur in approximately 1 in 1000 base pairs, and that quality scores at systematic error sites do not account for the extent of errors. We identify motifs that are frequent at systematic error sites, and describe a classifier that distinguishes heterozygous sites from systematic error. Our classifier is designed to accommodate data from experiments in which the allele frequencies at heterozygous sites are not necessarily 0.5 (such as in the case of RNA-Seq). Systematic errors can easily be mistaken for heterozygous sites in individuals, or for SNPs in population analyses. Systematic errors are particularly problematic in low coverage experiments, or in estimates of allele-specific expression from RNA-Seq data. Our characterization of systematic error has allowed us to develop a program, called SysCall, for identifying and correcting such errors. We conclude that correction of systematic errors is important to consider in the design and interpretation of high-throughput sequencing experiments.
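The idea of a "statistically unlikely accumulation of errors" at a site can be illustrated with a simple binomial tail test. This is only a minimal sketch under stated assumptions, not the SysCall classifier: the function names, the pileup layout (position mapped to coverage and mismatch count), the background error rate of 0.01, and the threshold `alpha` are all hypothetical choices made for illustration. Note that a test like this cannot by itself separate systematic errors from true heterozygous sites, which is the harder problem the abstract describes.

```python
from math import comb


def binomial_upper_tail(n: int, k: int, p: float) -> float:
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def flag_suspect_sites(pileup, error_rate=0.01, alpha=1e-6):
    """Flag positions whose mismatch count is unlikely under random errors.

    pileup: dict mapping position -> (coverage, mismatches).
    error_rate and alpha are illustrative assumptions, not published values.
    """
    flagged = []
    for pos, (coverage, mismatches) in sorted(pileup.items()):
        # Probability of seeing at least this many mismatches by chance alone.
        if binomial_upper_tail(coverage, mismatches, error_rate) < alpha:
            flagged.append(pos)
    return flagged


# Hypothetical toy data: position -> (coverage, mismatching base calls).
pileup = {101: (200, 2), 102: (200, 40), 103: (150, 1)}
suspects = flag_suspect_sites(pileup)
print(suspects)
```

In this toy input only position 102 carries far more mismatches than a 1% independent error rate would explain. A flagged site would still need further evidence (for example, the strand and motif context mentioned in the abstract) before being called a systematic error rather than a variant.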

    Functionally conserved enhancers with divergent sequences in distant vertebrates

    Conserved transcription factor binding motifs in the five zebrafish/mouse syntenic enhancers. Identical n-mers (n ≥ 7) identified in the zebrafish, mouse, and human sequences of the five syntenic CNS were examined for the presence of transcription factor binding motifs; only motifs with E-value E ≤ 0.1 are shown. (XLSX 15 kb)

    MetMap Enables Genome-Scale Methyltyping for Determining Methylation States in Populations

    The ability to assay genome-scale methylation patterns using high-throughput sequencing makes it possible to carry out association studies to determine the relationship between epigenetic variation and phenotype. While bisulfite sequencing can determine a methylome at high resolution, cost inhibits its use in comparative and population studies. MethylSeq, based on sequencing of fragment ends produced by a methylation-sensitive restriction enzyme, is a method for methyltyping (survey of methylation states) and is a site-specific and cost-effective alternative to whole-genome bisulfite sequencing. Despite its advantages, the use of MethylSeq has been restricted by biases in MethylSeq data that complicate the determination of methyltypes. Here we introduce a statistical method, MetMap, that produces corrected site-specific methylation states from MethylSeq experiments and annotates unmethylated islands across the genome. MetMap integrates genome sequence information with experimental data, in a statistically sound and cohesive Bayesian Network. It infers the extent of methylation at individual CGs and across regions, and serves as a framework for comparative methylation analysis within and among species. We validated MetMap's inferences with direct bisulfite sequencing, showing that the methylation status of sites and islands is accurately inferred. We used MetMap to analyze MethylSeq data from four human neutrophil samples, identifying novel, highly unmethylated islands that are invisible to sequence-based annotation strategies. The combination of MethylSeq and MetMap is a powerful and cost-effective tool for determining genome-scale methyltypes suitable for comparative and association studies.

    Single-Cell Transcriptome Analysis of CD34+ Stem Cell-Derived Myeloid Cells Infected With Human Cytomegalovirus

    Myeloid cells are important sites of lytic and latent infection by human cytomegalovirus (CMV). We previously showed that only a small subset of myeloid cells differentiated from CD34+ hematopoietic stem cells is permissive to CMV replication, underscoring the heterogeneous nature of these populations. The exact identity of resistant and permissive cell types, and the cellular features characterizing the latter, however, could not be dissected using averaging transcriptional analysis tools such as microarrays and, hence, remained enigmatic. Here, we profile the transcriptomes of ∼7000 individual cells at day 1 post-infection using the 10x Genomics platform. We show that viral transcripts are detectable in the majority of the cells, suggesting that virion entry is unlikely to be the main target of cellular restriction mechanisms. We further show that viral replication occurs in a small but specific sub-group of cells transcriptionally related to, and likely derived from, a cluster of cells expressing markers of Colony Forming Unit – Granulocyte, Erythrocyte, Monocyte, Megakaryocyte (CFU-GEMM) oligopotent progenitors. Compared to the remainder of the population, CFU-GEMM cells are enriched in transcripts with functions in mitochondrial energy production, cell proliferation, RNA processing and protein synthesis, and express similar or higher levels of interferon-related genes. While expression levels of the former are maintained in infected cells, the latter are strongly down-regulated. We thus propose that the preferential infection of CFU-GEMM cells may be due to the presence of a pre-established pro-viral environment, requiring minimal optimization efforts from viral effectors, rather than to the absence of specific restriction factors. Together, these findings identify a potentially new population of myeloid cells permissive to CMV replication, and provide a possible rationale for their preferential infection.

    Genome methylation in D. melanogaster is found at specific short motifs and is independent of DNMT2 activity

    Cytosine methylation in the genome of Drosophila melanogaster has been elusive and controversial: its location and function have not been established. We have used a novel and highly sensitive genomewide cytosine methylation assay to detect and map genome methylation in stage 5 Drosophila embryos. The methylation we observe with this method is highly localized and strand asymmetrical, limited to regions covering ∼1% of the genome, dynamic in early embryogenesis, and concentrated in specific 5-base sequence motifs that are CA- and CT-rich but depleted of guanine. Gene body methylation is associated with lower expression, and many genes containing methylated regions have developmental or transcriptional functions. The only known DNA methyltransferase in Drosophila is the DNMT2 homolog MT2, but lines deficient for MT2 retain genomic methylation, implying the presence of a novel methyltransferase. The association of methylation with a lower expression of specific developmental genes at stage 5 raises the possibility that it participates in controlling gene expression during the maternal-zygotic transition.

    CRISPR-Cas9 interrogation of a putative fetal globin repressor in human erythroid cells

    Sickle Cell Disease and beta-thalassemia, which are caused by defective or deficient adult beta-globin (HBB) respectively, are the most common serious genetic blood diseases in the world. Persistent expression of the fetal beta-like globin, also known as gamma-globin, can ameliorate both disorders by serving in place of the adult beta-globin as a part of the fetal hemoglobin tetramer (HbF). Here we use CRISPR-Cas9 gene editing to explore a potential gamma-globin silencer region upstream of the delta-globin gene identified by comparison of naturally-occurring deletion mutations associated with up-regulated gamma-globin. We find that deletion of a 1.7 kb consensus element or select 350 bp sub-regions from bulk populations of cells increases levels of HbF. Screening of individual sgRNAs in one sub-region revealed three single guides that caused increased gamma-globin expression. Deletion of the 1.7 kb region in HUDEP-2 clonal sublines, and in colonies derived from CD34+ hematopoietic stem/progenitor cells (HSPCs), does not cause significant up-regulation of gamma-globin. These data suggest that the 1.7 kb region is not an autonomous gamma-globin silencer, and thus by itself is not a suitable therapeutic target for gene editing treatment of beta-hemoglobinopathies.

    The AGILE Mission

    AGILE is an Italian Space Agency mission dedicated to observing the gamma-ray Universe. AGILE's very innovative instrumentation combines for the first time a gamma-ray imager (sensitive in the energy range 30 MeV-50 GeV), a hard X-ray imager (sensitive in the range 18-60 keV), a calorimeter (sensitive in the range 350 keV-100 MeV), and an anticoincidence system. AGILE was successfully launched on 2007 April 23 from the Indian base of Sriharikota and was inserted in an equatorial orbit with very low particle background. Aims. AGILE provides crucial data for the study of active galactic nuclei, gamma-ray bursts, pulsars, unidentified gamma-ray sources, galactic compact objects, supernova remnants, TeV sources, and fundamental physics by microsecond timing. Methods. An optimal sky angular positioning (reaching 0.1 degrees in gamma-rays and 1-2 arcmin in hard X-rays) and very large fields of view (2.5 sr and 1 sr, respectively) are obtained by the use of Silicon detectors integrated in a very compact instrument. Results. AGILE surveyed the gamma-ray sky and detected many Galactic and extragalactic sources during the first months of observations. Particular emphasis is given to multifrequency observation programs of extragalactic and galactic objects. Conclusions. AGILE is a successful high-energy gamma-ray mission that reached its nominal scientific performance. The AGILE Cycle-1 pointing program started on 2007 December 1, and is open to the international community through a Guest Observer Program.