14,595 research outputs found

    A new approach to bias correction in RNA-Seq

    Get PDF
    Motivation: Quantification of sequence abundance in RNA-Seq experiments is often conflated by protocol-specific sequence bias. The exact sources of the bias are unknown, but may be influenced by polymerase chain reaction amplification, or differing primer affinities and mixtures, for example. The result is decreased accuracy in many applications, such as de novo gene annotation and transcript quantification

    Synthetic Standards Combined With Error and Bias Correction Improve the Accuracy and Quantitative Resolution of Antibody Repertoire Sequencing in Human NaĆÆve and Memory B Cells

    Get PDF
    High-throughput sequencing of immunoglobulin (Ig) repertoires (Ig-seq) is a powerful method for quantitatively interrogating B cell receptor sequence diversity. When applied to human repertoires, Ig-seq provides insight into fundamental immunological questions, and can be implemented in diagnostic and drug discovery projects. However, a major challenge in Ig-seq is ensuring accuracy, as library preparation protocols and sequencing platforms can introduce substantial errors and bias that compromise immunological interpretation. Here, we have established an approach for performing highly accurate human Ig-seq by combining synthetic standards with a comprehensive error and bias correction pipeline. First, we designed a set of 85 synthetic antibody heavy-chain standards (in vitro transcribed RNA) to assess correction workflow fidelity. Next, we adapted a library preparation protocol that incorporates unique molecular identifiers (UIDs) for error and bias correction which, when applied to the synthetic standards, resulted in highly accurate data. Finally, we performed Ig-seq on purified human circulating B cell subsets (naĆÆve and memory), combined with a cellular replicate sampling strategy. This strategy enabled robust and reliable estimation of key repertoire features such as clonotype diversity, germline segment, and isotype subclass usage, and somatic hypermutation. We anticipate that our standards and error and bias correction pipeline will become a valuable tool for researchers to validate and improve accuracy in human Ig-seq studies, thus leading to potentially new insights and applications in human antibody repertoire profiling

    Identification and correction of systematic error in high-throughput sequence data

    Get PDF
    A feature common to all DNA sequencing technologies is the presence of base-call errors in the sequenced reads. The implications of such errors are application specific, ranging from minor informatics nuisances to major problems affecting biological inferences. Recently developed “next-gen” sequencing technologies have greatly reduced the cost of sequencing, but have been shown to be more error prone than previous technologies. Both position specific (depending on the location in the read) and sequence specific (depending on the sequence in the read) errors have been identified in Illumina and Life Technology sequencing platforms. We describe a new type of _systematic_ error that manifests as statistically unlikely accumulations of errors at specific genome (or transcriptome) locations. We characterize and describe systematic errors using overlapping paired reads form high-coverage data. We show that such errors occur in approximately 1 in 1000 base pairs, and that quality scores at systematic error sites do not account for the extent of errors. We identify motifs that are frequent at systematic error sites, and describe a classifier that distinguishes heterozygous sites from systematic error. Our classifier is designed to accommodate data from experiments in which the allele frequencies at heterozygous sites are not necessarily 0.5 (such as in the case of RNA-Seq). Systematic errors can easily be mistaken for heterozygous sites in individuals, or for SNPs in population analyses. Systematic errors are particularly problematic in low coverage experiments, or in estimates of allele-specific expression from RNA-Seq data. Our characterization of systematic error has allowed us to develop a program, called SysCall, for identifying and correcting such errors. We conclude that correction of systematic errors is important to consider in the design and interpretation of high-throughput sequencing experiments

    Differential expression analysis for sequence count data

    Get PDF
    *Motivation:* High-throughput nucleotide sequencing provides quantitative readouts in assays for RNA expression (RNA-Seq), protein-DNA binding (ChIP-Seq) or cell counting (barcode sequencing). Statistical inference of differential signal in such data requires estimation of their variability throughout the dynamic range. When the number of replicates is small, error modelling is needed to achieve statistical power.

*Results:* We propose an error model that uses the negative binomial distribution, with variance and mean linked by local regression, to model the null distribution of the count data. The method controls type-I error and provides good detection power. 

*Availability:* A free open-source R software package, _DESeq_, is available from the Bioconductor project and from "http://www-huber.embl.de/users/anders/DESeq":http://www-huber.embl.de/users/anders/DESeq

    Profiling allele-specific gene expression in brains from individuals with autism spectrum disorder reveals preferential minor allele usage.

    Get PDF
    One fundamental but understudied mechanism of gene regulation in disease is allele-specific expression (ASE), the preferential expression of one allele. We leveraged RNA-sequencing data from human brain to assess ASE in autism spectrum disorder (ASD). When ASE is observed in ASD, the allele with lower population frequency (minor allele) is preferentially more highly expressed than the major allele, opposite to the canonical pattern. Importantly, genes showing ASE in ASD are enriched in those downregulated in ASD postmortem brains and in genes harboring de novo mutations in ASD. Two regions, 14q32 and 15q11, containing all known orphan C/D box small nucleolar RNAs (snoRNAs), are particularly enriched in shifts to higher minor allele expression. We demonstrate that this allele shifting enhances snoRNA-targeted splicing changes in ASD-related target genes in idiopathic ASD and 15q11-q13 duplication syndrome. Together, these results implicate allelic imbalance and dysregulation of orphan C/D box snoRNAs in ASD pathogenesis
    • ā€¦
    corecore