10 research outputs found

    Characterizing noise structure in single-cell RNA-seq distinguishes genuine from technical stochastic allelic expression.

    Single-cell RNA-sequencing (scRNA-seq) facilitates identification of new cell types and gene regulatory networks as well as dissection of the kinetics of gene expression and patterns of allele-specific expression. However, to facilitate such analyses, separating biological variability from the high level of technical noise that affects scRNA-seq protocols is vital. Here we describe and validate a generative statistical model that accurately quantifies technical noise with the help of external RNA spike-ins. Applying our approach to investigate stochastic allele-specific expression in individual cells, we demonstrate that a large fraction of stochastic allele-specific expression can be explained by technical noise, especially for lowly and moderately expressed genes: we predict that only 17.8% of stochastic allele-specific expression patterns are attributable to biological noise, with the remainder due to technical noise.

    Corrigendum: Characterizing noise structure in single-cell RNA-seq distinguishes genuine from technical stochastic allelic expression.

    Nature Communications 6: Article number: 8687 (2015); Published: 22 October 2015; Updated: 11 January 2016. The original version of this Article contained an error in the spelling of the author Tomislav Ilicic, which was incorrectly given as Tomislav Illicic. This has now been corrected in both the PDF and HTML versions of the Article.

    Signatures of mutational processes in human cancer.

    All cancers are caused by somatic mutations; however, understanding of the biological processes generating these mutations is limited. The catalogue of somatic mutations from a cancer genome bears the signatures of the mutational processes that have been operative. Here we analysed 4,938,362 mutations from 7,042 cancers and extracted more than 20 distinct mutational signatures. Some are present in many cancer types, notably a signature attributed to the APOBEC family of cytidine deaminases, whereas others are confined to a single cancer class. Certain signatures are associated with age of the patient at cancer diagnosis, known mutagenic exposures or defects in DNA maintenance, but many are of cryptic origin. In addition to these genome-wide mutational signatures, hypermutation localized to small genomic regions, 'kataegis', is found in many cancer types. The results reveal the diversity of mutational processes underlying the development of cancer, with potential implications for understanding of cancer aetiology, prevention and therapy.

    Additional file 6: Figure S3. of Classification of low quality cells from single-cell RNA-seq data

    Post-QC outliers and SVM performance evaluation. (A) Visualization of low and high quality cells after outlier detection with traditional and with our PCA feature-based methods. (B) Schematic of nested cross-validation. The training set was split twice into 10 folds. The inner folds were important to estimate optimal hyperparameters, whereas the outer folds served to measure accuracy. Optimal hyperparameters were saved for later use. (C) Sensitivity and specificity of feature-based PCA and SVM using TPM values. (PDF 558 kb)

    Additional file 7: Figure S4. of Classification of low quality cells from single-cell RNA-seq data

    Datasets distant from mES training data. (A) Comparing log normalized UMI counts (y-axis) and log normalized read counts (x-axis) for each gene in 960 mESCs. (B) PCA of first two principal components of all features. Low quality cells separate from high quality cells. (C, D) PCA plot of features of two published human cancer cell datasets [28, 53]. Boxplots on the left and bottom show the top three features separating low from high quality cells for PC1 and PC2, respectively. They align with our previous findings that the mtDNA and ERCC to mapped reads ratios are upregulated in low quality cells. (E) Feature-based PCA combining mouse ES training set and two published human cancer datasets. ‘Cytoplasm’ separates not only the human from the mouse but also the two different cancer samples from each other, meaning that the features trained on mouse cells are not directly transferable to human cancer cells. (PDF 591 kb)

    Additional file 1: Figure S1. of Classification of low quality cells from single-cell RNA-seq data

    Overview of single cell RNA sequencing datasets. (A) Total number of cells per dataset. (B) Number of high quality and low quality cells per dataset. (C) Proportion of each type of low quality cells (broken, empty, multiple). (D) Number of cells for 2i/LIF, alternative 2i/LIF, and serum/LIF condition for the training dataset (960 mESCs). (PDF 441 kb)

    Additional file 5: Figure S2. of Classification of low quality cells from single-cell RNA-seq data

    Additional technical features and subsets of data. Boxplots comparing (A) ratio of duplicated reads/exonic and (B) ratio of spike-in/exonic expression between high quality and multiple, broken, empty cells. (C) PCA of features using only 25 % of data shows identical results compared to using all data. (D) Comparison of two microscopic images of a single C1 capturing site containing one intact and one deceptive cell, respectively. (PDF 1026 kb)