Observation weights unlock bulk RNA-seq tools for zero inflation and single-cell applications
Dropout events in single-cell RNA sequencing (scRNA-seq) cause many transcripts to go undetected and induce an excess of zero read counts, leading to power issues in differential expression (DE) analysis. This has triggered the development of bespoke scRNA-seq DE methods to cope with zero inflation. Recent evaluations, however, have shown that dedicated scRNA-seq tools provide no advantage compared to traditional bulk RNA-seq tools. We introduce a weighting strategy, based on a zero-inflated negative binomial model, that identifies excess zero counts and generates gene- and cell-specific weights to unlock bulk RNA-seq DE pipelines for zero-inflated data, boosting performance for scRNA-seq.
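The weighting idea in this abstract can be illustrated with a small sketch. Under a zero-inflated negative binomial (ZINB) mixture, a natural weight for an observed count is the posterior probability that it came from the negative binomial component rather than from the excess-zero component. The function names and the parameter values below (`mu`, `theta`, `pi`) are illustrative assumptions, not taken from the paper or its software; in practice these parameters would be estimated per gene and per cell by fitting the model.

```python
def nb_prob_zero(mu, theta):
    """P(Y = 0) for a negative binomial with mean mu and dispersion theta
    (variance mu + mu**2 / theta)."""
    return (theta / (theta + mu)) ** theta


def zinb_weight(count, mu, theta, pi):
    """Posterior probability that `count` was generated by the NB component
    of a ZINB mixture with zero-inflation probability pi."""
    if count > 0:
        # A positive count cannot come from the excess-zero component.
        return 1.0
    nb_zero = (1.0 - pi) * nb_prob_zero(mu, theta)
    return nb_zero / (pi + nb_zero)


# Hypothetical parameter values, chosen only for illustration.
w_low_expr = zinb_weight(0, mu=0.5, theta=2.0, pi=0.3)    # zero is plausible under NB
w_high_expr = zinb_weight(0, mu=20.0, theta=2.0, pi=0.3)  # zero is likely an excess zero

print(f"weight of a zero, lowly expressed gene:  {w_low_expr:.3f}")
print(f"weight of a zero, highly expressed gene: {w_high_expr:.3f}")
```

A zero observed for a highly expressed gene receives a weight close to 0, so a downstream bulk RNA-seq model that accepts observation weights would largely ignore it, while a zero for a lowly expressed gene keeps substantial weight.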
scNPF: an integrative framework assisted by network propagation and network fusion for preprocessing of single-cell RNA-seq data.
BACKGROUND: Single-cell RNA-sequencing (scRNA-seq) is fast becoming a powerful tool for profiling genome-scale transcriptomes of individual cells and capturing transcriptome-wide cell-to-cell variability. However, scRNA-seq technologies suffer from high levels of technical noise and variability, hindering reliable quantification of lowly and moderately expressed genes. Since most downstream analyses on scRNA-seq, such as cell type clustering and differential expression analysis, rely on the gene-cell expression matrix, preprocessing of scRNA-seq data is a critical preliminary step in the analysis of scRNA-seq data. RESULTS: We presented scNPF, an integrative scRNA-seq preprocessing framework assisted by network propagation and network fusion, for recovering gene expression loss, correcting gene expression measurements, and learning similarities between cells. scNPF leverages the context-specific topology inherent in the given data and the a priori knowledge derived from publicly available molecular gene-gene interaction networks to augment gene-gene relationships in a data-driven manner. We have demonstrated the great potential of scNPF in scRNA-seq preprocessing for accurately recovering gene expression values and learning cell similarity networks. Comprehensive evaluation of scNPF across a wide spectrum of scRNA-seq data sets showed that scNPF achieved comparable or higher performance than the competing approaches according to various metrics of internal validation and clustering accuracy. We have made scNPF an easy-to-use R package, which can be used as a versatile preprocessing plug-in for most existing scRNA-seq analysis pipelines or tools. CONCLUSIONS: scNPF is a universal tool for preprocessing of scRNA-seq data, which jointly incorporates the global topology of a priori interaction networks and the context-specific information encapsulated in the scRNA-seq data to capture both shared and complementary knowledge from diverse data sources.
scNPF could be used to recover gene signatures and learn cell-to-cell similarities from emerging scRNA-seq data to facilitate downstream analyses such as dimension reduction, cell type clustering, and visualization.
Interactive single cell RNA-Seq analysis with Single Cell Toolkit (SCTK)
I will present the Single Cell Toolkit (SCTK), an R package and interactive single cell RNA-sequencing (scRNA-Seq) analysis package that provides the first complete workflow for scRNA-Seq data analysis and visualization using a set of R functions and an interactive web interface. Users can perform analysis with modules for filtering raw results, clustering, batch correction, differential expression, pathway enrichment, and scRNA-Seq study design. The toolkit supports command line or pipeline data processing, and results can be loaded into the GUI for additional exploration and downstream analysis. We demonstrate the effectiveness of the SCTK on multiple scRNA-seq examples, including data from mucosal-associated invariant T cells, induced pluripotent stem cells, and breast cancer tumor cells. While other scRNA-Seq analysis tools exist, the SCTK is the first fully interactive analysis toolkit for scRNA-Seq data available within the R language. NIH U01CA22041
Inferring spatial and signaling relationships between cells from single cell transcriptomic data.
Single-cell RNA sequencing (scRNA-seq) provides details for individual cells; however, crucial spatial information is often lost. We present SpaOTsc, a method relying on structured optimal transport to recover spatial properties of scRNA-seq data by utilizing spatial measurements of a relatively small number of genes. A spatial metric for individual cells in scRNA-seq data is first established based on a map connecting it with the spatial measurements. The cell-cell communications are then obtained by "optimally transporting" signal senders to target signal receivers in space. Using partial information decomposition, we next compute the intercellular gene-gene information flow to estimate the spatial regulations between genes across cells. Four datasets are employed for cross-validation of spatial gene expression prediction and comparison to known cell-cell communications. SpaOTsc has broader applications, both in integrating non-spatial single-cell measurements with spatial data, and directly in spatial single-cell transcriptomics data to reconstruct spatial cellular dynamics in tissues
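To give a feel for what "optimally transporting" senders to receivers means, here is a minimal sketch of entropy-regularised optimal transport solved with Sinkhorn iterations. This is a generic textbook technique swapped in for illustration; it is not the structured optimal transport formulation used by SpaOTsc. The one-dimensional cell positions, the uniform signal masses, and the `epsilon` value are all hypothetical.

```python
import math


def sinkhorn(a, b, cost, epsilon=0.5, n_iter=500):
    """Entropy-regularised transport plan between mass vectors a and b.

    a, b  : non-negative masses that each sum to 1
    cost  : cost[i][j] is the cost of moving mass from source i to target j
    """
    n, m = len(a), len(b)
    # Gibbs kernel derived from the cost matrix.
    K = [[math.exp(-cost[i][j] / epsilon) for j in range(m)] for i in range(n)]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iter):
        # Alternately rescale rows and columns to match the marginals.
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]


# Hypothetical toy data: positions of three sender and two receiver cells.
senders = [0.0, 1.0, 5.0]
receivers = [0.2, 4.8]
a = [1 / 3, 1 / 3, 1 / 3]   # signal mass emitted by each sender (assumed uniform)
b = [0.5, 0.5]              # signal mass accepted by each receiver (assumed uniform)
cost = [[(s - r) ** 2 for r in receivers] for s in senders]

plan = sinkhorn(a, b, cost)
for i, row in enumerate(plan):
    print(f"sender {i}:", [round(x, 4) for x in row])
```

Each entry of `plan` is the amount of signal routed from one sender to one receiver; nearby pairs receive most of the mass because the cost grows with squared distance.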
Alternative splicing and single-cell RNA-sequencing: a feasibility assessment
We know little about how isoform choice is regulated in individual cells for most spliced genes. In theory, single-cell RNA-sequencing (scRNA-seq) could enable us to investigate isoform choice at cellular resolution. Therefore, scRNA-seq could give insight into the fundamental molecular biology process of how alternative splicing is regulated within cells. However, scRNA-seq is a relatively new technology, and at the start of my PhD it was not clear whether existing bioinformatics approaches would enable accurate splicing analyses. In my PhD I consider what the limitations are when attempting to study alternative splicing using scRNA-seq and what can be done to overcome them.
Alternative splicing is commonly analysed using bulk RNA sequencing (bulk RNA-seq) data with isoform quantification software. It was not clear whether isoform quantification software designed for bulk RNA-seq would perform well when run on scRNA-seq data. To address this, I performed a simulation-based benchmark of isoform quantification software developed for bulk RNA-seq when run on scRNA-seq. I made two important findings. Firstly, I found that isoform quantification software performs poorly when run on Drop-seq data, but performs better when run on scRNA-seq data generated using full-length transcript protocols (e.g., SMART-seq and SMART-seq2). Secondly, I found that for the most part, isoform quantification software performs almost as well when run on full-length scRNA-seq as it does when run on bulk RNA-seq. Based on these findings, I concluded that software tools to accurately quantify the reads from full-length scRNA-seq experiments exist, theoretically enabling alternative splicing to be analysed using scRNA-seq.
Encouraged by this result, I embarked on a series of experiments designed to answer questions such as ‘How many isoforms does a gene typically produce per cell?’. This is a key basic biology question that could in theory be answered using scRNA-seq. Unfortunately, I found that the results of these experiments were largely impossible to interpret because I was unable to distinguish between biological signal and technical noise. I realised that without a solid understanding of the technical noise and confounding factors associated with scRNA-seq, distinguishing biological signal from technical noise would be challenging and might not be possible. To address this, I embarked on a second simulation-based study, this time investigating the impact of technical noise on our ability to study alternative splicing using scRNA-seq. I simulated four situations: a situation where every gene expressed one isoform per cell, a situation where all genes expressed two isoforms per cell, a situation where all genes expressed three isoforms per cell and a situation where all genes expressed four isoforms per cell. Importantly, I explicitly simulated isoform choice, dropouts and quantification errors. The results of the four simulated situations were not trivial to distinguish from each other, raising concerns about the feasibility of resolving the more complex splicing patterns that probably exist in reality using scRNA-seq data. I concluded that attempts to study alternative splicing using scRNA-seq are currently substantially confounded by a high rate of dropouts and a lack of understanding about the mechanism of isoform choice. Importantly, improvements to isoform quantification software accuracy alone were insufficient to correct for confounding effects caused by dropouts. 
I propose that to enable accurate alternative splicing analyses using scRNA-seq, further research into accurately modelling dropouts is required, or alternatively, scRNA-seq technologies should be improved to increase their capture efficiency. Additionally, research into how isoform choice is regulated at a cellular level is necessary to enable accurate analyses. Overall, I find that it is not currently possible to accurately perform alternative splicing analyses using scRNA-seq. However, I am optimistic that with further research, it may become possible in the future
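The confounding effect of dropouts described in this abstract can be sketched with a toy simulation. The sketch below is not the thesis's simulation: it assumes every isoform drops out independently with the same probability, uses an arbitrary dropout rate of 0.7, and ignores isoform choice and quantification errors. All names and values are hypothetical.

```python
import random
from collections import Counter


def simulate_detected_isoforms(true_isoforms, dropout_rate, n_genes, rng):
    """Count how many genes show 0, 1, 2, ... detected isoforms in one cell
    when each gene truly expresses `true_isoforms` isoforms."""
    counts = Counter()
    for _ in range(n_genes):
        # An isoform is detected only if it survives dropout.
        detected = sum(rng.random() >= dropout_rate for _ in range(true_isoforms))
        counts[detected] += 1
    return counts


N_GENES = 10_000
DROPOUT_RATE = 0.7  # assumed value, for illustration only
rng = random.Random(0)

results = {
    k: simulate_detected_isoforms(k, DROPOUT_RATE, N_GENES, rng)
    for k in (1, 2, 3, 4)
}

for k, counts in results.items():
    fractions = {d: round(n / N_GENES, 3) for d, n in sorted(counts.items())}
    print(f"true isoforms per gene = {k}: detected fractions {fractions}")
```

Under this assumed dropout rate, most genes show zero or one detected isoform in all four scenarios, which illustrates why the simulated situations are hard to tell apart from observed counts alone.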
Robustness and applicability of transcription factor and pathway analysis tools on single-cell RNA-seq data
Many functional analysis tools have been developed to extract functional and mechanistic insight from bulk transcriptome data. With the advent of single-cell RNA sequencing (scRNA-seq), it is in principle possible to do such an analysis for single cells. However, scRNA-seq data has characteristics such as drop-out events and low library sizes. It is thus not clear if functional TF and pathway analysis tools established for bulk sequencing can be applied to scRNA-seq in a meaningful way. To address this question, we perform benchmark studies on simulated and real scRNA-seq data. We include the bulk-RNA tools PROGENy, GO enrichment, and DoRothEA that estimate pathway and transcription factor (TF) activities, respectively, and compare them against the tools SCENIC/AUCell and metaVIPER, designed for scRNA-seq. For the in silico study, we simulate single cells from TF/pathway perturbation bulk RNA-seq experiments. We complement the simulated data with real scRNA-seq data upon CRISPR-mediated knock-out. Our benchmarks on simulated and real data reveal comparable performance to the original bulk data. Additionally, we show that the TF and pathway activities preserve cell type-specific variability by analyzing a mixture sample sequenced with 13 scRNA-seq protocols. We also provide the benchmark data for further use by the community. Our analyses suggest that bulk-based functional analysis tools that use manually curated footprint gene sets can be applied to scRNA-seq data, partially outperforming dedicated single-cell tools. Furthermore, we find that the performance of functional analysis tools is more sensitive to the gene sets than to the statistic used.