14 research outputs found

    SPsimSeq : semi-parametric simulation of bulk and single-cell RNA-sequencing data

    SPsimSeq is a semi-parametric simulation method to generate bulk and single-cell RNA-sequencing data. It is designed to simulate gene expression data with maximal retention of the characteristics of real data, and it is flexible enough to accommodate a wide range of experimental scenarios, including different sample sizes, biological signals (differential expression), and confounding batch effects.
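    The core semi-parametric idea (estimate the distribution of real expression values non-parametrically, then draw new data from it) can be illustrated with a toy smoothed-bootstrap resampler for a single gene. The function name, the Gaussian kernel, and the made-up expression values below are illustrative assumptions, not SPsimSeq's actual API or algorithm.

```python
import random
import statistics

def simulate_from_real(values, n, seed=0):
    """Toy semi-parametric simulator: a smoothed bootstrap.

    Resamples observed expression values and perturbs each draw with
    Gaussian noise scaled by Silverman's rule-of-thumb bandwidth, so the
    simulated data track the empirical distribution of the real data.
    """
    rng = random.Random(seed)
    sd = statistics.stdev(values)
    # Silverman's rule-of-thumb bandwidth for a Gaussian kernel
    h = 1.06 * sd * len(values) ** (-1 / 5)
    return [rng.choice(values) + rng.gauss(0, h) for _ in range(n)]

# Observed log-expression for one gene across samples (made-up numbers)
real = [2.1, 2.4, 1.9, 2.8, 2.2, 2.5, 2.0, 2.7]
sim = simulate_from_real(real, 1000)
```

    Because the simulator resamples the observed values rather than fitting a parametric family, features such as skewness or multimodality in the real data carry over to the simulated data.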

    Observation weights unlock bulk RNA-seq tools for zero inflation and single-cell applications

    Dropout events in single-cell RNA sequencing (scRNA-seq) cause many transcripts to go undetected and induce an excess of zero read counts, leading to power issues in differential expression (DE) analysis. This has triggered the development of bespoke scRNA-seq DE methods to cope with zero inflation. Recent evaluations, however, have shown that dedicated scRNA-seq tools provide no advantage compared to traditional bulk RNA-seq tools. We introduce a weighting strategy, based on a zero-inflated negative binomial model, that identifies excess zero counts and generates gene- and cell-specific weights to unlock bulk RNA-seq DE pipelines for zero-inflated data, boosting performance for scRNA-seq data.
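    A minimal sketch of such gene- and cell-specific weights, assuming a zero-inflated negative binomial (ZINB) fit with known parameters: the weight for an observed zero is the posterior probability that it came from the NB count component rather than the excess-zero component. In real pipelines the mean, dispersion, and zero-inflation probability are estimated from the data; the fixed parameter values here are illustrative only.

```python
def nb_prob_zero(mu, size):
    """P(Y = 0) under a negative binomial with mean mu and dispersion size."""
    return (size / (size + mu)) ** size

def zinb_weight(y, mu, size, pi):
    """Posterior probability that count y comes from the NB count component
    of a ZINB model (pi = zero-inflation probability). Positive counts can
    only come from the count component, so they get weight 1; zeros are
    down-weighted by how likely they are to be excess zeros."""
    if y > 0:
        return 1.0
    p0 = nb_prob_zero(mu, size)
    return (1 - pi) * p0 / (pi + (1 - pi) * p0)

counts = [0, 0, 3, 7, 0, 12]
weights = [zinb_weight(y, mu=5.0, size=1.0, pi=0.3) for y in counts]
```

    These weights can then be passed as observation weights to a bulk DE model, so that probable excess zeros contribute little to the fit.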

    Differential gene expression analysis tools exhibit substandard performance for long non-coding RNA-sequencing data

    Background: Long non-coding RNAs (lncRNAs) are typically expressed at low levels and are inherently highly variable, which is a fundamental challenge for differential expression (DE) analysis. In this study, the performance of 25 pipelines for testing DE in RNA-seq data is comprehensively evaluated, with a particular focus on lncRNAs and low-abundance mRNAs. Fifteen performance metrics are used to evaluate DE tools and normalization methods using simulations and analyses of six diverse RNA-seq datasets.
    Results: Gene expression data are simulated using non-parametric procedures so that realistic levels of expression and variability are preserved in the simulated data. Throughout the assessment, results for mRNAs and lncRNAs are tracked separately. All pipelines exhibit inferior performance for lncRNAs compared with mRNAs across all simulated scenarios and benchmark RNA-seq datasets, and this substandard performance extends to low-abundance mRNAs. No single tool uniformly outperformed the others. Variability, number of samples, and the fraction of DE genes markedly influenced DE tool performance.
    Conclusions: Overall, linear modeling with empirical Bayes moderation (limma) and a non-parametric approach (SAMSeq) showed good control of the false discovery rate and reasonable sensitivity. Of note, achieving a sensitivity of at least 50% requires more than 80 samples in realistic settings such as clinical cancer research. About half of the methods showed a substantial excess of false discoveries, making them unreliable for DE analysis and jeopardizing reproducible science. The detailed results of our study can be consulted through a user-friendly web application that gives guidance on selecting the optimal DE tool (http://statapps.ugent.be/tools/AppDGE/).
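    The two headline metrics in such evaluations, false discovery proportion and sensitivity, are straightforward to compute once the simulation's ground truth is known. A minimal sketch, with hypothetical gene identifiers:

```python
def fdr_and_sensitivity(called_de, truly_de):
    """Observed false discovery proportion and sensitivity (true positive
    rate), given the genes a tool called DE and the genes truly DE in the
    simulation's ground truth."""
    called, truth = set(called_de), set(truly_de)
    tp = len(called & truth)
    fdp = (len(called) - tp) / len(called) if called else 0.0
    tpr = tp / len(truth) if truth else 0.0
    return fdp, tpr

# Toy example: g1..g5 are truly DE; a tool calls g1..g4 plus two false hits
fdp, tpr = fdr_and_sensitivity(
    ["g1", "g2", "g3", "g4", "g9", "g10"],
    ["g1", "g2", "g3", "g4", "g5"],
)
```

    A method "controls the FDR" at level alpha when the false discovery proportion, averaged over simulation replicates, stays at or below alpha.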

    Essential guidelines for computational method benchmarking

    In computational biology and other sciences, researchers are frequently faced with a choice between several computational methods for performing data analyses. Benchmarking studies aim to rigorously compare the performance of different methods using well-characterized benchmark datasets, to determine the strengths of each method or to provide recommendations regarding suitable choices of methods for an analysis. However, benchmarking studies must be carefully designed and implemented to provide accurate, unbiased, and informative results. Here, we summarize key practical guidelines and recommendations for performing high-quality benchmarking analyses, based on our experiences in computational biology.

    The shaky foundations of simulating single-cell RNA sequencing data

    BACKGROUND: With the emergence of hundreds of single-cell RNA-sequencing (scRNA-seq) datasets, the number of computational tools to analyze aspects of the generated data has grown rapidly. As a result, there is a recurring need to demonstrate whether newly developed methods are truly performant, both on their own and in comparison to existing tools. Benchmark studies aim to consolidate the space of available methods for a given task and often use simulated data that provide a ground truth for evaluations, thus demanding a high quality standard to keep results credible and transferable to real data. RESULTS: Here, we evaluated methods for synthetic scRNA-seq data generation in their ability to mimic experimental data. Besides comparing gene- and cell-level quality control summaries in both one- and two-dimensional settings, we further quantified these at the batch and cluster level. Second, we investigated the effect of simulators on comparisons of clustering and batch-correction methods, and third, which quality control summaries can capture reference-simulation similarity, and to what extent. CONCLUSIONS: Our results suggest that most simulators are unable to accommodate complex designs without introducing artificial effects, that they yield over-optimistic performance of integration methods and potentially unreliable rankings of clustering methods, and that it is generally unknown which summaries are important for ensuring effective simulation-based method comparisons.
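    One simple way to score reference-simulation similarity for a one-dimensional quality control summary (e.g. per-gene mean expression) is the two-sample Kolmogorov-Smirnov statistic, the maximum distance between the two empirical CDFs. This brute-force sketch and its toy numbers are illustrative, not the paper's actual evaluation pipeline.

```python
def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: the maximum vertical
    distance between the empirical CDFs of samples a and b. Smaller
    values mean the simulated summary tracks the real one more closely."""
    a, b = sorted(a), sorted(b)
    d = 0.0
    for x in a + b:  # the ECDF difference can only change at sample points
        cdf_a = sum(v <= x for v in a) / len(a)
        cdf_b = sum(v <= x for v in b) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d

# Hypothetical per-gene mean log-expression: real data vs. one simulator
real_means = [1.0, 1.2, 2.5, 3.1, 4.0]
sim_means = [1.1, 1.3, 2.4, 3.0, 3.9]
```

    In practice a library routine such as `scipy.stats.ks_2samp` would be used, and several summaries (library size, detection rate, dispersion) would be scored, not just one.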

    A systematic performance evaluation of clustering methods for single-cell RNA-seq data [version 2; referees: 2 approved]

    Subpopulation identification, usually via some form of unsupervised clustering, is a fundamental step in the analysis of many single-cell RNA-seq data sets. This has motivated the development and application of a broad range of clustering methods, based on various underlying algorithms. Here, we provide a systematic and extensible performance evaluation of 14 clustering algorithms implemented in R, including both methods developed explicitly for scRNA-seq data and more general-purpose methods. The methods were evaluated using nine publicly available scRNA-seq data sets as well as three simulations with varying degrees of cluster separability. The same feature selection approaches were used for all methods, allowing us to focus on the performance of the clustering algorithms themselves. We evaluated the ability to recover known subpopulations, as well as the stability, run time, and scalability of the methods. Additionally, we investigated whether performance could be improved by generating consensus partitions from multiple individual clustering methods. We found substantial differences in performance, run time, and stability between the methods, with SC3 and Seurat showing the most favorable results. Additionally, we found that consensus clustering typically did not improve performance compared to the best of the combined methods, but that several of the top-performing methods already perform some type of consensus clustering. All the code used for the evaluation is available on GitHub (https://github.com/markrobinsonuzh/scRNAseq_clustering_comparison). In addition, an R package providing access to data and clustering results, thereby facilitating inclusion of new methods and data sets, is available from Bioconductor (https://bioconductor.org/packages/DuoClustering2018).
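    "Recovering known subpopulations" is typically scored with the Adjusted Rand Index (ARI), which compares a predicted partition against the known labels and corrects for chance agreement. A self-contained sketch with toy labels (the evaluation itself was done in R; this Python version only illustrates the metric):

```python
from math import comb
from collections import Counter

def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand Index between two clusterings: 1 means identical
    partitions, values near 0 mean no better than random assignment."""
    n = len(labels_true)
    pairs = Counter(zip(labels_true, labels_pred))
    # Pair counts within cells of the contingency table and its margins
    sum_ij = sum(comb(c, 2) for c in pairs.values())
    sum_a = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_pred).values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        return 1.0
    return (sum_ij - expected) / (max_index - expected)

truth = ["A", "A", "A", "B", "B", "B"]   # known subpopulations
pred = [0, 0, 1, 1, 1, 1]                # a clustering that misplaces one cell
```

    Library implementations such as `sklearn.metrics.adjusted_rand_score` compute the same quantity.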

    Swimming downstream: statistical analysis of differential transcript usage following Salmon quantification [version 2; referees: 1 approved, 2 approved with reservations]

    Detection of differential transcript usage (DTU) from RNA-seq data is an important bioinformatic analysis that complements differential gene expression analysis. Here we present a simple workflow using a set of existing R/Bioconductor packages for analysis of DTU. We show how these packages can be used downstream of RNA-seq quantification using the Salmon software package. The entire pipeline is fast, benefiting from the inference steps performed by Salmon to quantify expression at the transcript level. The workflow includes live, runnable code chunks for analysis using DRIMSeq and DEXSeq, as well as for performing two-stage testing of DTU using the stageR package, a statistical framework to screen at the gene level and then confirm which transcripts within the significant genes show evidence of DTU. We evaluate these packages and other related packages on a simulated dataset with parameters estimated from real data.
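    The screen-then-confirm logic can be sketched as a simplified two-stage procedure: screen genes with Benjamini-Hochberg at level alpha, then confirm transcripts only within the surviving genes. This is in the spirit of stageR but is not its exact correction (stageR adjusts the confirmation stage to control the overall error rate), and the gene/transcript identifiers and p-values are hypothetical.

```python
def benjamini_hochberg(pvalues, alpha):
    """Return the set of indices rejected by the BH step-up FDR procedure."""
    order = sorted(range(len(pvalues)), key=lambda i: pvalues[i])
    m = len(pvalues)
    k = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank / m * alpha:
            k = rank  # largest rank whose p-value clears its BH threshold
    return set(order[:k])

def two_stage_dtu(gene_p, tx_p, alpha=0.05):
    """Simplified two-stage DTU test: screen genes with BH at alpha, then
    confirm transcripts within screened genes at level alpha.
    Returns {gene: [confirmed transcript ids]}."""
    genes = list(gene_p)
    kept = benjamini_hochberg([gene_p[g] for g in genes], alpha)
    screened = {genes[i] for i in kept}
    return {g: [t for t, p in tx_p[g].items() if p <= alpha]
            for g in screened}

gene_p = {"geneA": 0.001, "geneB": 0.20, "geneC": 0.004}
tx_p = {"geneA": {"tx1": 0.01, "tx2": 0.40},
        "geneB": {"tx3": 0.03},
        "geneC": {"tx4": 0.002, "tx5": 0.03}}
result = two_stage_dtu(gene_p, tx_p)
```

    The payoff of the two-stage design is interpretability: a gene is first flagged as containing some DTU, and only then are its individual transcripts examined.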