119 research outputs found

A framework for list representation, enabling list stabilization through incorporation of gene exchangeabilities

Author: Fontes Magnus
Soneson Charlotte
Publication date: 15/03/2011

Analysis of multivariate data sets from e.g. microarray studies frequently results in lists of genes which are associated with some response of interest. The biological interpretation is often complicated by the statistical instability of the obtained gene lists with respect to sampling variations, which may partly be due to the functional redundancy among genes, implying that multiple genes can play exchangeable roles in the cell. In this paper we use the concept of exchangeability of random variables to model this functional redundancy and thereby account for the instability attributable to sampling variations. We present a flexible framework to incorporate the exchangeability into the representation of lists. The proposed framework supports straightforward robust comparison between any two lists. It can also be used to generate new, more stable gene rankings incorporating more information from the experimental data. Using a microarray data set from lung cancer patients we show that the proposed method provides more robust gene rankings than existing methods with respect to sampling variations, without compromising the biological significance.

arXiv.org e-Print Archive

Lund University Publications

A method for visual identification of small sample subgroups and potential biomarkers

Author: Fontes Magnus
Soneson Charlotte
Publication venue: 'Institute of Mathematical Statistics'
Publication date: 01/01/2011

In order to find previously unknown subgroups in biomedical data and generate testable hypotheses, visually guided exploratory analysis can be of tremendous importance. In this paper we propose a new dissimilarity measure that can be used within the Multidimensional Scaling framework to obtain a joint low-dimensional representation of both the samples and variables of a multivariate data set, thereby providing an alternative to conventional biplots. In comparison with biplots, the representations obtained by our approach are particularly useful for exploratory analysis of data sets where there are small groups of variables sharing unusually high or low values for a small group of samples. Comment: Published at http://dx.doi.org/10.1214/11-AOAS460 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org)

arXiv.org e-Print Archive

Lund University Publications

iSEE: Interactive SummarizedExperiment Explorer

Author: Lun ATL
Marini Federico
Rue-Albrecht Kevin
Soneson Charlotte
Publication venue: F1000Research
Publication date: 01/01/2018

Data exploration is critical to the comprehension of large biological data sets generated by high-throughput assays such as sequencing. However, most existing tools for interactive visualisation are limited to specific assays or analyses. Here, we present the iSEE (Interactive SummarizedExperiment Explorer) software package, which provides a general visual interface for exploring data in a SummarizedExperiment object. iSEE is directly compatible with many existing R/Bioconductor packages for analysing high-throughput biological data, and provides useful features such as simultaneous examination of (meta)data and analysis results, dynamic linking between plots and code tracking for reproducibility. We demonstrate the utility and flexibility of iSEE by applying it to explore a range of real transcriptomics and proteomics data sets. Funding: German Federal Ministry of Education and Research

Crossref

Oxford University Research Archive

NEUROSURGERY ENTHUSIASTIC WOMEN SOCIETY

Apollo (Cambridge)

The effect of IFRS on listed companies' accounting: a statistical study of the companies on the Stockholm O-list

Author: Hansson Tobias
Rafsten Axel
Soneson Charlotte
Publication venue: Lunds universitet/Företagsekonomiska institutionen
Publication date: 01/01/2006

The transition from reporting according to the recommendations of Redovisningsrådet (the Swedish Financial Accounting Standards Council) to reporting according to IFRS/IAS has led to major changes in the reported figures of many listed companies. We therefore want to examine how the transition to IFRS has changed accounting practice for listed Swedish companies, and how representatives of these companies believe the quality of the accounting has been affected.

The shaky foundations of simulating single-cell RNA sequencing data

Author: Crowell Helena L
Morillo Leonardo Sarah X
Robinson Mark D
Soneson Charlotte
Publication venue: BioMed Central
Publication date: 29/03/2023

BACKGROUND: With the emergence of hundreds of single-cell RNA-sequencing (scRNA-seq) datasets, the number of computational tools to analyze aspects of the generated data has grown rapidly. As a result, there is a recurring need to demonstrate whether newly developed methods are truly performant, both on their own and in comparison to existing tools. Benchmark studies aim to consolidate the space of available methods for a given task and often use simulated data that provide a ground truth for evaluations, thus demanding a high quality standard to make results credible and transferable to real data. RESULTS: Here, we evaluated methods for synthetic scRNA-seq data generation in their ability to mimic experimental data. Besides comparing gene- and cell-level quality control summaries in both one- and two-dimensional settings, we further quantified these at the batch- and cluster-level. Secondly, we investigated the effect of simulators on clustering and batch correction method comparisons, and, thirdly, which quality control summaries can capture reference-simulation similarity, and to what extent. CONCLUSIONS: Our results suggest that most simulators are unable to accommodate complex designs without introducing artificial effects; they yield over-optimistic performance of integration and potentially unreliable ranking of clustering methods; and it is generally unknown which summaries are important to ensure effective simulation-based method comparisons.

ZORA

Essential guidelines for computational method benchmarking

Author: Boulesteix Anne-Laure
Cannoodt Robrecht
Gardner Paul P
Hapfelmeier Alexander
Robinson Mark D
Saelens Wouter
Saeys Yvan
Soneson Charlotte
Weber Lukas M
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2019

In computational biology and other sciences, researchers are frequently faced with a choice between several computational methods for performing data analyses. Benchmarking studies aim to rigorously compare the performance of different methods using well-characterized benchmark datasets, to determine the strengths of each method or to provide recommendations regarding suitable choices of methods for an analysis. However, benchmarking studies must be carefully designed and implemented to provide accurate, unbiased, and informative results. Here, we summarize key practical guidelines and recommendations for performing high-quality benchmarking analyses, based on our experiences in computational biology.

Ghent University Academic Bibliography

Open Access LMU

ZORA

Essential guidelines for computational method benchmarking

Author: Boulesteix Anne-Laure
Cannoodt Robrecht
Gardner Paul P.
Hapfelmeier Alexander
Robinson Mark D.
Saelens Wouter
Saeys Yvan
Soneson Charlotte
Weber Lukas M.
Publication date: 01/01/2019

arXiv.org e-Print Archive

Ghent University Academic Bibliography

Open Access LMU

ZORA

A systematic performance evaluation of clustering methods for single-cell RNA-seq data [version 2; referees: 2 approved]

Author: Angelo Duò
Charlotte Soneson
Mark D. Robinson
Publication venue: 'F1000 Research Ltd'
Publication date: 01/09/2018

Subpopulation identification, usually via some form of unsupervised clustering, is a fundamental step in the analysis of many single-cell RNA-seq data sets. This has motivated the development and application of a broad range of clustering methods, based on various underlying algorithms. Here, we provide a systematic and extensible performance evaluation of 14 clustering algorithms implemented in R, including both methods developed explicitly for scRNA-seq data and more general-purpose methods. The methods were evaluated using nine publicly available scRNA-seq data sets as well as three simulations with varying degrees of cluster separability. The same feature selection approaches were used for all methods, allowing us to focus on the performance of the clustering algorithms themselves. We evaluated the ability to recover known subpopulations, as well as the stability, run time and scalability of the methods. Additionally, we investigated whether the performance could be improved by generating consensus partitions from multiple individual clustering methods. We found substantial differences in the performance, run time and stability between the methods, with SC3 and Seurat showing the most favorable results. Additionally, we found that consensus clustering typically did not improve the performance compared to the best of the combined methods, but that several of the top-performing methods already perform some type of consensus clustering. All the code used for the evaluation is available on GitHub (https://github.com/markrobinsonuzh/scRNAseq_clustering_comparison). In addition, an R package providing access to data and clustering results, thereby facilitating inclusion of new methods and data sets, is available from Bioconductor (https://bioconductor.org/packages/DuoClustering2018).

Directory of Open Access Journals

Swimming downstream: statistical analysis of differential transcript usage following Salmon quantification [version 2; referees: 1 approved, 2 approved with reservations]

Author: Charlotte Soneson
Michael I. Love
Rob Patro
Publication venue: 'F1000 Research Ltd'
Publication date: 01/09/2018

Detection of differential transcript usage (DTU) from RNA-seq data is an important bioinformatic analysis that complements differential gene expression analysis. Here we present a simple workflow using a set of existing R/Bioconductor packages for analysis of DTU. We show how these packages can be used downstream of RNA-seq quantification using the Salmon software package. The entire pipeline is fast, benefiting from inference steps by Salmon to quantify expression at the transcript level. The workflow includes live, runnable code chunks for analysis using DRIMSeq and DEXSeq, as well as for performing two-stage testing of DTU using the stageR package, a statistical framework to screen at the gene level and then confirm which transcripts within the significant genes show evidence of DTU. We evaluate these packages and other related packages on a simulated dataset with parameters estimated from real data.

Directory of Open Access Journals

Observation weights unlock bulk RNA-seq tools for zero inflation and single-cell applications

Author: Clement Lieven
Dudoit Sandrine
Love Michael I
Perraudeau Fanny
Risso Davide
Robinson Mark D
Soneson Charlotte
Van den Berge Koen
Vert Jean-Philippe
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2018

Dropout events in single-cell RNA sequencing (scRNA-seq) cause many transcripts to go undetected and induce an excess of zero read counts, leading to power issues in differential expression (DE) analysis. This has triggered the development of bespoke scRNA-seq DE methods to cope with zero inflation. Recent evaluations, however, have shown that dedicated scRNA-seq tools provide no advantage compared to traditional bulk RNA-seq tools. We introduce a weighting strategy, based on a zero-inflated negative binomial model, that identifies excess zero counts and generates gene- and cell-specific weights to unlock bulk RNA-seq DE pipelines for zero-inflated data, boosting performance for scRNA-seq.

Crossref

Ghent University Academic Bibliography

Directory of Open Access Journals

Carolina Digital Repository

eScholarship - University of California

Archivsystem Ask23

ZORA

HAL-MINES ParisTech

Archivio istituzionale della ricerca - Università di Padova