Revisiting the thorny issue of missing values in single-cell proteomics
Missing values are a notable challenge when analysing mass spectrometry-based
proteomics data. While the field is still actively debating the best
practices, the challenge increased with the emergence of mass
spectrometry-based single-cell proteomics and the dramatic increase in missing
values. A popular approach to deal with missing values is to perform
imputation. Imputation has several drawbacks for which alternatives exist, but
currently imputation is still a practical solution widely adopted in
single-cell proteomics data analysis. This perspective discusses the advantages
and drawbacks of imputation. We also highlight 5 main challenges linked to
missing value management in single-cell proteomics. Future developments should
aim to solve these challenges, whether it is through imputation or data
modelling. The perspective concludes with recommendations for reporting missing
values, for reporting methods that deal with missing values and for proper
encoding of missing values. Comment: The code to reproduce the images presented in the manuscript is
available in the GitHub repository:
https://github.com/UCLouvain-CBIO/2023_scp_n
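The recommendation on encoding and reporting missing values can be illustrated with a minimal sketch. It is not taken from the manuscript or its repository: the toy matrix, the `NA` name and the `missing_rate` helper are hypothetical, and the only assumption is that missing values are stored as NaN rather than as 0, so they cannot be mistaken for a measured zero intensity.

```python
import math

# Hypothetical toy data: rows are features (peptides), columns are single cells.
# Missing values are encoded as NaN, never as 0.
NA = math.nan
intensities = [
    [10.2, NA, 9.8, NA],
    [NA, NA, 7.1, 6.9],
    [5.5, 5.9, NA, 6.1],
]


def missing_rate(values):
    """Return the fraction of NaN entries in an iterable of numbers."""
    values = list(values)
    return sum(math.isnan(v) for v in values) / len(values)


# Report missingness at three levels: whole data set, per cell, per feature.
overall = missing_rate(v for row in intensities for v in row)
per_cell = [missing_rate(col) for col in zip(*intensities)]
per_feature = [missing_rate(row) for row in intensities]
```

Reporting the rate per cell and per feature, and not only overall, shows whether missing values concentrate in a few cells or features.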
Standardised workflow for mass spectrometry-based single-cell proteomics data processing and analysis using the scp package
Mass spectrometry (MS) based single-cell proteomics (SCP) explores cellular
heterogeneity by focusing on the functional effectors of the cells - proteins.
However, extracting meaningful biological information from MS data is far from
trivial, especially with single cells. Currently, data analysis workflows are
substantially different from one research team to another. Moreover, it is
difficult to evaluate pipelines as ground truths are missing. Our team has
developed the R/Bioconductor package called scp to provide a standardised
framework for SCP data analysis. It relies on the widely used QFeatures and
SingleCellExperiment data structures. In addition, we used a design containing
cell lines mixed in known proportions to generate controlled variability for
data analysis benchmarking. In this work, we provide a flexible data analysis
protocol for SCP data using the scp package together with comprehensive
explanations at each step of the processing. Our main steps are quality control
on the feature and cell level, aggregation of the raw data into peptides and
proteins, normalisation and batch correction. We validate our workflow using
our ground truth data set. We illustrate how to use this modular, standardised
framework and highlight some crucial steps.
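The processing steps named in this abstract (cell-level quality control, aggregation of peptides into proteins, normalisation) can be sketched in a few lines. This is a simplified illustration in plain Python, not the scp package or its API: the function names, the 50% missingness threshold, the median aggregation and the median centring are all assumptions chosen for the example. Batch correction is left out, and values are assumed to be log-intensities.

```python
import math
from statistics import median

NA = math.nan


def observed(values):
    """Drop missing (NaN) entries."""
    return [v for v in values if not math.isnan(v)]


def keep_cells(matrix, max_missing=0.5):
    """Cell-level QC: indices of columns whose missing rate is at most max_missing."""
    n = len(matrix)
    return [j for j, col in enumerate(zip(*matrix))
            if 1 - len(observed(col)) / n <= max_missing]


def aggregate(matrix, proteins):
    """Aggregate peptide rows into proteins using the per-cell median of observed values."""
    grouped = {}
    for row, protein in zip(matrix, proteins):
        grouped.setdefault(protein, []).append(row)
    return {
        protein: [median(observed(col)) if observed(col) else NA
                  for col in zip(*rows)]
        for protein, rows in grouped.items()
    }


def centre_cells(table):
    """Normalise: subtract each cell's median across proteins (assumes every cell has data)."""
    centres = [median(observed(col)) for col in zip(*table.values())]
    return {p: [v - c for v, c in zip(vals, centres)] for p, vals in table.items()}


# Hypothetical toy data: three peptides (rows) measured in three cells (columns).
peptides = [
    [8.0, 9.0, NA],
    [10.0, NA, NA],
    [4.0, 5.0, NA],
]
cells = keep_cells(peptides)  # the third cell is entirely missing and is dropped
filtered = [[row[j] for j in cells] for row in peptides]
proteins = aggregate(filtered, ["P1", "P1", "P2"])
normalised = centre_cells(proteins)
```

Keeping each step as a separate function mirrors the modular design the abstract describes: any step can be swapped or re-ordered without touching the others.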
Initial recommendations for performing, benchmarking, and reporting single-cell proteomics experiments
Analyzing proteins from single cells by tandem mass spectrometry (MS) has
become technically feasible. While such analysis has the potential to
accurately quantify thousands of proteins across thousands of single cells, the
accuracy and reproducibility of the results may be undermined by numerous
factors affecting experimental design, sample preparation, data acquisition,
and data analysis. Broadly accepted community guidelines and standardized
metrics will enhance rigor, data quality, and alignment between laboratories.
Here we propose best practices, quality controls, and data reporting
recommendations to assist in the broad adoption of reliable quantitative
workflows for single-cell proteomics. Comment: Supporting website: https://single-cell.net/guideline
A principled approach and standardised software for mass spectrometry-based single-cell proteomics data analysis
Recent advances in sample preparation and mass spectrometry (MS) have enabled the emergence of quantitative MS-based single-cell proteomics (SCP). However, the analysis of SCP data is challenging and must address numerous problems that are inherent to both MS-based proteomics technologies and single-cell experiments. Through the development of standardised software and data, this work establishes the foundation for SCP data analysis. Our efforts have led to a comprehensive identification and understanding of the obstacles hindering the accurate extraction of biologically meaningful information from these complex data. Consequently, we have developed a computational approach explicitly designed to address these challenges, facilitating a seamless analysis of SCP data. This work reshapes the analysis of SCP data by moving efforts from dealing with the technical aspects of data analysis to focusing on answering biologically relevant questions. (BIFA - Sciences biomédicales et pharmaceutiques) -- UCL, 202
Replication of single-cell proteomics data reveals important computational challenges
Introduction: Mass spectrometry-based proteomics is actively embracing quantitative, single-cell level analyses. Indeed, recent advances in sample preparation and mass spectrometry (MS) have enabled the emergence of quantitative MS-based single-cell proteomics (SCP). While exciting and promising, SCP still has many rough edges. The current analysis workflows are custom and built from scratch. The field is therefore craving standardized software that promotes principled and reproducible SCP data analyses. Areas covered: This special report is the first step toward the formalization and standardization of SCP data analysis. scp, the software that accompanies this work, successfully replicates one of the landmark SCP studies and is applicable to other experiments and designs. We created a repository containing the replicated workflow with comprehensive documentation in order to favor further dissemination and improvements of SCP data analyses. Expert opinion: Replicating SCP data analyses uncovers important challenges in SCP data analysis. We describe two such challenges in detail: batch correction and data missingness. We provide the current state-of-the-art and illustrate the associated limitations. We also highlight the intimate dependence that exists between batch effects and data missingness and offer avenues for dealing with these exciting challenges.
The Current State of Single-Cell Proteomics Data Analysis
Sound data analysis is essential to retrieve meaningful biological information from single-cell proteomics experiments. This analysis is carried out by computational methods that are assembled into workflows, and their implementations influence the conclusions that can be drawn from the data. In this work, we explore and compare the computational workflows that have been used over the last four years and identify a profound lack of consensus on how to analyze single-cell proteomics data. We highlight the need for benchmarking of computational workflows and standardization of computational tools and data, as well as carefully designed experiments. Finally, we cover the current standardization efforts that aim to fill the gap, list the remaining missing pieces, and conclude with lessons learned from the replication of published single-cell proteomics analyses.
Replication of single-cell proteomics data reveals important computational challenges
Introduction: Mass spectrometry-based proteomics is actively embracing quantitative, single-cell level analyses. Indeed, recent advances in sample preparation and mass spectrometry (MS) have enabled the emergence of quantitative MS-based single-cell proteomics (SCP). While exciting and promising, SCP still has many rough edges. The current analysis workflows are custom and built from scratch. The field is therefore craving standardized software that promotes principled and reproducible SCP data analyses. Areas covered: This special report is the first step toward the formalization and standardization of SCP data analysis. scp, the software that accompanies this work, successfully replicates one of the landmark SCP studies and is applicable to other experiments and designs. We created a repository containing the replicated workflow with comprehensive documentation in order to favor further dissemination and improvements of SCP data analyses. Expert opinion: Replicating SCP data analyses uncovers important challenges in SCP data analysis. We describe two such challenges in detail: batch correction and data missingness. We provide the current state-of-the-art and illustrate the associated limitations. We also highlight the intimate dependence that exists between batch effects and data missingness and offer avenues for dealing with these exciting challenges.
Revisiting the Thorny Issue of Missing Values in Single-Cell Proteomics
Missing values are a notable challenge when analyzing mass spectrometry-based proteomics data. While the field is still actively debating the best practices, the challenge increased with the emergence of mass spectrometry-based single-cell proteomics and the dramatic increase in missing values. A popular approach to deal with missing values is to perform imputation. Imputation has several drawbacks for which alternatives exist, but currently, imputation is still a practical solution widely adopted in single-cell proteomics data analysis. This perspective discusses the advantages and drawbacks of imputation. We also highlight 5 main challenges linked to missing value management in single-cell proteomics. Future developments should aim to solve these challenges, whether it is through imputation or data modeling. The perspective concludes with recommendations for reporting missing values, for reporting methods that deal with missing values, and for proper encoding of missing values.
Standardised workflow for mass spectrometry-based single-cell proteomics data processing and analysis using the scp package
Mass spectrometry (MS) based single-cell proteomics (SCP) explores cellular heterogeneity by focusing on the functional effectors of the cells - proteins. However, extracting meaningful biological information from MS data is far from trivial, especially with single cells. Currently, data analysis workflows are substantially different from one research team to another. Moreover, it is difficult to evaluate pipelines as ground truths are missing. Our team has developed the R/Bioconductor package called scp to provide a standardised framework for SCP data analysis. It relies on the widely used QFeatures and SingleCellExperiment data structures. In addition, we used a design containing cell lines mixed in known proportions to generate controlled variability for data analysis benchmarking. In this work, we provide a flexible data analysis protocol for SCP data using the scp package together with comprehensive explanations at each step of the processing. Our main steps are quality control on the feature and cell level, aggregation of the raw data into peptides and proteins, normalisation and batch correction. We validate our workflow using our ground truth data set. We illustrate how to use this modular, standardised framework and highlight some crucial steps.
Identification and implication of tissue-enriched ligands in epithelial–endothelial crosstalk during pancreas development
Development of the pancreas is driven by an intrinsic program coordinated with signals from other cell types in the epithelial environment. These intercellular communications have been so far challenging to study because of the low concentration, localized production and diversity of the signals released. Here, we combined scRNAseq data with a computational interactomic approach to identify signals involved in the reciprocal interactions between the various cell types of the developing pancreas. This in silico approach yielded 40,607 potential ligand–target interactions between the different main pancreatic cell types. Among this vast network of interactions, we focused on three ligands potentially involved in communications between epithelial and endothelial cells: BMP7 and WNT7B, expressed by pancreatic epithelial cells and predicted to target endothelial cells, and SEMA6D, involved in the reverse interaction. In situ hybridization confirmed the localized expression of Bmp7 in the pancreatic epithelial tip cells and of Wnt7b in the trunk cells. On the contrary, Sema6d was enriched in endothelial cells. Functional experiments on ex vivo cultured pancreatic explants indicated that tip cell–produced BMP7 limited development of endothelial cells. This work identified ligands with a restricted tissular and cellular distribution and highlighted the role of BMP7 in the intercellular communications contributing to vessel development and organization during pancreas organogenesis.