42 research outputs found
A Bioconductor workflow for processing and analysing spatial proteomics data
Spatial proteomics is the systematic study of protein sub-cellular localisation. In this workflow, we describe the analysis of a typical quantitative mass spectrometry-based spatial proteomics experiment using the MSnbase and pRoloc Bioconductor package suite. To walk the user through the computational pipeline, we use a recently published experiment predicting protein sub-cellular localisation in pluripotent embryonic mouse stem cells. We describe the software infrastructure at hand, importing and processing data, quality control, sub-cellular marker definition, visualisation and interactive exploration. We then demonstrate the application and interpretation of statistical learning methods, including novelty detection using semi-supervised learning, classification, clustering and transfer learning and conclude the pipeline with data export. The workflow is aimed at beginners who are familiar with proteomics in general and spatial proteomics in particular. LMB and CMM are supported by a Wellcome Trust Technology Development Grant (grant number 108441/Z/15/Z). KSL is a Wellcome Trust Joint Investigator (110170/Z/15/Z). LG is supported by the BBSRC Strategic Longer and Larger grant (Award BB/L002817/1).
A Bioconductor workflow for processing and analysing spatial proteomics data [version 2; referees: 2 approved]
Spatial proteomics is the systematic study of protein sub-cellular localisation. In this workflow, we describe the analysis of a typical quantitative mass spectrometry-based spatial proteomics experiment using the MSnbase and pRoloc Bioconductor package suite. To walk the user through the computational pipeline, we use a recently published experiment predicting protein sub-cellular localisation in pluripotent embryonic mouse stem cells. We describe the software infrastructure at hand, importing and processing data, quality control, sub-cellular marker definition, visualisation and interactive exploration. We then demonstrate the application and interpretation of statistical learning methods, including novelty detection using semi-supervised learning, classification, clustering and transfer learning and conclude the pipeline with data export. The workflow is aimed at beginners who are familiar with proteomics in general and spatial proteomics in particular
A Bioconductor workflow for processing, evaluating, and interpreting expression proteomics data [version 1; peer review: 2 approved]
Background: Expression proteomics involves the global evaluation of protein abundances within a system. In turn, differential expression analysis can be used to investigate changes in protein abundance upon perturbation to such a system. Methods: Here, we provide a workflow for the processing, analysis and interpretation of quantitative mass spectrometry-based expression proteomics data. This workflow utilizes open-source R software packages from the Bioconductor project and guides users end-to-end and step-by-step through every stage of the analyses. As a use-case we generated expression proteomics data from HEK293 cells with and without a treatment. Of note, the experiment included cellular proteins labelled using tandem mass tag (TMT) technology and secreted proteins quantified using label-free quantitation (LFQ). Results: The workflow explains the software infrastructure before focusing on data import, pre-processing and quality control. This is done individually for TMT and LFQ datasets. The application of statistical differential expression analysis is demonstrated, followed by interpretation via gene ontology enrichment analysis. Conclusions: A comprehensive workflow for the processing, analysis and interpretation of expression proteomics is presented. The workflow is a valuable resource for the proteomics community and specifically beginners who are at least familiar with R who wish to understand and make data-driven decisions with regards to their analyses
The subcellular organisation of Saccharomyces cerevisiae.
Subcellular protein localisation is essential for the mechanisms that govern cellular homeostasis. The ability to understand processes leading to this phenomenon will therefore enhance our understanding of cellular function. Here we review recent developments in this field with regard to mass spectrometry, fluorescence microscopy and computational prediction methods. We highlight relative strengths and limitations of current methodologies focussing particularly on studies in the yeast Saccharomyces cerevisiae. We further present the first cell-wide spatial proteome map of S. cerevisiae, generated using hyperLOPIT, a mass spectrometry-based protein correlation profiling technique. We compare protein subcellular localisation assignments from this map with two published fluorescence microscopy studies and show that confidence in localisation assignment is attained using multiple orthogonal methods that provide complementary data. BBSRC and Wellcome Trust
A draft map of the mouse pluripotent stem cell spatial proteome.
Knowledge of the subcellular distribution of proteins is vital for understanding cellular mechanisms. Capturing the subcellular proteome in a single experiment has proven challenging, with studies focusing on specific compartments or assigning proteins to subcellular niches with low resolution and/or accuracy. Here we introduce hyperLOPIT, a method that couples extensive fractionation, quantitative high-resolution accurate mass spectrometry with multivariate data analysis. We apply hyperLOPIT to a pluripotent stem cell population whose subcellular proteome has not been extensively studied. We provide localization data on over 5,000 proteins with unprecedented spatial resolution to reveal the organization of organelles, sub-organellar compartments, protein complexes, functional networks and steady-state dynamics of proteins and unexpected subcellular locations. The method paves the way for characterizing the impact of post-transcriptional and post-translational modification on protein location and studies involving proteome-level locational changes on cellular perturbation. An interactive open-source resource is presented that enables exploration of these data. The authors thank Andreas Hühmer, Philip Remes, Jesse Canterbury and Graeme McAlister of Thermo Fisher Scientific, San Jose, CA, USA, for their advice regarding operation of the Orbitrap Fusion. We also thank Mike Deery for assistance with checking sample integrity on the mass spectrometers in the Cambridge Centre for Proteomics on equipment purchased via a Wellcome Trust grant (099135/Z/12/Z), and Brian Hendrich of the Wellcome Trust-MRC Stem Cell Institute in Cambridge and Sean Munro of the MRC Laboratory of Molecular Biology in Cambridge for insightful comments about the data. AC was supported by BBSRC grant (BB/D526088/1). C.M.M. and L.G. were supported by European Union 7th Framework Program (PRIMEXS project, grant agreement number 262067), L.M.B. was supported by a BBSRC Tools and Resources Development Fund (Award BB/K00137X/1), and P.C.H. was supported by an ERC Advanced Investigator grant to A.M.A. A.G. was funded through the Alexander S. Onassis Public Benefit Foundation, the Foundation for Education and European Culture (IPEP) and the Embiricos Trust Scholarship of Jesus College Cambridge. T.H. was supported by Commonwealth Split Site PhD Scholarship. T.N. was supported by an ERASMUS Placement scholarship. This is the final version of the article. It was first available from NPG via http://dx.doi.org/10.1038/ncomms999
A foundation for reliable spatial proteomics data analysis.
Quantitative mass-spectrometry-based spatial proteomics involves elaborate, expensive, and time-consuming experimental procedures, and considerable effort is invested in the generation of such data. Multiple research groups have described a variety of approaches for establishing high-quality proteome-wide datasets. However, data analysis is as critical as data production for reliable and insightful biological interpretation, and no consistent and robust solutions have been offered to the community so far. Here, we introduce the requirements for rigorous spatial proteomics data analysis, as well as the statistical machine learning methodologies needed to address them, including supervised and semi-supervised machine learning, clustering, and novelty detection. We present freely available software solutions that implement innovative state-of-the-art analysis pipelines and illustrate the use of these tools through several case studies involving multiple organisms, experimental designs, mass spectrometry platforms, and quantitation techniques. We also propose sound analysis strategies for identifying dynamic changes in subcellular localization by comparing and contrasting data describing different biological conditions. We conclude by discussing future needs and developments in spatial proteomics data analysis. L.G., C.M.M., and M.F. were supported by the European Union 7th Framework Program (PRIME-XS Project, Grant No. 262067). L.M.B. was supported by a BBSRC Tools and Resources Development Fund (Award No. BB/K00137X/1). T.B. was supported by the Proteomics French Infrastructure (ProFI, ANR-10-INBS-08). A.C. was supported by BBSRC Grant No. BB/D526088/1. A.J.G. was supported by BBSRC Grant No. BB/E024777/ and a generous gift from King Abdullah University for Science and Technology, Saudi Arabia. D.J.N.H. was supported by a BBSRC CASE studentship (BB/I016147/1).
Spatial proteomics defines the content of trafficking vesicles captured by golgin tethers.
Intracellular traffic between compartments of the secretory and endocytic pathways is mediated by vesicle-based carriers. The proteomes of carriers destined for many organelles are ill-defined because the vesicular intermediates are transient, low-abundance and difficult to purify. Here, we combine vesicle relocalisation with organelle proteomics and Bayesian analysis to define the content of different endosome-derived vesicles destined for the trans-Golgi network (TGN). The golgin coiled-coil proteins golgin-97 and GCC88, shown previously to capture endosome-derived vesicles at the TGN, were individually relocalised to mitochondria and the content of the subsequently re-routed vesicles was determined by organelle proteomics. Our findings reveal 45 integral and 51 peripheral membrane proteins re-routed by golgin-97, evidence for a distinct class of vesicles shared by golgin-97 and GCC88, and various cargoes specific to individual golgins. These results illustrate a general strategy for analysing intracellular sub-proteomes by combining acute cellular re-wiring with high-resolution spatial proteomics
Spatiotemporal proteomic profiling of the pro-inflammatory response to lipopolysaccharide in the THP-1 human leukaemia cell line.
Protein localisation and translocation between intracellular compartments underlie almost all physiological processes. The hyperLOPIT proteomics platform combines mass spectrometry with state-of-the-art machine learning to map the subcellular location of thousands of proteins simultaneously. We combine global proteome analysis with hyperLOPIT in a fully Bayesian framework to elucidate spatiotemporal proteomic changes during a lipopolysaccharide (LPS)-induced inflammatory response. We report a highly dynamic proteome in terms of both protein abundance and subcellular localisation, with alterations in the interferon response, endo-lysosomal system, plasma membrane reorganisation and cell migration. Proteins not previously associated with an LPS response were found to relocalise upon stimulation, the functional consequences of which are still unclear. By quantifying proteome-wide uncertainty through Bayesian modelling, a necessary role for protein relocalisation and the importance of taking a holistic overview of the LPS-driven immune response has been revealed. The data are showcased as an interactive application freely available for the scientific community
Learning from Heterogeneous Data Sources: An Application in Spatial Proteomics.
Sub-cellular localisation of proteins is an essential post-translational regulatory mechanism that can be assayed using high-throughput mass spectrometry (MS). These MS-based spatial proteomics experiments enable us to pinpoint the sub-cellular distribution of thousands of proteins in a specific system under controlled conditions. Recent advances in high-throughput MS methods have yielded a plethora of experimental spatial proteomics data for the cell biology community. Yet, there are many third-party data sources, such as immunofluorescence microscopy or protein annotations and sequences, which represent a rich and vast source of complementary information. We present a unique transfer learning classification framework that utilises a nearest-neighbour or support vector machine system, to integrate heterogeneous data sources to considerably improve on the quantity and quality of sub-cellular protein assignment. We demonstrate the utility of our algorithms through evaluation of five experimental datasets, from four different species in conjunction with four different auxiliary data sources to classify proteins to tens of sub-cellular compartments with high generalisation accuracy. We further apply the method to an experiment on pluripotent mouse embryonic stem cells to classify a set of previously unknown proteins, and validate our findings against a recent high resolution map of the mouse stem cell proteome. The methodology is distributed as part of the open-source Bioconductor pRoloc suite for spatial proteomics data analysis. LMB was supported by a BBSRC Tools and Resources Development Fund (Award BB/K00137X/1) and a Wellcome Trust Technology Development Grant (108441/Z/15/Z). LG was supported by the European Union 7th Framework Program (PRIME-XS project, grant agreement number 262067) and a BBSRC Strategic Longer and Larger Award (Award BB/L002817/1). DW and OK acknowledge funding from the European Union (PRIME-XS, GA 262067) and Deutsche Forschungsgemeinschaft (KO-2313/6-1). This is the final version of the article. It first appeared from PLOS via https://doi.org/10.1371/journal.pcbi.100492
Proteome Mapping of a Cyanobacterium Reveals Distinct Compartment Organization and Cell-Dispersed Metabolism.
Cyanobacteria are complex prokaryotes, incorporating a Gram-negative cell wall and internal thylakoid membranes (TMs). However, localization of proteins within cyanobacterial cells is poorly understood. Using subcellular fractionation and quantitative proteomics, we produced an extensive subcellular proteome map of an entire cyanobacterial cell, identifying ∼67% of proteins in Synechocystis sp. PCC 6803, ∼1000 more than previous studies. Assigned to six specific subcellular regions were 1,712 proteins. Proteins involved in energy conversion localized to TMs. The majority of transporters, with the exception of a TM-localized copper importer, resided in the plasma membrane (PM). Most metabolic enzymes were soluble, although numerous pathways terminated in the TM (notably those involved in peptidoglycan monomer, NADP+, heme, lipid, and carotenoid biosynthesis) or PM (specifically, those catalyzing lipopolysaccharide, molybdopterin, FAD, and phylloquinol biosynthesis). We also identified the proteins involved in the TM and PM electron transport chains. The majority of ribosomal proteins and enzymes synthesizing the storage compound polyhydroxybutyrate formed distinct clusters within the data, suggesting similar subcellular distributions to one another, as expected for proteins operating within multicomponent structures. Moreover, heterogeneity within membrane regions was observed, indicating further cellular complexity. Cyanobacterial TM protein localization was conserved in Arabidopsis (Arabidopsis thaliana) chloroplasts, suggesting similar proteome organization in more developed photosynthetic organisms. Successful application of this technique in Synechocystis suggests it could be applied to mapping the proteomes of other cyanobacteria and single-celled organisms. The organization of the cyanobacterial cell revealed here substantially aids our understanding of these environmentally and biotechnologically important organisms.