75,277 research outputs found
The age of data-driven proteomics : how machine learning enables novel workflows
A lot of energy in the field of proteomics is dedicated to the application of challenging experimental workflows, which include metaproteomics, proteogenomics, data independent acquisition (DIA), non-specific proteolysis, immunopeptidomics, and open modification searches. These workflows are all challenging because of ambiguity in the identification stage; they either expand the search space and thus increase the ambiguity of identifications, or, in the case of DIA, they generate data that is inherently more ambiguous. In this context, machine learning-based predictive models are now generating considerable excitement in the field of proteomics because these predictive models hold great potential to drastically reduce the ambiguity in the identification process of the above-mentioned workflows. Indeed, the field has already produced classical machine learning and deep learning models to predict almost every aspect of a liquid chromatography-mass spectrometry (LC-MS) experiment. Yet despite all the excitement, thorough integration of predictive models in these challenging LC-MS workflows is still limited, and further improvements to the modeling and validation procedures can still be made. In this viewpoint we therefore point out highly promising recent machine learning developments in proteomics, alongside some of the remaining challenges
Current challenges in software solutions for mass spectrometry-based quantitative proteomics
This work was in part supported by the PRIME-XS project, grant agreement number 262067, funded by the European Union seventh Framework Programme; The Netherlands Proteomics Centre, embedded in The Netherlands Genomics Initiative; The Netherlands Bioinformatics Centre; and the Centre for Biomedical Genetics (to S.C., B.B. and A.J.R.H); by NIH grants NCRR RR001614 and RR019934 (to the UCSF Mass Spectrometry Facility, director: A.L. Burlingame, P.B.); and by grants from the MRC, CR-UK, BBSRC and Barts and the London Charity (to P.C.
Streaming visualisation of quantitative mass spectrometry data based on a novel raw signal decomposition method
As data rates rise, there is a danger that informatics for high-throughput LC-MS becomes more opaque and inaccessible to practitioners. It is therefore critical that efficient visualisation tools are available to facilitate quality control, verification, validation, interpretation, and sharing of raw MS data and the results of MS analyses. Currently, MS data is stored as contiguous spectra. Recall of individual spectra is quick, but panoramas, zooming and panning across whole datasets incur processing and memory overheads that are impractical for interactive use. Moreover, visualisation is challenging if significant quantification data is missing due to data-dependent acquisition of MS/MS spectra. To tackle these issues, we leverage our seaMass technique for novel signal decomposition. LC-MS data is modelled as a 2D surface through selection of a sparse set of weighted B-spline basis functions from an over-complete dictionary. By ordering and spatially partitioning the weights with an R-tree data model, efficient streaming visualisations are achieved. In this paper, we describe the core MS1 visualisation engine and overlay of MS/MS annotations. This enables the mass spectrometrist to quickly inspect whole runs for ionisation/chromatographic issues, MS/MS precursors for coverage problems, or putative biomarkers for interferences, for example. The open-source software is available from http://seamass.net/viz/
Quantitative plant proteomics using hydroponic isotope labeling of entire plants (HILEP)
Gene induction during differentiation of human monocytes into dendritic cells: an integrated study at the RNA and protein levels
Changes in gene expression occurring during differentiation of human monocytes into dendritic cells were studied at the RNA and protein levels. These studies showed the induction of several gene classes corresponding to various biological functions. These functions encompass antigen processing and presentation, cytoskeleton, cell signalling and signal transduction, but also an increase in mitochondrial function and in the protein synthesis machinery, including some, but not all, chaperones. These changes put in perspective the events occurring during this differentiation process. On a more technical point, it appears that the studies carried out at the RNA and protein levels are highly complementary. Comment: website publisher: http://www.springerlink.com/content/ha0d2c351qhjhjdm
The Proteomics of N-terminal Methionine Cleavage
Methionine aminopeptidase (MAP) is a ubiquitous, essential enzyme involved in protein N-terminal methionine excision. According to the generally accepted cleavage rules for MAP, this enzyme cleaves all proteins with small side chains on the residue in the second position (P1′), but many exceptions are known. The substrate specificity of Escherichia coli MAP1 was studied in vitro with a large (>120) coherent array of peptides mimicking the natural substrates and kinetically analyzed in detail. Peptides with Val or Thr at P1′ were much less efficiently cleaved than those with Ala, Cys, Gly, Pro, or Ser in this position. Certain residues at P2′, P3′, and P4′ strongly slowed the reaction, and some proteins with Val and Thr at P1′ could not undergo Met cleavage. These in vitro data were fully consistent with data for 862 E. coli proteins with known N-terminal sequences in vivo. The specificity sites were found to be identical to those for the other type of MAPs, MAP2s, and a dedicated prediction tool for Met cleavage is now available. Taking into account the rules of MAP cleavage and leader peptide removal, the N termini of all proteins were predicted from the annotated genome and compared with data obtained in vivo. This analysis showed that proteins displaying N-Met cleavage are overrepresented in vivo. We conclude that protein secretion involving leader peptide cleavage is more frequent than generally thought
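The P1′ rule summarised in this abstract can be illustrated with a minimal Python sketch. It is not the authors' prediction tool: the function name, the three output labels, and the residue grouping are assumptions drawn only from the residues the abstract names, and the slowing effects of P2′ to P4′ residues are deliberately ignored.

```python
# Hypothetical illustration of the P1' rule stated in the abstract.
# Residues the abstract reports as efficiently cleaved at P1' (one-letter codes).
EFFICIENT_P1 = frozenset("ACGPS")   # Ala, Cys, Gly, Pro, Ser
# Residues the abstract reports as much less efficiently cleaved at P1'.
INEFFICIENT_P1 = frozenset("TV")    # Thr, Val


def classify_met_cleavage(sequence: str) -> str:
    """Return a coarse label for N-terminal Met excision based on P1' only.

    Assumption: the labels are illustrative categories, not kinetic values.
    Effects of P2', P3' and P4' residues are not modelled.
    """
    seq = sequence.strip().upper()
    # The rule only applies to sequences that start with Met and have a P1' residue.
    if len(seq) < 2 or seq[0] != "M":
        return "not applicable"
    p1_prime = seq[1]
    if p1_prime in EFFICIENT_P1:
        return "likely cleaved"
    if p1_prime in INEFFICIENT_P1:
        return "possibly cleaved"
    return "likely retained"
```

For example, `classify_met_cleavage("MAKT")` falls in the efficient group because Ala sits at P1′, whereas a sequence starting `MV` is only flagged as possibly cleaved, reflecting the exceptions the abstract reports for Val and Thr.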
Bacterial riboproteogenomics : the era of N-terminal proteoform existence revealed
With the rapid increase in the number of sequenced prokaryotic genomes, relying on automated gene annotation became a necessity. Multiple lines of evidence, however, suggest that current bacterial genome annotations may contain inconsistencies and are incomplete, even for so-called well-annotated genomes. We here discuss underexplored sources of protein diversity and new methodologies for high-throughput genome re-annotation. The expression of multiple molecular forms of proteins (proteoforms) from a single gene, particularly driven by alternative translation initiation, is gaining interest as a prominent contributor to bacterial protein diversity. In consequence, riboproteogenomic pipelines were proposed to comprehensively capture proteoform expression in prokaryotes by the complementary use of (positional) proteomics and the direct readout of translated genomic regions using ribosome profiling. To complement these discoveries, tailored strategies are required for the functional characterization of newly discovered bacterial proteoforms
A metaproteomic approach to study human-microbial ecosystems at the mucosal luminal interface
Aberrant interactions between the host and the intestinal bacteria are thought to contribute to the pathogenesis of many digestive diseases. However, studying the complex ecosystem at the human mucosal-luminal interface (MLI) is challenging and requires an integrative systems biology approach. Therefore, we developed a novel method integrating lavage sampling of the human mucosal surface, high-throughput proteomics, and a unique suite of bioinformatic and statistical analyses. Shotgun proteomic analysis of secreted proteins recovered from the MLI confirmed the presence of both human and bacterial components. To profile the MLI metaproteome, we collected 205 mucosal lavage samples from 38 healthy subjects, and subjected them to high-throughput proteomics. The spectral data were subjected to a rigorous data processing pipeline to optimize suitability for quantitation and analysis, and then were evaluated using a set of biostatistical tools. Compared to the mucosal transcriptome, the MLI metaproteome was enriched for extracellular proteins involved in response to stimulus and immune system processes. Analysis of the metaproteome revealed significant individual-related as well as anatomic region-related (biogeographic) features. Quantitative shotgun proteomics established the identity and confirmed the biogeographic association of 49 proteins (including 3 functional protein networks) demarcating the proximal and distal colon. This robust and integrated proteomic approach is thus effective for identifying functional features of the human mucosal ecosystem, and a fresh understanding of the basic biology and disease processes at the MLI. © 2011 Li et al
Peptide mass fingerprinting using field-programmable gate arrays
The reconfigurable computing paradigm, which exploits the flexibility and versatility of field-programmable gate arrays (FPGAs), has emerged as a powerful solution for speeding up time-critical algorithms. This paper describes a reconfigurable computing solution for processing raw mass spectrometric data generated by MALDI-TOF instruments. The hardware-implemented algorithms for denoising, baseline correction, peak identification, and deisotoping, running on a Xilinx Virtex-2 FPGA at 180 MHz, generate a mass fingerprint over 100 times faster than an equivalent algorithm written in C running on a dual 3-GHz Xeon server. The results obtained using the FPGA implementation are virtually identical to those generated by the commercial software package MassLynx
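The four processing stages named in this abstract can be sketched in software as follows. The abstract does not say which algorithms the FPGA design uses, so every choice here is an assumption made for illustration: a moving average for denoising, a rolling-minimum baseline, local-maximum peak picking, and deisotoping that assumes singly charged ions with isotope peaks about 1.00335 Da apart.

```python
def _window(values, i, size):
    """Return the slice of `values` centred on index i, clipped at the edges."""
    half = size // 2
    return values[max(0, i - half):min(len(values), i + half + 1)]


def moving_average(values, window=3):
    """Denoise by replacing each point with the mean of its neighbourhood."""
    return [sum(w) / len(w) for w in (_window(values, i, window) for i in range(len(values)))]


def subtract_baseline(values, window=5):
    """Estimate the baseline as a rolling minimum and subtract it.

    Assumption: `window` is chosen wider than any real peak.
    """
    return [v - min(_window(values, i, window)) for i, v in enumerate(values)]


def pick_peaks(mz, intensity, threshold):
    """Report local maxima whose intensity reaches `threshold`."""
    peaks = []
    for i in range(1, len(intensity) - 1):
        is_max = intensity[i] > intensity[i - 1] and intensity[i] >= intensity[i + 1]
        if is_max and intensity[i] >= threshold:
            peaks.append((mz[i], intensity[i]))
    return peaks


def deisotope(peaks, spacing=1.00335, tol=0.02):
    """Keep only the first (monoisotopic) peak of each isotope envelope.

    Assumption: ions are singly charged, so successive isotope peaks are
    `spacing` Da apart within `tol`.
    """
    mono, last_mz = [], None
    for mz, inten in sorted(peaks):
        if last_mz is not None and abs(mz - last_mz - spacing) <= tol:
            last_mz = mz  # continuation of the current envelope; skip it
            continue
        mono.append((mz, inten))
        last_mz = mz
    return mono
```

Chaining the functions in the order the abstract lists them (denoise, correct the baseline, pick peaks, deisotope) yields a list of monoisotopic (m/z, intensity) pairs that could serve as a mass fingerprint.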