3,223 research outputs found

    Evaluation of peak-picking algorithms for protein mass spectrometry

    Peak picking is a key early step in MS data analysis. We compare three commonly used approaches to peak picking and discuss their merits by means of statistical analysis. The methods investigated are signal-to-noise ratio, continuous wavelet transform, and a correlation-based approach using a Gaussian template. The functionality of the three methods is illustrated and discussed in a practical context using a mass spectral data set created with MALDI-TOF technology. Sensitivity and specificity are investigated using a manually defined reference set of peaks. As an additional criterion, the robustness of the three methods is assessed by a perturbation analysis and illustrated using ROC curves.
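The signal-to-noise approach and the sensitivity/specificity evaluation mentioned in this abstract can be sketched roughly as follows. This is a minimal illustration and not the paper's implementation: the function names, the median baseline, the MAD noise estimate, the threshold of 3, and the exact-index matching against the reference set are all assumptions made for the example.

```python
import statistics


def pick_peaks_snr(intensities, snr_threshold=3.0):
    """Return indices of local maxima whose signal-to-noise ratio is high enough.

    Assumption: the baseline is the median intensity and the noise level is
    the median absolute deviation (MAD) around that baseline.
    """
    baseline = statistics.median(intensities)
    mad = statistics.median(abs(x - baseline) for x in intensities)
    noise = mad if mad > 0 else 1e-12  # avoid division by zero on flat spectra
    peaks = []
    for i in range(1, len(intensities) - 1):
        is_local_max = intensities[i - 1] < intensities[i] >= intensities[i + 1]
        if is_local_max and (intensities[i] - baseline) / noise >= snr_threshold:
            peaks.append(i)
    return peaks


def sensitivity_specificity(picked, reference, n_points):
    """Score picked peak indices against a manually defined reference set.

    Assumption: every spectrum index is one decision, and a picked peak
    only counts as correct if its index matches a reference index exactly.
    """
    picked, reference = set(picked), set(reference)
    tp = len(picked & reference)
    fn = len(reference - picked)
    fp = len(picked - reference)
    tn = n_points - tp - fn - fp
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Hypothetical toy spectrum: low-level noise with two clear peaks.
spectrum = [1.0, 1.2, 0.9, 1.1, 8.0, 20.0, 7.0, 1.0,
            1.3, 0.8, 1.1, 15.0, 1.0, 0.9, 1.2, 1.0]
found = pick_peaks_snr(spectrum)
print(found, sensitivity_specificity(found, [5, 11], len(spectrum)))
```

Sweeping `snr_threshold` and recording the resulting sensitivity and specificity pairs would give the points of an ROC curve of the kind the abstract refers to.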

    Computational approaches in high-throughput proteomics data analysis

    Proteins are key components in biological systems as they mediate the signaling responsible for information processing in a cell and organism. In biomedical research, one goal is to elucidate the mechanisms of cellular signal transduction pathways to identify possible defects that cause disease. Advancements in technologies such as mass spectrometry and flow cytometry enable the measurement of multiple proteins from a system. Proteomics, or the large-scale study of proteins of a system, thus plays an important role in biomedical research. The analysis of all high-throughput proteomics data requires the use of advanced computational methods. Thus, the combination of bioinformatics and proteomics has become an important part of research on signal transduction pathways. The main objective in this study was to develop and apply computational methods for the preprocessing, analysis and interpretation of high-throughput proteomics data. The methods focused on data from tandem mass spectrometry and single cell flow cytometry, and integration of proteomics data with gene expression microarray data and information from various biological databases. Overall, the methods developed and applied in this study have led to new ways of management and preprocessing of proteomics data. Additionally, the available tools have successfully been used to help interpret biomedical data and to facilitate analysis of data that would have been cumbersome to do without the use of computational methods.
    Proteins play an important role in biological systems because they coordinate various processes in cells and organisms. One goal of biomedical research is to shed light on cellular signaling pathways and on the changes in their function that occur in various diseases, so that such changes could be corrected. Proteomics is the large-scale study of the proteins of a cell, tissue, or organism. Proteomics methods such as mass spectrometry and flow cytometry are central methods in biomedical research and can measure several proteins from a sample simultaneously. Today's advanced proteomics measurement technologies produce large data sets and require computational methods for their analysis. Bioinformatics methods have therefore become an important part of proteomics analysis and signaling pathway research. The main objective of this study was to develop and apply efficient computational methods for the preprocessing, analysis, and interpretation of large-scale proteomics data. In this study, a preprocessing method was developed for mass spectrometry data and an automated analysis method for flow cytometry data. Protein-level information was combined with measurements of gene transcription levels and with existing information extracted from biological databases. The dissertation shows that computational methods play a central role in the management, preprocessing, and analysis of proteomics data. The analysis methods developed in the study considerably advance the broader use and understanding of biomedical information.

    A metaproteomic approach to study human-microbial ecosystems at the mucosal luminal interface

    Aberrant interactions between the host and the intestinal bacteria are thought to contribute to the pathogenesis of many digestive diseases. However, studying the complex ecosystem at the human mucosal-luminal interface (MLI) is challenging and requires an integrative systems biology approach. Therefore, we developed a novel method integrating lavage sampling of the human mucosal surface, high-throughput proteomics, and a unique suite of bioinformatic and statistical analyses. Shotgun proteomic analysis of secreted proteins recovered from the MLI confirmed the presence of both human and bacterial components. To profile the MLI metaproteome, we collected 205 mucosal lavage samples from 38 healthy subjects, and subjected them to high-throughput proteomics. The spectral data were subjected to a rigorous data processing pipeline to optimize suitability for quantitation and analysis, and then were evaluated using a set of biostatistical tools. Compared to the mucosal transcriptome, the MLI metaproteome was enriched for extracellular proteins involved in response to stimulus and immune system processes. Analysis of the metaproteome revealed significant individual-related as well as anatomic region-related (biogeographic) features. Quantitative shotgun proteomics established the identity and confirmed the biogeographic association of 49 proteins (including 3 functional protein networks) demarcating the proximal and distal colon. This robust and integrated proteomic approach is thus effective for identifying functional features of the human mucosal ecosystem, and for gaining a fresh understanding of the basic biology and disease processes at the MLI. © 2011 Li et al.

    Updates in metabolomics tools and resources: 2014-2015

    Data processing and interpretation represent the most challenging and time-consuming steps in high-throughput metabolomic experiments, regardless of the analytical platforms (MS or NMR spectroscopy based) used for data acquisition. Improved machinery in metabolomics generates increasingly complex datasets that create the need for more and better processing and analysis software and in silico approaches to understand the resulting data. However, a comprehensive source of information describing the utility of the most recently developed and released metabolomics resources (in the form of tools, software, and databases) is currently lacking. Thus, here we provide an overview of freely available, open-source tools, algorithms, and frameworks to make both upcoming and established metabolomics researchers aware of the recent developments in an attempt to advance and facilitate data processing workflows in their metabolomics research. The major topics include tools and resources for data processing, data annotation, and data visualization in MS and NMR-based metabolomics. Most tools described in this review are dedicated to untargeted metabolomics workflows; however, some more specialist tools are described as well. All tools and resources described, including their analytical and computational platform dependencies, are summarized in an overview table.

    Sparse Proteomics Analysis - A compressed sensing-based approach for feature selection and classification of high-dimensional proteomics mass spectrometry data

    Background: High-throughput proteomics techniques, such as mass spectrometry (MS)-based approaches, produce very high-dimensional data-sets. In a clinical setting one is often interested in how mass spectra differ between patients of different classes, for example spectra from healthy patients vs. spectra from patients having a particular disease. Machine learning algorithms are needed to (a) identify these discriminating features and (b) classify unknown spectra based on this feature set. Since the acquired data is usually noisy, the algorithms should be robust against noise and outliers, while the identified feature set should be as small as possible. Results: We present a new algorithm, Sparse Proteomics Analysis (SPA), based on the theory of compressed sensing that allows us to identify a minimal discriminating set of features from mass spectrometry data-sets. We show (1) how our method performs on artificial and real-world data-sets, (2) that its performance is competitive with standard (and widely used) algorithms for analyzing proteomics data, and (3) that it is robust against random and systematic noise. We further demonstrate the applicability of our algorithm to two previously published clinical data-sets
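The two tasks named in this abstract, (a) finding a small discriminating feature set and (b) classifying unknown spectra with it, can be illustrated with a much simpler stand-in. The sketch below is not the SPA algorithm and does not use compressed sensing: it soft-thresholds the per-feature difference of class means so that most weights become exactly zero. All function names, the `lam` parameter, and the toy data are assumptions made for the example.

```python
def soft_threshold(value, lam):
    """Shrink a value towards zero by lam; values within [-lam, lam] become 0."""
    if value > lam:
        return value - lam
    if value < -lam:
        return value + lam
    return 0.0


def fit_sparse_classifier(spectra, labels, lam):
    """Fit sparse feature weights from labelled spectra (labels are +1 or -1).

    Each weight is the soft-thresholded difference of the two class means,
    so weakly discriminating features get a weight of exactly zero.
    """
    pos = [s for s, y in zip(spectra, labels) if y == 1]
    neg = [s for s, y in zip(spectra, labels) if y == -1]
    weights, midpoints = [], []
    for j in range(len(spectra[0])):
        mean_pos = sum(s[j] for s in pos) / len(pos)
        mean_neg = sum(s[j] for s in neg) / len(neg)
        weights.append(soft_threshold(mean_pos - mean_neg, lam))
        midpoints.append((mean_pos + mean_neg) / 2)
    return weights, midpoints


def selected_features(weights):
    """Indices of features that survived the shrinkage."""
    return [j for j, w in enumerate(weights) if w != 0.0]


def classify(spectrum, weights, midpoints):
    """Assign +1 or -1 from the sign of the sparse linear score."""
    score = sum(w * (x - m) for w, x, m in zip(weights, spectrum, midpoints))
    return 1 if score >= 0 else -1


# Hypothetical toy data: only feature 0 separates the two classes.
spectra = [[5.0, 1.0, 1.1, 0.9], [5.2, 1.1, 0.9, 1.0],
           [1.0, 1.0, 1.0, 1.1], [0.8, 0.9, 1.2, 1.0]]
labels = [1, 1, -1, -1]
weights, midpoints = fit_sparse_classifier(spectra, labels, lam=0.5)
print(selected_features(weights), classify([4.8, 1.0, 1.0, 1.0], weights, midpoints))
```

A larger `lam` removes more features, which mirrors the trade-off the abstract describes between a small feature set and robustness to noise.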

    An Interpretable Deep Learning Approach for Biomarker Detection in LC-MS Proteomics Data

    Analyzing mass spectrometry-based proteomics data with deep learning (DL) approaches poses several challenges due to the high dimensionality, low sample size, and high level of noise. Additionally, the lack of interpretable explanations often hinders the integration of DL-based workflows into medical settings. We present DLearnMS, a DL biomarker detection framework, to address these challenges on proteomics instances of liquid chromatography-mass spectrometry (LC-MS), a well-established tool for quantifying complex protein mixtures. Our DLearnMS framework learns the clinical state of LC-MS data instances using convolutional neural networks. Based on the trained neural networks, we show how biomarkers can be identified using layer-wise relevance propagation. This enables detecting discriminating regions of the data and the design of more robust networks. One of the main advantages over other established methods is that no explicit preprocessing step is needed in our DLearnMS framework. Our evaluation shows that DLearnMS outperforms conventional LC-MS biomarker detection approaches in identifying fewer false positive peaks while maintaining a comparable number of true positive peaks. Code availability: The code is available from the following GIT repository: https://github.com/SaharIravani/DlearnM
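Layer-wise relevance propagation, which this abstract uses to locate biomarkers, can be shown on a very small example. The sketch below applies the common epsilon rule to a hypothetical two-layer dense ReLU network with zero biases. It is not the DLearnMS code, which uses convolutional networks, and every name, weight, and input value here is made up for illustration.

```python
def dense_forward(inputs, weights, biases):
    """Dense layer pre-activations; weights[j][k] links input j to output k."""
    return [
        sum(inputs[j] * weights[j][k] for j in range(len(inputs))) + biases[k]
        for k in range(len(biases))
    ]


def relu(values):
    return [max(0.0, v) for v in values]


def lrp_epsilon(inputs, weights, biases, relevance_out, eps=1e-9):
    """Redistribute output relevance onto the inputs of one dense layer.

    Epsilon rule: input j receives the share a_j * w_jk / (z_k +/- eps)
    of the relevance assigned to output k.
    """
    z = dense_forward(inputs, weights, biases)
    relevance_in = [0.0] * len(inputs)
    for k, (z_k, r_k) in enumerate(zip(z, relevance_out)):
        denom = z_k + (eps if z_k >= 0 else -eps)  # stabilised denominator
        for j, a_j in enumerate(inputs):
            relevance_in[j] += a_j * weights[j][k] / denom * r_k
    return relevance_in


def explain(x, w1, b1, w2, b2):
    """Forward pass, then propagate the output score back to the inputs."""
    hidden = relu(dense_forward(x, w1, b1))
    output = dense_forward(hidden, w2, b2)
    r_hidden = lrp_epsilon(hidden, w2, b2, output)
    r_input = lrp_epsilon(x, w1, b1, r_hidden)
    return output, r_input


# Hypothetical network: 3 input intensities, 2 hidden units, 1 output score.
x = [1.0, 2.0, 0.0]
w1 = [[1.0, -2.0], [0.5, 0.5], [2.0, 2.0]]
w2 = [[1.5], [-1.0]]
output, relevance = explain(x, w1, [0.0, 0.0], w2, [0.0])
print(output, relevance)
```

With zero biases the input relevances sum to approximately the output score, so each input's share can be read as its contribution to the prediction. Inputs with high relevance would correspond to the discriminating regions the abstract mentions.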

    Apex Peptide Elution Chain Selection: A New Strategy for Selecting Precursors in 2D-LC-MALDI-TOF/TOF Experiments on Complex Biological Samples

    LC-MALDI provides an often overlooked opportunity to exploit the separation between LC-MS and MS/MS stages of a 2D-LC-MS-based proteomics experiment, that is, by making a smarter selection for precursor fragmentation. Apex Peptide Elution Chain Selection (APECS) is a simple and powerful method for intensity-based peptide selection in a complex sample separated by 2D-LC, using a MALDI-TOF/TOF instrument. It removes the peptide redundancy present in the adjacent first-dimension (typically strong cation exchange, SCX) fractions by constructing peptide elution profiles that link the precursor ions of the same peptide across SCX fractions. Subsequently, the precursor ion most likely to fragment successfully in a given profile is selected for fragmentation analysis, selecting on precursor intensity and absence of adjacent ions that may cofragment. To make the method independent of experiment-specific tolerance criteria, we introduce the concept of the branching factor, which measures the likelihood of false clustering of precursor ions based on past experiments. By validation with a complex proteome sample of Arabidopsis thaliana, APECS identified an equivalent number of peptides as a conventional data-dependent acquisition method but with a 35% smaller work load. Consequently, reduced sample depletion allowed further selection of lower signal-to-noise ratio precursor ions, leading to a larger number of identified unique peptides.
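The core idea described here, linking precursor ions of the same peptide across adjacent SCX fractions and fragmenting only the most intense member of each chain, can be sketched as follows. This is a simplified illustration and not the published APECS method: it links ions greedily with a fixed m/z tolerance (which the branching factor in the abstract is meant to avoid), and it ignores LC retention time and the check for adjacent cofragmenting ions. The tuple layout, function names, and `mz_tol` value are assumptions.

```python
def build_elution_chains(ions, mz_tol=0.05):
    """Group ions into chains spanning consecutive SCX fractions.

    ions: list of (fraction, mz, intensity) tuples.
    An ion joins the first chain whose last member sits in the previous
    fraction and lies within mz_tol; otherwise it starts a new chain.
    """
    chains = []
    for ion in sorted(ions):  # sorted by fraction, then m/z
        fraction, mz, _ = ion
        for chain in chains:
            last_fraction, last_mz, _ = chain[-1]
            if fraction - last_fraction == 1 and abs(mz - last_mz) <= mz_tol:
                chain.append(ion)
                break
        else:
            chains.append([ion])
    return chains


def select_apex_precursors(chains):
    """Pick the most intense ion of each chain as the one to fragment."""
    return [max(chain, key=lambda ion: ion[2]) for chain in chains]


# Hypothetical precursor list: one peptide eluting over fractions 1 to 3,
# plus two ions of similar m/z in non-adjacent fractions.
ions = [
    (1, 1000.50, 200.0),
    (2, 1000.52, 900.0),
    (3, 1000.51, 400.0),
    (2, 1500.75, 300.0),
    (4, 1500.76, 100.0),
]
chains = build_elution_chains(ions)
for apex in select_apex_precursors(chains):
    print(apex)
```

In this toy case five candidate precursors reduce to three fragmentation targets, which is the kind of workload reduction the abstract reports.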
