1,315 research outputs found

    Peptide Identification: Refining a Bayesian Stochastic Model

    Although existing methods of peptide identification each face challenges, new approaches continue to be explored. The complexity, size and computational demands of peptide-based data sets call for further work in this area. By relying on prior information about the average relative abundances of bond cleavages and the prior probability of any specific amino acid sequence, we refine a previously developed Bayesian approach to identifying peptides. The likelihood function is improved by adding further ions to the model, and its size is driven by two overall goodness-of-fit measures. Because the posterior density is complex, a Markov chain Monte Carlo algorithm coupled with simulated annealing is used to simulate candidate choices from the posterior distribution of the peptide sequence, and the peptide with the largest posterior density is taken as the estimate of the true peptide.
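As a rough illustration of the sampling strategy this abstract describes, the following is a minimal sketch of Metropolis sampling with a simulated-annealing cooling schedule over amino acid sequences. The scoring function `toy_log_posterior`, the target string, the cooling schedule and all parameter values are hypothetical stand-ins; the paper's actual likelihood (built from bond-cleavage abundances and ion types) is not reproduced here.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def toy_log_posterior(seq, target):
    """Hypothetical stand-in for a log posterior density.

    It only rewards positions that agree with a hidden target sequence;
    a real model would combine a spectrum likelihood with sequence priors.
    """
    return sum(2.0 if a == b else 0.0 for a, b in zip(seq, target))


def anneal(log_post, length, steps=5000, t_start=2.0, t_end=0.05, seed=0):
    """Metropolis sampling with a geometric cooling schedule.

    Returns the highest-scoring sequence visited and its score.
    """
    rng = random.Random(seed)
    current = [rng.choice(AMINO_ACIDS) for _ in range(length)]
    current_score = log_post(current)
    best, best_score = list(current), current_score
    for step in range(steps):
        # Temperature decays geometrically from t_start to t_end.
        temp = t_start * (t_end / t_start) ** (step / (steps - 1))
        # Symmetric proposal: resample one randomly chosen residue.
        proposal = list(current)
        proposal[rng.randrange(length)] = rng.choice(AMINO_ACIDS)
        proposal_score = log_post(proposal)
        # Accept with probability min(1, exp(delta / temp)).
        delta = proposal_score - current_score
        if delta >= 0 or rng.random() < math.exp(delta / temp):
            current, current_score = proposal, proposal_score
            if current_score > best_score:
                best, best_score = list(current), current_score
    return "".join(best), best_score


target = "PEPTIDE"  # hypothetical "true" peptide for the toy score
best_seq, best_score = anneal(lambda s: toy_log_posterior(s, target), len(target))
print(best_seq, best_score)
```

Tracking the best sequence visited, rather than returning the final state, mirrors the idea of reporting the candidate with the largest posterior density.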

    Isotopic envelope identification by analysis of the spatial distribution of components in MALDI-MSI data

    Mass spectrometry, which provides information about the structure of proteins, is one of the significant steps in the process leading to protein identification. Removing isotope peaks from the mass spectrum is vital, and it is done in a process called deisotoping. Several deisotoping algorithms exist, but each has limitations and is tailored to a particular mass spectrometry method. Data from experiments performed with the MALDI-ToF technique are characterized by high dimensionality. This paper presents a method for identifying isotope envelopes in MALDI-ToF molecular imaging data based on the Mamdani-Assilian fuzzy system and spatial maps of the molecular distribution of peaks included in the isotopic envelope. Several image texture measures were used to evaluate the spatial molecular distribution maps. The algorithm was tested on eight datasets obtained from MALDI-ToF experiments on samples from the National Institute of Oncology in Gliwice, taken from patients with cancer of the head and neck region. The data were subjected to pre-processing and feature extraction. The results were collected and compared with three existing deisotoping algorithms. The analysis showed that the proposed method for identifying isotopic envelopes enables the detection of overlapping envelopes by using an approach oriented to the study of peak pairs. Moreover, the proposed algorithm enables the analysis of large data sets.

    A Comprehensive Analysis of MALDI-TOF Spectrometry Data

    De novo sequencing of proteins by mass spectrometry

    Introduction: Proteins are crucial for every cellular activity, and unraveling their sequence and structure is a key step toward fully understanding their biology. Early methods of protein sequencing were mainly based on enzymatic or chemical degradation of peptide chains. With the completion of the human genome project and the expansion of the information available for each protein, various databases containing this sequence information were formed. Areas covered: De novo protein sequencing, shotgun proteomics and other mass-spectrometric techniques, along with various software tools, are currently available for proteogenomic analysis. Emphasis is placed on the methods for de novo sequencing, together with the potential and shortcomings of using databases for the interpretation of protein sequence data. Expert opinion: As mass-spectrometry sequencing performance improves with better software and hardware optimizations, combined with user-friendly interfaces, de novo protein sequencing becomes imperative in shotgun proteomic studies. Issues regarding unknown or mutated peptide sequences, as well as unexpected post-translational modifications (PTMs) and their identification through false discovery rate searches using the target/decoy strategy, need to be addressed. Ideally, de novo sequencing should become integrated in standard proteomic workflows as an add-on to conventional database search engines, which would then be able to provide improved identification.

    Improving phylogeny reconstruction at the strain level using peptidome datasets

    Typical bacterial strain differentiation methods are often challenged by high genetic similarity between strains. To address this problem, we introduce a novel in silico peptide fingerprinting method based on conventional wet-lab protocols that enables the identification of potential strain-specific peptides. These can be further investigated using in vitro approaches, laying a foundation for the development of biomarker detection and application-specific methods. This novel method aims at reducing large amounts of comparative peptide data to binary matrices while maintaining a high phylogenetic resolution. The underlying case study concerns the Bacillus cereus group, namely the differentiation of Bacillus thuringiensis, Bacillus anthracis and Bacillus cereus strains. Results show that trees based on cytoplasmic and extracellular peptidomes are only marginally in conflict with those based on whole proteomes, as inferred by the established Genome-BLAST Distance Phylogeny (GBDP) method. Hence, these results indicate that the two approaches can most likely be used complementarily even in other organismal groups. The obtained results confirm previous reports about the misclassification of many strains within the B. cereus group. Moreover, our method was able to separate the B. anthracis strains with high resolution, similarly to the GBDP results as benchmarked via Bayesian inference and both Maximum Likelihood and Maximum Parsimony. In addition to the presented phylogenomic applications, whole-peptide fingerprinting might also become a valuable complementary technique to digital DNA-DNA hybridization, notably for bacterial classification at the species and subspecies level in the future.
This research was funded by Grant AGL2013-44039-R from the Spanish “Plan Estatal de I+D+I”, and by Grant EM2014/046 from the “Plan Galego de investigación, innovación e crecemento 2011-2015”. BS was the recipient of a Ramón y Cajal postdoctoral contract from the Spanish Ministry of Economy and Competitiveness. This work was also partially funded by the [14VI05] Contract-Programme from the University of Vigo and the Agrupamento INBIOMED from DXPCTSUG-FEDER unha maneira de facer Europa (2012/273). The research leading to these results has also received funding from the European Union’s Seventh Framework Programme FP7/REGPOT-2012-2013.1 under grant agreement n˚ 316265, BIOCAPS. This document reflects only the authors’ views and the European Union is not liable for any use that may be made of the information contained herein. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
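The reduction of comparative peptide data to binary matrices mentioned in the abstract above can be pictured with a small sketch. The strain names and peptide strings below are invented for illustration, and the Jaccard distance is only one plausible way to compare rows of such a matrix; the abstract does not say which distance or tree-building method the authors used.

```python
def presence_matrix(peptidomes):
    """Build a binary presence/absence matrix from per-strain peptide sets.

    Returns the sorted peptide list (the columns) and a dict mapping each
    strain to its row of 0/1 values.
    """
    peptides = sorted(set().union(*peptidomes.values()))
    matrix = {
        strain: [1 if pep in found else 0 for pep in peptides]
        for strain, found in peptidomes.items()
    }
    return peptides, matrix


def jaccard_distance(row_a, row_b):
    """1 - |intersection| / |union| for two binary rows."""
    both = sum(a & b for a, b in zip(row_a, row_b))
    either = sum(a | b for a, b in zip(row_a, row_b))
    return 0.0 if either == 0 else 1.0 - both / either


def strain_specific(peptidomes):
    """Peptides that occur in exactly one strain, grouped by that strain."""
    result = {}
    for strain, found in peptidomes.items():
        others = set().union(*(p for s, p in peptidomes.items() if s != strain))
        result[strain] = found - others
    return result


# Hypothetical toy data: three strains with made-up peptide identifiers.
peptidomes = {
    "strain_A": {"AAK", "GLR", "MTK"},
    "strain_B": {"AAK", "GLR", "VVR"},
    "strain_C": {"AAK", "WPK"},
}

peptides, matrix = presence_matrix(peptidomes)
print(peptides)
print(matrix)
print(jaccard_distance(matrix["strain_A"], matrix["strain_B"]))
print(strain_specific(peptidomes))
```

A pairwise distance table computed this way could then be handed to any standard tree-building method.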

    Outcome Prediction in Pneumonia Induced ALI/ARDS by Clinical Features and Peptide Patterns of BALF Determined by Mass Spectrometry

    BACKGROUND: Peptide patterns of bronchoalveolar lavage fluid (BALF) were assumed to reflect the complex pathology of acute lung injury (ALI)/acute respiratory distress syndrome (ARDS) better than clinical and inflammatory parameters and may be superior for outcome prediction. METHODOLOGY/PRINCIPAL FINDINGS: A training group of patients suffering from ALI/ARDS was compiled from equal numbers of survivors and nonsurvivors. Clinical history, ventilation parameters, Murray's lung injury severity score (Murray's LISS) and interleukins in BALF were gathered. In addition, samples of bronchoalveolar lavage fluid were analyzed by means of hydrophobic chromatography and MALDI-ToF mass spectrometry (MALDI-ToF MS). Receiver operating characteristic (ROC) analysis for each clinical and cytokine parameter revealed interleukin-6 > interleukin-8 > diabetes mellitus > Murray's LISS as the best outcome predictors. Outcome predicted on the basis of BALF levels of interleukin-6 resulted in 79.4% accuracy, 82.7% sensitivity and 76.1% specificity (area under the ROC curve, AUC, 0.853). Both clinical parameters and cytokines as well as peptide patterns determined by MALDI-ToF MS were analyzed by classification and regression tree (CART) analysis and support vector machine (SVM) algorithms. CART analysis including Murray's LISS, interleukin-6 and interleukin-8 in combination was correct in 78.0% of cases. MALDI-ToF MS of BALF peptides did not reveal a single identifiable biomarker for ARDS. However, classification of patients was successfully achieved based on the entire peptide pattern analyzed using SVM. This method resulted in 90% accuracy, 93.3% sensitivity and 86.7% specificity following a 10-fold cross validation (AUC = 0.953). Subsequent validation of the optimized SVM algorithm with a test group of patients with unknown prognosis yielded 87.5% accuracy, 83.3% sensitivity and 90.0% specificity.
CONCLUSIONS/SIGNIFICANCE: MALDI-ToF MS peptide patterns of BALF, evaluated by appropriate mathematical methods, can be of value in predicting outcome in pneumonia-induced ALI/ARDS.
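For readers who want to see how accuracy, sensitivity and specificity relate to one another, here is a minimal sketch that computes all three from confusion-matrix counts. The abstract does not report group sizes or which outcome was treated as the positive class, so the counts below are hypothetical; they were chosen only because the arithmetic happens to reproduce the percentages quoted for the SVM results.

```python
def classification_metrics(tp, fn, tn, fp):
    """Accuracy, sensitivity and specificity from confusion-matrix counts."""
    total = tp + fn + tn + fp
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
    }


def as_percent(metrics):
    """Round each metric to one decimal place, expressed in percent."""
    return {name: round(100 * value, 1) for name, value in metrics.items()}


# Hypothetical counts: 15 positives and 15 negatives (cross-validation).
cv = as_percent(classification_metrics(tp=14, fn=1, tn=13, fp=2))
# Hypothetical counts: 6 positives and 10 negatives (independent test group).
test_group = as_percent(classification_metrics(tp=5, fn=1, tn=9, fp=1))

print(cv)
print(test_group)
```

Other group sizes could yield the same rounded percentages, so these counts should not be read as the study's actual cohort.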

    A Bayesian framework for statistical signal processing and knowledge discovery in proteomic engineering

    Thesis (Ph. D.)--Harvard-MIT Division of Health Sciences and Technology, February 2006. Includes bibliographical references (leaves 73-85). Proteomics has been revolutionized in the last couple of years through integration of new mass spectrometry technologies such as Surface-Enhanced Laser Desorption/Ionization (SELDI) mass spectrometry. As data is generated in an increasingly rapid and automated manner, novel and application-specific computational methods will be needed to deal with all of this information. This work seeks to develop a Bayesian framework in mass-based proteomics for protein identification. Using the Bayesian framework in a statistical signal processing manner, mass spectrometry data is filtered and analyzed in order to estimate protein identity. This is done by a multi-stage process which compares probabilistic networks generated from mass spectrometry-based data with a mass-based network of protein interactions. In addition, such models can provide insight on features of existing models by identifying relevant proteins. This work finds that the search space of potential proteins can be reduced such that simple antibody-based tests can be used to validate protein identity. This is done with real proteins as a proof of concept. Regarding protein interaction networks, the largest human protein interaction meta-database was created as part of this project, containing over 162,000 interactions. A further contribution is the implementation of the massome network database of mass-based interactions, which is used in the protein identification process. This network is explored in terms of its potential usefulness for protein identification. The framework provides an approach to a number of core issues in proteomics. Besides providing these tools, it yields a novel way to approach statistical signal processing problems in this domain in a way that can be adapted as proteomics-based technologies mature. By Gil Alterovitz. Ph.D.

    Automated peak identification for time-of-flight mass spectroscopy

    The high-throughput capabilities of protein mass fingerprint measurements have made mass spectrometry one of the standard tools for proteomic research, such as biomarker discovery. However, the analysis of the large raw data sets produced by time-of-flight (TOF) spectrometers creates a bottleneck in the discovery process. One specific challenge is the preprocessing and identification of mass peaks corresponding to important biological molecules. The accuracy of mass assignment is another limitation when comparing mass fingerprints with databases. We have developed an automated peak picking algorithm based on a maximum likelihood approach that effectively and efficiently detects peaks in a time-of-flight secondary ion mass spectrum. This approach produces maximum likelihood estimates of peak positions and amplitudes, and simultaneously develops estimates of the uncertainties in each of these quantities. We demonstrate that a Poisson process is involved in time-of-flight secondary ion mass spectrometry (TOF-SIMS), and the algorithm takes the character of the Poisson noise into account. Though this peak picking algorithm was initially developed for TOF-SIMS spectra, it can be extended to other types of TOF spectra provided that the correct noise characteristics are considered. We have developed a peak alignment procedure that aligns peaks in different spectra. This is a crucial step for multivariate analysis, which is often used to distill useful information from complex spectra. We have designed a TOF-SIMS experiment that consists of various mixtures of three biomolecules as a model for more complicated biomarker discovery. The peak picking algorithm is applied to the collected spectra. The algorithm detects peaks in the spectra repeatably and accurately. We also show that there are patterns in the spectra of pure biomolecule samples. Furthermore, we show it is possible to infer the concentration ratios in the mixture samples by checking the strength of the patterns.
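To make the idea of maximum likelihood peak estimation under Poisson noise more concrete, here is a minimal sketch for a single peak. It assumes a Gaussian peak shape of known width, no background, and a grid search over the peak position; none of these choices come from the abstract, and the function names are hypothetical. The amplitude uncertainty is derived from the Fisher information with the position treated as fixed.

```python
import math


def gaussian_shape(x, mu, sigma):
    """Unnormalized Gaussian peak shape with unit height at x = mu."""
    return math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))


def fit_peak(counts, sigma, mu_grid):
    """Profile-likelihood fit of one peak to Poisson-distributed counts.

    Model: counts[i] ~ Poisson(A * shape_i(mu)). For a fixed mu, the
    maximum likelihood amplitude is A = sum(counts) / sum(shape).
    Returns (mu_hat, amp_hat, amp_standard_error).
    """
    total = sum(counts)
    if total == 0:
        raise ValueError("cannot fit a peak to an all-zero spectrum")
    best = None
    for mu in mu_grid:
        shape_sum = sum(gaussian_shape(x, mu, sigma) for x in range(len(counts)))
        amp = total / shape_sum
        # Poisson log-likelihood up to a constant: sum(k * log(lam)) - sum(lam).
        # log(lam_i) is written out analytically to avoid log(0) far from the peak.
        loglik = sum(
            k * (math.log(amp) - (x - mu) ** 2 / (2.0 * sigma ** 2))
            for x, k in enumerate(counts)
        ) - amp * shape_sum
        if best is None or loglik > best[0]:
            best = (loglik, mu, amp, shape_sum)
    _, mu_hat, amp_hat, shape_sum = best
    # Fisher information for A is sum(k) / A^2 = shape_sum / A at the optimum.
    amp_se = math.sqrt(amp_hat / shape_sum)
    return mu_hat, amp_hat, amp_se


# Noise-free synthetic spectrum (rounded expected counts) so the run is deterministic.
counts = [round(100 * gaussian_shape(x, 50, 2.0)) for x in range(100)]
mu_grid = [40 + j / 10 for j in range(201)]
mu_hat, amp_hat, amp_se = fit_peak(counts, sigma=2.0, mu_grid=mu_grid)
print(mu_hat, amp_hat, amp_se)
```

A real spectrum would need a background term, several overlapping peaks and an instrument-specific peak shape, which is where a numerical optimizer would replace the grid search.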

    P4P: a peptidome-based strain-level genome comparison web tool

    Peptidome similarity analysis enables researchers to gain insights into differential peptide profiles, providing a robust tool to discriminate strain-specific peptides, true intra-species differences among biological replicates or even microorganism-phenotype variations. However, no in silico peptide fingerprinting software existed to facilitate such phylogeny inference. Hence, we developed the Peptidomes for Phylogenies (P4P) web tool, which enables the survey of similarities between microbial proteomes and simplifies the process of obtaining new biological insights into their phylogeny. P4P can be used to analyze different peptide datasets, i.e. bacteria, viruses, eukaryotic species or even metaproteomes. Also, it is able to work with whole proteome datasets and experimental mass-to-charge lists originating from mass spectrometers. The ultimate aim is to generate a valid and manageable list of peptides that have phylogenetic signal and are potentially sample-specific. Sample-to-sample comparison is based on a consensus peak set matrix, which can be further submitted to phylogenetic analysis. P4P holds great potential for improving phylogenetic analyses in challenging taxonomic groups, biomarker identification or epidemiologic studies. Notably, P4P can be of interest for applications handling large proteomic datasets, which it is able to reduce to small matrices while maintaining high phylogenetic resolution. The web server is available at http://sing-group.org/p4p.
Funding: Spanish ‘Programa Estatal de Investigación, Desarrollo e Innovación Orientada a los Retos de la Sociedad’ [AGL2013-44039R]; Portuguese Foundation for Science and Technology (FCT) under the scope of the strategic funding of UID/BIO/04469/2013 unit and COMPETE 2020 [POCI-01-0145-FEDER-006684]; INOU16-05 project from the University of Vigo; Fundación AECC. Funding for open access charge: Spanish ‘Programa Estatal de Investigación, Desarrollo e Innovación Orientada a los Retos de la Sociedad’ [AGL2013-44039R].