34 research outputs found

    Integrative Analysis Frameworks for Improved Peptide and Protein Identifications from Tandem Mass Spectrometry Data.

    Tandem mass spectrometry (MS/MS) followed by database search is the method of choice for high-throughput protein identification in modern proteomic studies. Database searching methods employ spectral matching algorithms and statistical models to identify and quantify proteins in a sample. The major focus of these statistical methods is to assign probability scores to the identifications to distinguish between high-confidence, reliable identifications that may be accepted (typically corresponding to a false discovery rate, FDR, of 1% or 5%) and lower-confidence, spurious identifications that are rejected. These identification probabilities are determined, in general, considering only evidence from the MS/MS data. However, considering the wealth of external (orthogonal) data available for most biological systems, integrating such orthogonal information into proteomics analysis pipelines can be a promising approach to improve the sensitivity of these pipelines and rescue true positive identifications that were rejected for want of sufficient evidence supporting their presence. In this dissertation, approaches based on naive Bayes rescoring, search space restriction, and a hybrid approach that combines both are described for integrating orthogonal information in proteomic analysis pipelines. These methods have been applied for integrating transcript abundance data from RNA-seq and identification frequency data from the Global Proteome Machine database, GPMDB (one of the largest repositories of proteomic experiment results), into analysis pipelines, improving the number of peptide and protein identifications from MS/MS data. Further, estimation of false discovery rates in very large proteomic datasets was also investigated. In very large datasets, usually resulting from integrating data from multiple experiments, some assumptions used in typical target-decoy-based FDR estimation in smaller datasets no longer hold true, resulting in artificially inflated error rates. Alternative approaches that would allow accurate FDR estimation in these large-scale datasets have been described and benchmarked.
    PhD, Bioinformatics, University of Michigan, Horace H. Rackham School of Graduate Studies
    http://deepblue.lib.umich.edu/bitstream/2027.42/116717/1/avinashs_1.pd
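The abstract above refers to typical target-decoy-based FDR estimation. As a rough illustration only, the conventional estimate (decoy hits divided by target hits above a score threshold) might be sketched as follows. The function names, the `(score, is_decoy)` input layout, and the toy scores are assumptions made for this example; this is not the dissertation's implementation and it does not include the alternative large-dataset approaches the abstract mentions.

```python
def target_decoy_fdr(psms, threshold):
    """Estimate FDR at a score threshold as (#decoys / #targets) at or above it.

    psms is a list of (score, is_decoy) pairs; this layout is a hypothetical
    choice for the sketch, not a format taken from the source.
    """
    targets = sum(1 for score, is_decoy in psms if score >= threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy in psms if score >= threshold and is_decoy)
    if targets == 0:
        return 0.0
    return min(1.0, decoys / targets)


def score_threshold_for_fdr(psms, max_fdr=0.01):
    """Return the lowest observed score whose estimated FDR is <= max_fdr.

    Returns None if no threshold satisfies the requested FDR.
    """
    best = None
    for threshold in sorted({score for score, _ in psms}, reverse=True):
        if target_decoy_fdr(psms, threshold) <= max_fdr:
            best = threshold
    return best


# Toy data: four target hits and one decoy hit (illustrative values only).
example_psms = [(10, False), (9, False), (8, False), (7, True), (6, False)]
strict_threshold = score_threshold_for_fdr(example_psms, max_fdr=0.01)
```

With the toy data, accepting everything scoring 8 or higher keeps three target hits and no decoys, so that is the threshold selected at a 1% FDR.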

    Highlights from the ISCB Student Council Symposium 2013

    This report summarizes the scientific content and activities of the annual symposium organized by the Student Council of the International Society for Computational Biology (ISCB), held in conjunction with the Intelligent Systems for Molecular Biology (ISMB) / European Conference on Computational Biology (ECCB) conference in Berlin, Germany, on July 19, 2013.

    Effective Leveraging of Targeted Search Spaces for Improving Peptide Identification in Tandem Mass Spectrometry Based Proteomics

    In shotgun proteomics, peptides are typically identified using database searching, which involves scoring acquired tandem mass spectra against peptides derived from standard protein sequence databases such as UniProt, RefSeq, or Ensembl. In this strategy, the sensitivity of peptide identification is known to be affected by the size of the search space. Therefore, creating a targeted sequence database containing only peptides likely to be present in the analyzed sample can be a useful technique for improving the sensitivity of peptide identification. In this study, we describe how targeted peptide databases can be created based on the frequency of identification in the Global Proteome Machine Database (GPMDB), the largest publicly available repository of peptide and protein identification data. We demonstrate that targeted peptide databases can be easily integrated into existing proteome analysis workflows and describe a computational strategy for minimizing any loss of peptide identifications arising from potential search space incompleteness in the targeted search spaces. We demonstrate the performance of our workflow using several data sets of varying size and sample complexity.
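The idea of restricting the search space by identification frequency can be pictured with a very small sketch. Everything here is an assumption for illustration: the `observation_counts` mapping, the `min_observations` cutoff, and the peptide sequences are hypothetical, and the abstract does not state how the authors chose their cutoff or how they compensated for search space incompleteness.

```python
def build_targeted_database(observation_counts, min_observations=5):
    """Keep only peptides observed at least min_observations times.

    observation_counts maps peptide sequence -> number of prior
    identifications (a hypothetical stand-in for GPMDB frequency data).
    """
    return {
        peptide
        for peptide, count in observation_counts.items()
        if count >= min_observations
    }


# Hypothetical counts, not real GPMDB values.
counts = {"LVNELTEFAK": 1200, "AEFVEVTK": 48, "QTALVELLK": 3, "YLYEIAR": 0}
targeted = build_targeted_database(counts, min_observations=5)
```

A smaller search space of this kind trades completeness for sensitivity, which is why the abstract pairs it with a strategy for recovering identifications that the targeted database would miss.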

    Utility of RNA-seq and GPMDB Protein Observation Frequency for Improving the Sensitivity of Protein Identification by Tandem MS

    Tandem mass spectrometry (MS/MS) followed by database search is the method of choice for protein identification in proteomic studies. Database searching methods employ spectral matching algorithms and statistical models to identify and quantify proteins in a sample. In general, these methods do not utilize any information other than spectral data for protein identification. However, considering the wealth of external data available for many biological systems, analysis methods can incorporate such information to improve the sensitivity of protein identification. In this study, we present a method to utilize Global Proteome Machine Database identification frequencies and RNA-seq transcript abundances to adjust the confidence scores of protein identifications. The method described is particularly useful for samples with low-to-moderate proteome coverage (i.e., <2000–3000 proteins), where we observe up to an 8% improvement in the number of proteins identified at a 1% false discovery rate.
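One generic way to adjust a confidence score with independent external evidence is a naive Bayes update on the odds scale, sketched below. This is an illustration under stated assumptions, not the published method: the function name `rescore`, the use of precomputed likelihood ratios for the RNA-seq and GPMDB evidence, and the numeric values are all hypothetical.

```python
def rescore(p_msms, likelihood_ratios):
    """Combine an MS/MS-based probability with independent evidence.

    p_msms is the probability that an identification is correct based on
    spectral evidence alone. Each likelihood ratio is
    P(evidence | correct) / P(evidence | incorrect) for one external data
    source, assumed conditionally independent (the naive Bayes assumption).
    """
    if p_msms <= 0.0:
        return 0.0
    if p_msms >= 1.0:
        return 1.0
    odds = p_msms / (1.0 - p_msms)
    for ratio in likelihood_ratios:
        odds *= ratio
    return odds / (1.0 + odds)


# Hypothetical example: a borderline protein (p = 0.5) whose transcript is
# abundant (ratio 2.0) and which is frequently observed in a repository
# (ratio 2.0).
adjusted = rescore(0.5, [2.0, 2.0])
```

Ratios above 1 raise the score and ratios below 1 lower it, so supporting evidence can rescue borderline identifications while contrary evidence penalizes them.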

    Highlights from the 1st ISCB Latin American Student Council Symposium 2014

    This report summarizes the scientific content and activities of the first edition of the Latin American Symposium organized by the Student Council of the International Society for Computational Biology (ISCB), held in conjunction with the Third Latin American conference from the International Society for Computational Biology (ISCB-LA 2014) in Belo Horizonte, Brazil, on October 27, 2014.
    Affiliation: Parra, Rodrigo Gonzalo. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Ciudad Universitaria. Instituto de Química Biológica de la Facultad de Ciencias Exactas y Naturales. Universidad de Buenos Aires. Facultad de Ciencias Exactas y Naturales. Instituto de Química Biológica de la Facultad de Ciencias Exactas y Naturales; Argentina
    Affiliation: Simonetti, Franco Lucio. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Parque Centenario. Instituto de Investigaciones Bioquímicas de Buenos Aires. Fundación Instituto Leloir. Instituto de Investigaciones Bioquímicas de Buenos Aires; Argentina
    Affiliation: Hasenahuer, Marcia Anahí. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - La Plata. Instituto Multidisciplinario de Biología Celular. Grupo Vinculado al IMBICE - Grupo de Biología Estructural y Biotecnología-Universidad Nacional de Quilmes - GBEyB | Provincia de Buenos Aires. Gobernación. Comisión de Investigaciones Científicas. Instituto Multidisciplinario de Biología Celular. Grupo Vinculado al IMBICE - Grupo de Biología Estructural y Biotecnología-Universidad Nacional de Quilmes - GBEyB | Universidad Nacional de la Plata. Instituto Multidisciplinario de Biología Celular. Grupo Vinculado al IMBICE - Grupo de Biología Estructural y Biotecnología-Universidad Nacional de Quilmes - GBEyB; Argentina
    Affiliation: Olguin-Orellana, Gabriel J. Pontificia Universidad Católica de Chile; Chile
    Affiliation: Shanmugam, Avinash K. University of Michigan; United States