41 research outputs found

    Bioinformatics Session

    Conference communications

    Validation methods for large-scale protein identifications and development and implementation of standards in proteomics

    Unpublished doctoral thesis. Universidad Autónoma de Madrid, Facultad de Ciencias, Departamento de Biología Molecular. Date of defense: 12-10-2013
    High-throughput identification of peptides in databases from tandem mass spectrometry data is a key technique in modern proteomics. Common approaches to interpreting large-scale peptide identification results rely on the statistical analysis of average score distributions, which are constructed from the set of best scores that search engines such as SEQUEST produce for large collections of MS/MS spectra. Other approaches calculate individual peptide identification probabilities on the basis of theoretical models or from single-spectrum score distributions constructed from the set of scores produced by each MS/MS spectrum. In this work, we study the mathematical properties of average SEQUEST score distributions by introducing the concept of spectrum quality and expressing these average distributions as compositions of single-spectrum distributions. Our analysis leads to a novel indicator, the probability ratio, which is non-parametric and robust, makes it unnecessary to classify spectra by parameters such as charge state, and yields a peptide identification performance, measured by false discovery rates, that is better than that obtained by other empirical statistical approaches. We also developed another method based on the construction of single-spectrum SEQUEST score distributions. The robustness, conceptual simplicity, and ease of automation of the probability ratio algorithm make it a very attractive alternative for determining peptide identification confidences and error rates in high-throughput experiments.
    On the other hand, the recent development of HUPO-PSI (Proteomics Standards Initiative) standard data formats and MIAPE (Minimum Information About a Proteomics Experiment) guidelines is contributing to proteomics data sharing within the scientific community. In addition, specialized journals have emphasized the use of these standards and guidelines to facilitate the evaluation and publication of new articles. However, there is an evident lack of bioinformatics tools specifically designed to manage these standards, the information they require, and their connection to the proteomics pipeline. In this work we describe the development of a set of tools based on PSI standards and MIAPE guidelines: semantic and MIAPE validators for proteomics standard data files; a proteomics experiment repository based on MIAPE guidelines; a Java library for managing and extracting MIAPE information from standard data files; and a tool for a complete proteomics data analysis workflow that allows the aggregation, filtering, and inspection of large amounts of data, as well as their dissemination by preparing a complete ProteomeXchange submission. Additionally, we present our contribution to the definition of the MIAPE guidelines for quantitative proteomics experiments, recently accepted as a new global standard for the proteomics community.

    Java API for MiAPE Generation

    Conference communications