67 research outputs found

    Tiered Human Integrated Sequence Search Databases for Shotgun Proteomics.

    The results of analysis of shotgun proteomics mass spectrometry data can be greatly affected by the selection of the reference protein sequence database against which the spectra are matched. For many species there are multiple sources from which somewhat different sequence sets can be obtained. This can lead to confusion about which database is best in which circumstances, a problem especially acute in human sample analysis. All sequence databases are genome-based, with sequences for the predicted genes and their protein translation products compiled. Our goal is to create a set of primary sequence databases that comprise the union of sequences from many of the different available sources and make the result easily available to the community. We have compiled a set of four sequence databases of varying sizes, from a small database consisting of only the ∼20,000 primary isoforms plus contaminants to a very large database that includes almost all nonredundant protein sequences from several sources. This set of tiered, increasingly complete human protein sequence databases suitable for mass spectrometry proteomics sequence database searching is called the Tiered Human Integrated Search Proteome set. In order to evaluate the utility of these databases, we have analyzed two different data sets, one from the HeLa cell line and the other from normal human liver tissue, with each of the four tiers of database complexity. The result is that approximately 0.8%, 1.1%, and 1.5% additional peptides can be identified for Tiers 2, 3, and 4, respectively, as compared with the Tier 1 database, at substantially increasing computational cost. This increase in computational cost may be worth bearing if the identification of sequence variants or the discovery of sequences that are not present in the reviewed knowledge base entries is an important goal of the study.
We find that it is useful to search a data set against a simpler database, and then check the uniqueness of the discovered peptides against a more complex database. We have set up an automated system that downloads all the source databases on the first of each month and automatically generates a new set of search databases and makes them available for download at http://www.peptideatlas.org/thisp/
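
The two-step idea described above (search against a simpler database, then check peptide uniqueness against a more complex one) can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names, the dictionary-based database representation, and the toy sequences are all hypothetical, and exact substring matching is assumed in place of any real protein inference logic.

```python
from typing import Dict, Iterable, List


def proteins_containing(peptide: str, database: Dict[str, str]) -> List[str]:
    """Return sorted accessions of database entries whose sequence contains the peptide."""
    return sorted(acc for acc, seq in database.items() if peptide in seq)


def uniqueness_report(
    peptides: Iterable[str],
    small_db: Dict[str, str],
    large_db: Dict[str, str],
) -> Dict[str, dict]:
    """For each peptide, list matching entries in both databases and flag
    whether it still maps to exactly one entry in the larger database."""
    report = {}
    for pep in peptides:
        large_hits = proteins_containing(pep, large_db)
        report[pep] = {
            "small_hits": proteins_containing(pep, small_db),
            "large_hits": large_hits,
            "unique_in_large": len(large_hits) == 1,
        }
    return report


# Toy stand-ins for a small tier and a larger tier (hypothetical sequences).
tier1 = {"P1": "MKTAYIAKQRQISFVK", "P2": "MSHFSRQLEERLGLIEVQ"}
tier4 = dict(tier1, **{"P1-2": "MKTAYIAKQRAAAFVK", "V9": "GGGLEERLGLIEVQ"})

report = uniqueness_report(["TAYIAK", "QISFVK", "LEERLG"], tier1, tier4)
for pep, info in report.items():
    print(pep, info)
```

In this toy case every peptide maps to a single entry in the small tier, but only one of them remains unique once the additional isoform and variant entries of the larger tier are considered.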

    Development of data representation standards by the human proteome organization proteomics standards initiative.

    OBJECTIVE: To describe the goals of the Proteomics Standards Initiative (PSI) of the Human Proteome Organization, the methods that the PSI has employed to create data standards, the resulting output of the PSI, lessons learned from the PSI's evolution, and future directions and synergies for the group. MATERIALS AND METHODS: The PSI has 5 categories of deliverables that have guided the group. These are minimum information guidelines, data formats, controlled vocabularies, resources and software tools, and dissemination activities. These deliverables are produced via the leadership and working group organization of the initiative, driven by frequent workshops and ongoing communication within the working groups. Official standards are subjected to a rigorous document process that includes several levels of peer review prior to release. RESULTS: We have produced and published minimum information guidelines describing what information should be provided when making data public, either via public repositories or other means. The PSI has produced a series of standard formats covering mass spectrometer input, mass spectrometer output, results of informatics analysis (both qualitative and quantitative analyses), reports of molecular interaction data, and gel electrophoresis analyses. We have produced controlled vocabularies that ensure that concepts are uniformly annotated in the formats and engaged in extensive software development and dissemination efforts so that the standards can efficiently be used by the community. CONCLUSION: In its first dozen years of operation, the PSI has produced many standards that have accelerated the field of proteomics by facilitating data exchange and deposition to data repositories. We look to the future to continue developing standards for new proteomics technologies and workflows and mechanisms for integration with other omics data types.
Our products facilitate the translation of genomics and proteomics findings to clinical and biological phenotypes. The PSI website can be accessed at http://www.psidev.info

    Insights from the first phosphopeptide challenge of the MS resource pillar of the HUPO human proteome project

    Mass spectrometry has greatly improved the analysis of phosphorylation events in complex biological systems and on a large scale. Despite considerable progress, the correct identification of phosphorylated sites, their quantification, and their interpretation regarding physiological relevance remain challenging. The MS Resource Pillar of the Human Proteome Organization (HUPO) Human Proteome Project (HPP) initiated the Phosphopeptide Challenge as a resource to help the community evaluate methods, learn procedures and data analysis routines, and establish their own workflows by comparing results obtained from a standard set of 94 phosphopeptides (serine, threonine, tyrosine) and their nonphosphorylated counterparts mixed at different ratios in a neat sample and a yeast background. Participants analyzed both samples with their method(s) of choice to report the identification and site localization of these peptides, determine their relative abundances, and enrich for the phosphorylated peptides in the yeast background. We discuss the results from 22 laboratories that used a range of different methods, instruments, and analysis software. We reanalyzed submitted data with a single software pipeline and highlight the successes and challenges in correct phosphosite localization. All of the data from this collaborative endeavor are shared as a resource to encourage the development of even better methods and tools for diverse phosphoproteomic applications. All submitted data and search results were uploaded to MassIVE (https://massive.ucsd.edu/) as data set MSV000085932 with ProteomeXchange identifier PXD020801.

    Evolutionary molecular medicine

    Evolution has long provided a foundation for population genetics, but some major advances in evolutionary biology from the twentieth century that provide foundations for evolutionary medicine are only now being applied in molecular medicine. They include the need for both proximate and evolutionary explanations, kin selection, evolutionary models for cooperation, competition between alleles, co-evolution, and new strategies for tracing phylogenies and identifying signals of selection. Recent advances in genomics are transforming evolutionary biology in ways that create even more opportunities for progress at its interfaces with genetics, medicine, and public health. This article reviews 15 evolutionary principles and their applications in molecular medicine in hopes that readers will use them and related principles to speed the development of evolutionary molecular medicine.