    The PeptideAtlas project

    The completion of the sequencing of the human genome and the concurrent, rapid development of high-throughput proteomic methods have resulted in an increasing need for automated approaches to archive proteomic data in a repository that enables the exchange of data among researchers and also accurate integration with genomic data. PeptideAtlas (http://www.peptideatlas.org/) addresses these needs by identifying peptides by tandem mass spectrometry (MS/MS), statistically validating those identifications and then mapping identified sequences to the genomes of eukaryotic organisms. A meaningful comparison of data across different experiments generated by different groups using different types of instruments is enabled by the implementation of a uniform analytic process. This uniform statistical validation ensures a consistent and high-quality set of peptide and protein identifications. The raw data from many diverse proteomic experiments are made available in the associated PeptideAtlas repository in several formats. Here we present a summary of our process and details about the Human, Drosophila and Yeast PeptideAtlas builds.
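    The step of mapping identified peptide sequences onto reference sequences can be illustrated with a minimal sketch. It assumes peptides are matched by exact substring search against a small in-memory set of protein sequences; the function name, protein IDs and sequences are hypothetical, isoleucine/leucine ambiguity is ignored, and the genome-coordinate mapping that PeptideAtlas performs is not reproduced here.

```python
from collections import defaultdict


def map_peptides(peptides, proteins):
    """Return {peptide: [(protein_id, start), ...]} for exact matches.

    Hypothetical helper: start positions are 0-based offsets into the
    protein sequence; peptides with no match are omitted from the result.
    """
    hits = defaultdict(list)
    for pep in peptides:
        for prot_id, seq in proteins.items():
            start = seq.find(pep)
            # Keep searching so a peptide occurring twice is reported twice.
            while start != -1:
                hits[pep].append((prot_id, start))
                start = seq.find(pep, start + 1)
    return dict(hits)


# Hypothetical toy sequences, for illustration only.
proteins = {
    "PROT_A": "MKTAYIAKQRQISFVKSHFSR",
    "PROT_B": "MSTNPKPQRKTKRNTNRRPQDVK",
}
peptides = ["QISFVK", "TKRNTNR", "NOTPRESENT"]
print(map_peptides(peptides, proteins))
# → {'QISFVK': [('PROT_A', 10)], 'TKRNTNR': [('PROT_B', 10)]}
```

A real pipeline would index the reference sequences rather than scanning every protein for every peptide, but the linear scan keeps the idea visible.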

    Corra: Computational framework and tools for LC-MS discovery and targeted mass spectrometry-based proteomics

    BACKGROUND: Quantitative proteomics holds great promise for identifying proteins that are differentially abundant between populations representing different physiological or disease states. A range of computational tools is now available for both isotopically labeled and label-free liquid chromatography mass spectrometry (LC-MS)-based quantitative proteomics. However, these tools are generally not comparable with one another in functionality, user interface or data input/output, and they do not readily facilitate appropriate statistical data analysis. These limitations, together with the array of choices, present a daunting prospect for biologists and other researchers not trained in bioinformatics who wish to use LC-MS-based quantitative proteomics. RESULTS: We have developed Corra, a computational framework and tools for discovery-based LC-MS proteomics. Corra extends and adapts existing algorithms for LC-MS-based proteomics, as well as statistical algorithms originally developed for microarray data analysis, so that they are appropriate for LC-MS data. Corra also adapts software engineering technologies (e.g. Google Web Toolkit, distributed processing) so that computationally intensive data processing and statistical analyses can run on a remote server, while users control and manage the process from their own computers through a simple web interface. Corra also allows the user to export significantly differentially abundant LC-MS-detected peptide features in a form compatible with subsequent sequence identification by tandem mass spectrometry (MS/MS). We present two case studies to illustrate the application of Corra to commonly performed LC-MS-based biological workflows: a pilot biomarker discovery study of glycoproteins isolated from human plasma samples relevant to type 2 diabetes, and a study in yeast to identify in vivo targets of the protein kinase Ark1 via phosphopeptide profiling.
    CONCLUSION: The Corra computational framework leverages computational innovation to enable biologists and other researchers to process, analyze and visualize LC-MS data that would otherwise require a complex and user-unfriendly suite of tools. Corra enables appropriate statistical analyses, with controlled false discovery rates, ultimately to inform subsequent targeted identification of differentially abundant peptides by MS/MS. For users not trained in bioinformatics, Corra represents a complete, customizable, free and open-source computational platform for LC-MS-based proteomic workflows, and as such addresses an unmet need in the LC-MS proteomics field.
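    The abstract does not say which false-discovery-rate procedure Corra uses. As an illustration only, the sketch below implements the Benjamini-Hochberg step-up procedure, a standard FDR control method widely used in microarray analysis; the function name and the p-values for the LC-MS peptide features are hypothetical.

```python
def benjamini_hochberg(pvalues, alpha=0.05):
    """Return sorted indices of hypotheses rejected by the BH step-up rule.

    Controls the expected false discovery rate at level alpha, assuming
    independent (or positively dependent) tests.
    """
    m = len(pvalues)
    # Rank hypotheses by ascending p-value.
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Find the largest rank k with p_(k) <= (k / m) * alpha.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank / m * alpha:
            k_max = rank
    # Reject every hypothesis whose rank is at most k_max.
    return sorted(order[:k_max])


# Hypothetical p-values for six LC-MS peptide features.
pvals = [0.001, 0.008, 0.039, 0.041, 0.040, 0.60]
print(benjamini_hochberg(pvals, alpha=0.05))
# → [0, 1, 2, 3, 4]
```

The features at indices 2 and 4 (p = 0.039 and 0.040) are rejected even though each fails its own rank threshold: the step-up rule rejects everything up to the largest rank that passes, here rank 5 (p = 0.041 ≤ 5/6 × 0.05).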

    Analysis of the Saccharomyces cerevisiae proteome with PeptideAtlas

    We present the Saccharomyces cerevisiae PeptideAtlas, composed from 47 diverse experiments and 4.9 million tandem mass spectra. The observed peptides align to 61% of Saccharomyces Genome Database (SGD) open reading frames (ORFs), 49% of the uncharacterized SGD ORFs, 54% of S. cerevisiae ORFs with a Gene Ontology annotation of 'molecular function unknown', and 76% of ORFs with gene names. We highlight the use of this resource for data mining, construction of high-quality lists for targeted proteomics, validation of proteins, and software development.

    The Generation R Study: design and cohort update 2010

    The Generation R Study is a population-based prospective cohort study from fetal life until young adulthood. The study is designed to identify early environmental and genetic causes of normal and abnormal growth, development and health during fetal life, childhood and adulthood. The study focuses on four primary areas of research: (1) growth and physical development; (2) behavioural and cognitive development; (3) diseases in childhood; and (4) health and healthcare for pregnant women and children. In total, 9,778 mothers with a delivery date from April 2002 until January 2006 were enrolled in the study. General follow-up rates until the age of 4 years exceed 75%. Data collection in mothers, fathers and preschool children included questionnaires, detailed physical and ultrasound examinations, behavioural observations, and biological samples. A genome-wide association screen is available for the participating children. Regular, detailed hands-on assessments are performed from the age of 5 years onwards. Ultimately, results from the Generation R Study should contribute to the development of strategies for optimizing health and healthcare for pregnant women and children.

    Building consensus spectral libraries for peptide identification in proteomics

    Spectral searching has drawn increasing interest as an alternative to sequence-database searching in proteomics. We developed and validated an open-source software toolkit, SpectraST, to enable proteomics researchers to build spectral libraries and to integrate this promising approach into their data-analysis pipelines. It allows individual researchers to condense raw data into spectral libraries, summarizing information about observed proteomes in a concise and retrievable format for future data analyses.
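    Spectral searching compares a query spectrum with previously observed library spectra rather than with theoretical spectra derived from a sequence database. The sketch below scores candidate matches with a normalized dot product over unit-width m/z bins after square-root intensity scaling. The bin width, the scaling, the function names and all peak lists are assumptions for illustration; SpectraST's actual scoring is more elaborate.

```python
import math


def bin_spectrum(peaks, bin_width=1.0):
    """Sum intensities into m/z bins; peaks is a list of (mz, intensity)."""
    binned = {}
    for mz, intensity in peaks:
        b = int(mz // bin_width)
        binned[b] = binned.get(b, 0.0) + intensity
    return binned


def dot_product(query, library, bin_width=1.0):
    """Normalized dot product (cosine similarity) of two spectra, in [0, 1]."""
    q = bin_spectrum(query, bin_width)
    lib = bin_spectrum(library, bin_width)
    # Square-root scaling damps the influence of a few dominant peaks.
    q = {b: math.sqrt(v) for b, v in q.items()}
    lib = {b: math.sqrt(v) for b, v in lib.items()}
    num = sum(v * lib.get(b, 0.0) for b, v in q.items())
    norm = math.sqrt(sum(v * v for v in q.values())) * math.sqrt(
        sum(v * v for v in lib.values())
    )
    return num / norm if norm else 0.0


def best_match(query, library_spectra):
    """Return (name, score) of the library entry most similar to the query."""
    scored = ((name, dot_product(query, spec)) for name, spec in library_spectra.items())
    return max(scored, key=lambda item: item[1])


# Hypothetical library of two spectra and one query spectrum.
library = {
    "PEPTIDE_A": [(200.1, 100.0), (300.2, 50.0), (400.3, 25.0)],
    "PEPTIDE_B": [(250.0, 80.0), (350.0, 80.0)],
}
query = [(200.2, 90.0), (300.1, 60.0), (401.0, 10.0)]
print(best_match(query, library))
```

The query shares two of its three bins with PEPTIDE_A and none with PEPTIDE_B, so PEPTIDE_A wins with a score below 1.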

    Tryptic Peptide Reference Data Sets for MALDI Imaging Mass Spectrometry on Formalin-fixed Ovarian Cancer Tissues

    MALDI imaging mass spectrometry is a powerful tool for morphology-based proteomic tissue analysis. However, peptide identification remains a major challenge because of low signal-to-noise (S/N) ratios, low mass accuracy and difficulties in correlating observed m/z species with peptide identities. To address this, we analyzed tryptic digests of formalin-fixed, paraffin-embedded tissue microarray cores from 31 ovarian cancer patients by LC–MS/MS. The sample preparation closely resembled the MALDI imaging workflow in order to create representative reference data sets containing peptides that are also observable in MALDI imaging experiments. This yielded 3844 distinct peptide sequences at a false discovery rate of 1% for the entire cohort, and an average of 982 distinct peptide sequences per sample. From these, a total of 840 proteins, and on average 297 proteins per sample, could be inferred. To support the efforts of the Chromosome-centric Human Proteome Project Consortium, we annotated these proteins with their respective chromosome locations. In this work, the benefit of using a large cohort of data sets is exemplified by the correct identification of several m/z species observed in a MALDI imaging experiment. The tryptic peptide data sets generated here will facilitate peptide identification in future MALDI imaging studies on ovarian cancer.
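    Correlating an observed MALDI m/z value with a reference peptide amounts to comparing it with the peptide's computed mass within a tolerance. The sketch below assumes singly protonated [M+H]+ ions and unmodified residues (it ignores fixation-induced and other modifications), and uses a hypothetical 50 ppm tolerance; the function names and example values are illustrative and not taken from the study.

```python
# Monoisotopic residue masses in daltons for the 20 unmodified amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # added once per peptide for the free termini
PROTON = 1.00728   # charge carrier of the singly protonated ion


def mh_plus(peptide):
    """Monoisotopic m/z of the singly protonated [M+H]+ ion."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER + PROTON


def annotate(observed_mz, reference_peptides, tol_ppm=50.0):
    """Map each observed m/z to reference peptides within tol_ppm (hypothetical)."""
    ref = [(pep, mh_plus(pep)) for pep in reference_peptides]
    return {
        mz: [pep for pep, m in ref if abs(mz - m) / m * 1e6 <= tol_ppm]
        for mz in observed_mz
    }


print(annotate([1163.63, 1500.00], ["LVNELTEFAK", "GGK"]))
# → {1163.63: ['LVNELTEFAK'], 1500.0: []}
```

A reference data set such as the one described above would supply the candidate peptide list; a tolerance in ppm scales with mass, which suits the mass accuracy limits that the abstract mentions.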
