The PeptideAtlas project
The completion of the sequencing of the human genome and the concurrent, rapid development of high-throughput proteomic methods have resulted in an increasing need for automated approaches to archive proteomic data in a repository that enables the exchange of data among researchers and also accurate integration with genomic data. PeptideAtlas (http://www.peptideatlas.org/) addresses these needs by identifying peptides by tandem mass spectrometry (MS/MS), statistically validating those identifications and then mapping identified sequences to the genomes of eukaryotic organisms. A meaningful comparison of data across different experiments generated by different groups using different types of instruments is enabled by the implementation of a uniform analytic process. This uniform statistical validation ensures a consistent and high-quality set of peptide and protein identifications. The raw data from many diverse proteomic experiments are made available in the associated PeptideAtlas repository in several formats. Here we present a summary of our process and details about the Human, Drosophila and Yeast PeptideAtlas builds.
Protein Cross-Linking Analysis Using Mass Spectrometry, Isotope-Coded Cross-Linkers, and Integrated Computational Data Processing
Corra: Computational framework and tools for LC-MS discovery and targeted mass spectrometry-based proteomics
BACKGROUND: Quantitative proteomics holds great promise for identifying proteins that are differentially abundant between populations representing different physiological or disease states. A range of computational tools is now available for both isotopically labeled and label-free liquid chromatography mass spectrometry (LC-MS) based quantitative proteomics. However, they are generally not comparable to each other in terms of functionality, user interfaces, information input/output, and do not readily facilitate appropriate statistical data analysis. These limitations, along with the array of choices, present a daunting prospect for biologists, and other researchers not trained in bioinformatics, who wish to use LC-MS-based quantitative proteomics.
RESULTS: We have developed Corra, a computational framework and tools for discovery-based LC-MS proteomics. Corra extends and adapts existing algorithms used for LC-MS-based proteomics, and statistical algorithms, originally developed for microarray data analyses, appropriate for LC-MS data analysis. Corra also adapts software engineering technologies (e.g. Google Web Toolkit, distributed processing) so that computationally intense data processing and statistical analyses can run on a remote server, while the user controls and manages the process from their own computer via a simple web interface. Corra also allows the user to output significantly differentially abundant LC-MS-detected peptide features in a form compatible with subsequent sequence identification via tandem mass spectrometry (MS/MS). We present two case studies to illustrate the application of Corra to commonly performed LC-MS-based biological workflows: a pilot biomarker discovery study of glycoproteins isolated from human plasma samples relevant to type 2 diabetes, and a study in yeast to identify in vivo targets of the protein kinase Ark1 via phosphopeptide profiling.
CONCLUSION: The Corra computational framework leverages computational innovation to enable biologists or other researchers to process, analyze and visualize LC-MS data with what would otherwise be a complex and not user-friendly suite of tools. Corra enables appropriate statistical analyses, with controlled false-discovery rates, ultimately to inform subsequent targeted identification of differentially abundant peptides by MS/MS. For the user not trained in bioinformatics, Corra represents a complete, customizable, free and open source computational platform enabling LC-MS-based proteomic workflows, and as such, addresses an unmet need in the LC-MS proteomics field
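The abstract above mentions statistical analyses with controlled false-discovery rates but does not say which procedure Corra applies. As a minimal sketch, assuming the widely used Benjamini-Hochberg step-up procedure (an assumption, not a description of Corra's implementation), FDR control over a list of per-feature p-values could look like this. The function name `benjamini_hochberg` and the example p-values are hypothetical.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Flag hypotheses rejected by the Benjamini-Hochberg procedure.

    Returns a list of booleans aligned with p_values; True means the
    feature is called significant at false-discovery rate alpha.
    """
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])

    # Find the largest rank k such that p_(k) <= (k / m) * alpha.
    max_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_k = rank

    # Reject every hypothesis ranked at or below max_k.
    rejected = [False] * m
    for idx in order[:max_k]:
        rejected[idx] = True
    return rejected


# Hypothetical p-values for ten LC-MS peptide features.
example_p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
significant = benjamini_hochberg(example_p, alpha=0.05)
```

Features flagged `True` would be the candidates passed on to targeted MS/MS identification in a workflow of the kind the abstract describes.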
Analysis of the Saccharomyces cerevisiae proteome with PeptideAtlas
We present the Saccharomyces cerevisiae PeptideAtlas composed from 47 diverse experiments and 4.9 million tandem mass spectra. The observed peptides align to 61% of Saccharomyces Genome Database (SGD) open reading frames (ORFs), 49% of the uncharacterized SGD ORFs, 54% of S. cerevisiae ORFs with a Gene Ontology annotation of 'molecular function unknown', and 76% of ORFs with Gene names. We highlight the use of this resource for data mining, construction of high quality lists for targeted proteomics, validation of proteins, and software development
The Generation R Study: design and cohort update 2010
The Generation R Study is a population-based prospective cohort study from fetal life until young adulthood. The study is designed to identify early environmental and genetic causes of normal and abnormal growth, development and health during fetal life, childhood and adulthood. The study focuses on four primary areas of research: (1) growth and physical development; (2) behavioural and cognitive development; (3) diseases in childhood; and (4) health and healthcare for pregnant women and children. In total, 9,778 mothers with a delivery date from April 2002 until January 2006 were enrolled in the study. General follow-up rates until the age of 4 years exceed 75%. Data collection in mothers, fathers and preschool children included questionnaires, detailed physical and ultrasound examinations, behavioural observations, and biological samples. A genome-wide association screen is available in the participating children. Regular detailed hands-on assessments are performed from the age of 5 years onwards. Ultimately, results from the Generation R Study should contribute to the development of strategies for optimizing health and healthcare for pregnant women and children.
Building consensus spectral libraries for peptide identification in proteomics
Spectral searching has drawn increasing interest as an alternative to sequence-database searching in proteomics. We developed and validated an open-source software toolkit, SpectraST, to enable proteomics researchers to build spectral libraries and to integrate this promising approach in their data-analysis pipeline. It allows individual researchers to condense raw data into spectral libraries, summarizing information about observed proteomes into a concise and retrievable format for future data analyses
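Spectral searching compares an observed spectrum against library spectra rather than against theoretical spectra derived from sequences. The abstract does not describe SpectraST's scoring function, so the following is only a minimal sketch of one common similarity measure, a normalized dot product (cosine similarity) over binned peaks. The function names, the 1.0 m/z bin width, and the peak lists are assumptions made for illustration.

```python
import math


def bin_spectrum(peaks, bin_width=1.0):
    """Sum peak intensities into fixed-width m/z bins.

    peaks is an iterable of (mz, intensity) pairs; the result maps an
    integer bin index to the summed intensity in that bin.
    """
    binned = {}
    for mz, intensity in peaks:
        key = int(mz // bin_width)
        binned[key] = binned.get(key, 0.0) + intensity
    return binned


def cosine_similarity(binned_a, binned_b):
    """Normalized dot product of two binned spectra (0.0 to 1.0)."""
    dot = sum(v * binned_b.get(k, 0.0) for k, v in binned_a.items())
    norm_a = math.sqrt(sum(v * v for v in binned_a.values()))
    norm_b = math.sqrt(sum(v * v for v in binned_b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical peak lists: a query spectrum and two library entries.
query = [(175.1, 40.0), (304.2, 100.0), (433.2, 65.0)]
library_same = [(175.1, 40.0), (304.2, 100.0), (433.2, 65.0)]
library_other = [(120.1, 80.0), (250.3, 30.0)]

score_same = cosine_similarity(bin_spectrum(query), bin_spectrum(library_same))
score_other = cosine_similarity(bin_spectrum(query), bin_spectrum(library_other))
```

A library search would rank candidate library spectra by such a score; production tools typically add intensity scaling, precursor filtering and statistical validation on top of this.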
Tryptic Peptide Reference Data Sets for MALDI Imaging Mass Spectrometry on Formalin-fixed Ovarian Cancer Tissues
MALDI imaging mass spectrometry is a powerful tool for morphology-based proteomic tissue analysis. However, peptide identification is still a major challenge due to low S/N ratios, low mass accuracy and difficulties in correlating observed m/z species with peptide identities. To address this, we have analyzed tryptic digests of formalin-fixed paraffin-embedded tissue microarray cores, from 31 ovarian cancer patients, by LC–MS/MS. The sample preparation closely resembled the MALDI imaging workflow in order to create representative reference data sets containing peptides also observable in MALDI imaging experiments. This resulted in 3844 distinct peptide sequences, at a false discovery rate of 1%, for the entire cohort and an average of 982 distinct peptide sequences per sample. From this, a total of 840 proteins and, on average, 297 proteins per sample could be inferred. To support the efforts of the Chromosome-centric Human Proteome Project Consortium, we have annotated these proteins with their respective chromosome location. In the presented work, the benefit of using a large cohort of data sets was exemplified by correct identification of several m/z species observed in a MALDI imaging experiment. The tryptic peptide data sets generated will facilitate peptide identification in future MALDI imaging studies on ovarian cancer
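Correlating an observed m/z value with a peptide from a reference data set usually comes down to a mass-tolerance lookup. The sketch below illustrates that idea only; it is not the authors' matching procedure. The peptide labels, their m/z values, and the tolerance are all hypothetical.

```python
def match_mz(observed_mz, reference, tol_ppm=50.0):
    """Return reference entries within tol_ppm of an observed m/z.

    reference maps a peptide label to its reference m/z. The result is a
    list of (label, ppm_error) pairs, closest match first.
    """
    matches = []
    for label, ref_mz in reference.items():
        # Relative mass error in parts per million.
        ppm_error = (observed_mz - ref_mz) / ref_mz * 1e6
        if abs(ppm_error) <= tol_ppm:
            matches.append((label, ppm_error))
    return sorted(matches, key=lambda pair: abs(pair[1]))


# Hypothetical reference list; labels and values are placeholders.
reference_peptides = {"pep_a": 1000.00, "pep_b": 1000.04, "pep_c": 1200.00}

wide = match_mz(1000.01, reference_peptides, tol_ppm=50.0)
narrow = match_mz(1000.01, reference_peptides, tol_ppm=20.0)
```

With the wide tolerance two candidates remain, which shows why low mass accuracy makes identification ambiguous and why a tissue-specific reference list helps narrow the candidates.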