Search CORE

Ghent University Academic Bibliography

ZORA

A golden age for working with public proteomics data

Author: Martens Lennart
Vizcaíno Juan Antonio
Publication venue: 'Elsevier BV'
Publication date: 01/01/2017
Field of study

Data sharing in mass spectrometry (MS)-based proteomics is becoming a common scientific practice, as is now common in the case of other, more mature 'omics' disciplines like genomics and transcriptomics. We want to highlight that this situation, unprecedented in the field, opens a plethora of opportunities for data scientists. First, we explain in some detail some of the work already achieved, such as systematic reanalysis efforts. We also explain existing applications of public proteomics data, such as proteogenomics and the creation of spectral libraries and spectral archives. Finally, we discuss the main existing challenges and mention the first attempts to combine public proteomics data with other types of omics data sets

The contribution of Alu exons to the human proteome.

Author: Jiang Peng
Lam Maggie PY
Lin Lan
Lu Zhi-Xiang
Park Juw Won
Ping Peipei
Wang Jinkai
Xing Yi
Publication venue: eScholarship, University of California
Publication date: 01/01/2016
Field of study

BackgroundAlu elements are major contributors to lineage-specific new exons in primate and human genomes. Recent studies indicate that some Alu exons have high transcript inclusion levels or tissue-specific splicing profiles, and may play important regulatory roles in modulating mRNA degradation or translational efficiency. However, the contribution of Alu exons to the human proteome remains unclear and controversial. The prevailing view is that exons derived from young repetitive elements, such as Alu elements, are restricted to regulatory functions and have not had adequate evolutionary time to be incorporated into stable, functional proteins.ResultsWe adopt a proteotranscriptomics approach to systematically assess the contribution of Alu exons to the human proteome. Using RNA sequencing, ribosome profiling, and proteomics data from human tissues and cell lines, we provide evidence for the translational activities of Alu exons and the presence of Alu exon derived peptides in human proteins. These Alu exon peptides represent species-specific protein differences between primates and other mammals, and in certain instances between humans and closely related primates. In the case of the RNA editing enzyme ADARB1, which contains an Alu exon peptide in its catalytic domain, RNA sequencing analyses of A-to-I editing demonstrate that both the Alu exon skipping and inclusion isoforms encode active enzymes. The Alu exon derived peptide may fine tune the overall editing activity and, in limited cases, the site selectivity of ADARB1 protein products.ConclusionsOur data indicate that Alu elements have contributed to the acquisition of novel protein sequences during primate and human evolution

eScholarship - University of California

ISPIDER Central: an integrated database web-server for proteomics

Author: Belhajjame K.
Côté R.
Embury S.M.
Goble C.A.
Hermjakob H.
Hubbard S.J.
Jones D.T.
Jones P.
Martin Nigel
Oliver S.G.
Orengo C.A.
Paton N.W.
Pentony M.M.
Poulovassilis Alexandra
Selley J.N.
Siepen J.A.
Stevens R.
Zamboulis Lucas
Publication venue: 'Oxford University Press (OUP)'
Publication date: 25/04/2008
Field of study

Despite the growing volumes of proteomic data, integration of the underlying results remains problematic owing to differences in formats, data captured, protein accessions and services available from the individual repositories. To address this, we present the ISPIDER Central Proteomic Database search (http://www.ispider.manchester.ac.uk/cgi-bin/ProteomicSearch.pl), an integration service offering novel search capabilities over leading, mature, proteomic repositories including PRoteomics IDEntifications database (PRIDE), PepSeeker, PeptideAtlas and the Global Proteome Machine. It enables users to search for proteins and peptides that have been characterised in mass spectrometry-based proteomics experiments from different groups, stored in different databases, and view the collated results with specialist viewers/clients. In order to overcome limitations imposed by the great variability in protein accessions used by individual laboratories, the European Bioinformatics Institute's Protein Identifier Cross-Reference (PICR) service is used to resolve accessions from different sequence repositories. Custom-built clients allow users to view peptide/protein identifications in different contexts from multiple experiments and repositories, as well as integration with the Dasty2 client supporting any annotations available from Distributed Annotation System servers. Further information on the protein hits may also be added via external web services able to take a protein as input. This web server offers the first truly integrated access to proteomics repositories and provides a unique service to biologists interested in mass spectrometry-based proteomics

Birkbeck Institutional Research Online

The Australian National University

The University of Manchester - Institutional Repository

The Drosophila melanogaster PeptideAtlas facilitates the use of peptide data for improved fly proteomics and genome annotation

Abstract Background Crucial foundations of any quantitative systems biology experiment are correct genome and proteome annotations. Protein databases compiled from high quality empirical protein identifications that are in turn based on correct gene models increase the correctness, sensitivity, and quantitative accuracy of systems biology genome-scale experiments. Results In this manuscript, we present the <it>Drosophila melanogaster </it>PeptideAtlas, a fly proteomics and genomics resource of unsurpassed depth. Based on peptide mass spectrometry data collected in our laboratory the portal <url>http://www.drosophila-peptideatlas.org</url> allows querying fly protein data observed with respect to gene model confirmation and splice site verification as well as for the identification of proteotypic peptides suited for targeted proteomics studies. Additionally, the database provides consensus mass spectra for observed peptides along with qualitative and quantitative information about the number of observations of a particular peptide and the sample(s) in which it was observed. Conclusion PeptideAtlas is an open access database for the <it>Drosophila </it>community that has several features and applications that support (1) reduction of the complexity inherently associated with performing targeted proteomic studies, (2) designing and accelerating shotgun proteomics experiments, (3) confirming or questioning gene models, and (4) adjusting gene models such that they are in line with observed <it>Drosophila </it>peptides. While the database consists of proteomic data it is not required that the user is a proteomics expert.</p

Repository for Publications and Research Data

Directory of Open Access Journals

Repository for Publications and Research Data

Integration with the human genome of peptide sequences obtained by high-throughput mass spectrometry

Author: Aderem Alan
Aebersold Ruedi
Boyle Rose
Brunner Erich
Desiere Frank
Deutsch Eric W
Donohoe Samuel
Eng Jimmy K
Fausto Nelson
Hafen Ernst
Hood Lee
Katze Michael G
Kennedy Kathleen A
King Nichole L
Kregenow Floyd
Lee Hookeun
Lin Biaoyang
Mallick Parag
Martin Dan
Nesvizhskii Alexey I
Ranish Jeffrey A
Rawlings David J
Samelson Lawrence E
Shiio Yuzuru
Watts Julian D
Wollscheid Bernd
Wright Michael E
Yan Wei
Yang Lihong
Yi Eugene C
Zhang Hui
Publication venue: BioMed Central
Publication date: 10/12/2004
Field of study

A crucial aim upon the completion of the human genome is the verification and functional annotation of all predicted genes and their protein products. Here we describe the mapping of peptides derived from accurate interpretations of protein tandem mass spectrometry (MS) data to eukaryotic genomes and the generation of an expandable resource for integration of data from many diverse proteomics experiments. Furthermore, we demonstrate that peptide identifications obtained from high-throughput proteomics can be integrated on a large scale with the human genome. This resource could serve as an expandable repository for MS-derived proteome information

ZORA

mspecLINE: bridging knowledge of human disease with the proteome

Author: AM Cohen
B Ye
BJ Stapley
BT Alako
C Bennett
CC van der Eijk
DJ Slotta
E Keogh
Eric W Deutsch
EW Deutsch
F Desiere
H Liao
H Liu
HJ Lowe
J Boyle
J Saltz
Jeremy Handcock
John Boyle
M Li
M Li
M Li
MY Brusniak
P Khatri
P Mallick
P Picotti
P Shannon
PA Covitz
R Cilibrasi
R Cilibrasi
R Homayouni
RL Cilibrasi
S Deerwester
V Lange
Y Tsuruoka
Publication venue: BioMed Central
Publication date: 01/01/2010
Field of study

Abstract Background Public proteomics databases such as PeptideAtlas contain peptides and proteins identified in mass spectrometry experiments. However, these databases lack information about human disease for researchers studying disease-related proteins. We have developed mspecLINE, a tool that combines knowledge about human disease in MEDLINE with empirical data about the detectable human proteome in PeptideAtlas. mspecLINE associates diseases with proteins by calculating the semantic distance between annotated terms from a controlled biomedical vocabulary. We used an established semantic distance measure that is based on the co-occurrence of disease and protein terms in the MEDLINE bibliographic database. Results The mspecLINE web application allows researchers to explore relationships between human diseases and parts of the proteome that are detectable using a mass spectrometer. Given a disease, the tool will display proteins and peptides from PeptideAtlas that may be associated with the disease. It will also display relevant literature from MEDLINE. Furthermore, mspecLINE allows researchers to select proteotypic peptides for specific protein targets in a mass spectrometry assay. Conclusions Although mspecLINE applies an information retrieval technique to the MEDLINE database, it is distinct from previous MEDLINE query tools in that it combines the knowledge expressed in scientific literature with empirical proteomics data. The tool provides valuable information about candidate protein targets to researchers studying human disease and is freely available on a public web server.</p

Directory of Open Access Journals

Tiered Human Integrated Sequence Search Databases for Shotgun Proteomics.

Author: Binz P.A.
Campbell D.S.
Deutsch E.W.
Farrah T.
Mendoza L.
Moritz R.L.
Omenn G.S.
Shteynberg D.
Sun Z.
Publication venue: 'American Chemical Society (ACS)'
Publication date: 12/09/2016
Field of study

The results of analysis of shotgun proteomics mass spectrometry data can be greatly affected by the selection of the reference protein sequence database against which the spectra are matched. For many species there are multiple sources from which somewhat different sequence sets can be obtained. This can lead to confusion about which database is best in which circumstances-a problem especially acute in human sample analysis. All sequence databases are genome-based, with sequences for the predicted gene and their protein translation products compiled. Our goal is to create a set of primary sequence databases that comprise the union of sequences from many of the different available sources and make the result easily available to the community. We have compiled a set of four sequence databases of varying sizes, from a small database consisting of only the ∼20,000 primary isoforms plus contaminants to a very large database that includes almost all nonredundant protein sequences from several sources. This set of tiered, increasingly complete human protein sequence databases suitable for mass spectrometry proteomics sequence database searching is called the Tiered Human Integrated Search Proteome set. In order to evaluate the utility of these databases, we have analyzed two different data sets, one from the HeLa cell line and the other from normal human liver tissue, with each of the four tiers of database complexity. The result is that approximately 0.8%, 1.1%, and 1.5% additional peptides can be identified for Tiers 2, 3, and 4, respectively, as compared with the Tier 1 database, at substantially increasing computational cost. This increase in computational cost may be worth bearing if the identification of sequence variants or the discovery of sequences that are not present in the reviewed knowledge base entries is an important goal of the study. We find that it is useful to search a data set against a simpler database, and then check the uniqueness of the discovered peptides against a more complex database. We have set up an automated system that downloads all the source databases on the first of each month and automatically generates a new set of search databases and makes them available for download at http://www.peptideatlas.org/thisp/

Serveur académique lausannois