MAPU 2.0: high-accuracy proteomes mapped to genomes
The MAPU 2.0 database contains proteomes of organelles, tissues and cell types measured by mass spectrometry (MS)-based proteomics. In contrast to other databases, it is meant to contain a limited number of experiments and only those with very high-resolution, high-accuracy data. MAPU 2.0 displays the proteomes of organelles, tissues and body fluids or, conversely, the occurrence of proteins of interest in all these proteomes. The new release addresses MS-specific problems, including ambiguous peptide-to-protein assignments, and it provides insight into general functional features at the protein level, ranging from gene ontology classification to comprehensive SwissProt annotation. Moreover, the derived proteomic data are used to annotate the genomes using the Distributed Annotation Service (DAS) via EnsEMBL services. MAPU 2.0 is a model for a database specifically designed for high-accuracy proteomics and a member of the ProteomExchange Consortium. It is available online at http://www.mapuproteome.com.
From DNA sequence to application: possibilities and complications
The development of sophisticated genetic tools during the past 15 years has facilitated a tremendous increase in fundamental and application-oriented knowledge of lactic acid bacteria (LAB) and their bacteriophages. This knowledge relates both to the assignment of open reading frames (ORFs) and to the function of non-coding DNA sequences. Comparison of the complete nucleotide sequences of several LAB bacteriophages has revealed that their chromosomes have a fixed, modular structure, each module having a set of genes involved in a specific phase of the bacteriophage life cycle. LAB bacteriophage genes and DNA sequences have been used for the construction of temperature-inducible gene expression systems, gene-integration systems, and bacteriophage defence systems.
The functions of several LAB open reading frames and transcriptional units have been identified and characterized in detail. Many of these could find practical applications, such as induced lysis of LAB to enhance cheese ripening and re-routing of carbon fluxes for the production of a specific amino acid enantiomer. More knowledge has also become available concerning the function and structure of non-coding DNA positioned at or in the vicinity of promoters. In several cases the mRNA produced from this DNA contains a transcriptional terminator-antiterminator pair, in which the antiterminator can be stabilized either by uncharged tRNA or by interaction with a regulatory protein, thus preventing formation of the terminator so that mRNA elongation can proceed. Evidence has accumulated showing that carbon catabolite repression in LAB is also mediated by specific DNA elements in the vicinity of promoters governing the transcription of catabolic operons.
Although some biological barriers have yet to be overcome, the vast body of scientific information presently available allows the construction of tailor-made genetically modified LAB. Today, it appears that societal constraints rather than biological hurdles impede the use of genetically modified LAB.
PRIDE: new developments and new datasets
The PRIDE (http://www.ebi.ac.uk/pride) database of protein and peptide identifications was previously described in the NAR Database Special Edition in 2006. Since this publication, the volume of public data in the PRIDE relational database has increased by more than an order of magnitude. Several significant public datasets have been added, including identifications and processed mass spectra generated by the HUPO Brain Proteome Project and the HUPO Liver Proteome Project. The PRIDE software development team has made several significant changes and additions to the user interface and tool set associated with PRIDE. The focus of these changes has been to facilitate the submission process and to improve the mechanisms by which PRIDE can be queried. The PRIDE team has developed a Microsoft Excel workbook that allows the required data to be collated in a series of relatively simple spreadsheets, with automatic generation of PRIDE XML at the end of the process. The ability to query PRIDE has been augmented by the addition of a BioMart interface allowing complex queries to be constructed. Collaboration with groups outside the EBI has been fruitful in extending PRIDE, including an approach to encode iTRAQ quantitative data in PRIDE XML.
A DIGE study on the effects of salbutamol on the rat muscle proteome - an exemplar of best practice for data sharing in proteomics
BACKGROUND: Proteomic techniques allow researchers to perform detailed analyses of cellular states, and many studies are published each year that highlight large numbers of proteins quantified in different samples. However, few data sets currently make it into public databases with sufficient metadata to allow other groups to verify findings, perform data mining or integrate different data sets. The Proteomics Standards Initiative has released a series of "Minimum Information About a Proteomics Experiment" guideline documents (MIAPE modules) and accompanying data exchange formats. This article focuses on proteomic studies based on gel electrophoresis and demonstrates how the corresponding MIAPE modules can be fulfilled and data deposited in public databases, using a new experimental data set as an example. FINDINGS: We have performed a study of the effects of an anabolic agent (salbutamol) at two different time points on the protein complement of rat skeletal muscle cells, quantified by difference gel electrophoresis. In the DIGE study, a total of 31 non-redundant proteins were identified as being potentially modulated at 24 h post-treatment and 110 non-redundant proteins at 96 h post-treatment. Several categories of function have been highlighted as strongly enriched, providing candidate proteins for further study. We also use the study as an example of best practice for data deposition. CONCLUSIONS: We have deposited all data sets from this study in public databases for further analysis by the community. We also describe more generally how gel-based protein identification data sets can now be deposited in the PRoteomics IDEntifications database (PRIDE), using a new software tool, the PRIDESpotMapper, which we developed to work in conjunction with the PRIDE Converter application. We also demonstrate how the ProteoRed MIAPE generator tool can be used to create and share a complete and compliant set of MIAPE reports for this experiment and others.
The Drosophila melanogaster PeptideAtlas facilitates the use of peptide data for improved fly proteomics and genome annotation
BACKGROUND: Crucial foundations of any quantitative systems biology experiment are correct genome and proteome annotations. Protein databases compiled from high quality empirical protein identifications that are in turn based on correct gene models increase the correctness, sensitivity, and quantitative accuracy of systems biology genome-scale experiments. RESULTS: In this manuscript, we present the Drosophila melanogaster PeptideAtlas, a fly proteomics and genomics resource of unsurpassed depth. Based on peptide mass spectrometry data collected in our laboratory, the portal http://www.drosophila-peptideatlas.org allows querying fly protein data observed with respect to gene model confirmation and splice site verification as well as for the identification of proteotypic peptides suited for targeted proteomics studies. Additionally, the database provides consensus mass spectra for observed peptides along with qualitative and quantitative information about the number of observations of a particular peptide and the sample(s) in which it was observed. CONCLUSION: PeptideAtlas is an open access database for the Drosophila community that has several features and applications that support (1) reduction of the complexity inherently associated with performing targeted proteomic studies, (2) designing and accelerating shotgun proteomics experiments, (3) confirming or questioning gene models, and (4) adjusting gene models such that they are in line with observed Drosophila peptides. While the database consists of proteomic data, it is not required that the user is a proteomics expert.
mspecLINE: bridging knowledge of human disease with the proteome
BACKGROUND: Public proteomics databases such as PeptideAtlas contain peptides and proteins identified in mass spectrometry experiments. However, these databases lack information about human disease for researchers studying disease-related proteins. We have developed mspecLINE, a tool that combines knowledge about human disease in MEDLINE with empirical data about the detectable human proteome in PeptideAtlas. mspecLINE associates diseases with proteins by calculating the semantic distance between annotated terms from a controlled biomedical vocabulary. We used an established semantic distance measure that is based on the co-occurrence of disease and protein terms in the MEDLINE bibliographic database. RESULTS: The mspecLINE web application allows researchers to explore relationships between human diseases and parts of the proteome that are detectable using a mass spectrometer. Given a disease, the tool will display proteins and peptides from PeptideAtlas that may be associated with the disease. It will also display relevant literature from MEDLINE. Furthermore, mspecLINE allows researchers to select proteotypic peptides for specific protein targets in a mass spectrometry assay. CONCLUSIONS: Although mspecLINE applies an information retrieval technique to the MEDLINE database, it is distinct from previous MEDLINE query tools in that it combines the knowledge expressed in scientific literature with empirical proteomics data. The tool provides valuable information about candidate protein targets to researchers studying human disease and is freely available on a public web server.
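The mspecLINE abstract does not name the co-occurrence-based semantic distance it uses. As an illustration only, the sketch below computes one well-known measure of this kind, the normalized Google distance, from document counts. The function name, the parameter names and the example counts are hypothetical and are not taken from mspecLINE or MEDLINE.

```python
import math


def normalized_cooccurrence_distance(count_x, count_y, count_xy, total_docs):
    """Normalized Google distance between two terms, from document counts.

    count_x, count_y : number of documents annotated with each term
    count_xy         : number of documents annotated with both terms
    total_docs       : size of the collection (assumed larger than both counts)
    """
    if count_xy == 0:
        # Terms that never co-occur are treated as infinitely far apart.
        return math.inf
    log_x, log_y = math.log(count_x), math.log(count_y)
    numerator = max(log_x, log_y) - math.log(count_xy)
    denominator = math.log(total_docs) - min(log_x, log_y)
    return numerator / denominator


# Hypothetical counts: a disease term and a protein term, each on 1000
# records, co-annotated on 10 records, in a collection of 1,000,000 records.
distance = normalized_cooccurrence_distance(1000, 1000, 10, 1_000_000)
```

Smaller values indicate terms that tend to be annotated on the same records; a value of 0 means the two terms always occur together.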
Tandem mass spectrometry data quality assessment by self-convolution
BACKGROUND: Many algorithms have been developed for deciphering tandem mass spectrometry (MS) data sets. They fall into two broad classes. The first searches a database of theoretical mass spectra, while the second is based on de novo sequencing from raw mass spectrometry data. The quality of the mass spectra significantly affects protein identification in both cases. This prompted the authors to explore ways to measure the quality of MS data sets before subjecting them to protein identification algorithms, allowing for more meaningful searches and increased confidence in the proteins identified. RESULTS: The proposed method measures the quality of MS data sets based on the symmetry of the b- and y-ion peaks present in an MS spectrum. Self-convolution of the MS data with its time-reversed copy was employed. Because b-ion and y-ion peaks are symmetric, the self-convolution of a good spectrum produces its highest-intensity peak at the mid point. To reduce processing time, self-convolution was performed using the Fast Fourier Transform and its inverse, followed by removal of the "DC" (direct current) component and normalisation of the data set. The quality score was defined as the ratio of the intensity at the mid point to the remaining peaks of the convolution result. The method was validated using both theoretical mass spectra, with various permutations, and several real MS data sets. The results were encouraging, revealing a high percentage of positive prediction rates for spectra with good quality scores. CONCLUSION: We have demonstrated a method for determining the quality of tandem MS data sets. By determining the quality of tandem MS data before subjecting them to protein identification algorithms, spurious protein predictions due to poor tandem MS data are avoided, giving scientists greater confidence in the predicted results. We conclude that the algorithm performs well and could potentially be used as a pre-processing step for all mass spectrometry-based protein identification tools.
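The self-convolution idea in the abstract above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes the spectrum has already been binned onto a uniform grid on which complementary b- and y-ion peaks mirror each other about the centre. It interprets "the remaining peaks" as the largest absolute value away from the mid point, and it removes the DC component by subtracting the mean of the convolution result. The function name and the toy spectra are hypothetical.

```python
import numpy as np


def self_convolution_score(intensities):
    """Quality score of a binned spectrum from its FFT-based self-convolution."""
    s = np.asarray(intensities, dtype=float)
    n = s.size
    size = 2 * n - 1  # length of the full linear convolution

    # Linear self-convolution via FFT: zero-pad, square, invert.
    spectrum_fft = np.fft.rfft(s, size)
    conv = np.fft.irfft(spectrum_fft * spectrum_fft, size)

    # Remove the "DC" component (assumed here to be the mean) and normalise.
    conv -= conv.mean()
    peak = np.abs(conv).max()
    if peak == 0:
        return 0.0
    conv /= peak

    # Mid point of the convolution result versus everything else.
    mid = n - 1
    rest = np.abs(np.delete(conv, mid))
    denom = rest.max()
    return float("inf") if denom == 0 else float(conv[mid] / denom)


# Toy spectrum with mirrored peak pairs (bins j and 100 - j).
good = np.zeros(101)
for left, height in [(10, 1.0), (25, 2.0), (40, 3.0)]:
    good[left] = height
    good[100 - left] = height

# Same peaks without their mirrored partners.
bad = np.zeros(101)
bad[[10, 25, 40]] = [1.0, 2.0, 3.0]

good_score = self_convolution_score(good)
bad_score = self_convolution_score(bad)
```

With this definition, a score above 1 means the mid point is the strongest feature of the convolution result, as expected for a spectrum with well-paired b- and y-ions. The spectrum without mirrored partners scores below 1.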
Genomic structure and insertion sites of Helicobacter pylori prophages from various geographical origins
We present the full genomic sequences, insertion sites and phylogenetic analysis of 28 prophages found in H. pylori isolates from patients with distinct disease types, ranging from gastritis to gastric cancer, and of distinct geographic origins, covering most continents. The genetic diversity of H. pylori is known to be influenced by these genomic elements, including prophages, whose genomes range from 22.6 to 33.0 kbp. The integration site was highly conserved, being shared in over 50% of cases, and more than 40% of prophage genomes harboured insertion sequences (IS). Furthermore, prophage genomes present a robust phylogeographic pattern, revealing four distinct clusters: one African, one Asian and two European prophage populations. There was evidence of recombination within the genomes of some prophages, which resulted in genome mosaics composed of different populations, which may yield additional H. pylori phenotypes.