288 research outputs found
Present and future of proteomics data curation at the PRIDE database
Significant progress has been made in improving the accessibility and utility of the large amounts of generated high-throughput proteomics data by the introduction of publicly available proteomics repositories. One such repository is PRIDE (the ‘PRoteomics IDEntifications’ database, http://www.ebi.ac.uk/pride). PRIDE stores mass spectrometry related data, including peptide and protein identifications, mass spectra and valuable additional metadata.

At present, data curation in PRIDE is limited to data submission support. All submissions must be made in the PRIDE XML format. Mass spectrometry derived data are very heterogeneous in terms of experimental approaches, instrumentation, data formats, etc. Converting such varied data to PRIDE XML is therefore far from trivial and can be very time consuming, since tailored submission pipelines must often be constructed. However, the situation has improved now that tools such as PRIDE Converter (http://code.google.com/p/pride-converter) are available for submitters to convert their data to PRIDE XML.

In the near future, data curation in PRIDE will be significantly extended. High-quality data will be included in a new repository called PRIDE-plus. First, it will be necessary to create a set of minimal requirement rules to decide which datasets can be included in PRIDE-plus. Then, the design and implementation of new curation tools to perform data quality assessment will be essential. It will also be necessary to investigate how these new curation and annotation tasks can be automated.
The HUPO Proteomics Standards Initiative Meeting: Towards Common Standards for Exchanging Proteomics Data
The Proteomics Standards Initiative (PSI) aims to define community standards for data representation in proteomics and to facilitate data comparison, exchange
and verification. Initially the fields of protein-protein interactions (PPI) and mass
spectroscopy have been targeted and the inaugural meeting of the PSI addressed the
questions of data storage and exchange in both of these areas. The PPI group rapidly
reached consensus as to the minimum requirements for a data exchange model; an
XML draft is now being produced. The mass spectroscopy group have achieved major
advances in the definition of a required data model and working groups are currently
taking these discussions further. A further meeting is planned in January 2003 to
advance both these projects.
The ESF Programme on Functional Genomics Workshop on ‘Data Integration in Functional Genomics: Application to Biological Pathways’
We report from the second ESF Programme on Functional Genomics workshop on
Data Integration, which covered topics including the status of biological pathways
databases in existing consortia; pathways as part of bioinformatics infrastructures;
design, creation and formalization of biological pathways databases; generating
and supporting pathway data and interoperability of databases with other external
databases and standards. Key issues emerging from the discussions were the need for
continued funding to cover maintenance and curation of databases, the importance
of quality control of the data in these resources, and efforts to facilitate the exchange
of data and to ensure the interoperability of databases.
The Ontology Lookup Service, a lightweight cross-platform tool for controlled vocabulary queries
BACKGROUND: With the vast amounts of biomedical data being generated by high-throughput analysis methods, controlled vocabularies and ontologies are becoming increasingly important to annotate units of information for ease of search and retrieval. Each scientific community tends to create its own locally available ontology. The interfaces to query these ontologies tend to vary from group to group. We saw the need for a centralized location to perform controlled vocabulary queries that would offer both a lightweight web-accessible user interface as well as a consistent, unified SOAP interface for automated queries. RESULTS: The Ontology Lookup Service (OLS) was created to integrate publicly available biomedical ontologies into a single database. All modified ontologies are updated daily. A list of currently loaded ontologies is available online. The database can be queried to obtain information on a single term or to browse a complete ontology using AJAX. Auto-completion provides a user-friendly search mechanism. An AJAX-based ontology viewer is available to browse a complete ontology or subsets of it. A programmatic interface is available to query the webservice using SOAP. The service is described by a WSDL descriptor file available online. A sample Java client to connect to the webservice using SOAP is available for download from SourceForge. All OLS source code is publicly available under the open source Apache Licence. CONCLUSION: The OLS provides a user-friendly single entry point for publicly available ontologies in the Open Biomedical Ontology (OBO) format. It can be accessed interactively or programmatically at
Accurate estimation of isoelectric point of protein and peptide based on amino acid sequences
Motivation: In any macromolecular polyprotic system - for example protein, DNA or RNA - the isoelectric point - commonly referred to as the pI - can be defined as the point of singularity in a titration curve, corresponding to the solution pH value at which the net overall surface charge - and thus the electrophoretic mobility - of the ampholyte sums to zero. Many modern analytical biochemistry and proteomics methods depend on the isoelectric point as a principal feature for protein and peptide characterization. Protein separation by isoelectric point is a critical part of 2-D gel electrophoresis, a key precursor of proteomics, where discrete spots can be digested in-gel, and proteins subsequently identified by analytical mass spectrometry. Peptide fractionation according to pI is also widely used in current proteomics sample preparation procedures prior to LC-MS/MS analysis. Accurate theoretical prediction of pI would therefore expedite such analysis. While such pI calculation is widely used, it remains largely untested, motivating our efforts to benchmark pI prediction methods. Results: Using data from the database PIP-DB and one publicly available dataset as our reference gold standard, we have undertaken the benchmarking of pI calculation methods. We find that methods vary in their accuracy and are highly sensitive to the choice of basis set. The machine-learning algorithms, especially the SVM-based algorithm, showed superior performance when studying peptide mixtures. In general, learning-based pI prediction methods (such as Cofactor, SVM and Branca) require a large training dataset, and their resulting performance depends strongly on the quality of that data. In contrast with iterative methods, machine-learning algorithms have the advantage of being able to add new features to improve the accuracy of prediction.
Availability and Implementation: The software and data are freely available at https://github.com/ypriverol/pIR. Supplementary information: Supplementary data are available at Bioinformatics online.
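To make the definition of pI in this abstract concrete, here is a minimal Python sketch of the iterative approach: it sums Henderson-Hasselbalch charge contributions for each ionisable group and bisects on pH until the net charge is close to zero. The pKa values and the function names `net_charge` and `isoelectric_point` are illustrative assumptions; they are not taken from the pIR package or from any of the benchmarked methods, and published pKa scales differ from one another.

```python
# Illustrative pKa values only (an assumption): published scales differ,
# and these are not the values used by any specific benchmarked method.
PKA_POSITIVE = {"N_term": 9.0, "K": 10.5, "R": 12.5, "H": 6.0}
PKA_NEGATIVE = {"C_term": 2.0, "D": 3.9, "E": 4.1, "C": 8.3, "Y": 10.1}


def net_charge(sequence: str, ph: float) -> float:
    """Net charge of a peptide at a given pH (Henderson-Hasselbalch)."""
    # Terminal groups: the N-terminus carries positive charge when
    # protonated, the C-terminus negative charge when deprotonated.
    charge = 1.0 / (1.0 + 10 ** (ph - PKA_POSITIVE["N_term"]))
    charge -= 1.0 / (1.0 + 10 ** (PKA_NEGATIVE["C_term"] - ph))
    for residue in sequence.upper():
        if residue in PKA_POSITIVE:
            charge += 1.0 / (1.0 + 10 ** (ph - PKA_POSITIVE[residue]))
        elif residue in PKA_NEGATIVE:
            charge -= 1.0 / (1.0 + 10 ** (PKA_NEGATIVE[residue] - ph))
    return charge


def isoelectric_point(sequence: str, tol: float = 1e-4) -> float:
    """Bisect on pH in [0, 14] until the bracket is narrower than tol."""
    low, high = 0.0, 14.0
    # Net charge decreases monotonically with pH, so bisection converges.
    while high - low > tol:
        mid = (low + high) / 2.0
        if net_charge(sequence, mid) > 0:
            low = mid   # still positively charged: pI lies at higher pH
        else:
            high = mid  # negatively charged (or zero): pI lies at lower pH
    return (low + high) / 2.0
```

Calling `isoelectric_point("ACDEFGHIK")` returns a pH estimate under these assumed pKa values. Substituting a different pKa table changes the result, which is consistent with the abstract's observation that such methods are sensitive to the chosen basis set.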
OntoDas - a tool for facilitating the construction of complex queries to the Gene Ontology
Background: Ontologies such as the Gene Ontology can enable the construction of complex queries over biological information in a conceptual way; however, existing systems to do this are too technical. Within the biological domain there is an increasing need for software that facilitates the flexible retrieval of information. OntoDas aims to fulfil this need by allowing the definition of queries by selecting valid ontology terms. Results: OntoDas is a web-based tool that uses information visualisation techniques to provide an intuitive, interactive environment for constructing ontology-based queries against the Gene Ontology Database. Both a comprehensive use case and the interface itself were designed in a participatory manner by working with biologists to ensure that the interface matches the way biologists work. OntoDas was further tested with a separate group of biologists and refined based on their suggestions. Conclusion: OntoDas provides a visual and intuitive means for constructing complex queries against the Gene Ontology. It was designed with the participation of biologists and compares favourably with similar tools. It is available at http://ontodas.nbn.ac.za
The Ontology Lookup Service: bigger and better
The Ontology Lookup Service (OLS; http://www.ebi.ac.uk/ols) has been providing several means to query, browse and navigate biomedical ontologies and controlled vocabularies since it first went into production 4 years ago, and usage statistics indicate that it has become a heavily accessed service with millions of hits monthly. The volume of data available for querying has increased 7-fold since its inception. OLS functionality has been integrated into several high-usage databases and data entry tools. Improvements in the data model and loaders, as well as interface enhancements, have made the OLS easier to use and able to capture more annotations from the source data. In addition, newly released software packages now provide easy means to fully integrate OLS functionality in external applications.
easyDAS: Automatic creation of DAS servers
Background: The Distributed Annotation System (DAS) has proven to be a successful way to publish and share
biological data. Although there are more than 750 active registered servers from around 50 organizations, setting
up a DAS server comprises a fair amount of work, making it difficult for many research groups to share their
biological annotations. Given the clear advantage that the generalized sharing of relevant biological data offers the
research community, it would be desirable to facilitate the sharing process.
Results: Here we present easyDAS, a web-based system enabling anyone to publish biological annotations with
just a few clicks. The system, available at http://www.ebi.ac.uk/panda-srv/easydas, is capable of reading different
standard data file formats, processing the data and creating a new publicly available DAS source in a completely
automated way. The created sources are hosted on the EBI systems and can take advantage of its high storage
capacity and network connection, freeing the data provider from any network management work. easyDAS is an
open source project under the GNU LGPL license.
Conclusions: easyDAS is an automated DAS source creation system which can help many researchers in sharing
their biological data, potentially increasing the amount of relevant biological data available to the scientific
community.
Progress in Establishing Common Standards for Exchanging Proteomics Data: The Second Meeting of the HUPO Proteomics Standards Initiative
The Proteomics Standards Initiative (PSI) aims to define community standards for data representation in proteomics and to facilitate data comparison, exchange and
verification. Rapid progress has been made in the development of common standards
for data exchange in the fields of both mass spectrometry and protein-protein interactions
since the first PSI meeting [1]. Both hardware and software manufacturers
have agreed to work to ensure that a proteomics-specific extension is created for the
emerging ASTM mass spectrometry standard and the data model for a proteomics
experiment has advanced significantly. The Protein-Protein Interactions (PPI) group
expects to publish the Level 1 PSI data exchange format for protein-protein interactions
by early summer this year, and discussion as to the additional content of Level
2 has been initiated.