The introduction of publicly available proteomics repositories has significantly improved the accessibility and utility of the large volumes of high-throughput proteomics data now being generated. One such repository is PRIDE (the ‘PRoteomics IDEntifications’ database, "http://www.ebi.ac.uk/pride":http://www.ebi.ac.uk/pride). PRIDE stores mass spectrometry-related data, including peptide and protein identifications, mass spectra and valuable additional metadata.

At present, data curation in PRIDE is limited to data submission support. All submissions must be made in the PRIDE XML format. Mass spectrometry-derived data is very heterogeneous in terms of experimental approaches, instrumentation, data formats, etc. Converting such diverse data to PRIDE XML is therefore far from trivial and can be very time-consuming, since tailored submission pipelines must often be constructed. However, the situation has improved now that new tools such as PRIDE Converter ("http://code.google.com/p/pride-converter":http://code.google.com/p/pride-converter) are available for submitters to convert their data to PRIDE XML.

In the near future, data curation in PRIDE will be significantly extended. High-quality data will be included in a new repository called PRIDE-plus. First, a set of minimal requirement rules must be created to decide which datasets can be included in PRIDE-plus. Then, the design and implementation of new curation tools to perform data quality assessment will be essential. Research into the automation of these new curation and annotation tasks will also be necessary.