41 research outputs found

    The HUPO-PSI standardized spectral library format

    More and more proteomics datasets are becoming available in public repositories. The knowledge embedded in these datasets can be used to improve peptide identification workflows. Spectral library searching provides a straightforward method to boost identification rates using previously identified spectra. Alternatively, machine learning methods can learn from these spectra to accurately predict the behavior of peptides in a liquid chromatography-mass spectrometry system. At the basis of both approaches are spectral libraries: unified collections of previously identified spectra. Organizations and projects such as the National Institute of Standards and Technology (NIST), the Global Proteome Machine, PeptideAtlas, PRIDE Archive and MassIVE have all compiled spectral libraries for a multitude of species and experimental setups. A large obstacle, however, is that each organization provides libraries in a different file format. At the software level the problem propagates (if not expands), as different software tools require different file formats. The solution is a standardized spectral library format that is sufficiently flexible to meet all users' demands, but that is also standardized enough to be usable across environments and software packages. This balance is achieved by setting up a standardized framework and a controlled vocabulary with metadata terms, and by allowing the format to be represented in different forms, such as plain text, JSON and HDF. So far, the required (and optional) metadata has been compiled and added to the PSI-MS ontology, and versions of the text and JSON representations have been drafted. The tabular and HDF representations of the format are in development, as well as converters and validators in various programming languages.

    qcML: an exchange format for quality control metrics from mass spectrometry experiments.

    Quality control is increasingly recognized as a crucial aspect of mass spectrometry-based proteomics. Several recent papers discuss relevant parameters for quality control and present applications to extract these from the instrumental raw data. What has been missing, however, is a standard data exchange format for reporting these performance metrics. We therefore developed the qcML format, an XML-based standard that follows the design principles of the related mzML, mzIdentML, mzQuantML, and TraML standards from the HUPO-PSI (Proteomics Standards Initiative). In addition to the XML format, we also provide tools for the calculation of a wide range of quality metrics as well as a database format and interconversion tools, so that existing LIMS systems can easily add relational storage of the quality control data to their existing schema. Here we describe the qcML specification, along with possible use cases and an illustrative example of the subsequent analysis possibilities. All information about qcML is available at http://code.google.com/p/qcml

    Towards a History of Mass Violence in the Etat Indépendant du Congo, 1885-1908

    The present article provides an up-to-date scholarly introduction to mass violence in the Etat Indépendant du Congo (Congo Free State, EIC). Its aims are twofold: to offer a point of access to the extensive literature and historical debates on the subject, and to make the case for exchanging the currently prevalent top-down narrative, with its excessive focus on King Leopold's character and motives, for one which considers the EIC's culture of violence as a multicausal, broadly based and deeply ingrained social phenomenon. The argument is divided into five sections. Following a general outline of the EIC's violent system of administration, I discuss its social and demographic impact (and the controversy which surrounds it) to bring out the need for more regionally focused and context-sensitive studies. The dispute surrounding demographics demonstrates that what is fundamentally at stake is the place the EIC's extreme violence should occupy in the history of European ‘modernity’. Since approaches which hinge on Leopoldian exceptionalism are particularly unhelpful in clarifying this issue, I pause to reflect on how such approaches came to dominate the distinct historiographical traditions which emerged in Belgium and abroad before moving on to a more detailed exploration of a selection of causes underlying the EIC's violent nature. While state actors remain in the limelight, I shift the focus from the state as a singular, normative agent, towards the existence of an extremely violent society in which various individuals and social groups within and outside of the state apparatus committed violent acts for multiple reasons. As this argument is pitched at a high level of abstraction, I conclude with a discussion of available source material with which it can be further refined and updated.

    A Comprehensive Evaluation of Consensus Spectrum Generation Methods in Proteomics.

    Spectrum clustering is a powerful strategy to minimize redundant mass spectra by grouping them based on similarity, with the aim of forming groups of mass spectra from the same repeatedly measured analytes. Each such group of near-identical spectra can be represented by its so-called consensus spectrum for downstream processing. Although several algorithms for spectrum clustering have been adequately benchmarked and tested, the influence of the consensus spectrum generation step is rarely evaluated. Here, we present an implementation and benchmark of common consensus spectrum algorithms, including spectrum averaging, spectrum binning, the most similar spectrum, and the best-identified spectrum. We have analyzed diverse public data sets using two different clustering algorithms (spectra-cluster and MaRaCluster) to evaluate how the consensus spectrum generation procedure influences downstream peptide identification. The BEST and BIN methods were found to be the most reliable methods for consensus spectrum generation, including for data sets with post-translational modifications (PTM) such as phosphorylation. All source code and data of the present study are freely available on GitHub at https://github.com/statisticalbiotechnology/representative-spectra-benchmark
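
    To make the spectrum binning idea concrete, the following is a minimal sketch of one way a binned consensus spectrum could be computed for a single cluster. It is not taken from the benchmark repository: the function name `bin_consensus`, the 0.02 m/z bin width, and the rule that a peak must appear in at least half of the cluster's spectra are all illustrative assumptions.

```python
from collections import defaultdict


def bin_consensus(spectra, bin_width=0.02, min_fraction=0.5):
    """Build a binned consensus spectrum for one cluster (illustrative sketch).

    spectra: list of spectra, each a list of (mz, intensity) tuples.
    bin_width and min_fraction are assumed defaults, not published settings.
    Returns a list of (mz, intensity) tuples sorted by m/z.
    """
    if not spectra:
        return []

    # bin index -> peaks contributed by the member spectra (at most one each)
    bins = defaultdict(list)
    for spectrum in spectra:
        best_in_bin = {}
        for mz, intensity in spectrum:
            idx = int(mz // bin_width)
            # Keep only the most intense peak per bin within a spectrum.
            if idx not in best_in_bin or intensity > best_in_bin[idx][1]:
                best_in_bin[idx] = (mz, intensity)
        for idx, peak in best_in_bin.items():
            bins[idx].append(peak)

    n_spectra = len(spectra)
    consensus = []
    for idx in sorted(bins):
        peaks = bins[idx]
        # Drop bins supported by too few spectra in the cluster.
        if len(peaks) / n_spectra < min_fraction:
            continue
        total = sum(intensity for _, intensity in peaks)
        if total > 0:
            # Intensity-weighted mean m/z of the contributing peaks.
            mean_mz = sum(mz * intensity for mz, intensity in peaks) / total
        else:
            mean_mz = sum(mz for mz, _ in peaks) / len(peaks)
        # Average intensity over all spectra in the cluster.
        consensus.append((mean_mz, total / n_spectra))
    return consensus


cluster = [
    [(100.001, 10.0), (200.0, 5.0)],
    [(100.003, 30.0)],
    [(100.002, 20.0), (300.0, 1.0)],
]
consensus = bin_consensus(cluster)
```

    In this toy cluster only the peak near m/z 100 is present in at least half of the spectra, so it is the only one retained. Fixed-width bins can split a peak that falls on a bin boundary, which is one reason real implementations may choose bin edges or tolerances differently.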
