43 research outputs found

    Inferring Proteolytic Processes from Mass Spectrometry Time Series Data Using Degradation Graphs

    Get PDF
    Background: Proteases play an essential part in a variety of biological processes. Besides their importance under healthy conditions, they are also known to play a crucial role in complex diseases such as cancer. In recent years, it has been shown that not only the fragments produced by proteases but also their dynamics, especially ex vivo, can serve as biomarkers. So far, however, only a few approaches have been taken to explicitly model the dynamics of proteolysis in the context of mass spectrometry. Results: We introduce a new concept to model proteolytic processes, the degradation graph. The degradation graph is an extension of the cleavage graph, a data structure to reconstruct and visualize the proteolytic process. In contrast to previous approaches, we extend the model to incorporate endoproteolytic processes and present a method to construct a degradation graph from mass spectrometry time series data. Based on a degradation graph and the intensities extracted from the mass spectra, it is possible to estimate the reaction rates of the underlying processes. We further suggest a score to rate different degradation graphs by their ability to explain the observed data. This score is used in an iterative heuristic to improve the structure of the initially constructed degradation graph. Conclusion: We show that the proposed method is able to recover all degraded and generated peptides, the underlying reactions, and the reaction rates of proteolytic processes based on mass spectrometry time series data. We use simulated and real data to demonstrate that a given process can be reconstructed even in the presence of extensive noise, isobaric signals, and false identifications. While the model is currently only validated on peptide data, it is also applicable to proteins, as long as the necessary time series data can be produced.
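    The rate-estimation step can be made concrete with a small sketch (ours, not the paper's implementation): a toy degradation graph is translated into first-order ordinary differential equations, and the rate constants are fitted to observed intensity time series by least squares. The graph, species names, and all numbers below are hypothetical.

        # Minimal sketch: estimate reaction rates of a toy degradation graph
        # from intensity time series. Illustrative only, not the paper's code.
        import numpy as np
        from scipy.integrate import odeint
        from scipy.optimize import least_squares

        # Toy graph: peptide A is cleaved into B and C; B is cleaved into D.
        # Each edge is one proteolytic reaction with an unknown rate constant.
        edges = [("A", ("B", "C")), ("B", ("D",))]
        species = ["A", "B", "C", "D"]
        idx = {s: i for i, s in enumerate(species)}

        def dydt(y, t, k):
            d = np.zeros(len(species))
            for (src, products), rate in zip(edges, k):
                flux = rate * y[idx[src]]  # first-order kinetics
                d[idx[src]] -= flux
                for p in products:
                    d[idx[p]] += flux
            return d

        # Synthetic "observed" intensities generated from known true rates.
        t_obs = np.linspace(0.0, 10.0, 6)
        y0 = [1.0, 0.0, 0.0, 0.0]
        y_obs = odeint(dydt, y0, t_obs, args=([0.8, 0.3],))

        def residuals(k):
            return (odeint(dydt, y0, t_obs, args=(k,)) - y_obs).ravel()

        fit = least_squares(residuals, x0=[0.1, 0.1], bounds=(0.0, np.inf))
        print(dict(zip(["A->B+C", "B->D"], fit.x)))  # recovers ~0.8 and ~0.3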

    From the desktop to the grid: scalable bioinformatics via workflow conversion

    Get PDF
    Background: Reproducibility is one of the tenets of the scientific method. Scientific experiments often comprise complex data flows, selection of adequate parameters, and analysis and visualization of intermediate and end results. Breaking down the complexity of such experiments into the joint collaboration of small, repeatable, well-defined tasks, each with well-defined inputs, parameters, and outputs, offers immediate benefits such as identifying bottlenecks and pinpointing sections that could benefit from parallelization. Workflows rest upon the notion of splitting complex work into the joint effort of several manageable tasks. There are several engines that give users the ability to design and execute workflows. Each engine was created to address the problems of a specific community, so each one has its advantages and shortcomings. Furthermore, not all features of all workflow engines are royalty-free, an aspect that could potentially drive away members of the scientific community. Results: We have developed a set of tools that enables the scientific community to benefit from workflow interoperability. We developed a platform-free, structured representation of the parameters, inputs, and outputs of command-line tools in so-called Common Tool Descriptor documents. We have also overcome the shortcomings and combined the features of two royalty-free workflow engines with a substantial user community: the Konstanz Information Miner, an engine which we see as a formidable workflow editor, and the Grid and User Support Environment, a web-based framework able to interact with several high-performance computing resources. We have thus created a free and highly accessible way to design workflows on a desktop computer and execute them on high-performance computing resources. Conclusions: Our work will not only reduce the time spent on designing scientific workflows, but also make executing workflows on remote high-performance computing resources more accessible to technically inexperienced users. We strongly believe that our efforts not only decrease the turnaround time to obtain scientific results but also have a positive impact on reproducibility, thus elevating the quality of the obtained scientific results.
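    To illustrate the Common Tool Descriptor idea, here is a rough sketch of reading a tool's interface from such a document. The element and attribute names below are simplified assumptions chosen for illustration, not the actual CTD schema.

        # Sketch: parsing a simplified, hypothetical tool descriptor.
        # Element/attribute names are illustrative, not the real CTD schema.
        import xml.etree.ElementTree as ET

        CTD_EXAMPLE = """
        <tool name="PeakPicker" version="1.0">
          <description>Detects peaks in raw spectra.</description>
          <parameters>
            <param name="in" type="input-file" description="Raw input file"/>
            <param name="out" type="output-file" description="Peak list"/>
            <param name="signal_to_noise" type="float" value="1.0"/>
          </parameters>
        </tool>
        """

        root = ET.fromstring(CTD_EXAMPLE)
        print(f"Tool: {root.get('name')} {root.get('version')}")
        for p in root.iter("param"):
            print(f"  {p.get('name')}: type={p.get('type')}, "
                  f"default={p.get('value', '-')}")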

    Fission cross section measurements for 240Pu, 242Pu

    Get PDF
    This report constitutes Deliverable 1.5 of the ANDES project (EURATOM contract FP7-249671), produced under Task 3, "High accuracy measurements for fission", of Work Package 1, "Measurements for advanced reactor systems". The deliverable provides evidence of the successful completion of the objectives of Task 3. (JRC.D.4 - Standards for Nuclear Safety, Security and Safeguards)

    Building ProteomeTools based on a complete synthetic human proteome.

    Get PDF
    We describe ProteomeTools, a project building molecular and digital tools from the human proteome to facilitate biomedical research. Here we report the generation and multimodal liquid chromatography-tandem mass spectrometry analysis of >330,000 synthetic tryptic peptides representing essentially all canonical human gene products, and we exemplify the utility of these data in several applications. The resource (available at http://www.proteometools.org) will be extended to >1 million peptides, and all data will be shared with the community via ProteomicsDB and ProteomeXchange.
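    For readers unfamiliar with the term, tryptic peptides are the fragments produced by digesting proteins with trypsin, which, to a first approximation, cleaves after lysine (K) or arginine (R) unless the next residue is proline (P). A minimal in-silico digest illustrating that rule (a simplification that ignores missed cleavages; the toy sequence is made up):

        # Sketch: simplified in-silico tryptic digest using the common
        # "cleave after K or R, but not before P" rule. Real digests also
        # involve missed cleavages and further exceptions.
        import re

        def tryptic_digest(sequence: str) -> list[str]:
            # Zero-width split after K/R unless followed by P; drop the
            # empty string that re.split yields at the sequence end.
            return [p for p in re.split(r"(?<=[KR])(?!P)", sequence) if p]

        print(tryptic_digest("MKTAYIAKQRPGLFK"))
        # -> ['MK', 'TAYIAK', 'QRPGLFK']  (no cut between R and P)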

    qcML: an exchange format for quality control metrics from mass spectrometry experiments.

    Get PDF
    Quality control is increasingly recognized as a crucial aspect of mass spectrometry-based proteomics. Several recent papers discuss relevant parameters for quality control and present applications to extract these from instrumental raw data. What has been missing, however, is a standard data exchange format for reporting these performance metrics. We therefore developed the qcML format, an XML-based standard that follows the design principles of the related mzML, mzIdentML, mzQuantML, and TraML standards from the HUPO-PSI (Proteomics Standards Initiative). In addition to the XML format, we also provide tools for the calculation of a wide range of quality metrics, as well as a database format and interconversion tools, so that existing LIMS can easily add relational storage of the quality control data to their existing schema. Here we describe the qcML specification, along with possible use cases and an illustrative example of the subsequent analysis possibilities. All information about qcML is available at http://code.google.com/p/qcml.
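    As a flavor of what such metric-calculation tools produce, the sketch below computes one simple run-level metric from mock data and serializes it as XML. The element and attribute names are simplified placeholders chosen for illustration; the actual schema is defined in the qcML specification.

        # Sketch: compute one simple QC metric and emit it as XML.
        # Element/attribute names are placeholders, not the official schema.
        import statistics
        import xml.etree.ElementTree as ET

        # Mock per-scan total ion current values from a hypothetical run.
        tic_values = [1.2e6, 9.8e5, 1.1e6, 1.3e6, 8.7e5]

        run = ET.Element("runQuality", ID="run_1")
        ET.SubElement(run, "qualityParameter", name="median TIC",
                      value=str(statistics.median(tic_values)))
        print(ET.tostring(run, encoding="unicode"))
        # -> <runQuality ID="run_1"><qualityParameter name="median TIC" ...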

    Determining Proteolytic Processes Based on Mass Spectrometry Time Series

    No full text
    Proteolysis, the catalyzed hydrolysis of peptide bonds, is an important post-translational modification with a significant influence on the life cycle of proteins and peptides. It is involved in numerous biological processes such as apoptosis, cell cycle progression, and blood coagulation. More than 500 genes have been annotated as proteases, the enzymes catalyzing the proteolytic cleavage of proteins and peptides, but many of them are still insufficiently characterized. A profound understanding of proteolytic processes is therefore essential for the detailed analysis of many biological processes. Furthermore, proteolysis is associated with multiple complex diseases such as cancer and Alzheimer's disease, and is known to be involved in infection with HIV. Beyond its implication in biological processes, proteolysis can also be utilized for diagnostic and treatment purposes. Proteases are established drug targets, and their potential as biomarkers was postulated in 2006 by Villanueva et al. In this thesis we present a novel approach to the characterization of proteolytic processes using mass spectrometry data. We utilize the qualitative and quantitative information in the mass spectra to construct a model, the degradation graph, containing all involved peptides as well as the individual proteolytic reactions that connect them. We further propose a transformation of the degradation graph into a mathematical model that can be used, in combination with the mass spectrometry data, to estimate the rate constants of the individual reactions inside the degradation graph. Additionally, we developed a score to rate different degradation graphs with respect to their ability to explain the observed mass spectrometry data. We use this score to iteratively improve the structure of an initially constructed degradation graph, so as to account for errors made during its construction. While more and more mass spectrometry data are produced and made publicly available, there is a lack of well-annotated, so-called gold standard or ground truth datasets. Such datasets are required for the thorough benchmarking of novel algorithms and newly developed software, and the problem grows as the experimental setups and scientific questions in computational mass spectrometry become more complex. We therefore present MSSimulator, a comprehensive simulator for mass spectrometry data. Although using simulated data does not remove the need for testing on real datasets, it eases algorithm development and benchmarking, because the availability of ground truth data allows results to be compared and validated more effectively. MSSimulator is currently the most comprehensive simulator for mass spectrometry data. It supports different types of experimental setups (e.g. labeled and label-free), simulation of tandem mass spectra, and numerous options to reflect different experimental conditions such as noise, chromatographic conditions, or instrument type. It produces several levels of ground truth, from the simulated raw data, to feature and peak locations, to relational information (e.g. grouping of charge states or labeled pairs). With the data generated by MSSimulator we benchmarked several existing applications for the analysis of mass spectrometry data, as well as our own approach for the analysis of proteolytic processes.
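    The ground-truth idea behind MSSimulator can be illustrated with a toy sketch (not MSSimulator's actual model): signals for known analytes are placed at known positions, noise is added, and the noiseless truth is kept so downstream algorithms can be scored against it. All positions and intensities below are made up.

        # Toy sketch of simulation with retained ground truth. Real simulators
        # additionally model isotope patterns, charge states, retention time, etc.
        import numpy as np

        rng = np.random.default_rng(42)
        mz_axis = np.linspace(400.0, 500.0, 2000)

        # Ground truth: (m/z, intensity) of three hypothetical peptide signals.
        truth = [(420.7, 1.0e5), (451.2, 4.0e4), (488.9, 7.5e4)]

        clean = np.zeros_like(mz_axis)
        for mz, intensity in truth:
            clean += intensity * np.exp(-0.5 * ((mz_axis - mz) / 0.05) ** 2)

        noisy = clean + rng.normal(0.0, 2.0e3, clean.shape)  # detector noise

        # `noisy` is what an analysis tool would see; `truth` and `clean` are
        # the ground truth its output can be validated against.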
