Inferring Proteolytic Processes from Mass Spectrometry Time Series Data Using Degradation Graphs
Background: Proteases play an essential part in a variety of biological
processes. Besides their importance under healthy conditions, they are also
known to play a crucial role in complex diseases like cancer. In recent years,
it has been shown that not only the fragments produced by proteases but also
their dynamics, especially ex vivo, can serve as biomarkers. So far, however,
only a few approaches have been taken to explicitly model the dynamics of
proteolysis in the context of mass spectrometry. Results: We introduce a new
concept to model proteolytic processes: the degradation graph. The degradation
graph is an extension of the cleavage graph, a data structure to reconstruct
and visualize the proteolytic process. In contrast to previous approaches, we
extend the model to incorporate endoproteolytic processes and present a method
to construct a degradation graph from mass spectrometry time series data.
Based on a degradation graph and the intensities extracted from the mass
spectra, it is possible to estimate the reaction rates of the underlying
processes. We further suggest a score to rate different degradation graphs by
their ability to explain the observed data. This score is used in an iterative
heuristic to improve the structure of the initially constructed degradation
graph. Conclusion: We show that the proposed method is able to recover all
degraded and generated peptides, the underlying reactions, and the reaction
rates of proteolytic processes from mass spectrometry time series data. We use
simulated and real data to demonstrate that a given process can be
reconstructed even in the presence of extensive noise, isobaric signals, and
false identifications. While the model is currently validated only on peptide
data, it is also applicable to proteins, as long as the necessary time series
data can be produced.
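To make the concept concrete, here is a toy sketch of a degradation graph whose edges are cleavage reactions carrying rate constants, integrated as simple mass-action kinetics. All names and the integration scheme are illustrative assumptions, not the paper's implementation:

```python
# Hypothetical sketch: a degradation graph stores peptides as nodes and
# endoproteolytic reactions as edges ("parent is cleaved into two fragments").
class DegradationGraph:
    def __init__(self):
        self.reactions = []  # (parent, N-terminal fragment, C-terminal fragment, rate)

    def add_cleavage(self, parent, frag_n, frag_c, rate):
        self.reactions.append((parent, frag_n, frag_c, rate))

    def peptides(self):
        peps = set()
        for parent, frag_n, frag_c, _ in self.reactions:
            peps.update((parent, frag_n, frag_c))
        return sorted(peps)

    def simulate(self, conc0, t_end, dt=0.01):
        """Integrate mass-action kinetics with a simple forward Euler scheme."""
        conc = dict(conc0)
        for _ in range(int(t_end / dt)):
            delta = {p: 0.0 for p in conc}
            for parent, frag_n, frag_c, k in self.reactions:
                flux = k * conc.get(parent, 0.0)       # first-order cleavage
                delta[parent] -= flux * dt
                delta[frag_n] = delta.get(frag_n, 0.0) + flux * dt
                delta[frag_c] = delta.get(frag_c, 0.0) + flux * dt
            for p, d in delta.items():
                conc[p] = conc.get(p, 0.0) + d
        return conc

# One cleavage reaction: ABCDEF -> ABC + DEF at rate 0.5
g = DegradationGraph()
g.add_cleavage("ABCDEF", "ABC", "DEF", rate=0.5)
final = g.simulate({"ABCDEF": 1.0, "ABC": 0.0, "DEF": 0.0}, t_end=10.0)
```

Estimating the reaction rates would then amount to choosing the edge rates under which the simulated concentrations best reproduce the intensities observed in the time series; the score and iterative heuristic described above operate on this kind of structure.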
From the desktop to the grid: scalable bioinformatics via workflow conversion
Background: Reproducibility is one of the tenets of the scientific method.
Scientific experiments often comprise complex data flows, selection of
adequate parameters, and analysis and visualization of intermediate and end
results. Breaking down the complexity of such experiments into the joint
collaboration of small, repeatable, well-defined tasks, each with well-defined
inputs, parameters, and outputs, offers the immediate benefit of identifying
bottlenecks and pinpointing sections that could benefit from parallelization,
among others. Workflows rest upon the notion of splitting complex work into
the joint effort of several manageable tasks. There are several engines that
give users the ability to design and execute workflows. Each engine was
created to address certain problems of a specific community; therefore, each
one has its advantages and shortcomings. Furthermore, not all features of all
workflow engines are royalty-free, an aspect that could potentially drive away
members of the scientific community. Results: We have developed a set of tools
that enables the scientific community to benefit from workflow
interoperability. We developed a platform-free, structured representation of
the parameters, inputs, and outputs of command-line tools in so-called Common
Tool Descriptor documents. We have also overcome the shortcomings and combined
the features of two royalty-free workflow engines with a substantial user
community: the Konstanz Information Miner, an engine which we see as a
formidable workflow editor, and the Grid and User Support Environment, a
web-based framework able to interact with several high-performance computing
resources. We have thus created a free and highly accessible way to design
workflows on a desktop computer and execute them on high-performance computing
resources. Conclusions: Our work will not only reduce the time spent on
designing scientific workflows, but also make executing workflows on remote
high-performance computing resources more accessible to technically
inexperienced users. We strongly believe that our efforts not only decrease
the turnaround time to obtain scientific results but also have a positive
impact on reproducibility, thus elevating the quality of obtained scientific
results.
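As an illustration of what such a platform-free tool description could look like, the sketch below builds a minimal descriptor with Python's standard library. The element and attribute names are invented for illustration and do not follow the actual Common Tool Descriptor schema:

```python
import xml.etree.ElementTree as ET

# Minimal sketch of the idea behind a tool descriptor: a structured,
# engine-independent listing of a command-line tool's inputs, outputs,
# and parameters. Names here are illustrative, not the real CTD schema.
def build_descriptor(tool_name, version, params):
    tool = ET.Element("tool", name=tool_name, version=version)
    plist = ET.SubElement(tool, "parameters")
    for name, ptype, value in params:
        ET.SubElement(plist, "item", name=name, type=ptype, value=value)
    return ET.tostring(tool, encoding="unicode")

# Hypothetical peak-picking tool with two file arguments and one parameter.
xml_doc = build_descriptor(
    "PeakPicker", "1.0",
    [("in", "input-file", "spectra.mzML"),
     ("out", "output-file", "peaks.mzML"),
     ("signal_to_noise", "double", "1.0")],
)

# Any workflow engine can recover the same parameter list from the XML.
parsed = ET.fromstring(xml_doc)
names = [item.get("name") for item in parsed.iter("item")]
```

Because the descriptor is plain XML rather than an engine-specific configuration, the same document can drive node generation in one engine and job submission in another, which is the interoperability the abstract describes.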
Fission cross section measurements for 240Pu, 242Pu
This report comprises deliverable 1.5 of the ANDES project (EURATOM contract FP7-249671), part of Task 3, "High accuracy measurements for fission", of Work Package 1, "Measurements for advanced reactor systems". The deliverable provides evidence of a successful completion of the objectives of Task 3. JRC.D.4: Standards for Nuclear Safety, Security and Safeguards
Building ProteomeTools based on a complete synthetic human proteome.
We describe ProteomeTools, a project building molecular and digital tools from the human proteome to facilitate biomedical research. Here we report the generation and multimodal liquid chromatography-tandem mass spectrometry analysis of >330,000 synthetic tryptic peptides representing essentially all canonical human gene products, and we exemplify the utility of these data in several applications. The resource (available at http://www.proteometools.org) will be extended to >1 million peptides, and all data will be shared with the community via ProteomicsDB and ProteomeXchange.
qcML: an exchange format for quality control metrics from mass spectrometry experiments.
Quality control is increasingly recognized as a crucial aspect of mass spectrometry-based proteomics. Several recent papers discuss relevant parameters for quality control and present applications to extract these from the instrumental raw data. What has been missing, however, is a standard data exchange format for reporting these performance metrics. We therefore developed the qcML format, an XML-based standard that follows the design principles of the related mzML, mzIdentML, mzQuantML, and TraML standards from the HUPO-PSI (Proteomics Standards Initiative). In addition to the XML format, we also provide tools for the calculation of a wide range of quality metrics as well as a database format and interconversion tools, so that existing LIMS systems can easily add relational storage of the quality control data to their existing schema. We describe here the qcML specification, along with possible use cases and an illustrative example of the subsequent analysis possibilities. All information about qcML is available at http://code.google.com/p/qcml.
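The gist of such a format is machine-readable metrics attached to a run. The sketch below computes two simple metrics and serializes them as XML; the element names are simplified stand-ins, not the actual qcML schema:

```python
import statistics
import xml.etree.ElementTree as ET

# Illustrative only: report per-run QC metrics as XML. Element and
# attribute names are invented stand-ins for the real qcML vocabulary.
def qc_report(run_name, ppm_errors):
    run = ET.Element("runQuality", ID=run_name)
    ET.SubElement(run, "qualityMetric",
                  name="median precursor mass error (ppm)",
                  value=str(statistics.median(ppm_errors)))
    ET.SubElement(run, "qualityMetric",
                  name="spectrum count", value=str(len(ppm_errors)))
    return ET.tostring(run, encoding="unicode")

# Hypothetical mass errors extracted from one run's identifications.
report = qc_report("run_01", [1.2, -0.4, 0.8, 2.0, -1.1])
```

Storing such metrics in one agreed-upon exchange format, rather than in tool-specific logs, is what lets downstream LIMS and monitoring tools compare runs across instruments and labs.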
Determining Proteolytic Processes Based on Mass Spectrometry Time Series
Proteolysis, the catalyzed hydrolysis of peptide bonds, is an important
post-translational modification with a significant influence on the life
cycle of proteins and peptides. It is involved in numerous biological
processes, like apoptosis, cell cycle progression, or blood coagulation. More
than 500 genes have been annotated as proteases, the enzymes catalyzing
proteolytic cleavage of proteins and peptides, but many of them are still
insufficiently characterized. Hence, a profound understanding of proteolytic
processes is essential for a detailed analysis of many biological processes.
Furthermore, proteolysis is associated with multiple complex diseases like
cancer and Alzheimer's disease and is known to be involved in infection with
HIV. Beyond its implication in biological processes, proteolysis can also be
utilized for diagnostic and treatment purposes. Proteases, the enzymes
catalyzing proteolytic cleavage, are established drug targets, and their
potential as biomarkers was postulated in 2006 by Villanueva et al. In this
thesis we present a novel approach to the characterization of proteolytic
processes using mass spectrometry data. We utilize the qualitative and
quantitative information of the mass spectra to construct a model, the
degradation graph, containing all involved peptides as well as the individual
proteolytic reactions that connect them. We further propose a transformation
of the degradation graph into a mathematical model that can be utilized in
combination with the mass spectrometry data to estimate the rate constants of
the individual reactions inside the degradation graph. Additionally, we
developed a score that can be used to rate different degradation graphs with
respect to their ability to explain the observed mass spectrometry data. We
use this score to iteratively improve the structure of an initially
constructed degradation graph so as to account for errors made during its
construction. While more and more mass spectrometry data are produced and
made publicly available, there is a lack of well-annotated, so-called gold
standard or ground truth datasets. Such datasets are required for a thorough
benchmarking of novel algorithms and newly developed software. This problem
is growing as the experimental setups and scientific questions in
computational mass spectrometry become more and more complex. We therefore
present MSSimulator, a comprehensive simulator for mass spectrometry data.
Although using simulated data does not remove the need for testing on real
datasets, it eases algorithm benchmarking and development, due to the
availability of ground truth data, which enables us to compare and validate
results more effectively. MSSimulator is currently the most comprehensive
simulator for mass spectrometry data. It provides different types of
experimental setups (e.g. labeled and label-free setups), simulation of
tandem mass spectra, as well as numerous options to reflect different
experimental conditions like noise, chromatographic conditions, or instrument
type. It produces different levels of ground truth, from the simulated raw
data, to feature and peak locations, to relational information (e.g. grouping
of charge states or labeled pairs). With the data generated by MSSimulator we
benchmarked different existing applications for the analysis of mass
spectrometry data as well as our own approach for the analysis of proteolytic
processes.

Proteolysis, the hydrolysis of peptide bonds, is an important
post-translational modification that substantially influences the life cycle
of proteins and peptides. It plays a regulatory role in numerous biological
processes, such as cell cycle regulation, apoptosis, or blood coagulation.
More than 500 genes in the human genome have been annotated as proteases, the
enzymes that catalyze the proteolytic digestion of proteins and peptides;
nevertheless, many of them remain insufficiently studied to this day. A
better understanding of proteolytic processes, the complex cascades of
interacting proteases, is therefore a fundamental prerequisite for a detailed
analysis of biological processes. Proteolysis also plays an important role in
the development of complex diseases such as cancer and Alzheimer's disease
and in infection with HIV, and consequently influences both their diagnosis
and their treatment. Proteases are established drug targets, and their
potential as biomarkers was described in 2006 by Villanueva et al. In this
thesis we describe a new approach to the characterization of proteolytic
processes. We present a method that exploits the qualitative and quantitative
information in mass spectrometry data to construct a model, the degradation
graph, which contains all involved peptides as well as the proteolytic
reactions that connect them. In addition, we describe a transformation of the
degradation graph into a mathematical model that, together with the mass
spectrometry data, can be used to estimate the rate constants of the
individual proteolytic reactions. Furthermore, we developed a scoring scheme
for the degradation graph. It serves to compare different degradation graphs
with respect to their ability to explain the observed data. We used this
scoring scheme to iteratively improve initially constructed degradation
graphs in order to compensate for possible errors made during their
construction. In recent years, the amount of publicly available mass
spectrometry data has grown steadily. Nevertheless, there is still a lack of
well-annotated datasets, so-called gold standards. Such gold standards are
necessary to thoroughly test newly developed programs and algorithms and to
compare them with existing approaches. The increasing complexity of
scientific questions and experimental techniques further increases the need
for gold standards. To address this problem we developed MSSimulator, a
comprehensive simulator for mass spectrometry data. Although the use of
simulated data does not make validation on real data obsolete, it greatly
eases the development and testing of new methods, as well as comparison with
existing approaches. MSSimulator enables the simulation of different
experimental setups as well as the simulation of tandem mass spectrometry
data. It offers numerous options to adapt the generated data to one's own
experimental setup, for example with respect to noise, chromatographic
conditions, or resolution. MSSimulator produces several levels of ground
truth, from the simulated raw data, through the exact peptide and peak
positions, to grouping information, e.g. of different charge variants. In
this thesis we use the simulated data to compare different existing
applications for the analysis of mass spectrometry data and to develop and
validate our approach for the analysis of proteolytic processes.
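The benchmarking idea rests on knowing the ground truth exactly. The following toy sketch (with arbitrary parameters, unrelated to MSSimulator's actual signal models) generates a raw spectrum from a known peak list and checks a naive peak picker against the known positions:

```python
import math
import random

# Toy illustration of simulation-based benchmarking: build a raw spectrum
# from a known ("ground truth") peak list, add noise, then evaluate a
# peak-picking heuristic against the exactly known peak positions.
random.seed(42)

ground_truth = [(400.25, 1000.0), (512.30, 600.0), (723.41, 300.0)]  # (m/z, intensity)
sigma = 0.02   # Gaussian peak width in Th (arbitrary)
step = 0.01    # m/z sampling distance

mz_axis = [300.0 + i * step for i in range(50000)]  # 300..800 Th
spectrum = []
for mz in mz_axis:
    signal = sum(inten * math.exp(-0.5 * ((mz - pos) / sigma) ** 2)
                 for pos, inten in ground_truth)
    spectrum.append(signal + random.gauss(0.0, 5.0))  # detector noise

# Naive peak picker: local maxima above a noise threshold.
picked = [mz_axis[i] for i in range(1, len(spectrum) - 1)
          if spectrum[i] > 100
          and spectrum[i] > spectrum[i - 1]
          and spectrum[i] > spectrum[i + 1]]
```

Because every simulated signal stems from a known position and intensity, recall and positional accuracy of the picker can be scored exactly, which is impossible with real data where the true peak list is unknown.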