Software platforms for quantitative proteomics
In recent years, it has become obvious that mRNA expression does not always
correlate with protein expression. It seems that a full
understanding of the complexity of life can only be obtained
by examining abundances of proteins under varying conditions.
Accurate measurement of these expression values is crucial.
This field of research also requires new computational efforts since the
data, often from mass spectrometry experiments, is very complex.
We present two academic software platforms that offer means
to reduce, analyse and compare protein expression data gained from
liquid chromatography coupled with mass spectrometry. We outline their methodology
and compare them to our own project, OpenMS,
which is currently being developed in our research
group at the Free University Berlin in collaboration
with the Kohlbacher group at Tuebingen University.
OpenMS - A Framework for Quantitative HPLC/MS-Based Proteomics
In the talk we describe the freely available software library OpenMS which is
currently under development at the Freie Universität Berlin and the
Eberhard Karls Universität Tübingen. We give an overview of the goals and
problems in differential proteomics with HPLC and then describe in detail the
implemented approaches for signal processing, peak detection and data
reduction currently employed in OpenMS. After this we describe methods to
identify the differential expression of peptides and propose strategies to avoid MS/MS identification of peptides of interest. We give an overview of the
capabilities and design principles of OpenMS and demonstrate its ease of use.
Finally we describe projects in which OpenMS will be or was already deployed
and thereby demonstrate its versatility.
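The signal-processing and peak-detection steps mentioned above can be illustrated with a minimal sketch. This is not OpenMS code; the smoothing window, noise threshold and toy spectrum are invented for illustration. The idea is simply to smooth the raw intensity trace and keep local maxima that rise above the noise floor.

```python
def moving_average(signal, window=3):
    """Smooth a 1-D intensity trace with a centred moving average."""
    half = window // 2
    out = []
    for i in range(len(signal)):
        chunk = signal[max(0, i - half):i + half + 1]
        out.append(sum(chunk) / len(chunk))
    return out

def pick_peaks(mz, intensity, threshold=5.0):
    """Return (m/z, raw intensity) pairs at local maxima above the threshold."""
    s = moving_average(intensity)
    peaks = []
    for i in range(1, len(s) - 1):
        if s[i] > threshold and s[i] > s[i - 1] and s[i] >= s[i + 1]:
            peaks.append((mz[i], intensity[i]))
    return peaks

# Toy spectrum: flat background with two clear peaks.
mz = [400.0 + 0.1 * i for i in range(11)]
intensity = [1, 1, 2, 30, 2, 1, 1, 25, 3, 1, 1]
print(pick_peaks(mz, intensity))
```

Real MS peak pickers are considerably more involved (baseline correction, peak-shape fitting, centroiding), but the smooth-then-threshold skeleton is the same.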
Statistical quality assessment and outlier detection for liquid chromatography-mass spectrometry experiments
Background: Quality assessment methods that are commonplace in engineering and industrial production are not widespread in large-scale proteomics experiments. Yet modern technologies such as multi-dimensional liquid chromatography coupled to mass spectrometry (LC-MS) produce large quantities of proteomic data. These data are prone to measurement errors and reproducibility problems, so automatic quality assessment and control become increasingly important. Results: We propose a methodology to assess the quality and reproducibility of data generated in quantitative LC-MS experiments. We introduce quality descriptors that capture different aspects of the quality and reproducibility of LC-MS data sets. Our method is based on the Mahalanobis distance and a robust principal component analysis. Conclusion: We evaluate our approach on several data sets of different complexities and show that we are able to precisely detect LC-MS runs of poor signal quality in large-scale studies.
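The combination of per-run quality descriptors and robust statistics described above can be sketched as follows. This is not the paper's method (which uses Mahalanobis distances and a projection-pursuit robust PCA); as a simplified stand-in, it scores each run by robust z-scores of two invented descriptors, so that a single aberrant run cannot distort the centre and scale estimates.

```python
import numpy as np

# Hypothetical descriptor matrix: one row per LC-MS run, two invented quality
# descriptors per run (number of detected features, median noise level).
# The last run is deliberately aberrant.
X = np.array([
    [1000.0, 2.1],
    [1020.0, 2.0],
    [ 990.0, 2.2],
    [1010.0, 1.9],
    [1005.0, 2.0],
    [ 400.0, 9.5],   # poor-quality run
])

def robust_z(X):
    """Robust z-scores: centre by the median, scale by the MAD."""
    med = np.median(X, axis=0)
    mad = np.median(np.abs(X - med), axis=0)   # median absolute deviation
    return (X - med) / (1.4826 * mad)          # 1.4826 makes the MAD consistent
                                               # with sigma for Gaussian data

def flag_outliers(X, cutoff=3.5):
    """Flag runs whose combined robust deviation exceeds the cutoff."""
    scores = np.sqrt((robust_z(X) ** 2).sum(axis=1))
    return np.where(scores > cutoff)[0]

print(flag_outliers(X))
```

Because median and MAD have a 50% breakdown point, the aberrant run stands out sharply, whereas a classical mean/covariance-based distance would be pulled toward it.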
OpenMS – An open-source software framework for mass spectrometry
Background: Mass spectrometry is an essential analytical technique for high-throughput analysis in proteomics and metabolomics. The development of new separation techniques, precise mass analyzers and experimental protocols is a very active field of research. This leads to more complex experimental setups yielding ever-increasing amounts of data. Consequently, analysis of the data is currently often the bottleneck for experimental studies. Although software tools for many data analysis tasks are available today, they are often hard to combine with each other or not flexible enough to allow for rapid prototyping of a new analysis workflow. Results: We present OpenMS, a software framework for rapid application development in mass spectrometry. OpenMS has been designed to be portable, easy to use and robust while offering a rich functionality ranging from basic data structures to sophisticated algorithms for data analysis. This has already been demonstrated in several studies. Conclusion: OpenMS is available under the Lesser GNU Public License (LGPL) from the project website at http://www.openms.de.
LC-MSsim – a simulation software for liquid chromatography mass spectrometry data
Background: Mass spectrometry coupled to liquid chromatography (LC-MS) is commonly used to analyze the protein content of biological samples in large-scale studies. The data resulting from an LC-MS experiment are huge, highly complex and noisy. Accordingly, the technique has sparked new developments in bioinformatics, especially in the fields of algorithm development, statistics and software engineering. In a quantitative label-free mass spectrometry experiment, crucial steps are the detection of peptide features in the mass spectra and the alignment of samples by correcting for shifts in retention time. At the moment, it is difficult to compare the plethora of algorithms for these tasks. So far, curated benchmark data exist only for peptide identification algorithms, but no data represent a ground truth for the evaluation of feature detection, alignment and filtering algorithms. Results: We present LC-MSsim, a simulation software for LC-ESI-MS experiments. It simulates ESI spectra on the MS level. It reads a list of proteins from a FASTA file and digests the protein mixture using a user-defined enzyme. The software creates an LC-MS data set using a predictor for the retention time of the peptides and a model for peak shapes and elution profiles of the mass spectral peaks. It also offers the possibility to add contaminants and to change the background noise level, and includes a model for the detectability of peptides in mass spectra. After the simulation, LC-MSsim writes the simulated data to mzData, a public XML format. The software also stores the positions (monoisotopic m/z and retention time) and ion counts of the simulated ions in separate files. Conclusion: LC-MSsim generates simulated LC-MS data sets and incorporates models for peak shapes and contaminations. Algorithm developers can match the results of feature detection and alignment algorithms against the simulated ion lists, and meaningful error rates can be computed. We anticipate that LC-MSsim will be useful to the wider community for benchmark studies and comparisons between computational tools.
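Two of the simulation steps described above, in-silico digestion and elution-profile modelling, can be sketched in a few lines. This is illustrative code, not LC-MSsim itself; the simplified trypsin rule (cleave after K or R unless followed by P) and the Gaussian elution shape are assumptions standing in for the software's more detailed models.

```python
import math

def tryptic_digest(sequence):
    """Cleave after K or R, except when followed by P (simplified trypsin rule)."""
    peptides, start = [], 0
    for i, aa in enumerate(sequence):
        if aa in "KR" and (i + 1 == len(sequence) or sequence[i + 1] != "P"):
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])  # C-terminal remainder
    return peptides

def elution_profile(rt_center, width, scan_times):
    """Relative ion counts of one peptide across retention-time scan points,
    modelled here as a Gaussian centred on the predicted retention time."""
    return [math.exp(-((t - rt_center) ** 2) / (2 * width ** 2))
            for t in scan_times]

print(tryptic_digest("MKWVTFISLLFLFSSAYSRGVFRRDTHK"))
```

A full simulator would additionally predict a retention time per peptide, superimpose isotope patterns and noise, and serialize everything to an open format such as mzData.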
Computational pan-genomics: status, promises and challenges
Many disciplines, from human genetics and oncology to plant breeding, microbiology and virology, commonly face the challenge of analyzing rapidly increasing numbers of genomes. In the case of Homo sapiens, the number of sequenced genomes will approach hundreds of thousands in the next few years. Simply scaling up established bioinformatics pipelines will not be sufficient for leveraging the full potential of such rich genomic data sets. Instead, novel, qualitatively different computational methods and paradigms are needed. We will witness the rapid extension of computational pan-genomics, a new sub-area of research in computational biology. In this article, we generalize existing definitions and understand a pan-genome as any collection of genomic sequences to be analyzed jointly or to be used as a reference. We examine already available approaches to construct and use pan-genomes, discuss the potential benefits of future technologies and methodologies and review open challenges from the vantage point of the above-mentioned biological disciplines. As a prominent example of a computational paradigm shift, we particularly highlight the transition from the representation of reference genomes as strings to representations as graphs. We outline how this and other challenges from different application domains translate into common computational problems, point out relevant bioinformatics techniques and identify open problems in computer science. With this review, we aim to increase awareness that a joint approach to computational pan-genomics can help address many of the problems currently faced in various domains.
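The string-to-graph transition highlighted above can be made concrete with a toy example: a tiny sequence graph in which two haplotypes share their flanking sequence and differ at a single variant site (a "bubble"). The dictionary representation and node labels are illustrative only, not any particular pan-genome tool's data model.

```python
# Nodes hold sequence fragments; nodes 2 and 3 are the alternative alleles
# of one SNP, so the graph stores both haplotypes without duplicating flanks.
nodes = {1: "ACGT", 2: "A", 3: "G", 4: "TTC"}
edges = {1: [2, 3], 2: [4], 3: [4], 4: []}

def enumerate_haplotypes(node, prefix=""):
    """Yield every sequence spelled by a path from `node` to a sink node."""
    seq = prefix + nodes[node]
    if not edges[node]:
        yield seq
    for nxt in edges[node]:
        yield from enumerate_haplotypes(nxt, seq)

print(sorted(enumerate_haplotypes(1)))
```

The linear reference would store only one of these strings; the graph keeps both while sharing the common parts, which is the core appeal of graph-based pan-genome references.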
Computational methods for Quantitative Peptide Mass Spectrometry
This thesis presents algorithms for the analysis of liquid chromatography-mass spectrometry (LC-MS) data. Mass spectrometry is a technology that can be used to determine the identities and abundances of the compounds in complex samples. In combination with liquid chromatography, it has become a popular method in the field of proteomics, the large-scale study of proteins and peptides in living systems. This area of research has gained a lot of interest in recent years since proteins control fundamental reactions in the cell. Consequently, a deeper knowledge of their function is expected to be crucial for the development of new drugs and the cure of diseases. The data sets obtained from an LC-MS experiment are large and highly complex. The outcome of such an experiment is called an LC-MS map. The map is a collection of mass spectra. They contain, among the signals of interest, a high amount of noise and other disturbances. That is why algorithms for the low-level processing of LC-MS data are becoming increasingly important. These algorithms are the focus of this text. Our novel contributions are threefold: first, we introduce SweepWavelet, an algorithm for the efficient detection and quantification of peptides from LC-MS data. The quantification of proteins and peptides using mass spectrometry is of high interest for biomedical research but also for the pharmaceutical industry since it is usually among the first steps in an LC-MS data analysis pipeline and all subsequent steps depend on its quality. Our approach was among the first to address this problem in a sound computational framework. It consists of three steps: first, we apply a tailored wavelet function that filters mass spectra for the isotope peaks of peptides. Second, we use a method inspired by the sweep-line paradigm which makes use of the redundant information in LC-MS data to determine mass, charge, retention time and abundance of all peptides. 
Finally, we apply a flexible peptide signal model to filter the extracted signals for false positives. The second part of this thesis deals with the benchmarking of LC-MS signal detection algorithms. This is a non-trivial task since it is difficult to establish a ground truth using real world samples: which sample compounds become visible in an LC-MS data set is not known in advance. To this end, we use annotated data and simulations to assess the performance of currently available algorithms. To simulate benchmark data, we developed a simulation software called LC-MSsim. It incorporates computational models for retention time prediction, peptide detectability, isotope pattern and elution peaks. Using this software, we can simulate all steps in an LC-MS experiment and obtain a list with the positions, charges and abundances of all peptide signals contained in the resulting LC-MS map. This gives us a ground truth against which we can match the results of a signal detection algorithm. In this thesis, we use it for the benchmarking of quantification algorithms but its scope is wider and it can also be used to evaluate other algorithms. To our knowledge, LC-MSsim is the first software that can simulate the full LC-MS data acquisition process. The third contribution of this thesis is a statistical framework for the quality assessment of quantitative LC-MS experiments. Whereas quality assessment and control are already widespread in the field of gene expression analysis, our work is the first to address this problem for LCMS data. We use methods from robust statistics to detect outlier LC-MS maps in large-scale quantitative experiments. Our approach introduces the notion of quality descriptors to derive an abstract representation of an LC-MS map and applies a robust principal component analysis based on projection pursuit. 
We show that it is sensible to use robust statistics for this problem and evaluate our method on simulated maps and on data from three real-world LC-MS studies.
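The sweep-line idea from the second step above can be sketched as follows. This is not the thesis implementation; the peak tuples, the m/z tolerance and the feature representation are invented for illustration. The sweep order is retention time: each incoming peak either extends an existing feature whose m/z matches within tolerance, or opens a new one.

```python
def group_features(peaks, mz_tol=0.02):
    """peaks: list of (rt, mz, intensity) tuples; sweep in retention-time order
    and merge peaks into features by m/z proximity."""
    features = []               # each: {"mz", "rts", "intensity"}
    for rt, mz, inten in sorted(peaks):
        for f in features:
            if abs(f["mz"] - mz) <= mz_tol:
                f["rts"].append(rt)      # extend the feature along rt
                f["intensity"] += inten  # accumulate abundance
                break
        else:
            features.append({"mz": mz, "rts": [rt], "intensity": inten})
    return features

# Toy peak list: two peptide signals observed across consecutive scans.
peaks = [
    (10.0, 500.30, 100), (10.5, 500.31, 300), (11.0, 500.30, 120),
    (10.5, 612.80,  80), (11.0, 612.81,  90),
]
features = group_features(peaks)
print([(round(f["mz"], 2), f["intensity"]) for f in features])
```

The actual algorithm additionally uses the isotope-wavelet filter response, resolves charge states, and closes features that no scan extends; this sketch only shows how sweeping in retention time exploits the redundancy across adjacent scans.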
Rechnergestützte Methoden für die Quantitative Massenspektrometrie von Peptiden
This thesis presents algorithms for the analysis of liquid chromatography-mass
spectrometry (LC-MS) data. The result of an LC-MS experiment is called an
LC-MS map, a collection of mass spectra. Mass spectrometry makes it possible
to analyze the composition of complex biological samples. In combination with
liquid chromatography, it has become an important tool in proteomics.
Proteomics is the study of the proteome, that is, the entirety of all proteins
and peptides present in a sample. As a scientific discipline, proteomics has
become very popular in recent years, since proteins control essential
reactions in the cell and are regarded as important targets for the diagnosis
and cure of diseases. This thesis makes three new scientific contributions.
The first is SweepWavelet, an algorithm for the quantification of peptides
from LC-MS data. Accurate quantification of peptides and proteins is an
important topic in biomedical research, since it is the first step in the
computational analysis of LC-MS data and all subsequent steps depend on
precise and reliable quantification. In contrast to existing methods, our
algorithm is flexible, fast, and can easily be adapted to data sets from
different LC-MS instruments. It consists of three steps: we apply a wavelet
function to filter peptide signals out of the LC-MS data and suppress
background noise; we then use the sweep-line method from computational
geometry to efficiently determine the positions of the peptide signals in the
LC-MS data set and to estimate their abundances; in the third step, we apply a
flexible model of LC-MS peptide signals to remove false positives. The second
part of this thesis addresses the comparison of algorithms for peptide signal
detection and quantification. This is a difficult undertaking, since in real
LC-MS experiments one cannot know in advance which compounds will appear as
signals in the LC-MS map and which will not; the results of such algorithms
are therefore often hard to judge. We perform comparisons on real and
simulated data. For this purpose, we developed a simulation software for LC-MS
experiments. This software, LC-MSsim, simulates all stages of an LC-MS
experiment, among them the prediction of retention times, elution profiles and
background noise in the spectra. The result of a simulation is an artificial
LC-MS data set together with a list of the positions, charges and intensities
of all peptide signals. We use the simulator to compare several peptide
quantification algorithms. The software is freely available under an
open-source license; LC-MSsim is the first freely available software that can
simulate complete LC-MS data sets including the most important experimental
steps. The third contribution of this thesis is a new statistical method for
detecting outliers, that is, data sets of poor quality, in LC-MS studies. The
method is based on a projection pursuit version of principal component
analysis, whose advantage is its robustness against outliers. In other
scientific fields, such as gene expression analysis, quality control methods
are already widespread; our method is among the first to address quality
control in LC-MS-based studies. Especially in high-throughput experiments, it
is crucial to be able to remove poor measurements quickly in order to obtain
meaningful results. We evaluate our method on simulated and real data and show
that outliers can be identified quickly and precisely.