27 research outputs found

LC-MSsim – a simulation software for liquid chromatography mass spectrometry data

Abstract. Background: Mass Spectrometry coupled to Liquid Chromatography (LC-MS) is commonly used to analyze the protein content of biological samples in large scale studies. The data resulting from an LC-MS experiment is huge, highly complex and noisy. Accordingly, it has sparked new developments in Bioinformatics, especially in the fields of algorithm development, statistics and software engineering. In a quantitative label-free mass spectrometry experiment, crucial steps are the detection of peptide features in the mass spectra and the alignment of samples by correcting for shifts in retention time. At the moment, it is difficult to compare the plethora of algorithms for these tasks. So far, curated benchmark data exists only for peptide identification algorithms; there is no data that represents a ground truth for the evaluation of feature detection, alignment and filtering algorithms. Results: We present LC-MSsim, a simulation software for LC-ESI-MS experiments. It simulates ESI spectra on the MS level. It reads a list of proteins from a FASTA file and digests the protein mixture using a user-defined enzyme. The software creates an LC-MS data set using a predictor for the retention time of the peptides and a model for peak shapes and elution profiles of the mass spectral peaks. Our software also offers the possibility to add contaminants and to change the background noise level, and it includes a model for the detectability of peptides in mass spectra. After the simulation, LC-MSsim writes the simulated data to mzData, a public XML format. The software also stores the positions (monoisotopic m/z and retention time) and ion counts of the simulated ions in separate files. Conclusion: LC-MSsim generates simulated LC-MS data sets and incorporates models for peak shapes and contaminations. Algorithm developers can match the results of feature detection and alignment algorithms against the simulated ion lists, so that meaningful error rates can be computed. We anticipate that LC-MSsim will be useful to the wider community to perform benchmark studies and comparisons between computational tools.
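The digestion step mentioned in the abstract above can be illustrated with a minimal sketch. This is not LC-MSsim code: the function name digest and its parameters are hypothetical, and the default rule (cleave after K or R unless the next residue is P) is the commonly used trypsin-like rule, assumed here only for illustration.

```python
import re


def digest(sequence: str, cleave_after: str = "KR", blocked_by: str = "P") -> list[str]:
    """Split a protein sequence into peptides.

    Cleaves after every residue listed in cleave_after, unless the
    following residue is listed in blocked_by (trypsin-like default).
    Hypothetical helper for illustration, not part of LC-MSsim.
    """
    # Zero-width pattern: "just after a cleavage residue" ...
    pattern = f"(?<=[{re.escape(cleave_after)}])"
    # ... "and not directly before a blocking residue".
    if blocked_by:
        pattern += f"(?![{re.escape(blocked_by)}])"
    # Splitting on empty matches requires Python 3.7+; drop empty pieces.
    return [peptide for peptide in re.split(pattern, sequence) if peptide]


print(digest("MKWVTFISLLRPGAK"))  # ['MK', 'WVTFISLLRPGAK']
```

A different enzyme can be approximated by passing other residue sets, for example digest(seq, cleave_after="K", blocked_by="") for a rule that cleaves after every lysine.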

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

Repository: Freie Universität Berlin (FU), Math Department (fu_mi_publications)

PubMed Central

MSSimulator: Simulation of Mass Spectrometry Data

Author: Aiche Stephan
Andreotti Sandro
Bielow Chris
Reinert Knut
Publication date: 01/01/2011

Crossref

Repository: Freie Universität Berlin (FU), Math Department (fu_mi_publications)

Open Access Repository

An Optimized Data Structure for High Throughput 3D Proteomics Data: mzRTree

Author: Aebersold
Andrea Pietracaprina
Barbara Di Camillo
Deutsch
Francesco Silvestri
Francesco Tisiot
Gianna Maria Toffolo
Guttman
Hartler
Khan
Kyriacos
Lin
Martens
Orchard
Sara Nasso
Schulz-Trieglaff
Taylor
Vitter
Publication venue: Elsevier BV
Publication date: 01/01/2010

As an emerging field, MS-based proteomics still requires software tools for efficiently storing and accessing experimental data. In this work, we focus on the management of LC-MS data, which are typically made available in standard XML-based portable formats. The structures that are currently employed to manage these data can be highly inefficient, especially when dealing with high-throughput profile data. LC-MS datasets are usually accessed through 2D range queries. Optimizing this type of operation could dramatically reduce the complexity of data analysis. We propose a novel data structure for LC-MS datasets, called mzRTree, which embodies a scalable index based on the R-tree data structure. mzRTree can be efficiently created from the XML-based data formats and it is suitable for handling very large datasets. We experimentally show that, on all range queries, mzRTree outperforms other known structures used for LC-MS data, even on those queries these structures are optimized for. Besides, mzRTree is also more space efficient. As a result, mzRTree reduces data analysis computational costs for very large profile datasets.

Comment: Paper details: 10 pages, 7 figures, 2 tables. To be published in Journal of Proteomics. Source code available at http://www.dei.unipd.it/mzrtre
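To make the term "2D range query" from the abstract above concrete, here is a minimal sketch of the operation itself. It is a plain linear scan over all peaks, the baseline that an index such as an R-tree is meant to beat, and not the mzRTree structure. The Peak type and the range_query function are hypothetical names chosen for this example.

```python
from typing import NamedTuple


class Peak(NamedTuple):
    rt: float         # retention time
    mz: float         # mass-to-charge ratio
    intensity: float


def range_query(peaks, rt_min, rt_max, mz_min, mz_max):
    """Return all peaks inside the closed rectangle
    [rt_min, rt_max] x [mz_min, mz_max].

    Linear scan, O(n) per query; an index structure would avoid
    touching peaks that lie outside the rectangle.
    """
    return [
        p for p in peaks
        if rt_min <= p.rt <= rt_max and mz_min <= p.mz <= mz_max
    ]


peaks = [
    Peak(rt=10.0, mz=400.2, intensity=1500.0),
    Peak(rt=12.5, mz=800.4, intensity=300.0),
    Peak(rt=30.0, mz=401.0, intensity=900.0),
]
hits = range_query(peaks, rt_min=5.0, rt_max=15.0, mz_min=400.0, mz_max=500.0)
print(hits)
```

Restricting only one dimension (a single spectrum, or a single m/z trace over time) is the same query with the other range left wide open.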

arXiv.org e-Print Archive

Crossref

Archivio istituzionale della ricerca - Università di Padova

Statistical quality assessment and outlier detection for liquid chromatography-mass spectrometry experiments

Author: A Fraser
A Prakash
AI Nesvizhskii
BM Mayr
C Croux
CS Brown
DA Stead
E Machtejevas
Egidijus Machtejevas
F Model
GV Cohen Freue
Hartmut Schlüter
J Harezlak
J Listgarten
Joachim Thiemann
K Choo
K Flikka
K Pearson
KC Leptos
Klaus Unger
Knut Reinert
KR Coombes
M Bern
M Mann
M Sturm
M Xu
O Hössjer
O Kohlbacher
O Schulz-Trieglaff
Ole Schulz-Trieglaff
P Mahalanobis
RE Moore
S Cappadona
S Na
T Whistler
W Windig
Publication venue: BioMed Central
Publication date: 01/01/2008

Abstract. Background: Quality assessment methods that are commonplace in engineering and industrial production are not widespread in large-scale proteomics experiments. But modern technologies such as Multi-Dimensional Liquid Chromatography coupled to Mass Spectrometry (LC-MS) produce large quantities of proteomic data. These data are prone to measurement errors and reproducibility problems, so automatic quality assessment and control are becoming increasingly important. Results: We propose a methodology to assess the quality and reproducibility of data generated in quantitative LC-MS experiments. We introduce quality descriptors that capture different aspects of the quality and reproducibility of LC-MS data sets. Our method is based on the Mahalanobis distance and a robust Principal Component Analysis. Conclusion: We evaluate our approach on several data sets of different complexities and show that we are able to precisely detect LC-MS runs of poor signal quality in large-scale studies.
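As a rough illustration of the Mahalanobis distance named in the abstract above, the following sketch scores one run described by two quality descriptors against a set of reference runs. It is a simplification under stated assumptions: it uses the classical sample mean and covariance rather than the robust estimates the paper relies on, it handles only two descriptors, and the function name mahalanobis_2d is hypothetical.

```python
import math


def mahalanobis_2d(reference, x):
    """Mahalanobis distance of point x from the cloud of 2D reference points.

    Uses the classical (non-robust) sample mean and covariance; this is
    an illustrative simplification, not the robust method of the paper.
    """
    n = len(reference)
    mean_a = sum(p[0] for p in reference) / n
    mean_b = sum(p[1] for p in reference) / n
    # Entries of the 2x2 sample covariance matrix.
    s_aa = sum((p[0] - mean_a) ** 2 for p in reference) / (n - 1)
    s_bb = sum((p[1] - mean_b) ** 2 for p in reference) / (n - 1)
    s_ab = sum((p[0] - mean_a) * (p[1] - mean_b) for p in reference) / (n - 1)
    det = s_aa * s_bb - s_ab ** 2
    if det <= 0:
        raise ValueError("covariance matrix is singular")
    da, db = x[0] - mean_a, x[1] - mean_b
    # d^2 = (x - mean)^T * inverse(covariance) * (x - mean), written out for 2x2.
    d_squared = (s_bb * da * da - 2 * s_ab * da * db + s_aa * db * db) / det
    return math.sqrt(d_squared)


runs = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
print(mahalanobis_2d(runs, (1.0, 1.0)))   # at the centre of the cloud
print(mahalanobis_2d(runs, (3.0, 1.0)))   # further out, larger distance
```

Runs whose distance exceeds a chosen cutoff would be flagged as candidate outliers; how that cutoff is set is left open here.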

Crossref

Directory of Open Access Journals

Repository: Freie Universität Berlin (FU), Math Department (fu_mi_publications)

PubMed Central

Inferring Proteolytic Processes from Mass Spectrometry Time Series Data

Author: Aiche S.
Publication date: 30/09/2013

Repository: Freie Universität Berlin (FU), Math Department (fu_mi_publications)

Quantification and Simulation of Liquid Chromatography-Mass Spectrometry Data

Author: Bielow C.
Publication date: 29/10/2012

Computational mass spectrometry is a fast evolving field that has attracted increased attention over the last couple of years. The performance of software solutions determines the success of analysis to a great extent. New algorithms are required to reflect new experimental procedures and deal with new instrument generations. One essential component of algorithm development is the validation (as well as comparison) of software on a broad range of data sets. This requires a gold standard (or so-called ground truth), which is usually obtained by manual annotation of a real data set. Comprehensive manually annotated public data sets for mass spectrometry data are labor-intensive to produce and their quality strongly depends on the skill of the human expert. Some parts of the data may even be impossible to annotate due to high levels of noise or other ambiguities. Furthermore, manually annotated data is usually not available for all steps in a typical computational analysis pipeline. We thus developed the most comprehensive simulation software to date, which allows the generation of multiple levels of ground truth and features a plethora of settings to reflect experimental conditions and instrument settings. The simulator is used to generate several distinct types of data. The data are subsequently employed to evaluate existing algorithms. Additionally, we employ simulation to determine the influence of instrument attributes and sample complexity on the ability of algorithms to recover information. The results give valuable hints on how to optimize experimental setups. Furthermore, this thesis introduces two quantitative approaches, namely a decharging algorithm based on integer linear programming and a new workflow for identification of differentially expressed proteins for a large in vitro study on toxic compounds. Decharging infers the uncharged mass of a peptide (or protein) by clustering all its charge variants. The latter occur frequently under certain experimental conditions. We employ simulation to show that decharging is robust against missing values even for high complexity data and that the algorithm outperforms other solutions in terms of mass accuracy and run time on real data. The last part of this thesis deals with a new state-of-the-art workflow for protein quantification based on isobaric tags for relative and absolute quantitation (iTRAQ). We devise a new approach to isotope correction, propose an experimental design, introduce new metrics of iTRAQ data quality, and confirm putative properties of iTRAQ data using a novel approach. All tools developed as part of this thesis are implemented in OpenMS, a C++ library for computational mass spectrometry.
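The idea behind decharging described in the abstract above can be sketched in a few lines. The thesis uses integer linear programming; the greedy grouping below is a much simpler stand-in and is not the thesis algorithm or OpenMS code. It assumes positive-mode ions carrying only protons as charge carriers, and the names neutral_mass, group_charge_variants and the tolerance value are hypothetical.

```python
PROTON_MASS = 1.007276  # Da, mass of a proton


def neutral_mass(mz: float, charge: int) -> float:
    """Uncharged mass of an ion observed at mz, assuming it carries
    `charge` protons (an [M + zH]z+ ion)."""
    return charge * (mz - PROTON_MASS)


def group_charge_variants(ions, tol: float = 0.01):
    """Greedily group (mz, charge) pairs whose neutral masses agree.

    Ions are sorted by neutral mass; an ion joins the current group if
    its mass lies within `tol` Da of the previous ion in that group.
    Simplified illustration, not the ILP-based method of the thesis.
    """
    by_mass = sorted((neutral_mass(mz, z), mz, z) for mz, z in ions)
    groups = []
    for mass, mz, z in by_mass:
        if groups and mass - groups[-1][-1][0] <= tol:
            groups[-1].append((mass, mz, z))
        else:
            groups.append([(mass, mz, z)])
    return groups


ions = [(1001.007276, 1), (501.007276, 2), (901.007276, 2)]
for group in group_charge_variants(ions):
    print([(round(mass, 3), z) for mass, _, z in group])
```

In this toy input the first two ions are the singly and doubly charged variants of the same 1000 Da species, so they end up in one group, while the third ion forms its own group.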

Repository: Freie Universität Berlin (FU), Math Department (fu_mi_publications)

In silico optimization of mass spectrometry fragmentation strategies in metabolomics

Author: Daly Ronan
Davies Vinny
Rogers Simon
van der Hooft Justin J.J.
Wandy Joe
Weidt Stefan
Publication venue: MDPI AG
Publication date: 01/01/2019

Liquid chromatography (LC) coupled to tandem mass spectrometry (MS/MS) is widely used in identifying small molecules in untargeted metabolomics. Various strategies exist to acquire MS/MS fragmentation spectra; however, the development of new acquisition strategies is hampered by the lack of simulators that let researchers prototype, compare, and optimize strategies before validation on real machines. We introduce Virtual Metabolomics Mass Spectrometer (ViMMS), a metabolomics LC-MS/MS simulator framework that allows for scan-level control of the MS2 acquisition process in silico. ViMMS can generate new LC-MS/MS data based on empirical data or virtually re-run a previous LC-MS/MS analysis using pre-existing data to allow the testing of different fragmentation strategies. To demonstrate its utility, we show how ViMMS can be used to optimize N for Top-N data-dependent acquisition (DDA), giving results comparable to modifying N on the mass spectrometer. We expect that ViMMS will save method development time by allowing for offline evaluation of novel fragmentation strategies and optimization of the fragmentation strategy for a particular experiment.
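The Top-N selection rule referred to in the abstract above can be sketched as follows: from one MS1 survey scan, pick the N most intense peaks above an intensity threshold as precursors for fragmentation. This is a generic illustration and does not use the ViMMS API; the function name top_n_precursors and its parameters are hypothetical, and refinements used in practice, such as dynamic exclusion of recently fragmented ions, are left out.

```python
def top_n_precursors(ms1_peaks, n, min_intensity=0.0):
    """Select up to n precursor peaks from one MS1 scan.

    ms1_peaks is a list of (mz, intensity) tuples. Peaks below
    min_intensity are ignored; the rest are ranked by decreasing
    intensity and the first n are returned.
    """
    candidates = [peak for peak in ms1_peaks if peak[1] >= min_intensity]
    candidates.sort(key=lambda peak: peak[1], reverse=True)
    return candidates[:n]


scan = [(150.1, 2.0e4), (220.3, 8.5e5), (301.2, 4.1e5), (415.7, 9.0e3)]
print(top_n_precursors(scan, n=2, min_intensity=1.0e4))
```

Varying n in such a rule trades the number of fragmentation spectra per cycle against how often survey scans are taken, which is the kind of setting the abstract describes optimizing in simulation.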

Enlighten

Proteomics, lipidomics, metabolomics: a mass spectrometry tutorial from a computer scientist's point of view

Author: A Frank
A Michalski
AB Noyce
Andrew D Mathis
B Domon
B Fischer
BM Hemminger
C Bielow
CA Smith
CF Taylor
Dan Ventura
E Fahy
E Lange
EW Kraegen
H Liu
H Mischak
HC Köfeler
J Listgarten
J Samuelsson
JB German
JD Egertson
JE Elias
JK Eng
John T Prince
JW Wong
K Biemann
K Podwojski
K Schmelzer
KK Murray
L Feng
LN Mueller
M Dakna
M Morris
M Sugimoto
MR Wenk
MY Brusniak
N Jeffries
O Fiehn
O Schulz-Trieglaff
PL Whetzel
R Smith
RB Cole
RJ Arnold
Rob Smith
S Cappadona
TM Annesley
VI Babushok
W Wang
WE Wolski
WJ Griffiths
X Han
XJ Li
Publication venue: Springer Science and Business Media LLC

Crossref

An Ultra-Fast Metabolite Prediction Algorithm

Author: A Duran
A Lommen
A Norbeck
A Robinson
B Efron
B Fischer
B Voss
C Sedgewick
C Smith
CD Broeckling
D De Souza
E Lange
E von Roepenack-Lahaye
F Matthäus
J de Groot
J Wong
Jérémie Bourdon
K Johnson
K Saito
KM Oksman-Caldentey
L Wu
M Chae
M Katajamaa
M Robinson
M Sturm
MR Garey
Murray Grant
N Hoffmann
O Fiehn
O Schulz-Trieglaff
P Baldi
PJ DiMaggio
Q Ma
R Baran
R Biedendieck
R Powers
R Tibshirani
S Skiena
S Westergaard
T Conrads
T Okada
V Mapelli
V Perera
V Tusher
Zheng Rong Yang
Publication venue: Public Library of Science
Publication date: 01/01/2012

Small molecules are central to all biological processes and metabolomics is becoming an increasingly important discovery tool. Robust, accurate and efficient experimental approaches are critical to supporting and validating predictions from post-genomic studies. To accurately predict metabolic changes and dynamics, experimental design requires multiple biological replicates and usually multiple treatments. Mass spectra from each run are processed and metabolite features are extracted. Because of machine resolution and variation in replicates, one metabolite may have different values of retention time and mass in different spectra. A major impediment to effectively utilizing untargeted metabolomics data is ensuring accurate spectral alignment, enabling precise recognition of features (metabolites) across spectra. Existing alignment algorithms use either a global merge strategy or a local merge strategy. The former delivers an accurate alignment, but lacks efficiency. The latter is fast, but often inaccurate. Here we document a new algorithm employing a technique known as quicksort. The results on both simulated data and real data show that this algorithm provides a dramatic increase in alignment speed and also improves alignment accuracy.

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

Warwick Research Archives Portal Repository

Open Research Exeter

FigShare