4 research outputs found

    SIRIUS: decomposing isotope patterns for metabolite identification

    Motivation: High-resolution mass spectrometry (MS) is among the most widely used technologies in metabolomics. Metabolites participate in almost all cellular processes, but most metabolites still remain uncharacterized. Determination of the sum formula is a crucial step in the identification of an unknown metabolite, as it reduces its possible structures to a hopefully manageable set.
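    As a rough illustration of the sum-formula determination step described above (not the algorithm used by SIRIUS, which decomposes the full isotope pattern far more efficiently), the sketch below brute-forces the CHNOPS compositions whose monoisotopic mass lies within a ppm tolerance of a measured mass; the element bounds and tolerance are arbitrary assumptions chosen for the example.

```python
# Illustrative brute-force sketch, not the SIRIUS decomposition algorithm:
# enumerate CHNOPS formulas whose monoisotopic mass falls within a ppm
# tolerance of a measured mass. Element masses are standard monoisotopic
# values; the bounds and tolerance are assumptions for this example only.
from itertools import product

MONOISOTOPIC = {"C": 12.0, "H": 1.0078250319, "N": 14.0030740052,
                "O": 15.9949146221, "P": 30.97376151, "S": 31.97207069}

def decompose(mass, ppm=5.0, max_counts=None):
    """Return (formula, error_ppm) pairs whose mass lies within the tolerance."""
    bounds = max_counts or {"C": 20, "H": 40, "N": 5, "O": 10, "P": 2, "S": 2}
    tol = mass * ppm * 1e-6
    elements = list(MONOISOTOPIC)
    hits = []
    for counts in product(*(range(bounds[e] + 1) for e in elements)):
        m = sum(n * MONOISOTOPIC[e] for e, n in zip(elements, counts))
        if abs(m - mass) <= tol:
            formula = "".join(f"{e}{n}" for e, n in zip(elements, counts) if n)
            hits.append((formula, (m - mass) / mass * 1e6))
    return sorted(hits, key=lambda x: abs(x[1]))

# Example: glucose, C6H12O6, monoisotopic mass ~180.0634 Da
print(decompose(180.06339, ppm=5.0))
```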

    Seven Golden Rules for heuristic filtering of molecular formulas obtained by accurate mass spectrometry

    BACKGROUND: Structure elucidation of unknown small molecules by mass spectrometry is a challenge despite advances in instrumentation. The first crucial step is to obtain correct elemental compositions. In order to automatically constrain the thousands of possible candidate structures, rules need to be developed to select the most likely and chemically correct molecular formulas. RESULTS: An algorithm for filtering molecular formulas is derived from seven heuristic rules: (1) restrictions on the number of elements, (2) LEWIS and SENIOR chemical rules, (3) isotopic patterns, (4) hydrogen/carbon ratios, (5) element ratios of nitrogen, oxygen, phosphorus, and sulphur versus carbon, (6) element ratio probabilities and (7) presence of trimethylsilylated compounds. Formulas are ranked according to their isotopic patterns and subsequently constrained by presence in public chemical databases. The seven rules were developed on 68,237 existing molecular formulas and were validated in four experiments. First, 432,968 formulas covering five million PubChem database entries were checked for consistency; only 0.6% of these compounds did not pass all rules. Next, the rules were shown to effectively reduce the complement of all eight billion theoretically possible C, H, N, S, O, P formulas up to 2000 Da to only 623 million of the most probable elemental compositions. Third, 6,000 pharmaceutical, toxic and natural compounds were selected from the DrugBank, TSCA and DNP databases. The correct formulas were retrieved as the top hit at 80–99% probability when assuming data acquisition with complete resolution of unique compounds, 5% absolute isotope ratio deviation and 3 ppm mass accuracy. Last, exemplary compounds were analyzed by Fourier transform ion cyclotron resonance mass spectrometry and by gas chromatography time-of-flight mass spectrometry. In each case, the correct formula was ranked as the top hit when combining the seven rules with database queries. CONCLUSION: The seven rules enable an automatic exclusion of molecular formulas which are either wrong or which contain an unlikely high or low number of elements. The correct molecular formula is assigned with a probability of 98% if the formula exists in a compound database. For truly novel compounds that are not present in databases, the correct formula is found in the first three hits with a probability of 65–81%. Corresponding software and supplemental data are available for download from the authors' website.
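    The sketch below shows how a simplified subset of such heuristic filters might look in code. It covers only element-count limits, a ring-plus-double-bond (SENIOR-style valence) check, and H/C and heteroatom/C ratio checks; the numeric thresholds are illustrative approximations rather than the published cut-offs, and the isotope-pattern scoring and database steps are omitted.

```python
# Simplified sketch of a formula filter in the spirit of the "Seven Golden
# Rules". Only a subset of the rules is shown, and the thresholds below are
# illustrative approximations, not the published cut-offs.

def passes_basic_rules(f):
    """f is a dict of element counts, e.g. {'C': 6, 'H': 12, 'O': 6}."""
    c, h = f.get("C", 0), f.get("H", 0)
    n, o = f.get("N", 0), f.get("O", 0)
    p, s = f.get("P", 0), f.get("S", 0)

    # Rule 1 (simplified): cap the number of atoms of each element.
    if c > 78 or h > 126 or n > 20 or o > 27 or p > 9 or s > 14:
        return False

    # Rule 2 (simplified SENIOR/valence check): ring-plus-double-bond
    # equivalents must not be negative.
    rdbe = c - h / 2 + n / 2 + 1
    if rdbe < 0:
        return False

    # Rule 4 (approximate): hydrogen/carbon ratio in a plausible range.
    if c and not (0.1 <= h / c <= 6.0):
        return False

    # Rule 5 (approximate): heteroatom-to-carbon ratios.
    if c and (n / c > 4 or o / c > 3 or p / c > 2 or s / c > 3):
        return False

    return True

print(passes_basic_rules({"C": 6, "H": 12, "O": 6}))   # glucose -> True
print(passes_basic_rules({"C": 1, "H": 20, "O": 1}))   # implausible -> False
```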

    Probabilistic Modelling of Liquid Chromatography Time-of-Flight Mass Spectrometry

    Liquid Chromatography Time-of-Flight Mass Spectrometry (LC-TOFMS) is an analytical platform that is widely used in the study of biological mixtures in the rapidly growing fields of proteomics and metabolomics. The development of statistical methods for the analysis of the very large datasets that are typically produced in LC-TOFMS experiments is a very active area of research. However, the theoretical basis on which these methods are built is currently rather thin and, as a result, inferences regarding the samples analysed are generally drawn in a somewhat qualitative fashion. This thesis concerns the development of a statistical formalism that can be used to describe and analyse the data produced in an LC-TOFMS experiment. This is done through the derivation of a number of probability distributions, each corresponding to a different level of approximation of the distribution of the empirically obtained data. Using such probabilistic models, statistically rigorous methods are developed and validated which are designed to address some of the central problems encountered in the practical analysis of LC-TOFMS data, most notably those related to the identification of unknown metabolites. Unlike most existing bioinformatics techniques, this work aims for rigour rather than generality. Consequently the methods developed are closely tailored to a particular type of TOF mass spectrometer, although they do carry over to other TOF instruments, albeit with important restrictions. And while the algorithms presented may constitute useful analytical tools for the mass spectrometers to which they can be applied, the broader implications of the general methodological approach that is taken are also of central importance. In particular, it is arguable that the main value of this work lies in its role as a proof-of-concept that detailed probabilistic modelling of TOFMS data is possible and can be used in practice to address important data analytical problems in a statistically rigorous manner.
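    As a toy illustration of the general idea, the sketch below scores candidate isotope patterns against observed ion counts under a simple Poisson noise model; this is a deliberately crude stand-in and not one of the distributions actually derived in the thesis.

```python
# Toy illustration of probabilistic modelling of MS data: treat the ion
# count in each isotope peak as a Poisson draw around an expected intensity
# and compare candidate explanations by log-likelihood. A simple stand-in,
# not the specific distributions derived in the thesis.
import math

def poisson_log_likelihood(observed_counts, expected_counts):
    """Sum of Poisson log-pmfs: log P(k | lam) = k*log(lam) - lam - log(k!)."""
    ll = 0.0
    for k, lam in zip(observed_counts, expected_counts):
        if lam <= 0:
            return float("-inf")
        ll += k * math.log(lam) - lam - math.lgamma(k + 1)
    return ll

# Two hypothetical isotope-pattern predictions (expected ion counts per peak)
# scored against one observed pattern; the higher log-likelihood fits better.
observed = [980, 112, 9]
candidate_a = [1000, 110, 8]    # hypothetical formula A
candidate_b = [1000, 60, 2]     # hypothetical formula B
print(poisson_log_likelihood(observed, candidate_a))
print(poisson_log_likelihood(observed, candidate_b))
```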

    Bioinformatics solutions for confident identification and targeted quantification of proteins using tandem mass spectrometry

    Proteins are the structural supports, signal messengers and molecular workhorses that underpin living processes in every cell. Understanding when and where proteins are expressed, and their structure and functions, is the realm of proteomics. Mass spectrometry (MS) is a powerful method for identifying and quantifying proteins; however, very large datasets are produced, so researchers rely on computational approaches to transform raw data into protein information. This project develops new bioinformatics solutions to support the next generation of proteomic MS research. Part I introduces the state of the art in proteomic bioinformatics in industry and academia. The business history and funding mechanisms are examined to fill a notable gap in the management research literature, and to explain events at the sponsor, GlaxoSmithKline. It reveals that public funding of proteomic science has yet to come to fruition and that exclusively high-tech niche bioinformatics businesses can succeed in the current climate. Next, a comprehensive review of repositories for proteomic MS is performed, to locate and compile a summary of sources of datasets for research activities in this project, and as a novel summary for the community. Part II addresses the issue of false positive protein identifications produced by automated analysis with a proteomics pipeline. The work shows that by selecting a suitable decoy database design, a statistically significant improvement in identification accuracy can be made. Part III describes the development of computational resources for selecting multiple reaction monitoring (MRM) assays for quantifying proteins using MS. A tool for transition design, MRMaid (pronounced 'mermaid'), and a database of pre-published transitions, MRMaid-DB, are developed, saving practitioners time and leveraging existing resources for superior transition selection. By improving the quality of identifications, and providing support for quantitative approaches, this project brings the field a small step closer to achieving the goal of systems biology.
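    The decoy-database work in Part II builds on the standard target-decoy strategy for controlling false positives; a minimal sketch of that general idea is shown below, with made-up scores and without reproducing the specific decoy designs evaluated in the thesis.

```python
# Minimal sketch of the standard target-decoy idea: estimate the false
# discovery rate among peptide-spectrum matches above a score threshold as
# (number of decoy hits) / (number of target hits). Scores here are made up
# for illustration.

def estimate_fdr(psms, threshold):
    """psms: list of (score, is_decoy) tuples from a concatenated search."""
    targets = sum(1 for score, is_decoy in psms if score >= threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy in psms if score >= threshold and is_decoy)
    return decoys / targets if targets else 0.0

def threshold_at_fdr(psms, max_fdr=0.01):
    """Lowest score cut-off whose estimated FDR stays within max_fdr."""
    for score in sorted({s for s, _ in psms}):
        if estimate_fdr(psms, score) <= max_fdr:
            return score
    return None

example = [(45.1, False), (44.0, False), (39.7, True), (38.2, False), (12.5, True)]
cutoff = threshold_at_fdr(example, max_fdr=0.35)
print(cutoff, estimate_fdr(example, cutoff))
```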