166 research outputs found

    Las Redes Transeuropeas en el presupuesto de la Unión

    Panel IX: The financing of the Trans-European Networks

    Computational methods for small molecule identification

    Identification of small molecules remains a central question in analytical chemistry, in particular for natural product research, metabolomics, environmental research, and biomarker discovery. Mass spectrometry is the predominant technique for high-throughput analysis of small molecules. But it reveals only information about the mass of molecules and, by using tandem mass spectrometry, about the mass of molecular fragments. Automated interpretation of mass spectra is often limited to searching in spectral libraries, such that we can only dereplicate molecules for which we have already recorded reference mass spectra. In this thesis we present methods for answering two central questions: What is the molecular formula of the measured ion and what is its molecular structure? SIRIUS is a combinatorial optimization method for annotating a spectrum and identifying the ion's molecular formula by computing hypothetical fragmentation trees. We present a new scoring for computing fragmentation trees, transforming the combinatorial optimization into a maximum a posteriori estimator. This allows us to learn parameters and hyperparameters of the scoring directly from data. We demonstrate that the statistical model, which was fitted on a small dataset, generalises well across many different datasets and mass spectrometry instruments. In addition to tandem mass spectra, isotope patterns can be used for identifying the molecular formula of the precursor ion. We present a novel scoring for comparing isotope patterns based on maximum likelihood. We describe how to integrate the isotope pattern analysis into the fragmentation tree optimisation problem to analyse data where fragment peaks and isotope peaks occur within the same spectrum. We demonstrate that the new scorings significantly improve molecular formula assignment. We evaluate SIRIUS on several datasets and show that it outperforms all other methods for molecular formula annotation by a large margin.
We also present CSI:FingerID, a method for predicting a molecular fingerprint from a tandem mass spectrum using kernel support vector machines. The predicted fingerprint can be searched in a structure database to identify the molecular structure. CSI:FingerID is based on FingerID, which uses probability product kernels on mass spectra for this task. We describe several novel kernels for comparing fragmentation trees instead of spectra. These kernels are combined using multiple kernel learning. We present a new scoring based on posterior probabilities and extend the method to use additional molecular fingerprints. We demonstrate on several datasets that CSI:FingerID identifies more molecules than its predecessor FingerID and outperforms all other methods for this task. We analyse how each of the methodological improvements of CSI:FingerID contributes to its identification performance and make suggestions for future improvements of the method. Both methods, SIRIUS and CSI:FingerID, are available as command-line tools and through a user interface. The molecular fingerprint prediction is implemented as a web service and receives over one million requests per month.

    Normal, Abby Normal, Prefix Normal

    A prefix normal word is a binary word with the property that no substring has more 1s than the prefix of the same length. This class of words is important in the context of binary jumbled pattern matching. In this paper we present results about the number $pnw(n)$ of prefix normal words of length $n$, showing that $pnw(n) = \Omega\left(2^{n - c\sqrt{n\ln n}}\right)$ for some $c$ and $pnw(n) = O\left(\frac{2^n (\ln n)^2}{n}\right)$. We introduce efficient algorithms for testing the prefix normal property and a "mechanical algorithm" for computing prefix normal forms. We also include games which can be played with prefix normal words. In these games Alice wishes to stay normal but Bob wants to drive her "abnormal" -- we discuss which parameter settings allow Alice to succeed. Comment: Accepted at FUN '1
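
    The definition of a prefix normal word can be checked directly from prefix sums. The following is a minimal sketch, assuming a brute-force quadratic test and exhaustive enumeration; it is not the efficient algorithm the paper refers to, and the function names `is_prefix_normal` and `count_prefix_normal` are hypothetical, chosen only for illustration.

```python
def is_prefix_normal(word: str) -> bool:
    """Return True if no substring of `word` has more 1s than the prefix of the same length.

    `word` is a string over the alphabet {'0', '1'}. Brute force: O(n^2) time.
    """
    n = len(word)
    # prefix_sums[i] holds the number of 1s in word[:i].
    prefix_sums = [0]
    for ch in word:
        prefix_sums.append(prefix_sums[-1] + (ch == "1"))

    for length in range(1, n + 1):
        ones_in_prefix = prefix_sums[length]
        for start in range(n - length + 1):
            ones_in_substring = prefix_sums[start + length] - prefix_sums[start]
            if ones_in_substring > ones_in_prefix:
                return False
    return True


def count_prefix_normal(n: int) -> int:
    """Count prefix normal words of length n >= 1 by exhaustive enumeration (exponential time)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return sum(is_prefix_normal(format(i, f"0{n}b")) for i in range(2 ** n))
```

    For example, `is_prefix_normal("1101")` holds, while `"1011"` fails because the substring `11` has more 1s than the prefix `10`. The enumeration is only practical for small `n`, since it visits all $2^n$ binary words.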

    Fast alignment of fragmentation trees

    Motivation: Mass spectrometry allows sensitive, automated and high-throughput analysis of small molecules such as metabolites. One major bottleneck in metabolomics is the identification of ‘unknown’ small molecules not in any database. Recently, fragmentation tree alignments have been introduced for the automated comparison of the fragmentation patterns of small molecules. Fragmentation pattern similarities are strongly correlated with the chemical similarity of the molecules, and allow us to cluster compounds based solely on their fragmentation patterns

    Towards a comprehensive characterisation of the human internal chemical exposome: Challenges and perspectives

    The holistic characterisation of the human internal chemical exposome using high-resolution mass spectrometry (HRMS) would be a step forward to investigate the environmental aetiology of chronic diseases with an unprecedented precision. HRMS-based methods are currently operational to reproducibly profile thousands of endogenous metabolites as well as externally-derived chemicals and their biotransformation products in a large number of biological samples from human cohorts. These approaches provide a solid ground for the discovery of unrecognised biomarkers of exposure and metabolic effects associated with many chronic diseases. Nevertheless, some limitations remain and have to be overcome so that chemical exposomics can provide unbiased detection of chemical exposures affecting disease susceptibility in epidemiological studies. Some of these limitations include (i) the lack of versatility of analytical techniques to capture the wide diversity of chemicals; (ii) the lack of analytical sensitivity that prevents the detection of exogenous (and endogenous) chemicals occurring at (ultra) trace levels from restricted sample amounts, and (iii) the lack of automation of the annotation/identification process. In this article, we discuss a number of technological and methodological limitations hindering applications of HRMS-based methods and propose initial steps to push towards a more comprehensive characterisation of the internal chemical exposome. We also discuss other challenges including the need for harmonisation and the difficulty inherent in assessing the dynamic nature of the internal chemical exposome, as well as the need for establishing a strong international collaboration, high level networking, and sustainable research infrastructure.
A great amount of research, technological development and innovative bio-informatics tools are still needed to profile and characterise the "invisible" (not profiled), "hidden" (not detected) and "dark" (not annotated) components of the internal chemical exposome and concerted efforts across numerous research fields are paramount

    Integrative analysis of multimodal mass spectrometry data in MZmine 3

    3 pages. We thank Christopher Jensen and Gauthier Boaglio for their contributions to the MZmine codebase. We thank Jianbo Zhang and Zachary Russ for their donations to MZmine development. The MZmine 3 logo was designed by the Bioinformatics & Research Computing group at the Whitehead Institute for Biomedical Research. T.P. is supported by Czech Science Foundation (GA CR) grant 21-11563M and by the European Union’s Horizon 2020 research and innovation programme under Marie Skłodowska-Curie grant agreement 891397. Support for P.C.D. was from US NIH U19 AG063744, P50HD106463, 1U24DK133658 and BBSRC-NSF award 2152526. T.S. acknowledges funding by Deutsche Forschungsgemeinschaft (441958208). M. Wang acknowledges the US Department of Energy Joint Genome Institute (https://ror.org/04xm1d337, a DOE Office of Science User Facility) and is supported by the Office of Science of the US Department of Energy operated under subcontract No. 7601660. E.R. and H.H. thank Wen Jiang (HILICON AB) for providing the iHILIC Fusion(+) column for HILIC measurements. M.F., K.D. and S.B. are supported by Deutsche Forschungsgemeinschaft (BO 1910/20). L.-F.N. is supported by the Swiss National Science Foundation (project 189921). D.P. was supported through the Deutsche Forschungsgemeinschaft (German Research Foundation) through the CMFI Cluster of Excellence (EXC-2124 — 390838134 project-ID 1-03.006_0) and the Collaborative Research Center CellMap (TRR 261 - 398967434). J.-K.W. acknowledges the US National Science Foundation (MCB-1818132), the US Department of Agriculture, and the Chan Zuckerberg Initiative. MZmine developers have received support from the European COST Action CA19105 — Pan-European Network in Lipidomics and EpiLipidomics (EpiLipidNET). We acknowledge the support of the Google Summer of Code (GSoC) program, which has funded the development of several MZmine modules through student projects. We thank Adam Tenderholt for introducing MZmine to the GSoC program. Peer reviewed

    Mortar admixtures for winter construction
