53 research outputs found

    OFraMP: a fragment-based tool to facilitate the parametrization of large molecules

    An Online tool for Fragment-based Molecule Parametrization (OFraMP) is described. OFraMP is a web application for assigning atomic interaction parameters to large molecules by matching sub-fragments within the target molecule to equivalent sub-fragments within the Automated Topology Builder (ATB, atb.uq.edu.au) database. OFraMP identifies and compares alternative molecular fragments from the ATB database, which contains over 890,000 pre-parameterized molecules, using a novel hierarchical matching procedure. Atoms are considered within the context of an extended local environment (buffer region), with the degree of similarity between an atom in the target molecule and that in the proposed match controlled by varying the size of the buffer region. Adjacent matching atoms are combined into progressively larger matched sub-structures. The user then selects the most appropriate match. OFraMP also allows users to manually alter interaction parameters and automates the submission of missing substructures to the ATB in order to generate parameters for atoms in environments not represented in the existing database. The utility of OFraMP is illustrated using the anti-cancer agent paclitaxel and a dendrimer used in organic semiconductor devices. Graphical abstract: OFraMP applied to paclitaxel (ATB ID 35922).

    Molecular Formula Identification using High Resolution Mass Spectrometry: Algorithms and Applications in Metabolomics and Proteomics

    We investigate several theoretical and practical aspects of identifying the molecular formula of biomolecules using high-resolution mass spectrometry. Owing to recent advances in instrumentation, mass spectrometry (MS) has become one of the key technologies for the analysis of biomolecules in proteomics and metabolomics. It measures the masses of the molecules in a sample with high accuracy and is well suited to high-throughput data acquisition. One of the core tasks in MS-based proteomics and metabolomics is the identification of the molecules in the sample. In metabolomics, metabolites are subjected to structure elucidation, beginning with the molecular formula of a molecule, i.e. the number of atoms of each element. This is the decisive step in identifying an unknown metabolite, since the determined formula reduces the number of possible molecular structures to a much smaller set that can be analysed further with methods of automated structure elucidation. After preprocessing, the output of a mass spectrometer is a list of peaks corresponding to the molecular masses and their intensities, i.e. the number of molecules with a given mass. In principle, the molecular formulas of small molecules can be identified from precise masses alone. However, it has been found that, owing to the large number of chemically legitimate formulas in the upper mass range, excellent mass accuracy alone is not sufficient for identification. High-resolution MS allows molecular masses and intensities to be determined with outstanding accuracy. In this thesis we develop several algorithms and applications that use this information to identify the molecular formulas of biomolecules.
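
    The point that a precise mass narrows, but does not always pin down, the molecular formula can be illustrated with a small sketch. It is a naive enumeration over the elements C, H, N and O only, not one of the algorithms developed in the thesis; the function name `decompose`, the element set and the 5 ppm default tolerance are assumptions made for illustration, and no chemical plausibility rules are applied.

```python
from typing import List, Tuple

# Monoisotopic masses in Da (most abundant isotope of each element).
MONO = {"C": 12.0, "H": 1.00782503207, "N": 14.0030740048, "O": 15.99491461956}


def formula_mass(c: int, h: int, n: int, o: int) -> float:
    """Monoisotopic mass of the neutral formula C_c H_h N_n O_o."""
    return c * MONO["C"] + h * MONO["H"] + n * MONO["N"] + o * MONO["O"]


def decompose(mass: float, ppm: float = 5.0) -> List[Tuple[int, int, int, int]]:
    """Return all (C, H, N, O) counts whose mass lies within +/- ppm of `mass`.

    Hypothetical helper: brute-force enumeration over C, N and O, with the
    hydrogen count solved from the remaining mass.
    """
    tol = mass * ppm * 1e-6
    upper = mass + tol
    hits = []
    for c in range(int(upper // MONO["C"]) + 1):
        rest_c = upper - c * MONO["C"]
        for n in range(int(rest_c // MONO["N"]) + 1):
            rest_n = rest_c - n * MONO["N"]
            for o in range(int(rest_n // MONO["O"]) + 1):
                # Mass left for hydrogen; since one H weighs far more than
                # 2 * tol, only the nearest integer count can fit.
                remaining = mass - formula_mass(c, 0, n, o)
                h = round(remaining / MONO["H"])
                if h < 0:
                    continue
                if abs(formula_mass(c, h, n, o) - mass) <= tol:
                    hits.append((c, h, n, o))
    return hits


if __name__ == "__main__":
    # Glucose, C6H12O6, has a monoisotopic mass of about 180.06339 Da.
    for c, h, n, o in decompose(180.06339):
        print(f"C{c}H{h}N{n}O{o}")
```

    Every formula printed matches the measured mass within the tolerance, which is why additional information such as peak intensities is needed to choose among candidates at higher masses.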

    Advances in structure elucidation of small molecules using mass spectrometry

    The structural elucidation of small molecules using mass spectrometry plays an important role in modern life sciences and bioanalytical approaches. This review covers different soft and hard ionization techniques and figures of merit for modern mass spectrometers, such as mass resolving power, mass accuracy, isotopic abundance accuracy, accurate mass multiple-stage MS(n) capability, as well as hybrid mass spectrometric and orthogonal chromatographic approaches. The latter part discusses mass spectral data handling strategies, which include background and noise subtraction, adduct formation and detection, charge state determination, accurate mass measurements, elemental composition determinations, and complex data-dependent setups with ion maps and ion trees. The importance of mass spectral library search algorithms for tandem mass spectra and multiple-stage MS(n) mass spectra, as well as mass spectral tree libraries that combine multiple-stage mass spectra, is outlined. The following chapter discusses mass spectral fragmentation pathways, biotransformation reactions and drug metabolism studies, the mass spectral simulation and generation of in silico mass spectra, expert systems for mass spectral interpretation, and the use of computational chemistry to explain gas-phase phenomena. A single chapter discusses data handling for hyphenated approaches including mass spectral deconvolution for clean mass spectra, cheminformatics approaches and structure retention relationships, and retention index predictions for gas and liquid chromatography. The last section reviews the current state of electronic data sharing of mass spectra and discusses the importance of software development for the advancement of structure elucidation of small molecules.

    CELLmicrocosmos - Integrative cell modeling at the molecular, mesoscopic and functional level

    Sommer B. CELLmicrocosmos - Integrative cell modeling at the molecular, mesoscopic and functional level. Bielefeld: Bielefeld University; 2012. The modeling of cells is an important application area of Systems Biology. In the context of this work, three cytological levels are defined: the mesoscopic, the molecular and the functional level. A number of quite diverse related approaches, which can be categorized into these levels, are introduced in this work, but none of them covers all three. In this work, the combination of all three aforementioned cytological levels is presented, realized by the CELLmicrocosmos project, combining and extending different Bioinformatics-related methods. The mesoscopic level is covered by CellEditor, a simple tool to generate eukaryotic or prokaryotic cell models. These are based on cell components represented by three-dimensional shapes. Different methods to generate these shapes, partly using external tools such as Amira, 3ds Max and/or Blender, are discussed: abstract, interpretative, 3D-microscopy-based and molecular-structure-based cell component modeling. To communicate with these tools, CellEditor provides import as well as export capabilities based on the VRML97 format. In addition, different cytological coloring methods are discussed which can be applied to the cell models. MembraneEditor operates at the molecular level. This tool solves heterogeneous Membrane Packing Problems by distributing lipids on rectangular areas using collision detection. It provides fast and intuitive methods supporting a wide range of different application areas based on the PDB format. Moreover, a plugin interface enables the use of custom algorithms. In the context of this work, a high-density-generating lipid packing algorithm, The Wanderer, is evaluated. The semi-automatic integration of proteins into the membrane is enabled by using data from the OPM and PDBTM databases. In contrast to the aforementioned structural levels, the third level covers the functional aspects of the cell. Here, protein-related networks or data sets can be imported and mapped into the previously generated cell models using PathwayIntegration. For this purpose, data integration methods are applied, represented by the data warehouse DAWIS-M.D., which includes a number of established databases. This information is enriched by text-mining data acquired from the ANDCell database. The localization of proteins is supported by different tools such as the interactive Localization Table and the Localization Charts. The correlation of partly multi-layered cell components with protein-related networks is covered by the Network Mapping Problem. A special implementation of the ISOM layout is used for this purpose. Finally, a first approach to combining all these interrelated levels is presented: CellExplorer, which integrates CellEditor as well as PathwayIntegration and imports structures generated with MembraneEditor. For this purpose, the shape-based cell components can be correlated with networks as well as molecular membrane structures using Membrane Mapping. It is shown that the tools discussed here can be applied to scientific as well as educational tasks: educational cell visualization, initial membrane modeling for molecular simulations, analysis of interrelated protein sets, and cytological disease mapping. These are supported by the user-friendly combination of Java, Java 3D and Web Start technology. In the last part of this thesis the future of Integrative Cell Modeling is discussed. While the approaches discussed here essentially represent three-dimensional snapshots of the cell, prospective approaches have to be extended into the fourth dimension: time.

    Development and implementation of in silico molecule fragmentation algorithms for the cheminformatics analysis of natural product spaces

    Computational methodologies extracting specific substructures like functional groups or molecular scaffolds from input molecules can be grouped under the term “in silico molecule fragmentation”. They can be used to investigate what specifically characterises a heterogeneous compound class, like pharmaceuticals or Natural Products (NP), and in which aspects such classes are similar or dissimilar. The aim is to determine what specifically characterises NP structures in order to transfer patterns favourable for bioactivity to drug development. As part of this thesis, the first algorithmic approach to in silico deglycosylation, the removal of glycosidic moieties for the study of aglycones, was developed with the Sugar Removal Utility (SRU) (Publication A). The SRU has also proven useful for investigating NP glycoside space. It was applied to one of the largest open NP databases, COCONUT (COlleCtion of Open Natural prodUcTs), for this purpose (Publication B). A contribution was made to the Chemistry Development Kit (CDK) by developing the open Scaffold Generator Java library (Publication C). Scaffold Generator can extract different scaffold types and dissect them into smaller parent scaffolds following the scaffold tree or scaffold network approach. Publication D describes the OngLai algorithm, the first automated method to identify homologous series in input datasets, group the member structures of each series, and extract their common core. To support the development of new fragmentation algorithms, the open Java rich client graphical user interface application MORTAR (MOlecule fRagmenTAtion fRamework) was developed as part of this thesis (Publication E). MORTAR allows users to quickly execute the steps of importing a structural dataset, applying a fragmentation algorithm, and visually inspecting the results in different ways. All software developed as part of this thesis is freely and openly available (see https://github.com/JonasSchaub).

    Towards automated identification of metabolites using mass spectral trees

    The detailed description of the chemical compounds present in organisms, organs/tissues, biofluids and cells is the key to understanding the complexity of biological systems. The small molecules (metabolites) are known to be very diverse in structure and function. However, the identification of the chemical structure of metabolites is one of the major bottlenecks in metabolomics research. Hence, the annotation and the structure elucidation of the metabolites are essential to understand the biological system under study. At present, no single analytical platform exists that can measure and identify all existing metabolites. Multistage mass spectrometry (MSn) is a powerful analytical technique that helps to identify these metabolites. This technique provides detailed structural information on the unknown metabolite by fragmenting the metabolite and its fragments recursively. However, only computational tools can provide a fast and straightforward analysis of the large amount of complex data generated by MSn spectrometry. The aim of this thesis was to develop a novel semi-automatic approach for the identification of metabolites using MSn data. Furthermore, these tools were to be integrated into a pipeline to assign identities to unknown metabolites present in databases, and especially to unknown metabolites not present in a database.

    Analytical and computational methods towards a metabolic model of ageing in Caenorhabditis elegans

    Human life expectancy is increasing globally. This has major socioeconomic implications, but also raises scientific questions about the biological bases of ageing and longevity. Research on appropriate model organisms, such as the nematode worm Caenorhabditis elegans, is a key component of answering these questions. Ageing is a complex phenomenon, with both environmental and genetic influences. Metabolomics, the analysis of all small molecules within a biological system, offers the ability to integrate these complex factors to help understand the role of metabolism in ageing. This thesis addresses the current lack of methods for C. elegans metabolite analysis, with a particular focus on combining analytical and computational approaches. As a first essential step, C. elegans metabolite extraction protocols for NMR, GC-MS and LC-MS based analysis were optimized. Several methods to improve the coverage, automatic annotation and data analysis steps of NMR and GC-MS are proposed. Next, stable isotope labelling was explored as a tool for C. elegans metabolomics. An automated stable isotope based workflow was developed, which identifies all biological, non-redundant features within an LC-MS acquisition and annotates them with molecular compositions. This demonstrated that the vast majority (> 99.5%) of detected features inside LC-MS metabolomics experiments are not of biological origin or redundant. This stable isotope workflow was then used to compare the metabolism of 24 different C. elegans mutant strains from different pathways (e.g. insulin signalling, TOR pathway, neuronal signalling), with differing levels of lifespan extension compared to wild-type worms. The biologically relevant features (metabolites) were detected and annotated, and compared across the mutants. Some metabolites were correlated with longevity across the mutant set, in particular glycerophospholipids. This led to the formulation of a hypothesis that lifespan extension in C. elegans requires increased activity of common downstream longevity effector mechanisms (autophagy and mitochondrial biogenesis) that also involve subcellular compartmentation and hence membrane formation. This results in the alterations in lipid metabolism detected here.

    Multiple-choice knapsack for assigning partial atomic charges in drug-like molecules

    A key factor in computational drug design is the consistency and reliability with which intermolecular interactions between a wide variety of molecules can be described. Here we present a procedure to efficiently, reliably and automatically assign partial atomic charges to atoms based on known distributions. We formally introduce the molecular charge assignment problem, where the task is to select a charge from a set of candidate charges for every atom of a given query molecule. Charges are accompanied by a score that depends on their observed frequency in similar neighbourhoods (chemical environments) in a database of previously parameterised molecules. The aim is to assign the charges such that the total charge equals a known target charge within a margin of error while maximizing the sum of the charge scores. We show that the problem is a variant of the well-studied multiple-choice knapsack problem and thus weakly NP-complete. We propose solutions based on Integer Linear Programming and a pseudo-polynomial time Dynamic Programming algorithm. We show that the results obtained for novel molecules not included in the database are comparable to the ones obtained performing explicit charge calculations while decreasing the time to determine partial charges for a molecule by several orders of magnitude, that is, from hours or even days to below a second. Our software is openly available at https://github.com/enitram/charge_assign
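
    The charge assignment problem described above can be sketched as a small dynamic program. This is a minimal illustration under stated assumptions, not the code from the linked repository: charges are assumed to be pre-rounded to integer units (for example thousandths of an elementary charge) so that totals can serve as table keys, and the function name `assign_charges` and the input layout are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# One candidate per tuple: (charge in integer units, score). The integer
# discretisation is an assumption that makes the table pseudo-polynomial.
Candidate = Tuple[int, float]


def assign_charges(
    candidates: List[List[Candidate]], target: int, margin: int
) -> Optional[Tuple[float, List[int]]]:
    """Pick exactly one candidate charge per atom, maximising the score sum
    subject to |total charge - target| <= margin.

    Returns (best score, chosen charges in atom order), or None if no
    selection meets the constraint.
    """
    # best[total] = (highest score reaching this total, charges chosen so far)
    best: Dict[int, Tuple[float, List[int]]] = {0: (0.0, [])}
    for options in candidates:
        nxt: Dict[int, Tuple[float, List[int]]] = {}
        for total, (score, chosen) in best.items():
            for charge, charge_score in options:
                key = total + charge
                new_score = score + charge_score
                if key not in nxt or new_score > nxt[key][0]:
                    nxt[key] = (new_score, chosen + [charge])
        best = nxt
    feasible = [v for total, v in best.items() if abs(total - target) <= margin]
    if not feasible:
        return None
    return max(feasible, key=lambda v: v[0])


if __name__ == "__main__":
    # Three atoms with two candidate charges each, in units of 0.001 e.
    atoms = [
        [(-400, 0.6), (-300, 0.4)],
        [(200, 0.7), (100, 0.3)],
        [(200, 0.5), (300, 0.5)],
    ]
    print(assign_charges(atoms, target=0, margin=0))
```

    The table holds one entry per reachable total charge, so the running time grows with the number of atoms, the number of candidates per atom and the range of integer totals, which is what makes the approach pseudo-polynomial.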
