    Method and System for Identification of Metabolites Using Mass Spectra

    A method and system are provided for identifying, by mass spectrometry, the specific elemental formula of an unknown compound, including but not limited to a metabolite. The method includes calculating a natural abundance probability (NAP) of a given isotopologue for the isotopes of the non-labelling elements of an unknown compound. Molecular fragments for a subset of isotopes identified using the NAP are created and sorted into the requisite cache data structure, which is searched subsequently. Peaks are extracted from the raw mass spectrometry spectrum data for the unknown compound, and sample-specific peaks of the unknown compound are separated from various spectral artifacts in ultra-high-resolution Fourier transform mass spectra. A set of possible isotope-resolved molecular formulae (IMFs) is created by iteratively searching the molecular fragment caches, combining the results with additional isotopes, and then statistically filtering them on the basis of NAP and mass-to-charge (m/z) matching probabilities. The unknown compound and its corresponding elemental molecular formula (EMF) are identified from statistically significant caches of isotopologues with compatible IMFs.
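The natural abundance probability (NAP) of an isotopologue can be sketched as a product of multinomial probabilities, one per non-labelling element: given how many atoms of an element are present and how many of them are heavy isotopes, the multinomial gives the chance of that exact composition at natural abundance. The minimal Python sketch below assumes that model. The abundance table holds approximate representative values, the labelling element is excluded simply by leaving it out of `formula`, and the function names are hypothetical rather than taken from the patent.

```python
from math import exp, lgamma, log

# Approximate representative natural isotopic abundances, lightest isotope first.
# Illustrative assumption; a real tool would load a curated table.
NATURAL_ABUNDANCE = {
    "C": {"12C": 0.9893, "13C": 0.0107},
    "H": {"1H": 0.999885, "2H": 0.000115},
    "N": {"14N": 0.99636, "15N": 0.00364},
    "O": {"16O": 0.99757, "17O": 0.00038, "18O": 0.00205},
}


def multinomial_probability(counts, probabilities):
    """Probability of drawing exactly `counts` of each isotope among sum(counts) atoms."""
    n = sum(counts)
    log_p = lgamma(n + 1)
    for k, p in zip(counts, probabilities):
        log_p += k * log(p) - lgamma(k + 1)
    return exp(log_p)


def natural_abundance_probability(formula, heavy_isotopes):
    """NAP of one isotopologue (hypothetical helper).

    formula: atom counts of the non-labelling elements only, e.g. {"H": 12, "O": 6}
    heavy_isotopes: number of each heavy isotope present, e.g. {"18O": 1}
    """
    prob = 1.0
    for element, n_atoms in formula.items():
        table = NATURAL_ABUNDANCE[element]
        names = list(table)
        heavy = [heavy_isotopes.get(name, 0) for name in names[1:]]
        light = n_atoms - sum(heavy)
        if light < 0:
            raise ValueError(f"more heavy {element} isotopes than {element} atoms")
        counts = [light] + heavy
        prob *= multinomial_probability(counts, [table[name] for name in names])
    return prob


# Glucose in a 13C-tracer experiment: carbon is the labelling element, so only H and O count.
print(natural_abundance_probability({"H": 12, "O": 6}, {"18O": 1}))
```

Ranking isotopologues by this value is one plausible way to choose the subset of isotopes used to build the fragment caches; the abstract does not state the patent's actual selection rule.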

    Mathematical methods for elucidating the structure, dynamics and biological activity of molecules using NMR spectroscopic and empirical parameters

    In this work, methods from mathematics and computer science are developed and applied to determine structure, dynamics and biological activity from NMR spectroscopic and empirical parameters. Dolastatin 10 and epothilone A are potential anticancer agents because, by interacting with tubulin, they prevent cell division. The 3D structures of both agents in solution and the structure of tubulin-bound epothilone A are determined from NMR spectroscopic parameters. Dolastatin 10 exists in a conformational equilibrium between the cis and trans conformations at the unusual amino acid DAP. Both conformations of the flexible pentapeptide can be determined, with RMSD = 1.423 Å for the cis conformer and RMSD = 1.488 Å for the trans conformer. Whereas the trans conformer is extended, the cis conformer folds back at DAP. Epothilone A is less flexible owing to its macrocycle, and both the tubulin-bound structure (RMSD = 0.537 Å) and the free form (RMSD = 0.497 Å) can be determined with low RMSD values. The structure of the free form, which predominates in solution, is largely identical to the X-ray structure. In the tubulin-bound form, an essential reorientation of the side chain is observed, which is decisive for the interaction with tubulin. Dipolar couplings of a protein are suitable for a 3D homology search in the PDB, because they describe the relative orientation of secondary structure elements and domains (85). Early recognition of 3D-homologous protein folds opens the possibility of accelerating protein structure determination. A homology search based on dipolar couplings can find proteins, or at least fragments, with similar 3D structure even when primary sequence homology is low.
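Why a dipolar coupling encodes orientation can be illustrated with one common convention, D = D_max · vᵀ S v, where v is the unit internuclear vector in the molecular frame and S is a symmetric, traceless alignment (Saupe order) tensor. The sketch below is generic. The tensor values and `d_max` are illustrative assumptions, not values from the thesis, and the function name is hypothetical.

```python
import numpy as np


def residual_dipolar_coupling(bond_vector, saupe, d_max):
    """RDC of one internuclear vector, D = d_max * v^T S v (one common convention).

    bond_vector: 3-vector in the molecular frame (normalised here)
    saupe: 3x3 symmetric, traceless order (alignment) tensor
    d_max: dipolar coupling constant of the nucleus pair (illustrative value)
    """
    v = np.asarray(bond_vector, dtype=float)
    v = v / np.linalg.norm(v)
    return d_max * float(v @ saupe @ v)


# Illustrative axially symmetric tensor in its principal frame (trace = 0).
S = np.diag([-0.5e-3, -0.5e-3, 1.0e-3])
for vec in ([0, 0, 1], [1, 0, 0], [1, 1, 1]):
    print(vec, residual_dipolar_coupling(vec, S, d_max=20000.0))
```

With this tensor, a vector along the unique axis gives about 20, a perpendicular one about −10, and a vector at the magic angle essentially zero. Because the coupling depends only on the vector's orientation relative to the tensor, and is unchanged when the vector is reversed, a single value restricts each vector to a cone of orientations rather than a single direction.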
In addition, a transformation for experimental dipolar couplings is developed that translates the indirect orientational information of a vector relative to an external tensor into the allowed range of the projection angle between two vectors, and thus into intramolecular structural information. These restraints can be used in protein structure determination by molecular dynamics (92). In contrast to all existing implementations, the convergence of the calculation is hardly affected by dipolar coupling information introduced in this way, yet the calculated structures still satisfy the dipolar couplings. Even without using already known protein or fragment structures, a substantial part of the NOE information can thus be replaced. The dynamics of the vector connecting the two interacting dipoles affects the measured dipolar coupling. This gives access to information about molecular dynamics on the µs timescale, which has so far been difficult to study. Measuring dipolar couplings for one vector in different orientations allows its motion to be analysed (89). In particular, a model-free order parameter S² can be derived. Likewise, the mean orientation of the vector and the axially symmetric and non-axially symmetric components of its dynamics can be derived and evaluated without a model. Applying the protocols developed in this way to experimental data (90) makes proteins appear considerably more dynamic than is apparent on the timescale of relaxation experiments: the mean order parameter drops from 0.8 to 0.6, which corresponds to an increase in the opening angle of the motion from approx. 22° to approx. 33°. The motions deviate from axial symmetry by up to 40% in some cases and by 15% on average.
Neural networks allow fast (approx. 5000 chemical shifts per second) and accurate (mean deviation 1.6 ppm) calculation of ¹³C NMR chemical shifts (115). They combine the advantages of previously known database estimation (high accuracy) and increment methods (high speed). The ¹³C NMR spectrum of an organic compound is a detailed description of its structure. By comparing the experimental ¹³C NMR spectrum with calculated spectra, the output of the structure generator COCON can be narrowed down to approx. 1‰ of the proposed structures, namely those that deviate little from the experimental spectrum (122). Combining this with a substructure analysis further allows likely closed ring systems to be recognized and gives an overview of the structure of the generated constitutional subspace. Genetic algorithms can optimize the structure of organic molecules, starting from their molecular formula, for agreement with the experimental ¹³C NMR spectrum. For this purpose, the constitution of a molecule is described by a vector of the bond states of all atom-atom pairs. Such vectors can serve as the genetic code of constitutions in a genetic algorithm. This method allows the automated determination of the constitution of molecules with 10 to 20 non-hydrogen atoms (123). Symmetric neural networks can project five- or seven-dimensional heterogeneous parameter representations of the 20 proteinogenic amino acids into three-dimensional space while preserving the essential information (134). The low-dimensional projections make it possible to visualize the relationships among the amino acids. The reduced parameter representations are suitable as input for a neural network that predicts the secondary structure of a protein with an accuracy of 66% (Q₃).
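The quoted drop of the mean order parameter from 0.8 to 0.6, corresponding to opening angles of roughly 22° and 33°, can be reproduced with the diffusion-in-a-cone model, S = cos θ (1 + cos θ)/2 for a cone of semi-angle θ. Solving this as a quadratic in cos θ gives cos θ = (−1 + √(1 + 8S))/2. The abstract does not name the model, so it is an assumption here, as is reading 0.8 and 0.6 as S² values. With both assumptions the numbers in the text come out.

```python
import math


def cone_semi_angle_deg(s_squared):
    """Semi-angle (degrees) of the cone that yields a given order parameter S^2.

    Assumes the diffusion-in-a-cone model S = cos(t) * (1 + cos(t)) / 2,
    solved as the quadratic cos^2(t) + cos(t) - 2S = 0 for its root in [0, 1].
    """
    if not 0.0 <= s_squared <= 1.0:
        raise ValueError("S^2 must lie in [0, 1]")
    s = math.sqrt(s_squared)
    cos_t = (-1.0 + math.sqrt(1.0 + 8.0 * s)) / 2.0
    return math.degrees(math.acos(cos_t))


for s2 in (0.8, 0.6):
    print(f"S^2 = {s2}: cone semi-angle = {cone_semi_angle_deg(s2):.1f} deg")
```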
Owing to their flexible architecture, neural networks are particularly well suited to describing quantitative structure-activity relationships, which involve highly nonlinear, complex dependencies. A numerical encoding of the more than 200 epothilone derivatives described in the literature makes it possible to build models that calculate the induction of tubulin polymerization (R = 0.73) and the inhibition of cancer cell growth (R = 0.94) (136). The trained neural networks can be used in a sensitivity analysis to identify the binding sites of the molecule. Calculating the activity of every molecule in the structure space defined by the parameters yields proposals for epothilone derivatives that could be up to 1000 times more active than those synthesized so far.

    Robust automatic assignment of nuclear magnetic resonance spectra for small molecules

    Abstract. In this document we describe a fully automatic system for assigning the nuclear magnetic resonance (NMR) spectra of small molecules. The system has four main features: 1. It takes raw NMR data as input, which means it must be able to extract the useful information from them while ignoring the noise. 2. It assigns the signals to atoms in the structure and associates each assignment with a confidence value, which is used to rank all possible solutions. 3. It does not depend on chemical shift predictions, so it can perform an assignment using only the connectivity information observed in 2D NMR spectra and the integrals (coupling constants are also a possibility but were not explored in this work); however, the system can use chemical shifts if they are available. 4. It can learn, in an unsupervised fashion, the relation between atom configurations and chemical shifts while solving assignment problems, which allows it to improve while working, much as a human does. The system is completely open source, as are the data used in this work.
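The idea behind features 2 and 3 can be illustrated with a brute-force toy: rank every candidate signal-to-atom assignment by a confidence score computed only from integrals and 2D connectivity. The scoring rule below is a hypothetical stand-in, not the system's actual confidence measure. Integrals must match proton counts, and the score is the fraction of observed COSY-like correlations explained by the assignment. Exhaustive enumeration also only scales to very small molecules.

```python
from itertools import permutations


def rank_assignments(signals, atoms, correlations, neighbours):
    """Rank signal-to-atom assignments by a simple confidence score (hypothetical rule).

    signals: {signal: integral, in protons}
    atoms: {atom: number of attached protons}
    correlations: set of frozenset({signal1, signal2}) seen in a COSY-like spectrum
    neighbours: set of frozenset({atom1, atom2}) expected to give such a correlation
    Returns (score, assignment) pairs, best first.
    """
    names = list(signals)
    ranked = []
    for chosen in permutations(atoms, len(names)):
        assignment = dict(zip(names, chosen))
        # Integrals must match the number of protons on the assigned atom.
        if any(signals[s] != atoms[a] for s, a in assignment.items()):
            continue
        explained = sum(
            frozenset(assignment[s] for s in pair) in neighbours for pair in correlations
        )
        score = explained / max(len(correlations), 1)
        ranked.append((score, assignment))
    ranked.sort(key=lambda item: item[0], reverse=True)
    return ranked


# Toy propyl fragment CH3-CH2-CH2-: three signals, three carbons.
signals = {"a": 3, "b": 2, "c": 2}
atoms = {"C1": 3, "C2": 2, "C3": 2}
cosy = {frozenset({"a", "b"}), frozenset({"b", "c"})}
vicinal = {frozenset({"C1", "C2"}), frozenset({"C2", "C3"})}
for score, assignment in rank_assignments(signals, atoms, cosy, vicinal):
    print(f"{score:.2f}", assignment)
```

Here the integrals alone fix the CH3 signal, and the connectivity score separates the two remaining CH2 assignments.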

    RASCAL: calculation of graph similarity using maximum common edge subgraphs

    A new graph similarity calculation procedure is introduced for comparing labeled graphs. Given a minimum similarity threshold, the procedure consists of an initial screening process that determines whether the similarity between the two graphs can possibly exceed the threshold, followed by a rigorous maximum common edge subgraph (MCES) detection algorithm that computes the exact degree and composition of the similarity. The proposed MCES algorithm is based on a maximum clique formulation of the problem and is a significant improvement over other published algorithms. It introduces new approaches to both lower and upper bounding and to vertex selection.
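The maximum clique formulation can be sketched for small labelled graphs. Each bond becomes a node of a line graph, and each compatible pair of bonds from the two graphs becomes a node of their modular product. A maximum clique in that product then corresponds to a maximum common edge subgraph. The result is scored with the Johnson-style index (|V_c| + |E_c|)² / ((|V_1| + |E_1|)(|V_2| + |E_2|)), assumed here to stand in for RASCAL's measure. This is a minimal sketch: it omits RASCAL's screening bounds, its vertex-selection heuristics and its check for ΔY exchanges. Because a triangle and a three-pointed star have isomorphic line graphs, it can over-count on such graphs. The graph representation and function names are hypothetical.

```python
from itertools import combinations

# A graph is (edges, atom_labels): edges are (u, v, bond_label) triples and
# atom_labels maps each vertex to its label. Hypothetical toy representation.


def bond_nodes(graph):
    """Line-graph nodes: one per bond, keyed by endpoint labels and bond label."""
    edges, labels = graph
    return [((u, v), (tuple(sorted((labels[u], labels[v]))), bond)) for u, v, bond in edges]


def shared_atom(e, f):
    common = set(e) & set(f)
    return common.pop() if common else None


def modular_product(g1, g2):
    n1, n2 = bond_nodes(g1), bond_nodes(g2)
    pairs = [(i, j) for i, (_, k1) in enumerate(n1) for j, (_, k2) in enumerate(n2) if k1 == k2]
    adj = {p: set() for p in pairs}
    for (i, j), (k, l) in combinations(pairs, 2):
        if i == k or j == l:
            continue  # a bond cannot be matched twice
        s1 = shared_atom(n1[i][0], n1[k][0])
        s2 = shared_atom(n2[j][0], n2[l][0])
        if (s1 is None) != (s2 is None):
            continue  # adjacency in one line graph must mirror the other
        if s1 is not None and g1[1][s1] != g2[1][s2]:
            continue  # the shared atoms must carry the same label
        adj[(i, j)].add((k, l))
        adj[(k, l)].add((i, j))
    return adj, n1


def maximum_clique(adj):
    """Bron-Kerbosch with pivoting and a simple size bound."""
    best = set()

    def expand(r, p, x):
        nonlocal best
        if not p and not x:
            if len(r) > len(best):
                best = set(r)
            return
        if len(r) + len(p) <= len(best):
            return  # cannot beat the best clique found so far
        pivot = max(p | x, key=lambda v: len(adj[v] & p))
        for v in list(p - adj[pivot]):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return best


def mces_similarity(g1, g2):
    adj, n1 = modular_product(g1, g2)
    clique = maximum_clique(adj)
    common_edges = len(clique)
    common_atoms = len({a for i, _ in clique for a in n1[i][0]})
    size1 = len(g1[1]) + len(g1[0])
    size2 = len(g2[1]) + len(g2[0])
    return (common_atoms + common_edges) ** 2 / (size1 * size2)


ethanol = ([(0, 1, 1), (1, 2, 1)], {0: "C", 1: "C", 2: "O"})
propane = ([(0, 1, 1), (1, 2, 1)], {0: "C", 1: "C", 2: "C"})
print(mces_similarity(ethanol, ethanol), mces_similarity(ethanol, propane))
```

In RASCAL proper, a cheap upper bound on this index is checked first, and pairs that cannot reach the threshold never reach the clique search.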

    Application of Singular Spectrum Analysis (SSA), Independent Component Analysis (ICA) and Empirical Mode Decomposition (EMD) for automated solvent suppression and automated baseline and phase correction from multi-dimensional NMR spectra

    A common problem in protein structure determination by NMR spectroscopy is the solvent artifact. Typically, a deuterated solvent is used instead of normal water. However, several experimental methods have been developed to suppress the solvent signal when a protonated solvent must be used, or when the signals of the remaining protons are still too strong even in a highly deuterated sample. For a protein dissolved in 90% H2O / 10% D2O, the concentration of solvent protons is about five orders of magnitude greater than that of the solute protons of interest. The evaluation of multidimensional NMR spectra may therefore be incomplete: certain resonances of interest (e.g. Hα proton resonances) are hidden by the solvent signal, and parts of the solvent signal may be misinterpreted as cross peaks originating from the protein. Experimental solvent suppression procedures are typically unable to recover these significant protein signals, and many post-processing methods have been designed to overcome this problem. In this work, several algorithms for suppressing the water signal have been developed and compared. In particular, it has been shown that Singular Spectrum Analysis (SSA) can be applied advantageously to remove the solvent artifact from NMR spectra of any dimensionality, whether acquired digitally or in analog mode. The time-domain signals (FIDs) are decomposed into water- and protein-related components by first embedding the data in a space of time-delayed coordinates. An eigenvalue decomposition is applied to these data, and the component with the highest variance (typically the dominant solvent signal) is discarded before the embedding is reverted.
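The core SSA step (embed the FID in a trajectory matrix of time-delayed copies, decompose it, drop the dominant component, and revert the embedding by averaging along anti-diagonals) can be sketched in a few lines of NumPy. This minimal sketch uses an SVD in place of an explicit eigenvalue decomposition of the lag-covariance matrix, since its left singular vectors are the same eigenvectors. It omits the group-delay handling, normalization, Fourier transformation and phase and baseline steps described as necessary. The function name and the embedding dimension are illustrative choices, not the AUREMOL_SSA interface.

```python
import numpy as np


def ssa_remove_dominant(fid, embedding_dim):
    """Suppress the dominant (highest-variance) component of a complex FID via SSA."""
    fid = np.asarray(fid, dtype=complex)
    n = fid.size
    k = n - embedding_dim + 1
    # Trajectory (Hankel) matrix: row i holds fid[i : i + k].
    traj = np.array([fid[i:i + k] for i in range(embedding_dim)])
    u, s, vh = np.linalg.svd(traj, full_matrices=False)
    # Remove the component with the largest singular value (typically the solvent).
    cleaned = traj - s[0] * np.outer(u[:, 0], vh[0])
    # Revert the embedding by averaging each anti-diagonal (i + j = const).
    out = np.zeros(n, dtype=complex)
    counts = np.zeros(n)
    for i in range(embedding_dim):
        out[i:i + k] += cleaned[i]
        counts[i:i + k] += 1
    return out / counts


# Synthetic test: a strong "water" line plus a weak "solute" line.
t = np.arange(256)
water = 100.0 * np.exp(2j * np.pi * 0.05 * t)
solute = 1.0 * np.exp(2j * np.pi * 0.30 * t)
recovered = ssa_remove_dominant(water + solute, embedding_dim=64)
print(np.linalg.norm(recovered - solute) / np.linalg.norm(solute))
```

With these synthetic frequencies, the weak line comes back almost unchanged. If the solvent were weaker than the solute, the first component would be the solute instead. That is the limitation investigated in the following paragraphs.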
Pre-processing (group delay management and signal normalization) and post-processing (inverse normalization, Fourier transformation, and phase and baseline correction) of the NMR data are required for good suppression performance. The optimal embedding dimension was determined empirically through a qualitative and quantitative analysis of the components extracted from a back-calculated two-dimensional spectrum of the HPr protein from Staphylococcus aureus. Moreover, the investigation of experimental data (a three-dimensional ¹H-¹³C HCCH-TOCSY spectrum of the Trx protein from Plasmodium falciparum, and two-dimensional NOESY and TOCSY spectra of HPr from Staphylococcus aureus) revealed the ability of the algorithm to recover resonances hidden underneath the water signal. Pathological conditions and the effects of drugs and lifestyle can be detected by NMR spectroscopy of samples containing biofluids (e.g. urine, blood, saliva), and the detection of signals of interest in such spectra can also be hampered by the solvent. SSA has also been applied successfully to one-dimensional urine, blood and cell spectra. The algorithm for automated solvent suppression has been implemented in the AUREMOL software package (AUREMOL_SSA). It is optionally followed by an automated baseline correction in the frequency domain (AUREMOL_ALS), which can also be used independently of SSA. The automated recognition of baseline points is performed differently depending on the dimensionality of the data. To investigate the limitations of SSA, it was applied to spectra whose dominant signal is not the solvent (as with WATERGATE solvent suppression and with back-calculated data lacking any experimental water signal), in order to determine the optimal solvent-to-solute ratio.
Independent Component Analysis (ICA) is a valid alternative for water suppression when the solvent signal is not the dominant one in the spectrum (i.e. when it is smaller than half of the strongest solute resonance). Two components are obtained: the solvent and the solute. ICA requires as input at least as many different spectra (mixtures) as there are components (source signals), so a suitable protocol must be defined for generating a dataset of one-dimensional, ICA-tailored inputs; defining such a protocol is straightforward. ICA was found to overcome the limitations of SSA and to recover resonances of interest that cannot be detected with SSA. ICA avoids all pre- and post-processing steps, since it is applied directly in the frequency domain. On the other hand, in SSA the component to be removed is selected automatically (it has the highest variance), whereas ICA still requires visual inspection of the extracted components, because the output order is permutable and scale and sign ambiguities may occur. Empirical Mode Decomposition (EMD) was found to be more suitable for automated phase correction than for solvent suppression. It decomposes the FID into several intrinsic mode functions (IMFs) whose oscillation frequency decreases from the first to the last, the last one identifying the solvent signal. The automatically identified non-baseline regions in the Fourier transform of the sum of the first IMFs are evaluated separately, and genetic algorithms are applied to determine the zero- and first-order terms for optimal phase correction. The SSA and ALS algorithms were applied before assigning the two-dimensional NOESY spectrum (with the program KNOWNOE) of the PSCD4 domain of the pleuralin protein, in order to increase the number of distance restraints beyond those already available.
A new routine that derives ³J(HN,Hα) couplings from torsion angles via the Karplus relation, and vice versa, has been added to the AUREMOL software. Using the newly developed tools, a refined three-dimensional structure of the PSCD4 domain was obtained.
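The Karplus relation for the backbone coupling has the form ³J(φ) = A cos²θ + B cos θ + C with θ = φ − 60°. The sketch below works in both directions: the forward calculation, and the inverse, which in general returns up to four torsion angles for one coupling. It uses the Vuister and Bax (1993) coefficients A = 6.51 Hz, B = −1.76 Hz, C = 1.60 Hz. That choice is an assumption, since the abstract does not say which parameterization AUREMOL uses, and the function names are hypothetical.

```python
import math

# Karplus coefficients in Hz (Vuister & Bax parameterization; an assumption here).
A, B, C = 6.51, -1.76, 1.60


def j_from_phi(phi_deg):
    """3J(HN,HA) in Hz for backbone torsion angle phi (degrees)."""
    c = math.cos(math.radians(phi_deg - 60.0))
    return A * c * c + B * c + C


def phi_from_j(j_hz):
    """All phi values (degrees, in [-180, 180)) consistent with a coupling j_hz."""
    disc = B * B - 4.0 * A * (C - j_hz)
    if disc < 0:
        return []  # coupling below the minimum the curve can produce
    solutions = set()
    for cos_theta in ((-B + math.sqrt(disc)) / (2 * A), (-B - math.sqrt(disc)) / (2 * A)):
        if -1.0 <= cos_theta <= 1.0:
            theta = math.degrees(math.acos(cos_theta))
            for t in (theta, -theta):
                phi = (t + 60.0 + 180.0) % 360.0 - 180.0
                solutions.add(round(phi, 6))
    return sorted(solutions)


j = j_from_phi(-60.0)
print(round(j, 4), phi_from_j(j))
```

The inverse is many-valued, so a single coupling cannot by itself fix φ. In practice one would pick among the candidates using other restraints.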

    Towards automated identification of metabolites using mass spectral trees

    A detailed description of the chemical compounds present in organisms, organs/tissues, biofluids and cells is key to understanding the complexity of biological systems. Small molecules (metabolites) are known to be very diverse in structure and function. However, identifying the chemical structure of metabolites is one of the major bottlenecks in metabolomics research, even though the annotation and structure elucidation of metabolites are essential for understanding the biological system under study. Currently, no single analytical platform can measure and identify all existing metabolites. Multistage mass spectrometry (MSn) is a powerful analytical technique that helps to identify metabolites. It provides detailed structural information about an unknown metabolite by fragmenting the metabolite and its fragments recursively. However, only computational tools can provide a fast and straightforward analysis of the large amount of complex data generated by MSn. The aim of this thesis was to develop a novel semi-automatic approach for identifying metabolites from MSn data. Furthermore, these tools were to be integrated into a pipeline that assigns identities to unknown metabolites, both those present in databases and, especially, those not present in any database.
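An MSn spectral tree maps naturally onto a recursive data structure. Each precursor ion is fragmented, and selected fragments are fragmented again. The minimal sketch below uses a hypothetical `SpectralNode` class and made-up m/z values purely to show the shape of the data; it is not the thesis's data model.

```python
from dataclasses import dataclass, field


@dataclass
class SpectralNode:
    """One ion in an MSn tree: its m/z and the fragment ions produced from it."""
    mz: float
    children: list = field(default_factory=list)  # list of SpectralNode

    def add(self, mz):
        child = SpectralNode(mz)
        self.children.append(child)
        return child


def fragmentation_paths(node, prefix=()):
    """Yield every precursor-to-terminal-fragment path, starting at the MS1 ion."""
    path = prefix + (node.mz,)
    if not node.children:
        yield path
    for child in node.children:
        yield from fragmentation_paths(child, path)


# Made-up values for illustration only.
root = SpectralNode(301.07)   # MS1 precursor
ms2 = root.add(283.06)        # MS2 fragment, selected for MS3
ms2.add(255.07)               # MS3 fragments
ms2.add(211.08)
root.add(151.00)              # MS2 fragment not fragmented further

for p in fragmentation_paths(root):
    print(" -> ".join(f"{mz:.2f}" for mz in p))
```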

    Computational methods for small molecules

    Metabolism is the system of chemical reactions that sustains life in the cells of living organisms. It is responsible for cellular processes that break down nutrients for energy and produce building blocks for necessary molecules. The study of metabolism is vital to many disciplines in medicine and pharmacy. Chemical reactions operate on small molecules called metabolites, which form the core of metabolism. In this thesis we propose efficient computational methods for small molecules in metabolic applications, presented in four distinct studies covering two major themes: the atom-level description of biochemical reactions, and the analysis of tandem mass spectrometric measurements of metabolites. In the first part we study atom-level descriptions of organic reactions. We begin by proposing an optimal algorithm for determining the atom-to-atom correspondences between the reactant and product metabolites of organic reactions. In addition, we introduce a cost based on graph edit distance as the mathematical formalism for determining the optimality of atom mappings. We continue by proposing a compact single-graph representation of reactions that uses the atom mappings. We investigate the utility of the new representation in a reaction function classification task, in which a descriptive category of the reaction's function is predicted. To facilitate the prediction, we introduce the first feasible path-based graph kernel, which describes reactions as path sequences and achieves high classification accuracy. In the second part we turn our focus to analysing tandem mass spectrometric measurements of metabolites. In a tandem mass spectrometer, an input molecule is fragmented into substructures (fragments), whose masses are observed. We begin by studying the fragment identification problem. A combinatorial algorithm is presented to enumerate candidate substructures based on the given masses.
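The fragment identification step can be illustrated on a toy molecule: enumerate connected substructures whose mass matches an observed peak, then rank them by the energy of the bonds that must break. This sketch makes several simplifying assumptions. Heavy atoms carry their implicit hydrogens, charge and hydrogen rearrangements are ignored, and approximate average bond enthalpies serve as the cost. The molecule, masses and energies are illustrative, and the brute-force enumeration stands in for the thesis's combinatorial algorithm.

```python
# Toy ethanol graph: heavy atoms with implicit H, monoisotopic group masses (Da).
GROUP_MASS = {0: 15.02348, 1: 14.01565, 2: 17.00274}  # CH3, CH2, OH
# Approximate average bond enthalpies (kJ/mol), used as cleavage costs.
BOND_ENERGY = {(0, 1): 348.0, (1, 2): 358.0}  # C-C, C-O


def connected_subsets(atoms, bonds):
    """All connected atom subsets, grown one neighbouring atom at a time."""
    neighbours = {a: set() for a in atoms}
    for u, v in bonds:
        neighbours[u].add(v)
        neighbours[v].add(u)
    seen, stack = set(), [frozenset([a]) for a in atoms]
    while stack:
        subset = stack.pop()
        if subset in seen:
            continue
        seen.add(subset)
        for a in subset:
            for b in neighbours[a] - subset:
                stack.append(subset | {b})
    return seen


def candidate_fragments(mass, tolerance=0.01):
    """(cleavage cost, atoms) for every connected fragment matching `mass`, cheapest first."""
    hits = []
    for subset in connected_subsets(GROUP_MASS, BOND_ENERGY):
        if abs(sum(GROUP_MASS[a] for a in subset) - mass) <= tolerance:
            # Cost = energy of every bond with exactly one end inside the fragment.
            cost = sum(e for (u, v), e in BOND_ENERGY.items() if (u in subset) != (v in subset))
            hits.append((cost, sorted(subset)))
    return sorted(hits)


print(candidate_fragments(31.018))  # CH2OH fragment -> [(348.0, [1, 2])]
```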
We also demonstrate the usefulness of approximate bond energies as a cost function for ranking the candidate structures according to their chemical feasibility. We propose fragmentation tree models that describe the dependencies between fragments for higher identification accuracy. We continue by studying a closely related problem in which an unknown metabolite is elucidated from its tandem mass spectrometric fragment signals. This metabolite identification task is an important problem in metabolomics, underpinning subsequent modelling and analysis efforts. We propose an automatic machine learning framework that predicts a set of structural properties of the unknown metabolite. The properties are turned into candidate structures by a novel statistical model. We introduce the first mass spectral kernels and explore three feature classes to facilitate the prediction. The kernels support high-accuracy mass spectrometric measurements, which improves predictive accuracy.
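One simple way to build a kernel between two mass spectra is to place a Gaussian of width σ on every peak and integrate the product of the two resulting densities, a probability product kernel. Up to a constant factor, this reduces to a weighted sum of exp(−Δm²/(4σ²)) over all peak pairs. It is offered as an illustrative assumption. The abstract does not specify which kernels the thesis introduces, and the peak lists and σ below are made up.

```python
import math


def gaussian_peak_kernel(spec_a, spec_b, sigma=0.01):
    """Overlap of two spectra modelled as sums of Gaussians (constant factor dropped).

    spec_a, spec_b: lists of (mz, intensity) pairs; sigma: peak width in m/z units.
    """
    total = 0.0
    for mz_a, int_a in spec_a:
        for mz_b, int_b in spec_b:
            total += int_a * int_b * math.exp(-((mz_a - mz_b) ** 2) / (4.0 * sigma**2))
    return total


def normalised_kernel(spec_a, spec_b, sigma=0.01):
    """Cosine-normalised kernel value, in [0, 1] for non-negative intensities."""
    k_ab = gaussian_peak_kernel(spec_a, spec_b, sigma)
    k_aa = gaussian_peak_kernel(spec_a, spec_a, sigma)
    k_bb = gaussian_peak_kernel(spec_b, spec_b, sigma)
    return k_ab / math.sqrt(k_aa * k_bb)


query = [(91.054, 1.0), (119.049, 0.4)]
reference = [(91.055, 0.9), (105.070, 0.3)]
print(normalised_kernel(query, reference))
```

A small σ rewards only near-exact mass agreement. This is where high mass accuracy pays off.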

    Development and prospective application of chemoinformatic tools to explore new ligand chemistry and protein biology

    Drug discovery and design is a tedious and expensive process whose low chances of success necessitate the development of novel chemoinformatic approaches and concepts. Their common goal is the efficient and robust identification of promising chemical matter and the reliable prediction of its properties. Computer-aided drug discovery and design (CADDD), applied in many forms throughout the different phases of the drug discovery pipeline, contributes significantly to hit expansion, to understanding the hits' structure-activity relationships and to their rational diversification. It reduces development costs and time and thus supports the search for the needle in the haystack: a potent hit. The HTS-driven, brute-force nature of discovery and design strategies, now and over past decades, compelled researchers to develop ideas and algorithms that intervene in the pipeline and prevent its frequent failures. In the introduction, I describe the drug discovery and design pipeline and point out interfaces where CADDD contributes to its success. In Part 1 of this thesis, I present a novel methodology that supports early-stage hit discovery through a fragment-based reduced graph similarity approach (RedFrag). It is a chimeric algorithm that combines fingerprint-based similarity calculation with scaffold-hopping-enabling graph isomorphism. We thoroughly investigated its performance retrospectively and prospectively. It uses a new type of reduced graph that does not suffer from information loss during its construction and bypasses the need for feature definitions. Built upon chemical epitopes resulting from molecule fragmentation, the reduced graph embodies the physico-chemical and 2D-structural properties of a molecule. Reduced graphs are compared with a maximal common subgraph algorithm driven by a continuous similarity distance, which calculates similarity at the fragment and topological levels.
The second chapter, Part 2, is dedicated to PrenDB, a digital compendium of the reaction space of prenyltransferases of the dimethylallyltryptophan synthase (DMATS) superfamily. Their catalytic transformations represent a major skeletal diversification step in the biosynthesis of secondary metabolites, including the indole alkaloids. DMATS enzymes thus contribute significantly to the biological and pharmacological diversity of small-molecule metabolites. Because they attach the prenyl donor to lead- or drug-like molecules, prenyltransferases are useful for accessing chemical space that is difficult to reach by conventional synthesis. In PrenDB, we collected the substrates, enzymes and products. We then used a newly developed algorithm based on molecular fragmentation to extract reactive chemical epitopes automatically. The analysis of the collected data sheds light on the substrate space of DMATS enzymes explored so far. We supplemented the browsable database with algorithmic prediction routines to assess the prenylability of novel compounds, and applied them to a set of 38 molecules. In a case study, Part 3, we investigated the regioselectivity of five prenyltransferases in the presence of unnatural prenyl donors. Detailed biochemical investigations revealed that all tested enzymes accept these dimethylallyl pyrophosphate (DMAPP) analogs, with different relative activities and regioselectivities. To understand the activity profiles and their differences at the molecular level, we investigated the interactions within the enzyme-prenyl donor-substrate system with molecular dynamics. Our experiments show that the reactivity of a prenyl donor correlates strongly with the distance between its electrophilic reactive atom and the nucleophilic center of the substrate molecule.
This is a first step towards a better mechanistic understanding of the reactivity of prenyltransferases, and it significantly expands the potential use and rational design of tryptophan-prenylating enzymes as biocatalysts for Friedel–Crafts alkylation. Lastly, in Part 4, we present the synergistic potential of combining ligand- and structure-based drug discovery methodologies, applied to the β2-adrenergic receptor (β2AR). The β2AR is a G protein-coupled receptor (GPCR) and a well-explored target. By jointly applying fingerprint-based similarity, substructure-based searches and docking, we discovered 13 ligands of this GPCR, ten of which were novel. Notably, two of the molecules used as starting points for the similarity and substructure searches stand out from other β2AR antagonists because of their unique scaffolds. A multistep hierarchical or parallel screening approach thus enabled us to exploit these unique structural features and discover novel chemical matter beyond the bounds of the known ligand space. It also highlights the intrinsic complementarity of ligand- and structure-based approaches. The molecules described in this work allow us to explore the ligand space around the previously reported molecules in greater detail, providing insights into their structure-activity relationships. In addition, we characterized our hits with experimental binding and selectivity data and discussed these data in light of putative binding modes derived by docking.
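Fingerprint-based similarity is one ingredient of the RedFrag approach and of the β2AR screening described above, and it is commonly scored with the Tanimoto coefficient. The sketch below represents fingerprints as sets of on-bit indices and screens a small library against a threshold. The bit sets and compound names are invented, the threshold is arbitrary, and Tanimoto itself is an assumption, since the abstract does not name the coefficient used.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto (Jaccard) coefficient |A & B| / |A | B| of two on-bit sets."""
    a, b = set(bits_a), set(bits_b)
    union = len(a | b)
    return len(a & b) / union if union else 1.0  # two empty fingerprints count as identical


def screen(query_bits, library, threshold=0.5):
    """(similarity, name) for library entries at or above `threshold`, best first."""
    hits = [(tanimoto(query_bits, bits), name) for name, bits in library.items()]
    return sorted((h for h in hits if h[0] >= threshold), reverse=True)


library = {"cpd1": {1, 2, 3, 8}, "cpd2": {2, 3, 4}, "cpd3": {10, 11}}
print(screen({1, 2, 3, 4}, library))  # [(0.75, 'cpd2'), (0.6, 'cpd1')]
```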

    Kangaroo Island Propolis: Improved Characterisation and Assessment of Chemistry and Botanical Origins through Metabolomics

    Introduction: Propolis, a sticky substance produced by bees from plant resins, has a long history of safe medicinal use. Kangaroo Island, SA (KI) lacks many of the introduced European plants from which bees preferentially collect resin; consequently, propolis from KI is produced from resinous native plants. Several reproducible, identifiable pure-source KI propolis types exist. Research into the medical use of compounds from KI native plants is limited. Metabolomics is a growing field of interest in natural products chemistry, including beehive products. Metabolomic and similarity-scoring assessment of KI propolis, through statistical evaluation of 1D 1H-NMR fingerprints, provides an entry point for research into the medical use of KI native plant compounds. Many avenues to product discovery in pharmaceutical chemistry suffer from diminishing returns; metabolomics-guided assessment of natural products has the potential to identify further novel therapeutic compounds from resinous plants. Aim: To assess and identify the major propolis types on KI through metabolomic investigation of NMR fingerprints and, from this, to produce similarity-scoring tools for assessing propolis samples. Method: KI propolis samples identified as pure-source by TLC, and resinous KI plants, were analysed by 1H-NMR and HPLC. Data points of interest were normalised and binned to form individual sample ‘fingerprints’. Data from these fingerprints were analysed by hierarchical clustering and principal component analysis (PCA) to confirm provisionally identified pure-source propolis types and to identify subtypes within propolis and resinous plant species. From this, calculator tools were created to score the similarity (out of 1000) of 1H-NMR fingerprints to the average spectrum of pure-source propolis types, as well as to calculated mixtures of these average spectra.
The chemistry of the two major KI propolis types identified (CP- and F-type) was assessed by fractionation and NMR; one compound, 6,8-diprenyleriodictyol, was isolated in quantity from CP-type propolis and submitted for epigenetic and other biological assays. Results: Hierarchical clustering and PCA showed that source resinous plants cluster with the propolis types arising from them, while closely related plants and sub-chemotypes cluster separately, confirming specificity. A number of previously identified pure-source propolis types and known botanical sources showed very high similarity (> 800/1000) to the expected propolis type. The calculator tools predicted the content of mixed propolis samples accurately, to within ± 10%. A number of methylflavanones and two novel terminally hydroxylated prenyldihydrochalcones were isolated from F-type propolis. 6,8-Diprenyleriodictyol showed a range of promising activity in biological assays. Conclusion: Metabolomic evaluation of 1H-NMR fingerprints can reliably identify and assess pure-source KI propolis and identify the botanical origin of source resins. Similarity-scoring calculators can accurately identify mixed-source propolis samples. KI propolis types are a rich source of pharmaceutically interesting flavanones and related compounds, many of which are prenylated. 6,8-Diprenyleriodictyol displays strong anti-inflammatory and anticancer activity, especially against Burkitt’s lymphoma. A number of possible epigenetic pathways for this activity were observed.
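The fingerprinting and similarity-scoring workflow has three steps. A 1H-NMR spectrum is binned and normalised, scored out of 1000 against a reference average, and compared with calculated mixtures of two reference types. The sketch below shows these steps. The bin width, chemical-shift range, cosine similarity as the score and the 10% mixture grid are all assumptions for illustration, because the abstract does not give the calculators' actual formula. The random "average fingerprints" are synthetic stand-ins.

```python
import numpy as np


def fingerprint(ppm, intensity, lo=0.0, hi=10.0, width=0.04):
    """Bin a 1H spectrum into fixed-width ppm buckets and normalise to unit total area."""
    n_bins = int(round((hi - lo) / width))
    edges = np.linspace(lo, hi, n_bins + 1)
    binned, _ = np.histogram(ppm, bins=edges, weights=intensity)
    total = binned.sum()
    return binned / total if total > 0 else binned


def similarity_score(fp_a, fp_b):
    """Similarity out of 1000, taken here as 1000 x cosine similarity."""
    return 1000.0 * float(fp_a @ fp_b) / float(np.linalg.norm(fp_a) * np.linalg.norm(fp_b))


def best_mixture(sample_fp, type_a, type_b, step=0.1):
    """Fraction of type A (on a grid) whose calculated mixture best matches the sample."""
    best_frac, best_score = None, -np.inf
    for frac in np.arange(0.0, 1.0 + step / 2, step):
        score = similarity_score(sample_fp, frac * type_a + (1.0 - frac) * type_b)
        if score > best_score:
            best_frac, best_score = round(float(frac), 2), score
    return best_frac, best_score


rng = np.random.default_rng(0)
avg_a, avg_b = rng.random(50), rng.random(50)  # synthetic stand-ins for two average fingerprints
sample = 0.3 * avg_a + 0.7 * avg_b
print(best_mixture(sample, avg_a, avg_b))
```

A grid search is used for transparency. A non-negative least-squares fit would give a continuous mixture fraction instead.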

    Predicting NMR parameters from the molecular structure
