283 research outputs found

    A treatment of stereochemistry in computer aided organic synthesis

    This thesis describes the author’s contributions to a new stereochemical processing module constructed for the ARChem retrosynthesis program. The purpose of the module is to add the ability to perform enantioselective and diastereoselective retrosynthetic disconnections and generate appropriate precursor molecules. The module uses evidence based rules generated from a large database of literature reactions. Chapter 1 provides an introduction and critical review of the published body of work for computer aided synthesis design. The role of computer perception of key structural features (rings, functional groups, etc.) and the construction and use of reaction transforms for generating precursors is discussed. Emphasis is also given to the application of strategies in retrosynthetic analysis. The availability of large reaction databases has enabled a new generation of retrosynthesis design programs to be developed that use automatically generated transforms assembled from published reactions. A brief description of the transform generation method employed by ARChem is given. Chapter 2 describes the algorithms devised by the author for handling the computer recognition and representation of the stereochemical features found in molecule and reaction scheme diagrams. The approach is generalised and uses flexible recognition patterns to transform information found in chemical diagrams into concise stereo descriptors for computer processing. An algorithm for efficiently comparing and classifying pairs of stereo descriptors is described. This algorithm is central for solving the stereochemical constraints in a variety of substructure matching problems addressed in chapter 3. The concise representation of reactions and transform rules as hyperstructure graphs is described. Chapter 3 is concerned with the efficient and reliable detection of stereochemical symmetry in molecules, reactions and rules. 
A novel symmetry perception algorithm, based on a constraint satisfaction problem (CSP) solver, is described. The use of a CSP solver to implement an isomorph‐free matching algorithm for stereochemical substructure matching is detailed. The prime function of this algorithm is to seek out unique retron locations in target molecules and then to generate precursor molecules without duplications due to symmetry. Novel algorithms for classifying asymmetric, pseudo‐asymmetric and symmetric stereocentres; meso, centro, and C2 symmetric molecules; and the stereotopicity of trigonal (sp2) centres are described. Chapter 4 introduces and formalises the annotated structural language used to create both retrosynthetic rules and the patterns used for functional group recognition. A novel functional group recognition package is described along with its use to detect important electronic features such as electron‐withdrawing or donating groups and leaving groups. The functional groups and electronic features are used as constraints in retron rules to improve transform relevance. Chapter 5 details the approach taken to design detailed stereoselective and substrate controlled transforms from organised hierarchies of rules. The rules employ a rich set of constraint annotations that concisely describe the keying retrons. The application of the transforms for collating evidence based scoring parameters from published reaction examples is described. A survey of available reaction databases is given and techniques for mining stereoselective reactions are demonstrated. A data mining tool was developed for finding the best reputable stereoselective reaction types for coding as transforms. For various reasons it was not possible during the research period to fully integrate this work with the ARChem program. Instead, Chapter 6 introduces a novel one‐step retrosynthesis module to test the developed transforms. 
The retrosynthesis algorithms use the organisation of the transform rule hierarchy to efficiently locate the best retron matches using all applicable stereoselective transforms. This module was tested using a small set of selected target molecules and the generated routes were ranked using a series of measured parameters including: stereocentre clearance and bond cleavage; example reputation; estimated stereoselectivity with reliability; and evidence of tolerated functional groups. In addition, a method for detecting regioselectivity issues is presented. This work presents a number of algorithms using common set and graph theory operations and notations. Appendix A lists the set theory symbols and meanings. Appendix B summarises and defines the common graph theory terminology used throughout this thesis.

    Graph embedding in SYNCHEM2, an expert system for organic synthesis discovery

    Graph embedding (subgraph isomorphism) is an NP-complete problem of great theoretical and practical importance in the sciences, especially chemistry and computer science. This paper presents positive test results for techniques to speed embedding by modeling graphs with subroutines, precalculating edge tables, turning recursion into iteration, and using search-ordering heuristics. The expert system SYNCHEM2 searches for synthesis routes of organic molecules without the online guidance of a user, and this paper examines how embedding information helps to implement the central operations of SYNCHEM2: selection, application, and evaluation of chemical reactions. The paper also outlines the architecture of SYNCHEM2, analyzes the computational time complexity of embedding and related problems in graph isomorphism and canonical chemical naming, and suggests topics and techniques for further research.
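
To make the embedding problem named in this abstract concrete, here is a minimal sketch of backtracking substructure search over atom-labelled graphs. It is not the SYNCHEM2 implementation: the function names, the graph encoding (a dict of atom labels plus a set of undirected edges) and the degree-based ordering are illustrative assumptions. It precomputes adjacency sets (loosely analogous to an edge table) and tries high-degree pattern atoms first as a simple search-ordering heuristic. The match is non-induced: extra bonds in the target are allowed.

```python
from typing import Dict, Optional, Set, Tuple

Labels = Dict[int, str]          # atom index -> element symbol
Edges = Set[Tuple[int, int]]     # undirected bonds


def adjacency(labels: Labels, edges: Edges) -> Dict[int, Set[int]]:
    """Precompute neighbour sets once so edge checks are O(1)."""
    adj: Dict[int, Set[int]] = {n: set() for n in labels}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    return adj


def find_embedding(p_labels: Labels, p_edges: Edges,
                   t_labels: Labels, t_edges: Edges) -> Optional[Dict[int, int]]:
    """Return one pattern->target atom mapping, or None if no embedding exists."""
    p_adj = adjacency(p_labels, p_edges)
    t_adj = adjacency(t_labels, t_edges)
    # Heuristic (an assumption of this sketch): place the most connected
    # pattern atoms first, since they are the hardest to satisfy.
    order = sorted(p_labels, key=lambda n: -len(p_adj[n]))
    mapping: Dict[int, int] = {}
    used: Set[int] = set()

    def extend(i: int) -> bool:
        if i == len(order):
            return True
        p = order[i]
        for t in t_labels:
            if t in used or t_labels[t] != p_labels[p]:
                continue
            if len(t_adj[t]) < len(p_adj[p]):
                continue  # target atom has too few neighbours
            # Every already-mapped pattern neighbour must map to a target neighbour.
            if all(mapping[q] in t_adj[t] for q in p_adj[p] if q in mapping):
                mapping[p] = t
                used.add(t)
                if extend(i + 1):
                    return True
                del mapping[p]
                used.discard(t)
        return False

    return dict(mapping) if extend(0) else None
```

For example, searching for a C-O fragment in a C-C-O chain maps the pattern carbon to the middle atom and the oxygen to the terminal one. The worst case remains exponential, consistent with the NP-completeness noted in the abstract.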

    Analysis of Generative Chemistries

    For the modelling of chemistry we use undirected, labelled graphs as explicit models of molecules and graph transformation rules for modelling generalised chemical reactions. This is used to define artificial chemistries on the level of individual bonds and atoms, where formal graph grammars implicitly represent large spaces of chemical compounds. We use a graph rewriting formalism, rooted in category theory, called the Double Pushout approach, which directly expresses the transition state of chemical reactions. Using concurrency theory for transformation rules, we define algorithms for the composition of rewrite rules in a chemically intuitive manner that enable automatic abstraction of the level of detail in chemical pathways. Based on this rule composition we define an algorithmic framework for generation of vast reaction networks for specific spaces of a given chemistry, while still maintaining the level of detail of the model down to the atomic level. The framework also allows for computation with graphs and graph grammars, which is utilised to model non-trivial chemical systems. The graph generation relies on graph isomorphism testing, and we review the general individualisation-refinement paradigm used in the state-of-the-art algorithms for graph canonicalisation, isomorphism testing, and automorphism discovery. We present a model for chemical pathways based on a generalisation of network flows from ordinary directed graphs to directed hypergraphs. The model allows for reasoning about the flow of individual molecules in general pathways, and the introduction of chemically motivated routing constraints. It further provides the foundation for defining specialised pathway motifs, which is illustrated by defining necessary topological constraints for both catalytic and autocatalytic pathways. We also prove that central types of pathway questions are NP-complete, even for restricted classes of reaction networks. 
The complete pathway model, including constraints for catalytic and autocatalytic pathways, is implemented using integer linear programming. This implementation is used in a tree search method to enumerate both optimal and near-optimal pathway solutions. The formal methods are applied to multiple chemical systems: the enzyme catalysed beta-lactamase reaction, variations of the glycolysis pathway, and the formose process. In each of these systems we use rule composition to abstract pathways and calculate traces for isotope labelled carbon atoms. The pathway model is used to automatically enumerate alternative non-oxidative glycolysis pathways, and enumerate thousands of candidates for autocatalytic pathways in the formose process
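
The individualisation-refinement paradigm mentioned in this abstract alternates a refinement step with an individualisation (branching) step. As a rough illustration of the refinement half only, the sketch below applies colour refinement to an atom-labelled molecular graph: atoms start with a colour derived from their label and are repeatedly split by the multiset of their neighbours' colours until the partition stops changing. The function name and graph encoding are assumptions made for this example, and this is not the thesis's own algorithm.

```python
from typing import Dict, Set, Tuple


def refine_colours(labels: Dict[int, str],
                   edges: Set[Tuple[int, int]]) -> Dict[int, int]:
    """Return a stable colouring: atoms with equal colour are not
    distinguished by iterated neighbourhood colour counts."""
    adj: Dict[int, Set[int]] = {n: set() for n in labels}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)

    # Initial colours come from the atom labels.
    palette = {lab: i for i, lab in enumerate(sorted(set(labels.values())))}
    colour = {n: palette[labels[n]] for n in labels}

    while True:
        # Signature = own colour plus sorted multiset of neighbour colours.
        signature = {
            n: (colour[n], tuple(sorted(colour[m] for m in adj[n])))
            for n in labels
        }
        ranks = {sig: i for i, sig in enumerate(sorted(set(signature.values())))}
        new_colour = {n: ranks[signature[n]] for n in labels}
        # Refinement only ever splits classes, so an unchanged class
        # count means the partition is stable.
        if len(set(new_colour.values())) == len(set(colour.values())):
            return new_colour
        colour = new_colour
```

On a three-carbon chain the two terminal atoms end up sharing a colour while the central atom gets its own. A full canoniser would then individualise one atom from a non-singleton class and refine again, which this sketch deliberately omits.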

    Chemoinformatics approaches for new drugs discovery

    Chemoinformatics uses computational methods and technologies to solve chemical problems. It works on molecular structures, their representations, properties and related data. The first and most important phase in this field is the translation of interconnected atomic systems into in-silico models, ensuring complete and correct chemical information transfer. In the last 20 years chemical databases have evolved from molecular repositories into research tools for new drug identification, while modern high-throughput technologies allow chemical libraries to grow continuously, as highlighted by publicly available repositories such as PubChem [http://pubchem.ncbi.nlm.nih.gov/], ZINC [http://zinc.docking.org/] and ChemSpider [http://www.chemspider.com/]. The fundamental requirements of chemical libraries are molecular uniqueness, absence of ambiguity, chemical correctness (related to atoms, bonds, chemical orthography), and standardized storage and registration formats. The aim of this work is the development of chemoinformatics tools and data for the drug discovery process. The first part of the research project focused on analysis of the accessible commercial chemical space, looking for molecular redundancy and checking the correctness of in-silico models in order to identify a unique and univocal molecular descriptor for indexing chemical libraries. This achieved 0% redundancy on a library of 42 million compounds. The protocol was implemented as MMsDusty, a web based tool for cleaning molecular databases. The major protocol developed is MMsINC, a chemoinformatics platform based on an initial 4 million non-redundant, high-quality, annotated and biomedically relevant chemical structures; the library is now being expanded to 460 million compounds. MMsINC is able to perform various types of queries, such as substructure or similarity search and descriptor filtering. 
MMsINC is interfaced with the PDB (Protein Data Bank) [http://www.rcsb.org/pdb/home/home.do] and related to approved drugs. The second developed protocol is called pepMMsMIMIC, a peptidomimetic screening tool based on multiconformational chemical libraries; the screening process uses pharmacophoric fingerprint similarity to identify small molecules able to geometrically and chemically mimic endogenous peptides or proteins. The last part of this project led to the implementation of an optimized and exhaustive conformational space analysis protocol for small-molecule libraries; this is crucial for the high quality 3D molecular model prediction required in chemoinformatics applications. The torsional exploration was optimized in the range of the most frequent dihedral angles seen in X-ray solved small-molecule structures of the CSD (Cambridge Structural Database); applying this to a library of 89 million structures generated a library of 2.6 × 10^7 high-quality conformers. The tools, protocols and platforms developed in this work allow chemoinformatics analysis and screening of large chemical libraries, achieving high-quality, correct and unique chemical data and in-silico models.

    Chiral spectroscopy : a multidisciplinary approach to chiral structure determination of organic molecules


    Computational and in vitro study of isolated domains from fungal polyketide synthases

    Diverse approaches have been explored to generate new polyketides by engineering polyketide synthases (PKS). Although it has been proven possible to produce new compounds by designed PKS, engineering strategies failed to make polyketides available via widely applicable rules and protocols. The aim of this work was the first rational engineering of an iterative highly-reducing polyketide synthase (HR-PKS). This approach was performed on the Squalestatin Tetraketide Synthase (SQTKS), which catalyses the biosynthesis of the tetraketide side chain of squalestatin-S1 53, which is a potent squalene synthase inhibitor and can be potentially used to treat serum cholesterol related diseases. Second, tenellin 62 was investigated, which is the product of the iterative Type I polyketide synthase non ribosomal peptide synthetase (PKS-NRPS) TENS. Using a combination of different in silico methods, structural models of the enoyl reductase (ER) domain of SQTKS were obtained and validated. With the generated protein models different rational engineering experiments in silico were performed, in which amino acids for the mutagenesis approach in vitro were identified. The subsequent in vitro experiments revealed that it was possible to rationally engineer the ER domain of SQTKS. In addition, the different integrated mutations showed different effects on the intrinsic programming of the ER domain. Further, the chemical selectivity and kinetic parameters of the tested di-, tri-, tetra- and heptaketide substrate were influenced in a specific way through the different mutated ER domains. In addition, the structural-biological foundations and analysis for the domain swaps between Pretenellin A Synthetase (TENS), Predesmethylbassianin A Synthetase (DMBS) and Premilitarinone C Synthetase (MILS) were investigated and validated. 
Through different in silico structural analyses it was possible to consider the effects of swaps on protein structure and to understand the effect of the swaps at the structural level. Additionally, the in silico analysis helped to clarify the influence of extrinsic and intrinsic programming factors

    Molassembler: Molecular graph construction, modification and conformer generation for inorganic and organic molecules

    We present the graph-based molecule software Molassembler for building organic and inorganic molecules. Molassembler provides algorithms for the construction of molecules built from any set of elements from the periodic table. In particular, poly-nuclear transition metal complexes and clusters can be considered. Structural information is encoded as a graph. Stereocenter configurations are interpretable from Cartesian coordinates into an abstract index of permutation for an extensible set of polyhedral shapes. Substituents are distinguished through a ranking algorithm. Graph and stereocenter representations are freely modifiable and chiral state is propagated where possible through incurred ranking changes. Conformers are generated with full stereoisomer control by four spatial dimension Distance Geometry with a refinement error function including dihedral terms. Molecules are comparable by an extended graph isomorphism and their representation is canonicalizeable. Molassembler is written in C++ and provides Python bindings. Comment: 81 pages, 26 figures, 3 tables

    Isolation of enantiomers via diastereomer crystallisation

    Enantiomer separation remains an important technique for obtaining optically active materials. Even though the enantiomers have identical physical properties, the difference in their biological activities makes it important to separate them, in order to use single enantiomer products in the pharmaceutical and fine chemical industries. In this project, the separations of three pairs of diastereomer salts (Fig1) by crystallisation are studied, as examples of the ‘classical’ resolution of enantiomers via conversion to diastereomers. The lattice energies of these diastereomer compounds are calculated computationally (based on realistic potentials for the dominant electrostatic interactions and ab initio conformational energies). Then the experimental data are compared with the theoretical data to study the efficiency of the resolving agent. All three fractional crystallisations occurred relatively slowly, and appeared to be thermodynamically controlled. Separabilities by crystallisation have been compared with measured phase equilibrium data for the three systems studied. All crystallisations appear to be consistent with ternary phase diagrams. In the case of R = CH3, where the salt-solvent ternaries exhibited eutonic behaviour, the direction of isomeric enrichment changed abruptly on passing through the eutonic composition. In another example, R = OH, the ternaries indicated near-ideal solubility behaviour of the salt mixtures, and the separation by crystallisation again corresponded. Further, new polymorphic structures and generally better structure predictions have been obtained throughout this study. In the case of R = CH3, an improved structure of the p-salt has been determined. In the case of R = C2H5, new polymorphic forms of the n-salts, II and III, have been both discovered and predicted. 
This work also demonstrates that chemically related organic molecules can exhibit different patterns of the relative energies of the theoretical low energy crystal structures, along with differences in the experimental polymorphic behaviour. This joint experimental and computational investigation provides a stringent test of the reliability of lattice modelling to explain the origins of chiral resolution via diastereomer formation. All the experimental and computational works investigated in this thesis are published (see APPENDIX 1)

    Structural determinants of binding and specificity in transforming growth factor-receptor interactions

    Transforming growth factor (TGF-β) protein families are cytokines that occur as a large number of homologous proteins. Three major subgroups of these proteins with marked specificities for their receptors have been found: TGF-β, activin/inhibin, and bone morphogenic protein. Although structural information is available for some members of the TGF-β family of ligands and receptors, very little is known about the way these growth factors interact with the extracellular domains of their cell surface receptors, especially the type II receptor. In addition, the elements that are the determinants of binding and specificity of the ligands are poorly understood. The structure of the extracellular domain of the receptor is a three-finger fold similar to some toxin structures. Amino acid exchanges between multiply aligned homologous sequences of type II receptors point to a residue at the surface, specifically finger 1, as the determinant of ligand specificity and complex formation. The "knuckle" epitope of ligands was predicted to be the surface that interacts with the type II receptor. The residues on strands β2, β3, β7, β8 and the loop region joining β2 and β3 and joining β7 and β8 of the ligands were identified as determinants of binding and specificity. These results are supported by studies on the docking of the type II receptor to the ligand dimer-type I receptor complex.