5 research outputs found

    A treatment of stereochemistry in computer aided organic synthesis

    This thesis describes the author's contributions to a new stereochemical processing module constructed for the ARChem retrosynthesis program. The purpose of the module is to add the ability to perform enantioselective and diastereoselective retrosynthetic disconnections and generate appropriate precursor molecules. The module uses evidence-based rules generated from a large database of literature reactions. Chapter 1 provides an introduction and critical review of the published body of work on computer-aided synthesis design. The role of computer perception of key structural features (rings, functional groups, etc.) and the construction and use of reaction transforms for generating precursors are discussed. Emphasis is also given to the application of strategies in retrosynthetic analysis. The availability of large reaction databases has enabled the development of a new generation of retrosynthesis design programs that use automatically generated transforms assembled from published reactions. A brief description of the transform generation method employed by ARChem is given. Chapter 2 describes the algorithms devised by the author for handling the computer recognition and representation of the stereochemical features found in molecule and reaction scheme diagrams. The approach is generalised and uses flexible recognition patterns to transform information found in chemical diagrams into concise stereo descriptors for computer processing. An algorithm for efficiently comparing and classifying pairs of stereo descriptors is described. This algorithm is central to solving the stereochemical constraints in a variety of substructure matching problems addressed in Chapter 3. The concise representation of reactions and transform rules as hyperstructure graphs is described. Chapter 3 is concerned with the efficient and reliable detection of stereochemical symmetry in molecules, reactions and rules. 
A novel symmetry perception algorithm, based on a constraint satisfaction problem (CSP) solver, is described. The use of a CSP solver to implement an isomorph-free matching algorithm for stereochemical substructure matching is detailed. The prime function of this algorithm is to seek out unique retron locations in target molecules and then to generate precursor molecules without duplications due to symmetry. Novel algorithms for classifying asymmetric, pseudo-asymmetric and symmetric stereocentres; meso, centro, and C2 symmetric molecules; and the stereotopicity of trigonal (sp2) centres are described. Chapter 4 introduces and formalises the annotated structural language used to create both retrosynthetic rules and the patterns used for functional group recognition. A novel functional group recognition package is described along with its use to detect important electronic features such as electron-withdrawing or donating groups and leaving groups. The functional groups and electronic features are used as constraints in retron rules to improve transform relevance. Chapter 5 details the approach taken to design detailed stereoselective and substrate-controlled transforms from organised hierarchies of rules. The rules employ a rich set of constraint annotations that concisely describe the keying retrons. The application of the transforms for collating evidence-based scoring parameters from published reaction examples is described. A survey of available reaction databases is given, and techniques for mining stereoselective reactions are demonstrated. A data mining tool was developed for finding the most reputable stereoselective reaction types for coding as transforms. For various reasons it was not possible during the research period to fully integrate this work with the ARChem program. Instead, Chapter 6 introduces a novel one-step retrosynthesis module to test the developed transforms. 
The retrosynthesis algorithms use the organisation of the transform rule hierarchy to efficiently locate the best retron matches using all applicable stereoselective transforms. This module was tested using a small set of selected target molecules and the generated routes were ranked using a series of measured parameters including: stereocentre clearance and bond cleavage; example reputation; estimated stereoselectivity with reliability; and evidence of tolerated functional groups. In addition, a method for detecting regioselectivity issues is presented. This work presents a number of algorithms using common set and graph theory operations and notations. Appendix A lists the set theory symbols and meanings. Appendix B summarises and defines the common graph theory terminology used throughout this thesis.
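
The constraint view of symmetry perception mentioned in this abstract can be illustrated with a small sketch. This is not the thesis's algorithm or its CSP solver: it is a plain backtracking search, it ignores stereochemistry entirely, and the function name `automorphisms` and the toy graphs are assumptions made for illustration. Each atom is treated as a variable whose value is the atom it maps to, and the constraints are that element labels and bonding are preserved.

```python
from typing import Dict, List, Set


def automorphisms(labels: List[str], adj: Dict[int, Set[int]]) -> List[List[int]]:
    """Enumerate atom permutations that preserve element labels and bonds.

    Illustrative only: constitutional symmetry, no stereo descriptors.
    """
    n = len(labels)
    results: List[List[int]] = []
    mapping = [-1] * n      # mapping[v] = image of atom v
    used = [False] * n      # atoms already taken as an image

    def consistent(v: int, w: int) -> bool:
        # Unary constraints: same element and same number of neighbours.
        if labels[v] != labels[w] or len(adj[v]) != len(adj[w]):
            return False
        # Binary constraints against every atom assigned so far:
        # bonded pairs must map to bonded pairs, non-bonded to non-bonded.
        for u in range(v):
            if (u in adj[v]) != (mapping[u] in adj[w]):
                return False
        return True

    def extend(v: int) -> None:
        if v == n:
            results.append(mapping.copy())
            return
        for w in range(n):
            if not used[w] and consistent(v, w):
                mapping[v], used[w] = w, True
                extend(v + 1)
                mapping[v], used[w] = -1, False

    extend(0)
    return results


# Heavy-atom skeletons (hypothetical toy inputs).
propane = automorphisms(["C", "C", "C"], {0: {1}, 1: {0, 2}, 2: {1}})
ethanol = automorphisms(["C", "C", "O"], {0: {1}, 1: {0, 2}, 2: {1}})
```

A graph with more than one such mapping has a non-trivial symmetry, so a retron matched at one end of propane's skeleton is equivalent to the match at the other end and need only be reported once.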

    Study, development and application of QSPR-QSAR theory models in pesticides

    This doctoral thesis focuses on building predictive models that serve as a tool to assist the search for chemical structures with favourable values of the property/activity. The ability to predict the physicochemical properties and biological activities of chemical substances makes it possible to analyse in advance the properties of compounds that are new, toxic, or that require too much time for experimental evaluation. The models can thus be used to predict the physicochemical properties/biological activities of new chemical compounds synthesised in the laboratory and lacking experimental data. The specific objectives of this thesis include: (1) to develop mathematical models capable of quantifying hypothetical relationships between chemical structure and the property/activity of pesticides, through multivariable linear regression analysis applied to different databases of properties of agronomic interest extracted from the current literature. To this end, the best molecular descriptors emerging from the analysis of thousands of structural descriptors, obtained from freely available software, will be used; (2) to investigate the behaviour of flexible or optimal descriptors in QSPR-QSAR studies of pesticides, and to incorporate them into the models where they prove adequate. 
To do so, one must be able to define the mathematical construction of the flexible descriptor and must choose the procedure for fitting its variable parts to reach the best predictions of the property, avoiding overfitting of the calibration set so that acceptable predictive quality is achieved and the model passes its external validation; (3) to address the treatment of large molecular sets of high structural diversity that include pesticides; (4) to demonstrate through the results obtained that an approach based on a structural representation independent of molecular conformation yields reliable predictions of the property/activity studied. The quality of the predictions obtained in these QSPR-QSAR studies of pesticides is compared with the available experimental information and with the predictions achieved by alternative theoretical methodologies from the literature. Facultad de Ciencias Exacta
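
The workflow named in this abstract (multivariable linear regression on a calibration set, followed by external validation) can be sketched as follows. This is a minimal illustration, not the thesis's procedure: the descriptor matrix is synthetic random data, and the helper names `fit_mlr`, `predict` and `r_squared` are assumptions.

```python
import numpy as np


def fit_mlr(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Least-squares fit of y = b0 + b1*x1 + ... ; returns [b0, b1, ...]."""
    A = np.column_stack([np.ones(len(X)), X])   # prepend intercept column
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict(coef: np.ndarray, X: np.ndarray) -> np.ndarray:
    return coef[0] + X @ coef[1:]


def r_squared(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


# Synthetic stand-in for a descriptor table: 40 "molecules", 3 descriptors.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 3))
y = 1.5 + X @ np.array([0.8, -0.3, 0.0]) + rng.normal(scale=0.1, size=40)

# Calibration set for fitting, held-out set for external validation.
X_cal, y_cal = X[:30], y[:30]
X_val, y_val = X[30:], y[30:]

coef = fit_mlr(X_cal, y_cal)
r2_cal = r_squared(y_cal, predict(coef, X_cal))
r2_val = r_squared(y_val, predict(coef, X_val))
```

A validation score far below the calibration score is the usual sign of the overfitting that the abstract warns against.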

    Hydrate crystal structures, radial distribution functions, and computing solubility

    Solubility prediction usually refers to prediction of the intrinsic aqueous solubility, which is the concentration of an unionised molecule in a saturated aqueous solution at thermodynamic equilibrium at a given temperature. Solubility is determined by structural and energetic components emanating from solid-phase structure and packing interactions, solute–solvent interactions, and structural reorganisation in solution. An overview of the most commonly used methods for solubility prediction is given in Chapter 1. In this thesis, we investigate various approaches to solubility prediction and solvation model development, based on informatics and incorporation of empirical and experimental data. These are of a knowledge-based nature, and specifically incorporate information from the Cambridge Structural Database (CSD). A common problem for solubility prediction is the computational cost associated with accurate models. This issue is usually addressed by use of machine learning and regression models, such as the General Solubility Equation (GSE). These types of models are investigated and discussed in Chapter 3, where we evaluate the reliability of the GSE for a set of structures covering a large area of chemical space. We find that molecular descriptors relating to specific atom or functional group counts in the solute molecule almost always appear in improved regression models. In accordance with the findings of Chapter 3, in Chapter 4 we investigate whether radial distribution functions (RDFs) calculated for atoms (defined according to their immediate chemical environment) with water from organic hydrate crystal structures may give a good indication of interactions applicable to the solution phase, and justify this by comparison of our own RDFs to neutron diffraction data for water and ice. 
We then apply our RDFs to the theory of the Reference Interaction Site Model (RISM) in Chapter 5, and produce novel models for the calculation of Hydration Free Energies (HFEs).
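
For readers unfamiliar with the GSE mentioned in this abstract, a minimal sketch follows. It uses the commonly cited form of the equation, log S = 0.5 - 0.01 (MP - 25) - log P, with the melting point MP in degrees Celsius and S in mol/L; the abstract itself does not state which form the thesis evaluates, so that is an assumption, as are the function name and the input values, which are arbitrary and not data for any real compound.

```python
def gse_log_s(melting_point_c: float, log_p: float) -> float:
    """Estimate log10 of intrinsic aqueous solubility (mol/L) with the GSE.

    The melting-point term is set to zero for compounds that are liquid
    at 25 degrees C, as is conventional for this equation.
    """
    melting_term = 0.01 * max(melting_point_c - 25.0, 0.0)
    return 0.5 - melting_term - log_p


# Arbitrary illustrative inputs (not measurements).
solid_example = gse_log_s(melting_point_c=125.0, log_p=2.0)
liquid_example = gse_log_s(melting_point_c=10.0, log_p=1.0)
```

The equation needs only two inputs per compound, which is why it is cheap to evaluate compared with physics-based solvation models.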

    Discrimination Power of Polynomial-Based Descriptors for Graphs by Using Functional Matrices.

    In this paper, we study the discrimination power of graph measures that are based on graph-theoretical matrices. The paper generalizes the work of [M. Dehmer, M. Moosbrugger, Y. Shi, Encoding structural information uniquely with polynomial-based descriptors by employing the Randić matrix, Applied Mathematics and Computation, 268 (2015), 164-168]. We demonstrate that, by using the new functional matrix approach, exhaustively generated graphs can be discriminated more uniquely than in that earlier work.
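
As background for the cited earlier work, the sketch below builds the Randić matrix of a graph (entry 1/sqrt(d_i d_j) for adjacent vertices i and j, zero otherwise) and takes its characteristic polynomial. How the papers turn the polynomial into a descriptor, and what the functional matrices are, is not stated in the abstract, so the sketch stops at the polynomial; the function names and the two example graphs are assumptions.

```python
from typing import List, Tuple

import numpy as np


def randic_matrix(edges: List[Tuple[int, int]], n: int) -> np.ndarray:
    """Randić matrix of an undirected graph on vertices 0..n-1."""
    degree = [0] * n
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    R = np.zeros((n, n))
    for u, v in edges:
        R[u, v] = R[v, u] = 1.0 / np.sqrt(degree[u] * degree[v])
    return R


def char_poly(M: np.ndarray) -> np.ndarray:
    """Characteristic polynomial coefficients, highest degree first."""
    return np.poly(M)


# Two non-isomorphic trees on four vertices.
path_p4 = char_poly(randic_matrix([(0, 1), (1, 2), (2, 3)], 4))
star_k13 = char_poly(randic_matrix([(0, 1), (0, 2), (0, 3)], 4))
distinguished = not np.allclose(path_p4, star_k13)
```

Here the two trees yield different polynomials, so any injective descriptor derived from the coefficients separates them; the discrimination power studied in the paper concerns how often this holds across exhaustively generated graphs.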

    Book of abstracts
