3 research outputs found

    Analysis of Random Fragment Profiles for the Detection of Structure-Activity Relationships

    Get PDF
    Substructure- or fragment-type descriptors are effective and widely used tools for chemical similarity searching and other applications in chemoinformatics and computer-aided drug discovery. Therefore, a large number of well-defined computational fragmentation schemes has been devised including hierarchical fragmentation of molecules for the analysis of core structures in drugs or retrosynthetic fragmentation of compounds for de novo ligand design. Furthermore, the generation of dictionaries of structural key-type descriptors that are important tools in pharmaceutical research involves knowledge-based fragment design. Currently more than 5 000 standard descriptors are available for the representation of molecular structures, and therefore the selection of suitable combinations of descriptors for specific chemoinformatic applications is a crucial task. This thesis departs from well-defined substructure design approaches. Randomly generated fragment populations are generated and mined for substructures associated with different compound classes. A novel method termed MolBlaster is introduced for the evaluation of molecular similarity relationships on the basis of randomly generated fragment populations. Fragment profiles of molecules are generated by random deletion of bonds in connectivity tables and quantitatively compared using entropy-based metrics. In test calculations, MolBlaster accurately reproduced a structural key-based similarity ranking of druglike molecules. To adapt the generation and comparison of random fragment populations for largescale compound screening, different fragmentation schemes are compared and a novel entropic similarity metric termed PSE is introduced for compound ranking. The approach is extensively tested on different compound activity classes with varying degrees of intra-class structural diversity and produces promising results in these calculations, comparable to similarity searching using state-of-the-art fingerprints. These results demonstrate the potential of randomly generated fragments for the detection of structure-activity relationships. Furthermore, a methodology to analyze random fragment populations at the molecular level of detail is introduced. It determines conditional probability relationships between fragments. Random fragment profiles are generated for an arbitrary set of molecules, and a frequency vector is assigned for each observed fragment. An algorithm is designed to compare frequency vectors and derive dependencies of fragment occurrence. Using calculated dependency values, random fragment populations can be organized in graphs that capture their relationships and make it possible to map fragment pathways of biologically active molecules. For sets of molecules having similar activity, unique fragment signatures, so-called Activity Class Characteristic Substructures (ACCS), are identified. Random fragment profiles are found to contain compound class-specific information and activity-specific fragment hierarchies. In virtual screening trials, short ACCS fingerprints perform well on many compound classes when compared to more complex state-of-the-art 2D fingerprints. In order to elucidate potential reasons for the high predictive utility of ACCS a thorough systematic analysis of their distribution in active and database compounds have been carried out. This reveals that the discriminatory power of ACCS results from the rare occurrence of individual and combinations of ACCS in screening databases. Furthermore, it is shown that ACCS sets isolated from random populations are typically found to form coherent molecular cores in active compounds. Characteristic core regions are already formed by small numbers of substructures and remain stable when more fragments are added. Thus, classspecific random fragment hierarchies encode meaningful structural information, providing a structural rationale for the signature character of activity-specific fragment hierarchies. It follows that compound-class-directed structural descriptors that do not depend on the application of predefined fragmentation or design schemes can be isolated from random fragment populations

    Computational Methods for the Integration of Biological Activity and Chemical Space

    Get PDF
    One general aim of medicinal chemistry is the understanding of structure-activity relationships of ligands that bind to biological targets. Advances in combinatorial chemistry and biological screening technologies allow the analysis of ligand-target relationships on a large-scale. However, in order to extract useful information from biological activity data, computational methods are needed that link activity of ligands to their chemical structure. In this thesis, it is investigated how fragment-type descriptors of molecular structure can be used in order to create a link between activity and chemical ligand space. First, an activity class-dependent hierarchical fragmentation scheme is introduced that generates fragmentation pathways that are aligned using established methodologies for multiple alignment of biological sequences. These alignments are then used to extract consensus fragment sequences that serve as a structural signature for individual biological activity classes. It is also investigated how defined, chemically intuitive molecular fragments can be organized based on their topological environment and co-occurrence in compounds active against closely related targets. Therefore, the Topological Fragment Index is introduced that quantifies the topological environment complexity of a fragment in a given molecule, and thus goes beyond fragment frequency analysis. Fragment dependencies have been established on the basis of common topological environments, which facilitates the identification of activity class-characteristic fragment dependency pathways that describe fragment relationships beyond structural resemblance. Because fragments are often dependent on each other in an activity class-specific manner, the importance of defined fragment combinations for similarity searching is further assessed. Therefore, Feature Co-occurrence Networks are introduced that allow the identification of feature cliques characteristic of individual activity classes. Three differently designed molecular fingerprints are compared for their ability to provide such cliques and a clique-based similarity searching strategy is established. For molecule- and activity class-centric fingerprint designs, feature combinations are shown to improve similarity search performance in comparison to standard methods. Moreover, it is demonstrated that individual features can form activity-class specific combinations. Extending the analysis of feature cliques characteristic of individual activity classes, the distribution of defined fragment combinations among several compound classes acting against closely related targets is assessed. Fragment Formal Concept Analysis is introduced for flexible mining of complex structure-activity relationships. It allows the interactive assembly of fragment queries that yield fragment combinations characteristic of defined activity and potency profiles. It is shown that pairs and triplets, rather than individual fragments distinguish between different activity profiles. A classifier is built based on these fragment signatures that distinguishes between ligands of closely related targets. Going beyond activity profiles, compound selectivity is also analyzed. Therefore, Molecular Formal Concept Analysis is introduced for the systematic mining of compound selectivity profiles on a whole-molecule basis. Using this approach, structurally diverse compounds are identified that share a selectivity profile with selected template compounds. Structure-selectivity relationships of obtained compound sets are further analyzed

    A treatment of stereochemistry in computer aided organic synthesis

    Get PDF
    This thesis describes the author’s contributions to a new stereochemical processing module constructed for the ARChem retrosynthesis program. The purpose of the module is to add the ability to perform enantioselective and diastereoselective retrosynthetic disconnections and generate appropriate precursor molecules. The module uses evidence based rules generated from a large database of literature reactions. Chapter 1 provides an introduction and critical review of the published body of work for computer aided synthesis design. The role of computer perception of key structural features (rings, functions groups etc.) and the construction and use of reaction transforms for generating precursors is discussed. Emphasis is also given to the application of strategies in retrosynthetic analysis. The availability of large reaction databases has enabled a new generation of retrosynthesis design programs to be developed that use automatically generated transforms assembled from published reactions. A brief description of the transform generation method employed by ARChem is given. Chapter 2 describes the algorithms devised by the author for handling the computer recognition and representation of the stereochemical features found in molecule and reaction scheme diagrams. The approach is generalised and uses flexible recognition patterns to transform information found in chemical diagrams into concise stereo descriptors for computer processing. An algorithm for efficiently comparing and classifying pairs of stereo descriptors is described. This algorithm is central for solving the stereochemical constraints in a variety of substructure matching problems addressed in chapter 3. The concise representation of reactions and transform rules as hyperstructure graphs is described. Chapter 3 is concerned with the efficient and reliable detection of stereochemical symmetry in both molecules, reactions and rules. A novel symmetry perception algorithm, based on a constraints satisfaction problem (CSP) solver, is described. The use of a CSP solver to implement an isomorph‐free matching algorithm for stereochemical substructure matching is detailed. The prime function of this algorithm is to seek out unique retron locations in target molecules and then to generate precursor molecules without duplications due to symmetry. Novel algorithms for classifying asymmetric, pseudo‐asymmetric and symmetric stereocentres; meso, centro, and C2 symmetric molecules; and the stereotopicity of trigonal (sp2) centres are described. Chapter 4 introduces and formalises the annotated structural language used to create both retrosynthetic rules and the patterns used for functional group recognition. A novel functional group recognition package is described along with its use to detect important electronic features such as electron‐withdrawing or donating groups and leaving groups. The functional groups and electronic features are used as constraints in retron rules to improve transform relevance. Chapter 5 details the approach taken to design detailed stereoselective and substrate controlled transforms from organised hierarchies of rules. The rules employ a rich set of constraints annotations that concisely describe the keying retrons. The application of the transforms for collating evidence based scoring parameters from published reaction examples is described. A survey of available reaction databases and the techniques for mining stereoselective reactions is demonstrated. A data mining tool was developed for finding the best reputable stereoselective reaction types for coding as transforms. For various reasons it was not possible during the research period to fully integrate this work with the ARChem program. Instead, Chapter 6 introduces a novel one‐step retrosynthesis module to test the developed transforms. The retrosynthesis algorithms use the organisation of the transform rule hierarchy to efficiently locate the best retron matches using all applicable stereoselective transforms. This module was tested using a small set of selected target molecules and the generated routes were ranked using a series of measured parameters including: stereocentre clearance and bond cleavage; example reputation; estimated stereoselectivity with reliability; and evidence of tolerated functional groups. In addition a method for detecting regioselectivity issues is presented. This work presents a number of algorithms using common set and graph theory operations and notations. Appendix A lists the set theory symbols and meanings. Appendix B summarises and defines the common graph theory terminology used throughout this thesis
    corecore