62 research outputs found

    Computational Methods Generating High-Resolution Views of Complex Structure-Activity Relationships

    The analysis of structure-activity relationships (SARs) of small bioactive compounds is a central task in medicinal chemistry and pharmaceutical research. The study of SARs is in principle not limited to computational methods; however, as data sets rapidly grow in size, advanced computational approaches become indispensable for SAR analysis. Activity landscapes are one of the preferred and widely used computational models to study large-scale SARs. Activity cliffs are cardinal features of activity landscape representations and are thought to contain high SAR information content. This work addresses major challenges in systematic SAR exploration and specifically focuses on the design of novel activity landscape models and comprehensive activity cliff analysis. In the first part of the thesis, two conceptually different activity landscape representations are introduced for compounds active against multiple targets. These models are designed to provide intuitive graphical access to compounds forming single- and multi-target activity cliffs and displaying multi-target SAR characteristics. Further, a systematic analysis of the frequency and distribution of activity cliffs is carried out. In addition, a large-scale data mining effort is designed to quantify and analyze fingerprint-dependent changes in SAR information. The second part of this work is dedicated to the concept of activity cliffs and their utility in the practice of medicinal chemistry. To this end, a computational approach is introduced to search for detectable SAR advantages associated with activity cliffs. In addition, the extent to which activity cliffs might be utilized as starting points in practical compound optimization efforts is investigated. Finally, all activity cliff configurations formed by currently available bioactive compounds are thoroughly examined. These configurations are further classified, and their frequency of occurrence and target distribution are determined. Furthermore, the activity cliff concept is extended to explore the relation between chemical structures and compound promiscuity. The notion of promiscuity cliffs is introduced to deduce structural modifications that might induce large-magnitude promiscuity effects.

    Multi-faceted Structure-Activity Relationship Analysis Using Graphical Representations

    A core focus in medicinal chemistry is the interpretation of structure-activity relationships (SARs) of small molecules. SAR analysis is typically carried out on a case-by-case basis for compound sets that share activity against a given target. Although SAR investigations are not a priori dependent on computational approaches, the steady rise in available activity information has made the use of such methodologies necessary. Moreover, understanding SARs in multi-target space is extremely difficult. Conceptually different computational approaches are reported in this thesis for graphical SAR analysis in single- as well as multi-target space. Activity landscape models are often used to describe the underlying SAR characteristics of compound sets. Theoretical activity landscapes that are reminiscent of topological maps intuitively represent distributions of pair-wise similarity and potency difference information as three-dimensional surfaces. These models provide easy access to the identification of various SAR features. Therefore, such landscapes are generated for actual data sets and compared with graph-based representations. Existing graphical data structures are adapted to include mechanism of action information for receptor ligands to facilitate simultaneous SAR and mechanism-related analyses, with the objective of identifying structural modifications responsible for switching molecular mechanisms of action. Typically, SAR analysis focuses on systematic pair-wise relationships of compound similarity and potency differences. Therefore, an approach is reported to calculate SAR feature probabilities on the basis of these pair-wise relationships for individual compounds in a ligand set. The consequent expansion of feature categories improves the analysis of local SAR environments. Graphical representations are designed to avoid a dependence on preconceived SAR models. Such representations are suitable for systematic large-scale SAR exploration. Methods for the navigation of SARs in multi-target space using simple and interpretable data structures are introduced. In summary, multi-faceted SAR analysis aided by computational means forms the primary objective of this dissertation.
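
The pair-wise relationships of compound similarity and potency difference mentioned in this abstract can be illustrated with a minimal Python sketch. It is not the method of the thesis: the fingerprints (sets of "on" bit indices), the compound identifiers, the potency values, and both thresholds (`min_similarity`, `min_potency_diff`) are assumptions chosen only for illustration.

```python
from itertools import combinations


def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient of two fingerprints given as sets of 'on' bits."""
    union = len(fp_a | fp_b)
    return len(fp_a & fp_b) / union if union else 0.0


def pairwise_sar(compounds):
    """Yield (id_a, id_b, similarity, potency difference) for every compound pair.

    Each compound is a tuple (identifier, fingerprint set, potency), where the
    potency is assumed to be on a logarithmic scale such as pKi.
    """
    for (id_a, fp_a, p_a), (id_b, fp_b, p_b) in combinations(compounds, 2):
        yield id_a, id_b, tanimoto(fp_a, fp_b), abs(p_a - p_b)


def activity_cliffs(compounds, min_similarity=0.55, min_potency_diff=2.0):
    """Return pairs that are structurally similar but differ strongly in potency.

    Both thresholds are illustrative assumptions, not values from the source.
    """
    return [
        (id_a, id_b)
        for id_a, id_b, sim, diff in pairwise_sar(compounds)
        if sim >= min_similarity and diff >= min_potency_diff
    ]


# Hypothetical toy data: (identifier, fingerprint bits, potency).
compounds = [
    ("cpd1", {1, 2, 3, 4, 5}, 8.5),
    ("cpd2", {1, 2, 3, 4, 6}, 5.9),
    ("cpd3", {7, 8, 9}, 6.0),
]

print(activity_cliffs(compounds))
```

In this toy set only the pair `cpd1`/`cpd2` combines high similarity with a large potency difference; a landscape or graph representation would be built on top of the same pair-wise values.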

    Roughness of molecular property landscapes and its impact on modellability

    In molecular discovery and drug design, structure-property relationships and activity landscapes are often qualitatively or quantitatively analyzed to guide the navigation of chemical space. The roughness (or smoothness) of these molecular property landscapes is one of their most studied geometric attributes, as it can characterize the presence of activity cliffs, with rougher landscapes generally expected to pose tougher optimization challenges. Here, we introduce a general, quantitative measure for describing the roughness of molecular property landscapes. The proposed roughness index (ROGI) is loosely inspired by the concept of fractal dimension and strongly correlates with the out-of-sample error achieved by machine learning models on numerous regression tasks. Comment: 17 pages, 6 figures, 2 tables (SI with 17 pages, 16 figures)

    Methods for the Analysis of Matched Molecular Pairs and Chemical Space Representations

    Compound optimization is a complex process where different properties are optimized to increase the biological activity and therapeutic effects of a molecule. Frequently, the structure of molecules is modified in order to improve their property values. Therefore, computational analysis of the effects of structure modifications on property values is of great importance for the drug discovery process. It is also essential to analyze chemical space, i.e., the set of all chemically feasible molecules, in order to find subsets of molecules that display favorable property values. This thesis aims to expand the computational repertoire to analyze the effect of structure alterations and visualize chemical space. Matched molecular pairs are defined as pairs of compounds that share a large common substructure and only differ by a small chemical transformation. They have been frequently used to study property changes caused by structure modifications. These analyses are expanded in this thesis by studying the effect of chemical transformations on the ionization state and ligand efficiency, both measures of great importance in drug design. Additionally, novel matched molecular pairs based on retrosynthetic rules are developed to increase their utility for prospective use of chemical transformations in compound optimization. Further, new methods based on matched molecular pairs are described to obtain preliminary SAR information for screening hit compounds and to predict the potency change caused by a chemical transformation. Visualizations of chemical space are introduced to aid compound optimization efforts. First, principal component plots are used to rationalize a multi-objective compound optimization procedure based on matched molecular pairs. Then, star coordinate and parallel coordinate plots are introduced to analyze drug-like subspaces, where compounds with favorable property values can be found. Finally, a novel network-based visualization of high-dimensional property space is developed. In conclusion, the applications developed in this thesis expand the methodological spectrum of computer-aided compound optimization.
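
The definition of matched molecular pairs given in this abstract (a shared large substructure plus a small differing part) can be sketched in Python as follows. The sketch assumes that each compound has already been fragmented into a core and a single substituent by some upstream step; the fragment strings, compound identifiers, and the function name `matched_pairs` are hypothetical and do not come from the thesis.

```python
from collections import defaultdict
from itertools import combinations


def matched_pairs(fragmented):
    """Group compounds by shared core and pair those with different substituents.

    `fragmented` holds tuples (identifier, core, substituent). How the
    fragmentation is produced is outside this sketch and is assumed to be
    precomputed. Returns tuples (id_a, id_b, transformation).
    """
    by_core = defaultdict(list)
    for compound_id, core, substituent in fragmented:
        by_core[core].append((compound_id, substituent))

    pairs = []
    for members in by_core.values():
        for (id_a, sub_a), (id_b, sub_b) in combinations(members, 2):
            if sub_a != sub_b:
                # The transformation records which substituent replaces which.
                pairs.append((id_a, id_b, f"{sub_a}>>{sub_b}"))
    return pairs


# Hypothetical precomputed fragmentations: (identifier, core, substituent).
fragmented = [
    ("cpd1", "c1ccccc1[*]", "[*]Cl"),
    ("cpd2", "c1ccccc1[*]", "[*]F"),
    ("cpd3", "c1ccncc1[*]", "[*]Cl"),
]

print(matched_pairs(fragmented))
```

A practical implementation would also bound the size of the exchanged fragment relative to the core, so that only small transformations qualify; that check is omitted here for brevity.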

    Computational Methods for Structure-Activity Relationship Analysis and Activity Prediction

    Structure-activity relationship (SAR) analysis of small bioactive compounds is a key task in medicinal chemistry. Traditionally, SARs were established on a case-by-case basis. However, with the arrival of high-throughput screening (HTS) and synthesis techniques, the size and structural heterogeneity of compound data have surged, and the use of computational methods to analyse SARs has become imperative and valuable. In recent years, graphical methods have gained prominence for analysing SARs. The choice of molecular representation and the method of assessing similarities affect the outcome of the SAR analysis. Thus, alternative methods providing distinct points of view of SARs are required. In this thesis, a novel graphical representation utilizing the canonical scaffold-skeleton definition to explore meaningful global and local SAR patterns in compound data is introduced. Furthermore, efforts have been made to go beyond the descriptive SAR analysis offered by the graphical methods. SAR features inferred from descriptive methods are utilized for compound activity predictions. In this context, a data structure called SAR matrix (SARM), which is reminiscent of conventional R-group tables, is utilized. SARMs suggest many virtual compounds that represent as yet unexplored chemical space. These virtual compounds are candidates for further exploration but are too many to prioritize simply on the basis of visual inspection. Conceptually different approaches to enable systematic compound prediction and prioritization are introduced. Much emphasis is put on evolving the predictive ability for prospective compound design. Going beyond SAR analysis, the SARM method has also been adapted to navigate multi-target spaces, primarily for analysing compound promiscuity patterns. Thus, the original SARM methodology has been further developed for a variety of medicinal chemistry and chemogenomics applications.

    Development and Interpretation of Machine Learning Models for Drug Discovery

    In drug discovery, domain experts from different fields such as medicinal chemistry, biology, and computer science often collaborate to develop novel pharmaceutical agents. Computational models developed in this process must be correct and reliable, but at the same time interpretable. Their findings have to be accessible to experts from fields other than computer science so that they can be validated and improved with domain knowledge. Only if this is the case are interdisciplinary teams able to communicate their scientific results both precisely and intuitively. This work is concerned with the development and interpretation of machine learning models for drug discovery. To this end, it describes the design and application of computational models for specialized use cases, such as compound profiling and hit expansion. Novel insights into machine learning for ligand-based virtual screening are presented, and limitations in the modeling of compound potency values are highlighted. It is shown that compound activity can be predicted based on high-dimensional target profiles, without the presence of molecular structures. Moreover, support vector regression for potency prediction is carefully analyzed, and a systematic misprediction of highly potent ligands is discovered. Furthermore, a key aspect is the interpretation and chemically accessible representation of the models. Therefore, this thesis focuses especially on methods to better understand and communicate modeling results. To this end, two interactive visualizations for the assessment of naive Bayes and support vector machine models on molecular fingerprints are presented. These visual representations of virtual screening models are designed to provide an intuitive chemical interpretation of the results.

    Systematic Identification of Scaffolds Representing Different Types of Structure-Activity Relationships

    In medicinal chemistry, it is of central importance to understand structure-activity relationships (SARs) of small bioactive compounds. Typically, SARs are analyzed on a case-by-case basis for sets of compounds active against a given target. However, the increasing amount of compound activity data that is becoming available allows SARs to be explored on a large scale. Moreover, molecular scaffolds derived from bioactive compounds are also of high interest for SAR analysis. In general, scaffolds are obtained by removing all substituents from rings and from linkers between rings. This thesis aims at systematically mining compounds for which activity annotations are available and investigating relationships between chemical structure and biological activities at the level of active compounds and, in particular, molecular scaffolds. Therefore, data mining approaches are designed to identify scaffolds with different structural and/or activity characteristics. Initially, scaffold distributions in compounds at different stages of pharmaceutical development are analyzed. Sets of scaffolds that overlap between different stages or preferentially occur at certain stages are identified. Furthermore, a systematic selectivity profile analysis of public domain active compounds is carried out. Scaffolds that yield compounds selective for communities of closely related targets and represent compounds selective only for one particular target over others are identified. In addition, the degree of promiscuity of scaffolds is thoroughly examined. Eighty-three scaffolds covering 33 chemotypes correspond to compounds active against at least three different target families and thus are considered to be promiscuous. Moreover, by integrating pairwise scaffold similarity and compound potency differences, the propensity of scaffolds to form multi-target activity or selectivity cliffs and, in addition, the global scaffold potential of individual targets are quantitatively assessed. Finally, structural relationships between scaffolds are systematically explored. Most scaffolds extracted from active compounds are found to be involved in substructure relationships and/or share topological features with others. These substructure relationships are also compared to, and combined with, hierarchical substructure relationships to facilitate activity prediction.
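
The scaffold definition quoted in this abstract (remove all substituents from rings and from linkers between rings) can be approximated on a plain molecular graph by repeatedly stripping terminal atoms. The following Python sketch works on an adjacency mapping of atom indices and ignores element types and bond orders; the function name `scaffold_atoms` and the example graph are assumptions for illustration, not the procedure used in the thesis.

```python
def scaffold_atoms(adjacency):
    """Return the atoms left after repeatedly removing terminal atoms.

    `adjacency` maps each atom index to an iterable of bonded atom indices.
    Atoms with at most one neighbor are substituent ends; removing them until
    none remain leaves only ring atoms and linkers between rings.
    """
    graph = {atom: set(neighbors) for atom, neighbors in adjacency.items()}
    terminal = [atom for atom, neighbors in graph.items() if len(neighbors) <= 1]

    while terminal:
        atom = terminal.pop()
        if atom not in graph:
            continue  # already removed via an earlier queue entry
        for neighbor in graph.pop(atom):
            graph[neighbor].discard(atom)
            if len(graph[neighbor]) <= 1:
                terminal.append(neighbor)
    return set(graph)


# Hypothetical graph of toluene: a six-membered ring (atoms 0-5) with a
# methyl substituent (atom 6) attached to ring atom 0.
toluene = {
    0: [1, 5, 6],
    1: [0, 2],
    2: [1, 3],
    3: [2, 4],
    4: [3, 5],
    5: [4, 0],
    6: [0],
}

print(sorted(scaffold_atoms(toluene)))
```

For an acyclic molecule the pruning removes every atom, which matches the convention that such compounds have no ring-based scaffold. Scaffold definitions that retain, for example, exocyclic double bonds would need bond-order information that this sketch does not model.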