1,836 research outputs found

    Computational Methods for the Integration of Biological Activity and Chemical Space

    One general aim of medicinal chemistry is the understanding of structure-activity relationships of ligands that bind to biological targets. Advances in combinatorial chemistry and biological screening technologies allow the analysis of ligand-target relationships on a large scale. However, in order to extract useful information from biological activity data, computational methods are needed that link the activity of ligands to their chemical structure. This thesis investigates how fragment-type descriptors of molecular structure can be used to create a link between activity and chemical ligand space. First, an activity class-dependent hierarchical fragmentation scheme is introduced that generates fragmentation pathways that are aligned using established methodologies for multiple alignment of biological sequences. These alignments are then used to extract consensus fragment sequences that serve as a structural signature for individual biological activity classes. It is also investigated how defined, chemically intuitive molecular fragments can be organized based on their topological environment and co-occurrence in compounds active against closely related targets. To this end, the Topological Fragment Index is introduced, which quantifies the topological environment complexity of a fragment in a given molecule and thus goes beyond fragment frequency analysis. Fragment dependencies are established on the basis of common topological environments, which facilitates the identification of activity class-characteristic fragment dependency pathways that describe fragment relationships beyond structural resemblance. Because fragments are often dependent on each other in an activity class-specific manner, the importance of defined fragment combinations for similarity searching is further assessed. To this end, Feature Co-occurrence Networks are introduced that allow the identification of feature cliques characteristic of individual activity classes. 
Three differently designed molecular fingerprints are compared for their ability to provide such cliques and a clique-based similarity searching strategy is established. For molecule- and activity class-centric fingerprint designs, feature combinations are shown to improve similarity search performance in comparison to standard methods. Moreover, it is demonstrated that individual features can form activity class-specific combinations. Extending the analysis of feature cliques characteristic of individual activity classes, the distribution of defined fragment combinations among several compound classes acting against closely related targets is assessed. Fragment Formal Concept Analysis is introduced for flexible mining of complex structure-activity relationships. It allows the interactive assembly of fragment queries that yield fragment combinations characteristic of defined activity and potency profiles. It is shown that pairs and triplets, rather than individual fragments, distinguish between different activity profiles. A classifier is built based on these fragment signatures that distinguishes between ligands of closely related targets. Going beyond activity profiles, compound selectivity is also analyzed. To this end, Molecular Formal Concept Analysis is introduced for the systematic mining of compound selectivity profiles on a whole-molecule basis. Using this approach, structurally diverse compounds are identified that share a selectivity profile with selected template compounds. Structure-selectivity relationships of the obtained compound sets are further analyzed.

    Application and Development of Computational Methods for Ligand-Based Virtual Screening

    The detection of novel active compounds that are able to modulate the biological function of a target is the primary goal of drug discovery. Different screening methods are available to identify hit compounds having the desired bioactivity in a large collection of molecules. As a computational method, virtual screening (VS) is used to search compound libraries in silico and identify those compounds that are likely to exhibit a specific activity. Ligand-based virtual screening (LBVS) is a subdiscipline that uses the information of one or more known active compounds in order to identify new hit compounds. Different LBVS methods exist, e.g. similarity searching and support vector machines (SVMs). In order to enable the application of these computational approaches, compounds have to be described numerically. Fingerprints derived from the two-dimensional compound structure, called 2D fingerprints, are among the most popular molecular descriptors available. This thesis covers the usage of 2D fingerprints in the context of LBVS. The first part focuses on a detailed analysis of 2D fingerprints. Their performance range against a wide range of pharmaceutical targets is globally estimated through fingerprint-based similarity searching. Additionally, mechanisms by which fingerprints are capable of detecting structurally diverse active compounds are identified. For this purpose, two different feature selection methods are applied to find those fingerprint features that are most relevant for the active compounds and distinguish them from other compounds. Then, 2D fingerprints are used in SVM calculations. The SVM methodology provides several opportunities to include additional information about the compounds in order to direct LBVS search calculations. In a first step, a variant of the SVM approach is applied to the multi-class prediction problem involving compounds that are active against several related targets. 
SVM linear combination is used to recover compounds with desired activity profiles and deprioritize compounds with other activities. Then, the SVM methodology is adopted for potency-directed VS. Compound potency is incorporated into the SVM approach through potency-oriented SVM linear combination and kernel function design to direct search calculations to the preferential detection of potent hit compounds. Next, SVM calculations are applied to address an intrinsic limitation of similarity-based methods, i.e., the presence of similar compounds having large differences in their potency. A specially designed SVM approach is introduced to predict compound pairs forming such activity cliffs. Finally, the impact of different training sets on the recall performance of SVM-based VS is analyzed and caveats are identified.
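
The fingerprint-based similarity searching mentioned in this abstract can be illustrated with a minimal sketch. It assumes that each 2D fingerprint is already available as a set of on-bit indices and that the Tanimoto coefficient is the similarity measure; the abstract names neither, and the compound names and bit sets below are hypothetical toy data, not results from the thesis.

```python
from typing import Dict, List, Set, Tuple


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    if not a and not b:
        return 0.0  # convention chosen here for two empty fingerprints
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def similarity_search(
    reference: Set[int], database: Dict[str, Set[int]], top_k: int = 3
) -> List[Tuple[str, float]]:
    """Rank database compounds by decreasing similarity to one reference compound."""
    scored = [(name, tanimoto(reference, fp)) for name, fp in database.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


# Hypothetical toy data: one known active and three database compounds.
reference = {1, 4, 7, 9}
database = {
    "cpd_A": {1, 4, 7, 9},
    "cpd_B": {1, 4, 8},
    "cpd_C": {2, 3, 5},
}

ranking = similarity_search(reference, database, top_k=2)
for name, score in ranking:
    print(f"{name}: {score:.2f}")
```

In practice the bit sets would come from a cheminformatics toolkit, and several reference compounds are often combined by a fusion rule; both are outside this sketch.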

    Computational Analysis of Structure-Activity Relationships : From Prediction to Visualization Methods

    Understanding how structural modifications affect the biological activity of small molecules is one of the central themes in medicinal chemistry. By no means is structure-activity relationship (SAR) analysis a priori dependent on computational methods. However, as molecular data sets grow in size, we quickly reach the limits of our ability to access and compare structures and associated biological properties, so that computational data processing and analysis often become essential. Here, different types of approaches of varying complexity for the analysis of SAR information are presented, which can be applied in the context of screening and chemical optimization projects. The first part of this thesis is dedicated to machine-learning strategies that aim at de novo ligand prediction and the preferential detection of potent hits in virtual screening. Strong emphasis is placed on benchmarking of different strategies and a thorough evaluation of their utility in practical applications. However, an often claimed disadvantage of these prediction methods is their "black box" character because they do not necessarily reveal which structural features are associated with biological activity. Therefore, these methods are complemented by more descriptive SAR analysis approaches showing a higher degree of interpretability. Concepts from information theory are adapted to identify activity-relevant structure-derived descriptors. Furthermore, compound data mining methods exploring prespecified properties of available bioactive compounds on a large scale are designed to systematically relate molecular transformations to activity changes. Finally, these approaches are complemented by graphical methods that primarily help to access and visualize SAR data in congeneric series of compounds and allow the formulation of intuitive SAR rules applicable to the design of new compounds. The compendium of SAR analysis tools introduced in this thesis investigates SARs from different perspectives.

    Computational Methods Generating High-Resolution Views of Complex Structure-Activity Relationships

    The analysis of structure-activity relationships (SARs) of small bioactive compounds is a central task in medicinal chemistry and pharmaceutical research. The study of SARs is in principle not limited to computational methods; however, as data sets rapidly grow in size, advanced computational approaches become indispensable for SAR analysis. Activity landscapes are one of the preferred and widely used computational models to study large-scale SARs. Activity cliffs are cardinal features of activity landscape representations and are thought to contain high SAR information content. This work addresses major challenges in systematic SAR exploration and specifically focuses on the design of novel activity landscape models and comprehensive activity cliff analysis. In the first part of the thesis, two conceptually different activity landscape representations are introduced for compounds active against multiple targets. These models are designed to provide intuitive graphical access to compounds forming single and multi-target activity cliffs and displaying multi-target SAR characteristics. Further, a systematic analysis of the frequency and distribution of activity cliffs is carried out. In addition, a large-scale data mining effort is designed to quantify and analyze fingerprint-dependent changes in SAR information. The second part of this work is dedicated to the concept of activity cliffs and their utility in the practice of medicinal chemistry. To this end, a computational approach is introduced to search for detectable SAR advantages associated with activity cliffs. In addition, the question is investigated to what extent activity cliffs might be utilized as starting points in practical compound optimization efforts. Finally, all activity cliff configurations formed by currently available bioactive compounds are thoroughly examined. These configurations are further classified and their frequency of occurrence and target distribution are determined. 
Furthermore, the activity cliff concept is extended to explore the relation between chemical structures and compound promiscuity. The notion of promiscuity cliffs is introduced to deduce structural modifications that might induce large-magnitude promiscuity effects.
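
As a rough illustration of the activity cliff concept discussed in this abstract, the sketch below flags compound pairs that are structurally similar but differ strongly in potency. The fingerprint representation (sets of on-bit indices), the Tanimoto measure, and both thresholds (similarity of at least 0.55, potency gap of at least 2 log units) are assumptions made for this example and are not taken from the thesis; the compounds are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    if not a and not b:
        return 0.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def find_activity_cliffs(
    compounds: Dict[str, Tuple[Set[int], float]],
    min_similarity: float = 0.55,   # assumed similarity threshold
    min_potency_gap: float = 2.0,   # assumed gap in log units (e.g. pKi)
) -> List[Tuple[str, str, float, float]]:
    """Return pairs that are similar in structure but far apart in potency."""
    cliffs = []
    for (name_a, (fp_a, pot_a)), (name_b, (fp_b, pot_b)) in combinations(
        compounds.items(), 2
    ):
        similarity = tanimoto(fp_a, fp_b)
        gap = abs(pot_a - pot_b)
        if similarity >= min_similarity and gap >= min_potency_gap:
            cliffs.append((name_a, name_b, similarity, gap))
    return cliffs


# Hypothetical toy data: name -> (fingerprint bits, potency as a pKi-like value).
compounds = {
    "cpd_1": ({1, 2, 3, 4, 5}, 8.5),
    "cpd_2": ({1, 2, 3, 4, 6}, 5.9),
    "cpd_3": ({1, 2, 3, 4, 5, 7}, 8.1),
    "cpd_4": ({10, 11, 12}, 4.0),
}

cliffs = find_activity_cliffs(compounds)
for name_a, name_b, similarity, gap in cliffs:
    print(f"{name_a} / {name_b}: similarity={similarity:.2f}, gap={gap:.1f}")
```

Note that cpd_1 and cpd_3 are the most similar pair here but do not form a cliff, because their potencies are close; the pairwise loop is quadratic in the number of compounds, which is acceptable for a sketch but not for large data sets.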

    Methods for the Analysis of Matched Molecular Pairs and Chemical Space Representations

    Compound optimization is a complex process where different properties are optimized to increase the biological activity and therapeutic effects of a molecule. Frequently, the structure of molecules is modified in order to improve their property values. Therefore, computational analysis of the effects of structure modifications on property values is of great importance for the drug discovery process. It is also essential to analyze chemical space, i.e., the set of all chemically feasible molecules, in order to find subsets of molecules that display favorable property values. This thesis aims to expand the computational repertoire to analyze the effect of structure alterations and visualize chemical space. Matched molecular pairs are defined as pairs of compounds that share a large common substructure and only differ by a small chemical transformation. They have been frequently used to study property changes caused by structure modifications. These analyses are expanded in this thesis by studying the effect of chemical transformations on the ionization state and ligand efficiency, both measures of great importance in drug design. Additionally, novel matched molecular pairs based on retrosynthetic rules are developed to increase their utility for prospective use of chemical transformations in compound optimization. Further, new methods based on matched molecular pairs are described to obtain preliminary SAR information of screening hit compounds and predict the potency change caused by a chemical transformation. Visualizations of chemical space are introduced to aid compound optimization efforts. First, principal component plots are used to rationalize a matched molecular pair based multi-objective compound optimization procedure. Then, star coordinate and parallel coordinate plots are introduced to analyze drug-like subspaces, where compounds with favorable property values can be found. 
Finally, a novel network-based visualization of high-dimensional property space is developed. In conclusion, the applications developed in this thesis expand the methodological spectrum of computer-aided compound optimization.
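
The definition of matched molecular pairs given in this abstract (a shared large substructure plus a small transformation) can be sketched as follows. The example assumes that every compound has already been split into a core and a single substituent by some fragmentation step that is not shown; the record layout, the SMILES-like strings, and the potency values are hypothetical and serve only to show the pairing logic.

```python
from collections import defaultdict
from itertools import combinations
from typing import List, Tuple

# (compound name, core, substituent, potency); all values are hypothetical.
Record = Tuple[str, str, str, float]


def matched_pairs(records: List[Record]) -> List[Tuple[str, str, str, float]]:
    """Pair compounds that share a core and differ only in the substituent.

    Each result holds both compound names, the transformation written as
    'from>>to', and the potency change associated with that transformation.
    """
    by_core = defaultdict(list)
    for name, core, substituent, potency in records:
        by_core[core].append((name, substituent, potency))

    pairs = []
    for members in by_core.values():
        for (n1, s1, p1), (n2, s2, p2) in combinations(members, 2):
            if s1 != s2:  # identical substituents do not define a transformation
                pairs.append((n1, n2, f"{s1}>>{s2}", round(p2 - p1, 2)))
    return pairs


records = [
    ("cpd_1", "c1ccccc1[*]", "F", 6.2),
    ("cpd_2", "c1ccccc1[*]", "Cl", 7.0),
    ("cpd_3", "c1ccncc1[*]", "F", 5.5),
]

pairs = matched_pairs(records)
for first, second, transformation, delta in pairs:
    print(f"{first} -> {second}: {transformation}, potency change {delta:+.2f}")
```

Grouping by core first avoids comparing every compound with every other one, which is the main design choice in this sketch.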

    Multi-faceted Structure-Activity Relationship Analysis Using Graphical Representations

    A core focus in medicinal chemistry is the interpretation of structure-activity relationships (SARs) of small molecules. SAR analysis is typically carried out on a case-by-case basis for compound sets that share activity against a given target. Although SAR investigations are not a priori dependent on computational approaches, limitations imposed by the steady rise in activity information have necessitated the use of such methodologies. Moreover, understanding SARs in multi-target space is extremely difficult. Conceptually different computational approaches are reported in this thesis for graphical SAR analysis in single- as well as multi-target space. Activity landscape models are often used to describe the underlying SAR characteristics of compound sets. Theoretical activity landscapes that are reminiscent of topological maps intuitively represent distributions of pair-wise similarity and potency difference information as three-dimensional surfaces. These models provide easy access to the identification of various SAR features. Therefore, such landscapes for actual data sets are generated and compared with graph-based representations. Existing graphical data structures are adapted to include mechanism of action information for receptor ligands to facilitate simultaneous SAR and mechanism-related analyses with the objective of identifying structural modifications responsible for switching molecular mechanisms of action. Typically, SAR analysis focuses on systematic pair-wise relationships of compound similarity and potency differences. Therefore, an approach is reported to calculate SAR feature probabilities on the basis of these pair-wise relationships for individual compounds in a ligand set. The consequent expansion of feature categories improves the analysis of local SAR environments. Graphical representations are designed to avoid a dependence on preconceived SAR models. Such representations are suitable for systematic large-scale SAR exploration. 
Methods for the navigation of SARs in multi-target space using simple and interpretable data structures are introduced. In summary, multi-faceted SAR analysis aided by computational means forms the primary objective of this dissertation.

    Analysis of Biological Screening Data and Molecular Selectivity Profiles Using Fingerprints and Mapping Algorithms

    The identification of promising drug candidates is a major milestone in the early stages of drug discovery and design. Among the properties that have to be optimized before a drug candidate is admitted to clinical testing, potency and target selectivity are of great interest and can be addressed very early. Unfortunately, optimization-relevant knowledge is often limited, and the analysis of noisy and heterogeneous biological screening data with standard methods like QSAR is hardly feasible. Furthermore, the identification of compounds displaying different selectivity patterns against related targets is a prerequisite for chemical genetics and genomics applications, making it possible to interfere specifically with functions of individual members of protein families. In this thesis it is shown that computational methods based on molecular similarity are suitable tools for the analysis of compound potency and target selectivity. Originally developed to facilitate the efficient discovery of active compounds by means of virtual screening of compound libraries, these ligand-based approaches assume that similar molecules are likely to exhibit similar properties and biological activities based on the similarity property principle. Given their holistic approach to molecular similarity analysis, ligand-based virtual screening methods can be applied when little or no structure-activity information is available and do not require knowledge of the target structure. The methods under investigation cover a wide methodological spectrum and only rely on properties derived from one- and two-dimensional molecular representations, which renders them particularly useful for handling large compound libraries. Using biological screening data, these virtual screening methods are shown to be able to extrapolate from experimental data and preferentially detect potent compounds. 
Subsequently, extensive benchmark calculations prove that existing 2D molecular fingerprints and dynamic mapping algorithms are suitable tools for the distinction between compounds with differential selectivity profiles. Finally, an advanced dynamic mapping algorithm is introduced that is able to generate target-selective chemical reference spaces by adaptively identifying the most discriminative molecular properties from a set of active compounds. These reference spaces are shown to be of great value for the generation of predictive target-selectivity models by screening a biologically annotated compound library.

    Systematic Computational Analysis of Structure-Activity Relationships

    The exploration of structure–activity relationships (SARs) of small bioactive molecules is a central task in medicinal chemistry. Typically, SARs are analyzed on a case-by-case basis for series of closely related molecules. Classical methods that explore SARs include quantitative SAR (QSAR) modeling and molecular similarity analysis. These methods conceptually rely on the similarity–property principle which states that similar molecules should also have similar biological activity. Although this principle is intuitive and supported by a wealth of observations, it is well-recognized that SARs can have fundamentally different character. Small chemical modifications of active molecules often dramatically alter biological responses, giving rise to “activity cliffs” and “discontinuous” SARs. By contrast, structurally diverse molecules can have similar activity, a situation that is indicative of “continuous” SARs. The combination of continuous and discontinuous components characterizes “heterogeneous” SARs, a phenotype that is frequently encountered in medicinal chemistry. This thesis focuses on the systematic computational analysis of SARs present in sets of active molecules. Approaches to quantitatively describe, classify, and compare SARs at multiple levels of detail are introduced. Initially, a comparative study of crystallographic enzyme–inhibitor complexes is presented that relates two-dimensional and three-dimensional inhibitor similarity and potency to each other. The analysis reveals the presence of systematic and in part unexpected relationships between molecular similarity and potency and explains why apparently inconsistent SARs can coexist in compound activity classes. For the systematic characterization of complex SARs, a numerical function termed SAR Index (SARI) is developed that quantitatively describes continuous and discontinuous SAR components present in sets of active molecules. 
On the basis of two-dimensional molecular similarity and potency, SARI distinguishes between the three basic SAR categories described above. Heterogeneous SARs are further divided into two previously unobserved subtypes that are distinguished by the way they combine different SAR features. SARI profiling of various enzyme inhibitor classes demonstrates the prevalence of heterogeneous SARs for many classes. Furthermore, control calculations are conducted in order to assess the influence of molecular representation and data set size on SARI scoring. It is shown that SARI scores remain largely stable in response to variation of these critical parameters. Based on the SARI formalism, a methodology is developed to study multiple global and local SAR components of compound activity classes. The approach combines graphical analysis of Network-like Similarity Graphs (NSGs) and SARI score calculations at multiple levels of detail. Compound classes of different global SAR character are found to produce distinct network topologies. Local SAR features are studied in subsets of similar compounds and systematically related to global SAR character. Furthermore, key compounds are identified that are major determinants of local and global SAR characteristics. The approach is also applied to study structure–selectivity relationships (SSRs). Compound selectivity often results from potency differences for multiple targets and presents a critical factor in lead optimization projects. Here, SSRs are explored for sets of compounds that are active against pairs of related targets. For this purpose, the molecular network approach is adapted to the evaluation of SSRs. Results show that SSRs can be quantitatively described and categorized in analogy to single-target SARs. In addition, local SSR environments are identified and compared to SAR features. Within these environments, key compounds are identified that determine characteristic features of single-target SARs and dual-target SSRs. 
Comparison of similar compounds that have significantly different selectivity reveals chemical modifications that render compounds target-selective. Furthermore, a methodology is introduced to study SAR contributions from functional groups and substitution sites in series of analogous molecules. Analog series are systematically organized according to substitution sites in a hierarchical data structure termed Combinatorial Analog Graph (CAG), and the SARI scoring scheme is applied to evaluate SAR contributions of variable functional groups at specific substitution sites. Combinations of sites that determine SARs within analog series and make large contributions to SAR discontinuity are identified. These sites are prime targets for further chemical modification. In addition to determining key substitution patterns, CAG analysis also identifies substitution sites that have not been thoroughly explored.