195 research outputs found

    Systematic Identification of Scaffolds Representing Different Types of Structure-Activity Relationships

    In medicinal chemistry, it is of central importance to understand structure-activity relationships (SARs) of small bioactive compounds. Typically, SARs are analyzed on a case-by-case basis for sets of compounds active against a given target. However, the increasing amount of compound activity data that is becoming available allows SARs to be explored on a large scale. Moreover, molecular scaffolds derived from bioactive compounds are also of high interest for SAR analysis. In general, scaffolds are obtained by removing all substituents from rings and from linkers between rings. This thesis aims at systematically mining compounds for which activity annotations are available and investigating relationships between chemical structure and biological activities at the level of active compounds and, in particular, molecular scaffolds. To this end, data mining approaches are designed to identify scaffolds with different structural and/or activity characteristics. Initially, scaffold distributions in compounds at different stages of pharmaceutical development are analyzed. Sets of scaffolds that overlap between different stages or preferentially occur at certain stages are identified. Furthermore, a systematic selectivity profile analysis of public domain active compounds is carried out. Scaffolds that yield compounds selective for communities of closely related targets and represent compounds selective only for one particular target over others are identified. In addition, the degree of promiscuity of scaffolds is thoroughly examined. Eighty-three scaffolds covering 33 chemotypes correspond to compounds active against at least three different target families and thus are considered to be promiscuous. Moreover, by integrating pairwise scaffold similarity and compound potency differences, the propensity of scaffolds to form multi-target activity or selectivity cliffs and the global scaffold potential of individual targets are quantitatively assessed. Finally, structural relationships between scaffolds are systematically explored. Most scaffolds extracted from active compounds are found to be involved in substructure relationships and/or share topological features with others. These substructure relationships are also compared to, and combined with, hierarchical substructure relationships to facilitate activity prediction.
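
The promiscuity criterion stated in this abstract (a scaffold whose compounds are active against at least three different target families) can be illustrated with a minimal sketch. The records, scaffold identifiers, and the function name `promiscuous_scaffolds` are hypothetical and are not taken from the thesis; only the threshold of three families comes from the abstract.

```python
from collections import defaultdict

# Hypothetical toy data: one (scaffold ID, target family) pair per active compound.
records = [
    ("scaffold_A", "kinase"),
    ("scaffold_A", "GPCR"),
    ("scaffold_A", "protease"),
    ("scaffold_B", "kinase"),
    ("scaffold_B", "kinase"),
    ("scaffold_C", "GPCR"),
    ("scaffold_C", "ion channel"),
]


def promiscuous_scaffolds(records, min_families=3):
    """Return scaffolds whose compounds cover at least `min_families` target families."""
    families = defaultdict(set)
    for scaffold, family in records:
        families[scaffold].add(family)  # a set keeps each family once per scaffold
    return sorted(s for s, fams in families.items() if len(fams) >= min_families)


print(promiscuous_scaffolds(records))
```

Counting distinct families rather than distinct targets is what separates promiscuity across families from activity against several closely related targets.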

    Computational Analysis of Structure-Activity Relationships : From Prediction to Visualization Methods

    Understanding how structural modifications affect the biological activity of small molecules is one of the central themes in medicinal chemistry. By no means is structure-activity relationship (SAR) analysis a priori dependent on computational methods. However, as molecular data sets grow in size, we quickly approach our limits to access and compare structures and associated biological properties, so that computational data processing and analysis often become essential. Here, different types of approaches of varying complexity for the analysis of SAR information are presented, which can be applied in the context of screening and chemical optimization projects. The first part of this thesis is dedicated to machine-learning strategies that aim at de novo ligand prediction and the preferential detection of potent hits in virtual screening. Strong emphasis is put on benchmarking of different strategies and a thorough evaluation of their utility in practical applications. However, an often-claimed disadvantage of these prediction methods is their "black box" character because they do not necessarily reveal which structural features are associated with biological activity. Therefore, these methods are complemented by more descriptive SAR analysis approaches showing a higher degree of interpretability. Concepts from information theory are adapted to identify activity-relevant structure-derived descriptors. Furthermore, compound data mining methods exploring prespecified properties of available bioactive compounds on a large scale are designed to systematically relate molecular transformations to activity changes. Finally, these approaches are complemented by graphical methods that primarily help to access and visualize SAR data in congeneric series of compounds and allow the formulation of intuitive SAR rules applicable to the design of new compounds. The compendium of SAR analysis tools introduced in this thesis investigates SARs from different perspectives.
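
As one possible illustration of using information theory to rank structure-derived descriptors by activity relevance, the sketch below computes the mutual information between a binary descriptor and a binary activity label. The abstract does not say which measure the thesis adapts, so mutual information is an assumption here, and the function name and toy vectors are hypothetical.

```python
import math
from collections import Counter


def mutual_information(feature, labels):
    """Mutual information (in bits) between two equally long discrete sequences."""
    n = len(feature)
    joint = Counter(zip(feature, labels))
    count_f = Counter(feature)
    count_l = Counter(labels)
    mi = 0.0
    for (f, l), c in joint.items():
        p_joint = c / n
        p_indep = (count_f[f] / n) * (count_l[l] / n)
        mi += p_joint * math.log2(p_joint / p_indep)
    return mi


activity = [1, 1, 0, 0]          # hypothetical active/inactive labels
informative_bit = [1, 1, 0, 0]   # descriptor that tracks activity exactly
uninformative_bit = [1, 0, 1, 0]  # descriptor independent of activity

print(mutual_information(informative_bit, activity))
print(mutual_information(uninformative_bit, activity))
```

A descriptor that separates actives from inactives scores high, while one that is distributed independently of activity scores zero.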

    Computational Methods for Structure-Activity Relationship Analysis and Activity Prediction

    Structure-activity relationship (SAR) analysis of small bioactive compounds is a key task in medicinal chemistry. Traditionally, SARs were established on a case-by-case basis. However, with the arrival of high-throughput screening (HTS) and synthesis techniques, a surge in the size and structural heterogeneity of compound data is seen, and the use of computational methods to analyse SARs has become imperative and valuable. In recent years, graphical methods have gained prominence for analysing SARs. The choice of molecular representation and the method of assessing similarities affect the outcome of the SAR analysis. Thus, alternative methods providing distinct points of view of SARs are required. In this thesis, a novel graphical representation utilizing the canonical scaffold-skeleton definition to explore meaningful global and local SAR patterns in compound data is introduced. Furthermore, efforts have been made to go beyond descriptive SAR analysis offered by the graphical methods. SAR features inferred from descriptive methods are utilized for compound activity predictions. In this context, a data structure called SAR matrix (SARM), which is reminiscent of conventional R-group tables, is utilized. SARMs suggest many virtual compounds that represent as yet unexplored chemical space. These virtual compounds are candidates for further exploration but are too many to prioritize simply on the basis of visual inspection. Conceptually different approaches to enable systematic compound prediction and prioritization are introduced. Much emphasis is put on evolving the predictive ability for prospective compound design. Going beyond SAR analysis, the SARM method has also been adapted to navigate multi-target spaces, primarily for analysing compound promiscuity patterns. Thus, the original SARM methodology has been further developed for a variety of medicinal chemistry and chemogenomics applications.
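
The idea that an R-group-table-like matrix suggests virtual compounds can be sketched as follows. This is a simplified illustration, not the SARM algorithm itself: it assumes each known compound has already been split into a core and one substituent, and the labels, potency values, and function name are hypothetical.

```python
# Hypothetical known compounds: (core, substituent) -> potency (e.g. pIC50).
known = {
    ("core_1", "R_a"): 6.2,
    ("core_1", "R_b"): 7.1,
    ("core_2", "R_a"): 5.8,
    ("core_2", "R_c"): 8.0,
}


def virtual_compounds(known):
    """List the empty cells of the core x substituent matrix.

    Rows are cores and columns are substituents; a cell without a known
    compound is an untested core/substituent combination.
    """
    cores = sorted({core for core, _ in known})
    substituents = sorted({sub for _, sub in known})
    return [(c, s) for c in cores for s in substituents if (c, s) not in known]


print(virtual_compounds(known))
```

Even this small matrix shows why prioritization is needed: the number of empty cells grows with the product of rows and columns, while the number of known compounds grows much more slowly.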

    Computational Methods Generating High-Resolution Views of Complex Structure-Activity Relationships

    The analysis of structure-activity relationships (SARs) of small bioactive compounds is a central task in medicinal chemistry and pharmaceutical research. The study of SARs is in principle not limited to computational methods. However, as data sets rapidly grow in size, advanced computational approaches become indispensable for SAR analysis. Activity landscapes are one of the preferred and widely used computational models to study large-scale SARs. Activity cliffs are cardinal features of activity landscape representations and are thought to contain high SAR information content. This work addresses major challenges in systematic SAR exploration and specifically focuses on the design of novel activity landscape models and comprehensive activity cliff analysis. In the first part of the thesis, two conceptually different activity landscape representations are introduced for compounds active against multiple targets. These models are designed to provide intuitive graphical access to compounds forming single and multi-target activity cliffs and displaying multi-target SAR characteristics. Further, a systematic analysis of the frequency and distribution of activity cliffs is carried out. In addition, a large-scale data mining effort is designed to quantify and analyze fingerprint-dependent changes in SAR information. The second part of this work is dedicated to the concept of activity cliffs and their utility in the practice of medicinal chemistry. To this end, a computational approach is introduced to search for detectable SAR advantages associated with activity cliffs. In addition, the question of to what extent activity cliffs might be utilized as starting points in practical compound optimization efforts is investigated. Finally, all activity cliff configurations formed by currently available bioactive compounds are thoroughly examined. These configurations are further classified and their frequency of occurrence and target distribution are determined. Furthermore, the activity cliff concept is extended to explore the relation between chemical structures and compound promiscuity. The notion of promiscuity cliffs is introduced to deduce structural modifications that might induce large-magnitude promiscuity effects.
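
An activity cliff is commonly described as a pair of structurally similar compounds with a large potency difference. The sketch below detects such pairs using Tanimoto similarity on fingerprint bit sets. The similarity threshold (0.55), the potency-difference threshold (2 log units), the fingerprints, and all names are assumptions chosen for illustration; the abstract does not specify the criteria used in the thesis.

```python
from itertools import combinations


def tanimoto(a, b):
    """Tanimoto coefficient of two sets of 'on' fingerprint bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def activity_cliffs(compounds, min_similarity=0.55, min_delta=2.0):
    """Return name pairs that are similar in structure but differ strongly in potency.

    `compounds` maps a name to (set of fingerprint bits, potency in log units).
    Both thresholds are illustrative assumptions.
    """
    cliffs = []
    for n1, n2 in combinations(sorted(compounds), 2):
        fp1, p1 = compounds[n1]
        fp2, p2 = compounds[n2]
        if tanimoto(fp1, fp2) >= min_similarity and abs(p1 - p2) >= min_delta:
            cliffs.append((n1, n2))
    return cliffs


# Hypothetical toy compounds: (fingerprint bits, pIC50).
compounds = {
    "cpd_1": ({1, 2, 3, 4}, 8.5),
    "cpd_2": ({1, 2, 3, 5}, 6.0),
    "cpd_3": ({7, 8, 9}, 5.0),
}

print(activity_cliffs(compounds))
```

Because similarity depends on the fingerprint chosen, the same compound set can yield different cliffs under different representations, which is the kind of fingerprint dependence the abstract mentions.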

    MI-NODES multiscale models of metabolic reactions, brain connectome, ecological, epidemic, world trade, and legal-social networks

    [Abstract] Complex systems and networks appear in almost all areas of reality. We find them from protein residue networks to Protein Interaction Networks (PINs). Chemical reactions form Metabolic Reaction Networks (MRNs) in living beings or atmospheric reaction networks in planets and moons. Networks of neurons appear in the worm C. elegans, in the human brain connectome, or in Artificial Neural Networks (ANNs). Infection-spreading networks exist for contagious outbreaks in humans and, in malware epidemiology, for infection with viral software in internet or wireless networks. Social-legal networks with different rules evolved from swarm intelligence to hunter-gatherer societies and citation networks of the U.S. Supreme Court. In all these cases, the same question arises: can we predict the links based on structural information? We propose to solve the problem using Quantitative Structure-Property Relationship (QSPR) techniques commonly used in chemoinformatics. In so doing, we need software able to transform all types of networks/graphs, like drug structure, drug-target interactions, protein structure, protein interactions, metabolic reactions, brain connectome, or social networks, into numerical parameters. Consequently, we need to process multitarget, multiscale, and multiplexing information in alignment-free mode. Later, we have to seek the QSPR model with Machine Learning techniques. MI-NODES is this type of software. Here we review the evolution of the software from chemoinformatics to bioinformatics and systems biology. This is an effort to develop a universal tool to study structure-property relationships in complex systems.

    Mapping networks of anti-HIV drug cocktails vs. AIDS epidemiology in the US counties

    [Abstract] The implementation of the highly active antiretroviral therapy (HAART) and the combination of anti-HIV drugs have resulted in longer survival and a better quality of life for the people infected with the virus. In this work, a method is proposed to map complex networks of AIDS prevalence in the US counties, incorporating information about the chemical structure, molecular target, organism, and results in preclinical protocols of assay for all drugs in the cocktail. Different machine learning methods were trained and validated to select the best model. The Shannon information invariants of molecular graphs for drugs and of social networks of income inequality were used as input. The nodes in molecular graphs represent atoms weighted by Pauling electronegativity values, and the links correspond to the chemical bonds. On the other hand, the nodes in the social network represent the US counties and have Gini coefficients as weights. We obtained the data about anti-HIV drugs from the ChEMBL database and the data about AIDS prevalence and Gini coefficient from the AIDSVu database of Emory University. Box–Jenkins operators were used to measure the shift with respect to average behavior of drugs from reference compounds assayed with/in a given protocol, target, or organism. To train/validate the model and predict the complex network, we needed to analyze 152,628 data points including values of AIDS prevalence in 2310 counties in the US vs. ChEMBL results for 21,582 unique drugs, 9 viral or human protein targets, 4856 protocols, and 10 possible experimental measures. The best model found was a linear discriminant analysis (LDA) with accuracy, specificity, and sensitivity above 0.80 in training and external validation series.
    Ministerio de Educación, Cultura y Deportes; AGL2011-30563-C03-0
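
The abstract describes Box-Jenkins operators as measuring the shift of a drug from the average behavior of reference compounds sharing a protocol, target, or organism. A minimal sketch of that idea, read here as "value minus the mean of its reference group", is given below. This reading, the record fields, and the function name are assumptions for illustration and may differ from the operators actually used in the paper.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: one descriptor value ("index") per drug and assay condition.
records = [
    {"drug": "d1", "target": "t1", "index": 2.0},
    {"drug": "d2", "target": "t1", "index": 4.0},
    {"drug": "d3", "target": "t2", "index": 5.0},
]


def deviation_from_group_mean(records, key, value="index"):
    """For each record, return its value minus the mean over records sharing `key`."""
    groups = defaultdict(list)
    for r in records:
        groups[r[key]].append(r[value])
    group_means = {g: mean(vals) for g, vals in groups.items()}
    return [r[value] - group_means[r[key]] for r in records]


print(deviation_from_group_mean(records, key="target"))
```

Calling the function once per condition type (protocol, target, organism) would give one deviation term per condition, each usable as a separate model input.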

    ANN multiscale model of anti-HIV Drugs activity vs AIDS prevalence in the US at county level based on information indices of molecular graphs and social networks

    [Abstract] This work is aimed at describing the workflow for a methodology that combines chemoinformatics and pharmacoepidemiology methods and at reporting the first predictive model developed with this methodology. The new model is able to predict complex networks of AIDS prevalence in the US counties, taking into consideration the social determinants and activity/structure of anti-HIV drugs in preclinical assays. We trained different Artificial Neural Networks (ANNs) using as input information indices of social networks and molecular graphs. We used a Shannon information index based on the Gini coefficient to quantify the effect of income inequality in the social network. We obtained the data on AIDS prevalence and the Gini coefficient from the AIDSVu database of Emory University. We also used the Balaban information indices to quantify changes in the chemical structure of anti-HIV drugs. We obtained the data on anti-HIV drug activity and structure (SMILES codes) from the ChEMBL database. Last, we used Box-Jenkins moving average operators to quantify information about the deviations of drugs with respect to data subsets of reference (targets, organisms, experimental parameters, protocols). The best model found was a Linear Neural Network (LNN) with values of Accuracy, Specificity, and Sensitivity above 0.76 and AUROC > 0.80 in training and external validation series. This model generates a complex network of AIDS prevalence in the US at county level with respect to the activity of anti-HIV drugs in preclinical assays. To train/validate the model and predict the complex network we needed to analyze 43,249 data points including values of AIDS prevalence in 2,310 counties in the US vs ChEMBL results for 21,582 unique drugs, 9 viral or human protein targets, 4,856 protocols, and 10 possible experimental measures.
    Ministerio de Educación, Cultura y Deportes; AGL2011-30563-C03-0
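
The Accuracy, Specificity, and Sensitivity values reported for the model follow standard definitions based on a binary confusion matrix. The sketch below computes them from true and predicted labels; the label vectors are hypothetical and unrelated to the paper's data, and the function assumes both classes occur in the true labels.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, sensitivity, and specificity for binary labels (1 = positive).

    Assumes y_true contains at least one positive and one negative case,
    otherwise a division by zero occurs.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),  # fraction of true positives recovered
        "specificity": tn / (tn + fp),  # fraction of true negatives recovered
    }


# Hypothetical labels for eight cases.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
print(classification_metrics(y_true, y_pred))
```

Reporting sensitivity and specificity alongside accuracy matters when the classes are imbalanced, since accuracy alone can be high for a model that ignores the minority class.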

    Enumeration, conformation sampling and population of libraries of peptide macrocycles for the search of chemotherapeutic cardioprotection agents

    Peptides are uniquely endowed with features that allow them to perturb previously difficult-to-drug biomolecular targets. Peptide macrocycles in particular have seen a flurry of recent interest due to their enhanced bioavailability, tunability and specificity. Although these properties make them attractive hit candidates in early-stage drug discovery, knowing which peptides to pursue is non-trivial due to the magnitude of the peptide sequence space. Computational screening approaches show promise in their ability to address the size of this search space but suffer from their inability to accurately interrogate the conformational landscape of peptide macrocycles. We developed an in-silico compound enumerator that was tasked with populating a conformation-laden peptide virtual library. This library was then used in the search for cardio-protective agents that may be administered to reduce tissue damage during reperfusion after ischemia (heart attack). Our enumerator successfully generated a library of 15.2 billion compounds, requiring the use of compression algorithms, conformational sampling protocols and management of aggregated compute resources in the context of a local cluster. In the absence of experimental biophysical data, we performed biased sampling during alchemical molecular dynamics simulations in order to observe cyclophilin-D perturbation by cyclosporine A and its mitochondrial targeted analogue. Reliable intermediate state averaging through a WHAM analysis of the biased dynamic pulling simulations confirmed that the cardio-protective activity of cyclosporine A was due to its mitochondrial targeting. Parallel-tempered solution molecular dynamics in combination with efficient clustering isolated the essential dynamics of a cyclic peptide scaffold. The rapid enumeration of skeletons from these essential dynamics gave rise to a conformation-laden virtual library of all the 15.2 billion unique cyclic peptides (given the limits on peptide sequence imposed). Analysis of this library showed the exact extent of physicochemical properties covered, relative to the bare scaffold precursor. Molecular docking of a subset of the virtual library against cyclophilin-D showed significant improvements in affinity to the target (relative to cyclosporine A). The conformation-laden virtual library, accessed by our methodology, provided derivatives that were able to make many interactions per peptide with the cyclophilin-D target. Machine learning methods showed promise in the training of Support Vector Machines for synthetic feasibility prediction for this library. The synergy between enumeration and conformational sampling greatly improves the performance of this library during virtual screening, even when only a subset is used.

    Optically Micro-fabricated Linear and Freeform 3-D Extracellular Matrix Scaffolds for Tissue Engineering

    This work was aimed at advancing multi-photon excited, freeform fabrication technology with nano-scale and sub-micron precision as an enabler for tissue engineers to investigate cellular response to a biomimetic, bio-active extracellular matrix. We demonstrated that sub-micron and micron scale Collagen and Fibronectin structures can be fabricated via multi-photon excited photochemistry using a modified Benzophenone dimer and Rose Bengal while maintaining the biomimetic ECM structures’ bioactivity. We confirmed that three-photon excitation produces significantly smaller features at comparable excitation wavelengths as a consideration to better approach focal adhesion size. Bioactivity of MPE cross-linked FN and Collagens I and II was established via immunofluorescence and fibroblast adhesion. Additionally, the relative rates of degradation in these cross-linked matrices are consistent with the known activities of these enzymes. Morphology measurements of fibroblasts grown on these proteins, including log(Area), Perimeter, and Area/Perimeter², were considered as proxies for cell response. Fibroblast perimeters are statistically different when associated with the Collagen I microenvironment. Among fibroblasts grown on MPE structures of Collagen I, Fibronectin, BSA and the BSA Monolayer, the stress fiber distributions on Collagen I (all fiber lengths) are highly significantly different (p < 1×10⁻⁴) from the distribution of stress fibers of cells on BSA Lines. This suggests contact guidance only for cells on BSA Lines but a combination of contact guidance and chemical signaling (RGD) for cells on Collagen I Lines. This supports additional overall orientation findings based on fibroblasts’ fitted ellipse major axis direction for Collagens I, II and Fibronectin. Stress fiber distribution on BSA Monolayer differed significantly from those on BSA structures (p = 0.01). This underscores the effects of pure contact guidance alone provided by the BSA fibers compared to the combined contact guidance and ECM cues provided by the FN and collagen structures. A method similar to rapid prototyping or three-dimensional printing was accomplished to resolve cellular response at the submicron level by fabricating biomimetic, bioactive extracellular matrices in a freeform three-dimensional (3D) manner. To the best of our knowledge, simultaneous 3D spatial and chemical control of collagen scaffold synthesis at the micrometer and sub-micrometer size scales has not been fully demonstrated.

    11th German Conference on Chemoinformatics (GCC 2015) : Fulda, Germany. 8-10 November 2015.
