19 research outputs found

    Application of 3D Zernike descriptors to shape-based ligand similarity searching

    Get PDF
    Background: The identification of promising drug leads from a large database of compounds is an important step in the preliminary stages of drug design. Although shape is known to play a key role in the molecular recognition process, its application to virtual screening poses significant hurdles both in terms of the encoding scheme and speed. Results: In this study, we have examined the efficacy of the alignment independent three-dimensional Zernike descriptor (3DZD) for fast shape based similarity searching. Performance of this approach was compared with several other methods including the statistical moments based ultrafast shape recognition scheme (USR) and SIMCOMP, a graph matching algorithm that compares atom environments. Three benchmark datasets are used to thoroughly test the methods in terms of their ability for molecular classification, retrieval rate, and performance under the situation that simulates actual virtual screening tasks over a large pharmaceutical database. The 3DZD performed better than or comparable to the other methods examined, depending on the datasets and evaluation metrics used. Reasons for the success and the failure of the shape based methods for specific cases are investigated. Based on the results for the three datasets, general conclusions are drawn with regard to their efficiency and applicability

    jCompoundMapper: An open source Java library and command-line tool for chemical fingerprints

    Get PDF
    <p>Abstract</p> <p>Background</p> <p>The decomposition of a chemical graph is a convenient approach to encode information of the corresponding organic compound. While several commercial toolkits exist to encode molecules as so-called fingerprints, only a few open source implementations are available. The aim of this work is to introduce a library for exactly defined molecular decompositions, with a strong focus on the application of these features in machine learning and data mining. It provides several options such as search depth, distance cut-offs, atom- and pharmacophore typing. Furthermore, it provides the functionality to combine, to compare, or to export the fingerprints into several formats.</p> <p>Results</p> <p>We provide a Java 1.6 library for the decomposition of chemical graphs based on the open source Chemistry Development Kit toolkit. We reimplemented popular fingerprinting algorithms such as depth-first search fingerprints, extended connectivity fingerprints, autocorrelation fingerprints (e.g. CATS2D), radial fingerprints (e.g. Molprint2D), geometrical Molprint, atom pairs, and pharmacophore fingerprints. We also implemented custom fingerprints such as the all-shortest path fingerprint that only includes the subset of shortest paths from the full set of paths of the depth-first search fingerprint. As an application of jCompoundMapper, we provide a command-line executable binary. We measured the conversion speed and number of features for each encoding and described the composition of the features in detail. The quality of the encodings was tested using the default parametrizations in combination with a support vector machine on the Sutherland QSAR data sets. Additionally, we benchmarked the fingerprint encodings on the large-scale Ames toxicity benchmark using a large-scale linear support vector machine. The results were promising and could often compete with literature results. On the large Ames benchmark, for example, we obtained an AUC ROC performance of 0.87 with a reimplementation of the extended connectivity fingerprint. This result is comparable to the performance achieved by a non-linear support vector machine using state-of-the-art descriptors. On the Sutherland QSAR data set, the best fingerprint encodings showed a comparable or better performance on 5 of the 8 benchmarks when compared against the results of the best descriptors published in the paper of Sutherland et al.</p> <p>Conclusions</p> <p>jCompoundMapper is a library for chemical graph fingerprints with several tweaking possibilities and exporting options for open source data mining toolkits. The quality of the data mining results, the conversion speed, the LPGL software license, the command-line interface, and the exporters should be useful for many applications in cheminformatics like benchmarks against literature methods, comparison of data mining algorithms, similarity searching, and similarity-based data mining.</p

    Analysis of Biological Screening Data and Molecular Selectivity Profiles Using Fingerprints and Mapping Algorithms

    Get PDF
    The identification of promising drug candidates is a major milestone in the early stages of drug discovery and design. Among the properties that have to be optimized before a drug candidate is admitted to clinical testing, potency and target selectivity are of great interest and can be addressed very early. Unfortunately, optimization–relevant knowledge is often limited, and the analysis of noisy and heterogeneous biological screening data with standard methods like QSAR is hardly feasible. Furthermore, the identification of compounds displaying different selectivity patterns against related targets is a prerequisite for chemical genetics and genomics applications, allowing to specifically interfere with functions of individual members of protein families. In this thesis it is shown that computational methods based on molecular similarity are suitable tools for the analysis of compound potency and target selectivity. Originally developed to facilitate the efficient discovery of active compounds by means of virtual screening of compound libraries, these ligand–based approaches assume that similar molecules are likely to exhibit similar properties and biological activities based on the similarity property principle. Given their holistic approach to molecular similarity analysis, ligand–based virtual screening methods can be applied when little or no structure– activity information is available and do not require the knowledge of the target structure. The methods under investigation cover a wide methodological spectrum and only rely on properties derived from one– and two–dimensional molecular representations, which renders them particularly useful for handling large compound libraries. Using biological screening data, these virtual screening methods are shown to be able to extrapolate from experimental data and preferentially detect potent compounds. Subsequently, extensive benchmark calculations prove that existing 2D molecular fingerprints and dynamic mapping algorithms are suitable tools for the distinction between compounds with differential selectivity profiles. Finally, an advanced dynamic mapping algorithm is introduced that is able to generate target–selective chemical reference spaces by adaptively identifying most–discriminative molecular properties from a set of active compounds. These reference spaces are shown to be of great value for the generation of predictive target–selectivity models by screening a biologically annotated compound library. </p

    NOVEL ALGORITHMS AND TOOLS FOR LIGAND-BASED DRUG DESIGN

    Get PDF
    Computer-aided drug design (CADD) has become an indispensible component in modern drug discovery projects. The prediction of physicochemical properties and pharmacological properties of candidate compounds effectively increases the probability for drug candidates to pass latter phases of clinic trials. Ligand-based virtual screening exhibits advantages over structure-based drug design, in terms of its wide applicability and high computational efficiency. The established chemical repositories and reported bioassays form a gigantic knowledgebase to derive quantitative structure-activity relationship (QSAR) and structure-property relationship (QSPR). In addition, the rapid advance of machine learning techniques suggests new solutions for data-mining huge compound databases. In this thesis, a novel ligand classification algorithm, Ligand Classifier of Adaptively Boosting Ensemble Decision Stumps (LiCABEDS), was reported for the prediction of diverse categorical pharmacological properties. LiCABEDS was successfully applied to model 5-HT1A ligand functionality, ligand selectivity of cannabinoid receptor subtypes, and blood-brain-barrier (BBB) passage. LiCABEDS was implemented and integrated with graphical user interface, data import/export, automated model training/ prediction, and project management. Besides, a non-linear ligand classifier was proposed, using a novel Topomer kernel function in support vector machine. With the emphasis on green high-performance computing, graphics processing units are alternative platforms for computationally expensive tasks. A novel GPU algorithm was designed and implemented in order to accelerate the calculation of chemical similarities with dense-format molecular fingerprints. Finally, a compound acquisition algorithm was reported to construct structurally diverse screening library in order to enhance hit rates in high-throughput screening

    Molecular Similarity and Xenobiotic Metabolism

    Get PDF
    MetaPrint2D, a new software tool implementing a data-mining approach for predicting sites of xenobiotic metabolism has been developed. The algorithm is based on a statistical analysis of the occurrences of atom centred circular fingerprints in both substrates and metabolites. This approach has undergone extensive evaluation and been shown to be of comparable accuracy to current best-in-class tools, but is able to make much faster predictions, for the first time enabling chemists to explore the effects of structural modifications on a compound’s metabolism in a highly responsive and interactive manner.MetaPrint2D is able to assign a confidence score to the predictions it generates, based on the availability of relevant data and the degree to which a compound is modelled by the algorithm.In the course of the evaluation of MetaPrint2D a novel metric for assessing the performance of site of metabolism predictions has been introduced. This overcomes the bias introduced by molecule size and the number of sites of metabolism inherent to the most commonly reported metrics used to evaluate site of metabolism predictions.This data mining approach to site of metabolism prediction has been augmented by a set of reaction type definitions to produce MetaPrint2D-React, enabling prediction of the types of transformations a compound is likely to undergo and the metabolites that are formed. This approach has been evaluated against both historical data and metabolic schemes reported in a number of recently published studies. Results suggest that the ability of this method to predict metabolic transformations is highly dependent on the relevance of the training set data to the query compounds.MetaPrint2D has been released as an open source software library, and both MetaPrint2D and MetaPrint2D-React are available for chemists to use through the Unilever Centre for Molecular Science Informatics website.----Boehringer-Ingelhie

    Design and implementation of a platform for predicting pharmacological properties of molecules

    Get PDF
    Tese de mestrado, Bioinformática e Biologia Computacional, Universidade de Lisboa, Faculdade de Ciências, 2019O processo de descoberta e desenvolvimento de novos medicamentos prolonga-se por vários anos e implica o gasto de imensos recursos monetários. Como tal, vários métodos in silico são aplicados com o intuito de dimiuir os custos e tornar o processo mais eficiente. Estes métodos incluem triagem virtual, um processo pelo qual vastas coleções de compostos são examinadas para encontrar potencial terapêutico. QSAR (Quantitative Structure Activity Relationship) é uma das tecnologias utilizada em triagem virtual e em optimização de potencial farmacológico, em que a informação estrutural de ligandos conhecidos do alvo terapêutico é utilizada para prever a actividade biológica de um novo composto para com o alvo. Vários investigadores desenvolvem modelos de aprendizagem automática de QSAR para múltiplos alvos terapêuticos. Mas o seu uso está dependente do acesso aos mesmos e da facilidade em ter os modelos funcionais, o que pode ser complexo quando existem várias dependências ou quando o ambiente de desenvolvimento difere bastante do ambiente em que é usado. A aplicação ao qual este documento se refere foi desenvolvida para lidar com esta questão. Esta é uma plataforma centralizada onde investigadores podem aceder a vários modelos de QSAR, podendo testar os seus datasets para uma multitude de alvos terapêuticos. A aplicação permite usar identificadores moleculares como SMILES e InChI, e gere a sua integração em descritores moleculares para usar como input nos modelos. A plataforma pode ser acedida através de uma aplicação web com interface gráfica desenvolvida com o pacote Shiny para R e directamente através de uma REST API desenvolvida com o pacote flask-restful para Python. Toda a aplicação está modularizada através de teconologia de “contentores”, especificamente o Docker. O objectivo desta plataforma é divulgar o acesso aos modelos criados pela comunidade, condensando-os num só local e removendo a necessidade do utilizador de instalar ou parametrizar qualquer tipo de software. Fomentando assim o desenvolvimento de conhecimento e facilitando o processo de investigação.The drug discovery and design process is expensive, time-consuming and resource-intensive. Various in silico methods are used to make the process more efficient and productive. Methods such as Virtual Screening often take advantage of QSAR machine learning models to more easily pinpoint the most promising drug candidates, from large pools of compounds. QSAR, which means Quantitative Structure Activity Relationship, is a ligand-based method where structural information of known ligands of a specific target is used to predict the biological activity of another molecule against that target. They are also used to improve upon an existing molecule’s pharmacologic potential by elucidating the structural composition with desirable properties. Several researchers create and develop QSAR machine learning models for a variety of different therapeutic targets. However, their use is limited by lack of access to said models. Beyond access, there are often difficulties in using published software given the need to manage dependencies and replicating the development environment. To address this issue, the application documented here was designed and developed. In this centralized platform, researchers can access several QSAR machine learning models and test their own datasets for interaction with various therapeutic targets. The platform allows the use of widespread molecule identifiers as input, such as SMILES and InChI, handling the necessary integration into the appropriate molecular descriptors to be used in the model. The platform can be accessed through a Web Application with a full graphical user interface developed with the R package Shiny and through a REST API developed with the Flask Restful package for Python. The complete application is packaged up in container technology, specifically Docker. The main goal of this platform is to grant widespread access to the QSAR models developed by the scientific community, by concentrating them in a single location and removing the user’s need to install or set up software unfamiliar to them. This intends to incite knowledge creation and facilitate the research process

    Computational Approaches for the Characterization of the Structure and Dynamics of G Protein-Coupled Receptors: Applications to Drug Design

    Get PDF
    G Protein-Coupled Receptors (GPCRs) constitute the most pharmacologically relevant superfamily of proteins. In this thesis, a computational pipeline for modelling the structure and dynamics of GPCRs is presented, properly combined with experimental collaborations for GPCR drug design. These include the discovery of novel scaffolds as potential antipsychotics, and the design of a new series of A3 adenosine receptor antagonists, employing successful combinations of structure- and ligand-based approaches. Additionally, the structure of Adenosine Receptors (ARs) was computationally assessed, with implications in ligand affinity and selectivity. The employed protocol for Molecular Dynamics simulations has allowed the characterization of structural determinants of the activation of ARs, and the evaluation of the stability of GPCR dimers of CXCR4 receptor. Finally, the computational pipeline here developed has been integrated into the web server GPCR-ModSim (http://gpcr.usc.es), contributing to its application in biochemical and pharmacological studies on GPCRs

    Advances and Challenges in Computational Target Prediction

    Get PDF
    Target deconvolution is a vital initial step in preclinical drug development to determine research focus and strategy. In this respect, computational target prediction is used to identify the most probable targets of an orphan ligand or the most similar targets to a protein under investigation. Applications range from the fundamental analysis of the mode-of-action over polypharmacology or adverse effect predictions to drug repositioning. Here, we provide a review on published ligand- and target-based as well as hybrid approaches for computational target prediction, together with current limitations and future directions.Medicinal Chemistr
    corecore