4,666 research outputs found

    Drugs, structures, fragments : substructure-based approaches to GPCR drug discovery and design

    Get PDF
    This thesis is all about cheminformatics, and its impact on drug discovery. A number of strategies are discussed that apply computational methods for the analysis and design of G protein-coupled receptor (GPCR) ligands. Frequent substructure mining is applied to find the common structural motifs that are discriminative for predefined classes of GPCR ligands. In addtion, this approach is extended to cluster GPCRs to suggest a new classification for this receptor superfamily. Furthermore, substructure analysis is utilised to screen for new adenosine A2A receptor ligands. Finally, an automated de novo design approach is described that is used for the design of new adenosine A1 receptor ligands using a multi-objective evolutionary algorithm.Top Institute Pharma (Project: GPCR forum; project number: D1-105).UBL - phd migration 201

    Optimización multi-objetivo en las ciencias de la vida.

    Get PDF
    Para conseguir este objetivo, en lugar de intentar incorporar nuevos algoritmos directamente en el código fuente de AutoDock, se utilizó un framework orientado a la resolución de problemas de optimización con metaheurísticas. Concretamente, se usó jMetal, que es una librería de código libre basada en Java. Ya que AutoDock está implementado en C++, se desarrolló una versión en C++ de jMetal (posteriormente distribuida públicamente). De esta manera, se consiguió integrar ambas herramientas (AutoDock 4.2 y jMetal) para optimizar la energía libre de unión entre compuesto químico y receptor. Después de disponer de una amplia colección de metaheurísticas implementadas en jMetalCpp, se realizó un detallado estudio en el cual se aplicaron un conjunto de metaheurísticas para optimizar un único objetivo minimizando la energía libre de unión, el cual es el resultado de la suma de todos los términos de energía de la función objetivo de energía de AutoDock 4.2. Por lo tanto, cuatro metaheurísticas tales como dos variantes de algoritmo genético gGA (Algoritmo Genético generacional) y ssGA (Algoritmo Genético de estado estacionario), DE (Evolución Diferencial) y PSO (Optimización de Enjambres de Partículas) fueron aplicadas para resolver el problema del acoplamiento molecular. Esta fase se dividió en dos subfases en las que se usaron dos conjuntos de instancias diferentes, utilizando como receptores HIV-proteasas con cadenas laterales de aminoacidos flexibles y como ligandos inhibidores HIV-proteasas flexibles. El primer conjunto de instancias se usó para un estudio de configuración de parámetros de los algoritmos y el segundo para comparar la precisión de las conformaciones ligando-receptor obtenidas por AutoDock y AutoDock+jMetalCpp. La siguiente fase implicó aplicar una formulación multi-objetivo para resolver problemas de acoplamiento molecular dados los resultados interesantes obtenidos en estudios previos existentes en los que dos objetivos como la energía intermolecular y la energía intramolecular fueron minimizados. Por lo tanto, se comparó y analizó el rendimiento de un conjunto de metaheurísticas multi-objetivo mediante la resolución de complejos flexibles de acoplamiento molecular minimizando la energía inter- e intra-molecular. Estos algoritmos fueron: NSGA-II (Algoritmo Genético de Ordenación No dominada) y su versión de estado estacionario (ssNSGA-II), SMPSO (Optimización Multi-objetivo de Enjambres de Partículas con Modulación de Velocidad), GDE3 (Tercera versión de la Evolución Diferencial Generalizada), MOEA/D (Algoritmo Evolutivo Multi-Objetivo basado en la Decomposición) y SMS-EMOA (Optimización Multi-objetivo Evolutiva con Métrica S). Después de probar enfoques multi-objetivo ya existentes, se probó uno nuevo. En concreto, el uso del RMSD como un objetivo para encontrar soluciones similares a la de la solución de referencia. Se replicó el estudio previo usando este conjunto diferente de objetivos. Por último, se analizó de forma detallada el algoritmo que obtuvo mejores resultados en los estudios previos. En concreto, se realizó un estudio de variantes del SMPSO minimizando la energía intermolecular y el RMSD. Este estudio proporcionó algunas pistas sobre cómo nuevos algoritmos basados en SMPSO pueden ser adaptados para mejorar los resultados de acoplamiento molecular para aquellas simulaciones que involucren ligandos y receptores flexibles. Esta tesis demuestra que la inclusión de técnicas metaheurísticas de jMetalCpp en la herramienta de acoplamiento molecular AutoDock incrementa las posibilidades a los usuarios de ámbito biológico cuando resuelven el problema del acoplamiento molecular. El uso de técnicas de optimización mono-objetivo diferentes aparte de aquéllas ampliamente usadas en las comunidades de acoplamiento molecular podría dar lugar a soluciones de mayor calidad. En nuestro caso de estudio mono-objetivo, el algoritmo de evolución diferencial obtuvo mejores resultados que aquellos obtenidos por AutoDock. También se propone diferentes enfoques multi-objetivo para resolver el problema del acoplamiento molecular, tales como la decomposición de los términos de la energía de unión o el uso del RMSD como un objetivo. Finalmente, se demuestra que el SMPSO, una metaheurística de optimización multi-objetivo de enjambres de partículas, es una técnica remarcable para resolver problemas de acoplamiento molecular cuando se usa un enfoque multi-objetivo, obteniendo incluso mejores soluciones que las técnicas mono-objetivo.Las herramientas de acoplamiento molecular han llegado a ser bastante eficientes en el descubrimiento de fármacos y en el desarrollo de la investigación de la industria farmacéutica. Estas herramientas se utilizan para elucidar la interacción de una pequeña molécula (ligando) y una macro-molécula (diana) a un nivel atómico para determinar cómo el ligando interactúa con el sitio de unión de la proteína diana y las implicaciones que estas interacciones tienen en un proceso bioquímico dado. En el desarrollo computacional de las herramientas de acoplamiento molecular los investigadores de este área se han centrado en mejorar los componentes que determinan la calidad del software de acoplamiento molecular: 1) la función objetivo y 2) los algoritmos de optimización. La función objetivo de energía se encarga de proporcionar una evaluación de las conformaciones entre el ligando y la proteína calculando la energía de unión, que se mide en kcal/mol. En esta tesis, se ha usado AutoDock, ya que es una de las herramientas de acoplamiento molecular más citada y usada, y cuyos resultados son muy precisos en términos de energía y valor de RMSD (desviación de la media cuadrática). Además, se ha seleccionado la función de energía de AutoDock versión 4.2, ya que permite realizar una mayor cantidad de simulaciones realistas incluyendo flexibilidad en el ligando y en las cadenas laterales de los aminoácidos del receptor que están en el sitio de unión. Se han utilizado algoritmos de optimización para mejorar los resultados de acoplamiento molecular de AutoDock 4.2, el cual minimiza la energía libre de unión final que es la suma de todos los términos de energía de la función objetivo de energía. Dado que encontrar la solución óptima en el acoplamiento molecular es un problema de gran complejidad y la mayoría de las veces imposible, se suelen utilizar algoritmos no exactos como las metaheurísticas, para así obtener soluciones lo suficientemente buenas en un tiempo razonable

    Advances in De Novo Drug Design : From Conventional to Machine Learning Methods

    Get PDF
    De novo drug design is a computational approach that generates novel molecular structures from atomic building blocks with no a priori relationships. Conventional methods include structure-based and ligand-based design, which depend on the properties of the active site of a biological target or its known active binders, respectively. Artificial intelligence, including ma-chine learning, is an emerging field that has positively impacted the drug discovery process. Deep reinforcement learning is a subdivision of machine learning that combines artificial neural networks with reinforcement-learning architectures. This method has successfully been em-ployed to develop novel de novo drug design approaches using a variety of artificial networks including recurrent neural networks, convolutional neural networks, generative adversarial networks, and autoencoders. This review article summarizes advances in de novo drug design, from conventional growth algorithms to advanced machine-learning methodologies and high-lights hot topics for further development.Peer reviewe

    Combining evolutionary algorithms with reaction rules towards focused molecular design

    Get PDF
    Designing novel small molecules with desirable properties and feasible synthesis continues to pose a significant challenge in drug discovery, particularly in the realm of natural products. Reaction-based gradient-free methods are promising approaches for designing new molecules as they ensure synthetic feasibility and provide potential synthesis paths. However, it is important to note that the novelty and diversity of the generated molecules highly depend on the availability of comprehensive reaction templates. To address this challenge, we introduce ReactEA, a new open-source evolutionary framework for computer-aided drug discovery that solely utilizes biochemical reaction rules. ReactEA optimizes molecular properties using a comprehensive set of 22,949 reaction rules, ensuring chemical validity and synthetic feasibility. ReactEA is versatile, as it can virtually optimize any objective function and track potential synthetic routes during the optimization process. To demonstrate its effectiveness, we apply ReactEA to various case studies, including the design of novel drug-like molecules and the optimization of pre-existing ligands. The results show that ReactEA consistently generates novel molecules with improved properties and reasonable synthetic routes, even for complex tasks such as improving binding affinity against the PARP1 enzyme when compared to existing inhibitors.Centre of Biological Engineering (CEB, University of Minho) for financial and equipment support. Portuguese Foundation for Science and Technology (FCT) under the scope of the strategic funding of UIDB/04469/2020 unit and through a Ph.D. scholarship awarded to João Correia (SFRH/BD/144314/2019). European Commission through the project SHIKIFACTORY100 - Modular cell factories for the production of 100 compounds from the shikimate pathway (Reference 814408).info:eu-repo/semantics/publishedVersio

    A survey of diversity-oriented optimization

    Get PDF
    The concept of diversity plays a crucial role in many optimization approaches: On the one hand, diversity can be formulated as an essential goal, such as in level set approximation or multiobjective optimization where the aim is to find a diverse set of alternative feasible or, respectively, Pareto optimal solutions. On the other hand, diversity maintenance can play an important role in algorithms that ultimately searc

    Untersuchung der Struktur und Interaktion mit allosterischen Modulatoren der Familie C GPCRs mit Hilfe von Sequenz-, Struktur- und Ligand-basierten Verfahren

    Get PDF
    This study focuses on structural features of a particular GPCR type, the family C GPCRs. Structure- and ligand-based approaches were adopted for prediction of novel mGluR5 binding ligand and their binding modes. The objectives of this study were: 1. An analysis of function and structural implication of amino acids in the TM region of family C GPCRs. 2. The prediction of the TM domain structure of mGluR5. 3. The discovery of novel selective allosteric modulators of mGluR5 by virtual screening. 4. The prediction of a ligand binding mode for the allosteric binding site in mGluR5. GPCRs are a super-family of structurally related proteins although their primary amino acid sequence can be diverse. Using sequence information a conservation analysis of family C GPCRs should be applied to reveal characteristic differences and similarities with respect function, folding and ligand binding. Using experimental data and conservation analysis the allosteric binding site of mGluR5 should be characterized regarding NAM and PAM and selective ligand binding. For further evaluation experimental knowledge about family A GPCRs as well as conservation between vertebrate rhodopsins was planned to be compared to results obtained for family C GPCRs (Section 4.1 Conservation analysis of family C GPCRs). Since no receptor structure is available for any family C GPCR, discussion of conserved sequence positions between family A and C GPCRs requires the prediction of a receptor structure for mGluR5 using a family A receptor as template. In order to predict the mGluR5 structure a sequence alignment to a GPCR template protein will have to be proposed and GPCR specific features considered in structure calculation (Section 4.1.4 Structure prediction of mGluR5). The obtained structure was intended to be involved in ligand binding mode prediction of newly discovered active molecules. For discovery of novel selective mGluR modulators several ligand-based virtual screening protocols were adapted and evaluated. Prediction models were derived for selection of possibly active molecules using a diverse collection of known mGluR binding ligands. For that purpose a data collection of known mGluR binding ligands should be established and this reference collection analyzed with respect to different ligand activity classes, NAM or PAM and selective modulators. The prediction of novel NAMs and PAMs using several combinations of 2D-, 3D-, pharmacophore or molecule shape encoding methods with machine learning techniques and similarity determining methods should be tested in a prospective manner (Section 4.2 Virtual screening for novel mGluR modulators). In collaboration with Merz Pharmaceuticals (Merz GmbH & Co. KGaA, Frankfurt am Main, Germany) the modulating effect of a few hundred molecules should be approved in a functional cell-based assay. With the objective to predict a binding mode of the discovered active molecules, molecule docking should be applied using the allosteric binding site of the modeled mGluR5 structure (Section 4.2.4 Modeling of binding modes). Predicted ligand binding modes are to be correlated to conservation profiles that had resulted from the sequence-based entropy analysis and information from mutation experiments, and shall be compared to known ligand binding poses from crystal structures of family A GPCRs.Im Rahmen dieser Arbeit wurden Konzepte zur Aufklärung struktureller und funktioneller Eigenschaften von G-Protein gekoppelten Rezeptoren (GPCR) der Familie C entwickelt und angewendet. Mit unterschiedlichen Methodiken der Bio- und Chemieinformatik orientiert an experimentellen Ergebnissen wurden Fragestellungen bezüglich des Funktionsmechanismus von GPCRs untersucht. In Verlauf wurde anhand verfügbarer experimenteller Daten aus Mutations- und Ligandenbindungsstudien ein Vergleich konservierter Bereiche der Rezeptor-Familien A und C angefertigt. Die Konserviertheitsanalyse stützte sich auf die Berechnung der Shannon-Entropie und wurde für ein multiples Sequenzalignment von Transmembrandomänen unterschiedlicher 96 Familie C GPCRs ermittelt. Konservierte Bereiche wurden mit Hilfe experimenteller Daten interpretiert und insbesondere zur Definition von Regionen in der allosterischen Bindetasche hinsichtlich Selektivität verwendet. Mit dem Ziel, neue selektive allosterische Modulatoren für den metabotropen Glutamatrezeptor des Typs fünf (mGluR5) zu finden, wurden mehrere Liganden-basierte Ansätze zur virtuellen Vorhersage der Aktivität von Molekülen entwickelt und getestet. Die dabei angewendete Strategie basierte auf der Kenntnis bereits bekannter Liganden, deren Strukturen und Aktivitätswerte für das Erstellen von Vorhersagemodelle genutzt werden konnten. Die prospektive Vorhersage stützte sich auf unterschiedliche Methoden zur Ähnlichkeitsberechnung und Arten der Molekülkodierung. Die Testung der Moleküle erfolgte hinsichtlich ihrer modulatorischen Wirkung am mGluR5. Die Art der Messung erfasste die Änderungen des Ca2+-Levels in der Zelle. mGluR5-bindende Modulatoren wurden zur Selektivitätsbestimmung einer Testung am mGluR1 unterzogen. Insgesamt konnten 8 von 228 getesteten Molekülen im Aktivitätsbereich unter 10μM ermittelt werden, darunter befand sich ein positiver allosterischer Modulator. Von den restlichen sieben negativen Modulatoren (NAM) waren fünf selektiv für mGluR5. Alle identifizierten NAMs wurden mittels molekularem Dockings auf mögliche Interaktion mit der Transmembrandomäne von mGluR5 untersucht. Die Bindungshypothese entsprach einer Überlagerung der gefundenen Moleküle und ihrer möglicher Interaktionspunkte. Exemplarisch am mGluR5 konnte somit die Eignung einer modellierten GPCR-Struktur für eine Hypothesengenerierung bezüglich Ligandenbindung und struktureller Zusammenhänge untersucht werden

    Novel algorithms for protein sequence analysis

    Get PDF
    Each protein is characterized by its unique sequential order of amino acids, the so-called protein sequence. Biology__s paradigm is that this order of amino acids determines the protein__s architecture and function. In this thesis, we introduce novel algorithms to analyze protein sequences. Chapter 1 begins with the introduction of amino acids, proteins and protein families. Then fundamental techniques from computer science related to the thesis are briefly described. Making a multiple sequence alignment (MSA) and constructing a phylogenetic tree are traditional means of sequence analysis. Information entropy, feature selection and sequential pattern mining provide alternative ways to analyze protein sequences and they are all from computer science. In Chapter 2, information entropy was used to measure the conservation on a given position of the alignment. From an alignment which is grouped into subfamilies, two types of information entropy values are calculated for each position in the MSA. One is the average entropy for a given position among the subfamilies, the other is the entropy for the same position in the entire multiple sequence alignment. This so-called two-entropies analysis or TEA in short, yields a scatter-plot in which all positions are represented with their two entropy values as x- and y-coordinates. The different locations of the positions (or dots) in the scatter-plot are indicative of various conservation patterns and may suggest different biological functions. The globally conserved positions show up at the lower left corner of the graph, which suggests that these positions may be essential for the folding or for the main functions of the protein superfamily. In contrast the positions neither conserved between subfamilies nor conserved in each individual subfamily appear at the upper right corner. The positions conserved within each subfamily but divergent among subfamilies are in the upper left corner. They may participate in biological functions that divide subfamilies, such as recognition of an endogenous ligand in G protein-coupled receptors. The TEA method requires a definition of protein subfamilies as an input. However such definition is a challenging problem by itself, particularly because this definition is crucial for the following prediction of specificity positions. In Chapter 3, we automated the TEA method described in Chapter 2 by tracing the evolutionary pressure from the root to the branches of the phylogenetic tree. At each level of the tree, a TEA plot is produced to capture the signal of the evolutionary pressure. A consensus TEA-O plot is composed from the whole series of plots to provide a condensed representation. Positions related to functions that evolved early (conserved) or later (specificity) are close to the lower left or upper left corner of the TEA-O plot, respectively. This novel approach allows an unbiased, user-independent, analysis of residue relevance in a protein family. We tested the TEA-O method on a synthetic dataset as well as on __real__ data, i.e., LacI and GPCR datasets. The ROC plots for the real data showed that TEA-O works perfectly well on all datasets and much better than other considered methods such as evolutionary trace, SDPpred and TreeDet. While positions were treated independently from each other in Chapter 2 and 3 in predicting specificity positions, in Chapter 4 multi-RELIEF considers both sequence similarity and distance in 3D structure in the specificity scoring function. The multi-RELIEF method was developed based on RELIEF, a state-of-the-art Machine-Learning technique for feature weighting. It estimates the expected __local__ functional specificity of residues from an alignment divided in multiple classes. Optionally, 3D structure information is exploited by increasing the weight of residues that have high-weight neighbors. Using ROC curves over a large body of experimental reference data, we showed that multi-RELIEF identifies specificity residues for the seven test sets used. In addition, incorporating structural information improved the prediction for specificity of interaction with small molecules. Comparison of multi-RELIEF with four other state-of-the-art algorithms indicates its robustness and best overall performance. In Chapter 2, 3 and 4, we heavily relied on multiple sequence alignment to identify conserved and specificity positions. As mentioned before, the construction of such alignment is not self-evident. Following the principle of sequential pattern mining, in Chapter 5, we proposed a new algorithm that directly identifies frequent biologically meaningful patterns from unaligned sequences. Six algorithms were designed and implemented to mine three different pattern types from either one or two datasets using a pattern growth approach. We compared our approach to PRATT2 and TEIRESIAS in efficiency, completeness and the diversity of pattern types. Compared to PRATT2, our approach is faster, capable of processing large datasets and able to identify the so-called type III patterns. Our approach is comparable to TEIRESIAS in the discovery of the so-called type I patterns but has additional functionality such as mining the so-called type II and type III patterns and finding discriminating patterns between two datasets. From Chapter 2 to 5, we aimed to identify functional residues from either aligned or unaligned protein sequences. In Chapter 6, we introduce an alignment-independent procedure to cluster protein sequences, which may be used to predict protein function. Traditionally phylogeny reconstruction is usually based on multiple sequence alignment. The procedure can be computationally intensive and often requires manual adjustment, which may be particularly difficult for a set of deviating sequences. In cheminformatics, constructing a similarity tree of ligands is usually alignment free. Feature spaces are routine means to convert compounds into binary fingerprints. Then distances among compounds can be obtained and similarity trees are constructed via clustering techniques. We explored building feature spaces for phylogeny reconstruction either using the so-called k-mer method or via sequential pattern mining with additional filtering and combining operations. Satisfying trees were built from both approaches compared with alignment-based methods. We found that when k equals 3, the phylogenetic tree built from the k-mer fingerprints is as good as one of the alignment-based methods, in which PAM and Neighborhood joining are used for computing distance and constructing a tree, respectively (NJ-PAM). As for the sequential pattern mining approach, the quality of the phylogenetic tree is better than one of the alignment-based method (NJ-PAM), if we set the support value to 10% and used maximum patterns only as descriptors. Finally in Chapter 7, general conclusions about the research described in this thesis are drawn. They are supplemented with an outlook on further research lines. We are convinced that the described algorithms can be useful in, e.g., genomic analyses, and provide further ideas for novel algorithms in this respect.Leiden University, NWO (Horizon Breakthrough project 050-71-041) and the Dutch Top Institute Pharma (D1-105)UBL - phd migration 201

    A Comprehensive Mapping of the Druggable Cavities within the SARS-CoV-2 Therapeutically Relevant Proteins by Combining Pocket and Docking Searches as Implemented in Pockets 2.0

    Get PDF
    (1) Background: Virtual screening studies on the therapeutically relevant proteins of the severe acute respiratory syndrome Coronavirus 2 (SARS-CoV-2) require a detailed characterization of their druggable binding sites, and, more generally, a convenient pocket mapping represents a key step for structure-based in silico studies; (2) Methods: Along with a careful literature search on SARS-CoV-2 protein targets, the study presents a novel strategy for pocket mapping based on the combination of pocket (as performed by the well-known FPocket tool) and docking searches (as performed by PLANTS or AutoDock/Vina engines); such an approach is implemented by the Pockets 2.0 plug-in for the VEGA ZZ suite of programs; (3) Results: The literature analysis allowed the identification of 16 promising binding cavities within the SARS-CoV-2 proteins and the here proposed approach was able to recognize them showing performances clearly better than those reached by the sole pocket detection; and (4) Conclusions: Even though the presented strategy should require more extended validations, this proved successful in precisely characterizing a set of SARS-CoV-2 druggable binding pockets including both orthosteric and allosteric sites, which are clearly amenable for virtual screening campaigns and drug repurposing studies. All results generated by the study and the Pockets 2.0 plug-in are available for download
    corecore