
    Minimum Population Search, an Application to Molecular Docking

    Computer modeling of protein-ligand interactions is one of the most important phases in the drug design process. Part of the process involves the optimization of highly multi-modal objective (scoring) functions. This research presents the Minimum Population Search heuristic as an alternative for solving these unconstrained global optimization problems. To determine the effectiveness of Minimum Population Search, a comparison with seven state-of-the-art search heuristics is performed. Being specifically designed for the optimization of large-scale multi-modal problems, Minimum Population Search achieves excellent results on all of the tested complexes, especially when the number of available function evaluations is strongly reduced. A first step is also made toward the design of hybrid algorithms based on the exploratory power of Minimum Population Search. Computational results show that hybridization leads to a further improvement in performance.
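    The paper does not reproduce the Minimum Population Search algorithm itself; as a rough, hedged sketch of how a small-population heuristic explores a multi-modal scoring landscape under a tight evaluation budget, the fragment below (with a placeholder test surface and invented names, not the authors' code) shrinks the minimum allowed step as the budget is consumed, forcing early exploration and late refinement.

        import numpy as np

        def score(x):
            # Placeholder multi-modal surface; a real docking score would evaluate
            # a ligand pose (translation, rotation, torsions) against the receptor.
            return np.sum(x**2) + 10 * np.sum(1 - np.cos(2 * np.pi * x))

        def population_search(dim=10, pop_size=10, budget=5000, bounds=(-5.0, 5.0), seed=0):
            rng = np.random.default_rng(seed)
            lo, hi = bounds
            pop = rng.uniform(lo, hi, size=(pop_size, dim))
            fitness = np.array([score(x) for x in pop])
            evals = pop_size
            while evals < budget:
                # Minimum step shrinks linearly with the consumed budget.
                min_step = (hi - lo) * (1.0 - evals / budget)
                for i in range(pop_size):
                    j = rng.integers(pop_size)
                    direction = pop[j] - pop[i]
                    norm = np.linalg.norm(direction) or 1.0
                    step = max(min_step, rng.uniform(0, hi - lo))
                    trial = np.clip(pop[i] + (direction / norm) * step, lo, hi)
                    f = score(trial)
                    evals += 1
                    if f < fitness[i]:          # greedy replacement
                        pop[i], fitness[i] = trial, f
            best = int(np.argmin(fitness))
            return pop[best], fitness[best]

        print(population_search())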

    Study of ligand-based virtual screening tools in computer-aided drug design

    Virtual screening is a central technique in drug discovery today. Millions of molecules can be tested in silico with the aim of selecting only the most promising ones and testing them experimentally. The topic of this thesis is ligand-based virtual screening tools, which take existing active molecules as the starting point for finding new drug candidates. One goal of this thesis was to build a model that gives the probability that two molecules are biologically similar as a function of one or more chemical similarity scores. Another important goal was to evaluate how well different ligand-based virtual screening tools are able to distinguish active molecules from inactive ones. A further criterion set for the virtual screening tools was their applicability to scaffold hopping, i.e. finding new active chemotypes. In the first part of the work, a link was defined between the abstract chemical similarity score given by a screening tool and the probability that the two molecules are biologically similar. These results help to decide objectively which virtual screening hits to test experimentally. The work also resulted in a new type of data fusion method for combining two or more tools. In the second part, five ligand-based virtual screening tools were evaluated and their performance was found to be generally poor. Three reasons for this were proposed: false negatives in the benchmark sets, active molecules that do not share the same binding mode, and activity cliffs. In the third part of the study, a novel visualization and quantification method is presented for evaluating the scaffold-hopping ability of virtual screening tools.
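    The thesis's actual statistical model is not given in the abstract; the sketch below only illustrates the general idea of turning a raw chemical similarity score into a probability that two molecules share bioactivity, by fitting a logistic model on labelled pairs (the scores and labels here are invented toy data, not the benchmark sets used in the work).

        import numpy as np
        from sklearn.linear_model import LogisticRegression

        # Toy labelled pairs: a Tanimoto-like similarity score for each molecule pair
        # and whether the pair shares a biological activity (1) or not (0).
        scores = np.array([[0.15], [0.22], [0.35], [0.41], [0.55], [0.63], [0.72], [0.85]])
        shared = np.array([0, 0, 0, 1, 0, 1, 1, 1])

        model = LogisticRegression()
        model.fit(scores, shared)

        # Estimated probability that a pair with similarity 0.6 is biologically similar;
        # with two or more tools, each tool's score becomes an extra feature column
        # (one simple form of data fusion).
        print(model.predict_proba([[0.6]])[0, 1])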

    Improving the resolution of interaction maps: A middle ground between high-resolution complexes and genome-wide interactomes

    Protein-protein interactions are ubiquitous in biology and therefore central to understanding living organisms. In recent years, large-scale studies have been undertaken to describe, at least partially, protein-protein interaction maps, or interactomes, for a number of relevant organisms including human. Although the analysis of interaction networks is proving useful, current interactomes provide a blurry and coarse-grained picture of the molecular machinery: unless the structure of the protein complex is known, the molecular details of the interaction are missing, and sometimes it is not even possible to tell whether the interaction between the proteins is direct, i.e. a physical interaction, or part of a functional, not necessarily direct, association. Unfortunately, the determination of the structure of protein complexes cannot keep pace with the discovery of new protein-protein interactions, resulting in a large, and increasing, gap between the number of complexes that are thought to exist and the number for which 3D structures are available. The aim of the thesis was to tackle this problem by implementing computational approaches to derive structural models of protein complexes and thus reduce this gap. Over the course of the thesis, a novel modelling algorithm to predict the structure of protein complexes, V-D2OCK, was implemented. This approach combines structure-based prediction of protein binding sites, by means of algorithms also developed during the thesis (VORFFIP and M-VORFFIP), with data-driven docking and energy minimization. The algorithm was used to improve the coverage and structural content of the human interactome, compiled from different sources of interactomic data to ensure the most comprehensive interactome possible. Finally, the human interactome and the structural models were compiled in a database, V-D2OCK DB, which offers easy and user-friendly access to the human interactome and includes a bespoke graphical molecular viewer to facilitate the analysis of the structural models of protein complexes. Furthermore, new organisms, in addition to human, were included, providing a useful resource for the study of all known interactomes.

    Force computations in automated docking

    Automated docking refers to the problem of computing the optimal complementary fit of two molecules: a macromolecular receptor such as DNA or protein and a small molecule of interest (the ligand). AutoDock is a docking software package that computes the receptor-ligand binding energy to rank docked ligands. In this work, AutoDock's grid-based method for energy evaluation was exploited to complement the binding energies with computed forces on docked ligand atoms. These forces and energies helped to provide insights into the enzyme-substrate interaction mechanisms of three different enzymes: Hypocrea jecorina Cel7A, a cellobiohydrolase, Fusarium oxysporum Cel7B, an endoglucanase, and Saccharomyces cerevisiae alpha-1,2-mannosidase. Cel7A and Cel7B are cellulose-degrading enzymes that, based on structural homology, belong to glycoside hydrolase Family 7. Cel7A binds crystalline cellulose and processively breaks cellobiose units from chain ends, while Cel7B targets amorphous cellulose and makes internal breaks in the cellulose chain with limited processivity. The processive force on the substrate docked to the Cel7A catalytic domain (CD) is more than twice that on the substrate docked to the Cel7B CD, explaining the difference in their processive behavior. Cel7A has a two-domain structure with a CD and a cellulose binding domain (CBD) joined by a highly glycosylated linker. Based on the interaction energies and forces on cellooligosaccharides docked to the CD and CBD, we propose a molecular machine model in which the CBD wedges itself under a free chain end on the crystalline cellulose surface and feeds it to the CD active-site tunnel. alpha-1,2-Mannosidase from the endoplasmic reticulum, a Family 47 glycosyl hydrolase, is a key enzyme in the N-glycan synthesis pathway. AutoDock was used to dock alpha-D-mannopyranosyl-(1,2)-alpha-D-mannopyranose with its glycon in chair (1C4, 4C1), half-chair (3H2, 3H4, 4H3), skew-boat (OS2, 3S1, 5S1), boat (2,5B, 3,OB, B1,4, B2,5), and envelope (3E, 4E, E3, E4) conformations. Both docked energies and forces on docked ligand atoms were calculated to determine how the ligand distorts toward the transition state. From these, we conclude that the most likely binding pathways are 1C4 → 3H2 → OS2 → 3,OB → 3S1 → 3E and OS2 → 3,OB → 3S1 → 3E, with 1C4 and OS2 as the respective starting conformations.
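    The abstract does not detail how the forces were computed from the grids; a plausible minimal sketch (the synthetic grid and function names below are illustrative assumptions, not AutoDock's map format or source code) is to interpolate the precomputed affinity grid at each ligand atom position and take the force as the negative numerical gradient of that interpolated energy.

        import numpy as np

        def trilinear(grid, spacing, origin, pos):
            # Trilinear interpolation of a scalar energy grid at a Cartesian position.
            g = (np.asarray(pos) - origin) / spacing   # fractional grid coordinates
            i0 = np.floor(g).astype(int)
            t = g - i0
            e = 0.0
            for dx in (0, 1):
                for dy in (0, 1):
                    for dz in (0, 1):
                        w = (t[0] if dx else 1 - t[0]) * \
                            (t[1] if dy else 1 - t[1]) * \
                            (t[2] if dz else 1 - t[2])
                        e += w * grid[i0[0] + dx, i0[1] + dy, i0[2] + dz]
            return e

        def force(grid, spacing, origin, pos, h=1e-3):
            # Force on an atom = negative central-difference gradient of the energy.
            f = np.zeros(3)
            for k in range(3):
                dp = np.zeros(3)
                dp[k] = h
                f[k] = -(trilinear(grid, spacing, origin, pos + dp)
                         - trilinear(grid, spacing, origin, pos - dp)) / (2 * h)
            return f

        # Synthetic harmonic-well grid, for illustration only (0.375 Angstrom spacing).
        spacing, origin = 0.375, np.zeros(3)
        x = np.arange(20) * spacing
        X, Y, Z = np.meshgrid(x, x, x, indexing="ij")
        grid = (X - 3.5)**2 + (Y - 3.5)**2 + (Z - 3.5)**2
        print(force(grid, spacing, origin, np.array([3.0, 3.5, 3.5])))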

    Development and application of conformational methodologies: eliciting enthalpic global minima and reaction pathways

    The information granted by assembling the global minimum and low-enthalpy population of a chemical species or ensemble can be utilized to great effect across all fields of chemistry. With this population, otherwise impossible tasks including (but not limited to) reaction pathway characterization, protein folding, protein-ligand docking, and constructing the entropy needed to characterize free energy surfaces become reasonable undertakings. For very small systems (a single molecule with 1-3 torsions), generating the low-enthalpy population is a trivial task; however, as the system grows, the task increases exponentially in difficulty. This dissertation details the two sides of this problem: generating the low-energy population of larger and more complex species, and then utilizing those populations to gain a greater understanding of their systems. The first part describes a new model, Surface Editing Molecular Dynamics (SEMD), which accelerates conformational searching by removing already-visited minima from the potential energy surface through the addition of Gaussian functions. Accompanying this new method is a set of new tools that aid molecular dynamics simulations. The first of these tools, named CHILL, projects unproductive degrees of freedom out of the molecular dynamics velocity to smooth atomic motions without artificially constraining those degrees of freedom. Another tool, Conjugate Velocity Molecular Dynamics (CVMD), rigorously generates a list of productive velocities via the biorthogonalization of local modes with a vector representation of previously explored conformational minima. In addition to these tools, a new measure of distance in torsional space was developed to provide a robust means of assessing conformational uniqueness. With these tools working in concert, the global minimum and the associated low-enthalpy population of conformations have been obtained for various benchmark species. The second part discusses the application of conformational searching and subsequent electronic structure calculations to characterize the reaction pathway for the ruthenium tris(2,2'-bipyridine) photocatalyzed [2+2] cycloaddition of aromatically substituted bis(enones). The APFD hybrid density functional is used along with a 6-311+G* basis set and a PCM solvent model. The reaction is computed to proceed through rate-limiting formation of a cyclopentyl intermediate. Lithium tetrafluoroborate is found to facilitate the initial bis(enone) reduction and to affect the final product distribution. In addition, aromatic substituents are found to impact both the initial reduction and the final product distribution.
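    SEMD itself is not reproduced in the abstract; the toy sketch below only illustrates the stated idea of flattening already-visited minima by depositing Gaussian functions on the potential energy surface, using an invented 1-D torsional potential (class and parameter names are placeholders, and torsional periodicity is ignored for brevity).

        import numpy as np

        def base_energy(phi):
            # Toy 1-D torsional potential (radians) with several minima.
            return np.cos(3 * phi) + 0.5 * np.cos(phi)

        class EditedSurface:
            # Base surface plus deposited Gaussian "edits" that fill minima
            # already visited, pushing the search toward unexplored basins.
            def __init__(self, height=0.5, width=0.3):
                self.centers = []
                self.height, self.width = height, width

            def deposit(self, phi):
                self.centers.append(phi)

            def energy(self, phi):
                bias = sum(self.height * np.exp(-(phi - c)**2 / (2 * self.width**2))
                           for c in self.centers)
                return base_energy(phi) + bias

        surface = EditedSurface()
        phi = np.linspace(-np.pi, np.pi, 721)
        for _ in range(5):
            minimum = phi[np.argmin(surface.energy(phi))]
            print(f"visited minimum at phi = {minimum:.2f} rad")
            surface.deposit(minimum)   # flatten this basin before the next search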

    Multi-objective optimization in the life sciences

    To achieve this objective, rather than attempting to incorporate new algorithms directly into the AutoDock source code, a framework oriented towards solving optimization problems with metaheuristics was used. Specifically, jMetal, an open-source Java-based library, was chosen. Since AutoDock is implemented in C++, a C++ version of jMetal was developed (and later released publicly). In this way, both tools (AutoDock 4.2 and jMetal) were integrated to optimize the free energy of binding between a chemical compound and a receptor. Once a broad collection of metaheuristics implemented in jMetalCpp was available, a detailed study was carried out in which a set of metaheuristics was applied to optimize a single objective: minimizing the free energy of binding, which is the sum of all the energy terms of the AutoDock 4.2 energy objective function. Four metaheuristics were applied to the molecular docking problem: two genetic algorithm variants, gGA (generational Genetic Algorithm) and ssGA (steady-state Genetic Algorithm), DE (Differential Evolution), and PSO (Particle Swarm Optimization). This phase was divided into two sub-phases using two different sets of instances, with HIV proteases with flexible amino acid side chains as receptors and flexible HIV protease inhibitors as ligands. The first set of instances was used for a parameter-tuning study of the algorithms, and the second to compare the accuracy of the ligand-receptor conformations obtained by AutoDock and by AutoDock+jMetalCpp. The next phase applied a multi-objective formulation to molecular docking problems, motivated by the interesting results of existing studies in which two objectives, the intermolecular energy and the intramolecular energy, were minimized. The performance of a set of multi-objective metaheuristics was therefore compared and analyzed by solving flexible molecular docking complexes while minimizing the inter- and intramolecular energies. These algorithms were: NSGA-II (Non-dominated Sorting Genetic Algorithm II) and its steady-state version (ssNSGA-II), SMPSO (Speed-constrained Multi-objective Particle Swarm Optimization), GDE3 (the third version of Generalized Differential Evolution), MOEA/D (Multi-Objective Evolutionary Algorithm based on Decomposition), and SMS-EMOA (S-Metric Selection Evolutionary Multi-objective Optimization Algorithm). After testing existing multi-objective approaches, a new one was tried: using the RMSD as an objective in order to find solutions similar to the reference solution. The previous study was replicated using this different set of objectives. Finally, the algorithm that obtained the best results in the previous studies was analyzed in detail. Specifically, a study of SMPSO variants minimizing the intermolecular energy and the RMSD was carried out. This study provided some clues as to how new SMPSO-based algorithms can be adapted to improve docking results for simulations involving flexible ligands and receptors.
    This thesis demonstrates that including jMetalCpp metaheuristic techniques in the AutoDock molecular docking tool broadens the options available to users from the biological domain when solving the molecular docking problem. Using single-objective optimization techniques other than those widely adopted in the molecular docking community can lead to higher-quality solutions; in our single-objective case study, the differential evolution algorithm obtained better results than those obtained by AutoDock. Different multi-objective approaches to the molecular docking problem are also proposed, such as decomposing the binding energy terms or using the RMSD as an objective. Finally, SMPSO, a multi-objective particle swarm optimization metaheuristic, is shown to be a remarkable technique for solving molecular docking problems when a multi-objective approach is used, obtaining even better solutions than the single-objective techniques.
    Molecular docking tools have become quite efficient in drug discovery and in pharmaceutical industry research. These tools are used to elucidate the interaction of a small molecule (ligand) and a macromolecule (target) at the atomic level, in order to determine how the ligand interacts with the binding site of the target protein and the implications these interactions have for a given biochemical process. In the computational development of molecular docking tools, researchers in this area have focused on improving the two components that determine the quality of docking software: 1) the objective function and 2) the optimization algorithms. The energy objective function is responsible for evaluating the ligand-protein conformations by computing the binding energy, measured in kcal/mol. In this thesis AutoDock was used, since it is one of the most cited and widely used docking tools and its results are very accurate in terms of energy and RMSD (root mean square deviation). Moreover, the energy function of AutoDock version 4.2 was selected because it allows more realistic simulations, including flexibility in the ligand and in the side chains of the receptor amino acids located at the binding site. Optimization algorithms were used to improve the docking results of AutoDock 4.2, which minimizes the final free energy of binding, i.e. the sum of all the energy terms of the energy objective function. Since finding the optimal solution in molecular docking is a highly complex and usually intractable problem, non-exact algorithms such as metaheuristics are typically used to obtain sufficiently good solutions in a reasonable time.
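    The bi-objective docking formulation described above can be pictured with a short sketch (the energy terms below are placeholders, not the AutoDock 4.2 scoring function, and no jMetalCpp API is used): candidate conformations are scored on two objectives, intermolecular and intramolecular energy, and only the Pareto non-dominated ones are kept.

        import numpy as np

        def evaluate(pose):
            # Placeholder energy terms; a real evaluation would call the AutoDock 4.2
            # scoring function on a flexible ligand/receptor conformation.
            inter = float(np.sum((pose - 1.0)**2))        # stands in for intermolecular energy
            intra = float(np.sum(np.abs(np.diff(pose))))  # stands in for intramolecular energy
            return inter, intra

        def dominates(a, b):
            # Pareto dominance for minimization of both objectives.
            return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

        rng = np.random.default_rng(0)
        poses = rng.uniform(-2, 2, size=(200, 8))         # random candidate conformations
        objs = [evaluate(p) for p in poses]
        front = [o for o in objs if not any(dominates(other, o) for other in objs)]
        print(f"{len(front)} non-dominated (intermolecular, intramolecular) energy pairs")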

    On the role of metaheuristic optimization in bioinformatics

    Metaheuristic algorithms are employed to solve complex and large-scale optimization problems in many different fields, from transportation and smart cities to finance. This paper discusses how metaheuristic algorithms are being applied to solve different optimization problems in the area of bioinformatics. While the text provides references to many optimization problems in the area, it focuses on those that have attracted the most interest from the optimization community. Among the problems analyzed, the paper discusses in more detail molecular docking, protein structure prediction, phylogenetic inference, and several string problems. In addition, references to other relevant optimization problems are given, including those related to medical imaging and gene selection for classification. From this analysis, the paper derives insights into research opportunities for the Operations Research and Computer Science communities in the field of bioinformatics.

    Integrating protein structural information

    Dissertation presented to obtain the degree of Doctor in Biochemistry (Structural Biochemistry) from the Universidade Nova de Lisboa, Faculdade de Ciências e Tecnologia. The central theme of this work is the application of constraint programming and other artificial intelligence techniques to protein structure problems, with the goal of better combining experimental data with structure prediction methods. Part one of the dissertation introduces the main subjects of protein structure and constraint programming, summarises the state of the art in the modelling of protein structures and complexes, sets the context for the techniques described later on, and outlines the main point of the thesis: the integration of experimental data in modelling. The first chapter, Protein Structure, introduces the reader to the basic notions of amino acid structure, protein chains, and protein folding and interaction. These are important concepts for understanding the work described in parts two and three. Chapter two, Protein Modelling, gives a brief overview of experimental and theoretical techniques for modelling protein structures. The information in this chapter provides the context for the investigations described in parts two and three, but is not essential to understanding the methods developed. Chapter three, Constraint Programming, outlines the main concepts of this programming technique. Understanding variable modelling, the notions of consistency and propagation, and search methods should greatly help the reader interested in the details of the algorithms described in part two. The fourth chapter, Integrating Structural Information, is a summary of the thesis proposed here. This chapter is an overview of the objectives of this work and gives an idea of how the algorithms developed here could help in modelling protein structures. The main goal is to provide a flexible and continuously evolving framework for the integration of structural information from a diversity of experimental techniques and theoretical predictions. Part two describes the algorithms developed, which make up the main original contribution of this work. This part is aimed especially at developers interested in the details of the algorithms, in replicating the results, in improving the methods, or in integrating them into other applications. Biochemical aspects are dealt with briefly and as necessary, and the emphasis is on the algorithms and the code.

    Simple models of protein folding and of non-conventional drug design

    While all the information required for the folding of a protein is contained in its amino acid sequence, one has not yet learned how to extract this information to predict the three-dimensional, biologically active, native conformation of a protein whose sequence is known. Simple model simulations of protein folding indicate that the phenomenon is essentially controlled by conserved (native) contacts among a few strongly interacting ("hot"), as a rule hydrophobic, amino acids, which also stabilize local elementary structures (LES: hidden, incipient secondary structures like α-helices and β-sheets) formed early in the folding process and leading to the postcritical folding nucleus (i.e., the minimum set of native contacts that carries the system past the highest free-energy barrier found in the whole folding process). Using this insight, it is possible to work out a successful strategy for reading the native structure of designed proteins from knowledge of only their amino acid sequence and of the contact energies among the amino acids. Because LES have undergone millions of years of evolution to selectively dock to their complementary structures, small peptides made of the same amino acids as the LES are expected to attach selectively to the newly expressed (unfolded) protein and inhibit its folding, or to the (fluctuating) native conformation and denature it. These peptides, or their mimetic molecules, can thus be used as effective non-conventional drugs, complementary to those already existing (which are directed at neutralizing the active site of enzymes), with the advantage of not being subject to the emergence of resistance.