616 research outputs found

    Geometric algorithms for cavity detection on protein surfaces

    Macromolecular structures such as proteins drive many cellular processes and functions. These biological functions result from interactions between proteins and peptides, catalytic substrates, nucleotides or even human-made chemicals. Several kinds of interaction can therefore be distinguished: protein-ligand, protein-protein, protein-DNA, and so on. These interactions only happen under conditions of chemical and shape complementarity, and usually take place in regions known as binding sites. Typically, a protein has four structural levels. The primary structure of a protein is made up of its amino acid sequences (or chains). Its secondary structure essentially comprises α-helices and β-sheets, which are sub-sequences (or sub-domains) of amino acids of the primary structure. Its tertiary structure results from the composition of sub-domains into domains, which represent the geometric shape of the protein. Finally, the quaternary structure of a protein results from the aggregation of two or more tertiary structures, usually known as a protein complex. This thesis fits in the scope of structure-based drug design and protein docking. Specifically, it addresses the fundamental problem of detecting and identifying protein cavities, which are often seen as tentative binding sites for ligands in protein-ligand interactions. In general, cavity prediction algorithms fall into three main categories: energy-based, geometry-based, and evolution-based. Evolutionary methods build upon estimates of evolutionary sequence conservation; that is, they detect functional sites by computing the evolutionary conservation of amino acid positions in proteins. Energy-based methods build upon the computation of interaction energies between protein and ligand atoms. In turn, geometry-based algorithms analyze the geometric shape of the protein (i.e., its tertiary structure) to identify cavities. 
This thesis focuses on geometric methods. We introduce three new geometry-based algorithms for protein cavity detection. The main contribution of this thesis lies in the use of computer graphics techniques in the analysis and recognition of cavities in proteins, much in the spirit of molecular graphics and modeling. These techniques include field-of-view (FoV), voxel ray casting, back-face culling, shape diameter functions, Morse theory, and critical points. The leading idea is to segment the protein shape, much as meshes are segmented in computer graphics. In practice, protein cavity detection algorithms are segmentation algorithms designed for proteins.
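
To make the geometry-based idea concrete, the following is a minimal sketch of a generic grid "buriedness" test. It is not one of the thesis's three algorithms, whose details the abstract does not give. It assumes atoms are spheres given as (center, radius) pairs, casts rays along the six axis directions from each empty grid point, and treats points blocked in most directions as cavity candidates. The names `buriedness` and `detect_cavity_points` and all parameter values are hypothetical.

```python
import math

# Six axis-aligned ray directions (an assumption; real tools often use more).
AXES = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def inside_any_atom(p, atoms):
    """True if point p lies inside at least one atom sphere."""
    return any(math.dist(p, center) <= radius for center, radius in atoms)


def buriedness(point, atoms, directions=AXES, max_dist=10.0, step=0.5):
    """Count the ray directions from `point` that hit an atom within max_dist."""
    hits = 0
    for d in directions:
        t = step
        while t <= max_dist:
            p = tuple(point[i] + t * d[i] for i in range(3))
            if inside_any_atom(p, atoms):
                hits += 1
                break
            t += step
    return hits


def detect_cavity_points(atoms, lo, hi, spacing=1.0, min_hits=5):
    """Return empty grid points in the cube [lo, hi]^3 that are mostly enclosed."""
    n = int(round((hi - lo) / spacing)) + 1
    coords = [lo + i * spacing for i in range(n)]
    candidates = []
    for x in coords:
        for y in coords:
            for z in coords:
                p = (x, y, z)
                if inside_any_atom(p, atoms):
                    continue  # occupied by the protein, not solvent
                if buriedness(p, atoms) >= min_hits:
                    candidates.append(p)
    return candidates


# Toy "pocket": five atoms forming a cup that is open towards +z.
toy_atoms = [((3, 0, 0), 1.5), ((-3, 0, 0), 1.5), ((0, 3, 0), 1.5),
             ((0, -3, 0), 1.5), ((0, 0, -3), 1.5)]
pocket = detect_cavity_points(toy_atoms, lo=-2.0, hi=2.0)
```

In this toy setup the center of the cup is blocked in five of six directions and is reported as a candidate, while a point well above the opening is not. A practical method would then cluster neighboring candidate points into distinct cavities.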

    Interactive Visualization of Molecular Dynamics Simulation Data

    Molecular dynamics (MD) simulation plays an essential role in the field of computational biology. The simulations produce extensive high-dimensional, spatio-temporal data describing the motion of atoms and molecules. A central challenge in the field is the extraction and visualization of useful behavioral patterns from these simulations. Throughout this thesis, I collaborated with a computational biologist who works on MD simulation data. For the sake of exploration, I was provided with a large and complex membrane simulation. I contributed solutions to his data challenges by developing a set of novel visualization tools to help him get a better understanding of his simulation data. I employed both scientific and information visualization, and applied concepts of abstraction and dimension projection in the proposed solutions. The first solution enables the user to interactively filter and highlight dynamic and complex trajectories formed by the motions of molecules. The molecular dynamics trajectories are identified based on path length, edge length, curvature, and normalized curvature, and their combinations. The tool exploits new interactive visualization techniques and provides a combination of 2D-3D path rendering in a dual-dimension representation to highlight differences arising from the 2D projection on a plane. The second solution introduces a novel abstract interaction space for protein-lipid interaction. The proposed solution addresses the challenge of visualizing complex, time-dependent interactions between protein and lipid molecules. It also proposes a fast GPU-based implementation that maps the lipid constituents involved in the interaction onto the abstract protein interaction space. I also introduced two abstract level-of-detail (LoD) representations with six levels of detail for lipid molecules and protein interaction. 
Finally, I proposed a novel framework consisting of four linked views: a time-dependent 3D view, a novel hybrid view, a clustering timeline, and a details-on-demand window. The framework exploits abstraction and projection to enable the user to study the molecular interaction and the behavior of the protein-protein interaction and clusters. I introduced a selection of visual designs to convey the behavior of protein-lipid interaction and protein-protein interaction through a unified coordinate system. Abstraction is used to present proteins in a hybrid 2D space, and a projected tiled space is used to present both protein-lipid interaction (PLI) and protein-protein interaction (PPI) at the particle level in a heat-map style visual design. Glyphs are used to represent PPI at the molecular level. I coupled visually separable visual designs in a unified coordinate space. The result lets the user study PLI and PPI separately, or together in a unified visual analysis framework.
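
The trajectory criteria named in this abstract (path length, edge length, curvature, normalized curvature) can be illustrated with a short sketch. The abstract does not define these quantities, so the definitions below are assumptions: a trajectory is a list of 3D positions, curvature is approximated by the turning angle between consecutive edges, and normalized curvature is the total turning angle divided by the path length. The function names and the `filter_paths` thresholds are hypothetical.

```python
import math


def edge_lengths(path):
    """Lengths of the segments between consecutive positions."""
    return [math.dist(a, b) for a, b in zip(path, path[1:])]


def path_length(path):
    """Total distance travelled along the trajectory."""
    return sum(edge_lengths(path))


def turning_angles(path):
    """Angle in radians between consecutive edges (a discrete curvature proxy)."""
    angles = []
    for a, b, c in zip(path, path[1:], path[2:]):
        u = [b[i] - a[i] for i in range(3)]
        v = [c[i] - b[i] for i in range(3)]
        nu, nv = math.hypot(*u), math.hypot(*v)
        if nu == 0 or nv == 0:
            continue  # skip degenerate edges (repeated positions)
        cos = sum(x * y for x, y in zip(u, v)) / (nu * nv)
        angles.append(math.acos(max(-1.0, min(1.0, cos))))  # clamp rounding error
    return angles


def total_curvature(path):
    """Sum of turning angles along the trajectory."""
    return sum(turning_angles(path))


def normalized_curvature(path):
    """Total turning angle per unit path length (assumed definition)."""
    length = path_length(path)
    return total_curvature(path) / length if length > 0 else 0.0


def filter_paths(paths, min_length=0.0, min_norm_curv=0.0):
    """Keep trajectories that meet both thresholds, combining two criteria."""
    return [p for p in paths
            if path_length(p) >= min_length
            and normalized_curvature(p) >= min_norm_curv]


straight = [(0, 0, 0), (1, 0, 0), (2, 0, 0)]
corner = [(0, 0, 0), (1, 0, 0), (1, 1, 0)]
selected = filter_paths([straight, corner], min_norm_curv=0.1)
```

Both toy paths have length 2, but only the one with a right-angle turn passes the curvature threshold, which is the kind of interactive filtering the abstract describes.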

    Visualization for the Physical Sciences


    Boosting analyses in the life sciences via clusters, grids and clouds

    In the last 20 years, computational methods have become an important part of developing emerging technologies for the field of bioinformatics and biomedicine. Those methods rely heavily on large scale computational resources as they need to manage Tbytes or Pbytes of data with large-scale structural and functional relationships, TFlops or PFlops of computing power for simulating highly complex models, or many-task processes and workflows for processing and analyzing data. This special issue contains papers showing existing solutions and latest developments in Life Sciences and Computing Sciences to collaboratively explore new ideas and approaches to successfully apply distributed IT-systems in translational research, clinical intervention, and decision-making. (C) 2016 Published by Elsevier B.V

    Biomolecular electrostatics with continuum models: a boundary integral implementation and applications to biosensors

    The implicit-solvent model uses continuum electrostatic theory to represent the salt solution around dissolved biomolecules, leading to a coupled system of the Poisson-Boltzmann and Poisson equations. This thesis uses the implicit-solvent model to study solvation, binding and adsorption of proteins. We developed an implicit-solvent model solver, called PyGBe, that uses the boundary element method (BEM). BEM numerically solves integral equations along the biomolecule-solvent interface only, so it does not need to discretize the entire domain. PyGBe accelerates the BEM with a treecode algorithm and runs on graphics processing units. We performed extensive verification and validation of the code, comparing it with experimental observations, analytical solutions, and other numerical tools. Our results suggest that a BEM approach is more appropriate than volumetric methods, such as finite-difference or finite-element, for high-accuracy calculations. We also discussed the effect of features like solvent-filled cavities and Stern layers in the implicit-solvent model, and found that they become relevant in binding energy calculations. The application that drove this work was nano-scale biosensors, devices designed to detect biomolecules. Biosensors are built with a functionalized layer of ligand molecules, to which the target molecule binds when it is detected. With our code, we performed a study of the orientation of proteins near charged surfaces, and investigated the ideal conditions for ligand molecule adsorption. Using immunoglobulin G as a test case, we found that low salt concentration in the solvent and high positive surface charge density lead to favorable orientations of the ligand molecule for biosensing applications. We also studied the plasmonic response of localized surface plasmon resonance (LSPR) biosensors. 
LSPR biosensors monitor the plasmon resonance frequency of metallic nanoparticles, which shifts when a target molecule binds to a ligand molecule. Electrostatics is a valid approximation to the LSPR biosensor optical phenomenon in the long-wavelength limit, and BEM was able to reproduce the shift in the plasmon resonance frequency as proteins approach the nanoparticle
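
For reference, the coupled system mentioned at the start of this abstract is commonly written as below. This is the standard textbook statement, assuming the linearized Poisson-Boltzmann equation in the solvent; the abstract does not say which form is used, and the symbols are notation introduced here: $\phi_1$ and $\phi_2$ are the potentials in the molecule region $\Omega_1$ and the solvent region $\Omega_2$, $\epsilon_1$ and $\epsilon_2$ the corresponding permittivities, $q_k$ the point charges at positions $\mathbf{r}_k$, $\kappa$ the inverse Debye length, and $\Gamma$ the molecule-solvent interface with normal $n$.

```latex
\begin{aligned}
\nabla^2 \phi_1(\mathbf{r}) &= -\frac{1}{\epsilon_1} \sum_k q_k\, \delta(\mathbf{r} - \mathbf{r}_k)
  && \text{in } \Omega_1 \text{ (molecule, Poisson)}, \\
\nabla^2 \phi_2(\mathbf{r}) &= \kappa^2 \phi_2(\mathbf{r})
  && \text{in } \Omega_2 \text{ (solvent, linearized Poisson-Boltzmann)}, \\
\phi_1 &= \phi_2, \qquad
\epsilon_1 \frac{\partial \phi_1}{\partial n} = \epsilon_2 \frac{\partial \phi_2}{\partial n}
  && \text{on } \Gamma .
\end{aligned}
```

Because both equations have known free-space Green's functions, the problem can be recast as integral equations on $\Gamma$ alone, which is why a boundary element method only needs to discretize the interface.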

    A robust machine learning approach for the prediction of allosteric binding sites

    Previously held under moratorium from 28 March 2017 until 28 March 2022. Allosteric regulatory sites are highly prized targets in drug discovery. They remain difficult to detect by conventional methods, with the vast majority of known examples being found serendipitously. Herein, a rigorous, wholly computational protocol is presented for the prediction of allosteric sites. Previous attempts to predict the location of allosteric sites by computational means drew on only a small amount of data. Moreover, no attempt was made to modify the initial crystal structure beyond the in silico deletion of the allosteric ligand. This practice can leave behind a conformation with a significant structural deformation, often betraying the location of the allosteric binding site. Despite this artificial advantage, modest success rates are observed at best. This work addresses both of these issues. A set of 60 protein crystal structures with known allosteric modulators was collected. To remove the imprint on protein structure caused by the presence of bound modulators, molecular dynamics was performed on each protein prior to analysis. A wide variety of analytical techniques were then employed to extract meaningful data from the trajectories. After the results were fused into a single, coherent dataset, random forest, a machine learning algorithm, was applied to train a high-performance classification model. After successive rounds of optimisation, the final model presented in this work correctly identified the allosteric site for 72% of the proteins tested. This is not only an improvement over alternative strategies in the literature; crucially, this method is unique among site prediction tools in that it does not exploit crystal structures containing imprints of bound ligands, which is of key importance when making live predictions, where no allosteric regulatory sites are known.