215 research outputs found

    Structural Investigation of Binding Events in Proteins

    Understanding the biophysical properties that describe protein binding events has allowed for the advancement of drug discovery through structure-based drug design and in silico methodology. The accuracy of these in silico methods depends entirely on the parameters that we determine for them. Many of these parameters are derived from the structural information we have obtained as a community, and therein resides the importance of the integrity and quality of these structural data. First, the curation and contents of the Binding MOAD database are extensively described. This database serves as a repository of 25,759 high-quality, ligand-bound X-ray protein crystal structures complemented by 9138 hand-curated binding affinity data for as many of those ligands as appropriate. The newly implemented extended binding site feature is presented, establishing more robust definitions of ligand binding sites than those provided by other databases. Finally, the contents of Binding MOAD are compared to similar databases, establishing the value of our dataset and which purposes it best serves. Second, a robust dataset of 305 unique protein sequences with at least two ligand-bound and two ligand-free structures for each unique protein is cultivated from Binding MOAD and the PDB. Protein flexibility is assessed using C-alpha RMSD for backbone motion and chi-1 angles to quantify side-chain motions. We establish that there is no statistically significant difference between the available conformational space for the backbones or the side chains of unbound proteins when compared to their bound structures. Examining the change in occupied conformational space upon ligand binding reveals a statistically significant increase in backbone conformational space of minuscule magnitude, but a significant increase of side-chain conformational space. To quantify the conformational space available to the side chains, flexibility profiles are established for each amino acid. 
We found no correlation between backbone and side-chain flexibility. Parallels are then made to common practices in flexible docking techniques. Six binding-site prediction algorithms are then benchmarked on a derivation of the previously established dataset of 305 proteins. We assessed the performance of ligand-bound vs ligand-free structures with these methods and concluded that five of the six methods showed no preference for either structure type. The remaining method, Fpocket, showed decreased performance for ligand-free structures. There was a staggering amount of inconsistency in performance with the methods; different structures of the exact same protein could achieve wildly different rates of success with the same method. The performance of individual structures for all six methods indicated that success and failure rates were seemingly random. Finally, we establish no correlation between the performance of the same structures with different methods, or the performance of the structures with structure resolution, Cruickshank DPI, or number of unresolved residues in their binding sites. Last, we examine the chemical and physical properties of protein-protein interactions (PPIs) with regard to their geometric location in the interface. First, we found that the relative elevation changes of the protein interface landscapes demonstrate that these interfaces are not as flat as previously described. Second, the hollows of druggable PPI interfaces are more sharply shaped and nonpolar in nature, and the protrusions of these druggable PPI interfaces are very polar in character. Last, no correlations exist between the binding affinity describing the subunits of a PPI and other physical and chemical parameters that we measured. PhD, Medicinal Chemistry, University of Michigan, Horace H. Rackham School of Graduate Studies. https://deepblue.lib.umich.edu/bitstream/2027.42/145943/1/jordanjc_1.pd
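The abstract above names C-alpha RMSD as its measure of backbone motion. As a rough illustration only, the quantity can be sketched as follows. The function name `ca_rmsd` and the input layout (plain lists of coordinate triples) are assumptions made for this sketch, and it assumes the two structures have already been superposed; it is not the thesis's actual pipeline.

```python
import math


def ca_rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two sets of C-alpha coordinates.

    Both arguments are equally long lists of (x, y, z) tuples, one per residue.
    Assumption: the structures are already superposed; no fitting is done here.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equally long")
    squared_sum = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        # Squared distance between the matched C-alpha atoms of one residue.
        squared_sum += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(squared_sum / len(coords_a))
```

Calling `ca_rmsd` on a ligand-bound and a ligand-free structure of the same protein would give one number per pair; how such pairs are aggregated into a "conformational space" is not specified in the abstract and is left out here.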

    Interactive Visualization of Molecular Dynamics Simulation Data

    Molecular dynamics (MD) simulation plays an essential role in the field of computational biology. The simulations produce extensive high-dimensional, spatio-temporal data describing the motion of atoms and molecules. A central challenge in the field is the extraction and visualization of useful behavioral patterns from these simulations. Throughout this thesis, I collaborated with a computational biologist who works on MD simulation data. For the sake of exploration, I was provided with a large and complex membrane simulation. I contributed solutions to his data challenges by developing a set of novel visualization tools to help him get a better understanding of his simulation data. I employed both scientific and information visualization, and applied concepts of abstraction and dimension projection in the proposed solutions. The first solution enables the user to interactively filter and highlight dynamic and complex trajectories constituted by the motions of molecules. The molecular dynamics trajectories are identified based on path length, edge length, curvature, and normalized curvature, and their combinations. The tool exploits new interactive visualization techniques and provides a combination of 2D-3D path rendering in a dual dimension representation to highlight differences arising from the 2D projection on a plane. The second solution introduces a novel abstract interaction space for Protein-Lipid interaction. The proposed solution addresses the challenge of visualizing complex, time-dependent interactions between protein and lipid molecules. It also proposes a fast GPU-based implementation that maps lipid-constituents involved in the interaction onto the abstract protein interaction space. I also introduced two abstract level-of-detail (LoD) representations with six levels of detail for lipid molecules and protein interaction. 
Finally, I proposed a novel framework consisting of four linked views: a time-dependent 3D view, a novel hybrid view, a clustering timeline, and a details-on-demand window. The framework exploits abstraction and projection to enable the user to study the molecular interaction and the behavior of the protein-protein interaction and clusters. I introduced a selection of visual designs to convey the behavior of protein-lipid interaction and protein-protein interaction through a unified coordinate system. Abstraction is used to present proteins in hybrid 2D space, and a projected tiled space is used to present both Protein-Lipid Interaction (PLI) and Protein-Protein Interaction (PPI) at the particle level in a heat-map style visual design. Glyphs are used to represent PPI at the molecular level. I coupled visually separable visual designs in a unified coordinate space. The result lets the user study both PLI and PPI separately, or together in a unified visual analysis framework.
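The abstract above filters trajectories by path length and curvature. The sketch below shows one way such per-trajectory descriptors could be computed from a list of 3D positions. The function names are hypothetical, and the curvature measure used here (the sum of turning angles between consecutive segments) is an assumed stand-in, since the abstract does not give the thesis's own definition.

```python
import math


def path_length(points):
    """Sum of the Euclidean lengths of consecutive segments of a trajectory."""
    return sum(math.dist(p, q) for p, q in zip(points, points[1:]))


def total_turning_angle(points):
    """Sum of the angles (in radians) between consecutive segments.

    Assumption: this discrete measure stands in for 'curvature'; the thesis
    may define curvature differently.
    """
    total = 0.0
    for a, b, c in zip(points, points[1:], points[2:]):
        u = [bi - ai for ai, bi in zip(a, b)]
        v = [ci - bi for bi, ci in zip(b, c)]
        norm_u, norm_v = math.hypot(*u), math.hypot(*v)
        if norm_u == 0.0 or norm_v == 0.0:
            continue  # A repeated position has no direction; skip it.
        cosine = sum(x * y for x, y in zip(u, v)) / (norm_u * norm_v)
        # Clamp to [-1, 1] to guard against floating-point overshoot.
        total += math.acos(max(-1.0, min(1.0, cosine)))
    return total
```

A filter in this spirit would keep only the trajectories whose `path_length` or `total_turning_angle` falls inside a range chosen by the user.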

    Computational Approaches to Drug Profiling and Drug-Protein Interactions

    Despite substantial increases in R&D spending within the pharmaceutical industry, de novo drug design has become a time-consuming endeavour. High attrition rates led to a long period of stagnation in drug approvals. Due to the extreme costs associated with introducing a drug to the market, locating and understanding the reasons for clinical failure is key to future productivity. As part of this PhD, three main contributions were made in this respect. First, the web platform LigNFam enables users to interactively explore similarity relationships between ‘drug-like’ molecules and the proteins they bind. Secondly, two deep-learning-based binding site comparison tools were developed, competing with the state-of-the-art over benchmark datasets. The models have the ability to predict off-target interactions and potential candidates for target-based drug repurposing. Finally, the open-source ScaffoldGraph software was presented for the analysis of hierarchical scaffold relationships and has already been used in multiple projects, including integration into a virtual screening pipeline to increase the tractability of ultra-large screening experiments. Together, and with existing tools, the contributions made will aid in the understanding of drug-protein relationships, particularly in the fields of off-target prediction and drug repurposing, helping to design better drugs faster.

    Structural Cheminformatics for Kinase-Centric Drug Design

    Drug development is a long, expensive, and iterative process with a high failure rate, while patients wait impatiently for treatment. Kinases are one of the main drug targets studied for the last decades to combat cancer, the second leading cause of death worldwide. These efforts resulted in a plethora of structural, chemical, and pharmacological kinase data, which are collected in the KLIFS database. In this thesis, we apply ideas from structural cheminformatics to the rich KLIFS dataset, aiming to provide computational tools that speed up the complex drug discovery process. We focus on methods for target prediction and fragment-based drug design that study characteristics of kinase binding sites (also called pockets). First, we introduce the concept of computational target prediction, which is vital in the early stages of drug discovery. This approach identifies biological entities such as proteins that may (i) modulate a disease of interest (targets or on-targets) or (ii) cause unwanted side effects due to their similarity to on-targets (off-targets). We focus on the research field of binding site comparison, which lacked a freely available and efficient tool to determine similarities between the highly conserved kinase pockets. We fill this gap with the novel method KiSSim, which encodes and compares spatial and physicochemical pocket properties for all kinases (kinome) that are structurally resolved. We study kinase similarities in the form of kinome-wide phylogenetic trees and detect expected and unexpected off-targets. To allow multiple perspectives on kinase similarity, we propose an automated and production-ready pipeline; user-defined kinases can be inspected complementarily based on their pocket sequence and structure (KiSSim), pocket-ligand interactions, and ligand profiles. Second, we introduce the concept of fragment-based drug design, which is useful to identify and optimize active and promising molecules (hits and leads). 
This approach identifies low-molecular-weight molecules (fragments) that bind weakly to a target and are then grown into larger high-affinity drug-like molecules. With the novel method KinFragLib, we provide a fragment dataset for kinases (fragment library) by viewing kinase inhibitors as combinations of fragments. Kinases have a highly conserved pocket with well-defined regions (subpockets); based on the subpockets that they occupy, we fragment kinase inhibitors in experimentally resolved protein-ligand complexes. The resulting dataset is used to generate novel kinase-focused molecules that are recombinations of the previously fragmented kinase inhibitors while considering their subpockets. The KinFragLib and KiSSim methods are published as freely available Python tools. Third, we advocate for open and reproducible research that applies FAIR principles ---data and software shall be findable, accessible, interoperable, and reusable--- and software best practices. In this context, we present the TeachOpenCADD platform that contains pipelines for computer-aided drug design. We use open source software and data to demonstrate ligand-based applications from cheminformatics and structure-based applications from structural bioinformatics. To emphasize the importance of FAIR data, we dedicate several topics to accessing life science databases such as ChEMBL, PubChem, PDB, and KLIFS. These pipelines are not only useful to novices in the field to gain domain-specific skills but can also serve as a starting point to study research questions. Furthermore, we show an example of how to build a stand-alone tool that formalizes reoccurring project-overarching tasks: OpenCADD-KLIFS offers a clean and user-friendly Python API to interact with the KLIFS database and fetch different kinase data types. This tool has been used in this thesis and beyond to support kinase-focused projects. 
We believe that the FAIR-based methods, tools, and pipelines presented in this thesis (i) are valuable additions to the toolbox for kinase research, (ii) provide relevant material for scientists who seek to learn, teach, or answer questions in the realm of computer-aided drug design, and (iii) contribute to making drug discovery more efficient, reproducible, and reusable.

    Visualization of large molecular trajectories

    The analysis of protein-ligand interactions is a time-intensive task. Researchers have to analyze multiple physico-chemical properties of the protein at once and combine them to derive conclusions about the protein-ligand interplay. Typically, several charts are inspected, and 3D animations can be played side-by-side to obtain a deeper understanding of the data. With the advances in simulation techniques, larger and larger datasets are available, with up to hundreds of thousands of steps. Unfortunately, such large trajectories are very difficult to investigate with traditional approaches. Therefore, the need for special tools that facilitate inspection of these large trajectories becomes substantial. In this paper, we present a novel system for visual exploration of very large trajectories in an interactive and user-friendly way. Several visualization motifs are automatically derived from the data to give the user the information about interactions between protein and ligand. Our system offers specialized widgets to ease and accelerate data inspection and navigation to interesting parts of the simulation. The system is also suitable for simulations where multiple ligands are involved. We have tested the usefulness of our tool on a set of datasets obtained from protein engineers, and we describe the expert feedback. Peer Reviewed. Postprint (author's final draft)

    Development of unsupervised learning methods with applications to life sciences data

    Machine Learning makes computers capable of performing tasks typically requiring human intelligence. A domain where it is having a considerable impact is the life sciences, where it makes it possible to devise new biological analysis protocols, develop patients’ treatments faster and more efficiently, and reduce healthcare costs. This thesis presents new Machine Learning methods and pipelines for the life sciences, focusing on the unsupervised field. At a methodological level, two methods are presented. The first is an “Ab Initio Local Principal Path”, a revised and improved version of a pre-existing algorithm in the manifold learning realm. The second contribution is an improvement over the Import Vector Domain Description (one-class learning) through the Kullback-Leibler divergence. It hybridizes kernel methods with Deep Learning, obtaining a scalable solution, an improved probabilistic model, and state-of-the-art performance. Both methods are tested through several experiments, with a central focus on their relevance in life sciences. Results show that they improve on the performance achieved by their previous versions. At the applicative level, two pipelines are presented. The first one is for the analysis of RNA-Seq datasets, both transcriptomic and single-cell data, and is aimed at identifying genes that may be involved in biological processes (e.g., the transition of tissues from normal to cancer). In this project, an R package is released on CRAN to make the pipeline accessible to the bioinformatics community through high-level APIs. The second pipeline is in the drug discovery domain and is useful for identifying druggable pockets, namely regions of a protein with a high probability of accepting a small molecule (a drug). Both these pipelines achieve remarkable results. Lastly, a detour application is developed to identify the strengths/limitations of the “Principal Path” algorithm by analyzing Convolutional Neural Networks induced vector spaces. 
This application is conducted in the music and visual arts domains.

    Do All Roads Really Lead to Rome? Learnings from Comparative Analysis using SPR, NMR, & X-Ray Crystallography to Optimize Fragment Screening in Drug Discovery.

    There are several biophysical methods developed to rapidly identify weakly binding fragments to a target protein. X-ray crystallography provides structural information that is crucial for fragment optimization; however, several criteria must be met for a successful fragment screen, including the production of soakable and well-diffracting crystals. Therefore, having a reliable cascade of screening methods to be used as pre-screens prior to labor-intensive X-ray crystallography would be extremely beneficial. This would allow the filtering of compounds as the screening progresses so that only the most promising hits remain. But which method should be the one to start the screening cascade? In this work, various sets of fragment libraries were screened against three different proteins, namely tRNA guanine transglycosylase (TGT), an important protein in Shigella; the membrane-associated protein peroxin 14 (PEX14) of T. brucei; and endothiapepsin (EP), to investigate whether different screening methods will reveal similar collections of putative binders. The detailed comparative analysis of the findings obtained by the different methods is discussed in this thesis. Shigellosis, an acute bacterial infection of the intestine, is caused by the gram-negative Shigella bacterium whose pathogenicity is reliant on virulence factors (VirF) required to invade epithelial cells. The expression of these VirF is modulated by TGT. Strategies developed to inhibit TGT include potent active-site inhibitors to block the binding of tRNA, thereby preventing the transcription of the virulence factors. Our 96-fragment library was screened against TGT using SPR, NMR, and X-ray crystallography, as described in Chapter 2. A total of 81 fragments were screened in SPR using a direct binding assay approach, revealing a hit rate of 12%. A total of 77 fragments were screened in NMR, revealing a hit rate of 29%. 
High-resolution crystal structures were also collected for the entire fragment library by soaking, revealing a hit rate of 8%. Upon comparison of all discovered fragment hits, no overlaps from all three methods were found. Several factors are responsible for this finding, such as the exclusion of fragments from individual screens for technical reasons. In detail, four X-ray hits were excluded from the SPR and NMR screens, two SPR hits were discarded from the NMR screen, and five NMR hits were never subjected to the SPR screen. SPR and NMR are currently the most commonly applied primary fragment screening techniques; however, our results suggest that, had they been applied as the initial methods of a screening cascade, they would have missed three binders discovered by a subsequently applied, more elaborate crystallographic screen. X-ray crystallography allows the detection of specific binders that may be too weak to be detected by SPR and even by NMR but can still provide valid structural information to support the search for appropriate starting points in lead discovery. Additionally, MD simulations of the apo wild-type TGT have predicted the opening of a transient sub-pocket located above the guanine/preQ1 pocket, which suggested a strategy to target this new binding site for the design of new inhibitors against TGT following a structure-based drug design concept, which is also discussed in section 2.3. Human African trypanosomiasis (HAT), also known as sleeping sickness, is a vector-borne parasitic disease caused by T. brucei and transmitted to humans by bites of the tsetse fly. T. brucei lacks feedback allosteric regulation of early steps in glycolysis but compartmentalizes the relevant enzymes within organelles called glycosomes. PEX14, a peroxin protein essential for biogenesis of glycosomes, forms an important protein-protein interaction with PEX5, an import receptor that transports cytoplasmic glycosomal enzymes into the organelle. 
Disrupting the PEX14/PEX5 interaction leads to the accumulation of glycosomal enzymes in the cytosol, depletion of ATP, glucose toxicity, metabolic collapse and death of T. brucei. This disruption can be achieved through small molecules that bind to and block PEX14, preventing PEX5 binding. A previous NMR screening of a fragment library resulted in fragment hits that bind to the N-terminal domain (NTD) of T. brucei PEX14. In this project, we attempted to validate these hits through X-ray crystallography by soaking, to allow visualization of the fragment interactions. The promising fragment hits would then be optimized into more potent lead compounds. Crystallization of the NTD PEX14 with a mutation in the first residue (E1W) revealed blocked binding pockets, as described in Chapter 3. The purpose of the added tryptophan was to render fluorescent properties to the short NTD construct which lacked fluorescent amino acids. However, this tryptophan was found to block the binding pockets of its neighboring crystal mates in the protein crystal, rendering a crystal form impossible to use for soaking. Attempts to find new crystal forms with free pockets were unsuccessful, as the small size of the protein and the hydrophobic nature of tryptophan rendered tightly packed protein crystals that block the binding pockets of neighboring crystal mates. Virtual Screening to discover novel ligands for co-crystallization revealed a ligand that aids the crystallization of the E1W PEX14 variant in the same space group but with a slightly different packing. This produced a crystal form that proved successful for fragment soaking as it enabled the binding of two additional fragment hits binding to further protein pockets. Additionally, the wild type form of PEX14 which lacks the tryptophan residue and thus has free binding pockets was crystallized. This enabled the soaking of a previously designed lead compound in different pockets of the PEX5 binding site. 
By obtaining a crystal structure of this complex at a resolution of 1.8 Å, the feasibility of using wild-type PEX14 crystals for further fragment screening has been demonstrated. Endothiapepsin is a member of the pepsin-like aspartic proteases responsible for the hydrolytic cleavage of peptide substrates. Owing to its high degree of similarity to other pharmacologically relevant aspartic proteases, it has served as the model enzyme for studying their mechanism and discovering first lead structures. In previous work done by other members of our group to identify and characterize endothiapepsin binders, X-ray crystallography was used as a primary fragment screening method and its hit identification potential was compared to several biochemical and biophysical screening methods. The fragment library screened was designed for general purposes and contained 361 entries. Comparison of the overlap in the hit rates of the different methods to that of X-ray crystallography revealed a low overlap, with the RDA having the highest overlap at 7% and MS having the lowest overlap at 1%, followed by STD NMR at 3%. To understand the reason behind the low overlap, two of these screening techniques were prioritized for closer analysis as described in Chapter 4. The 71 X-ray detected fragment hits were selected and rescreened with STD NMR under slightly different buffer conditions, in addition to WaterLOGSY NMR experiments. The second STD NMR screen detected almost double the number of hits as the initial one, and the WaterLOGSY screen had the highest correlation from the NMR methods to the X-ray hits at 69%. This comparative analysis also revealed the phenomenon of active site fragment displacement by use of so-called reporter ligands and that non-deuterated water in STD NMR may lead to false negatives. 
The entire 361-fragment library was also screened with SPR using an inhibition in solution assay, adding another biophysical method to our comparative analysis to give us further insight into which conditions are crucial to maintain while transferring across different techniques. The resulting hit rate from SPR was 34%, correlating to an overlap of 11% with the X-ray hits, the highest correlation between screening methods reported by us thus far. Finally, we also studied fragment detection and cocktailing in crystallography in comparison to fragment cocktailing in NMR. From this we concluded that cocktailing in crystallography can also lead to false negatives due to fragment competitive behavior and can reveal a different binding mode for a given fragment compared to the adopted geometry found when soaked individually. As for NMR, despite the ability to detect competitive binding of fragments due to the temporary binding and unbinding events, the parallel binding and thus detection of fragments is not always guaranteed, as seen in 20% of the fragments we screened, in addition to our observation that the detection of fragments in cocktail NMR may also depend on the comparison of the cocktail set they are a part of.

    Microparticle-Based Biosensors for Anthropogenic Analytes

    Anthropogenic pollution of water resources and the environment by various hazardous compounds and classes of substances raises concerns about public health impacts and environmental damage. Commercially available, portable and easy-to-use devices to detect and quantify these compounds are rather sparse, but would contribute to comprehensive monitoring and reliable risk assessment. The Soft Colloidal Probe (SCP) assay is a promising platform for the development of portable analytical devices and thus has a great potential for a transfer to industry. This assay is based on the differential deformation of an elastic particle, i.e., the SCP, as a function of analyte concentration, which affects the extent of interfacial interactions between the SCP and a biochip surface. The objective of this work was to adapt this assay for the detection of anthropogenic pollutants. Biomimetic molecular recognition approaches were used based on naturally occurring target proteins that specifically bind the anthropogenic pollutants of interest. This adaptation included the elaboration of strategies for site-specific immobilization of the respective proteins and functionalization of SCPs. In this work, it is demonstrated that the SCP method can be employed for the highly specific and sensitive detection of the critically discussed pesticide glyphosate by using the target enzyme 5-enolpyruvylshikimate-3-phosphate synthase. Furthermore, a specific detection scheme for estrogens and compounds with estrogenic and antiestrogenic activity was developed by harnessing estrogen sulfotransferase as the biomimetic recognition element. In the second part of the thesis, improvements of the SCP sensing methodology are described. These improvements were achieved by accelerating data analysis and developing a novel synthesis method for SCPs that ensures monodisperse particles with superior reproducibility. 
Rapid extraction of interaction energies is achieved by using a pattern matching algorithm that reduces the time required for data analysis to a fraction. The microfluidics-assisted synthesis of SCPs enables the production of highly monodisperse SCPs with adjustable size and mechanical properties. Various functionalization approaches have been developed that allow easy and modular introduction of functional groups and biomolecules for SCP-based sensing approaches.