
    A Constraint Solver for Flexible Protein Models

    This paper proposes the formalization and implementation of a novel class of constraints aimed at modeling problems related to the placement of multi-body systems in 3-dimensional space. Each multi-body is a system composed of body elements, connected by joint relationships and constrained by geometric properties. The emphasis of this investigation is the use of multi-body systems to model native conformations of protein structures, where each body represents an entity of the protein (e.g., an amino acid or a small peptide) and the geometric constraints are related to the spatial properties of the composing atoms. The paper explores the use of the proposed class of constraints to support a variety of structural analyses of proteins, such as loop modeling and structure prediction. The declarative nature of a constraint-based encoding provides elaboration tolerance and the ability to make use of any additional knowledge in the analysis studies. The filtering capabilities of the proposed constraints also make it possible to control the number of representative solutions drawn from the conformational space of the protein, using criteria driven by uniform distribution sampling principles. In this scenario it is possible to select the desired degree of precision and/or number of solutions. The filtering component automatically excludes configurations that violate the spatial and geometric properties of the composing multi-body system. The paper illustrates the implementation of a constraint solver based on the multi-body perspective and its empirical evaluation on protein structure analysis problems.
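As a rough illustration of how geometric constraints prune a placement search, the sketch below gives each body a 3D position chosen from a finite candidate list, requires consecutive bodies to satisfy a joint distance window, and requires all other pairs to stay at least a clash distance apart. It is a plain backtracking search under assumed, hypothetical parameters (`joint`, `clash`, and the candidate domains are illustrative names and values), not the solver or constraint propagators described in the paper, and it omits the uniform-sampling filter.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]

def consistent(placed: List[Point], cand: Point,
               joint: Tuple[float, float], clash: float) -> bool:
    """Check a candidate position for the next body against the bodies already placed."""
    if placed:
        # Joint constraint: distance to the previous body must lie in [lo, hi].
        d = math.dist(placed[-1], cand)
        if not (joint[0] <= d <= joint[1]):
            return False
    # Steric constraint: every non-adjacent earlier body must be at least `clash` away.
    for p in placed[:-1]:
        if math.dist(p, cand) < clash:
            return False
    return True

def solve(domains: Sequence[Sequence[Point]], joint: Tuple[float, float],
          clash: float) -> List[List[Point]]:
    """Enumerate all consistent placements of the chain, one body at a time."""
    solutions: List[List[Point]] = []

    def search(placed: List[Point]) -> None:
        if len(placed) == len(domains):
            solutions.append(list(placed))
            return
        for cand in domains[len(placed)]:
            # Prune as soon as a constraint fails, so the subtree below is never explored.
            if consistent(placed, cand, joint, clash):
                placed.append(cand)
                search(placed)
                placed.pop()

    search([])
    return solutions

# Hypothetical three-body chain: joint length about 1.0, clash distance 1.5.
domains = [
    [(0.0, 0.0, 0.0)],
    [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (2.0, 0.0, 0.0)],
    [(2.0, 0.0, 0.0), (1.0, 1.0, 0.0)],
]
print(solve(domains, joint=(0.9, 1.1), clash=1.5))
# prints [[(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]]
```

Checking each candidate as soon as it is placed means that a single violated constraint cuts off the entire subtree below it, which is the basic reason constraint filtering keeps a conformational search tractable.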

    CLP-based protein fragment assembly

    The paper investigates a novel approach, based on Constraint Logic Programming (CLP), to predict the 3D conformation of a protein via fragment assembly. The fragments are extracted from a database of known protein structures by a preprocessor, also developed for this work, that clusters and classifies the fragments according to similarity and frequency. The problem of assembling fragments into a complete conformation is mapped to a constraint solving problem and solved using CLP. The constraint-based model uses a Cα-side chain centroid protein model with a medium degree of discretization, which offers efficiency and a good approximation of space filling. The approach adapts existing energy models to the protein representation used and applies a large neighborhood search strategy. The results show the feasibility and efficiency of the method. The declarative nature of the solution makes it possible to include future extensions, e.g., fragments of different sizes for better accuracy. Comment: special issue dedicated to ICLP 201
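The large neighborhood search idea can be sketched on a toy version of fragment assembly: each position chooses one fragment from a candidate list, the energy is a sum of hypothetical per-position costs (`unary`) plus a hypothetical compatibility term between neighboring fragments (`pair`), and each search step frees a random window of consecutive positions and re-optimizes it exhaustively while the rest of the assignment stays fixed. The energy model and all names here are assumptions for illustration; the paper's CLP model and energy function are not reproduced.

```python
import itertools
import random
from typing import Callable, List, Sequence, Tuple

PairFn = Callable[[int, int], float]

def energy(assign: Sequence[int], unary: Sequence[Sequence[float]], pair: PairFn) -> float:
    """Toy energy: per-position fragment cost plus a term for each pair of neighbors."""
    e = sum(unary[i][f] for i, f in enumerate(assign))
    e += sum(pair(assign[i], assign[i + 1]) for i in range(len(assign) - 1))
    return e

def lns(unary: Sequence[Sequence[float]], pair: PairFn, window: int = 2,
        iters: int = 200, seed: int = 0) -> Tuple[List[int], float]:
    """Large neighborhood search: repeatedly free a window of positions and re-optimize it."""
    rng = random.Random(seed)
    n = len(unary)
    window = min(window, n)
    # Greedy start: cheapest fragment at each position, ignoring its neighbors.
    cur = [min(range(len(unary[i])), key=lambda f: unary[i][f]) for i in range(n)]
    cur_e = energy(cur, unary, pair)
    for _ in range(iters):
        start = rng.randrange(0, n - window + 1)
        domains = [range(len(unary[i])) for i in range(start, start + window)]
        # Exhaustively re-optimize the freed window; positions outside it stay fixed.
        for combo in itertools.product(*domains):
            cand = cur[:start] + list(combo) + cur[start + window:]
            e = energy(cand, unary, pair)
            if e < cur_e:  # accept strict improvements only
                cur, cur_e = cand, e
    return cur, cur_e

def mismatch(f: int, g: int) -> float:
    """Hypothetical penalty when neighboring positions use different fragments."""
    return 0.0 if f == g else 2.0

# Hypothetical instance: 3 positions with 2 candidate fragments each.
unary = [[0.0, 0.5], [1.0, 0.0], [0.0, 0.5]]
print(lns(unary, mismatch, window=3, iters=1))  # prints ([0, 0, 0], 1.0)
```

The greedy start here has energy 4.0 because it ignores neighbor compatibility; freeing the window lets the search trade a worse per-position cost for much better compatibility. Small windows keep each step cheap, while larger windows explore more of the space per step.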

    Ab initio RNA folding

    RNA molecules are essential cellular machines performing a wide variety of functions for which a specific three-dimensional structure is required. Over the last several years, experimental determination of RNA structures through X-ray crystallography and NMR seems to have reached a plateau in the number of structures resolved each year, but as more and more RNA sequences are discovered, the need for structure prediction tools to complement experimental data is strong. Theoretical approaches to RNA folding have been developed since the late nineties, when the first algorithms for secondary structure prediction appeared. Over the last 10 years a number of prediction methods for 3D structures have been developed, first based on bioinformatics and data mining, and more recently based on a coarse-grained physical representation of the systems. In this review we present the challenges of RNA structure prediction and the main ideas behind bioinformatic and physics-based approaches. We focus on the description of the more recent physics-based phenomenological models and on how they are built to include the specificity of the interactions of RNA bases, whose role is critical in folding. Through examples from different models, we point out the strengths of physics-based approaches, which are able not only to predict equilibrium structures but also to investigate dynamical and thermodynamical behavior, and the open challenges in including more of the key interactions that govern RNA folding. Comment: 28 pages, 18 figures

    Structure- and Ligand-Based Design of Novel Antimicrobial Agents

    The use of computer-based techniques in the design of novel therapeutic agents is a rapidly emerging field. Although the drug-design techniques used by computational medicinal chemists vary greatly, they can roughly be classified into structure-based and ligand-based approaches. Structure-based methods use a solved structure of the design target, protein or DNA, usually obtained by X-ray or NMR methods, to design or improve compounds with activity against the target. Ligand-based methods use active compounds with known affinity for a target whose structure may not yet be resolved. These methods include pharmacophore-based searching for novel active compounds and Quantitative Structure-Activity Relationship (QSAR) studies. The research presented here used both structure- and ligand-based methods against two bacterial pathogens: Bacillus anthracis and Mycobacterium tuberculosis. The first part of this thesis details our efforts to design novel inhibitors of the enzyme dihydropteroate synthase from B. anthracis using crystal structures with known inhibitors bound. The second part describes a QSAR study performed on a series of novel nitrofuranyl compounds with known whole-cell inhibitory activity against M. tuberculosis. Dihydropteroate synthase (DHPS) catalyzes the addition of p-aminobenzoic acid (pABA) to dihydropterin pyrophosphate (DHPP) to form pteroic acid as a key step in bacterial folate biosynthesis. It is the traditional target of the sulfonamide class of antibiotics. Unfortunately, bacterial resistance and adverse effects have limited the clinical utility of the sulfonamide antibiotics. Although six bacterial crystal structures are available, the flexible loop regions that enclose pABA during binding and contain key sulfonamide resistance sites have yet to be visualized in their functional conformation.
To gain a new understanding of the structural basis of sulfonamide resistance and the molecular mechanism of DHPS action, and to generate a structure for high-throughput virtual screening, molecular dynamics simulations were applied to model the conformations of the unresolved loops in the active site. Several series of molecular dynamics simulations were designed and performed using enzyme substrates and inhibitors, a transition state analog, and a pterin-sulfamethoxazole adduct. The positions of key mutation sites conserved across several bacterial species were closely monitored during these analyses, and these residues were shown to interact closely with the sulfonamide binding site. The simulations gave new insight into the positions of the flexible loops during inhibitor binding, which allowed the development of a DHPS structural model that could be used for high-throughput virtual screening (HTVS). Additionally, insights gained on the location and possible function of key mutation sites on the flexible loops will facilitate the design of new, potent DHPS inhibitors that can bypass the resistance mutations that render sulfonamides inactive. Before performing high-throughput virtual screening, the docking and scoring functions to be used were validated using established techniques against the B. anthracis DHPS target. In this validation study, five commonly used docking programs (FlexX, Surflex, Glide, GOLD, and DOCK) and nine scoring functions were evaluated for their utility in virtual screening against the novel pterin binding site. Their performance in ligand docking and virtual screening against this target was assessed by their ability to reproduce a known inhibitor conformation and to correctly detect known active compounds seeded into three separate decoy sets. Enrichment was quantified by enrichment factors calculated at 1% and by Receiver Operating Characteristic (ROC) curves.
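The two enrichment metrics named above can be computed from a ranked screening list as shown below. The enrichment factor at a fraction x is the active rate in the top-ranked x of the list divided by the active rate in the whole list. This sketch assumes that a higher score means a better-ranked compound (many docking scores use the opposite sign, in which case they would be negated first), that the top 1% is rounded to a whole number of compounds, and that the area under the ROC curve is computed as the fraction of active/decoy pairs ranked correctly, with ties counted as one half. The thesis's exact conventions are not stated, and the data in the usage lines are invented.

```python
from typing import Sequence

def enrichment_factor(scores: Sequence[float], is_active: Sequence[bool],
                      fraction: float = 0.01) -> float:
    """EF at `fraction`: active rate in the top-ranked subset / overall active rate."""
    n = len(scores)
    n_top = max(1, round(fraction * n))  # size of the top subset, rounded (an assumption)
    ranked = sorted(zip(scores, is_active), key=lambda t: t[0], reverse=True)
    hits_top = sum(1 for _, active in ranked[:n_top] if active)
    return (hits_top / n_top) / (sum(is_active) / n)

def roc_auc(scores: Sequence[float], is_active: Sequence[bool]) -> float:
    """ROC AUC as the probability that a random active outranks a random decoy (ties = 0.5)."""
    actives = [s for s, a in zip(scores, is_active) if a]
    decoys = [s for s, a in zip(scores, is_active) if not a]
    wins = 0.0
    for sa in actives:
        for sd in decoys:
            wins += 1.0 if sa > sd else 0.5 if sa == sd else 0.0
    return wins / (len(actives) * len(decoys))

# Invented example: 200 compounds, the 2 actives have the highest scores.
scores = [float(i) for i in range(200)]
labels = [i >= 198 for i in range(200)]
print(enrichment_factor(scores, labels), roc_auc(scores, labels))
```

In this example both actives rank at the very top, so the enrichment factor at 1% reaches its maximum possible value of 100 and the AUC is 1.0. The pairwise AUC loop is quadratic, which is fine for seeded decoy sets of modest size; a rank-based formula would be preferable for very large libraries.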
The effectiveness of post-docking relaxation prior to rescoring, and of consensus scoring, was also evaluated. Of the docking and scoring functions evaluated, Surflex with SurflexScore and Glide with GlideScore performed best overall for virtual screening against the DHPS target. The next phase of the DHPS structure-based drug design project involved high-throughput virtual screening against the previously developed DHPS structural model, using the docking methodology validated against this target. Two general virtual screening methods were employed. First, large virtual libraries were pre-filtered by 3D pharmacophore and modified Rule-of-Three fragment constraints: nearly 5 million compounds from the ZINC databases were screened, generating 3,104 unique fragment-like hits that were subsequently docked and ranked by score. Second, fragment docking without pharmacophore filtering was performed on almost 285,000 fragment-like compounds obtained from the databases of commercial vendors. Hits from both virtual screens with high predicted affinity for the pterin binding pocket, as determined by docking score, were selected for in vitro testing. The activity and structure-activity relationships of the active fragment compounds have been established. Several compounds with micromolar activity were identified and taken to crystallographic trials. Finally, in our ligand-based research into agents active against M. tuberculosis, a series of nitrofuranylamide and related aromatic compounds displaying potent activity was investigated using 3-Dimensional Quantitative Structure-Activity Relationship (3D-QSAR) techniques. Comparative Molecular Field Analysis (CoMFA) and Comparative Molecular Similarity Indices Analysis (CoMSIA) methods were used to produce 3D-QSAR models that correlated the Minimum Inhibitory Concentration (MIC) values against M. tuberculosis with the molecular structures of the active compounds.
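A fragment pre-filter of the kind mentioned above can be expressed as a simple predicate over precomputed descriptors. The default thresholds below are the commonly cited Rule-of-Three limits (molecular weight under 300 Da; cLogP, hydrogen-bond donors, and hydrogen-bond acceptors each at most 3) plus the two limits often added to them (at most 3 rotatable bonds, polar surface area at most 60 Å²). The thesis used a modified version whose changes are not stated, so these defaults are placeholders to adjust. The compounds and descriptor values in the example are hypothetical, and computing the descriptors themselves is out of scope.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Descriptors:
    """Precomputed descriptors for one compound (hypothetical values in the example)."""
    name: str
    mw: float        # molecular weight, Da
    clogp: float     # calculated logP
    hbd: int         # hydrogen-bond donors
    hba: int         # hydrogen-bond acceptors
    rot_bonds: int   # rotatable bonds
    psa: float       # polar surface area, Å^2

def passes_rule_of_three(d: Descriptors, max_mw: float = 300.0, max_clogp: float = 3.0,
                         max_hbd: int = 3, max_hba: int = 3, max_rot: int = 3,
                         max_psa: float = 60.0) -> bool:
    """Commonly cited Rule-of-Three limits; defaults are placeholders, not the thesis's modified rule."""
    return (d.mw < max_mw and d.clogp <= max_clogp and d.hbd <= max_hbd
            and d.hba <= max_hba and d.rot_bonds <= max_rot and d.psa <= max_psa)

def fragment_filter(library: List[Descriptors]) -> List[str]:
    """Names of the compounds that pass the fragment filter."""
    return [d.name for d in library if passes_rule_of_three(d)]

# Hypothetical compounds with invented descriptor values.
library = [
    Descriptors("frag_A", 212.2, 1.4, 1, 3, 2, 48.0),
    Descriptors("frag_B", 354.4, 2.1, 2, 4, 5, 75.0),  # too heavy
    Descriptors("frag_C", 180.1, 3.6, 0, 2, 1, 20.0),  # too lipophilic
]
print(fragment_filter(library))  # prints ['frag_A']
```

Keeping every limit as a keyword argument makes a "modified" rule a matter of changing defaults at the call site rather than editing the predicate.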
A training set of 95 active compounds was used to develop the models, which were then evaluated by a series of internal and external cross-validation techniques. A test set of 15 compounds was used for the external validation. Different alignment and ionization rules were investigated, as was the effect on model predictivity of global molecular descriptors, including lipophilicity (cLogP, LogD), Polar Surface Area (PSA), and steric bulk (CMR). Models with greater than 70% predictive ability, as determined by external validation, and with high internal validity (cross-validated r² > 0.5) were developed. Incorporating lipophilicity descriptors into the models had a negligible effect on model predictivity. The models developed will be used to predict the activity of proposed new structures and to advance the development of next-generation nitrofuranyl and related nitroaromatic anti-tuberculosis agents.
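Internal and external validation statistics of the kind quoted above can be illustrated with a one-descriptor linear model. The sketch computes the leave-one-out cross-validated r² (often written q² = 1 − PRESS / SS, with SS taken around the training-set mean) and an external predictive r² on a held-out test set. CoMFA and CoMSIA models are normally fit with partial least squares over many field descriptors, so the ordinary least-squares line here is a deliberate simplification, and the toy data are invented.

```python
from statistics import mean
from typing import List, Tuple

def fit_line(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def loo_q2(xs: List[float], ys: List[float]) -> float:
    """Leave-one-out cross-validated r^2: q^2 = 1 - PRESS / SS around the training mean."""
    press = 0.0
    for i in range(len(xs)):
        # Refit without compound i, then predict compound i.
        a, b = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        press += (ys[i] - (a + b * xs[i])) ** 2
    my = mean(ys)
    return 1.0 - press / sum((y - my) ** 2 for y in ys)

def predictive_r2(train_y: List[float], test_x: List[float], test_y: List[float],
                  model: Tuple[float, float]) -> float:
    """External r^2 on a test set, with SS taken around the training-set mean."""
    a, b = model
    my = mean(train_y)
    press = sum((y - (a + b * x)) ** 2 for x, y in zip(test_x, test_y))
    return 1.0 - press / sum((y - my) ** 2 for y in test_y)

# Invented toy data: one descriptor value x and one activity value y per compound.
train_x, train_y = [0.0, 1.0, 2.0, 3.0, 4.0], [1.2, 2.9, 5.1, 7.0, 8.8]
model = fit_line(train_x, train_y)
print(loo_q2(train_x, train_y), predictive_r2(train_y, [5.0, 6.0], [11.1, 12.8], model))
```

Because each left-out compound is predicted by a model that never saw it, q² is always at most the ordinary fitted r², which is why a q² threshold such as 0.5 is a stricter test of internal validity.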

    De Novo Protein Structure Modeling from CryoEM Data Through a Dynamic Programming Algorithm in the Secondary Structure Topology Graph

    Proteins are the molecules that carry out vital functions, and they make up more than half of the dry weight of every cell. In nature, a protein folds into a unique, energetically favorable 3-dimensional (3-D) structure that is critical to its biological function. In contrast to other methods for protein structure determination, Electron Cryomicroscopy (CryoEM) can produce volumetric maps of proteins that are poorly soluble, large, and hard to crystallize. Furthermore, it studies proteins in their native environment. Unfortunately, current CryoEM techniques often produce protein maps at medium resolution (~5 to 10 Å), at which it is hard to determine the atomic structure of the protein. However, the resolution of the volumetric maps is improving steadily, and recent works have obtained atomic models at higher resolutions (~3 Å). De novo protein modeling is the process of building the structure of a protein from its CryoEM volumetric map. Medium-resolution volumetric maps pose a particular challenge for this process. At medium resolution, the location and orientation of secondary structure elements (SSEs) can be identified visually and computationally. However, the order and direction of the SSEs detected in the CryoEM volumetric map (called the protein topology) are not visible. To determine the protein structure, the topology of the SSEs has to be figured out first, and then the backbone can be built. Consequently, the topology problem has become a bottleneck for protein modeling using CryoEM. In this dissertation, we focus on establishing an effective computational framework to derive the atomic structure of a protein from medium-resolution CryoEM volumetric maps. This framework includes a topology graph component that effectively ranks the topologies of the SSEs and a model building component.
To generate a small subset of candidate topologies, the problem is translated into a layered graph representation. Generating such a set by brute force is infeasible because of the large search space, so we developed a dynamic programming algorithm (TopoDP) for the new representation. The topology graph component effectively reduces the topological space by using the geometrical features of the secondary structures in a constrained K-shortest paths method on the layered graph. Our approach shows improved accuracy, speed, and memory use compared with existing methods. The model building component involves bending the helices and constructing the loops using the skeleton of the volumetric map; forward-backward cyclic coordinate descent (CCD) is applied to bend the helices and to model the loops.
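The layered-graph idea can be sketched as a dynamic program that sweeps the layers in order and keeps, at every node, only the K cheapest partial paths that reach it; the K best complete paths are then read off the last layer. For an unconstrained layered graph with additive edge costs this truncation is exact (up to ties). The optional `allowed` predicate, for instance forbidding an SSE from being used twice, turns the per-node truncation into a heuristic that can discard feasible paths. This is a generic illustration under assumed node and edge semantics (hypothetical names `layers`, `weight`, `allowed`), not the TopoDP algorithm or its actual constraints.

```python
import heapq
from typing import Callable, Dict, Hashable, List, Optional, Sequence, Tuple

Node = Hashable
Path = Tuple[Node, ...]

def k_shortest_layered(layers: Sequence[Sequence[Node]],
                       weight: Callable[[Node, Node], Optional[float]],
                       k: int,
                       allowed: Callable[[Path, Node], bool] = lambda path, node: True,
                       ) -> List[Tuple[float, Path]]:
    """K cheapest first-to-last-layer paths in a layered DAG by per-node dynamic programming.

    `weight(u, v)` returns the cost of the edge between consecutive layers, or None if absent.
    """
    # best[v] holds at most k (cost, path) pairs ending at node v of the current layer.
    best: Dict[Node, List[Tuple[float, Path]]] = {v: [(0.0, (v,))] for v in layers[0]}
    for prev_layer, layer in zip(layers, layers[1:]):
        nxt: Dict[Node, List[Tuple[float, Path]]] = {}
        for v in layer:
            cands = []
            for u in prev_layer:
                w = weight(u, v)
                if w is None:
                    continue
                for cost, path in best.get(u, []):
                    if allowed(path, v):
                        cands.append((cost + w, path + (v,)))
            if cands:
                # Keep only the k cheapest partial paths reaching v.
                nxt[v] = heapq.nsmallest(k, cands, key=lambda t: t[0])
        best = nxt
    finals = [item for items in best.values() for item in items]
    return heapq.nsmallest(k, finals, key=lambda t: t[0])

# Hypothetical toy graph with one source, two middle nodes, and one sink.
edges = {("s", "a"): 1.0, ("s", "b"): 2.0, ("a", "t"): 5.0, ("b", "t"): 1.0}
print(k_shortest_layered([["s"], ["a", "b"], ["t"]], lambda u, v: edges.get((u, v)), k=2))
# prints [(3.0, ('s', 'b', 't')), (6.0, ('s', 'a', 't'))]
```

Memory grows with K times the number of nodes rather than with the number of complete paths, which is the kind of saving that makes ranking topologies feasible when brute-force enumeration is not.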

    Computational design with flexible backbone sampling for protein remodeling and scaffolding of complex binding sites

    Dissertation presented to obtain the Doutoramento (Ph.D.) degree in Biochemistry at the Instituto de Tecnologia Química e Biológica da Universidade Nova de Lisboa. Computational protein design has achieved several milestones, including the design of a new protein fold, the design of enzymes for reactions that lack natural catalysts, and the re-engineering of protein-protein and protein-DNA binding specificity. These achievements have spurred demand to apply protein design methods to a wider array of research problems. However, the existing computational methods have largely relied on fixed-backbone approaches that may limit the scope of problems that can be tackled. Here, we describe four computational protocols (side chain grafting, flexible backbone remodeling, backbone grafting, and de novo scaffold design) that expand the methodological protein design repertoire, three of which incorporate backbone flexibility. Briefly, in the side chain grafting method, side chains of a structural motif are transplanted to a protein with a similar backbone conformation; in flexible backbone remodeling, de novo segments of backbone are built and designed; in backbone grafting, structural motifs are explicitly grafted onto other proteins; and in de novo scaffolding, a protein is folded and designed around a structural motif. We developed these new methods for the design of epitope-scaffold vaccines in which viral neutralization epitopes of known three-dimensional structure were transplanted onto nonviral scaffold proteins for conformational stabilization and immune presentation. (...

    Mass & secondary structure propensity of amino acids explain their mutability and evolutionary replacements

    Why is an amino acid replacement in a protein accepted during evolution? The answer given by bioinformatics relies on the frequency with which each amino acid is changed into another and the propensity of each to remain unchanged. We propose that these replacement rules are recoverable from the secondary structural trends of amino acids. A distance measure between high-resolution Ramachandran distributions reveals that structurally similar residues coincide with those found in substitution matrices such as BLOSUM: Asn → Asp, Phe → Tyr, Lys → Arg, Gln → Glu, Ile → Val, Met → Leu; with Ala, Cys, His, Gly, Ser, Pro, and Thr as structurally idiosyncratic residues. We also found a high average correlation (R̄ = 0.85) between thirty amino acid mutability scales and the mutational inertia (I_X), which measures the energetic cost weighted by the number of observations at the most probable amino acid conformation. These results indicate that amino acid substitutions follow two optimally efficient principles: (a) amino acid interchangeability favors secondary structural similarity, and (b) the mutability of an amino acid depends directly on its biosynthetic energy cost and inversely on its frequency. These two principles are the underlying rules governing the observed amino acid substitutions. © 2017 The Author(s)
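The abstract does not name the distance measure applied to the Ramachandran distributions. To show the general workflow of comparing residues by their backbone angle distributions, the sketch below substitutes the Jensen-Shannon divergence (in bits, so values lie between 0 and 1) between binned (φ, ψ) histograms on a periodic grid with a hypothetical 10° bin width. The angle samples are invented, and this is not the measure used in the study.

```python
import math
from collections import Counter
from typing import Dict, Iterable, Tuple

Hist = Dict[Tuple[int, int], float]

def rama_histogram(angles: Iterable[Tuple[float, float]], bin_deg: float = 10.0) -> Hist:
    """Normalized 2D histogram of (phi, psi) angles in degrees, on a periodic grid."""
    counts: Counter = Counter()
    for phi, psi in angles:
        # Shift to [0, 360) so that -180 and +180 fall into the same bin.
        i = int(((phi + 180.0) % 360.0) // bin_deg)
        j = int(((psi + 180.0) % 360.0) // bin_deg)
        counts[(i, j)] += 1
    total = sum(counts.values())
    return {cell: c / total for cell, c in counts.items()}

def js_divergence(p: Hist, q: Hist) -> float:
    """Jensen-Shannon divergence in bits (0 = identical, 1 = disjoint support)."""
    cells = set(p) | set(q)
    m = {c: 0.5 * (p.get(c, 0.0) + q.get(c, 0.0)) for c in cells}

    def kl_to_m(a: Hist) -> float:
        return sum(v * math.log2(v / m[c]) for c, v in a.items() if v > 0)

    return 0.5 * kl_to_m(p) + 0.5 * kl_to_m(q)

# Invented angle samples for two hypothetical residue types.
helix_like = [(-63.0, -42.0), (-60.0, -45.0), (-65.0, -40.0)]
sheet_like = [(-120.0, 130.0), (-115.0, 125.0)]
print(js_divergence(rama_histogram(helix_like), rama_histogram(helix_like)))  # prints 0.0
print(js_divergence(rama_histogram(helix_like), rama_histogram(sheet_like)))
```

Identical distributions give 0 and distributions with no overlapping bins give 1; residue pairs with small divergence would be the "structurally similar" candidates. Unlike the Kullback-Leibler divergence, this measure is symmetric and stays finite when one histogram has empty bins.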

    Lead optimization for new antimalarials and Successful lead identification for metalloproteinases: A Fragment-based approach Using Virtual Screening

    Computer-aided drug design is an essential part of modern medicinal chemistry and has led to the acceleration of many projects. This thesis presents examples of its application to lead optimization and lead identification for three metalloproteins. DOXP-reductoisomerase (DXR) is a key enzyme of the mevalonate-independent isoprenoid biosynthesis. Structure-activity relationships for 43 DXR inhibitors are established, derived from protein-based docking, ligand-based 3D QSAR, and a combination of both approaches as realized by AFMoC. As part of an effort to optimize the properties of the established inhibitor Fosmidomycin, analogues have been synthesized and tested to gain further insights into the primary determinants of structural affinity. Unfortunately, these structures still leave the active Fosmidomycin conformation and detailed reaction mechanism undetermined. This fact, together with the small inhibitor data set, poses a major challenge for presently available docking programs and 3D QSAR tools. Using the recently developed protein-tailored scoring protocol AFMoC, precise prediction of binding affinities for related ligands, as well as the capability to estimate the affinities of structurally distinct inhibitors, has been achieved. Farnesyltransferase is a zinc metalloenzyme that catalyzes the posttranslational modification of numerous proteins involved in intracellular signal transduction. The development of farnesyltransferase inhibitors is directed towards so-called non-thiol inhibitors because of adverse drug effects connected to free thiols. A first step towards non-thiol farnesyltransferase inhibitors was the development of a CAAX-benzophenone peptidomimetic based on a pharmacophore model.
On this basis, bisubstrate analogues were developed as one class of non-thiol farnesyltransferase inhibitors. In further studies, two aryl binding sites and two distinct specificity sites were postulated. Flexible docking of model compounds was applied to investigate the sub-pockets and to design highly active non-thiol farnesyltransferase inhibitors. In addition to affinity, special attention was paid to in vivo activity and species specificity. The second part of this thesis describes a possible strategy for computer-aided lead discovery. Assembling a complex ligand from simple fragments has recently been introduced as an alternative to traditional HTS. While this strategy is frequently applied experimentally, only a few examples of computational fragment-based approaches are known; mostly, computational tools are applied to compile the libraries and to assess the finally assembled ligands. Using the metalloproteinase thermolysin (TLN) as a model target, a computational fragment-based screening protocol has been established. Starting with a data set of commercially available chemical compounds, a fragment library has been compiled considering (1) fragment likeness and (2) similarity to known drugs. The library is screened for target specificity, resulting in 112 fragments targeting the zinc binding area and 75 fragments targeting the hydrophobic specificity pocket of the enzyme. After analyzing the performance of multiple docking programs and scoring functions for fragments in general, the fragments derived from the screening are docked, and the top 14 candidates are selected for further analysis. Soaking experiments were performed with a reference fragment to derive a generally applicable crystallization protocol for TLN and subsequently for new protein-fragment complex structures. 3-Methylaspirin was found to bind to TLN. Additional studies addressed a retrospective performance analysis of the applied scoring functions and modifications of the screening hit.
To explore the differences between aspirin and 3-methylaspirin, 3-chloroaspirin was synthesized, and the affinities of aspirin, 3-methylaspirin, and 3-chloroaspirin were determined to be 2.42 mM, 1.73 mM, and 522 μM, respectively. The results of the thesis show that computer-aided drug design approaches can successfully support projects in lead optimization and lead identification.
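Similarity to known drugs, one of the two library-compilation criteria above, is often scored with the Tanimoto coefficient on binary fingerprints, |A ∩ B| / |A ∪ B|. The abstract does not state which similarity measure or threshold was used; the sketch below assumes fingerprints given as sets of on-bit indices and a hypothetical cutoff of 0.5, and keeps a fragment if its best match against a reference drug set reaches that cutoff. The fingerprints are invented, and generating real ones would require a cheminformatics toolkit.

```python
from typing import Dict, FrozenSet, Iterable, List, Tuple

Fingerprint = FrozenSet[int]  # indices of the bits that are set

def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """|A & B| / |A | B| for binary fingerprints; defined as 0.0 when both are empty."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0

def drug_similar_fragments(fragments: Dict[str, Fingerprint], drugs: Iterable[Fingerprint],
                           threshold: float = 0.5) -> List[Tuple[str, float]]:
    """Keep fragments whose best Tanimoto match to any reference drug reaches `threshold`."""
    drug_list = list(drugs)
    kept = []
    for name, fp in fragments.items():
        best = max((tanimoto(fp, d) for d in drug_list), default=0.0)
        if best >= threshold:
            kept.append((name, best))
    # Most drug-like fragments first.
    return sorted(kept, key=lambda t: t[1], reverse=True)

# Hypothetical fingerprints (bit indices) for two fragments and one reference drug.
fragments = {"frag_1": frozenset({1, 2, 3}), "frag_2": frozenset({7, 8})}
drugs = [frozenset({2, 3, 4})]
print(drug_similar_fragments(fragments, drugs))  # prints [('frag_1', 0.5)]
```

Taking the maximum over the reference set means a fragment only needs to resemble one known drug, which suits a library meant to cover diverse drug-like chemotypes.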

    In-Depth Analysis of Zero-Length Crosslinking for Structural Mass Spectrometry

    The completion of the Human Genome Project revealed the sequence of essentially every human protein. However, in most cases, amino acid sequences alone reveal little about a protein's static structure, its dynamic conformational changes, and, most importantly, its functions. To fully understand the behaviors and properties of macromolecular complexes, solving their 3D structures is necessary and highly critical. Under this rationale, structural genomics collaborations were initiated to determine high-resolution structures of as many proteins and protein folds as possible, relying mostly on X-ray crystallography and NMR spectroscopy. Yet very large, highly flexible or disordered, and dynamic protein complexes can exceed the capabilities of these high-resolution techniques. Although computational molecular modeling can be used, such structures are highly speculative and often inaccurate unless supported by experimental data. Structural mass spectrometry recently emerged as an alternative method that can provide medium-resolution spatial information capable of complementing computational approaches, and it is applicable to heterogeneous samples with potentially no limit on complex size. In particular, chemical crosslinking coupled with mass spectrometry has recently received considerable interest. Most recent progress has focused on developing crosslinkers with special properties, such as enrichment tags, isotopic labeling sites, or MS-cleavable bonds, along with accompanying data analysis strategies and software packages. These crosslinkers insert their spacer arm between proximal amino acid residues, greatly reducing the stringency of the derived distance constraints. In contrast, zero-length crosslinkers form crosslinks that add no extra atoms to the crosslinked peptide products, therefore providing the tightest possible spatial constraints but rendering enrichment and isotopic labeling strategies inapplicable.
As a result, zero-length crosslinking has received limited attention, and no software tools had previously been developed specifically for it. In this thesis project, we developed a multi-tiered mass spectrometry data acquisition and computational data analysis strategy, along with a dedicated software tool, to enhance identification of zero-length crosslinks in complex samples. Label-free comparison and targeted high-resolution mass spectrometry were used to filter out the vast majority of non-crosslinked peptides and increase confidence in crosslink identification, compensating for the lack of the enrichment techniques and characteristic MS patterns employed by non-zero-length crosslinking methods. Each step, from mass spectrometer acquisition parameters to MS/MS spectra evaluation functions, was optimized on zero-length crosslinking datasets of proteins with known crystal structures. Our pipeline was then applied to probe the structures and conformational changes of mini-spectrin, a 90 kDa recombinant protein that closely mimics the dynamic dimer-tetramer equilibrium of erythrocyte spectrin. Compared to previous analyses performed in our laboratory, the current strategy more than doubled the number of identified crosslinks and reduced analysis time per experiment from months to several days. Distance constraints derived from mini-spectrin crosslinks were used as inputs to subsequent homology modeling, allowing the development of experimentally verified medium-resolution structures for the wild-type mini-spectrin tetramer and for both wild-type and hereditary elliptocytosis (HE) mutant mini-spectrin dimers. The structure models, in combination with independent biophysical experiments, illustrated how distal HE-related mutations destabilize the spectrin dimer-tetramer equilibrium by simultaneously lowering the thermal stability of the tetramer and giving rise to a more compact, more stable closed dimer conformation.
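Because a zero-length crosslink adds no atoms, the precursor mass of a crosslinked peptide pair follows directly from the two peptide masses. The sketch below assumes carbodiimide-type (e.g., EDC) chemistry, in which a Lys amine and an Asp or Glu carboxyl condense into an amide bond with the loss of one water, so M(crosslink) = M(peptide 1) + M(peptide 2) − M(H2O). It ignores peptide termini, modifications, and MS/MS fragment scoring; the 10 ppm tolerance and the peptide list are hypothetical, and this is not the dedicated software tool described in the thesis.

```python
from typing import List, Tuple

# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # monoisotopic mass of H2O
PROTON = 1.007276   # mass of a proton

def peptide_mass(seq: str) -> float:
    """Neutral monoisotopic mass of an unmodified linear peptide."""
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER

def zero_length_xl_mass(a: str, b: str) -> float:
    """Two peptides joined by an amide bond: no linker atoms added, one water lost."""
    return peptide_mass(a) + peptide_mass(b) - WATER

def mz(neutral_mass: float, z: int) -> float:
    """m/z of the [M + zH]z+ ion."""
    return (neutral_mass + z * PROTON) / z

def within_ppm(observed: float, theoretical: float, ppm: float = 10.0) -> bool:
    return abs(observed - theoretical) / theoretical * 1e6 <= ppm

def can_link(a: str, b: str) -> bool:
    """Assumed chemistry: Lys on one peptide, Asp or Glu on the other (termini ignored)."""
    def has_acid(s: str) -> bool:
        return "D" in s or "E" in s
    return ("K" in a and has_acid(b)) or ("K" in b and has_acid(a))

def match_precursor(observed_mz: float, z: int, peptides: List[str],
                    ppm: float = 10.0) -> List[Tuple[str, str]]:
    """All chemically plausible peptide pairs whose crosslink m/z matches the observed value."""
    hits = []
    for i, a in enumerate(peptides):
        for b in peptides[i:]:
            if can_link(a, b) and within_ppm(observed_mz, mz(zero_length_xl_mass(a, b), z), ppm):
                hits.append((a, b))
    return hits

peptides = ["GK", "DA", "WW"]  # hypothetical digest products
observed = mz(zero_length_xl_mass("GK", "DA"), 2)
print(match_precursor(observed, 2, peptides))  # prints [('GK', 'DA')]
```

The pairwise search grows quadratically with the number of candidate peptides, which is one reason filtering out non-crosslinked peptides before the search, as the pipeline above does, pays off in complex samples.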