
    A Less-Biased Analysis of Metalloproteins Reveals Novel Zinc Coordination Geometries

    Zinc metalloproteins are involved in many biological processes and play crucial biochemical roles across all domains of life. The local structure around the zinc ion, especially its coordination geometry (CG), is dictated by the protein sequence and is often directly related to the protein's function. Current methods for characterizing the CGs of zinc metalloproteins consider only previously reported CG models, which are based mainly on nonbiological chemical contexts. Exceptions to these canonical CG models are either misclassified or discarded as outliers. We therefore developed a less-biased method that handles potential exceptions directly, without presupposing any CG model. Our study shows that many exceptions can in fact be further classified and that new CG models are needed to characterize them. These new CG models are cross-validated by a strong correlation between independent structural and functional annotation distance metrics, a correlation that is partially lost when the new CG models are ignored. Furthermore, the new CG models exhibit functional propensities distinct from those of the canonical CG models.

    Structure-function analysis and characterization of metalloproteins.

    Metalloproteins are proteins that bind at least one metal ion as a cofactor. They use metal ions for a variety of biological purposes and are essential to all domains of life. Because metalloproteins take part in so many of these processes, understanding how proteins coordinate metal ions for different biochemical functions is highly relevant to understanding how the processes themselves are carried out. One of the most important aspects of metal binding is coordination geometry (CG), which often indicates functional activity. Most current studies assume previously reported CG models founded mainly in a non-biological chemical context. While this procedure gives good measures of the closest CG model a metal site adopts, it also biases and limits the selection of binding ligands and the coordination results to the canonical CG models examined. Thus, if a CG exists that has never been reported or is not accounted for in a study, its instances are either misclassified into an expected model, inflating in-class variation, or treated as outliers. To solve this problem, we developed an analysis in which the less-biased, low-variation measure (bond length) is used to determine the binding ligands and the higher-variation measure (angle) is used to cluster the metal shells into canonical or novel CGs with functional associations. This methodology is model-free and derives the CG models from the data itself, so it can handle unknown CGs that cause problems for classification methods. It has enabled the discovery of several previously uncharacterized CGs for zinc and other highly abundant metalloproteins. By recognizing these novel or aberrant CGs in our clustering analyses, we achieved high correlations between structural and functional descriptions of metal ion coordination.
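    The two-step idea above (bond length to pick the ligands, angles to group the shells) can be illustrated with a minimal sketch. It is not the authors' implementation: the 2.8 Å cutoff, the sorted-angle signature, the RMS distance and the greedy "leader" clustering with a 15° threshold are all hypothetical choices made for illustration, and a real analysis would need element-specific cutoffs and a more careful treatment of shells with different ligand counts.

```python
import math
from itertools import combinations

Point = tuple[float, float, float]

# Hypothetical bond-length cutoff in Å (the abstract does not state the value used).
BOND_CUTOFF = 2.8


def angle_deg(center: Point, a: Point, b: Point) -> float:
    """Ligand-metal-ligand angle a-center-b, in degrees."""
    va = [a[i] - center[i] for i in range(3)]
    vb = [b[i] - center[i] for i in range(3)]
    cos = sum(x * y for x, y in zip(va, vb)) / (math.hypot(*va) * math.hypot(*vb))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos))))  # clamp rounding error


def select_ligands(metal: Point, atoms: list[tuple[str, Point]],
                   cutoff: float = BOND_CUTOFF) -> list[Point]:
    """Step 1: choose binding atoms by bond length only (the low-variation measure)."""
    return [xyz for _name, xyz in atoms if math.dist(metal, xyz) <= cutoff]


def angle_signature(metal: Point, ligands: list[Point]) -> list[float]:
    """Step 2: describe a metal shell by its sorted ligand-metal-ligand angles."""
    return sorted(angle_deg(metal, a, b) for a, b in combinations(ligands, 2))


def signature_distance(s: list[float], t: list[float]) -> float:
    """RMS angle difference; shells with different ligand counts never match."""
    if len(s) != len(t):
        return math.inf
    if not s:
        return 0.0
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(s, t)) / len(s))


def leader_cluster(signatures: list[list[float]], threshold: float = 15.0) -> list[list[int]]:
    """Step 3: group shells without any predefined CG templates.

    Each shell joins the first cluster whose founding member lies within
    `threshold` degrees (RMS); otherwise it founds a new, possibly novel, cluster.
    """
    clusters: list[list[int]] = []
    for i, sig in enumerate(signatures):
        for members in clusters:
            if signature_distance(signatures[members[0]], sig) <= threshold:
                members.append(i)
                break
        else:
            clusters.append([i])
    return clusters
```

    Because no geometry templates are consulted, a recurring shell shape that fits none of the canonical CGs simply forms its own cluster instead of being forced into the nearest model or discarded. The greedy step depends on input order; hierarchical or density-based clustering would be a more robust choice in practice.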

    Machine learning differentiates enzymatic and non-enzymatic metals in proteins

    Metalloenzymes make up 40% of all enzymes and can perform all seven classes of enzyme reactions. Because the active sites of metalloenzymes are physicochemically similar to inactive metal-binding sites, it is challenging to differentiate between them. Yet distinguishing these two classes is critical for identifying both native and designed enzymes, and finding physicochemical features that separate catalytic from non-catalytic metal sites can reveal aspects that are critical to enzyme function. In this work, we develop the largest structural dataset of enzymatic and non-enzymatic metalloprotein sites to date. We then use a decision-tree ensemble machine learning model to classify metals bound to proteins as enzymatic or non-enzymatic with 92.2% precision and 90.1% recall. Our model scores electrostatic and pocket-lining features as more important than pocket volume, even though volume is the feature that differs most quantitatively between enzymatic and non-enzymatic sites. Finally, in a side-by-side comparison, our model performs better overall than other methods that differentiate enzymatic from non-enzymatic sequences. We anticipate that our model's ability to correctly identify which metal sites are responsible for enzymatic activity could enable the identification of new enzymatic mechanisms and de novo enzyme design.
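    The reported precision and recall can be read with the sketch below, which assumes (as the wording suggests, though the abstract does not say so explicitly) that "enzymatic" is the positive class. It computes both metrics from labels and also derives the F1 score implied by the two reported figures; the function names are illustrative, not part of the authors' code.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


def precision_recall_f1(y_true: list[str], y_pred: list[str],
                        positive: str = "enzymatic") -> tuple[float, float, float]:
    """Precision, recall and F1 for one positive class (assumed: 'enzymatic')."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # enzymatic, predicted enzymatic
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # non-enzymatic, predicted enzymatic
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # enzymatic, missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall, f1_score(precision, recall)


# F1 implied by the reported figures (92.2% precision, 90.1% recall)
print(round(f1_score(0.922, 0.901), 3))  # prints 0.911
```

    Precision answers "of the sites the model calls enzymatic, how many are?", while recall answers "of the truly enzymatic sites, how many did the model find?"; the two reported values are close, so the implied F1 of about 0.911 lies between them.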

    System-specific parameter optimization for non-polarizable and polarizable force fields

    The accuracy of classical force fields (FFs) has been shown to be limited for simulations of cation-protein systems, despite their importance for understanding the processes of life. Improvements can come from optimizing the parameters of classical FFs or from extending the FF formulation with terms describing charge transfer and polarization effects. In this work, we introduce our implementation of the CTPOL model in OpenMM, which extends the classical additive FF formula with charge transfer (CT) and polarization (POL) terms. Furthermore, we present an open-source parameterization tool, called FFAFFURR, that enables the (system-specific) parameterization of OPLS-AA and CTPOL models. The performance of our workflow was evaluated by its ability to reproduce quantum chemistry energies and by molecular dynamics simulations of a zinc finger protein.
    Comment: 62 pages and 25 figures (including SI), manuscript to be submitted soon.
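    Schematically, the extension described above adds two contributions to the classical additive energy. The abstract does not give the CT and POL functional forms used in the OpenMM implementation, so $E_{\text{CT}}$ is left unspecified here and the polarization term is written in the generic induced-point-dipole form as an assumption, with $\alpha_i$ the polarizability of site $i$, $\mathbf{E}_i^{0}$ the field from the permanent charges, and $\mathbf{T}_{ij}$ the dipole field tensor:

$$
E_{\text{CTPOL}} = \underbrace{E_{\text{bonded}} + E_{\text{LJ}} + E_{\text{Coul}}}_{\text{classical additive FF}} + E_{\text{CT}} + E_{\text{POL}},
\qquad
E_{\text{POL}} = -\tfrac{1}{2}\sum_i \boldsymbol{\mu}_i \cdot \mathbf{E}_i^{0},
\qquad
\boldsymbol{\mu}_i = \alpha_i\Big(\mathbf{E}_i^{0} + \sum_{j \ne i} \mathbf{T}_{ij}\,\boldsymbol{\mu}_j\Big)
$$

    Because each induced dipole $\boldsymbol{\mu}_i$ depends on all the others, the dipoles must be solved self-consistently, which is one reason such terms cost more than a purely additive force field.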

    A Chemical Interpretation of Protein Electron Density Maps in the Worldwide Protein Data Bank

    High-quality three-dimensional structural data is of great value for the functional interpretation of biomacromolecules, especially proteins; however, structural quality varies greatly across the entries in the worldwide Protein Data Bank (wwPDB). Since 2008, the wwPDB has required the inclusion of structure factors with the deposition of x-ray crystallographic structures to support the independent evaluation of structures with respect to the underlying experimental data used to derive those structures. However, interpreting the discrepancies between the structural model and its underlying electron density data is difficult, since derived sigma-scaled electron density maps use arbitrary electron density units which are inconsistent between maps from different wwPDB entries. Therefore, we have developed a method that converts electron density values from sigma-scaled electron density maps into units of electrons. With this conversion, we have developed new methods that can evaluate specific regions of an x-ray crystallographic structure with respect to a physicochemical interpretation of its corresponding electron density map. We have systematically compared all deposited x-ray crystallographic protein models in the wwPDB with their underlying electron density maps, if available, and characterized the electron density in terms of expected numbers of electrons based on the structural model. The methods generated coherent evaluation metrics throughout all PDB entries with associated electron density data, which are consistent with visualization software that would normally be used for manual quality assessment. To our knowledge, this is the first attempt to derive units of electrons directly from electron density maps without the aid of the underlying structure factors. 
These new metrics are biochemically informative and can be extremely useful for filtering low-quality structural regions out of systematic analyses that span large numbers of PDB entries. Furthermore, these new metrics will improve the ability of non-crystallographers to evaluate regions of interest within PDB entries, since only the PDB structure and the associated electron density maps are needed. These new methods are available as a well-documented Python package on GitHub and the Python Package Index under a modified Clear BSD open-source license.
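    The abstract does not describe how the conversion to electrons works. As a toy illustration of the general idea of putting a sigma-scaled map on an absolute scale, the sketch below assumes, hypothetically, that the sigma-scaled density integrated over the voxels of each well-modeled atom is proportional to that atom's electron count. It fits the proportionality constant by least squares through the origin and then compares observed with expected electrons for a single atom. This is not the package's algorithm; `ATOMIC_ELECTRONS`, `fit_electrons_per_sigma` and `electron_ratio` are hypothetical names, and hydrogens, partial occupancy and B-factors are ignored.

```python
# Electrons per neutral atom (atomic number); hydrogens ignored for simplicity.
ATOMIC_ELECTRONS = {"C": 6, "N": 7, "O": 8, "S": 16}


def fit_electrons_per_sigma(integrated_density: list[float], elements: list[str]) -> float:
    """Least-squares slope k (through the origin) such that
    expected electrons ~= k * integrated sigma-scaled density."""
    expected = [ATOMIC_ELECTRONS[e] for e in elements]
    numerator = sum(d * n for d, n in zip(integrated_density, expected))
    denominator = sum(d * d for d in integrated_density)
    if denominator == 0:
        raise ValueError("no density to calibrate against")
    return numerator / denominator


def electron_ratio(integrated_density: float, element: str, k: float) -> float:
    """Observed/expected electrons for one atom; values well below 1 flag weak density."""
    return k * integrated_density / ATOMIC_ELECTRONS[element]


# Calibrate on three well-modeled atoms, then check a poorly resolved carbon.
k = fit_electrons_per_sigma([3.0, 3.5, 4.0], ["C", "N", "O"])
print(k, electron_ratio(1.5, "C", k))  # prints 2.0 0.5
```

    Once values are in electrons, a region can be judged against a physically meaningful expectation (here, half the electrons a carbon should have) instead of an arbitrary sigma threshold that differs from map to map.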

    Machine Learning Approaches for Metalloproteins

    Metalloproteins are a family of proteins characterized by metal ion binding, in which the bound ions confer key catalytic and ligand-binding properties. Because metalloproteins are ubiquitous in biological systems, researchers have made immense efforts to predict their structural and functional roles. Ultimately, a comprehensive understanding of metalloproteins will lead to tangible applications, such as the design of potent inhibitors in drug discovery. Recently, the number of studies applying machine learning to predict metalloprotein properties has accelerated, driven primarily by the advent of more sophisticated machine learning algorithms. This review covers how machine learning tools have consolidated and expanded our understanding of various aspects of metalloproteins (structure, function, stability, ligand-binding interactions, and inhibitors). Future avenues of exploration are also discussed.

    High Resolution Crystal Structures of the Wild Type and Cys-55 → Ser and Cys-59 → Ser Variants of the Thioredoxin-like [2Fe-2S] Ferredoxin from Aquifex aeolicus

    The [2Fe-2S] ferredoxin (Fd4) from Aquifex aeolicus adopts a thioredoxin-like polypeptide fold that is distinct from other [2Fe-2S] ferredoxins. Crystal structures of the Cys-55 → Ser (C55S) and Cys-59 → Ser (C59S) variants of this protein have been determined to 1.25 Å and 1.05 Å resolution, respectively, whereas the resolution of the wild type (WT) has been extended to 1.5 Å. The improved WT structure provides a detailed description of the [2Fe-2S] cluster, including two features that have not been noted previously in any [2Fe-2S] cluster-containing protein, namely, pronounced distortions in the cysteine coordination to the cluster and a Cα-H···Sγ hydrogen bond between cluster ligands Cys-55 and Cys-9. These features may contribute to the unusual electronic and magnetic properties of the [2Fe-2S] clusters in WT and variants of this ferredoxin. The structures of the two variants of Fd4, in which single cysteine ligands to the [2Fe-2S] cluster are replaced by serine, establish the metric details of serine-ligated Fe-S active sites with unprecedented accuracy. Both the cluster and its surrounding protein matrix change in subtle ways to accommodate this ligand substitution, particularly in terms of distortions of the Fe₂S₂ inorganic core from planarity and displacements of the polypeptide chain. These high resolution structures illustrate how the interactions between polypeptide chains and Fe-S active sites reflect combinations of flexibility and rigidity on the part of both partners; these themes are also evident in more complex systems, as exemplified by changes associated with serine ligation of the nitrogenase P cluster.

    Metal Cations in Protein Force Fields: From Data Set Creation and Benchmarks to Polarizable Force Field Implementation and Adjustment

    Metal cations are essential to life. About one-third of all proteins require metal cofactors to fold correctly or to function. Computer simulations using empirical parameters and classical molecular mechanics models (force fields) are the standard tool for investigating proteins' structural dynamics and functions in silico. Despite many successes, the accuracy of force fields is limited when cations are involved. The focus of this thesis is the development of tools and strategies to create system-specific force field parameters that accurately describe cation-protein interactions. The accuracy of a force field relies mainly on (i) the parameters, derived from increasingly large quantum chemistry or experimental data sets, and (ii) the physics behind the energy formula. The first part of this thesis presents a large and comprehensive quantum chemistry data set on a consistent computational footing that can be used for force field parameterization and benchmarking. The data set covers dipeptides of the 20 proteinogenic amino acids with different possible side chain protonation states, 3 divalent cations (Ca2+, Mg2+, and Ba2+), and a wide relative energy range. Properties crucial to force field development, such as partial charges and interaction energies, are also provided. To make the data available, the data set was uploaded to the NOMAD repository and its data structure was formalized in an ontology. Besides a proper data basis for parameterization, the physics captured by the terms of the additive force field formulation determines its applicability. The second part of this thesis benchmarks three popular non-polarizable force fields and the polarizable Drude model against a quantum chemistry data set. After some adjustments, the Drude model was found to reproduce the reference interaction energies substantially better than the non-polarizable force fields, showing the importance of explicitly addressing polarization effects.
The Drude model was tuned by Boltzmann-weighted fitting of Thole factors and Lennard-Jones parameters. The resulting parameters were validated by (i) their ability to reproduce reference interaction energies and (ii) molecular dynamics simulations of the N-lobe of calmodulin. This work demonstrates how polarizable force fields for cation-protein interactions can be improved by quantum chemistry-driven parameterization combined with molecular dynamics simulations in the condensed phase. While the Drude model shows promise for simulating cation-protein interactions, it does not describe charge transfer, which is significant between cations and proteins. The CTPOL model extends the classical force field formulation with charge transfer (CT) and polarization (POL). Since the CTPOL model is not available in any of the popular molecular dynamics packages, it was implemented in OpenMM. Furthermore, an open-source parameterization tool, called FFAFFURR, was developed that enables the (system-specific) parameterization of OPLS-AA and CTPOL models. Following the approach established in the previous part, FFAFFURR was evaluated by its ability to reproduce quantum chemistry energies and by molecular dynamics simulations of a zinc finger protein. In conclusion, this thesis takes a step towards next-generation force fields that accurately describe cation-protein interactions by providing (i) reference data, (ii) a force field model that includes charge transfer and polarization, and (iii) a freely available parameterization tool.
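    Boltzmann-weighted fitting, as mentioned for the Thole factors and Lennard-Jones parameters, can be sketched as follows. The sketch assumes energies in kcal/mol, a temperature of 300 K and a simple grid search over one parameter; the thesis's actual objective function, temperature and optimizer are not given, and `model` is a hypothetical callable standing in for a force-field energy evaluation.

```python
import math
from typing import Callable

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol*K); energies assumed in kcal/mol


def boltzmann_weights(e_ref: list[float], temperature: float = 300.0) -> list[float]:
    """Weight each configuration by exp(-(E - E_min) / kT) of its reference energy."""
    e_min = min(e_ref)
    kt = K_B * temperature
    return [math.exp(-(e - e_min) / kt) for e in e_ref]


def weighted_rmse(e_ff: list[float], e_ref: list[float], temperature: float = 300.0) -> float:
    """Boltzmann-weighted RMS deviation between force-field and reference energies."""
    w = boltzmann_weights(e_ref, temperature)
    sq = sum(wi * (f - r) ** 2 for wi, f, r in zip(w, e_ff, e_ref))
    return math.sqrt(sq / sum(w))


def fit_parameter(candidates: list[float], model: Callable[[float], list[float]],
                  e_ref: list[float], temperature: float = 300.0) -> float:
    """Grid search: return the candidate whose model energies minimise the weighted error.

    `model(p)` is hypothetical: it should return force-field energies for every
    reference configuration when the fitted parameter takes the value p.
    """
    return min(candidates, key=lambda p: weighted_rmse(model(p), e_ref, temperature))
```

    The weighting makes errors on low-energy configurations, which dominate condensed-phase sampling, count far more than errors on high-energy ones; a gradient-based optimizer over several parameters at once would replace the grid search in a realistic fit.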

    The Importance of Stereochemically Active Lone Pairs for Influencing Pb(II) and As(III) Protein Binding

    The toxicity of heavy metals, which is associated with the high affinity of these metals for thiolate-rich proteins, is a worldwide problem. However, despite this tremendous toxicity concern, the binding mode of As(III) and Pb(II) to proteins is poorly understood. To clarify the requirements for toxic metal binding to metalloregulatory sensor proteins, such as As(III) in ArsR/ArsD and Pb(II) in PbrR, or for replacing Zn(II) in δ-aminolevulinic acid dehydratase (ALAD), we employed computational and experimental methods to examine the binding of these heavy metals to designed peptide models. The computational results show that the mode of coordination of As(III) and Pb(II) is greatly influenced by the steric bulk within the second coordination sphere of the metal. The proposed basis of this selectivity is the large size of the ion and, most importantly, the stereochemically active lone pair in hemidirected complexes of the metal ion. The experimental data show that replacing a bulky leucine layer above the metal-binding site with a smaller alanine layer enhances the Pb(II) binding affinity by a factor of five, experimentally supporting the hypothesis of lone pair steric hindrance. These complementary approaches demonstrate the potential importance of a stereochemically active lone pair as a metal recognition mode in proteins and, specifically, how the second coordination sphere affects the affinity and selectivity of protein targets for certain toxic ions. Experimental and computational methods have been employed to study the influence of the lone pair of As(III) and Pb(II) on the binding of these ions in proteins using designed peptide models.
The results show that the mode of coordination of As(III) and Pb(II) is greatly influenced by the steric bulk within the second coordination sphere of the metals (see figure).
    Peer Reviewed
    http://deepblue.lib.umich.edu/bitstream/2027.42/90414/1/chem_201102786_sm_miscellaneous_information.pdf
    http://deepblue.lib.umich.edu/bitstream/2027.42/90414/2/2040_ftp.pd
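    To put the fivefold affinity gain in energetic terms, assume that the factor of five is a ratio of binding constants and that it was measured near 298 K (the abstract states neither); $K_{\text{Ala}}$ and $K_{\text{Leu}}$ are illustrative labels for the binding constants of the alanine and leucine variants:

$$
\Delta\Delta G = -RT\ln\frac{K_{\text{Ala}}}{K_{\text{Leu}}} = -RT\ln 5 \approx -(0.592\ \text{kcal mol}^{-1})(1.609) \approx -0.95\ \text{kcal mol}^{-1} \;(\approx -4.0\ \text{kJ mol}^{-1})
$$

    Under these assumptions, relieving the steric clash with the lone pair is worth slightly less than 1 kcal/mol, which shows how a modest second-sphere change can still shift Pb(II) affinity fivefold.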