
    Inconsistent Distances in Substitution Matrices can be Avoided by Properly Handling Hydrophobic Residues

    The adequacy of substitution matrices to model evolutionary relationships between amino acid sequences can be numerically evaluated by checking the mathematical property of triangle inequality for all triplets of residues. By converting substitution scores into distances, one can verify that a direct path between two amino acids is shorter than a path passing through a third amino acid in the amino acid space modeled by the matrix. If the triangle inequality is not verified, the intuition is that the evolutionary signal is not well modeled by the matrix, that the space is locally inconsistent, and that the matrix construction was probably based on insufficient biological data. Previous analyses of several substitution matrices revealed that the number of triplets violating the triangle inequality increases with sequence divergence. Here, we compare matrices which are dedicated to the alignment of highly divergent proteins. The triangle inequality is tested on several classical substitution matrices as well as on a pair of “complementary” substitution matrices recording the evolutionary pressures inside and outside hydrophobic blocks in protein sequences. The analysis proves the crucial role of hydrophobic residues in substitution matrices dedicated to the alignment of distantly related proteins.
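The triplet check described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' procedure: the abstract does not say how scores are converted to distances, so the conversion d(a,b) = s(a,a) + s(b,b) - 2·s(a,b) is an assumption, and the three-residue `toy_scores` matrix contains made-up values rather than entries from any published matrix. The function names are hypothetical.

```python
from itertools import permutations


def score_to_distance(scores, a, b):
    """Turn a similarity score into a distance.

    ASSUMPTION: d(a,b) = s(a,a) + s(b,b) - 2*s(a,b). The abstract does not
    specify the conversion; this is one common choice.
    """
    return scores[a][a] + scores[b][b] - 2 * scores[a][b]


def triangle_violations(scores):
    """Return triplets (a, b, c) where the direct path a-b is longer
    than the detour a-c-b, i.e. the triangle inequality fails."""
    residues = sorted(scores)
    violations = []
    for a, b, c in permutations(residues, 3):
        if a < b:  # visit each unordered endpoint pair once per intermediate c
            direct = score_to_distance(scores, a, b)
            detour = (score_to_distance(scores, a, c)
                      + score_to_distance(scores, c, b))
            if direct > detour:
                violations.append((a, b, c))
    return violations


# Hypothetical toy scores (NOT taken from BLOSUM, PAM, or any real matrix).
toy_scores = {
    "A": {"A": 4, "L": 1, "K": -4},
    "L": {"A": 1, "L": 4, "K": 1},
    "K": {"A": -4, "L": 1, "K": 4},
}

print(triangle_violations(toy_scores))
```

With these toy values the direct A-K distance is 16 while the detour through L costs 6 + 6 = 12, so the triplet (A, K, L) is reported as a violation.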

    Selective Constraints on Amino Acids Estimated by a Mechanistic Codon Substitution Model with Multiple Nucleotide Changes

    Empirical substitution matrices represent the average tendencies of substitutions over various protein families by sacrificing gene-level resolution. We develop a codon-based model in which the mutational tendencies of codons, the genetic code, and the strength of selective constraints against amino acid replacements can be tailored to a given gene. First, selective constraints averaged over proteins are estimated by maximizing the likelihood of each 1-PAM matrix of empirical amino acid (JTT, WAG, and LG) and codon (KHG) substitution matrices. Then, selective constraints specific to given proteins are approximated as a linear function of those estimated from the empirical substitution matrices. Akaike information criterion (AIC) values indicate that a model allowing multiple nucleotide changes fits the empirical substitution matrices significantly better. Also, the ML estimates of transition-transversion bias obtained from these empirical matrices are not as large as previously estimated. The selective constraints are characteristic of proteins rather than species. However, their relative strengths among amino acid pairs can be approximated as depending mainly on the amino acid pair rather than on the protein family, because the present model, in which selective constraints are approximated as a linear function of those estimated from the JTT/WAG/LG/KHG matrices, provides a good fit to other empirical substitution matrices, including cpREV for chloroplast proteins and mtREV for vertebrate mitochondrial proteins.
The present codon-based model with the ML estimates of selective constraints and with adjustable mutation rates of nucleotide would be useful as a simple substitution model in ML and Bayesian inferences of molecular phylogenetic trees, and enables us to obtain biologically meaningful information at both nucleotide and amino acid levels from codon and protein sequences. Comment: Table 9 in this article includes corrections for errata in the Table 9 published in 10.1371/journal.pone.0017244. Supporting information is attached at the end of the article, and a computer-readable dataset of the ML estimates of selective constraints is available from 10.1371/journal.pone.001724

    Computational and Experimental Investigation of Allosteric Communication in the Transcriptional Regulator NikR

    The Ni2+ and DNA binding protein NikR is involved in nickel regulation in Escherichia coli through transcriptional repression of the NikABCDE nickel permease. NikR is a homotetramer and each chain contains both a DNA binding ribbon-helix-helix (RHH) domain and a Ni2+ binding regulatory ACT (aspartokinase, chorismate mutase, TyrA) fold. Work herein combines computational modeling of NikR structure with experimental studies aimed at understanding allosteric communication between the ACT and RHH domains. Hydrogen/deuterium exchange mass spectrometry shows a Ni2+ specific NikR conformational change relative to bound Cu2+, Co2+, and Zn2+. Concurrent coordination geometry and in vivo repressor function studies show that NikR activation is specific to binding Ni2+ in square-planar geometry. These results suggest that regions of the NikR structure distal to the Ni2+ binding sites are involved in allosteric communication. To help determine important residue interactions within and between the RHH and ACT domains that are involved in allostery, an equilibrium molecular dynamics (MD) simulation is utilized to explore the conformational dynamics of the NikR tetramer. This study includes advances in methods development focused on identifying signatures of allosteric communication in MD simulations. Using two different correlation measures based on fluctuations in atomic position and non-covalent bonding, we identify a potential allosteric communication pathway between the Ni2+ and DNA binding sites. We also apply a graph theoretic approach to map the most probable networks of non-covalent contacts connecting the two functionally important binding sites. Several of the residues identified by our analyses have been shown experimentally to be important for NikR function. An additional subset of the selected residues structurally connects experimentally important residues and may help coordinate allosteric communication between the ACT and RHH domains.
Based on these analyses and additional structural interpretations, site-directed mutagenesis of E. coli NikR and subsequent characterization of changes in Ni2+ binding and in vivo repressor function of mutants aid our understanding of the role of these residues in allosteric regulation. The combination of computational and experimental methods that are developed or adapted in this study provides a framework for further characterization of NikR, other ACT domain containing proteins, and other allosteric proteins.

    The molecular quasi-species


    Computational Approaches To Anti-Toxin Therapies And Biomarker Identification

    This work describes the fundamental study of two bacterial toxins with computational methods, the rational design of a potent inhibitor using molecular dynamics, as well as the development of two bioinformatic methods for mining genomic data. Clostridium difficile is an opportunistic bacillus which produces two large glucosylating toxins. These toxins, TcdA and TcdB, cause severe intestinal damage. As Clostridium difficile harbors considerable antibiotic resistance, one treatment strategy is to prevent the tissue damage that the toxins cause. The catalytic glucosyltransferase domain of TcdA and TcdB was studied using molecular dynamics in the presence of both a protein-protein binding partner and several substrates. These experiments were combined with lead optimization techniques to create a potent irreversible inhibitor which protects 95% of cells in vitro. Dynamics studies on a TcdB cysteine protease domain were performed to identify an allosteric communication pathway. Comparative analysis of the static and dynamic properties of the TcdA and TcdB glucosyltransferase domains was carried out to determine the basis for the differential lethality of these toxins. Large-scale biological data is readily available in the post-genomic era, but it can be difficult to use that data effectively. Two bioinformatics methods were developed to process whole-genome data. Software was developed to return all genes containing a motif in a single genome. This provides a list of genes which may be within the same regulatory network or targeted by a specific DNA binding factor. A second bioinformatic method was created to link the data from genome-wide association studies (GWAS) to specific genes. GWAS results are frequently subjected to statistical analysis, but mutations are rarely investigated structurally. HyDn-SNP-S allows a researcher to find mutations in a gene that correlate with a GWAS-studied phenotype.
Across human DNA polymerases, this resulted in strongly predictive haplotypes for breast and prostate cancer. Molecular dynamics applied to DNA Polymerase Lambda suggested a structural explanation for the decrease in polymerase fidelity with that mutant. When applied to Histone Deacetylases, mutations were found that alter substrate binding and post-translational modification.

    Optimization of electrostatic binding free energy : applications to the analysis and design of ligand binding in protein complexes

    Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Chemistry, 2002. Vita. Includes bibliographical references (p. 279-298). This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections. Electrostatic interactions play an important role in determining the energetics of association in biomolecular complexes. Previous work has shown that, within a continuum electrostatic model, for any given complex there exists a ligand charge distribution which optimizes the electrostatic binding free energy: the electrostatic complement of the target receptor. This electrostatic affinity optimization procedure was applied to several systems both in order to understand the role of electrostatic interactions in natural systems and as a tool in the design of ligands with improved affinity. Comparison of the natural and optimal charges of several ligands of glutaminyl-tRNA synthetase from E. coli, an enzyme with a strong natural requirement for specificity, shows remarkable similarity in many areas, suggesting that the optimization of electrostatic interactions played a role in the evolution of this system. The optimization procedure was also applied to the design of improvements to two inhibitors of HIV-1 viral-cell membrane fusion. Two tryptophan residues that are part of a D-peptide inhibitor were identified as contributing most significantly to binding, and a novel computational screening procedure based on the optimization methodology was developed to screen a library of tryptophan derivatives at both positions. Additionally, the optimization methodology was used to predict four mutations to standard amino acids at three positions on 5-Helix, a protein inhibitor of membrane fusion. All mutations were computed to improve the affinity of the inhibitor, with a five hundred-fold improvement calculated for one triple mutant.
In the complex of β-lactamase inhibitor protein with TEM1 β-lactamase, a novel type of electrostatic interaction was identified, with surface exposed charged groups on the periphery of the binding interface projecting significant energetic effects through as much as 10 Å of solvent. Finally, a large number of ab initio methods for determining partial atomic charges on small molecules were evaluated in terms of their ability to reproduce experimental values in continuum electrostatic calculations, with several preferred methods identified. By David Francis Green. Ph.D.

    Unsupervised inference methods for protein sequence data

    The abstract is in the attachment.