21 research outputs found

    Finding coevolving amino acid residues using row and column weighting of mutual information and multi-dimensional amino acid representation-1

    No full text
    Copyright information: Taken from "Finding coevolving amino acid residues using row and column weighting of mutual information and multi-dimensional amino acid representation". http://www.almob.org/content/2/1/12. Algorithms for Molecular Biology: AMB 2007;2:12. Published online 3 Oct 2007. PMCID: PMC2234412.

    Computing and visually analyzing mutual information in molecular co-evolution

    Abstract
    Background: Selective pressure in molecular evolution leads to uneven distributions of amino acids and nucleotides. In fact, one observes correlations among such constituents due to a large number of biophysical mechanisms (folding properties, electrostatics, ...). To quantify these correlations, mutual information, after proper normalization, has proven most effective. The challenge is to navigate the large amount of data, which in a study of a typical protein cannot simply be plotted.
    Results: To visually analyze mutual information, we developed a matrix visualization tool that offers different views of the mutual information matrix, among them filtering, sorting, and weighting. The user can interactively navigate a huge matrix in real time and search, e.g., for patterns and unusually high or low values. The mutual information matrix can be computed for a sequence alignment in FASTA format. The respective stand-alone program additionally computes proper normalizations for a null model of neutral evolution and maps the mutual information to Z-scores with respect to the null model.
    Conclusions: The new tool allows users to compute and visually analyze sequence data for possible co-evolutionary signals. The tool has already been successfully employed in evolutionary studies on HIV1 protease and acetylcholinesterase. Its functionality was defined by users applying the tool in real-world research. The software can also be used for visual analysis of other matrix-like data, such as information obtained from DNA microarray experiments. The package is implemented platform-independently in Java and is free for academic use under a GPL license.
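
As an illustration of the quantities this abstract mentions, the following is a minimal Python sketch of mutual information between two alignment columns and a Z-score against a null distribution. It is not the tool's implementation: the function names, the toy alignment, and the use of column shuffling as the null model are assumptions made for this example, whereas the abstract refers to a null model of neutral evolution whose details are not given here.

```python
import math
import random
from collections import Counter

def mutual_information(col_a, col_b):
    """Mutual information (in bits) between two equally long alignment columns."""
    n = len(col_a)
    count_a = Counter(col_a)
    count_b = Counter(col_b)
    count_ab = Counter(zip(col_a, col_b))
    mi = 0.0
    for (a, b), c in count_ab.items():
        # p_ab * log2(p_ab / (p_a * p_b)), written with raw counts
        mi += (c / n) * math.log2(c * n / (count_a[a] * count_b[b]))
    return mi

def shuffle_z_score(col_a, col_b, n_shuffles=200, seed=0):
    """Z-score of the observed MI against MI values from shuffled columns.

    Assumption: shuffling one column stands in for the null model here.
    """
    rng = random.Random(seed)
    observed = mutual_information(col_a, col_b)
    shuffled = list(col_b)
    null = []
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)
        null.append(mutual_information(col_a, shuffled))
    mean = sum(null) / len(null)
    std = math.sqrt(sum((x - mean) ** 2 for x in null) / len(null))
    # A constant column gives zero variance; report 0.0 instead of dividing by zero
    return (observed - mean) / std if std > 0 else 0.0

# Toy alignment (hypothetical): rows are sequences, columns are positions
alignment = ["ACDE", "ACDF", "GHDE", "GHDF"]
columns = list(zip(*alignment))
```

In this toy alignment, columns 0 and 1 always vary together while column 2 is fully conserved, so the first pair carries mutual information and any pair involving column 2 carries none. Real alignments would also need gap handling and sequence weighting, which are omitted here.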

    Finding coevolving amino acid residues using row and column weighting of mutual information and multi-dimensional amino acid representation-5

    No full text
    Copyright information: Taken from "Finding coevolving amino acid residues using row and column weighting of mutual information and multi-dimensional amino acid representation". http://www.almob.org/content/2/1/12. Algorithms for Molecular Biology: AMB 2007;2:12. Published online 3 Oct 2007. PMCID: PMC2234412.
    ...istically significant at 0.01 with Bonferroni correction, except the ones between MDAR and MI Adp, between RCW MDAR and RCW MDAR vs Tree and between MI and Simpl...

    Networks of High Mutual Information Define the Structural Proximity of Catalytic Sites: Implications for Catalytic Residue Identification

    Identification of catalytic residues (CR) is essential for the characterization of enzyme function. CR are, in general, conserved and located in the functional site of a protein in order to perform their function. However, many non-catalytic residues are highly conserved and not all CR are conserved throughout a given protein family, making identification of CR a challenging task. Here, we put forward the hypothesis that CR carry a particular signature defined by networks of close proximity residues with high mutual information (MI), and that this signature can be applied to distinguish functional from other non-functional conserved residues. Using a data set of 434 Pfam families included in the Catalytic Site Atlas (CSA) database, we tested this hypothesis and demonstrated that MI can complement amino acid conservation scores to detect CR. The Kullback-Leibler (KL) conservation measurement was shown to significantly outperform both the Shannon entropy and maximal frequency measurements. Residues in the proximity of catalytic sites were shown to be rich in shared MI. A structural proximity MI average score (termed pMI) was demonstrated to be a strong predictor for CR, thus confirming the proposed hypothesis. A structural proximity conservation average score (termed pC) was also calculated and demonstrated to carry distinct information from pMI. A catalytic likeliness score (Cls), combining the KL, pC and pMI measures, was shown to lead to significantly improved prediction accuracy. At a specificity of 0.90, the Cls method was found to have a sensitivity of 0.816. In summary, we demonstrate that networks of residues with high MI provide a distinct signature on CR and propose that such a signature should be present in other classes of functional residues where the requirement to maintain a particular function places limitations on the diversification of the structural environment along the course of evolution.
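
To make the two conservation measures named in this abstract concrete, here is a small Python sketch of Shannon entropy and Kullback-Leibler (KL) conservation for a single alignment column. It is a sketch under stated assumptions, not the authors' scoring code: the function names are hypothetical, gaps and non-standard symbols are simply ignored, and a uniform background over the 20 amino acids is used as a placeholder where a real analysis would use measured background frequencies.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# Assumption: uniform background; real analyses typically use observed frequencies
UNIFORM_BACKGROUND = {aa: 1 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}

def column_frequencies(column):
    """Relative frequencies of the standard amino acids in a column (gaps ignored)."""
    counts = Counter(aa for aa in column if aa in UNIFORM_BACKGROUND)
    total = sum(counts.values())
    return {aa: c / total for aa, c in counts.items()}

def shannon_entropy(column):
    """Shannon entropy in bits; lower values mean a more conserved column."""
    freqs = column_frequencies(column)
    return -sum(p * math.log2(p) for p in freqs.values())

def kl_conservation(column, background=UNIFORM_BACKGROUND):
    """KL divergence (bits) of the column distribution from the background."""
    freqs = column_frequencies(column)
    return sum(p * math.log2(p / background[aa]) for aa, p in freqs.items())
```

With a uniform background the KL score is just log2(20) minus the entropy, so the two measures rank columns identically. They can only differ once a non-uniform background is supplied, for example one under which a conserved rare residue scores higher than a conserved common one.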

    Direct-coupling analysis of residue co-evolution captures native contacts across many protein families

    The similarity in the three-dimensional structures of homologous proteins imposes strong constraints on their sequence variability. It has long been suggested that the resulting correlations among amino acid compositions at different sequence positions can be exploited to infer spatial contacts within the tertiary protein structure. Crucial to this inference is the ability to disentangle direct and indirect correlations, as accomplished by the recently introduced Direct Coupling Analysis (DCA) (Weigt et al. (2009) Proc Natl Acad Sci 106:67). Here we develop a computationally efficient implementation of DCA, which allows us to evaluate the accuracy of contact prediction by DCA for a large number of protein domains, based purely on sequence information. DCA is shown to yield a large number of correctly predicted contacts, recapitulating the global structure of the contact map for the majority of the protein domains examined. Furthermore, our analysis captures clear signals beyond intra-domain residue contacts, arising, e.g., from alternative protein conformations, ligand-mediated residue couplings, and inter-domain interactions in protein oligomers. Our findings suggest that contacts predicted by DCA can be used as a reliable guide to facilitate computational predictions of alternative protein conformations, protein complex formation, and even the de novo prediction of protein domain structures, provided the existence of a large number of homologous sequences which are being rapidly made available due to advances in genome sequencing. Comment: 28 pages, 7 figures, to appear in PNA

    Identification of Coevolving Residues and Coevolution Potentials Emphasizing Structure, Bond Formation and Catalytic Coordination in Protein Evolution

    The structure and function of a protein are dependent on coordinated interactions between its residues. The selective pressures associated with a mutation at one site should therefore depend on the amino acid identity of interacting sites. Mutual information has previously been applied to multiple sequence alignments as a means of detecting coevolutionary interactions. Here, we introduce a refinement of the mutual information method that: 1) removes a significant, non-coevolutionary bias and 2) accounts for heteroscedasticity. Using a large, non-overlapping database of protein alignments, we demonstrate that predicted coevolving residue-pairs tend to lie in close physical proximity. We introduce coevolution potentials as a novel measure of the propensity for the 20 amino acids to pair amongst predicted coevolutionary interactions. Ionic, hydrogen, and disulfide bond-forming pairs exhibited the highest potentials. Finally, we demonstrate that pairs of catalytic residues have a significantly increased likelihood to be identified as coevolving. These correlations to distinct protein features verify the accuracy of our algorithm and are consistent with a model of coevolution in which selective pressures towards preserving residue interactions act to shape the mutational landscape of a protein by restricting the set of admissible neutral mutations.

    Finding coevolving amino acid residues using row and column weighting of mutual information and multi-dimensional amino acid representation

    Abstract
    Background: Some amino acid residues functionally interact with each other. This interaction will result in an evolutionary co-variation between these residues, known as coevolution. Our goal is to find these coevolving residues.
    Results: We present six new methods for detecting coevolving residues. Among other things, we suggest measures that are variants of Mutual Information, and measures that use a multidimensional representation of each residue in order to capture the physico-chemical similarities between amino acids. We created an in silico benchmarking system able to evaluate these methods through a wide range of realistic conditions. Finally, we use the combination of different methods as a way of improving performance.
    Conclusion: Our best method (Row and Column Weighed Mutual Information) has an estimated accuracy increase of 63% over Mutual Information. Furthermore, we show that the combination of different methods is efficient, and that the methods are quite sensitive to the different conditions tested.

    Estudo computacional das interacções proteína-proteína

    Master's thesis. Biology (Bioinformatics and Computational Biology - Bioinformatics). Universidade de Lisboa, Faculdade de Ciências, 2010.
    Molecular recognition is a key process in biological systems. DNA replication and transcription, cell adhesion, signaling cascades and metabolic cycles are some of the processes that rely on molecular recognition. Understanding these processes requires knowledge of the protein interactions that underlie them. The way in which two proteins interact can be difficult to predict, especially if they establish transient interactions. Docking is a computational method that predicts the binding mode between two molecules and has potential for predicting the interactions of transient complexes. Methods for predicting protein interfaces can be based solely on geometric, physico-chemical and statistical properties of the surface, or they can also incorporate evolutionary information in the form of conservation measures derived from multiple sequence alignments (MSA). Over time, amino acid substitutions occur in proteins. Substitutions that stabilize the interface between monomers are favored by natural selection. If a mutation in one monomer induces a mutation in another monomer of the same complex, the mutations are said to be correlated. Such mutations can be identified by analyzing the correlations between changes at pairs of positions in an MSA. It has been demonstrated that correlated pairs of amino acids are significantly closer together in the structure than uncorrelated pairs, and that they can be used to discriminate between correct and incorrect solutions in docking methods.
    In this work, an automated system was developed, consisting of Python tools that integrate software available online, such as BLAST, ClustalW and covariation detection algorithms, with the aim of obtaining coevolution data that could be used to filter docking solutions for transient complexes.

    Integrated Analysis of Residue Coevolution and Protein Structure in ABC Transporters

    Intraprotein side chain contacts can couple the evolutionary process of amino acid substitution at one position to that at another. This coupling, known as residue coevolution, may vary in strength. Conserved contacts thus not only define 3-dimensional protein structure, but also indicate which residue-residue interactions are crucial to a protein’s function. Therefore, prediction of strongly coevolving residue-pairs helps clarify molecular mechanisms underlying function. Previously, various coevolution detectors have been employed separately to predict these pairs purely from multiple sequence alignments, while disregarding available structural information. This study introduces an integrative framework that improves the accuracy of such predictions, relative to previous approaches, by combining multiple coevolution detectors and incorporating structural contact information. This framework is applied to the ABC-B and ABC-C transporter families, which include the drug exporter P-glycoprotein involved in multidrug resistance of cancer cells, as well as the CFTR chloride channel linked to cystic fibrosis. The predicted coevolving pairs are further analyzed based on conformational changes inferred from outward- and inward-facing transporter structures. The analysis suggests that some pairs coevolved to directly regulate conformational changes of the alternating-access transport mechanism, while others coevolved to stabilize rigid-body-like components of the protein structure. Moreover, some identified pairs correspond to residues previously implicated in cystic fibrosis.