    Improved prediction of critical residues for protein function based on network and phylogenetic analyses

    BACKGROUND: Phylogenetic approaches are commonly used to predict which amino acid residues are critical to the function of a given protein. However, such approaches display inherent limitations, such as the requirement for identification of multiple homologues of the protein under consideration. Therefore, complementary or alternative approaches for the prediction of critical residues would be desirable. Network analyses have been used in the modelling of many complex biological systems, but only very recently have they been used to predict critical residues from a protein's three-dimensional structure. Here we compare a couple of phylogenetic approaches to several different network-based methods for the prediction of critical residues, and show that a combination of one phylogenetic method and one network-based method is superior to other methods previously employed. RESULTS: We associate a network with each member of a set of proteins for which the three-dimensional structure is known and the critical residues have been previously determined experimentally. We show that several network-based centrality measurements (connectivity, 2-connectivity, closeness centrality, betweenness and cluster coefficient) accurately detect residues critical for the protein's function. Phylogenetic approaches render predictions as reliable as the network-based measurements, although, interestingly, the two general approaches tend to predict different sets of critical residues. Hence we propose a hybrid method that is composed of one network-based calculation – the closeness centrality – and one phylogenetic approach – the Conseq server. This hybrid approach predicts critical residues more accurately than the other methods tested here. CONCLUSION: We show that network analysis can be used to improve the prediction of amino acids critical for protein function, when utilized in combination with phylogenetic approaches. 
It is proposed that such improvement is due to the complementary nature of these approaches: network-based methods tend to predict as critical those residues that are highly connected and internal (i.e., non-surface), although some surface residues are indeed identified as critical by network analyses; whereas residues chosen by phylogenetic approaches display a lower overall probability of being surface-inaccessible.
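The closeness centrality used in the hybrid method can be illustrated with a minimal sketch. It assumes one representative 3D point per residue and a hypothetical distance cutoff for defining contacts; neither the function names nor the cutoff come from the paper, and the authors' actual network construction may differ.

```python
from collections import deque
from math import dist


def contact_network(coords, cutoff=8.0):
    """Link residues i and j when their points lie within `cutoff` (assumed value)."""
    n = len(coords)
    adj = {i: [] for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if dist(coords[i], coords[j]) <= cutoff:
                adj[i].append(j)
                adj[j].append(i)
    return adj


def closeness_centrality(adj):
    """Closeness of v = (number of reachable nodes) / (sum of shortest-path lengths from v)."""
    scores = {}
    for source in adj:
        # Breadth-first search gives shortest path lengths in an unweighted graph.
        seen = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in seen:
                    seen[w] = seen[u] + 1
                    queue.append(w)
        total = sum(seen.values())
        scores[source] = (len(seen) - 1) / total if total else 0.0
    return scores


# Toy example (not real protein data): five points on a line, 5 units apart.
toy_coords = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0), (10.0, 0.0, 0.0),
              (15.0, 0.0, 0.0), (20.0, 0.0, 0.0)]
toy_adj = contact_network(toy_coords, cutoff=6.0)
toy_scores = closeness_centrality(toy_adj)
most_central = max(toy_scores, key=toy_scores.get)
```

In this toy chain the middle point scores highest, which mirrors the abstract's observation that network measures favour well-connected, internal residues.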

    Phylogenetic and functional analysis of the Cation Diffusion Facilitator (CDF) family: improved signature and prediction of substrate specificity

    BACKGROUND: The Cation Diffusion Facilitator (CDF) family is a ubiquitous family of heavy metal transporters. Much interest in this family has focused on implications for human health and bioremediation. In this work a broad phylogenetic study has been undertaken which, considered in the context of the functional characteristics of some fully characterised CDF transporters, has aimed at identifying molecular determinants of substrate selectivity and at suggesting metal specificity for newly identified CDF transporters. RESULTS: Representative CDF members from all three kingdoms of life (Archaea, Eubacteria, Eukaryotes) were retrieved from genomic databases. Protein sequence alignment has allowed detection of a modified signature that can be used to identify new hypothetical CDF members. Phylogenetic reconstruction has classified the majority of CDF family members into three groups, each containing characterised members that share the same specificity towards the principally-transported metal, i.e. Zn, Fe/Zn or Mn. The metal selectivity of newly identified CDF transporters can be inferred by their position in one of these groups. The function of some conserved amino acids was assessed by site-directed mutagenesis in the poplar Zn2+ transporter PtdMTP1 and compared with similar experiments performed in prokaryotic members. An essential structural role can be assigned to a widely conserved glycine residue, while aspartate and histidine residues, highly conserved in putative transmembrane domains, might be involved in metal transport. The potential role of group-conserved amino acid residues in metal specificity is discussed. CONCLUSION: In the present study phylogenetic and functional analyses have allowed the identification of three major substrate-specific CDF groups. The metal selectivity of newly identified CDF transporters can be inferred by their position in one of these groups. 
The modified signature sequence proposed in this work can be used to identify new hypothetical CDF members.

    Functional divergence of microtubule-associated TPX2 family members in Arabidopsis thaliana

    TPX2 (Targeting Protein for Xklp2) is an evolutionarily conserved microtubule-associated protein important for microtubule nucleation and mitotic spindle assembly. The protein was described as an activator of the mitotic kinase Aurora A in humans and the Arabidopsis AURORA1 (AUR1) kinase. In contrast to animal genomes that encode only one TPX2 gene, higher plant genomes encode a family with several TPX2-LIKE gene members (TPXL). TPXL genes of Arabidopsis can be divided into two groups. Group A proteins (TPXL2, 3, 4, and 8) contain Aurora binding and TPX2_importin domains, while group B proteins (TPXL1, 5, 6, and 7) harbor an Xklp2 domain. Canonical TPX2 contains all the above-mentioned domains. We confirmed using in vitro kinase assays that the group A proteins contain a functional Aurora kinase binding domain. Transient expression of Arabidopsis TPX2-like proteins in Nicotiana benthamiana revealed preferential localization to microtubules and nuclei. Co-expression of AUR1 together with TPX2-like proteins changed the localization of AUR1, indicating that these proteins serve as targeting factors for Aurora kinases. Taken together, we visualize the various localizations of the TPX2-LIKE family in Arabidopsis as a proxy for their functional divergence and provide evidence of their role in the targeted regulation of AUR1 kinase activity.

    Computational Molecular Coevolution

    A major goal in computational biochemistry is to obtain three-dimensional structure information from protein sequence. Coevolution represents a biological mechanism through which structural information can be obtained from a family of protein sequences. Evolutionary relationships within a family of protein sequences are revealed through sequence alignment. Statistical analyses of these sequence alignments reveal positions in the protein family that covary, and thus appear to be dependent on one another throughout the evolution of the protein family. These covarying positions are inferred to be coevolving via one of two biological mechanisms, both of which imply that coevolution is facilitated by inter-residue contact. Thus, high-quality multiple sequence alignments and robust coevolution-inferring statistics can produce structural information from sequence alone. This work characterizes the relationship between coevolution statistics and sequence alignments and highlights the implicit assumptions and caveats associated with coevolutionary inference. An investigation of sequence alignment quality and coevolutionary-inference methods revealed that such methods are very sensitive to the systematic misalignments discovered in public databases. However, repairing the misalignments in such alignments restores the predictive power of coevolution statistics. To overcome the sensitivity to misalignments, two novel coevolution-inferring statistics were developed that show increased contact prediction accuracy, especially in alignments that contain misalignments. These new statistics were developed into a suite of coevolution tools, the MIpToolset. Because systematic misalignments produce a distinctive pattern when analyzed by coevolution-inferring statistics, a new method for detecting systematic misalignments was created to exploit this phenomenon. 
This new method, called "local covariation", was used to analyze publicly available multiple sequence alignment databases. Local covariation detected putative misalignments in a database designed to benchmark sequence alignment software accuracy. Local covariation was incorporated into a new software tool, LoCo, which displays regions of potential misalignment during alignment editing and assists in their correction. This work represents advances in multiple sequence alignment creation and coevolutionary inference.
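To make the idea of covarying alignment positions concrete, here is a minimal sketch that computes plain mutual information between two columns of a toy alignment. This is the textbook statistic only, not the corrected statistics or the local covariation method developed in the thesis; the function names and the four-sequence alignment are invented for illustration.

```python
from collections import Counter
from math import log2


def column(alignment, i):
    """Return the residues found at position i across all aligned sequences."""
    return [seq[i] for seq in alignment]


def mutual_information(col_a, col_b):
    """Mutual information (in bits) between two alignment columns."""
    n = len(col_a)
    count_a = Counter(col_a)
    count_b = Counter(col_b)
    count_ab = Counter(zip(col_a, col_b))
    mi = 0.0
    for (a, b), count in count_ab.items():
        p_joint = count / n
        p_indep = (count_a[a] / n) * (count_b[b] / n)
        mi += p_joint * log2(p_joint / p_indep)
    return mi


# Toy alignment (hypothetical): columns 0 and 1 change together,
# while column 2 varies independently of column 0.
toy_alignment = ["AKL", "AKV", "GEL", "GEV"]
mi_covarying = mutual_information(column(toy_alignment, 0), column(toy_alignment, 1))
mi_independent = mutual_information(column(toy_alignment, 0), column(toy_alignment, 2))
```

Here the covarying pair scores 1 bit and the independent pair scores 0. Real alignments add background signal from phylogeny and, as the abstract stresses, from misalignment, which is why corrected statistics are needed in practice.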

    Information Theory in Molecular Evolution: From Models to Structures and Dynamics

    This Special Issue collects novel contributions from scientists in the interdisciplinary field of biomolecular evolution. Works listed here use information theoretical concepts as a core but are tightly integrated with the study of molecular processes. Applications include the analysis of phylogenetic signals to elucidate biomolecular structure and function, the study and quantification of structural dynamics and allostery, as well as models of molecular interaction specificity inspired by evolutionary cues.

    Accounting for epistatic interactions improves the functional analysis of protein structures

    Motivation: The constraints under which sequence, structure and function coevolve are not fully understood. Bringing this mutual relationship to light can reveal the molecular basis of binding, catalysis and allostery, thereby identifying function and rationally guiding protein redesign. Underlying these relationships are the epistatic interactions that occur when the consequences of a mutation to a protein are determined by the genetic background in which it occurs. Based on prior data, we hypothesize that epistatic forces operate most strongly between residues nearby in the structure, resulting in smooth evolutionary importance across the structure. Methods and Results: We find that when residue scores of evolutionary importance are distributed smoothly between nearby residues, functional site prediction accuracy improves. Accordingly, we designed a novel measure of evolutionary importance that focuses on the interaction between pairs of structurally neighboring residues. This measure, which we term pair-interaction Evolutionary Trace, yields greater functional site overlap and better structure-based proteome-wide functional predictions. Conclusions: Our data show that the structural smoothness of evolutionary importance is a fundamental feature of the coevolution of sequence, structure and function. Mutations operate on individual residues, but selective pressure depends in part on the extent to which a mutation perturbs interactions with neighboring residues. In practice, this principle led us to redefine the importance of a residue in terms of the importance of its epistatic interactions with neighbors, yielding better annotation of functional residues, motivating experimental validation of a novel functional site in LexA and refining protein function prediction. Supplementary information: Supplementary data are available at Bioinformatics online.

    Amino acid positions subject to multiple co-evolutionary constraints can be robustly identified by their eigenvector network centrality scores

    As proteins evolve, amino acid positions key to protein structure or function are subject to mutational constraints. These positions can be detected by analyzing sequence families for amino acid conservation or for co-evolution between pairs of positions. Co-evolutionary scores are usually rank-ordered and thresholded to reveal the top pairwise scores, but they can also be treated as weighted networks. Here, we used network analyses to bypass a major complication of co-evolution studies: for a given sequence alignment, alternative algorithms usually identify different top pairwise scores. We reconciled results from five commonly used, mathematically divergent algorithms (ELSC, McBASC, OMES, SCA, and ZNMI), using the LacI/GalR and 1,6-bisphosphate aldolase protein families as models. Calculations used unthresholded co-evolution scores from which column-specific properties such as sequence entropy and random noise were subtracted; “central” positions were identified by calculating various network centrality scores. When compared among algorithms, network centrality methods, particularly eigenvector centrality, showed markedly better agreement than comparisons of the top pairwise scores. Positions with large centrality scores occurred at key structural locations and/or were functionally sensitive to mutations. Further, the top central positions often differed from those with top pairwise co-evolution scores: instead of a few strong scores, central positions often had multiple, moderate scores. We conclude that eigenvector centrality calculations reveal a robust evolutionary pattern of constraints – detectable by divergent algorithms – that occur at key protein locations. Finally, we discuss the fact that multiple patterns co-exist in evolutionary data that, together, give rise to emergent protein functions.
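Treating unthresholded co-evolution scores as a weighted network and ranking positions by eigenvector centrality can be sketched as follows. This is a generic power-iteration implementation under stated assumptions (a symmetric, non-negative score matrix; a shift by the identity to avoid oscillation); the function name and the toy matrix are hypothetical and do not reproduce the authors' pipeline or their score corrections.

```python
def eigenvector_centrality(weights, iterations=200, tol=1e-10):
    """Power iteration on (W + I) for a symmetric, non-negative weight matrix W.

    Adding the identity leaves the eigenvectors unchanged but prevents the
    iteration from oscillating on bipartite-like networks.
    """
    n = len(weights)
    x = [1.0 / n] * n
    for _ in range(iterations):
        y = [x[i] + sum(weights[i][j] * x[j] for j in range(n)) for i in range(n)]
        norm = sum(v * v for v in y) ** 0.5
        if norm == 0.0:
            return x
        y = [v / norm for v in y]
        if max(abs(a - b) for a, b in zip(x, y)) < tol:
            return y
        x = y
    return x


# Toy score matrix (hypothetical): position 0 has moderate scores with
# positions 1 and 2, which have no score with each other.
toy_scores = [
    [0.0, 0.5, 0.5],
    [0.5, 0.0, 0.0],
    [0.5, 0.0, 0.0],
]
centrality = eigenvector_centrality(toy_scores)
top_position = max(range(len(centrality)), key=lambda i: centrality[i])
```

Position 0 comes out as the most central because it accumulates weight from several partners, which is the sense in which the abstract describes central positions as having multiple moderate scores.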