
    Capturing coevolutionary signals in repeat proteins

    The analysis of correlations of amino acid occurrences in globular proteins has led to the development of statistical tools that can identify native contacts -- portions of the chains that come to close distance in folded structural ensembles. Here we introduce a statistical coupling analysis for repeat proteins -- natural systems for which the identification of domains remains challenging. We show that the inherent translational symmetry of repeat protein sequences introduces a strong bias in the pair correlations at precisely the length scale of the repeat-unit. Equalizing for this bias reveals true co-evolutionary signals from which local native-contacts can be identified. Importantly, parameter values obtained for all other interactions are not significantly affected by the equalization. We quantify the robustness of the procedure and assign confidence levels to the interactions, identifying the minimum number of sequences needed to extract evolutionary information in several repeat protein families. The overall procedure can be used to reconstruct the interactions at long distances, identifying the characteristics of the strongest couplings in each family, and can be applied to any system that appears translationally symmetric
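The repeat-length bias described in this abstract can be illustrated with a toy calculation. The sketch below is not the authors' statistical coupling analysis: it assumes plain mutual information between alignment columns as the pair-correlation measure, and the function names and the toy alignment are hypothetical. It averages the signal by column separation, so an inflated value at the repeat-unit length becomes visible.

```python
import math
from collections import Counter


def mutual_information(col_a, col_b):
    """Mutual information (in nats) between two alignment columns."""
    n = len(col_a)
    count_a = Counter(col_a)
    count_b = Counter(col_b)
    count_ab = Counter(zip(col_a, col_b))
    mi = 0.0
    for (a, b), c in count_ab.items():
        # p(a,b) * log( p(a,b) / (p(a) * p(b)) ), written with raw counts
        mi += (c / n) * math.log(c * n / (count_a[a] * count_b[b]))
    return mi


def mean_mi_by_separation(alignment):
    """Average mutual information over all column pairs at each separation."""
    length = len(alignment[0])
    columns = [[seq[i] for seq in alignment] for i in range(length)]
    sums = [0.0] * length
    counts = [0] * length
    for i in range(length):
        for j in range(i + 1, length):
            sums[j - i] += mutual_information(columns[i], columns[j])
            counts[j - i] += 1
    return {d: sums[d] / counts[d] for d in range(1, length)}


# Toy alignment (hypothetical): each sequence is a 2-residue unit repeated twice,
# so columns two positions apart are identical by construction.
toy_alignment = ["AAAA", "ABAB", "BABA", "BBBB"]
profile = mean_mi_by_separation(toy_alignment)
```

In this toy case the profile peaks at separation 2, the length of the repeated unit, even though no structural contact is encoded. That is the kind of symmetry-induced signal the abstract says must be equalized before coevolutionary couplings are interpreted.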

    Detecting Repetitions and Periodicities in Proteins by Tiling the Structural Space

    The notion of energy landscapes provides conceptual tools for understanding the complexities of protein folding and function. Energy Landscape Theory indicates that it is much easier to find sequences that satisfy the "Principle of Minimal Frustration" when the folded structure is symmetric (Wolynes, P. G. Symmetry and the Energy Landscapes of Biomolecules. Proc. Natl. Acad. Sci. U.S.A. 1996, 93, 14249-14255). Similarly, repeats and structural mosaics may be fundamentally related to landscapes with multiple embedded funnels. Here we present analytical tools to detect and compare structural repetitions in protein molecules. By an exhaustive analysis of the distribution of structural repeats using a robust metric we define those portions of a protein molecule that best describe the overall structure as a tessellation of basic units. The patterns produced by such tessellations provide intuitive representations of the repeating regions and their association towards higher order arrangements. We find that some protein architectures can be described as nearly periodic, while in others clear separations between repetitions exist. Since the method is independent of amino acid sequence information we can identify structural units that can be encoded by a variety of distinct amino acid sequences

    ConSole: using modularity of contact maps to locate solenoid domains in protein structures.

    Background: Periodic proteins, characterized by the presence of multiple repeats of short motifs, form an interesting and seldom-studied group. Due to often extreme divergence in sequence, detection and analysis of such motifs is performed more reliably on the structural level. Yet, few algorithms have been developed for the detection and analysis of structures of periodic proteins. Results: ConSole recognizes modularity in protein contact maps, allowing for precise identification of repeats in solenoid protein structures, an important subgroup of periodic proteins. Tests on benchmarks show that ConSole has higher recognition accuracy as compared to Raphael, the only other publicly available solenoid structure detection tool. As a next step of ConSole analysis, we show how detection of solenoid repeats in structures can be used to improve sequence recognition of these motifs and to detect subtle irregularities of repeat lengths in three solenoid protein families. Conclusions: The ConSole algorithm provides a fast and accurate tool to recognize solenoid protein structures as a whole and to identify individual solenoid repeat units from a structure. ConSole is available as a web-based, interactive server and is available for download at http://console.sanfordburnham.org

    Phylogenetic differences in content and intensity of periodic proteins

    Many proteins exhibit sequence periodicity, often correlated with a visible structural periodicity. The statistical significance of such periodicity can be assessed by means of a chi-square-based test, with significance thresholds being calculated from shuffled sequences. Comparison of the complete proteomes of 45 species reveals striking differences in the proportion of periodic proteins and the intensity of the most significant periodicities. Eukaryotes tend to have a higher proportion of periodic proteins than eubacteria, which in turn tend to have more than archaea. The intensity of periodicity in the most periodic proteins is also greatest in eukaryotes. By contrast, the relatively small group of periodic proteins in archaea also tend to be weakly periodic compared to those of eukaryotes and eubacteria. Exceptions to this general rule are found in those prokaryotes with multicellular life-cycle phases, e.g. Methanosarcina sps. or Anabaena sps., which have more periodicities than prokaryotes in general, and in unicellular eukaryotes, which have fewer than multicellular eukaryotes. The distribution of significantly periodic proteins in eukaryotes is over a wide range of period lengths, whereas prokaryotic proteins typically have a more limited set of period lengths. This is further investigated by repeating the analysis on the NRL-3D database of proteins of solved structure. Some short range periodicities are explicable in terms of basic secondary structure, e.g. alpha helices, while middle range periodicities are frequently found to consist of known short Pfam domains, e.g. leucine-rich repeats, tetratricopeptides or armadillo domains. However, not all can be explained in this way
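The chi-square test with shuffled-sequence thresholds mentioned in this abstract can be sketched roughly as follows. The abstract does not give the exact statistic, so this is an assumption: the version below builds a residue-by-phase contingency table (phase is the position modulo the candidate period) and estimates significance empirically from shuffles of the same sequence. The function names, the shuffle count, and the example sequence are all hypothetical.

```python
import random
from collections import Counter


def periodicity_chi2(seq, period):
    """Chi-square statistic of the residue-by-phase contingency table."""
    n = len(seq)
    residue_totals = Counter(seq)
    phase_totals = Counter(i % period for i in range(n))
    observed = Counter((aa, i % period) for i, aa in enumerate(seq))
    chi2 = 0.0
    for aa, aa_count in residue_totals.items():
        for phase, phase_count in phase_totals.items():
            # Expected count if residue identity were independent of phase
            expected = aa_count * phase_count / n
            diff = observed[(aa, phase)] - expected
            chi2 += diff * diff / expected
    return chi2


def shuffled_p_value(seq, period, n_shuffles=200, seed=0):
    """Empirical p-value: share of shuffles scoring at least as high."""
    rng = random.Random(seed)
    actual = periodicity_chi2(seq, period)
    residues = list(seq)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(residues)  # keeps composition, destroys positional order
        if periodicity_chi2(residues, period) >= actual:
            hits += 1
    # Add-one correction so the estimate is never exactly zero
    return (hits + 1) / (n_shuffles + 1)


# Hypothetical example: a perfect 7-residue repeat
example = "LAGKTPV" * 10
score = periodicity_chi2(example, 7)
p_value = shuffled_p_value(example, 7)
```

Shuffling preserves amino acid composition, so the threshold reflects only positional ordering. A proteome-wide scan would additionally need to correct for testing many period lengths, which this sketch leaves out.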

    Structural and Energetic Characterization of the Ankyrin Repeat Protein Family

    Ankyrin repeat containing proteins are one of the most abundant solenoid folds. Usually implicated in specific protein-protein interactions, these proteins are readily amenable for design, with promising biotechnological and biomedical applications. Studying repeat protein families presents technical challenges due to the high sequence divergence among the repeating units. We developed and applied a systematic method to consistently identify and annotate the structural repetitions over the members of the complete Ankyrin Repeat Protein Family, with increased sensitivity over previous studies. We statistically characterized the number of repeats, the folding of the repeat-arrays, their structural variations, insertions and deletions. An energetic analysis of the local frustration patterns reveals the basic features underlying fold stability and its relation to the functional binding regions. We found a strong linear correlation between the conservation of the energetic features in the repeat arrays and their sequence variations, and discuss new insights into the organization and function of these ubiquitous proteins.
    Fil: Parra, Rodrigo Gonzalo. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Ciudad Universitaria. Instituto de Química Biológica de la Facultad de Ciencias Exactas y Naturales. Universidad de Buenos Aires. Facultad de Ciencias Exactas y Naturales. Instituto de Química Biológica de la Facultad de Ciencias Exactas y Naturales; Argentina
    Fil: Espada, Rocío. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Ciudad Universitaria. Instituto de Química Biológica de la Facultad de Ciencias Exactas y Naturales. Universidad de Buenos Aires. Facultad de Ciencias Exactas y Naturales. Instituto de Química Biológica de la Facultad de Ciencias Exactas y Naturales; Argentina
    Fil: Verstraete, Nina. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Ciudad Universitaria. Instituto de Química Biológica de la Facultad de Ciencias Exactas y Naturales. Universidad de Buenos Aires. Facultad de Ciencias Exactas y Naturales. Instituto de Química Biológica de la Facultad de Ciencias Exactas y Naturales; Argentina
    Fil: Ferreiro, Diego Ulises. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Ciudad Universitaria. Instituto de Química Biológica de la Facultad de Ciencias Exactas y Naturales. Universidad de Buenos Aires. Facultad de Ciencias Exactas y Naturales. Instituto de Química Biológica de la Facultad de Ciencias Exactas y Naturales; Argentina

    Structure- and context-based analysis of the GxGYxYP family reveals a new putative class of glycoside hydrolase.

    Background: Gut microbiome metagenomics has revealed many protein families and domains found largely or exclusively in that environment. Proteins containing the GxGYxYP domain are over-represented in the gut microbiota, and are found in Polysaccharide Utilization Loci in the gut symbiont Bacteroides thetaiotaomicron, suggesting their involvement in polysaccharide metabolism, but little else is known of the function of this domain. Results: Genomic context and domain architecture analyses support a role for the GxGYxYP domain in carbohydrate metabolism. Sparse occurrences in eukaryotes are the result of lateral gene transfer. The structure of the GxGYxYP domain-containing protein encoded by the BT2193 locus reveals two structural domains, the first composed of three divergent repeats with no recognisable homology to previously solved structures, the second a more familiar seven-stranded β/α barrel. Structure-based analyses including conservation mapping localise a presumed functional site to a cleft between the two domains of BT2193. Matching to a catalytic site template from a GH9 cellulase and other analyses point to a putative catalytic triad composed of Glu272, Asp331 and Asp333. Conclusions: We suggest that GxGYxYP-containing proteins constitute a novel glycoside hydrolase family of as yet unknown specificity.

    Of bits and bugs

    Pur-α is a nucleic acid-binding protein involved in cell cycle control, transcription, and neuronal function. Initially no prediction of the three-dimensional structure of Pur-α was possible. However, recently we solved the X-ray structure of Pur-α from the fruitfly Drosophila melanogaster and showed that it contains a so-called PUR domain. Here we explain how we exploited bioinformatics tools in combination with X-ray structure determination of a bacterial homolog to obtain diffracting crystals and the high-resolution structure of Drosophila Pur-α. First, we used sensitive methods for remote-homology detection to find three repetitive regions in Pur-α. We realized that our lack of understanding of how these repeats interact to form a globular domain was a major problem for crystallization and structure determination. With our information on the repeat motifs we then identified a distant bacterial homolog that contains only one repeat. We determined the bacterial crystal structure and found that two of the repeats interact to form a globular domain. Based on this bacterial structure, we calculated a computational model of the eukaryotic protein. The model allowed us to design a crystallizable fragment and to determine the structure of Drosophila Pur-α. Key to success was the fact that single repeats of the bacterial protein self-assembled into a globular domain, instructing us on the number and boundaries of repeats to be included for crystallization trials with the eukaryotic protein. This study demonstrates that the simpler structural domain arrangement of a distant prokaryotic protein can guide the design of eukaryotic crystallization constructs. Since many eukaryotic proteins contain multiple repeats or repeating domains, this approach might be instructive for structural studies of a range of proteins.

    Structure of the saxiphilin:saxitoxin (STX) complex reveals a convergent molecular recognition strategy for paralytic toxins.

    Dinoflagellates and cyanobacteria produce saxitoxin (STX), a lethal bis-guanidinium neurotoxin causing paralytic shellfish poisoning. A number of metazoans have soluble STX-binding proteins that may prevent STX intoxication. However, their STX molecular recognition mechanisms remain unknown. Here, we present structures of saxiphilin (Sxph), a bullfrog high-affinity STX-binding protein, alone and bound to STX. The structures reveal a novel high-affinity STX-binding site built from a "proto-pocket" on a transferrin scaffold that also bears thyroglobulin domain protease inhibitor repeats. Comparison of Sxph and voltage-gated sodium channel STX-binding sites reveals a convergent toxin recognition strategy comprising a largely rigid binding site where acidic side chains and a cation-π interaction engage STX. These studies reveal molecular rules for STX recognition, outline how a toxin-binding site can be built on a naïve scaffold, and open a path to developing protein sensors for environmental STX monitoring and new biologics for STX intoxication mitigation.