233 research outputs found

    Dynamics of domain coverage of the protein sequence universe

    Background: The currently known protein sequence space consists of millions of sequences in public databases and is rapidly expanding. Assigning sequences to families leads to a better understanding of protein function and the nature of the protein universe. However, a large portion of the current protein space remains unassigned and is referred to as its “dark matter”. Results: Here we suggest that the true size of “dark matter” is much larger than stated by current definitions. We propose an approach to reducing the size of “dark matter” by identifying and subtracting regions in protein sequences that are not likely to contain any domain. Conclusions: Recent improvements in computational domain modeling result in a decrease, albeit slowly, in the relative size of “dark matter”; however, its absolute size increases substantially with the growth of sequence data

    DOMMINO: a database of macromolecular interactions

    With the growing number of experimentally resolved structures of macromolecular complexes, it becomes clear that the interactions that involve protein structures are mediated not only by the protein domains, but also by various non-structured regions, such as interdomain linkers, or terminal sequences. Here, we present DOMMINO (http://dommino.org), a comprehensive database of macromolecular interactions that includes the interactions between protein domains, interdomain linkers, N- and C-terminal regions and protein peptides. The database complements SCOP domain annotations with domain predictions by SUPERFAMILY and is automatically updated every week. The database interface is designed to provide the user with a three-stage pipeline to study macromolecular interactions: (i) a flexible search that can include a PDB ID, type of interaction, SCOP family of interacting proteins, organism name, interaction keyword and a minimal threshold on the number of contact pairs; (ii) visualization of subunit interaction network, where the user can investigate the types of interactions within a macromolecular assembly; and (iii) visualization of an interface structure between any pair of the interacting subunits, where the user can highlight several different types of residues within the interfaces as well as study the structure of the corresponding binary complex of subunits

    Structural properties of the linkers connecting the N- and C-terminal domains in the MocR bacterial transcriptional regulators

    Peptide inter-domain linkers are peptide segments covalently linking two adjacent domains within a protein. Linkers play a variety of structural and functional roles in naturally occurring proteins. In this work we analyze the sequence properties of the predicted linker regions of the bacterial transcriptional regulators belonging to the recently discovered MocR subfamily of the GntR regulators. Analyses were carried out on the MocR sequences taken from the phyla Actinobacteria, Firmicutes, Alpha-, Beta- and Gammaproteobacteria. The results suggest that MocR linkers display phylum-specific characteristics and unique features different from those already described for other classes of inter-domain linkers. They show a significantly higher average length: 31.8 ± 14.3 residues, reaching a maximum of about 150 residues. Compositional propensities displayed general and phylum-specific trends. Pro is dominant in all linkers. Dyad propensity analysis indicates Pro–Pro as the most frequent amino acid pair in all linkers. Physicochemical properties of the linker regions were assessed using amino acid indices relative to different features: in general, MocR linkers are flexible, hydrophilic and display propensity for β-turn or coil conformations. Linker sequences are hypervariable: only similarities between MocR linkers from organisms related at the level of species or genus could be found with sequence searches. The results shed light on the properties of the linker regions of the new MocR subfamily of bacterial regulators and may provide knowledge-based rules for designing artificial linkers with desired properties. © 2016 The Author(s)
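
As a rough illustration of what counting amino acid pairs (dyads) in linker sequences involves, the sketch below tallies adjacent residue pairs. It is a simplification under stated assumptions: it reports raw counts rather than the normalised propensities the abstract refers to, the function name `dyad_counts` is hypothetical, and the two sequences are invented for illustration, not taken from the MocR data.

```python
from collections import Counter


def dyad_counts(linkers):
    """Count adjacent residue pairs (dyads) across a list of linker sequences."""
    counts = Counter()
    for seq in linkers:
        # zip(seq, seq[1:]) yields each residue with its immediate successor.
        for first, second in zip(seq, seq[1:]):
            counts[first + second] += 1
    return counts


# Hypothetical linker sequences, invented for illustration only.
example_linkers = ["APPSPPG", "GPPTEPP"]
counts = dyad_counts(example_linkers)
most_common_dyad, occurrences = counts.most_common(1)[0]
print(most_common_dyad, occurrences)
```

A propensity, as opposed to a raw count, would additionally divide each observed dyad frequency by the frequency expected from a chosen background composition; that normalisation is left out here because the abstract does not specify it.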

    Computational modelling of multidomain proteins with covarying residue pairs

    The vast majority of known protein sequences have no solved three-dimensional structure at all, and the remaining ones usually have not been completely characterised, due to the limitations of experimental structural biology techniques. Structural genomics projects have helped increase the coverage of the protein structure universe, but most available structures still consist of either individual domains or sets of relatively small ones. This has prompted the development of computational methods for protein structure prediction, as well as for multidomain architecture modelling. One appealing idea to achieve this goal consists of detecting residue-residue contacts from multiple sequence alignments, under the assumption that they covary in order to maintain the local microenvironment and the overall stability of protein structures. After early limited success, this type of analysis has lately witnessed substantial progress, thanks to theoretical advances in disentangling genuine from spurious instances of correlation. Unsurprisingly, structural bioinformatics has promptly and successfully applied these improved tools to model globular and transmembrane proteins, along with guiding the assembly of protein complexes. However, the efficacy of these methods in the context of multidomain protein modelling has not yet been investigated. In this thesis state-of-the-art methods for predicting contacts from sequence data have been evaluated and used to build models of two-domain protein structures. Firstly, the ability of alternative methods to identify interdomain contacts was examined in a reference set of experimentally solved structures. Secondly, predicted contacts were employed to score docking models and select near-native solutions accordingly. Finally, predicted contacts were used to guide the assembly of individual domains in a multidomain modelling protocol

    Crystal structure of the ZP-N domain of ZP3 reveals the core fold of animal egg coats

    Species-specific recognition between the egg extracellular matrix (zona pellucida) and sperm is the first, crucial step of mammalian fertilization. Zona pellucida filament components ZP3 and ZP2 act as sperm receptors, and mice lacking either of the corresponding genes produce oocytes without a zona pellucida and are completely infertile. Like their counterparts in the vitelline envelope of non-mammalian eggs and many other secreted eukaryotic proteins, zona pellucida subunits polymerize using a 'zona pellucida (ZP) domain' module, whose conserved amino-terminal part (ZP-N) was suggested to constitute a domain of its own. No atomic structure has been reported for ZP domain proteins, and there is no structural information on any conserved vertebrate protein that is essential for fertilization and directly involved in egg-sperm binding. Here we describe the 2.3 Ångström (Å) resolution structure of the ZP-N fragment of mouse primary sperm receptor ZP3. The ZP-N fold defines a new immunoglobulin superfamily subtype with a beta-sheet extension characterized by an E' strand and an invariant tyrosine residue implicated in polymerization. The structure strongly supports the presence of ZP-N repeats within the N-terminal region of ZP2 and other vertebrate zona pellucida/vitelline envelope proteins, with implications for overall egg coat architecture, the post-fertilization block to polyspermy and speciation. Moreover, it provides an important framework for understanding human diseases caused by mutations in ZP domain proteins and developing new methods of non-hormonal contraception

    Investigation of sequence features of hinge-bending regions in proteins with domain movements using kernel logistic regression

    Background: Hinge-bending movements in proteins comprising two or more domains form a large class of functional movements. Hinge-bending regions demarcate protein domains and collectively control the domain movement. Consequently, the ability to recognise sequence features of hinge-bending regions and to be able to predict them from sequence alone would benefit various areas of protein research. For example, an understanding of how the sequence features of these regions relate to dynamic properties in multi-domain proteins would aid in the rational design of linkers in therapeutic fusion proteins. Results: The DynDom database of protein domain movements comprises sequences annotated to indicate whether the amino acid residue is located within a hinge-bending region or within an intradomain region. Using statistical methods and Kernel Logistic Regression (KLR) models, this data was used to determine sequence features that favour or disfavour hinge-bending regions. This is a difficult classification problem as the number of negative cases (intradomain residues) is much larger than the number of positive cases (hinge residues). The statistical methods and the KLR models both show that cysteine has the lowest propensity for hinge-bending regions and proline has the highest, even though it is the most rigid amino acid. As hinge-bending regions have been previously shown to occur frequently at the terminal regions of the secondary structures, the propensity for proline at these regions is likely due to its tendency to break secondary structures. The KLR models also indicate that isoleucine may act as a domain-capping residue. We have found that a quadratic KLR model outperforms a linear KLR model and that improvement in performance occurs up to very long window lengths (eighty residues) indicating long-range correlations. 
    Conclusion: In contrast to the only other approach that focused solely on interdomain hinge-bending regions, the method provides a modest and statistically significant improvement over a random classifier. An explanation of the KLR results is that in the prediction of hinge-bending regions a long-range correlation is at play between a small number of amino acids that either favour or disfavour hinge-bending regions. The resulting sequence-based prediction tool, HingeSeek, is available to run through a webserver at hingeseek.cmp.uea.ac.uk
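
The window lengths mentioned in the abstract refer to the stretch of sequence around each residue that is presented to the classifier. The sketch below shows one common way such windows could be extracted. It is an assumption-laden illustration, not the HingeSeek implementation: the function name `residue_windows`, the padding character `X` and the example sequence are all hypothetical.

```python
def residue_windows(seq, half_width, pad="X"):
    """Return one window of length 2*half_width + 1 centred on each residue.

    Positions that would fall outside the sequence are filled with `pad`,
    so residues near either terminus still receive a full-length window.
    """
    padded = pad * half_width + seq + pad * half_width
    window_length = 2 * half_width + 1
    return [padded[i:i + window_length] for i in range(len(seq))]


# Hypothetical five-residue sequence, invented for illustration only.
windows = residue_windows("MKTAY", half_width=1)
print(windows)
```

Each window would then be encoded numerically and paired with a label saying whether its central residue lies in a hinge-bending region; because hinge residues are far rarer than intradomain residues, any classifier trained on such windows has to cope with that class imbalance.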