88 research outputs found

    Correlation between local structural dynamics of proteins inferred from NMR ensembles and evolutionary dynamics of homologues of known structure: Inherent and evolutionary structural dynamics

    Conformational changes in proteins are extremely important for their biochemical functions. The correlation between inherent conformational variations in a protein and conformational differences in its homologues of known structure is still unclear. In this study, we have used a structural alphabet called Protein Blocks (PBs). PBs abstract protein 3-D structures into 1-D strings over a 16-letter alphabet (a-p) based on the dihedral angles of overlapping pentapeptides. We have analyzed the variations in local conformations, in terms of PBs, in the ensembles of 801 protein structures determined using NMR spectroscopy. In the analysis of concatenated data over all the residues in all the NMR ensembles, we observe that the overall nature of inherent local structural variations in NMR ensembles is similar to the nature of local structural differences in homologous proteins, with a high correlation coefficient of 0.94. High correlation at the alignment positions corresponding to helical and β-sheet regions is to be expected. However, the correlation coefficient obtained by considering only the loop regions is also quite high (0.91). Surprisingly, segregated position-wise analysis shows that this high correlation does not hold for loop regions at the structurally equivalent positions in NMR ensembles and their homologues of known structure. This suggests that the general nature of local structural changes is unique; however, most of the local structural variations in loop regions of NMR ensembles do not correlate with their local structural differences at structurally equivalent positions in homologues.
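The comparison described in this abstract can be illustrated with a minimal sketch. It assumes each structure has already been encoded as a PB string and that the strings are aligned position by position. The function names (`pb_change_counts`, `profile_correlation`) and the pair-counting scheme are illustrative assumptions, not the authors' published procedure.

```python
from collections import Counter
from itertools import combinations
from math import sqrt

PB_LETTERS = "abcdefghijklmnop"  # the 16 Protein Block states (a-p)


def pb_change_counts(aligned_pb_strings):
    """Count unordered PB pairs observed at the same position.

    The input is a list of equal-length PB strings, e.g. the models of one
    NMR ensemble or a set of aligned homologues (hypothetical input format).
    """
    counts = Counter()
    for column in zip(*aligned_pb_strings):
        for x, y in combinations(column, 2):
            # Skip gaps or any symbol that is not one of the 16 PBs.
            if x in PB_LETTERS and y in PB_LETTERS:
                counts[tuple(sorted((x, y)))] += 1
    return counts


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / sqrt(vx * vy)


def profile_correlation(counts_a, counts_b):
    """Correlate two PB-pair frequency profiles over all 136 unordered pairs."""
    pairs = [(x, y) for i, x in enumerate(PB_LETTERS) for y in PB_LETTERS[i:]]
    total_a = sum(counts_a.values())
    total_b = sum(counts_b.values())
    freq_a = [counts_a[p] / total_a for p in pairs]
    freq_b = [counts_b[p] / total_b for p in pairs]
    return pearson(freq_a, freq_b)
```

In this sketch, one profile would come from the models of NMR ensembles and the other from aligned homologues. Restricting the columns to loop positions before counting would correspond to the loop-only comparison mentioned in the abstract.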

    PURE: A webserver for the prediction of domains in unassigned regions in proteins

    Background: Protein domains are the structural and functional units of proteins. The ability to parse proteins into different domains is important for effective classification and for understanding protein structure, function, and evolution, and is hence biologically relevant. Several computational methods are available to identify domains in a sequence. Domain-finding algorithms often employ stringent thresholds to recognize sequence domains. Identifying additional domains can be tedious, involving intense computation and manual intervention, but can lead to a better understanding of overall biological function. In this context, the problem of identifying new domains in the unassigned regions of a protein sequence assumes crucial importance. Results: We had earlier demonstrated that accumulating domain information from sequence homologues can substantially aid the prediction of new domains. In this paper, we propose a computationally intensive, multi-step bioinformatics protocol, implemented as a web server named PURE (Prediction of Unassigned REgions in proteins), for the detailed examination of stretches of unassigned regions in proteins. The query sequence is processed using different automated filtering steps based on length, presence of coiled-coil regions, transmembrane regions, homologous sequences and percentage of secondary structure content. The filtered sequence segments and their sequence homologues are then fed to PSI-BLAST, cd-hit and Hmmpfam. Data from the various programs are integrated, and information regarding the probable domains predicted from the sequence is reported. Conclusion: We have implemented the PURE protocol as a web server for rapid and comprehensive analysis of unassigned regions in proteins. This server integrates data from different programs and provides information about the domains encoded in the unassigned regions.
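As a rough illustration of the first step in such a protocol, the sketch below extracts the stretches of a sequence that are not covered by any assigned domain and keeps only those above a length cutoff. The function name `unassigned_regions` and the default `min_length=30` are hypothetical. The abstract states that a length filter is applied but does not give the threshold.

```python
def unassigned_regions(seq_length, domains, min_length=30):
    """Return 1-based, inclusive (start, end) stretches not covered by a domain.

    `domains` is a list of (start, end) domain assignments, which may overlap.
    `min_length` is an assumed cutoff, not the value used by the PURE server.
    """
    regions = []
    cursor = 1  # first residue not yet covered by a domain
    for start, end in sorted(domains):
        # Gap between the last covered residue and the start of this domain.
        if start - cursor >= min_length:
            regions.append((cursor, start - 1))
        cursor = max(cursor, end + 1)
    # Tail of the sequence after the last domain.
    if seq_length - cursor + 1 >= min_length:
        regions.append((cursor, seq_length))
    return regions
```

For example, `unassigned_regions(300, [(50, 120), (100, 180)])` yields the N-terminal stretch 1-49 and the C-terminal stretch 181-300. These would then be passed on to the later filters (coiled-coil, transmembrane, secondary structure content) and to the homology searches.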

    Cross-Genome Comparisons of Newly Identified Domains in Mycoplasma gallisepticum and Domain Architectures with Other Mycoplasma species

    Accurate functional annotation of protein sequences is hampered by important factors such as the failure of sequence search methods to identify relationships and the inherent diversity in function of proteins related at low sequence similarities. Earlier, we had employed an intermediate sequence search approach to establish new domain relationships in the unassigned regions of gene products at the whole-genome level, taking Mycoplasma gallisepticum as a specific example. In this paper, we report a detailed comparison of the conservation status of the domains and domain architectures of the gene products that bear our newly predicted domains across 14 other Mycoplasma genomes, and we report the probable implications for the organisms. Some of the domain associations observed in Mycoplasma species that afflict humans and other primates are involved in regulation of solute transport and DNA binding, suggesting specific modes of host-pathogen interactions.

    CUSP: an algorithm to distinguish structurally conserved and unconserved regions in protein domain alignments and its application in the study of large length variations

    Background: Distantly related proteins adopt and retain similar structural scaffolds despite length variations that can be as much as two-fold in some protein superfamilies. In this paper, we describe an analysis of indel regions that accommodate length variations amongst related proteins. We have developed an algorithm, CUSP, to examine multi-membered PASS2 superfamily alignments and identify indel regions in an automated manner. Further, we have used the method to characterize the length, structural type and biochemical features of indels in related protein domains. Results: CUSP examines protein domain structural alignments to distinguish regions of conserved structure common to related proteins from structurally unconserved regions that vary in length and type of structure. On a non-redundant dataset of 353 domain superfamily alignments from PASS2, we find that 'length-deviant' protein superfamilies show > 30% length variation from their average domain length. 60% of additional lengths that occur in indels are short-length structures (< 5 residues), while 6% of indels are > 15 residues in length. Structural types in indels also show class-specific trends. Conclusion: The extent of length variation varies across different superfamilies, and indels show class-specific trends for preferred lengths and structural types. Such indels of different lengths, even within a single protein domain superfamily, could have structural and functional consequences that drive their selection, underlying their importance in similarity detection and computational modelling. The availability of systematic algorithms like CUSP should enable decision making in a domain superfamily-specific manner.
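The length thresholds quoted in this abstract can be made concrete with a small sketch. It is not the CUSP algorithm itself, which works on structural alignments. It only shows one possible reading of "> 30% length variation from the average domain length" and of the indel length bins. The function names, and the "medium" label for indels of 5 to 15 residues, are assumptions.

```python
from statistics import mean


def max_length_deviation(domain_lengths):
    """Largest percent deviation of any member's length from the family mean."""
    avg = mean(domain_lengths)
    return max(abs(n - avg) for n in domain_lengths) / avg * 100


def is_length_deviant(domain_lengths, threshold_pct=30.0):
    """Flag a superfamily whose members deviate by more than the threshold.

    The 30% default follows the abstract. Taking the maximum over members
    is an assumed interpretation.
    """
    return max_length_deviation(domain_lengths) > threshold_pct


def indel_length_class(length):
    """Bin an indel by length using the cutoffs quoted in the abstract."""
    if length < 5:
        return "short"
    if length > 15:
        return "long"
    return "medium"  # hypothetical label for the 5-15 residue range
```

Under this reading, a superfamily with domain lengths 80, 100 and 180 has a mean of 120 and a maximum deviation of 50%, so it would be flagged as length-deviant.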

    Analysis on conservation of disulphide bonds and their structural features in homologous protein domain families

    Background: Disulphide bridges are well known to play key roles in the stability, folding and functions of proteins. Introduction or deletion of disulphides by site-directed mutagenesis has produced varying effects on stability and folding, depending upon the protein and the location of the disulphide in the 3-D structure. Given the lack of complete understanding, it is worthwhile to learn from an analysis of the extent of conservation of disulphides in homologous proteins. We have also addressed the question of which structural interactions replace, in one homologue, a disulphide present in another. Results: Using a dataset involving 34,752 pairwise comparisons of homologous protein domains corresponding to 300 protein domain families of known 3-D structure, we provide a comprehensive analysis of the extent of conservation of disulphide bridges and their structural features. We report that only 54% of all the disulphide bonds compared between the homologous pairs are conserved, although a small fraction of the non-conserved disulphides occur in cytoplasmic proteins. Also, only about one fourth of the distinct disulphides are conserved in all the members of protein families. We note that while conservation of disulphides is common in many families, disulphide bond mutations are quite prevalent. Interestingly, we note that there is no clear relationship between the sequence identity of two homologous proteins and disulphide bond conservation. Our analysis of structural features at the sites where cysteines forming a disulphide in one homologue are replaced by non-Cys residues shows that the elimination of a disulphide in a homologue need not always result in stabilizing interactions between equivalent residues. Conclusion: We observe that in homologous proteins, disulphide bonds are conserved only to a modest extent. Very interestingly, we note that the extent of conservation of disulphides in homologous proteins is unrelated to the overall sequence identity between homologues. The non-conserved disulphides are often associated with variable structural features that were recruited for differentiation or specialisation of protein function.

    Genetic background and immunological status influence B cell repertoire diversity in mice

    The relationship between the immune repertoire and the physiopathological status of individuals is essential to understanding the genesis and evolution of numerous pathologies. Nevertheless, the methodological approaches needed to understand these complex interactions are challenging. We performed a study evaluating the diversity harbored by different immune repertoires as a function of their physiopathological status. In this study, we base our analysis on a previously described murine scFv library representing four different immune repertoires: i) healthy and naïve, ii) healthy and immunized, iii) autoimmune prone and naïve, and iv) autoimmune prone and immunized. This library, 2.6 × 10^9 in size, was subjected to high-throughput sequencing (Next Generation Sequencing, NGS) in order to analyze the gene subgroups encoding immunoglobulins. A comparative study of the distribution of immunoglobulin gene subgroups present in the four libraries has revealed shifts in the B cell repertoire originating from differences in the genetic background and immunological status of mice.
    The adaptive immune system is capable of producing antibodies against a large number of immunogens. This vast diversity of immunoglobulin sequences is not provided by the limited number of genes present in the genome, but by rearrangements of the germline at specific loci. In the case of B cell receptors, rearrangement of variable (V), diversity (D), and joining (J) gene segments in the V-domain creates combinatorial diversity for the immunoglobulin heavy chain (IGH), whereas rearrangement of V and J gene segments provides a similar diversity for the lambda or kappa light chains (IGL/IGK) [1] (Fig. 1). Additionally, at the junctions of V-D and D-J segments, a process of random deletion and addition of nucleotides creates an immense junctional diversity. Finally, somatic hypermutations focused on the Complementarity Determining Regions (CDR) supplement the mechanisms of immunoglobulin maturation, expanding the diversity still further and leading to high-affinity, specific antibodies. Studies have shown that this vast diversity, as well as other characteristics of the immune repertoire, can be influenced by factors such as immunization [2,3] or pathology, notably autoimmune diseases [4-6]. Generation of antibody libraries is a crucial step in the attempt to study in vivo immune repertoires [7,8]. Care needs to be taken to cover a large antibody sequence diversity in order to mimic the natural B cell repertoire as closely as possible. Recently, we described an original strategy that improves the library construction process and increases its diversity [9]. This strategy is based on a technological optimization relying on Rolling Circle Amplification (RCA), combined with a newly designed set of oligonucleotide primers based on a thorough analysis of the IMGT/LIGM-DB database [10]. In the present study, we have used this strategy to generate libraries from two murine inbred strains, namely Balb/C (healthy) and SJL/J (susceptible to autoimmune disease), together representing 4 different IgG immune repertoires: i) healthy and naïve (NB for naïve Balb/C), ii) healthy and immunized (IB for immunized Balb/C), iii) autoimmune prone and naïve (NS for naïve SJL/J), and iv) autoimmune prone and immunized (IS for immunized SJL/J) [11]. We have decidedly chosen t

    A short survey on protein blocks

    Protein structures are classically described in terms of secondary structures. Even though the regular secondary structures have relevant physical meaning, their recognition from atomic coordinates has some important limitations, such as uncertainties in the assignment of the boundaries of helical and β-strand regions. Further, on average about 50% of all residues are assigned to an irregular state, i.e., the coil. Thus, different research teams have focused on abstracting the conformation of the protein backbone in localized short stretches. Using different geometric measures, local stretches in protein structures are clustered into a chosen number of states. A prototype representative of the local structures in each cluster is generally defined. These libraries of local structure prototypes are called "structural alphabets". We have developed a structural alphabet, named Protein Blocks, not only to approximate protein structures but also to predict them from sequence. Since its development, we and other teams have explored numerous new research fields using this structural alphabet. We review here some of the most interesting applications.

    Systematic search for putative new domain families in Mycoplasma gallisepticum genome

    Background: Protein domains are the fundamental units of protein structure, function and evolution. The delineation of different domains in proteins is important for classification and for understanding structure, function and evolution. The delineation of protein domains within a polypeptide chain, particularly at the genome scale, can be achieved in several ways but may remain problematic in many instances. Difficulties in identifying the domain content of a given sequence arise when the query sequence has no homologues with experimentally determined structure and searching against sequence domain databases also yields insignificant matches. Identification of domains under conditions of low sequence identity and in the absence of structural homologues acquires crucial importance, especially at the genomic scale. Findings: We have developed a new method for the identification of domains in unassigned regions through indirect connections and scaled up its application to the analysis of 434 unassigned regions in 726 protein sequences of the Mycoplasma gallisepticum genome. We could establish 71 new domain relationships and 63 putative new domain families through intermediate sequences in the unassigned regions, which importantly represent an overall 10% increase in PfamA domain annotation over direct assignment in this genome. Conclusions: The systematic analysis of the unassigned regions in the Mycoplasma gallisepticum genome has provided some insight into the possible new domain relationships and putative new domain families. Further investigation of these predicted new domains may prove beneficial in improving the existing domain prediction algorithms.