64 research outputs found

    Mining protein loops using a structural alphabet and statistical exceptionality

    Background: Protein loops encompass 50% of protein residues in available three-dimensional structures. These regions are often involved in protein functions, e.g., binding sites and catalytic pockets. However, describing protein loops with conventional tools is difficult. Regular secondary structures, helices and strands, have been widely studied, whereas loops, because they are highly variable in terms of sequence and structure, are difficult to analyze. Due to data sparsity, long loops have rarely been systematically studied. Results: We developed a simple and accurate method that allows the description and analysis of the structures of short and long loops using structural motifs, without restriction on loop length. This method is based on the structural alphabet HMM-SA. HMM-SA simplifies a three-dimensional protein structure into a one-dimensional string of states, where each state is a four-residue prototype fragment, called a structural letter. The difficult task of structurally grouping huge data sets is thus easily accomplished by handling structural-letter strings as in conventional protein sequence analysis. We systematically extracted all seven-residue fragments in a bank of 93000 protein loops and grouped them according to their structural-letter sequence, named a structural word. This approach permits a systematic analysis of loops of all sizes, since we consider structural motifs of seven residues rather than complete loops. We focused the analysis on highly recurrent words of loops (observed more than 30 times). Our study reveals that 73% of loop lengths are covered by only 3310 highly recurrent structural words out of 28274 observed words. These structural words have low structural variability (mean RMSd of 0.85 Å). As expected, half of these motifs display a flanking-region preference, but interestingly, two thirds are shared by short (less than 12 residues) and long loops. Moreover, half of the recurrent motifs exhibit a significant level of amino-acid conservation, with at least four significant positions, and 87% of long loops contain at least one such word. We complement our analysis with the detection of statistically over-represented patterns of structural letters, as in conventional DNA sequence analysis. About 30% (930) of the highly recurrent structural words are over-represented, and they cover about 40% of loop lengths. Interestingly, these words exhibit lower structural variability and higher sequential specificity, suggesting structural or functional constraints. Conclusions: We developed a method to systematically decompose and study protein loops using recurrent structural motifs. This method is based on the structural alphabet HMM-SA rather than on structural alignment and geometrical parameters. We extracted meaningful structural motifs that are found in both short and long loops. To our knowledge, this is the first time that pattern mining has been used to increase the signal-to-noise ratio in protein loops. This finding helps to better describe protein loops and might help decrease the complexity of long-loop analysis. Detailed results are available at http://www.mti.univ-paris-diderot.fr/publication/supplementary/2009/ACCLoop/.
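    The word-extraction and over-representation steps described above can be sketched in Python. This is a minimal illustration, not the authors' code: it assumes each loop is already encoded as a string of HMM-SA letters, one letter per overlapping four-residue fragment, so that a seven-residue fragment corresponds to a four-letter structural word. The function names are hypothetical, the letter strings in the usage example are made up, and the expected counts use a standard order-1 Markov estimator as a stand-in for whatever background model and significance test the study actually used.

```python
from collections import Counter

# Assumption: each loop is a string of HMM-SA structural letters, one letter
# per overlapping four-residue fragment, so a seven-residue fragment maps to
# four consecutive letters (a four-letter "structural word").
WORD_LEN = 4
MIN_OCCURRENCES = 30  # "highly recurrent" = observed more than 30 times


def kmer_counts(loops, k):
    """Count every overlapping k-letter substring across all loops."""
    counts = Counter()
    for loop in loops:
        for i in range(len(loop) - k + 1):
            counts[loop[i:i + k]] += 1
    return counts


def recurrent_words(word_counts, threshold=MIN_OCCURRENCES):
    """Keep words observed strictly more than `threshold` times."""
    return {w for w, n in word_counts.items() if n > threshold}


def expected_count_m1(word, loops):
    """Expected count of `word` under an order-1 Markov model of the letters.

    Standard estimator: E(w) = prod_i N(w_i w_{i+1}) / prod_{inner i} N(w_i).
    """
    pairs = kmer_counts(loops, 2)
    singles = kmer_counts(loops, 1)
    num = 1.0
    for i in range(len(word) - 1):
        num *= pairs[word[i:i + 2]]
    den = 1.0
    for i in range(1, len(word) - 1):
        den *= singles[word[i]]
    return num / den if den else 0.0


if __name__ == "__main__":
    # Made-up letter strings for illustration only.
    loops = ["ABCDABCDE", "XABCDY", "ABCDZ"]
    words = kmer_counts(loops, WORD_LEN)
    print(words.most_common(3))
    print(recurrent_words(words, threshold=2))
    obs = words["ABCD"]
    exp = expected_count_m1("ABCD", loops)
    print(f"ABCD observed={obs} expected={exp:.2f} ratio={obs / exp:.2f}")
```

    Comparing observed and expected counts only ranks candidate words; a real over-representation analysis would attach a p-value to each word before calling it over-represented.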

    Crystal structure of human XLF/Cernunnos reveals unexpected differences from XRCC4 with implications for NHEJ

    The recently characterised 299-residue human XLF/Cernunnos protein plays a crucial role in DNA repair by non-homologous end joining (NHEJ) and interacts with the XRCC4–DNA Ligase IV complex. Here, we report the crystal structure of the XLF (1–233) homodimer at 2.3 Å resolution, confirming the predicted structural similarity to XRCC4. The XLF coiled-coil, however, is shorter than that of XRCC4 and undergoes an unexpected reversal in direction, giving rise to a short, distorted four-helical bundle and a C-terminal helical structure wedged between the coiled-coil and head domain. The existence of a dimer as the major species is confirmed by size-exclusion chromatography, analytical ultracentrifugation, small-angle X-ray scattering and other biophysical methods. We show that the XLF structure is not easily compatible with a proposed XRCC4:XLF heterodimer. However, we demonstrate interactions between dimers of XLF and XRCC4 by surface plasmon resonance and analyse these in terms of surface properties, amino-acid conservation and mutations in immunodeficient patients. Our data are most consistent with head-to-head interactions in a 2:2:1 XRCC4:XLF:Ligase IV complex.

    MSDmotif: exploring protein sites and motifs

    Background: Protein structures have conserved features, motifs, that have a significant influence on protein function. These motifs can be found in sequence as well as in 3D space. Understanding these fragments is essential for 3D structure prediction, modelling and drug design. The Protein Data Bank (PDB) is the source of this information; however, current search tools have limited 3D options for integrating protein sequence with 3D structure. Results: We describe here a web application, and its underlying database, for querying the PDB for ligands, binding sites, and small 3D structural and sequence motifs. Novel algorithms for searching chemical fragments, 3D motifs, ϕ/ψ sequences, super-secondary structure motifs and small 3D structural motif associations are incorporated. The interface provides functionality for visualization, search-criteria creation, and sequence and 3D multiple alignment. MSDmotif is an integrated system in which a results page is also a search form. A set of motif statistics is available for analysis. This set includes molecule and motif binding statistics, the distribution of motif sequences, the occurrence of an amino acid within a motif, the correlation of amino-acid side-chain charges within a motif, and Ramachandran plots for each residue. The binding statistics are presented in association with properties that include a ligand fragment library. Access is also provided through the Distributed Annotation System (DAS) protocol. An additional entry point accepts XML requests and returns XML responses. Conclusion: MSDmotif is unique in combining chemical, sequence and 3D data in a single search engine with a range of search and visualisation options. It provides multiple views of the data in the PDB archive for exploring protein structures.
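    The abstract does not describe how MSDmotif searches ϕ/ψ sequences. As a hypothetical illustration of the general idea only, the sketch below scans a chain's backbone dihedral angles for windows matching a query run of (ϕ, ψ) pairs, treating angles as circular so that -170° and 170° are 20° apart. The function names, the 30° default tolerance and the example angles are assumptions.

```python
from typing import List, Sequence, Tuple

# (phi, psi) backbone dihedral pair in degrees.
PhiPsi = Tuple[float, float]


def angular_diff(a: float, b: float) -> float:
    """Smallest absolute difference between two angles in degrees.

    Angles are circular, so -170 and 170 are 20 degrees apart.
    """
    d = (a - b) % 360.0
    return min(d, 360.0 - d)


def match_phi_psi(chain: Sequence[PhiPsi], query: Sequence[PhiPsi],
                  tol: float = 30.0) -> List[int]:
    """Return every start index at which each (phi, psi) pair of `query`
    lies within `tol` degrees of the corresponding pair in `chain`."""
    hits = []
    for start in range(len(chain) - len(query) + 1):
        window = chain[start:start + len(query)]
        if all(angular_diff(c_phi, q_phi) <= tol and
               angular_diff(c_psi, q_psi) <= tol
               for (c_phi, c_psi), (q_phi, q_psi) in zip(window, query)):
            hits.append(start)
    return hits


if __name__ == "__main__":
    # Invented angles: two helical residues, one extended, two helical.
    chain = [(-60.0, -45.0), (-65.0, -40.0), (-120.0, 130.0),
             (-57.0, -47.0), (-62.0, -42.0)]
    helix_pair = [(-63.0, -41.0), (-63.0, -41.0)]
    print(match_phi_psi(chain, helix_pair))
```

    A fixed per-angle tolerance is the simplest choice; a real search engine might instead use Ramachandran-region labels or a weighted distance.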

    Kinase Activity Profiling of Pneumococcal Pneumonia

    Background: Pneumonia represents a major health burden. Previous work demonstrated that although the induction of inflammation is important for adequate host defense against pneumonia, an inability to regulate the host's inflammatory response within the lung later during infection can be detrimental. Intracellular signaling pathways commonly rely on activation of kinases, and kinases play an essential role in regulating the inflammatory response of immune cells. Methodology/Principal Findings: Pneumonia was induced in mice by intranasal instillation of Streptococcus (S.) pneumoniae. Kinomics peptide arrays, exhibiting 1024 specific consensus sequences for protein kinases, were used to produce a systems biology analysis of cellular kinase activity during the course of pneumonia. Several differences in kinase activity revealed by the arrays were validated in lung homogenates of individual mice using western blot. We identified cascades of activated kinases showing that chemotoxic stress and a T helper 1 response were induced during the course of pneumococcal pneumonia. In addition, our data point to a reduction in WNT activity in the lungs of S. pneumoniae-infected mice. Moreover, this study demonstrated a reduction in overall CDK activity, implying alterations in cell cycle biology. Conclusions/Significance: This s

    Highly Precise and Developmentally Programmed Genome Assembly in Paramecium Requires Ligase IV–Dependent End Joining

    During the sexual cycle of the ciliate Paramecium, assembly of the somatic genome includes the precise excision of tens of thousands of short, non-coding germline sequences (Internal Eliminated Sequences or IESs), each one flanked by two TA dinucleotides. It has been reported previously that these genome rearrangements are initiated by the introduction of developmentally programmed DNA double-strand breaks (DSBs), which depend on the domesticated transposase PiggyMac. These DSBs all exhibit a characteristic geometry, with 4-base 5′ overhangs centered on the conserved TA, and may readily align and undergo ligation with minimal processing. However, the molecular steps and actors involved in the final and precise assembly of somatic genes have remained unknown. We demonstrate here that Ligase IV and Xrcc4p, core components of the non-homologous end-joining (NHEJ) pathway, are required both for the repair of IES excision sites and for the circularization of excised IESs. The transcription of LIG4 and XRCC4 is induced early during the sexual cycle, and a Lig4p-GFP fusion protein accumulates in the developing somatic nucleus by the time IES excision takes place. RNAi-mediated silencing of either gene results in the persistence of free broken DNA ends, apparently protected against extensive resection. At the nucleotide level, controlled removal of the 5′-terminal nucleotide occurs normally in LIG4-silenced cells, while nucleotide addition to the 3′ ends of the breaks is blocked, together with the final joining step, indicating a coupling between NHEJ polymerase and ligase activities. Taken together, our data indicate that IES excision is a “cut-and-close” mechanism, which involves the introduction of initiating double-strand cleavages at both ends of each IES, followed by DSB repair via highly precise end joining. This work broadens our current view of how the cellular NHEJ pathway has cooperated with domesticated transposases in the emergence of new mechanisms involved in genome dynamics.
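    The boundary geometry described above can be made concrete with a toy string model. This is a hypothetical sketch, not a model from the study: it assumes the IES coordinates include the TA at each end and that precise end joining leaves a single TA at the somatic junction (the usual outcome of Paramecium IES excision). The `overhang` helper returns only the top-strand 4-nt window centered on a TA, and the sequence in the example is invented.

```python
def excise_ies(germline: str, start: int, end: int) -> str:
    """Toy model of precise IES excision (hypothetical, for illustration).

    Assumes germline[start:end] is the IES including a TA at each end, and
    that precise end joining leaves a single TA at the somatic junction.
    """
    if germline[start:start + 2] != "TA" or germline[end - 2:end] != "TA":
        raise ValueError("IES must begin and end with a TA dinucleotide")
    # Keep the left flank and one TA; drop the rest of the IES.
    return germline[:start + 2] + germline[end:]


def overhang(seq: str, ta_index: int) -> str:
    """Top-strand 4-nt window centered on the TA starting at `ta_index`
    (one base on each side of the TA)."""
    if ta_index < 1 or ta_index + 3 > len(seq) or seq[ta_index:ta_index + 2] != "TA":
        raise ValueError("need a TA with at least one base on each side")
    return seq[ta_index - 1:ta_index + 3]


if __name__ == "__main__":
    # Invented sequence: flank "GGC", IES "TA...TA", flank "GCC".
    ies = "TAAAGTTTCTA"
    germline = "GGC" + ies + "GCC"
    start = 3
    end = start + len(ies)
    print(excise_ies(germline, start, end))
    print(overhang(germline, start), overhang(germline, end - 2))
```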

    Ancient and Recent Adaptive Evolution of Primate Non-Homologous End Joining Genes

    In human cells, DNA double-strand breaks are repaired primarily by the non-homologous end joining (NHEJ) pathway. Given their critical nature, we expected NHEJ proteins to be evolutionarily conserved, with relatively little sequence change over time. Here, we report that while critical domains of these proteins are conserved as expected, the sequence of NHEJ proteins has also been shaped by recurrent positive selection, leading to rapid sequence evolution in other protein domains. To characterize the molecular evolution of the human NHEJ pathway, we generated large simian primate sequence datasets for NHEJ genes. Codon-based models of gene evolution yielded statistical support for the recurrent positive selection of five NHEJ genes during primate evolution: XRCC4, NBS1, Artemis, POLλ, and CtIP. Analysis of human polymorphism data using the composite of multiple signals (CMS) test revealed that XRCC4 has also been subjected to positive selection in modern humans. Crystal structures are available for XRCC4, Nbs1, and Polλ, and residues under positive selection fall exclusively on the surfaces of these proteins. Despite the positive selection of such residues, biochemical experiments with variants of one positively selected site in Nbs1 confirm that functions necessary for DNA repair and checkpoint signaling have been conserved. However, many viruses interact with the proteins of the NHEJ pathway as part of their infectious life cycle. We propose that an ongoing evolutionary arms race between viruses and NHEJ genes may be driving the surprisingly rapid evolution of these critical genes.
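    Codon-based models typically measure selection through ω = dN/dS, the ratio of non-synonymous to synonymous substitution rates, with ω > 1 indicating positive selection. A common test (for example, PAML's site models M7 versus M8) compares the maximized log-likelihood of a null model that does not allow ω > 1 with that of an alternative that does. The abstract does not say which models or tests were used, so the sketch below only shows the arithmetic of such a likelihood-ratio test, assuming 2 degrees of freedom (as in M7 versus M8), for which the chi-square upper tail is exactly exp(-x/2). The function names are hypothetical and the log-likelihood values in the example are made up.

```python
import math


def lrt_statistic(lnl_null: float, lnl_alt: float) -> float:
    """Likelihood-ratio statistic 2 * (lnL_alt - lnL_null), floored at 0."""
    return max(0.0, 2.0 * (lnl_alt - lnl_null))


def chi2_sf_2df(x: float) -> float:
    """Upper-tail probability of a chi-square variable with 2 degrees of
    freedom; for 2 df this is exactly exp(-x / 2)."""
    return math.exp(-x / 2.0)


def positive_selection_pvalue(lnl_null: float, lnl_alt: float) -> float:
    """p-value for rejecting the null (no omega > 1) in favour of the
    alternative (omega > 1 allowed), assuming 2 degrees of freedom."""
    return chi2_sf_2df(lrt_statistic(lnl_null, lnl_alt))


if __name__ == "__main__":
    # Made-up maximized log-likelihoods for a null and an alternative model.
    p = positive_selection_pvalue(lnl_null=-5000.0, lnl_alt=-4990.0)
    print(f"LRT p-value: {p:.2e}")
```

    A significant test only says that some sites evolve with ω > 1; identifying which residues are positively selected requires a further per-site analysis.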