1,739 research outputs found

    Mining protein loops using a structural alphabet and statistical exceptionality

    Get PDF
    <p>Abstract</p> <p>Background</p> <p>Protein loops encompass 50% of protein residues in available three-dimensional structures. These regions are often involved in protein functions, e.g. binding site, catalytic pocket... However, the description of protein loops with conventional tools is an uneasy task. Regular secondary structures, helices and strands, have been widely studied whereas loops, because they are highly variable in terms of sequence and structure, are difficult to analyze. Due to data sparsity, long loops have rarely been systematically studied.</p> <p>Results</p> <p>We developed a simple and accurate method that allows the description and analysis of the structures of short and long loops using structural motifs without restriction on loop length. This method is based on the structural alphabet HMM-SA. HMM-SA allows the simplification of a three-dimensional protein structure into a one-dimensional string of states, where each state is a four-residue prototype fragment, called structural letter. The difficult task of the structural grouping of huge data sets is thus easily accomplished by handling structural letter strings as in conventional protein sequence analysis. We systematically extracted all seven-residue fragments in a bank of 93000 protein loops and grouped them according to the structural-letter sequence, named structural word. This approach permits a systematic analysis of loops of all sizes since we consider the structural motifs of seven residues rather than complete loops. We focused the analysis on highly recurrent words of loops (observed more than 30 times). Our study reveals that 73% of loop-lengths are covered by only 3310 highly recurrent structural words out of 28274 observed words). These structural words have low structural variability (mean RMSd of 0.85 Å). As expected, half of these motifs display a flanking-region preference but interestingly, two thirds are shared by short (less than 12 residues) and long loops. Moreover, half of recurrent motifs exhibit a significant level of amino-acid conservation with at least four significant positions and 87% of long loops contain at least one such word. We complement our analysis with the detection of statistically over-represented patterns of structural letters as in conventional DNA sequence analysis. About 30% (930) of structural words are over-represented, and cover about 40% of loop lengths. Interestingly, these words exhibit lower structural variability and higher sequential specificity, suggesting structural or functional constraints.</p> <p>Conclusions</p> <p>We developed a method to systematically decompose and study protein loops using recurrent structural motifs. This method is based on the structural alphabet HMM-SA and not on structural alignment and geometrical parameters. We extracted meaningful structural motifs that are found in both short and long loops. To our knowledge, it is the first time that pattern mining helps to increase the signal-to-noise ratio in protein loops. This finding helps to better describe protein loops and might permit to decrease the complexity of long-loop analysis. Detailed results are available at <url>http://www.mti.univ-paris-diderot.fr/publication/supplementary/2009/ACCLoop/</url>.</p

    Hierarchy of protein loop-lock structures: a new server for the decomposition of a protein structure into a set of closed loops

    Full text link
    HoPLLS (Hierarchy of protein loop-lock structures) (http://leah.haifa.ac.il/~skogan/Apache/mydata1/main.html) is a web server that identifies closed loops - a structural basis for protein domain hierarchy. The server is based on the loop-and-lock theory for structural organisation of natural proteins. We describe this web server, the algorithms for the decomposition of a 3D protein into loops and the results of scientific investigations into a structural "alphabet" of loops and locks.Comment: 11 pages, 4 figure

    Dissecting protein loops with a statistical scalpel suggests a functional implication of some structural motifs

    Get PDF
    <p>Abstract</p> <p>Background</p> <p>One of the strategies for protein function annotation is to search particular structural motifs that are known to be shared by proteins with a given function.</p> <p>Results</p> <p>Here, we present a systematic extraction of structural motifs of seven residues from protein loops and we explore their correspondence with functional sites. Our approach is based on the structural alphabet HMM-SA (Hidden Markov Model - Structural Alphabet), which allows simplification of protein structures into uni-dimensional sequences, and advanced pattern statistics adapted to short sequences. Structural motifs of interest are selected by looking for structural motifs significantly over-represented in SCOP superfamilies in protein loops. We discovered two types of structural motifs significantly over-represented in SCOP superfamilies: (i) ubiquitous motifs, shared by several superfamilies and (ii) superfamily-specific motifs, over-represented in few superfamilies. A comparison of ubiquitous words with known small structural motifs shows that they contain well-described motifs as turn, niche or nest motifs. A comparison between superfamily-specific motifs and biological annotations of Swiss-Prot reveals that some of them actually correspond to functional sites involved in the binding sites of small ligands, such as ATP/GTP, NAD(P) and SAH/SAM.</p> <p>Conclusions</p> <p>Our findings show that statistical over-representation in SCOP superfamilies is linked to functional features. The detection of over-represented motifs within structures simplified by HMM-SA is therefore a promising approach for prediction of functional sites and annotation of uncharacterized proteins.</p

    Structural alphabets derived from attractors in conformational space

    Get PDF
    Background: The hierarchical and partially redundant nature of protein structures justifies the definition of frequently occurring conformations of short fragments as 'states'. Collections of selected representatives for these states define Structural Alphabets, describing the most typical local conformations within protein structures. These alphabets form a bridge between the string-oriented methods of sequence analysis and the coordinate-oriented methods of protein structure analysis.Results: A Structural Alphabet has been derived by clustering all four-residue fragments of a high-resolution subset of the protein data bank and extracting the high-density states as representative conformational states. Each fragment is uniquely defined by a set of three independent angles corresponding to its degrees of freedom, capturing in simple and intuitive terms the properties of the conformational space. The fragments of the Structural Alphabet are equivalent to the conformational attractors and therefore yield a most informative encoding of proteins. Proteins can be reconstructed within the experimental uncertainty in structure determination and ensembles of structures can be encoded with accuracy and robustness.Conclusions: The density-based Structural Alphabet provides a novel tool to describe local conformations and it is specifically suitable for application in studies of protein dynamics. © 2010 Pandini et al; licensee BioMed Central Ltd

    Artificial and Natural Genetic Information Processing

    Get PDF
    Conventional methods of genetic engineering and more recent genome editing techniques focus on identifying genetic target sequences for manipulation. This is a result of historical concept of the gene which was also the main assumption of the ENCODE project designed to identify all functional elements in the human genome sequence. However, the theoretical core concept changed dramatically. The old concept of genetic sequences which can be assembled and manipulated like molecular bricks has problems in explaining the natural genome-editing competences of viruses and RNA consortia that are able to insert or delete, combine and recombine genetic sequences more precisely than random-like into cellular host organisms according to adaptational needs or even generate sequences de novo. Increasing knowledge about natural genome editing questions the traditional narrative of mutations (error replications) as essential for generating genetic diversity and genetic content arrangements in biological systems. This may have far-reaching consequences for our understanding of artificial genome editing

    SA-Mot: a web server for the identification of motifs of interest extracted from protein loops

    Get PDF
    The detection of functional motifs is an important step for the determination of protein functions. We present here a new web server SA-Mot (Structural Alphabet Motif) for the extraction and location of structural motifs of interest from protein loops. Contrary to other methods, SA-Mot does not focus only on functional motifs, but it extracts recurrent and conserved structural motifs involved in structural redundancy of loops. SA-Mot uses the structural word notion to extract all structural motifs from uni-dimensional sequences corresponding to loop structures. Then, SA-Mot provides a description of these structural motifs using statistics computed in the loop data set and in SCOP superfamily, sequence and structural parameters. SA-Mot results correspond to an interactive table listing all structural motifs extracted from a target structure and their associated descriptors. Using this information, the users can easily locate loop regions that are important for the protein folding and function. The SA-Mot web server is available at http://sa-mot.mti.univ-paris-diderot.fr

    In silico local structure approach: A case study on Outer Membrane Proteins.

    Get PDF
    International audienceThe detection of Outer Membrane Proteins (OMP) in whole genomes is an actual question, their sequence characteristics have thus been intensively studied. This class of protein displays a common beta-barrel architecture, formed by adjacent antiparallel strands. However, due to the lack of available structures, few structural studies have been made on this class of proteins. Here we propose a novel OMP local structure investigation, based on a structural alphabet approach, i.e., the decomposition of 3D structures using a library of four-residue protein fragments. The optimal decomposition of structures using hidden Markov model results in a specific structural alphabet of 20 fragments, six of them dedicated to the decomposition of beta-strands. This optimal alphabet, called SA20-OMP, is analyzed in details, in terms of local structures and transitions between fragments. It highlights a particular and strong organization of beta-strands as series of regular canonical structural fragments. The comparison with alphabets learned on globular structures indicates that the internal organization of OMP structures is more constrained than in globular structures. The analysis of OMP structures using SA20-OMP reveals some recurrent structural patterns. The preferred location of fragments in the distinct regions of the membrane is investigated. The study of pairwise specificity of fragments reveals that some contacts between structural fragments in beta-sheets are clearly favored whereas others are avoided. This contact specificity is stronger in OMP than in globular structures. Moreover, SA20-OMP also captured sequential information. This can be integrated in a scoring function for structural model ranking with very promising results. Proteins 2007. (c) 2007 Wiley-Liss, Inc

    Encoding folding paths of RNA switches

    Get PDF
    RNA co-transcriptional folding has long been suspected to play an active role in helping proper native folding of ribozymes and structured regulatory motifs in mRNA untranslated regions. Yet, the underlying mechanisms and coding requirements for efficient co-transcriptional folding remain unclear. Traditional approaches have intrinsic limitations to dissect RNA folding paths, as they rely on sequence mutations or circular permutations that typically perturb both RNA folding paths and equilibrium structures. Here, we show that exploiting sequence symmetries instead of mutations can circumvent this problem by essentially decoupling folding paths from equilibrium structures of designed RNA sequences. Using bistable RNA switches with symmetrical helices conserved under sequence reversal, we demonstrate experimentally that native and transiently formed helices can guide efficient co-transcriptional folding into either long-lived structure of these RNA switches. Their folding path is controlled by the order of helix nucleations and subsequent exchanges during transcription, and may also be redirected by transient antisense interactions. Hence, transient intra- and intermolecular base pair interactions can effectively regulate the folding of nascent RNA molecules into different native structures, provided limited coding requirements, as discussed from an information theory perspective. This constitutive coupling between RNA synthesis and RNA folding regulation may have enabled the early emergence of autonomous RNA-based regulation networks.Comment: 9 pages, 6 figure
    corecore