    : Protein Long Local Structure Prediction

    A relevant and accurate description of three-dimensional (3D) protein structures can be achieved by characterizing recurrent local structures. In a previous study, we developed a library of 120 3D structural prototypes encompassing all known 11-residue-long local protein structures and ensuring a good quality of structural approximation. A local structure prediction method was also proposed. Here, the overlapping properties of local protein structures within global ones are taken into account to characterize frequent local networks. At the same time, we propose a new long local structure prediction strategy that couples evolutionary information with Support Vector Machines (SVMs). Our prediction is evaluated by a stringent geometrical assessment: every local structure prediction with a Cα RMSD of less than 2.5 Å from the true local structure is considered correct. A global prediction rate of 63.1% is then reached, corresponding to an improvement of 7.7 points over the previous strategy. In the same way, the prediction of 88.33% of the 120 structural classes is improved, with a mean gain of 8.65%, and 85.33% of proteins have better prediction results, with an average gain of 9.43%. An analysis of the prediction rate per local network also supports the global improvement and gives insights into the potential of our method for predicting super local structures. Moreover, a confidence index for the direct estimation of prediction quality is proposed. Finally, our method proves very competitive with cutting-edge strategies encompassing three categories of local structure predictions. Proteins 2009. © 2009 Wiley-Liss, Inc.
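The geometric criterion in this abstract (a prediction counts as correct when its Cα RMSD to the true fragment is below 2.5 Å) can be illustrated with a minimal sketch. This is not the authors' code: the function names are hypothetical, and the sketch assumes the two fragments are already superimposed, so no optimal fitting step is performed before the RMSD is computed.

```python
import math

RMSD_THRESHOLD = 2.5  # angstroms, the cutoff quoted in the abstract


def ca_rmsd(predicted, true):
    """RMSD between two equal-length lists of (x, y, z) C-alpha coordinates.

    Assumption: the fragments are already superimposed; no fitting is done here.
    """
    if len(predicted) != len(true) or not predicted:
        raise ValueError("fragments must be non-empty and of equal length")
    squared = sum(
        (px - tx) ** 2 + (py - ty) ** 2 + (pz - tz) ** 2
        for (px, py, pz), (tx, ty, tz) in zip(predicted, true)
    )
    return math.sqrt(squared / len(predicted))


def prediction_rate(pairs, threshold=RMSD_THRESHOLD):
    """Percentage of (predicted, true) fragment pairs whose RMSD is below the threshold."""
    if not pairs:
        raise ValueError("at least one fragment pair is required")
    correct = sum(1 for pred, true in pairs if ca_rmsd(pred, true) < threshold)
    return 100.0 * correct / len(pairs)
```

Calling `prediction_rate` on a list of (predicted, true) 11-residue fragments gives the kind of global rate the abstract reports; a real assessment would superimpose each pair first.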

    Efficient Sampling of Protein Folding Pathways using HMMSTR and

    In Silico Studies on DARC.

    The Duffy Antigen/Receptor for Chemokine (DARC) is a seven-segment transmembrane protein. It was first discovered as a blood group antigen and was the first specific gene locus assigned to a specific autosome in man. It became better known as an erythrocyte receptor for malaria parasites (Plasmodium vivax and Plasmodium knowlesi), and finally for chemokines. DARC is an unorthodox chemokine receptor as (i) it binds chemokines of both the CC and CXC classes and (ii) it lacks the Asp-Arg-Tyr consensus motif in its second cytoplasmic loop and hence cannot couple to G proteins and activate their signaling pathways. DARC has also been associated with cancer progression, numerous inflammatory diseases, and possibly AIDS. In this review, we summarize important biological data on DARC. We then focus on recent developments in the elaboration and analysis of structural models of DARC. We underline the difficulty of proposing pertinent structural models of transmembrane proteins using comparative modeling and other dedicated approaches such as Protein Blocks. The chosen structural models are consistent with most of the biochemical data known to date. Finally, we present recent developments in protein-protein docking between DARC structural models and CXCL-8 structures. We propose a hierarchical search based on separate rigid and flexible docking.

    Investigating the world of protein design

    In this thesis, methods are developed for the design of protein structures based only on abstract structural descriptions of the protein fold. The design protocol starts with rough alpha-carbon (backbone) models and progresses through several stages of high-resolution refinement, sequence design and filtering according to known principles of protein structure, coarse-grained and high-resolution knowledge-based potentials, and secondary and tertiary structure prediction methods. Following this protocol led to the identification of protein solubility as a major limiting factor in the progression of designs. To overcome this problem, a systematic analysis was undertaken focusing on the more restricted problem of sequence redesign of the native structure, to find common principles governing viable protein sequences that can then be applied to the design of novel structures. A factorial design of experiments approach was used to screen a multitude of sequence redesigns for known backbones that each possess a unique set of properties. The behaviour of each redesign was characterised when expressed in E. coli in the hope of elucidating some common features indicative of a viable sequence. Even with the use of fractional factorial design, the number of experimental designs required was limiting, and to test the approach we adopted an ab initio prediction method as a proxy for the "wet" experiments. This allowed us to increase the information gain for each experiment, and we found that optimizing towards secondary structure prediction had a negative effect on the predictability of sequences. Following this, an improved design methodology was developed that uses a genetic algorithm to produce sequence redesigns which look realistic according to a number of computational measures, such as sequence composition and both ab initio and comparative modelling prediction methods, and which have a high degree of native sequence recapitulation (a good indicator of a viable design method).
Although these state-of-the-art measures all point toward our sequence designs being on par with (or better than) native sequences, the problem of solubility and foldedness remained. Machine learning techniques were used to unravel some of the complex intricacies governing these two properties. Using this approach, we discovered features that a substantial portion of native sequences comply with but that were missing from our designs. Using the genetic algorithm design method, we directed our sequences to this area of attribute space in the hope that this would produce more realistic designs. Unfortunately, the solubility of designs still remained a large problem. We also tested the relatively new technique of convergent peptide synthesis to understand how valuable it could be in a synthetic biology context. After designing a single leucine-rich repeat, we used peptide chemistry to synthetically build the protein before investigating solubility and foldedness. Upon producing a single repeat of this designed protein, multiple repeats were linked together and the resulting properties characterised. A single 28-mer peptide was produced that was soluble and folded.

    Clustering System and Clustering Support Vector Machine for Local Protein Structure Prediction

    Protein tertiary structure plays a very important role in determining a protein's possible functional sites and its chemical interactions with other related proteins. Experimental methods for determining protein structure are time-consuming and expensive. As a result, the gap between known protein sequences and known structures has widened substantially with the advent of high-throughput sequencing techniques. These limitations of experimental methods motivate the development of computational algorithms for protein structure prediction. In this work, a clustering system is used to predict local protein structure. First, recurring sequence clusters are explored with an improved K-means clustering algorithm. Carefully constructed sequence clusters are then used to predict local protein structure. After obtaining the sequence clusters and motifs, we study how sequence variation within a cluster influences its structural similarity. This analysis shows that sequence clusters with tight sequence variation have high structural similarity, whereas sequence clusters with wide sequence variation have poor structural similarity. Based on this knowledge, the established clustering system is used to predict the tertiary structure of local sequence segments. Test results indicate that the highest-quality clusters give highly reliable predictions and that high-quality clusters give reliable predictions. To improve the performance of the clustering system for local protein structure prediction, a novel computational model called Clustering Support Vector Machines (CSVMs) is proposed. In our previous work, the sequence-to-structure relationship was explored with the conventional K-means algorithm. The K-means clustering algorithm may not capture the nonlinear sequence-to-structure relationship effectively.
As a result, we consider using Support Vector Machines (SVMs) to capture the nonlinear sequence-to-structure relationship. However, SVMs are poorly suited to huge datasets containing millions of samples. Therefore, we propose a novel computational model called CSVMs. Taking advantage of both the theory of granular computing and advanced statistical learning methodology, CSVMs are built specifically for each information granule partitioned intelligently by the clustering algorithm. Compared with the clustering system introduced previously, our experimental results show that the accuracy of local structure prediction improves noticeably when CSVMs are applied.
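The idea described in this abstract, partitioning the data with a clustering algorithm and then training one SVM per partition, might be sketched as follows. This is an illustrative approximation and not the authors' CSVM implementation: it assumes NumPy and scikit-learn are available, uses plain `KMeans` rather than the improved K-means mentioned above, and the class name `ClusteredSVM` and its parameters are hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVC


class ClusteredSVM:
    """Partition samples with K-means, then fit one SVM per cluster (illustrative only)."""

    def __init__(self, n_clusters=2, random_state=0):
        self.kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=random_state)
        self.models = {}

    def fit(self, X, y):
        X, y = np.asarray(X, dtype=float), np.asarray(y)
        cluster_ids = self.kmeans.fit_predict(X)
        for cid in np.unique(cluster_ids):
            mask = cluster_ids == cid
            labels = np.unique(y[mask])
            if len(labels) == 1:
                # SVC needs at least two classes; store the single label instead.
                self.models[int(cid)] = labels[0]
            else:
                self.models[int(cid)] = SVC(kernel="rbf", gamma="scale").fit(X[mask], y[mask])
        return self

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        cluster_ids = self.kmeans.predict(X)
        out = []
        for row, cid in zip(X, cluster_ids):
            # Route each sample to the model trained on its nearest cluster.
            model = self.models[int(cid)]
            if isinstance(model, SVC):
                out.append(model.predict(row.reshape(1, -1))[0])
            else:
                out.append(model)
        return np.array(out)
```

Each per-cluster SVM sees only a fraction of the samples, which is the motivation the abstract gives for combining clustering with SVMs on very large datasets.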

    New Approaches to Protein Structure Prediction

    Protein structure prediction is concerned with predicting a protein's three-dimensional structure from its amino acid sequence. Such predictions are commonly performed by searching the space of possible structures and evaluating each candidate with a scoring function. If the target protein structure is assumed to resemble the structure of a known protein, the search space can be significantly reduced. Such an approach is referred to as comparative structure prediction. When no such assumption is made, the approach is known as ab initio structure prediction. There are several difficulties in devising efficient searches and in computing the scoring function. Many of these problems have ready solutions from known mathematical methods. However, the problems that remain unsolved have kept structure prediction methods from producing more accurate predictions. The objective of this study is to present a complete framework for ab initio protein structure prediction. To achieve this, a new search strategy is proposed, and better techniques are devised for computing the known scoring functions. Some of the remaining problems in protein structure prediction are revisited. Several of them are shown to be intractable. In many of these cases, approximation methods are suggested as alternative solutions. The primary issues addressed in this thesis concern local structure prediction, structure assembly or sampling, side chain packing, model comparison, and structural alignment. For brevity, we do not elaborate on these problems here; a concise introduction is given in the first section of this thesis. Results from these studies prompted the development of several programs, forming a utility suite for ab initio protein structure prediction. Because these programs are generally useful, some of them are released under open source licenses to benefit the community.