59 research outputs found

    MESSM: a framework for protein threading by neural networks and support vector machines

    Protein threading, also referred to as fold recognition, aligns a probe amino acid sequence onto a library of representative folds of known structure to identify structural similarity. Following the structural profile approach to threading, this research focused on developing and evaluating a new framework, Mixed Environment Specific Substitution Mapping (MESSM), for protein threading by artificial neural networks (ANNs) and support vector machines (SVMs). MESSM presents a new process for developing an efficient tool for protein fold recognition. It achieved better efficiency while retaining effectiveness in protein prediction. MESSM has three key components, each of which is a step in the protein threading framework. First, building the fold profile library: given a protein structure with a residue-level environmental description, neural networks are used to generate an environment-specific amino acid substitution (3D-1D) mapping. Second, mixed substitution mapping: a mixed environment-specific substitution mapping is developed by combining the structure-derived substitution score with the sequence profile from well-developed amino acid substitution matrices. Third, confidence evaluation: a support vector machine is employed to measure the significance of the sequence-structure alignment. Four computational experiments were carried out to verify the performance of MESSM, using the Fischer, ProSup, Lindahl and Wallner benchmarks. Tested on the Fischer, Lindahl and Wallner benchmarks, MESSM achieved fold recognition performance comparable to that of energy-potential-based threading models. On the Fischer benchmark, MESSM correctly recognised 56 out of 68 pairs, the same performance as COBLATH and SPARKS. The computational experiments show that MESSM is a fast program: it could align a probe sequence (150 amino acids) against a profile of 4775 template proteins in 30 seconds on a Pentium IV PC with 1 GB of memory. Tested on the ProSup benchmark, MESSM achieved an alignment accuracy of 59.7%, which is better than current models. The research was extended to develop a threading score following the contact potential approach to threading. A TES (Threading with Environment-specific Score) model is constructed by neural networks
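As a rough illustration of the second MESSM step (mixing a structure-derived substitution score with a sequence-profile score), the sketch below blends two per-residue score tables with a linear weight. The abstract does not say how the two scores are combined, so the linear blend, the function name `mixed_score`, and the `weight` parameter are all assumptions made for illustration only.

```python
def mixed_score(structural, sequence, weight=0.5):
    """Blend two substitution scores for one template position.

    structural: dict mapping amino acid -> structure-derived (3D-1D) score
    sequence:   dict mapping amino acid -> score from a substitution matrix
    weight:     hypothetical mixing weight given to the structural score

    The linear blend is an assumption, not the formula used by MESSM.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    # Only score amino acids present in both tables.
    shared = structural.keys() & sequence.keys()
    return {
        aa: weight * structural[aa] + (1.0 - weight) * sequence[aa]
        for aa in shared
    }


if __name__ == "__main__":
    # Toy scores for two amino acids at a single template position.
    print(mixed_score({"A": 2.0, "G": -1.0}, {"A": 4.0, "G": 0.0}))
```

With `weight=1.0` the result reduces to the structural score alone, and with `weight=0.0` to the sequence score alone.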

    Mutation causing self-aggregation in human γC-crystallin leading to congenital cataract

    Purpose: Many forms of congenital hereditary cataract are associated with mutations in the crystallin genes. The authors focus on congenital lamellar cataract, which is associated with the R168W mutation in γC-crystallin, and congenital zonular pulverulent cataract, which is associated with a 5-bp insertion in the γC-crystallin gene. Methods: To understand the molecular phenotypes (i.e., the functional defects that have occurred in the mutant γC-crystallin molecule in the two cases described), the authors cloned, expressed, and isolated these mutants and compared their solution-state structural features with those of normal (wild-type) γC-crystallin. Structural models of the wild-type and mutant proteins were generated using comparative modeling. Circular dichroism and fluorescence spectroscopic methods were used to determine the conformation of the proteins, and temperature-dependent self-aggregation was used to observe the quaternary structural features. The structural stability of the proteins was monitored using chemical and thermal denaturation. Results: The authors found that the 5-bp insertion led to a loss of secondary and tertiary structure and to an enhanced tendency to self-aggregate into light-scattering particles, offering a possible factor in lens opacification. The R168W mutant, on the other hand, was remarkably similar to the wild-type molecule in its conformation and structural stability, but it differed in its ability to aggregate and scatter light. Conclusions: These results support the idea that unfolding or structural destabilization is not always necessary for crystallin-associated cataractogenesis.

    New Methods to Improve Protein Structure Modeling

    Proteins are considered the central compounds necessary for life, as they govern many life processes by performing the most essential biological and chemical functions in every living cell. Understanding protein structures and functions will lead to significant advances in life science and biology. Such knowledge is vital for fields such as drug development and synthetic biofuel production. Most proteins fold into definite shapes, which are the most stable states they can adopt. Because protein structure provides important insight into function, much research effort has gone into determining a protein's 3-dimensional structure from its sequence. Experimental methods for 3-dimensional structure determination are often time-consuming, costly, and even infeasible for some proteins. Accordingly, recent research focuses increasingly on computational approaches to predicting protein 3-dimensional structures. Template-based modeling is considered one of the most accurate protein structure prediction methods. Its success relies on correctly identifying one or a few experimentally determined protein structures as structural templates that are likely to resemble the structure of the target sequence, and on accurately producing a sequence alignment that maps the residues in the target sequence to those in the template. In this work, we aim to improve template-based protein structure modeling by identifying the most appropriate templates more reliably and by aligning the target and template sequences more precisely. First, we investigate using an inter-residue contact score to measure how favorably a target sequence fits the folding topology of a given template. Second, we design a multi-objective alignment algorithm that extends the well-known Needleman-Wunsch algorithm to obtain a complete set of Pareto-optimal alignments. We then use protein sequence and structural information as objectives and generate the complete Pareto-optimal front of alignments between target sequence and template. The alignments obtained enable one to analyze the trade-offs between the potentially conflicting objectives. These approaches improve accuracy in template-based protein structure modeling.
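To make the idea of a multi-objective extension of Needleman-Wunsch concrete, here is a minimal sketch in which each dynamic-programming cell stores a set of non-dominated score pairs instead of a single best score. It is not the authors' algorithm: the two scoring functions, the linear gap penalties, and the names `pareto_filter` and `pareto_nw` are assumptions, and the sketch returns only the Pareto front of score vectors, not the alignments themselves.

```python
def pareto_filter(points):
    """Keep only score pairs not dominated by another pair (both maximised)."""
    pts = set(points)
    return {
        p for p in pts
        if not any(q != p and q[0] >= p[0] and q[1] >= p[1] for q in pts)
    }


def pareto_nw(a, b, score1, score2, gap=(-1, -1)):
    """Global alignment of a and b under two additive objectives.

    score1, score2: hypothetical pair-scoring functions (e.g. sequence-based
    and structure-based); gap: per-objective linear gap penalty (assumed).
    Returns the sorted Pareto front of (objective1, objective2) totals.
    """
    n, m = len(a), len(b)
    front = [[set() for _ in range(m + 1)] for _ in range(n + 1)]
    front[0][0] = {(0, 0)}
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            cands = []
            if i > 0 and j > 0:  # align a[i-1] with b[j-1]
                s1, s2 = score1(a[i - 1], b[j - 1]), score2(a[i - 1], b[j - 1])
                cands += [(x + s1, y + s2) for x, y in front[i - 1][j - 1]]
            if i > 0:  # gap in b
                cands += [(x + gap[0], y + gap[1]) for x, y in front[i - 1][j]]
            if j > 0:  # gap in a
                cands += [(x + gap[0], y + gap[1]) for x, y in front[i][j - 1]]
            # Dominated prefixes stay dominated after any common suffix,
            # so pruning per cell does not lose Pareto-optimal totals.
            front[i][j] = pareto_filter(cands)
    return sorted(front[n][m])


if __name__ == "__main__":
    identity = lambda x, y: 1 if x == y else -1
    # Toy second objective that penalises every aligned pair.
    print(pareto_nw("A", "A", identity, lambda x, y: -3))
```

When the two objectives agree, the front collapses to a single point and the procedure behaves like ordinary Needleman-Wunsch scoring; the front grows only where the objectives genuinely conflict.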

    Towards protein function annotations for matching remote homologs

    Identifying functional similarities between proteins with low sequence identity and low structural similarity often suffers from high false-positive and false-negative rates. To improve function prediction based on local protein structures, we propose two different refinement and filtering approaches. We built a statistical model (a Markov Random Field) to describe protein functional site structure. We also developed filters that consider the local environment around the active sites to remove false positives. Our experimental results, evaluated on five sets of enzyme families with less than 40% sequence identity, demonstrate that our methods can find more remote homologs that could not be detected by traditional sequence-based methods. At the same time, our methods reduce a large number of random matches. Our methods improved the functional annotation ability of the extended motif method by up to 70%, as measured by area under the ROC curve.

    A multi-template combination algorithm for protein comparative modeling

    Background: Multiple protein templates are commonly used in manual protein structure prediction. However, few automated algorithms for selecting and combining multiple templates are available. Results: Here we develop an effective multi-template combination algorithm for protein comparative modeling. The algorithm selects templates according to the similarity significance of the alignments between template and target proteins. It combines the whole template-target alignments whose similarity significance score is close to that of the top template-target alignment within a threshold, whereas it only takes alignment fragments from a less similar template-target alignment that align with a sizable uncovered region of the target. We compare the algorithm with the traditional method of using a single top template on the 45 comparative modeling targets (i.e. easy template-based modeling targets) used in the seventh edition of Critical Assessment of Techniques for Protein Structure Prediction (CASP7). The multi-template combination algorithm improves the GDT-TS scores of predicted models by 6.8% on average. The statistical analysis shows that the improvement is significant (p-value < 10^-4). Compared with the ideal approach that always uses the best template, the multi-template approach yields only slightly better performance. During the CASP7 experiment, the preliminary implementation of the multi-template combination algorithm (FOLDpro) was ranked second among 67 servers in the category of high-accuracy structure prediction in terms of GDT-TS measure. Conclusion: We have developed a novel multi-template algorithm to improve protein comparative modeling.
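The selection rule described in this abstract (whole alignments close in significance to the top hit, fragments only from weaker hits that cover a sizable uncovered region) can be sketched as below. This is an illustrative reading of the abstract, not the FOLDpro implementation: the `Hit` record, the convention that a higher score means more significant, and the `score_window` and `min_fragment` thresholds are all hypothetical.

```python
from typing import List, NamedTuple, Set, Tuple


class Hit(NamedTuple):
    name: str
    score: float  # significance; higher = more significant (assumption)
    start: int    # first target residue covered (0-based, inclusive)
    end: int      # one past the last target residue covered


def uncovered_runs(start: int, end: int, covered: Set[int]) -> List[Tuple[int, int]]:
    """Contiguous half-open runs in [start, end) not yet covered."""
    runs, run_start = [], None
    for pos in range(start, end):
        if pos not in covered:
            if run_start is None:
                run_start = pos
        elif run_start is not None:
            runs.append((run_start, pos))
            run_start = None
    if run_start is not None:
        runs.append((run_start, end))
    return runs


def combine_templates(hits: List[Hit], score_window: float = 5.0,
                      min_fragment: int = 10) -> List[Tuple[str, int, int]]:
    """Pick (template, start, end) pieces of the target to model from each hit."""
    ranked = sorted(hits, key=lambda h: h.score, reverse=True)
    if not ranked:
        return []
    top, covered, chosen = ranked[0].score, set(), []
    for h in ranked:
        if top - h.score <= score_window:
            # Close to the top hit: use the whole alignment.
            chosen.append((h.name, h.start, h.end))
            covered.update(range(h.start, h.end))
        else:
            # Weaker hit: keep only sizable fragments over uncovered residues.
            for s, e in uncovered_runs(h.start, h.end, covered):
                if e - s >= min_fragment:
                    chosen.append((h.name, s, e))
                    covered.update(range(s, e))
    return chosen


if __name__ == "__main__":
    hits = [Hit("t1", 50, 0, 100), Hit("t2", 47, 10, 90),
            Hit("t3", 20, 80, 130), Hit("t4", 15, 95, 105)]
    print(combine_templates(hits))
```

In the toy run, `t3` contributes only the stretch beyond residue 100 that the stronger templates leave uncovered, and `t4` contributes nothing.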

    Consensus Fold Recognition by Predicted Model Quality

    Protein structure prediction has been a fundamental challenge in biology. In this post-genomic era, the need for automated protein structure prediction has never been more evident, and researchers are now focusing on developing computational techniques to predict three-dimensional structures with high throughput. Consensus-based protein structure prediction methods are state-of-the-art in automatic protein structure prediction. A consensus-based server combines the outputs of several individual servers and tends to generate better predictions than any individual server. Consensus-based methods have proved successful in recent CASP (Critical Assessment of Structure Prediction) experiments. In this thesis, a Support Vector Machine (SVM) regression-based consensus method is proposed for protein fold recognition, a key component of high-throughput protein structure prediction and protein function annotation. The method first extracts the features of a structural model by comparing the model to the other models produced by all the individual servers. Then, the SVM predicts the quality of each model. The experimental results from several LiveBench data sets confirm that our proposed consensus method, SVM regression, consistently performs better than any individual server. Based on this method, we developed a meta server, the Alignment by Consensus Estimation (ACE).

    A novel framework for protein structure prediction

    Thesis (Ph.D.), University of Missouri-Columbia, 2007. Proteins are among the most important molecules in life processes. The structure of a protein is essential to understanding its function at the molecular level. Due to rapid progress in sequencing technologies, the gap between the proteins whose structure is known and the proteins whose structure needs to be characterized is rapidly increasing. To address this problem, we are developing a novel framework to computationally predict many aspects of proteins, including secondary structure, solvent accessibility, contact map and, finally, the tertiary structure itself. We have applied various computational techniques in the structure predictions, including the fuzzy k-nearest neighbor algorithm, the multi-dimensional scaling method, and least-squares minimization. Our framework uses evolutionary information more effectively than traditional template-based methods, while it has better potential to utilize the information in the PDB than other methods based on evolutionary information. Our methods show better prediction accuracy and computational time than many other tools.

    STRUCTURAL MODELING OF PROTEIN-PROTEIN INTERACTIONS USING MULTIPLE-CHAIN THREADING AND FRAGMENT ASSEMBLY

    Since its birth, the study of protein structures has progressed by leaps and bounds. However, owing to the expense and difficulty involved, the number of known protein structures has not kept up with the number of protein sequences and in fact has steadily lost ground. This has necessitated the development of high-throughput but accurate computational algorithms capable of predicting the three-dimensional structure of proteins from their amino acid sequences. While progress has been made in protein tertiary structure prediction, the advancement in protein quaternary structure prediction has been limited by the fact that the degrees of freedom for protein complexes are even larger and even fewer protein complex structures are present in the PDB library. In fact, protein complex structure prediction to date has largely remained a docking problem, where automated algorithms aim to predict the protein complex structure starting from the unbound crystal structures of its component subunits, and thus has remained limited in scope. Moreover, since docking essentially treats the unbound subunits as "rigid bodies", it has limited accuracy when conformational change accompanies protein-protein interaction. In one of the first efforts of its kind, this study aims to develop protein complex structure prediction algorithms that require only the amino acid sequences of the interacting subunits as input. The study aimed to adapt the best features of protein tertiary structure prediction, including template detection and ab initio loop modeling, and extend them to protein-protein complexes, which requires simultaneously modeling the three-dimensional structure of the component subunits and ensuring the correct orientation of the chains at the protein-protein interface. Essentially, the algorithms depend on knowledge-based statistical potentials for both fold recognition and structure modeling. First, as a way to compare known structures of protein-protein complexes, a complex structure alignment program, MM-align, was developed. MM-align joins the chains of the complex structures to be aligned to form artificial monomers in every possible order. It then aligns them using a heuristic dynamic-programming-based approach with TM-score as the objective function. The traditional NW dynamic programming was redesigned to prevent the cross alignment of chains during the structure alignment process. Driven by the knowledge obtained from MM-align that protein complex structures share evolutionary relationships and that the current protein complex structure library already contains homologous or structurally analogous protein quaternary structure families, a dimeric threading approach, COTH, was designed. The new threading-recombination approach boosts the protein complex structure library by combining tertiary structure templates with complex alignments. The query sequences are first aligned to complex templates using the modified dynamic programming algorithm, guided by a number of predicted structural features including ab initio binding-site predictions. Finally, a template-based complex structure prediction approach, TACOS, was designed to build full-length protein complex structures starting from the initial templates identified by COTH. TACOS fragments the aligned regions of the templates and reassembles them while building the structure of the regions left unaligned by threading ab initio, using a replica-exchange Monte Carlo simulation procedure. Simultaneously, TACOS searches for the best orientation match of the component structures, driven by a number of knowledge-based potential terms. Overall, TACOS presents one of the first approaches capable of predicting full-length protein complex structures from sequence alone and introduces a new paradigm in the field of protein complex structure modeling.
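The step in which MM-align joins the chains of a complex "in every possible order" to form artificial monomers can be pictured with a small enumeration. The sketch below only generates the joined orderings; it does not perform the TM-score-driven structural alignment or the modified dynamic programming that prevents cross-chain alignment. The function name `joined_orders` and the use of plain strings to stand in for chains are assumptions for illustration.

```python
from itertools import permutations


def joined_orders(chains):
    """Yield (chain order, joined artificial monomer) for every chain ordering.

    chains: dict mapping a chain ID to its residues, represented here as a
    string of one-letter codes (a stand-in for real coordinate data).
    """
    for order in permutations(sorted(chains)):
        yield order, "".join(chains[chain_id] for chain_id in order)


if __name__ == "__main__":
    # Toy dimer with two very short chains.
    for order, monomer in joined_orders({"A": "MK", "B": "GS"}):
        print(order, monomer)
```

The number of orderings grows factorially with the number of chains, which is one reason a heuristic alignment procedure is attractive for larger complexes.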