218 research outputs found
From sequence to structure, to function, and back again: Integrating knowledge-based approaches with physical intuitions for protein folding, binding, and design
Poster abstract. Most biological activities are directed and/or regulated by proteins made of a gene-specified sequence of 20 amino-acid residue types. As a result, function or malfunction of specific proteins is responsible for almost all diseases. Proteins perform their function through their unique, self-assembled (folded) three-dimensional structures and through their specific binding to small molecules, to DNA/RNA (e.g. transcription factors that regulate gene expression), or to other proteins (e.g. molecular recognition in signal transduction). Thus, how to predict the structure of a protein from its amino-acid sequence, discover the function from its structure and, then, design the sequence from its function or structure are the most essential problems in structural biology. In this poster, we will illustrate how the coupling of physical intuitions with learning from structural databases can go a long way toward untangling the complex relation between sequence, structure and function of proteins.
Folding thermodynamics of model four-strand antiparallel beta-sheet proteins
The thermodynamic properties for three different types of off-lattice
four-strand beta-sheet protein models interacting via a hybrid Go-type
potential have been investigated. Discontinuous molecular dynamics simulations
have been performed for different sizes of the bias gap g, an artificial
measure of a model protein's preference for its native state. The thermodynamic
transition temperatures are obtained by calculating the squared radius of
gyration, the root-mean-squared pair separation fluctuation, the specific heat,
the internal energy of the system, and the Lindemann disorder parameter.
Despite their simplicity, the protein-like heteropolymers have shown a complex
set of protein transitions as observed in experimental studies. Starting from
high temperature, these transitions include a collapse transition, a
disordered-to-ordered globule transition, a folding transition, and a
liquid-to-solid transition. These transitions strongly depend on the
native-state geometry of the model proteins and the size of the bias gap. A
strong transition from the disordered globule state to the ordered globule
state with large energy change and a weak transition from the ordered globule
state to the native state with small energy change were observed for the large
gap models. For the small gap models, no native structures were observed at any
temperature; all three beta-sheet proteins fold into a partially ordered
globule state that is geometrically different from the native state. For small
bias gaps at even lower temperatures, all protein motions are frozen, indicating
an inactive solid-like phase.
Comment: PDF file, 32 pages including 13 figure pages
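Two of the observables named in this abstract can be sketched in a few lines. The example below is a minimal illustration, not the authors' code: it assumes equal-mass beads, reduced units with k_B = 1, and the standard energy-fluctuation estimator for the specific heat (the abstract does not say which estimator was used). The function names are hypothetical.

```python
from statistics import fmean


def squared_radius_of_gyration(coords):
    """Mean squared distance of the beads from their centroid.

    Assumes every bead has the same mass (an assumption for this sketch).
    coords is a list of (x, y, z) tuples.
    """
    n = len(coords)
    centroid = [sum(c[k] for c in coords) / n for k in range(3)]
    return sum(
        sum((c[k] - centroid[k]) ** 2 for k in range(3)) for c in coords
    ) / n


def specific_heat(energies, temperature, k_b=1.0):
    """Fluctuation estimate C_v = (<E^2> - <E>^2) / (k_B * T^2).

    energies are samples of the internal energy at one temperature;
    k_b defaults to 1.0, i.e. reduced units (an assumption).
    """
    samples = list(energies)
    mean_e = fmean(samples)
    mean_e2 = fmean([e * e for e in samples])
    return (mean_e2 - mean_e ** 2) / (k_b * temperature ** 2)
```

Scanning the specific heat over a range of temperatures and looking for peaks is one common way to locate transition temperatures of the kind listed above.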
Accurate single-sequence prediction of solvent accessible surface area using local and global features
We present a new approach for predicting the Accessible Surface Area (ASA) using a General Neural Network (GENN). The novelty of the approach lies in not using residue mutation profiles generated by multiple sequence alignments as descriptive inputs. Instead, we use solely sequential window information and global features such as single-residue and two-residue compositions of the chain. The resulting predictor is both substantially more efficient than sequence alignment-based predictors and of comparable accuracy to them. Introduction of the global inputs significantly helps achieve this comparable accuracy. The predictor, termed ASAquick, is tested on predicting the ASA of globular proteins and found to perform similarly well for so-called easy and hard cases, indicating generalizability and possible usability for de novo protein structure prediction. The source code and Linux executables for GENN and ASAquick are available from Research and Information Systems at http://mamiris.com, from the SPARKS Lab at http://sparks-lab.org, and from the Battelle Center for Mathematical Medicine at http://mathmed.org.
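The kind of alignment-free input described here (a local sequence window plus chain-level composition) could be assembled roughly as follows. This is a sketch under stated assumptions, not the ASAquick feature set: the one-hot encoding, the window half-width of 2, the use of adjacent-pair counts for the two-residue composition, and all function names are illustrative choices.

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residue types


def window_features(seq, center, half_width=2):
    """One-hot encode each position in the window around `center`.

    Positions that fall off either end of the chain are encoded as all zeros.
    """
    feats = []
    for pos in range(center - half_width, center + half_width + 1):
        residue = seq[pos] if 0 <= pos < len(seq) else None
        feats.extend(1.0 if aa == residue else 0.0 for aa in AMINO_ACIDS)
    return feats


def global_features(seq):
    """Single-residue (20) and adjacent two-residue (400) compositions."""
    single = Counter(seq)
    pairs = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    n_pairs = max(len(seq) - 1, 1)  # avoid dividing by zero for 1-residue chains
    comp1 = [single[aa] / len(seq) for aa in AMINO_ACIDS]
    comp2 = [pairs[a + b] / n_pairs for a, b in product(AMINO_ACIDS, repeat=2)]
    return comp1 + comp2


def residue_input(seq, center, half_width=2):
    """Concatenate local window features with chain-level global features."""
    return window_features(seq, center, half_width) + global_features(seq)
```

The global part is identical for every residue of a chain, so in practice it would be computed once per sequence and reused.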
LEAP: highly accurate prediction of protein loop conformations by integrating coarse-grained sampling and optimized energy scores with all-atom refinement of backbone and side chains
Prediction of protein loop conformations without any prior knowledge (ab initio prediction) is an unsolved problem. Its solution will significantly impact protein homology and template-based modeling as well as ab initio protein-structure prediction. Here, we developed a coarse-grained, optimized scoring function for initial sampling and ranking of loop decoys. The resulting decoys are then further optimized in backbone and side-chain conformations and ranked by all-atom energy scoring functions. The final integrated technique, called loop prediction by energy-assisted protocol (LEAP), achieved a median value of 2.1 Å root mean square deviation (RMSD) for 325 12-residue test loops and 2.0 Å RMSD for 45 12-residue loops from critical assessment of structure-prediction techniques (CASP) 10 target proteins with native core structures (backbone and side chains). If all side-chain conformations in protein cores were predicted in the absence of the target loop, loop-prediction accuracy decreases only slightly (0.2 Å difference in RMSD for 12-residue loops in the CASP target proteins). The accuracy obtained is an improvement of about 1 Å RMSD or more over other methods we tested. The executable file for a Linux system is freely available for academic users at http://sparks-lab.org.
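For reference, the RMSD figure of merit quoted above can be computed as in the sketch below. It assumes the predicted and native loops are already expressed in the same coordinate frame (for example, that of the shared protein core), so no superposition is performed; which atoms enter the sum is not specified in the abstract and is left to the caller. The function name is hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean square deviation between two equally long coordinate lists.

    Each list holds (x, y, z) tuples in matching atom order. No structural
    superposition is applied (an assumption of this sketch).
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equally long")
    squared = sum(
        (a - b) ** 2
        for p, q in zip(coords_a, coords_b)
        for a, b in zip(p, q)
    )
    return math.sqrt(squared / len(coords_a))
```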
Web-based toolkits for topology prediction of transmembrane helical proteins, fold recognition, structure and binding scoring, folding-kinetics analysis and comparative analysis of domain combinations
We have developed the following web servers for protein structural modeling and analysis at http://theory.med.buffalo.edu: THUMBUP, UMDHMMTMHP and TUPS, predictors of transmembrane helical protein topology based on a mean-burial-propensity scale of amino acid residues (THUMBUP), hidden Markov model (UMDHMMTMHP) and their combinations (TUPS); SPARKS 2.0 and SP3, two profile–profile alignment methods that match input query sequence(s) to structural templates by integrating sequence profile with knowledge-based structural score (SPARKS 2.0) and structure-derived profile (SP3); DFIRE, a knowledge-based potential for scoring free energy of monomers (DMONOMER), loop conformations (DLOOP), mutant stability (DMUTANT) and binding affinity of protein–protein/peptide/DNA complexes (DCOMPLEX & DDNA); TCD, a program for protein-folding rate and transition-state analysis of small globular proteins; and DOGMA, a web server that allows comparative analysis of domain combinations between plants and 55 other organisms. These servers provide tools for prediction and/or analysis of proteins on the secondary structure, tertiary structure and interaction levels, respectively.
SP5: Improving Protein Fold Recognition by Using Torsion Angle Profiles and Profile-Based Gap Penalty Model
How to recognize the structural fold of a protein is one of the challenges in protein structure prediction. We have developed a series of single (non-consensus) methods (SPARKS, SP2, SP3, SP4) that are based on weighted matching of two to four sequence and structure-based profiles. There is a robust improvement of the accuracy and sensitivity of fold recognition as the number of matching profiles increases. Here, we introduce a new profile-profile comparison term based on real-value dihedral torsion angles. Together with an updated real-value solvent accessibility profile and a new variable gap-penalty model based on fractional power of insertion/deletion profiles, the new method (SP5) leads to a robust improvement over previous SP methods. There is a 2% absolute increase (5% relative improvement) in alignment accuracy over SP4 based on two independent benchmarks. Moreover, SP5 achieves a 7% absolute increase (22% relative improvement) in the success rate of recognizing correct structural folds, and a 32% relative improvement in model accuracy of models within the same fold in the Lindahl benchmark. In addition, modeling accuracy of top-1 ranked models is improved by 12% over SP4 for the difficult targets in the CASP 7 test set. These results highlight the importance of harnessing predicted structural properties in challenging remote-homolog recognition. The SP5 server is available at http://sparks.informatics.iupui.edu.
Folding rate prediction using total contact distance
© 2002 by the Biophysical Society. Linear regression analysis found that either contact order (CO) or long-range order (LRO) parameter has a significant correlation with the logarithms of folding rates. This suggests that sequence separation per contact and total number of contacts are both important in determining the rate of folding. Here, the two factors are incorporated into a new parameter, total contact distance (TCD). Using a database of 28 two-state or weakly three-state folding proteins, TCD is found to be the most accurate among the three parameters (CO, LRO, and TCD) in terms of correlation and prediction. It provides even more accurate prediction than the best neural network results with two descriptors (contact order and stability per residue). The improvement is achieved in all three structural classes (all α, β, and mixed). The accuracy of total contact distance in predicting folding rates is essentially unchanged if “short”-ranged contacts (|i − j| ≤ 14) are not included in calculation. Thus, only long-range contacts with a sequence separation of more than 14 residues are important in determining the rate of folding. This is consistent with the results from the long-range order parameter. One of the significant outliers in prediction is found to be associated with the only protein in the database that involves nonlocal disulfide bonds. Removing the protein leads to a correlation coefficient of 0.89 between experimentally observed and predicted folding rates in jackknife cross validation. The corresponding values for CO and LRO are 0.71 and 0.80, respectively.
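The three topological parameters compared in this abstract can be sketched from a list of residue-residue contacts. The normalizations below follow commonly cited definitions and are assumptions of this sketch rather than quotations from the paper: CO divides the summed sequence separation by (number of contacts × chain length), LRO counts contacts separated by more than 12 residues per residue, and TCD divides the summed sequence separation by the squared chain length. How contacts are detected (distance cutoff, atom selection) is left outside the example.

```python
def contact_order(contacts, n_residues):
    """Average sequence separation per contact, divided by chain length."""
    if not contacts:
        return 0.0
    total_sep = sum(abs(i - j) for i, j in contacts)
    return total_sep / (len(contacts) * n_residues)


def long_range_order(contacts, n_residues, min_sep=12):
    """Number of contacts with |i - j| > min_sep, per residue."""
    return sum(1 for i, j in contacts if abs(i - j) > min_sep) / n_residues


def total_contact_distance(contacts, n_residues, min_sep=0):
    """Summed sequence separation of contacts with |i - j| > min_sep,
    divided by the squared chain length (assumed normalization)."""
    total_sep = sum(abs(i - j) for i, j in contacts if abs(i - j) > min_sep)
    return total_sep / n_residues ** 2


# Toy contact list of (i, j) residue indices for a hypothetical 30-residue chain.
contacts = [(1, 5), (2, 20), (3, 30)]
tcd_all = total_contact_distance(contacts, 30)
tcd_long = total_contact_distance(contacts, 30, min_sep=14)
```

Under this normalization TCD equals CO multiplied by the number of contacts per residue, which is one way to see how it combines sequence separation per contact with the total number of contacts. Setting `min_sep=14` reproduces the variant that ignores short-ranged contacts.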
Large expert-curated database for benchmarking document similarity detection in biomedical literature search
Document recommendation systems for locating relevant literature have mostly relied on methods developed a decade ago. This is largely due to the lack of a large offline gold-standard benchmark of relevant documents that cover a variety of research fields such that newly developed literature search techniques can be compared, improved and translated into practice. To overcome this bottleneck, we have established the RElevant LIterature SearcH consortium consisting of more than 1500 scientists from 84 countries, who have collectively annotated the relevance of over 180 000 PubMed-listed articles with regard to their respective seed (input) article/s. The majority of annotations were contributed by highly experienced, original authors of the seed articles. The collected data cover 76% of all unique PubMed Medical Subject Headings descriptors. No systematic biases were observed across different experience levels, research fields or time spent on annotations. More importantly, annotations of the same document pairs contributed by different scientists were highly concordant. We further show that the three representative baseline methods used to generate recommended articles for evaluation (Okapi Best Matching 25, Term Frequency-Inverse Document Frequency and PubMed Related Articles) had similar overall performances. Additionally, we found that these methods each tend to produce distinct collections of recommended articles, suggesting that a hybrid method may be required to completely capture all relevant articles. The established database server located at https://relishdb.ict.griffith.edu.au is freely available for the downloading of annotation data and the blind testing of new methods. We expect that this benchmark will be useful for stimulating the development of new powerful techniques for title and title/abstract-based search engines for relevant articles in biomedical science.
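One of the baselines named above, Okapi BM25, can be sketched as follows. This is a generic textbook form of the scoring function, not the consortium's implementation: the whitespace tokenization, the parameter values k1 = 1.5 and b = 0.75, the smoothed IDF variant, and the function name are all assumptions made for illustration.

```python
import math
from collections import Counter


def bm25_scores(query, documents, k1=1.5, b=0.75):
    """Score each document against the query with a basic Okapi BM25.

    Tokenization is lowercase whitespace splitting (an assumption).
    Returns one score per document, in input order.
    """
    docs = [doc.lower().split() for doc in documents]
    if not docs:
        return []
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Number of documents in which each term occurs at least once.
    doc_freq = Counter(term for d in docs for term in set(d))
    query_terms = set(query.lower().split())

    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query_terms:
            if tf[term] == 0:
                continue
            # Smoothed IDF that stays positive for very common terms.
            idf = math.log(
                (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5) + 1.0
            )
            # Term-frequency saturation with document-length normalization.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores
```

Ranking candidate titles or abstracts by these scores against a seed article's text gives a simple recommendation list of the kind such a benchmark is designed to evaluate.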