Trends in template/fragment-free protein structure prediction
Predicting the structure of a protein from its amino acid sequence is a long-standing unsolved problem in computational biology. Its solution would be of both fundamental and practical importance as the gap between the number of known sequences and the number of experimentally solved structures widens rapidly. Currently, the most successful approaches are based on fragment/template reassembly. The lack of progress in template-free structure prediction calls for novel ideas and approaches. This article reviews trends in the development of physical and specific knowledge-based energy functions as well as sampling techniques for fragment-free structure prediction. Recent physical- and knowledge-based studies demonstrated that it is possible to sample and predict highly accurate protein structures without borrowing native fragments from known protein structures. These emerging approaches with fully flexible sampling have the potential to move the field forward.
Mass & secondary structure propensity of amino acids explain their mutability and evolutionary replacements
Why is an amino acid replacement in a protein accepted during evolution? The answer given by bioinformatics relies on the frequency of change of each amino acid by another one and the propensity of each to remain unchanged. We propose that these replacement rules are recoverable from the secondary structural trends of amino acids. A distance measure between high-resolution Ramachandran distributions reveals that structurally similar residues coincide with those found in substitution matrices such as BLOSUM: Asn ↔ Asp, Phe ↔ Tyr, Lys ↔ Arg, Gln ↔ Glu, Ile ↔ Val, Met → Leu; with Ala, Cys, His, Gly, Ser, Pro, and Thr as structurally idiosyncratic residues. We also found a high average correlation ($\overline{R} = 0.85$) between thirty amino acid mutability scales and the mutational inertia ($I_X$), which measures the energetic cost weighted by the number of observations at the most probable amino acid conformation. These results indicate that amino acid substitutions follow two optimally efficient principles: (a) amino acid interchangeability privileges secondary structural similarity, and (b) amino acid mutability depends directly on biosynthetic energy cost and inversely on frequency. These two principles are the underlying rules governing the observed amino acid substitutions. © 2017 The Author(s)
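As a rough illustration of what a distance between Ramachandran distributions can look like, the sketch below bins (phi, psi) angle pairs into a normalized 2D histogram and compares two histograms with the Hellinger distance. The abstract does not say which distance measure the authors used, so the Hellinger distance, the 30-degree bin width, and the function names are all assumptions made for illustration only.

```python
import math


def ramachandran_histogram(angles, bin_deg=30):
    """Normalized 2D histogram of (phi, psi) pairs given in degrees.

    Angles are wrapped into [-180, 180). bin_deg is an assumed bin width.
    """
    if not angles:
        raise ValueError("need at least one (phi, psi) pair")
    n_bins = 360 // bin_deg
    counts = [[0.0] * n_bins for _ in range(n_bins)]
    for phi, psi in angles:
        i = int(((phi + 180.0) % 360.0) // bin_deg)
        j = int(((psi + 180.0) % 360.0) // bin_deg)
        counts[i][j] += 1.0
    total = float(len(angles))
    return [[c / total for c in row] for row in counts]


def hellinger(p, q):
    """Hellinger distance between two histograms of equal shape (0 = identical, 1 = disjoint)."""
    s = sum(
        (math.sqrt(a) - math.sqrt(b)) ** 2
        for row_p, row_q in zip(p, q)
        for a, b in zip(row_p, row_q)
    )
    return math.sqrt(s / 2.0)


# Toy data: one residue type sampled only in a helix-like region,
# another only in a sheet-like region (hypothetical angles).
helix_like = ramachandran_histogram([(-60.0, -45.0)] * 10)
sheet_like = ramachandran_histogram([(-120.0, 130.0)] * 10)
d_same = hellinger(helix_like, helix_like)
d_diff = hellinger(helix_like, sheet_like)
```

Under this assumed measure, residue pairs with small distances would be the candidates for structurally similar, interchangeable residues.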
Serverification of Molecular Modeling Applications: the Rosetta Online Server that Includes Everyone (ROSIE)
The Rosetta molecular modeling software package provides experimentally
tested and rapidly evolving tools for the 3D structure prediction and
high-resolution design of proteins, nucleic acids, and a growing number of
non-natural polymers. Despite its free availability to academic users and
improving documentation, use of Rosetta has largely remained confined to
developers and their immediate collaborators due to the code's difficulty of
use, the requirement for large computational resources, and the unavailability
of servers for most of the Rosetta applications. Here, we present a unified web
framework for Rosetta applications called ROSIE (Rosetta Online Server that
Includes Everyone). ROSIE provides (a) a common user interface for Rosetta
protocols, (b) a stable application programming interface for developers to add
additional protocols, (c) a flexible back-end to allow leveraging of computer
cluster resources shared by RosettaCommons member institutions, and (d)
centralized administration by the RosettaCommons to ensure continuous
maintenance. This paper describes the ROSIE server infrastructure, a
step-by-step 'serverification' protocol for use by Rosetta developers, and the
deployment of the first nine ROSIE applications by six separate developer
teams: Docking, RNA de novo, ERRASER, Antibody, Sequence Tolerance,
Supercharge, Beta peptide design, NCBB design, and VIP redesign. As illustrated
by the number and diversity of these applications, ROSIE offers a general and
speedy paradigm for serverification of Rosetta applications that incurs
negligible cost to developers and lowers barriers to Rosetta use for the
broader biological community. ROSIE is available at
http://rosie.rosettacommons.org
Computational protein design: assessment and applications
Indiana University-Purdue University Indianapolis (IUPUI)
Computational protein design aims at designing amino acid sequences that can fold into a target structure and perform a desired function. Many computational design methods have been developed, and their applications have been successful during the past two decades. However, the success rate of protein design remains too low for it to be a useful tool for biochemists who are not experts in computational biology. In this dissertation, we first developed novel computational assessment techniques to assess several state-of-the-art computational techniques. We found that significant progress was made in several important measures by two new scoring functions, from RosettaDesign and from OSCAR-design, respectively. We also developed the first machine-learning technique, called SPIN, that predicts a sequence profile compatible with a given structure using a novel nonlocal energy-based feature. The accuracy of predicted sequences is comparable to that of RosettaDesign in terms of sequence identity to wild-type sequences. In the last two application chapters, we designed self-inhibitory peptides of Escherichia coli methionine aminopeptidase (EcMetAP) and de novo designed barstar. Several peptides were confirmed to inhibit EcMetAP with 50% inhibitory concentrations in the micromolar range. Meanwhile, the assessment of designed barstar sequences indicates the improvement of OSCAR-design over RosettaDesign.
RNA and protein 3D structure modeling: similarities and differences
In analogy to proteins, the function of RNA depends on its structure and dynamics, which are encoded in the linear sequence. While there are numerous methods for computational prediction of protein 3D structure from sequence, there have been very few such methods for RNA. This review discusses template-based and template-free approaches for macromolecular structure prediction, with special emphasis on comparison between the already tried-and-tested methods for protein structure modeling and the very recently developed “protein-like” modeling methods for RNA. We highlight analogies between many successful methods for modeling of these two types of biological macromolecules and argue that RNA 3D structure can be modeled using “protein-like” methodology. We also highlight the areas where the differences between RNA and proteins require the development of RNA-specific solutions.
Interplay of I‐TASSER and QUARK for template‐based and ab initio protein structure prediction in CASP10
We develop and test a new pipeline in CASP10 to predict protein structures based on an interplay of I‐TASSER and QUARK for both free‐modeling (FM) and template‐based modeling (TBM) targets. The most noteworthy observation is that sorting through the threading template pool using the QUARK‐based ab initio models as probes allows the detection of distant‐homology templates which might be ignored by the traditional sequence profile‐based threading alignment algorithms. Further template assembly refinement by I‐TASSER resulted in successful folding of two medium‐sized FM targets with >150 residues. For TBM, the multiple threading alignments from LOMETS are, for the first time, incorporated into the ab initio QUARK simulations, which were further refined by I‐TASSER assembly refinement. Compared with the traditional threading assembly refinement procedures, the inclusion of the threading‐constrained ab initio folding models can consistently improve the quality of the full‐length models as assessed by the GDT‐HA and hydrogen‐bonding scores. Despite the success, significant challenges still exist in domain boundary prediction and consistent folding of medium‐size proteins (especially beta‐proteins) for nonhomologous targets. Further developments of sensitive fold‐recognition and ab initio folding methods are critical for solving these problems. Proteins 2014; 82(Suppl 2):175–187. © 2013 Wiley Periodicals, Inc.
Peer Reviewed
http://deepblue.lib.umich.edu/bitstream/2027.42/102666/1/prot24341-sup-0001-suppinfo.pdf
http://deepblue.lib.umich.edu/bitstream/2027.42/102666/2/prot24341.pd
Predicting multibody assembly of proteins
This thesis addresses the multi-body assembly (MBA) problem in the context of protein assemblies. [...] In this thesis, we chose the protein assembly domain because accurate and reliable computational modeling, simulation and prediction of such assemblies would clearly accelerate discoveries in understanding the complexities of metabolic pathways, identifying the molecular basis for normal health and diseases, and designing new drugs and other therapeutics. [...] [We developed] F²Dock (Fast Fourier Docking), whose multi-term scoring function includes both a statistical thermodynamic approximation of molecular free energy and several knowledge-based terms. Parameters of the scoring model were learned from a large set of positive/negative examples, and when tested on 176 protein complexes of various types, the model showed excellent accuracy in ranking correct configurations higher (F²Dock ranks the correct solution as the top-ranked one in 22/176 cases, which is better than other unsupervised prediction software on the same benchmark). Most of the protein-protein interaction scoring terms can be expressed as integrals, over the occupied volume, the boundary, or a set of discrete points (atom locations), of distance-dependent decaying kernels. We developed a dynamic adaptive grid (DAG) data structure which computes smooth surface and volumetric representations of a protein complex in O(m log m) time, where m is the number of atoms, assuming that the smallest feature size h is Θ(r_max), where r_max is the radius of the largest atom; it supports updates in O(log m) time and uses O(m) memory. We also developed the dynamic packing grids (DPG) data structure, which supports quasi-constant time updates (O(log w)) and spherical neighborhood queries (O(log log w)), where w is the word size in the RAM. DPG and DAG together result in an O(k)-time approximation of scoring terms, where k << m is the size of the contact region between proteins.
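To make the idea of spherical neighborhood queries over atom positions concrete, here is a minimal sketch that uses a plain uniform hash grid. This is a simpler, swapped-in technique and not the DPG structure from the thesis, so it does not achieve the O(log w) or O(log log w) bounds quoted above. The class name `AtomGrid`, its methods, and the cell size are hypothetical choices for illustration.

```python
import math
from collections import defaultdict


class AtomGrid:
    """Uniform hash grid over 3D points (illustrative stand-in, not DPG)."""

    def __init__(self, cell_size):
        self.cell_size = cell_size
        self.cells = defaultdict(set)   # cell key -> set of atom ids
        self.coords = {}                # atom id -> (x, y, z)

    def _key(self, p):
        return tuple(math.floor(c / self.cell_size) for c in p)

    def insert(self, atom_id, p):
        """Add an atom, or move it if the id is already present."""
        if atom_id in self.coords:
            self.remove(atom_id)
        self.coords[atom_id] = p
        self.cells[self._key(p)].add(atom_id)

    def remove(self, atom_id):
        key = self._key(self.coords.pop(atom_id))
        self.cells[key].discard(atom_id)
        if not self.cells[key]:
            del self.cells[key]

    def within(self, center, radius):
        """Return sorted ids of atoms whose distance to center is <= radius."""
        reach = math.ceil(radius / self.cell_size)
        cx, cy, cz = self._key(center)
        hits = []
        for dx in range(-reach, reach + 1):
            for dy in range(-reach, reach + 1):
                for dz in range(-reach, reach + 1):
                    cell = self.cells.get((cx + dx, cy + dy, cz + dz), ())
                    for atom_id in cell:
                        if math.dist(self.coords[atom_id], center) <= radius:
                            hits.append(atom_id)
        return sorted(hits)


grid = AtomGrid(cell_size=4.0)
grid.insert("a", (0.0, 0.0, 0.0))
grid.insert("b", (3.0, 0.0, 0.0))
grid.insert("c", (10.0, 0.0, 0.0))
near_before = grid.within((0.0, 0.0, 0.0), 5.0)
grid.insert("c", (4.0, 0.0, 0.0))   # move atom "c" closer
near_after = grid.within((0.0, 0.0, 0.0), 5.0)
```

Because a query only visits the cells overlapping the sphere's bounding box, its cost depends on local atom density rather than on the total number of atoms, which is the property that makes contact-region scoring cheap.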
[...] [W]e consider the symmetric spherical shell assembly case, where multiple copies of identical proteins tile the surface of a sphere. Though this is a restricted subclass of MBA, it is an important one since it would accelerate development of drugs and antibodies to prevent viruses from forming capsids, which have such spherical symmetry in nature. We proved that it is possible to characterize the space of possible symmetric spherical layouts using a small number of representative local arrangements (called tiles) and their global configurations (tilings). We further show that the tilings, and the mapping of proteins to tilings on arbitrarily sized shells, are parameterized by 3 discrete parameters and 6 continuous degrees of freedom, and that the 3 discrete DOF can be restricted to a constant number of cases if the size of the shell is known (in terms of the number of proteins n). We also consider the case where a coarse model of the whole complex of proteins is available. We show that even when such coarse models do not show atomic positions, they can be sufficient to identify a general location for each protein and its neighbors, and thereby restrict the configurational space. We developed an iterative refinement search protocol that leverages such multi-resolution structural data to predict accurate high-resolution models of protein complexes, and successfully applied the protocol to model gp120, a protein on the spike of HIV and currently the most feasible target for anti-HIV drug design.
Computer Science