226 research outputs found

    TANGLE: Two-Level Support Vector Regression Approach for Protein Backbone Torsion Angle Prediction from Primary Sequences

    Protein backbone torsion angles (Phi) and (Psi) are the two rotation angles around the Cα-N bond (Phi) and the Cα-C bond (Psi). Due to the planarity of the linked rigid peptide bonds, these two angles can essentially determine the backbone geometry of proteins. Accordingly, the accurate prediction of protein backbone torsion angles from sequence information can assist the prediction of protein structures. In this study, we develop a new approach called TANGLE (Torsion ANGLE predictor) to predict the protein backbone torsion angles from amino acid sequences. TANGLE uses a two-level support vector regression approach to perform real-value torsion angle prediction using a variety of features derived from amino acid sequences, including the evolutionary profiles in the form of position-specific scoring matrices, predicted secondary structure, solvent accessibility and natively disordered regions, as well as other global sequence features. When evaluated on a large benchmark dataset of 1,526 non-homologous proteins, the mean absolute errors (MAEs) of the Phi and Psi angle predictions are 27.8° and 44.6°, respectively, which are 1% and 3% lower, respectively, than those obtained with ANGLOR, one of the state-of-the-art prediction tools. Moreover, the predictions of TANGLE are significantly better than those of a random predictor built on an amino acid-specific basis, with p-values below 1.46e-147 and 7.97e-150, respectively, by the Wilcoxon signed-rank test. As a complementary approach to the current torsion angle prediction algorithms, TANGLE should prove useful in predicting protein structural properties and assisting protein fold recognition by applying the predicted torsion angles as useful restraints. TANGLE is freely accessible at http://sunflower.kuicr.kyoto-u.ac.jp/~sjn/TANGLE/
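
The MAE quoted in the abstract above is an average of per-residue angle errors. Because torsion angles are periodic, a prediction of 170° for a true value of −170° is only 20° off, not 340°. The following is a minimal sketch of such a periodic MAE. Whether TANGLE's evaluation wraps errors in exactly this way is an assumption, and the function names are hypothetical.

```python
def angular_abs_error(pred_deg, true_deg):
    """Absolute difference between two angles in degrees, wrapped into [0, 180]."""
    diff = abs(pred_deg - true_deg) % 360.0
    # Going the "short way" around the circle never exceeds 180 degrees.
    return 360.0 - diff if diff > 180.0 else diff


def angular_mae(preds_deg, trues_deg):
    """Mean absolute error over paired angle lists (assumed periodic wrapping)."""
    if len(preds_deg) != len(trues_deg) or len(preds_deg) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(angular_abs_error(p, t) for p, t in zip(preds_deg, trues_deg))
    return total / len(preds_deg)
```

For example, `angular_mae([170.0, 10.0], [-170.0, 40.0])` averages errors of 20° and 30°.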

    Polyphony: superposition independent methods for ensemble-based drug discovery.

    BACKGROUND: Structure-based drug design is an iterative process, following cycles of structural biology, computer-aided design, synthetic chemistry and bioassay. In favorable circumstances, this process can yield hundreds of protein-ligand crystal structures. In addition, molecular dynamics (MD) simulations are increasingly being used to further explore the conformational landscape of these complexes. Currently, methods capable of analysing ensembles of crystal structures and MD trajectories are limited and usually rely upon least-squares superposition of coordinates. RESULTS: Novel methodologies are described for the analysis of multiple structures of a protein. Statistical approaches that rely upon residue equivalence, but not superposition, are developed. Tasks that can be performed include the identification of hinge regions, allosteric conformational changes and transient binding sites. The approaches are tested on crystal structures of CDK2 and other CMGC protein kinases and a simulation of p38α. Known relationships between interactions and conformational changes are highlighted, and new ones are revealed. A transient but druggable allosteric pocket in CDK2 is predicted to occur under the CMGC insert. Furthermore, an evolutionarily conserved conformational link from the location of this pocket, via the αEF-αF loop, to phosphorylation sites on the activation loop is discovered. CONCLUSIONS: New methodologies are described and validated for the superposition-independent conformational analysis of large collections of structures or simulation snapshots of the same protein. The methodologies are encoded in a Python package called Polyphony, which is released as open source to accompany this paper [http://wrpitt.bitbucket.org/polyphony/].
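
To illustrate what "residue equivalence, but not superposition" can mean in practice, the sketch below compares two conformations through their internal distance matrices, which are unchanged by rotation and translation. This is a generic illustration only: it is not Polyphony's API, the abstract does not say which descriptors the package uses, and the function names and the per-residue score are assumptions.

```python
import math


def distance_matrix(coords):
    """All pairwise distances between points given as (x, y, z) tuples."""
    return [[math.dist(a, b) for b in coords] for a in coords]


def per_residue_change(coords_a, coords_b):
    """Mean absolute change in distance from each residue to all others.

    Assumes both lists hold equivalent residues in the same order
    (for example, one C-alpha position per residue).
    """
    if len(coords_a) != len(coords_b):
        raise ValueError("conformations must contain equivalent residues")
    n = len(coords_a)
    da, db = distance_matrix(coords_a), distance_matrix(coords_b)
    denom = max(n - 1, 1)  # avoid division by zero for a single residue
    return [sum(abs(da[i][j] - db[i][j]) for j in range(n)) / denom
            for i in range(n)]
```

A rigidly rotated and translated copy of a structure scores zero for every residue without any fitting step, whereas a genuine internal rearrangement gives non-zero values.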

    A minimalist model for the simulation of the structure and dynamics of disordered proteins

    Proteins are one of the two fundamental classes of biomolecules (together with nucleic acids), covering basically all the functional roles in living systems. In the last fifty years, a big effort was devoted to the computer simulation of the dynamics of these systems, in order to get a better insight into their structure and behavior, and to complement the experimental studies. However, the size of the system, the time scale reachable in simulations and the accuracy of the representation are limited by computer power. Although Moore's law has ensured, up to now, the exponential increase of the latter over time, simulations with atomic accuracy can currently address a system the size of a virus only on very brief time scales, or a very limited portion of the cell, while only single proteins can be represented on macroscopic time scales. Therefore, in order to study dynamical biological systems, coarse-grained models are considered a natural solution to overcome the limits of atomistic models. These models represent the system at a lower resolution, reducing the number of explicit degrees of freedom of the system and providing lighter and accelerated dynamics. Depending on the level of coarse graining, macroscopic time scales for systems of biologically interesting size can currently be afforded. The hierarchical organization of protein structure naturally suggests a possible level of coarse graining, namely that of one interacting center (also called “bead”) per amino acid, the latter being the basic chemical unit of a protein. Among “one-bead models”, the subclass with the bead placed on Cα emerges as the one that best represents the conformation of the backbone and the secondary structures. The class of Cα one-bead models, also called “minimalist”, is the focus of this Thesis work.
In the last decade a number of minimalist models were developed, all representing the interactions by means of empirical force fields (FF) consisting of a sum of analytical or numerical terms. Models differ in the number and composition of the FF terms (more or less physics- and chemistry-based), and in the parameterization strategy, which can be based on higher-level theories (typically atomistic simulations) or on experimental data (i.e. data sets of experimental structures, inclusion of other kinds of macroscopic and thermodynamic information). As a consequence, the model can be more or less general and transferable. Usually, accuracy and transferability are in conflict: the more bias towards a given structure is included, the more accurately that structure will be represented, but the less transferable and predictive the model is. In order to overcome this problem, most of the currently available minimalist models include some a priori knowledge of the secondary or tertiary structures within the parameterization, which can be called a “partial bias”. Clearly, the general goal is to build a model that is both accurate and predictive, and therefore unbiased. The goal of this Thesis is to take some steps along this route. The chosen strategy is to follow a physics-based approach, related to the fundamental nature of the forces acting within proteins. Basically, the primary structure of a protein (i.e. sequence and polypeptide chain) is stabilized by covalent chemical bonds, while the secondary structure (e.g. helical or sheet-like structures) is stabilized by specific hydrogen bonds. Higher-level structures (tertiary and quaternary) are stabilized by other specific interactions, such as disulphide and salt bridges. The specific aim of this work is to build a general minimalist model to be used for unstructured proteins.
Therefore, no hydrogen bonding or other specific interactions are included in the FF, and the model is parameterized on a data set of unstructured proteins. This strategy has resulted in a quite general and unbiased model able to reproduce the structure and dynamics of a class of proteins, namely the “intrinsically disordered proteins” (IDPs), which is very interesting per se. In addition, it can be considered a zero-point approximation to which hydrogen bonding and other interactions can be added in order to build models for structured proteins in a rational and physically based fashion. Besides the primary results (i.e. the model for IDPs, the description of the dynamics of some specific cases, etc.), this work has returned interesting insight into the whole class of IDPs. First, due to the absence of a stable conformation, structural data on these proteins are very elusive. Therefore, building the data set was already a particularly hard task, which can be considered a side result of this work and led to a deep reconsideration of the definition of secondary structure. Second, these proteins elude one of the paradigms of biomolecular chemistry, namely the relation between structure and function: they do not have a well-defined structure, but they do have a function. Therefore, a reliable model for this class can help redefine this paradigm by including dynamical information in it. Finally, for the same reason, this model can shed light on the behavior and function of the unstructured intermediates of the folding process of structured proteins.

    ORGANIZING FORCES AND CONFORMATIONAL ACCESSIBILITY IN THE UNFOLDED STATE OF PROTEINS

    For over fifty years, the unfolded state of proteins had been thought to be featureless and random. Experiments by Tanford and Flory confirmed that unfolded proteins possessed the same dimensions as those predicted of a random flight chain in good solvent. In the late eighties and early nineties, however, researchers began to notice structural trends in unfolded proteins. Some experiments showed that the unfolded state was very similar to the native state, while others indicated a conformational preference for the polyproline II helix in unfolded proteins. As a result, a paradox developed. How can unfolded proteins be both random and nonrandom at the same time? Current experiments and most theoretical simulations cannot characterize the unfolded state in high detail, so we have used the simplified hard sphere model of Richards to address this question. By modeling proteins as hard spheres, we can not only determine what interactions are important in the unfolded state of proteins, but we can address the paradox directly by investigating whether nonrandom behavior is in conflict with random coil statistics. Our simulations identify hundreds of disfavored conformations in short peptides, each of which proves that unfolded proteins are not at all random. Some interactions are important for the folded state of proteins as well. For example, we find that an α-helix cannot be followed directly by a β-strand because of steric considerations. The interactions outlined here limit the conformational possibilities of an unfolded protein far beyond what would be expected for a random coil. For a 100-residue protein, we find that approximately 9 orders of magnitude of conformational freedom are lost because of local chain organization alone. Furthermore, we show that the existence of this organization is compatible with random coil statistics.
Although our simulations cannot settle the controversy surrounding the unfolded state, we can conclude that new methods of characterizing the unfolded state are needed. Since unfolded proteins are not random coils, the methods developed for describing random coils cannot adequately describe the complexities of this diverse structural ensemble.

    Unravelling the molecular dynamics of c-MYC’s TAD domain: a journey from simulation optimisation to drug discovery

    c-MYC, part of the MYC family of transcription factors, is often deregulated in cancer, and since the early 1980s has been identified as a prime oncogenic factor. Despite much research interest, c-MYC’s structural dynamics remain largely uncharted due to its intrinsic structural disorder. Disordered proteins are challenging to study using solely structural experimental methods, thus attention has lately turned towards the development of reliable in-silico methods to obtain an accurate molecular description. Molecular Dynamics simulations, commonly and successfully used to study globular proteins, can also be optimised to correctly reproduce natural protein disorder. The simulation results were assessed for convergence and conformational equilibrium by comparing c-MYC’s Molecular Dynamics conformational landscape to similar data derived from an abundantly sampled probabilistic distribution. After the preparatory and validation work, the efforts turned to the appraisal of c-MYC’s first 88 amino acids. The revelation of its conformational states and structural dynamics opened the door for drug discovery and provided proof-of-concept that c-MYC should not be considered ‘undruggable’. Further exploration of the protein’s first 150 residues, corresponding to its transactivation domain, uncovered important structural dynamics controlled by key phosphodegron residues. Phosphorylation and mutagenesis studies demonstrated how these control mechanisms, which serve to modulate accessibility to crucial regions, are facilitated by isomerisation events within the phosphodegron. Overall, this study substantiates the robustness of well-parameterised computational simulations, and machine learning methods, in uncovering the workings of otherwise difficult-to-study disordered proteins. Open Access

    Investigations of the Unique Role of Alanines in the 'Elastin Puzzle' by Solid-State NMR Spectroscopy and Molecular Dynamics Simulations.

    Ph.D. Thesis. University of Hawaiʻi at Mānoa 2017

    Figure 9: Alternative representations of the Ramachandran plot.

    The Ramachandran plot is important to structural biology as it describes a peptide backbone in the context of its dominant degrees of freedom: the backbone dihedral angles φ and ψ (Ramachandran, Ramakrishnan & Sasisekharan, 1963). Since its introduction, the Ramachandran plot has been a crucial tool to characterize protein backbone features. However, the conformation or twist of a backbone as a function of φ and ψ has not been completely described for both cis and trans backbones. Additionally, little intuitive understanding of a peptide’s conformation is available simply from knowing its φ and ψ values (e.g., is the regular peptide defined by φ = ψ = −100° left-handed or right-handed?). This report provides a new metric for backbone handedness (h) based on interpreting a peptide backbone as a helix with axial displacement d and angular displacement θ, both of which are derived from a peptide backbone’s internal coordinates, especially dihedral angles φ, ψ and ω. In particular, h equals sin(θ)d∕|d|, with range [−1, 1] and negative (or positive) values indicating left- (or right-) handedness. The metric h is used to characterize the handedness of every region of the Ramachandran plot for both cis (ω = 0°) and trans (ω = 180°) backbones, which provides the first exhaustive survey of twist handedness in Ramachandran (φ, ψ) space. These maps fill in the ‘dead space’ within the Ramachandran plot, that is, regions that are not commonly accessed by structured proteins but which may be accessible to intrinsically disordered proteins, short peptide fragments, and protein mimics such as peptoids. Finally, building on the work of Zacharias & Knapp (2013), this report presents a new plot based on d and θ that serves as a universal and intuitive alternative to the Ramachandran plot. The universality arises from the fact that the co-inhabitants of such a plot include every possible peptide backbone, including cis and trans backbones.
The intuitiveness arises from the fact that d and θ provide, at a glance, numerous aspects of the backbone including compactness, handedness, and planarity.
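
The handedness metric stated in the abstract, h = sin(θ)d∕|d|, can be evaluated directly once d and θ are known. The sketch below is a minimal illustration under stated assumptions: θ is taken in radians, the function name is hypothetical, and returning 0.0 when d = 0 (where d∕|d| is undefined) is a convention chosen here, not one taken from the report. Deriving d and θ from φ, ψ and ω is not shown.

```python
import math


def handedness(theta_rad, d):
    """Evaluate h = sin(theta) * d / |d| for angular and axial displacement.

    theta_rad: angular displacement per residue, in radians (assumed unit).
    d: axial displacement per residue; only its sign enters the result.
    """
    if d == 0:
        # d / |d| is undefined here; 0.0 is an assumed convention for a
        # backbone with no axial rise.
        return 0.0
    sign = 1.0 if d > 0 else -1.0
    return math.sin(theta_rad) * sign
```

Under the abstract's convention, a negative return value indicates a left-handed twist and a positive one a right-handed twist.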

    A Features Analysis Tool For Assessing And Improving Computational Models In Structural Biology

    The protein-folding problem is to predict, from a protein's amino acid sequence, its folded 3D conformation. State-of-the-art computational models are complex, collaboratively maintained prediction software. Like other complex software, they become brittle without support for testing and refactoring. Features analysis, a language of 'scientific unit testing', is the visual and quantitative comparison of distributions of features (local geometric measures) sampled from ensembles of native and predicted conformations. To support features analysis, I develop a features analysis tool: a modular database framework for extracting and managing sampled feature instances, and an exploratory data analysis framework for rapidly comparing feature distributions. In supporting features analysis, the tool supports the creation, tuning, and assessment of computational models, improving protein prediction and design. I demonstrate the features analysis tool through six case studies with the Rosetta molecular modeling suite. The first three demonstrate the mechanics of using the tool by constructing and checking models. The first evaluates bond angle restraint models when used with the Backrub local sampling heuristic. The second identifies and resolves energy function derivative discontinuities that frustrate gradient-based minimization. The third constructs a model for disulfide bonds. The second three demonstrate using the tool to evaluate and improve how models represent molecular structure. I focus on modeling H-bonds because their geometric specificity and environmental dependence lead to complex feature distributions. The fourth case study develops a novel functional form for Sp2 acceptor H-bonds. The fifth fits parameters for a refined H-bond model. The sixth combines the refined model with an electrostatics model and harmonizes them with the rest of the energy function.
Next, to facilitate assessing model improvements, I develop recovery tests that measure predictive accuracy by asking models to recover native conformations that have been partially randomized. Finally, to demonstrate that the features analysis and recovery test tools support improving protein prediction and design, I evaluate the refined H-bond model and electrostatics model with additional corrections from the Rosetta community. Based on positive results, I recommend a new standard energy function, which has been accepted by the Rosetta community as the largest systematic improvement in nearly a decade. Doctor of Philosophy