71 research outputs found

    Computational methods for protein structure prediction and next-generation sequencing data analysis

    Get PDF
    With the wide application of next-generation sequencing technologies, the number of protein sequences is increasing exponentially. However, only a tiny portion of proteins have experimentally verified structures. The huge protein sequence-structure gap could be reduced by computational methods including template-based modeling and template-free modeling. Chapter 2 describes a stochastic point cloud sampling method for multi-template protein model generation. The stochastic sampling and simulated annealing protocol in the method has the capability to improve the global quality and reduce atom clashes in protein models. Two popular approaches for improving protein structure prediction include enlarging the sampling space of template-based modeling and integrating template-based modeling with template-free modeling when no good templates or only partial templates can be found for a target protein. Chapters 3 and 4 introduce a large-scale conformation sampling and evaluation system for protein structure prediction which integrates the two methods. Next-generation sequencing of RNAs (RNA-Seq) generates hundreds of millions of short reads. Analyzing these reads is increasingly being used to foster novel discovery in biomedical research. Chapter 5 describes a bioinformatics pipeline for RNA-Seq data analysis, which converts gigabytes of raw RNA-Seq data into kilobytes of valuable biological knowledge through a five-step data mining and knowledge discovery process

    Methods for the refinement of protein structure 3D models

    Get PDF
    The refinement of predicted 3D protein models is crucial in bringing them closer towards experimental accuracy for further computational studies. Refinement approaches can be divided into two main stages: The sampling and scoring stages. Sampling strategies, such as the popular Molecular Dynamics (MD)-based protocols, aim to generate improved 3D models. However, generating 3D models that are closer to the native structure than the initial model remains challenging, as structural deviations from the native basin can be encountered due to force-field inaccuracies. Therefore, different restraint strategies have been applied in order to avoid deviations away from the native structure. For example, the accurate prediction of local errors and/or contacts in the initial models can be used to guide restraints. MD-based protocols, using physics-based force fields and smart restraints, have made significant progress towards a more consistent refinement of 3D models. The scoring stage, including energy functions and Model Quality Assessment Programs (MQAPs) are also used to discriminate near-native conformations from non-native conformations. Nevertheless, there are often very small differences among generated 3D models in refinement pipelines, which makes model discrimination and selection problematic. For this reason, the identification of the most native-like conformations remains a major challenge

    Rosetta and the journey to predict proteins' structures, 20 years on

    Get PDF
    For two decades, Rosetta has consistently been at the forefront of protein structure prediction. While it has become a very large package comprising programs, scripts, and tools, for different types of macromolecular modelling such as ligand docking, protein-protein docking, protein design, and loop modelling, it started as the implementation of an algorithm for ab initio protein structure prediction. The term ’Rosetta’ appeared for the first time twenty years ago in the literature to describe that algorithm and its contribution to the third edition of the community wide Critical Assessment of techniques for protein Structure Prediction (CASP3). Similar to the Rosetta stone that allowed deciphering the ancient Egyptian civilisation, David Baker and his co-workers have been contributing to deciphering ’the second half of the genetic code’. Although the focus of Baker’s team has expended to de novo protein design in the past few years, Rosetta’s ‘fame’ is associated with its fragment-assembly protein structure prediction approach. Following a presentation of the main concepts underpinning its foundation, especially sequence-structure correlation and usage of fragments, we review the main stages of its developments and highlight the milestones it has achieved in terms of protein structure prediction, particularly in CASP

    Predictive and experimental approaches for elucidating protein–protein interactions and quaternary structures

    Get PDF
    The elucidation of protein–protein interactions is vital for determining the function and action of quaternary protein structures. Here, we discuss the difficulty and importance of establishing protein quaternary structure and review in vitro and in silico methods for doing so. Determining the interacting partner proteins of predicted protein structures is very time-consuming when using in vitro methods, this can be somewhat alleviated by use of predictive methods. However, developing reliably accurate predictive tools has proved to be difficult. We review the current state of the art in predictive protein interaction software and discuss the problem of scoring and therefore ranking predictions. Current community-based predictive exercises are discussed in relation to the growth of protein interaction prediction as an area within these exercises. We suggest a fusion of experimental and predictive methods that make use of sparse experimental data to determine higher resolution predicted protein interactions as being necessary to drive forward development

    Mass & secondary structure propensity of amino acids explain their mutability and evolutionary replacements

    Get PDF
    Why is an amino acid replacement in a protein accepted during evolution? The answer given by bioinformatics relies on the frequency of change of each amino acid by another one and the propensity of each to remain unchanged. We propose that these replacement rules are recoverable from the secondary structural trends of amino acids. A distance measure between high-resolution Ramachandran distributions reveals that structurally similar residues coincide with those found in substitution matrices such as BLOSUM: Asn Asp, Phe Tyr, Lys Arg, Gln Glu, Ile Val, Met → Leu; with Ala, Cys, His, Gly, Ser, Pro, and Thr, as structurally idiosyncratic residues. We also found a high average correlation (\overline{R} R = 0.85) between thirty amino acid mutability scales and the mutational inertia (I X ), which measures the energetic cost weighted by the number of observations at the most probable amino acid conformation. These results indicate that amino acid substitutions follow two optimally-efficient principles: (a) amino acids interchangeability privileges their secondary structural similarity, and (b) the amino acid mutability depends directly on its biosynthetic energy cost, and inversely with its frequency. These two principles are the underlying rules governing the observed amino acid substitutions. © 2017 The Author(s)
    • …
    corecore