58 research outputs found

    Assessing Protein Conformational Sampling Methods Based on Bivariate Lag-Distributions of Backbone Angles

    Despite considerable progress in the past decades, protein structure prediction remains one of the major unsolved problems in computational biology. Angular-sampling-based methods have been extensively studied recently due to their ability to capture the continuous conformational space of protein structures. The literature has focused on using a variety of parametric models of the sequential dependencies between angle pairs along the protein chains. In this article, we present a thorough review of angular-sampling-based methods by assessing three main questions: What is the best distribution type to model the protein angles? What is a reasonable number of components in a mixture model that should be considered to accurately parameterize the joint distribution of the angles? and What is the order of the local sequence–structure dependency that should be considered by a prediction method? We assess the model fits for different methods using bivariate lag-distributions of the dihedral/planar angles. Moreover, the main information across the lags can be extracted using a technique called Lag singular value decomposition (LagSVD), which considers the joint distribution of the dihedral/planar angles over different lags using a nonparametric approach and monitors the behavior of the lag-distribution of the angles using singular value decomposition. As a result, we developed graphical tools and numerical measurements to compare and evaluate the performance of different model fits. Furthermore, we developed a web-tool (http://www.stat.tamu.edu/~madoliat/LagSVD) that can be used to produce informative animations.
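The lag-distribution idea in this abstract can be illustrated with a small sketch. This is not the authors' LagSVD implementation: the function names `lag_histograms` and `lag_svd`, the use of plain 2D histograms as the nonparametric estimate, the bin count, and the synthetic uniform angles are all assumptions made for illustration. It builds one bivariate histogram of (phi_i, psi_{i+lag}) per lag, stacks them as rows, and applies a singular value decomposition to see how the distribution varies across lags.

```python
import numpy as np

def lag_histograms(phi, psi, max_lag, bins=12):
    """Row k is the normalized 2D histogram of the pairs (phi[i], psi[i + k])."""
    edges = np.linspace(-np.pi, np.pi, bins + 1)
    rows = []
    for lag in range(max_lag + 1):
        x = phi[: len(phi) - lag]   # angle at position i
        y = psi[lag:]               # angle at position i + lag
        h, _, _ = np.histogram2d(x, y, bins=[edges, edges])
        rows.append((h / h.sum()).ravel())
    return np.vstack(rows)

def lag_svd(phi, psi, max_lag=5, bins=12):
    """SVD of the lag-by-bin matrix after removing the mean distribution."""
    H = lag_histograms(phi, psi, max_lag, bins)
    centered = H - H.mean(axis=0)
    return np.linalg.svd(centered, full_matrices=False)

# Hypothetical input: independent uniform angles standing in for real backbone angles.
rng = np.random.default_rng(0)
phi = rng.uniform(-np.pi, np.pi, 500)
psi = rng.uniform(-np.pi, np.pi, 500)
u, s, vt = lag_svd(phi, psi)
```

With real backbone angles, the leading rows of `vt` (reshaped to `bins` by `bins`) would show which regions of the angle plane change most as the lag grows, and the columns of `u` would show how strongly each lag expresses that pattern.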

    Collective estimation of multiple bivariate density functions with application to angular-sampling-based protein loop modeling

    This article develops a method for simultaneous estimation of density functions for a collection of populations of protein backbone angle pairs using a data-driven, shared basis that is constructed by bivariate spline functions defined on a triangulation of the bivariate domain. The circular nature of angular data is taken into account by imposing appropriate smoothness constraints across boundaries of the triangles. Maximum penalized likelihood is used to fit the model and an alternating blockwise Newton-type algorithm is developed for computation. A simulation study shows that the collective estimation approach is statistically more efficient than estimating the densities individually. The proposed method was used to estimate neighbor-dependent distributions of protein backbone dihedral angles (i.e., Ramachandran distributions). The estimated distributions were applied to protein loop modeling, one of the most challenging open problems in protein structure prediction, by feeding them into an angular-sampling-based loop structure prediction framework. Our estimated distributions compared favorably to the Ramachandran distributions estimated by fitting a hierarchical Dirichlet process model; and in particular, our distributions showed significant improvements on the hard cases where existing methods do not work well

    Skewed Factor Models Using Selection Mechanisms

    Traditional factor models explicitly or implicitly assume that the factors follow a multivariate normal distribution; that is, only moments up to order two are involved. However, it may happen in real data problems that the first two moments cannot explain the factors. Based on this motivation, here we devise three new skewed factor models, the skew-normal, the skew-t, and the generalized skew-normal factor models depending on a selection mechanism on the factors. The ECME algorithms are adopted to estimate related parameters for statistical inference. Monte Carlo simulations validate our new models and we demonstrate the need for skewed factor models using the classic open/closed book exam scores dataset
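As a rough illustration of what a selection mechanism means here, the sketch below uses the textbook univariate case rather than the factor models of the abstract. It assumes a pair (U0, U1) of standard normals with correlation `delta` and keeps U1 only when U0 > 0; the retained values follow a skew-normal distribution with shape delta / sqrt(1 - delta^2) and mean delta * sqrt(2 / pi). The function name and the chosen `delta` are hypothetical.

```python
import numpy as np

def sample_skew_normal_by_selection(delta, n_draws, rng):
    """Draw correlated normals (u0, u1) and keep u1 only where u0 > 0."""
    z0 = rng.standard_normal(n_draws)
    z1 = rng.standard_normal(n_draws)
    u0 = z0
    u1 = delta * z0 + np.sqrt(1.0 - delta**2) * z1  # corr(u0, u1) = delta
    return u1[u0 > 0]                               # the selection step

rng = np.random.default_rng(1)
delta = 0.8
x = sample_skew_normal_by_selection(delta, 200_000, rng)

# Theoretical mean of the selected variable, for comparison with x.mean().
expected_mean = delta * np.sqrt(2.0 / np.pi)
```

Roughly half of the draws survive the selection, and the sample mean of `x` should sit close to `expected_mean`, which is nonzero even though both underlying normals are centered at zero.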

    Towards Reliable Automatic Protein Structure Alignment

    A variety of methods have been proposed for structure similarity calculation, which are called structure alignment or superposition. One major shortcoming in current structure alignment algorithms is in their inherent design, which is based on local structure similarity. In this work, we propose a method to incorporate global information in obtaining optimal alignments and superpositions. Our method, when applied to optimizing the TM-score and the GDT score, produces significantly better results than current state-of-the-art protein structure alignment tools. Specifically, if the highest TM-score found by TMalign is lower than 0.6 and the highest TM-score found by one of the tested methods is higher than 0.5, there is a probability of 42% that TMalign failed to find TM-scores higher than 0.5, while the same probability is reduced to 2% if our method is used. This could significantly improve the accuracy of fold detection if the cutoff TM-score of 0.5 is used. In addition, existing structure alignment algorithms focus on structure similarity alone and simply ignore other important similarities, such as sequence similarity. Our approach has the capacity to incorporate multiple similarities into the scoring function. Results show that sequence similarity aids in finding high quality protein structure alignments that are more consistent with eye-examined alignments in HOMSTRAD. Even when structure similarity itself fails to find alignments with any consistency with eye-examined alignments, our method remains capable of finding alignments highly similar to, or even identical to, eye-examined alignments. Comment: Peer-reviewed and presented as part of the 13th Workshop on Algorithms in Bioinformatics (WABI2013).

    Information Theory in Molecular Evolution: From Models to Structures and Dynamics

    This Special Issue collects novel contributions from scientists in the interdisciplinary field of biomolecular evolution. Works listed here use information theoretical concepts as a core but are tightly integrated with the study of molecular processes. Applications include the analysis of phylogenetic signals to elucidate biomolecular structure and function, the study and quantification of structural dynamics and allostery, as well as models of molecular interaction specificity inspired by evolutionary cues

    Finding Similar Protein Structures Efficiently and Effectively

    To assess the similarities and the differences among protein structures, a variety of structure alignment algorithms and programs have been designed and implemented. We introduce a low-resolution approach and a high-resolution approach to evaluate the similarities among protein structures. Our results show that both the low-resolution approach and the high-resolution approach outperform state-of-the-art methods. For the low-resolution approach, we eliminate false positives through the comparison of both local similarity and remote similarity with little compromise in speed. Two kinds of contact libraries (ContactLib) are introduced to fingerprint protein structures effectively and efficiently. Each contact group from the contact library consists of one local or two remote fragments and is represented by a concise vector. These vectors are then indexed and used to calculate a new combined hit-rate score to identify similar protein structures effectively and efficiently. We tested our ContactLibs on the high-quality protein structure subset of SCOP30, which contains 3,297 protein structures. For each protein structure of the subset, we retrieved its neighbor protein structures from the rest of the subset. The best area under the ROC curve, achieved by a ContactLib, is as high as 0.960. This is a significant improvement over 0.747, the best result achieved by the state-of-the-art method, FragBag. For the high-resolution approach, our PROtein STructure Alignment method (PROSTA) relies on and verifies the fact that the optimal protein structure alignment always contains a small subset of aligned residue pairs, called a seed, such that the rotation and translation (ROTRAN), which minimizes the RMSD of the seed, yields both the optimal ROTRAN and the optimal alignment score. Thus, ROTRANs minimizing the RMSDs of small subsets of residues are sampled, and global alignments are calculated directly from the sampled ROTRANs.
Moreover, our method incorporates remote information and filters similar ROTRANs (or alignments) by clustering, rather than by an exhaustive method, to overcome the computational inefficiency. Our high-resolution protein structure alignment method, when applied to optimizing the TM-score and the GDT-TS score, produces a significantly better result than state-of-the-art protein structure alignment methods. Specifically, if the highest TM-score found by TM-align is lower than 0.6 and the highest TM-score found by one of the tested methods is higher than 0.5, our alignment method tends to discover better protein structure alignments with (up to 0.21) higher TM-scores. In such cases, TM-align fails to find TM-scores higher than 0.5 with a probability of 42%; however, our alignment method fails the same task with a probability of only 2%. In addition, existing protein structure alignment scoring functions focus on atom coordinate similarity alone and simply ignore other important similarities, such as sequence similarity. Our scoring function has the capacity for incorporating multiple similarities into the scoring function. Our result shows that sequence similarity aids in finding high quality protein structure alignments that are more consistent with HOMSTRAD alignments, which are protein structure alignments examined by human experts. When atom coordinate similarity itself fails to find alignments with any consistency to HOMSTRAD alignments, our scoring function remains capable of finding alignments highly similar to, or even identical to, HOMSTRAD alignments
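The two building blocks named in this abstract, an RMSD-minimizing rotation and translation for a set of paired residues and the TM-score of a superposition, can be sketched as follows. This is not PROSTA's code: it uses the standard Kabsch procedure for the superposition and the commonly cited TM-score formula with d0 = 1.24 * (L - 15)^(1/3) - 1.8, and it omits seed sampling, clustering, and the special handling of very short chains that real tools apply. The function names and the synthetic coordinates are assumptions.

```python
import numpy as np

def kabsch(P, Q):
    """Rotation R and translation t minimizing the RMSD between R @ P[i] + t and Q[i]."""
    Pc, Qc = P.mean(axis=0), Q.mean(axis=0)
    H = (P - Pc).T @ (Q - Qc)                  # 3x3 covariance of centered coordinates
    U, _, Vt = np.linalg.svd(H)
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T    # d = -1 prevents an improper rotation
    t = Qc - R @ Pc
    return R, t

def tm_score(P_superposed, Q, L_target):
    """TM-score of already superposed, paired coordinates, normalized by L_target."""
    if L_target <= 15:
        raise ValueError("this sketch only covers target lengths above 15")
    d0 = 1.24 * (L_target - 15) ** (1.0 / 3.0) - 1.8
    d = np.linalg.norm(P_superposed - Q, axis=1)
    return float(np.sum(1.0 / (1.0 + (d / d0) ** 2)) / L_target)

# Hypothetical test case: Q is an exact rigid-body copy of P.
rng = np.random.default_rng(0)
P = rng.normal(size=(30, 3))
theta = 0.7
R_true = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                   [np.sin(theta),  np.cos(theta), 0.0],
                   [0.0,            0.0,           1.0]])
Q = P @ R_true.T + np.array([1.0, -2.0, 0.5])

R, t = kabsch(P, Q)
score = tm_score(P @ R.T + t, Q, L_target=len(Q))
```

In a seed-based scheme like the one described, `kabsch` would be called on a small subset of paired residues and the resulting rotation and translation applied to the whole structure before scoring.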

    To bind or not to bind - dissociation equilibria studied by pulse dipolar EPR

    Pulse dipolar EPR is an appealing strategy for structural characterisation of complex systems in solution that complements other biophysical techniques. Significantly, the emergence of genetically encoded self-assembling spin labels exploiting exogenously introduced double-histidine motifs in conjunction with CuII chelates offers high precision distance determination in systems non-permissive to thiol-directed spin labelling. However, the non-covalent CuII coordination approach is vulnerable to low binding affinity. Here, an approach is outlined where dissociation constants (KD) are investigated directly from the modulation depths of relaxation-induced dipolar modulation enhancement (RIDME) EPR experiments applied to the model protein Streptococcus sp. group G protein G, B1 domain (GB1). This reveals low- to sub-μM CuII-chelate KDs under RIDME conditions at cryogenic temperatures. We show the feasibility of exploiting the double-histidine motif for EPR applications even at sub-μM protein concentrations in orthogonally labelled CuII–nitroxide systems. Additionally, modulation depth quantitation in CuII–CuII RIDME to simultaneously estimate a pair of non-identical independent KDs is addressed. Furthermore, we develop a general speciation model to optimise CuII labelling efficiency, depending upon pairs of identical or disparate KDs and total label concentration. We find the KD estimates are in excellent agreement with previously determined values. We also investigated the vulnerability of binding to both competition from adventitious divalent metal ions, and pH sensitivity. A combination of room-temperature isothermal titration calorimetry (ITC) and CuII-nitroxide RIDME measurements is applied to GB1.
Results demonstrate double-histidine spin labelling using CuII-nitrilotriacetic acid (CuII-NTA) is robust against the competitor ligand ZnII-NTA at >1000-fold excess, and high nM binding affinity is retained at acidic and basic pH, despite room-temperature behaviour suggesting a stronger dependence. "I gratefully acknowledge the BBSRC Eastbio DTP, and the School of Chemistry St Andrews for their financial support. Specifically, the work shown in this thesis was supported by the Biotechnology and Biological Sciences Research Council (BBSRC)." --Acknowledgement
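The dependence of labelling efficiency on KD values and total label concentration can be made concrete with a generic mass-balance sketch. It is not the thesis's speciation model: it assumes simple 1:1 binding of one label per site, independent sites present at the protein concentration, a shared pool of label, and all quantities in the same concentration unit. The function names are hypothetical.

```python
import math

def single_site_occupancy(label_total, protein_total, kd):
    """Closed-form fraction of a single site occupied under 1:1 binding."""
    b = label_total + protein_total + kd
    bound = (b - math.sqrt(b * b - 4.0 * label_total * protein_total)) / 2.0
    return bound / protein_total

def free_label(label_total, protein_total, kds):
    """Solve label_total = L + sum_i protein_total * L / (KD_i + L) for free label L."""
    def excess(l_free):
        bound = sum(protein_total * l_free / (kd + l_free) for kd in kds)
        return l_free + bound - label_total
    lo, hi = 0.0, label_total          # excess(lo) <= 0 <= excess(hi), and excess is increasing
    for _ in range(200):               # plain bisection
        mid = 0.5 * (lo + hi)
        if excess(mid) > 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

def site_occupancies(label_total, protein_total, kds):
    """Occupied fraction of each site given its KD and the shared free label."""
    l_free = free_label(label_total, protein_total, kds)
    return [l_free / (kd + l_free) for kd in kds]

# Hypothetical numbers, all in micromolar: 1 uM protein, 2 uM label, two unequal sites.
occ_tight, occ_weak = site_occupancies(2.0, 1.0, [0.1, 1.0])
doubly_labelled = occ_tight * occ_weak   # valid only if the two sites bind independently
```

Under the independence assumption, the product of the two occupancies gives the fraction of protein carrying both labels, which is the population that contributes to a label-to-label distance measurement.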

    Computational Methods for Analysis of Data for Conformational and Phase Equilibria of Disordered Proteins

    Intrinsically disordered proteins and regions (IDPs / IDRs) are a class of proteins with diverse conformational heterogeneity that do not fold into a tertiary structure due to the lack of a native structural state. Consequently, disordered proteins are remarkably flexible and exhibit multivalent properties that enable them to adopt myriad functional roles within the cell, such as signal transduction, transcription, enzymatic catalysis, translation, and many more. Due to their multivalency, some IDPs undergo monomeric and heterotypic interactions which can drive phase separation. Such IDPs can form membraneless organelles with specific regulatory roles within the cell, which include, but are not limited to, RNA storage, neurotransmission, and cell-cycle regulation. However, the driving forces behind these mechanisms are not well understood. Dysregulation of these roles through the introduction of sequence mutations or cellular stress can lead to the formation of protein aggregates that can detrimentally impact cellular function and ability. Thus, IDPs are also implicated in multiple diseases like Type II diabetes, numerous cancers, and several neurodegenerative disorders such as Alzheimer’s and Parkinson’s disease. Therefore, there is keen interest in understanding the sequence-determinants of IDPs and characterizing properties of their conformational ensembles that inform their function. This thesis is focused on the development and application of computational tools that can characterize the spatiotemporal properties of IDP simulations, as well as classify and identify possible sequence-determinants of phase separation.