
    Distance Matrix-Based Approach to Protein Structure Prediction

    Much structural information is encoded in the internal distances; a distance matrix-based approach can be used to predict protein structure and dynamics, and for structural refinement. Our approach is based on the square distance matrix $D = [r_{ij}^2]$ containing all square distances between residues in proteins. This distance matrix contains more information than the contact matrix $C$, whose elements are either 0 or 1 depending on whether the distance $r_{ij}$ is greater or less than a cutoff value $r_{\mathrm{cutoff}}$. We have performed the spectral decomposition of the distance matrices, $D = \sum_k \lambda_k \mathbf{v}_k \mathbf{v}_k^{T}$, in terms of the eigenvalues $\lambda_k$ and the corresponding eigenvectors $\mathbf{v}_k$, and found that it contains at most 5 nonzero terms. A dominant eigenvector is proportional to $r^2$, the square distance of points from the center of mass, and the next three are the principal components of the system of points. By knowing $r^2$ we can approximate the distance matrix of a protein with an expected RMSD of about 4.5 Å. We can also explain the role of hydrophobic interactions in protein structure, because $r$ is highly correlated with the hydrophobic profile of the sequence. Moreover, $r$ is highly correlated with several sequence profiles that are useful in protein structure prediction, such as the contact number, the residue-wise contact order (RWCO) and mean square fluctuations (i.e., crystallographic temperature factors). We have also shown that the next three components are related to the spatial directionality of the secondary structure elements, and that they may also be predicted from the sequence, improving overall structure prediction. In addition, we have shown that the large number of available HIV-1 protease structures provides a remarkable sampling of conformations, which can be viewed as direct structural information about the dynamics. After structure matching, we apply principal component analysis (PCA) to obtain the important apparent motions for both bound and unbound structures.
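    The "at most 5 nonzero terms" result follows from an exact identity: with centred coordinates $X_c$ and $r^2$ the vector of squared distances from the centre of mass, $D = r^2\mathbf{1}^T + \mathbf{1}(r^2)^T - 2X_cX_c^T$. The first two terms have rank at most 2 and the last has rank at most 3. The minimal NumPy sketch below checks this numerically. The random coordinates, the equal-mass centre of mass and the 8 Å contact cutoff are illustrative assumptions, not values from the abstract.

```python
import numpy as np


def squared_distance_matrix(X):
    """D[i, j] = |x_i - x_j|^2 for an (N, 3) array of coordinates X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.einsum("ijk,ijk->ij", diff, diff)


def contact_matrix(X, r_cutoff=8.0):
    """C[i, j] = 1 if r_ij < r_cutoff and i != j, else 0 (8 Å is an assumed cutoff)."""
    C = (squared_distance_matrix(X) < r_cutoff ** 2).astype(int)
    np.fill_diagonal(C, 0)
    return C


def spectral_terms(D, rel_tol=1e-8):
    """Eigenpairs of the symmetric matrix D that are not numerically zero,
    sorted by decreasing |lambda_k|, so D ~= sum_k lambda_k v_k v_k^T."""
    lam, V = np.linalg.eigh(D)
    order = np.argsort(-np.abs(lam))
    lam, V = lam[order], V[:, order]
    keep = np.abs(lam) > rel_tol * np.abs(lam[0])
    return lam[keep], V[:, keep]


# Hypothetical stand-in for residue (e.g. C-alpha) coordinates, in Å.
rng = np.random.default_rng(0)
X = rng.normal(scale=10.0, size=(60, 3))

D = squared_distance_matrix(X)
lam, V = spectral_terms(D)

# r^2: squared distance of each point from the centre of mass (equal masses assumed).
Xc = X - X.mean(axis=0)
r2 = (Xc ** 2).sum(axis=1)
corr = abs(np.corrcoef(V[:, 0], r2)[0, 1])

# Exact identity behind the rank bound: D = r2 1^T + 1 r2^T - 2 Xc Xc^T.
D_rebuilt = r2[:, None] + r2[None, :] - 2.0 * Xc @ Xc.T

print("nonzero terms:", len(lam))
print("|corr(dominant eigenvector, r^2)|:", round(corr, 3))
print("max reconstruction error:", np.abs(D - D_rebuilt).max())
```

    The dominant eigenvector mixes $r^2$ with a constant vector, so the sketch reports a correlation coefficient with $r^2$ rather than testing exact proportionality.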
There are significant similarities between the first few key motions and the first few low-frequency normal modes calculated from a static representative structure with an elastic network model (ENM) based on the contact matrix $C$ (related to $D$). This strongly suggests that the variations among the observed structures, and the corresponding conformational changes, are facilitated by the low-frequency, global motions intrinsic to the structure. Similarities are also found when the approach is applied to an NMR ensemble and to atomic molecular dynamics (MD) trajectories. Thus, a sufficiently large number of experimental structures can directly provide important information about protein dynamics, but ENM can also provide a similar sampling of conformations. Finally, we use distance constraints derived from databases of known protein structures for structure refinement. We use the distributions of distances of various types in known protein structures to obtain the most probable ranges, or the mean-force potentials, for the distances. We then impose these constraints on the structures to be refined, or include the mean-force potentials directly in the energy minimization, so that more plausible structural models can be built. We used this approach successfully in the 2006 CASPR structure refinement experiment (http://predictioncenter.org/caspR).
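    To make "low-frequency normal modes from an ENM based on the contact matrix" concrete, here is a minimal sketch using the Gaussian Network Model (GNM), one common contact-based ENM. The abstract does not say which ENM variant was used, so GNM, the 7.3 Å cutoff and the straight toy chain are all assumptions. The slowest modes are the eigenvectors of the Kirchhoff matrix $\Gamma = \mathrm{diag}(\sum_j C_{ij}) - C$ with the smallest nonzero eigenvalues, and the predicted mean square fluctuations are proportional to the diagonal of $\Gamma^{+}$.

```python
import numpy as np


def contact_matrix(X, r_cutoff=7.3):
    """C[i, j] = 1 if |x_i - x_j| < r_cutoff and i != j (7.3 Å is an assumed cutoff)."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    C = (d2 < r_cutoff ** 2).astype(float)
    np.fill_diagonal(C, 0.0)
    return C


def kirchhoff(C):
    """GNM connectivity matrix Gamma = diag(number of contacts) - C."""
    return np.diag(C.sum(axis=1)) - C


def gnm_slowest_modes(C, n_modes=3):
    """Lowest-frequency non-trivial modes of Gamma. For a connected network the
    first eigenvalue is zero (rigid translation) and is skipped."""
    lam, V = np.linalg.eigh(kirchhoff(C))
    return lam[1:n_modes + 1], V[:, 1:n_modes + 1]


def gnm_fluctuations(C):
    """Relative mean square fluctuations, proportional to diag(pinv(Gamma))."""
    return np.diag(np.linalg.pinv(kirchhoff(C)))


# Hypothetical toy "protein": a straight chain of 20 residues spaced 3.8 Å apart.
X = np.column_stack([3.8 * np.arange(20), np.zeros(20), np.zeros(20)])
C = contact_matrix(X)
lam, modes = gnm_slowest_modes(C)
msf = gnm_fluctuations(C)
print("slowest GNM eigenvalues:", np.round(lam, 4))
print("end vs middle fluctuation:", msf[0], msf[10])
```

    In a comparison like the one described above, the columns of `modes` would be compared (for example by overlap) with the leading PCA components of the superimposed experimental ensemble.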

    Collective estimation of multiple bivariate density functions with application to angular-sampling-based protein loop modeling

    This article develops a method for the simultaneous estimation of density functions for a collection of populations of protein backbone angle pairs, using a data-driven, shared basis constructed from bivariate spline functions defined on a triangulation of the bivariate domain. The circular nature of angular data is taken into account by imposing appropriate smoothness constraints across the boundaries of the triangles. Maximum penalized likelihood is used to fit the model, and an alternating blockwise Newton-type algorithm is developed for computation. A simulation study shows that the collective estimation approach is statistically more efficient than estimating the densities individually. The proposed method was used to estimate neighbor-dependent distributions of protein backbone dihedral angles (i.e., Ramachandran distributions). The estimated distributions were applied to protein loop modeling, one of the most challenging open problems in protein structure prediction, by feeding them into an angular-sampling-based loop structure prediction framework. Our estimated distributions compared favorably with Ramachandran distributions estimated by fitting a hierarchical Dirichlet process model, and in particular showed significant improvements on the hard cases where existing methods do not work well.
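    The core idea of collective estimation, many populations sharing one basis while each gets its own coefficients, can be illustrated with a much simpler estimator. The sketch below is not the paper's method: it swaps the data-driven bivariate splines on a triangulation for a fixed grid of product von Mises bumps (periodic in both angles, so the ±180° boundary needs no special treatment), and replaces penalized Newton fitting with plain EM for the mixture weights. The grid size, concentration `kappa=4.0` and the simulated "helix_like"/"strand_like" populations are hypothetical.

```python
import numpy as np


def torus_basis(phi, psi, centers, kappa=4.0):
    """Shared basis: products of von Mises bumps centred on `centers` (K, 2).
    Each column is a density on [-pi, pi)^2 and is periodic in both angles."""
    norm = (2.0 * np.pi * np.i0(kappa)) ** 2
    dphi = phi[:, None] - centers[None, :, 0]
    dpsi = psi[:, None] - centers[None, :, 1]
    return np.exp(kappa * (np.cos(dphi) + np.cos(dpsi))) / norm


def fit_weights(B, n_iter=300):
    """Maximum-likelihood mixture weights for fixed basis densities B (n, K), via EM."""
    w = np.full(B.shape[1], 1.0 / B.shape[1])
    for _ in range(n_iter):
        resp = B * w
        resp /= resp.sum(axis=1, keepdims=True)
        w = resp.mean(axis=0)
    return w


def fit_collection(samples, centers, kappa=4.0):
    """One weight vector per population; all populations share the same basis."""
    return {name: fit_weights(torus_basis(phi, psi, centers, kappa))
            for name, (phi, psi) in samples.items()}


def density(phi, psi, w, centers, kappa=4.0):
    return torus_basis(phi, psi, centers, kappa) @ w


# Shared basis: a 12 x 12 grid of bump centres (an assumption, not data-driven).
g = np.linspace(-np.pi, np.pi, 12, endpoint=False)
centers = np.array([(a, b) for a in g for b in g])

# Hypothetical populations, e.g. backbone angles conditioned on two neighbour types.
rng = np.random.default_rng(1)
alpha, beta = np.radians([-63.0, -43.0]), np.radians([-120.0, 130.0])
samples = {
    "helix_like": (rng.vonmises(alpha[0], 10.0, 500), rng.vonmises(alpha[1], 10.0, 500)),
    "strand_like": (rng.vonmises(beta[0], 10.0, 500), rng.vonmises(beta[1], 10.0, 500)),
}
weights = fit_collection(samples, centers)

# Sanity check: each fitted density integrates to ~1 over the torus.
grid = np.linspace(-np.pi, np.pi, 200, endpoint=False)
P, S = np.meshgrid(grid, grid, indexing="ij")
cell = (2.0 * np.pi / 200) ** 2
totals = {}
for name, w in weights.items():
    totals[name] = float(density(P.ravel(), S.ravel(), w, centers).sum() * cell)
    print(name, "integral:", round(totals[name], 4))
```

    Sharing the basis is what lets sparsely populated neighbor contexts borrow strength from the others; in the paper the basis itself is also learned from all populations jointly, which this sketch does not attempt.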

    Protein Structure Refinement by Optimization

    Creation, refinement, and evaluation of conformational ensembles of proteins using the Torsional Network Model

    Master's Degree in Bioinformatics and Computational Biology (Máster Universitario en Bioinformática y Biología Computacional)
    One of the main limitations of structural bioinformatics lies in the difficulty of properly accounting for the dynamical aspects of proteins, which are often critical to their functional mechanisms. Among the tools developed to address this issue, the Torsional Network Model (TNM) relies on internal degrees of freedom (the torsion angles of the protein backbone) and can describe the thermal fluctuations of a protein structure as well as generate structural ensembles. However, the TNM is a coarse-grained model that cannot ensure that the newly created conformations are free of structural defects. The main hypothesis of this project is therefore that the TNM assembly process can be improved. The ability to generate high-quality structural ensembles describing the dynamical properties of a protein would be highly valuable in various applications. In this thesis, we create, evaluate and refine TNM ensembles from a set of experimentally determined reference protein structures (Levin et al., 2007). We follow an approach used by Bastolla and Dehouck (2019): the ensembles are evaluated with MolProbity and refined with SIDEpro. In addition, we take a new approach by refining the ensembles with energy minimization (EM). The results show a potential improvement of the TNM ensembles when the target RMSD is adjusted to the protein studied, point to an improvement when side chains are reconstructed, and suggest combining side-chain reconstruction with energy minimization to optimize structure quality. We also discuss the pros and cons of the methodology: because the available measures and methods are oriented toward static proteins, it is especially important to be aware of their limitations when applying them to the dynamics-oriented TNM.
Exploring further target RMSD values, adjusting them to specific protein dynamics simulations, and replicating the same pipeline on different datasets are some of the proposals for future work. Taking into account variables such as temperature, protein flexibility and the estimated optimal RMSD would also be of interest for future studies.
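    Because the ensembles are generated and judged against a target RMSD, a basic building block is the RMSD of each ensemble member to the reference after optimal rigid superposition. Below is a minimal NumPy sketch of that step using the standard Kabsch (SVD) superposition; the thesis does not state which implementation it used. `within_target` and its 0.5 Å tolerance are hypothetical helpers, and the coordinates are synthetic.

```python
import numpy as np


def kabsch_rmsd(P, Q):
    """RMSD between conformations P and Q (N, 3) after optimal rigid superposition."""
    P = P - P.mean(axis=0)
    Q = Q - Q.mean(axis=0)
    U, _, Vt = np.linalg.svd(P.T @ Q)
    d = np.sign(np.linalg.det(Vt.T @ U.T))  # -1 would mean a reflection; correct it
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    diff = P @ R.T - Q
    return float(np.sqrt((diff ** 2).sum() / len(P)))


def ensemble_rmsds(reference, ensemble):
    """RMSD of each ensemble member to the reference structure."""
    return np.array([kabsch_rmsd(conf, reference) for conf in ensemble])


def within_target(rmsds, target, tol=0.5):
    """Hypothetical check: fraction of members whose RMSD is within tol (Å) of the target."""
    return float(np.mean(np.abs(rmsds - target) <= tol))


rng = np.random.default_rng(2)
reference = rng.normal(scale=8.0, size=(30, 3))  # hypothetical coordinates, Å
c, s = np.cos(0.7), np.sin(0.7)
Rz = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
ensemble = [reference @ Rz.T + np.array([5.0, -2.0, 1.0])]  # rigidly moved copy
ensemble += [reference + rng.normal(scale=1.0, size=reference.shape) for _ in range(5)]
rmsds = ensemble_rmsds(reference, ensemble)
print(np.round(rmsds, 3))
print("fraction near a 1.5 Å target:", within_target(rmsds, target=1.5))
```

    The rigidly moved copy gives an RMSD of essentially zero, which is a quick check that rotation and translation are factored out before members are compared with the target.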

    Conformational variability in adenylosuccinate synthetase as revealed by crystal structures of mouse-muscle and Escherichia coli enzymes

    Adenylosuccinate synthetase governs the committed step of AMP biosynthesis from IMP: the generation of 6-phosphoryl-IMP from GTP and IMP, followed by the formation of adenylosuccinate from 6-phosphoryl-IMP and L-aspartate. Prokaryotes such as Escherichia coli (E. coli) have a single form of adenylosuccinate synthetase, but vertebrates have two isozymes of the synthetase. The basic isozyme, which predominates in muscle, putatively participates in the purine nucleotide cycle, has a higher Km for IMP and exhibits substrate inhibition by this nucleotide. Compared with the E. coli structures, the mouse basic isozyme shows conformational variations in the IMP pocket in both its ligand-free and IMP-ligated structures. Furthermore, IMP has alternative modes of binding to the IMP pocket and can also bind to the GTP pocket. GDP- and IMP-ligated complexes of the mouse muscle and E. coli systems, which differ only in the type of anion (SO4^2-, Cl- or acetate) or cation (Mg2+ or Li+) present in their crystallization milieus, exhibit significant conformational differences in active-site loops, consistent with the following two rules: (i) IMP requires an anion bound to the beta-phosphoryl pocket of GTP in order to organize the active site; (ii) Mg2+ may bind preferentially to an active site ligated by both GDP (or GTP) and IMP, and does not bind as a stoichiometric Mg2+/GTP complex. The feedback inhibitor of the synthetase, AMP, behaves as an analogue of IMP or as an analogue of adenylosuccinate, depending on the state of active-site ligation. The mouse basic isozyme complex of adenylosuccinate/GDP/Mg2+/sulfate reveals significant geometric distortions, tight non-bonded contacts, and a probable protonation state of adenylosuccinate consistent with the formation of a C-6 purine cation. To a first approximation, adenylosuccinate forms from 6-phosphoryl-IMP and L-aspartate by the movement of the purine ring toward a stationary alpha-amino group of L-aspartate.

    Optimization and machine learning methods for Computational Protein Docking

    Computational Protein Docking (CPD) is the problem of determining the stable complex formed by two proteins, called the receptor and the ligand, given information about the individual partners. The problem is often formulated as energy/score minimization, where the decision variables are the six rigid-body transformation variables of the ligand plus additional variables describing flexibility in the protein structures. The scoring functions used in CPD are highly nonlinear and nonconvex, with a very large number of local minima, which makes the optimization problem particularly challenging. Consequently, most docking procedures employ a multistage strategy: (i) global sampling with a coarse scoring function to identify promising regions, followed by (ii) a refinement stage with more accurate scoring functions, possibly allowing more degrees of freedom. The first part of this work addresses local optimization in the refinement stage. The goal of local optimization is to remove steric clashes between the protein partners and obtain more realistic score values. The problem is formulated as optimization on the space of rigid motions of the ligand. Using a recently introduced representation of the space of rigid motions as a manifold, a new Riemannian metric is introduced that is closely related to the Root Mean Square Deviation (RMSD), the distance measure widely used in protein docking. It is argued that the new metric puts rotational and translational variables on an equal footing as far as local changes of RMSD are concerned. The implications and modifications for gradient-based local optimization algorithms are discussed. In the second part, a new methodology for resampling and refinement of ligand conformations is introduced. The algorithm takes ensembles of ligand conformations as input, and its goal is to generate new ensembles of refined conformations that are closer to the native complex.
The algorithm builds upon previous work and introduces several innovations: clustering the input conformations, performing dimensionality reduction using Principal Component Analysis (PCA), underestimating the scoring function, and resampling and refining new conformations. The performance of the algorithm on a comprehensive benchmark of protein complexes is reported. The third part of this work uses machine learning to address two specific problems in protein docking. (i) Constructing a machine learning model to predict whether a given receptor and ligand pair interact. This is of significant importance for constructing so-called protein interaction networks, a critical step in the drug discovery process. The success of the algorithm is verified on a benchmark for discriminating between biological and crystallographic dimers. (ii) Devising a ranking scheme for the output predictions of a protein docking server. The machine learning model uses features of the docking server predictions to produce a ranked list in which the top-ranked predictions have a higher probability of being close to the native solution. Two state-of-the-art approaches to the ranking problem are presented and compared in detail, and the implications of using the superior approach for a structural docking server are discussed.
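    One way to see why an RMSD-related metric puts rotations and translations on an equal footing is a first-order expansion; the thesis's actual metric may be constructed differently. Rotate the ligand about its centroid by a small rotation vector $\omega$ and translate it by $t$. Each centred atom $x_i$ moves by approximately $\omega \times x_i + t$, and since $\sum_i x_i = 0$ the cross term vanishes, giving $\mathrm{RMSD}^2 \approx |t|^2 + \omega^T (J/N)\,\omega$ with $J = \sum_i (|x_i|^2 I - x_i x_i^T)$. Both blocks of the metric $\mathrm{diag}(I, J/N)$ are therefore measured in squared ångströms. The sketch below, with hypothetical ligand coordinates, compares this approximation with the exact ligand RMSD (no re-superposition, as in docking evaluation).

```python
import numpy as np


def rotation_from_vector(omega):
    """Rodrigues formula: rotation by angle |omega| about axis omega/|omega|."""
    theta = np.linalg.norm(omega)
    if theta < 1e-15:
        return np.eye(3)
    k = omega / theta
    K = np.array([[0.0, -k[2], k[1]], [k[2], 0.0, -k[0]], [-k[1], k[0], 0.0]])
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)


def exact_rmsd(X, omega, t):
    """RMSD between ligand X (N, 3) and its copy rotated about the centroid by
    omega and then translated by t (no re-superposition)."""
    c = X.mean(axis=0)
    Y = (X - c) @ rotation_from_vector(omega).T + c + t
    return float(np.sqrt(((Y - X) ** 2).sum(axis=1).mean()))


def metric_rmsd(X, omega, t):
    """First-order approximation sqrt(|t|^2 + omega^T (J/N) omega)."""
    Xc = X - X.mean(axis=0)
    J = (Xc ** 2).sum() * np.eye(3) - Xc.T @ Xc  # inertia-like tensor (unit masses)
    return float(np.sqrt(t @ t + omega @ (J / len(X)) @ omega))


rng = np.random.default_rng(3)
ligand = rng.normal(scale=6.0, size=(40, 3))  # hypothetical ligand coordinates, Å
omega = np.array([1e-3, -2e-3, 5e-4])        # small rotation vector, radians
t = np.array([0.01, 0.0, -0.02])             # small translation, Å
print(exact_rmsd(ligand, omega, t), metric_rmsd(ligand, omega, t))
```

    A gradient step measured in this metric moves every atom by a comparable amount whether the step is mostly rotational or mostly translational, which is the property the abstract refers to.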

    Protein docking refinement by convex underestimation in the low-dimensional subspace of encounter complexes

    We propose a novel stochastic global optimization algorithm with applications to the refinement stage of protein docking prediction methods. Our approach can process conformations sampled from multiple clusters, each roughly corresponding to a different binding energy funnel. These clusters are obtained using a density-based clustering method. In each cluster, we identify a smooth “permissive” subspace which avoids high-energy barriers and then underestimate the binding energy function using general convex polynomials in this subspace. We use the underestimator to bias sampling towards its global minimum. Sampling and subspace underestimation are repeated several times and the conformations sampled at the last iteration form a refined ensemble. We report computational results on a comprehensive benchmark of 224 protein complexes, establishing that our refined ensemble significantly improves the quality of the conformations of the original set given to the algorithm. We also devise a method to enhance the ensemble from which near-native models are selected.
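    The underestimation step can be posed as a linear program: choose a convex function $U$ that lies below every sampled energy while staying as close to the samples as possible, then sample around its minimum. The sketch below is a deliberately simplified version. It uses a separable convex quadratic $U(x) = c + b\cdot x + \sum_j a_j x_j^2$ with $a_j \ge$ `a_min` instead of general convex polynomials, works directly in a hypothetical 2-D coordinate space rather than a learned permissive subspace, and skips the clustering step. The toy energy, the Gaussian resampling spread of 0.3 and `a_min` are assumptions; it relies on SciPy's `linprog` with the HiGHS solver.

```python
import numpy as np
from scipy.optimize import linprog


def fit_convex_underestimator(X, E, a_min=1e-6):
    """Fit U(x) = c + b.x + sum_j a_j x_j^2 with a_j >= a_min (so U is convex),
    subject to U(x_i) <= E_i for every sample, maximising sum_i U(x_i)
    (equivalently minimising the total gap sum_i (E_i - U(x_i)))."""
    n, d = X.shape
    F = np.hstack([np.ones((n, 1)), X, X ** 2])  # features [1, x, x^2]
    cost = -F.sum(axis=0)                       # linprog minimises, so negate
    bounds = [(None, None)] * (1 + d) + [(a_min, None)] * d
    res = linprog(cost, A_ub=F, b_ub=E, bounds=bounds, method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    p = res.x
    return p[0], p[1:1 + d], p[1 + d:]


def underestimator_min(b, a):
    """Global minimiser of the separable convex quadratic: x_j* = -b_j / (2 a_j)."""
    return -b / (2.0 * a)


# Hypothetical rugged "binding energy" in a 2-D subspace: a funnel plus ripples.
rng = np.random.default_rng(4)
x0 = np.array([0.5, -0.3])
X = rng.uniform(-2.0, 2.0, size=(200, 2))
E = ((X - x0) ** 2).sum(axis=1) + 0.1 * np.sin(5.0 * X).sum(axis=1)

c, b, a = fit_convex_underestimator(X, E)
x_star = underestimator_min(b, a)
print("underestimator minimum:", np.round(x_star, 3))

# Bias the next round of sampling towards x_star (Gaussian spread is an assumption).
new_samples = x_star + rng.normal(scale=0.3, size=(50, 2))
```

    Because the fit only touches the lowest samples, the ripples in the toy energy do not pull the minimum far from the funnel bottom; repeating fit-and-resample mirrors the iteration described in the abstract.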