798 research outputs found

    De Novo Protein Structure Modeling and Energy Function Design

    Get PDF
    The two major challenges in protein structure prediction problems are (1) the lack of an accurate energy function and (2) the lack of an efficient search algorithm. A protein energy function accurately describing the interaction between residues is able to supervise the optimization of a protein conformation, as well as select native or native-like structures from numerous possible conformations. An efficient search algorithm must be able to reduce a conformational space to a reasonable size without missing the native conformation. My PhD research studies focused on these two directions. A protein energy function—the distance and orientation dependent energy function of amino acid key blocks (DOKB), containing a distance term, an orientation term, and a highly packed term—was proposed to evaluate the stability of proteins. In this energy function, key blocks of each amino acids were used to represent each residue; a novel reference state was used to normalize block distributions. The dependent relationship between the orientation term and the distance term was revealed, representing the preference of different orientations at different distances between key blocks. Compared with four widely used energy functions using six general benchmark decoy sets, the DOKB appeared to perform very well in recognizing native conformations. Additionally, the highly packed term in the DOKB played its important role in stabilizing protein structures containing highly packed residues. The cluster potential adjusted the reference state of highly packed areas and significantly improved the recognition of the native conformations in the ig_structal data set. The DOKB is not only an alternative protein energy function for protein structure prediction, but it also provides a different view of the interaction between residues. The top-k search algorithm was optimized to be used for proteins containing both α-helices and β-sheets. Secondary structure elements (SSEs) are visible in cryo-electron microscopy (cryo-EM) density maps. Combined with the SSEs predicted in a protein sequence, it is feasible to determine the topologies referring to the order and direction of the SSEs in the cryo-EM density map with respect to the SSEs in the protein sequence. Our group member Dr. Al Nasr proposed the top-k search algorithm, searching the top-k possible topologies for a target protein. It was the most effective algorithm so far. However, this algorithm only works well for pure a-helix proteins due to the complexity of the topologies of β-sheets. Based on the known protein structures in the Protein Data Bank (PDB), we noticed that some topologies in β-sheets had a high preference; on the contrary, some topologies never appeared. The preference of different topologies of β-sheets was introduced into the optimized top-k search algorithm to adjust the edge weight between nodes. Compared with the previous results, this optimization significantly improved the performance of the top-k algorithm in the proteins containing both α-helices and β-sheets

    Combining Cryo-EM Density Map and Residue Contact for Protein Secondary Structure Topologies

    Get PDF
    Although atomic structures have been determined directly from cryo-EM density maps with high resolutions, current structure determination methods for medium resolution (5 to 10 Å) cryo-EM maps are limited by the availability of structure templates. Secondary structure traces are lines detected from a cryo-EM density map for α-helices and β-strands of a protein. A topology of secondary structures defines the mapping between a set of sequence segments and a set of traces of secondary structures in three-dimensional space. In order to enhance accuracy in ranking secondary structure topologies, we explored a method that combines three sources of information: a set of sequence segments in 1D, a set of amino acid contact pairs in 2D, and a set of traces in 3D at the secondary structure level. A test of fourteen cases shows that the accuracy of predicted secondary structures is critical for deriving topologies. The use of significant long-range contact pairs is most effective at enriching the rank of the maximum-match topology for proteins with a large number of secondary structures, if the secondary structure prediction is fairly accurate. It was observed that the enrichment depends on the quality of initial topology candidates in this approach. We provide detailed analysis in various cases to show the potential and challenge when combining three sources of information

    An Effective Computational Method Incorporating Multiple Secondary Structure Predictions in Topology Determination for Cryo-EM Images

    Get PDF
    A key idea in de novo modeling of a medium-resolution density image obtained from cryo-electron microscopy is to compute the optimal mapping between the secondary structure traces observed in the density image and those predicted on the protein sequence. When secondary structures are not determined precisely, either from the image or from the amino acid sequence of the protein, the computational problem becomes more complex. We present an efficient method that addresses the secondary structure placement problem in presence of multiple secondary structure predictions and computes the optimal mapping. We tested the method using 12 simulated images from alpha-proteins and two Cryo-EM images of α-β proteins. We observed that the rank of the true topologies is consistently improved by using multiple secondary structure predictions instead of a single prediction. The results show that the algorithm is robust and works well even when errors/ misses in the predicted secondary structures are present in the image or the sequence. The results also show that the algorithm is efficient and is able to handle proteins with as many as 33 helices

    De Novo Protein Structure Modeling from Cryoem Data Through a Dynamic Programming Algorithm in the Secondary Structure Topology Graph

    Get PDF
    Proteins are the molecules carry out the vital functions and make more than the half of dry weight in every cell. Protein in nature folds into a unique and energetically favorable 3-Dimensional (3-D) structure which is critical and unique to its biological function. In contrast to other methods for protein structure determination, Electron Cryorricroscopy (CryoEM) is able to produce volumetric maps of proteins that are poorly soluble, large and hard to crystallize. Furthermore, it studies the proteins in their native environment. Unfortunately, the volumetric maps generated by current advances in CryoEM technique produces protein maps at medium resolution about (~5 to 10Ã…) in which it is hard to determine the atomic-structure of the protein. However, the resolution of the volumetric maps is improving steadily, and recent works could obtain atomic models at higher resolutions (~3Ã…). De novo protein modeling is the process of building the structure of the protein using its CryoEM volumetric map. Thereupon, the volumetric maps at medium resolution generated by CryoEM technique proposed a new challenge. At the medium resolution, the location and orientation of secondary structure elements (SSE) can be visually and computationally identified. However, the order and direction (called protein topology) of the SSEs detected from the CryoEM volumetric map are not visible. In order to determine the protein structure, the topology of the SSEs has to be figured out and then the backbone can be built. Consequently, the topology problem has become a bottle neck for protein modeling using CryoEM In this dissertation, we focus to establish an effective computational framework to derive the atomic structure of a protein from the medium resolution CryoEM volumetric maps. This framework includes a topology graph component to rank effectively the topologies of the SSEs and a model building component. In order to generate the small subset of candidate topologies, the problem is translated into a layered graph representation. We developed a dynamic programming algorithm (TopoDP) for the new representation to overcome the problem of large search space. Our approach shows the improved accuracy, speed and memory use when compared with existing methods. However, the generating of such set was infeasible using a brute force method. Therefore, the topology graph component effectively reduces the topological space using the geometrical features of the secondary structures through a constrained K-shortest paths method in our layered graph. The model building component involves the bending of a helix and the loop construction using skeleton of the volumetric map. The forward-backward CCD is applied to bend the helices and model the loops

    Intensity-Based Skeletonization of CryoEM Gray-Scale Images Using a True Segmentation-Free Algorithm

    Get PDF
    Cryo-electron microscopy is an experimental technique that is able to produce 3D gray-scale images of protein molecules. In contrast to other experimental techniques, cryo-electron microscopy is capable of visualizing large molecular complexes such as viruses and ribosomes. At medium resolution, the positions of the atoms are not visible and the process cannot proceed. The medium-resolution images produced by cryo-electron microscopy are used to derive the atomic structure of the proteins in de novo modeling. The skeletons of the 3D gray-scale images are used to interpret important information that is helpful in de novo modeling. Unfortunately, not all features of the image can be captured using a single segmentation. In this paper, we present a segmentation-free approach to extract the gray-scale curve-like skeletons. The approach relies on a novel representation of the 3D image, where the image is modeled as a graph and a set of volume trees. A test containing 36 synthesized maps and one authentic map shows that our approach can improve the performance of the two tested tools used in de novo modeling. The improvements were 62 and 13 percent for Gorgon and DP-TOSS, respectively

    Direct correlation analysis improves fold recognition

    Get PDF
    AbstractThe extraction of correlated mutations through the method of direct information (DI) provides predicted contact residue pairs that can be used to constrain the three dimensional structures of proteins. We apply this method to a large set of decoy protein folds consisting of many thousand well-constructed models, only tens of which have the correct fold. We find that DI is able to greatly improve the ranking of the true (native) fold but others still remain high scoring that would be difficult to discard due to small shifts in the core beta sheets

    Template Based Modeling and Structural Refinement of Protein-Protein Interactions.

    Full text link
    Determining protein structures from sequence is a fundamental problem in molecular biology, as protein structure is essential to understanding protein function. In this study, I developed one of the first fully automated pipelines for template based quaternary structure prediction starting from sequence. Two critical steps for template based modeling are identifying the correct homologous structures by threading which generates sequence to structure alignments and refining the initial threading template coordinates closer to the native conformation. I developed SPRING (single-chain-based prediction of interactions and geometries), a monomer threading to dimer template mapping program, which was compared to the dimer co-threading program, COTH, using 1838 non homologous target complex structures. SPRING’s similarity score outperformed COTH in the first place ranking of templates, correctly identifying 798 and 527 interfaces respectively. More importantly the results were found to be complementary and the programs could be combined in a consensus based threading program showing a 5.1% improvement compared to SPRING. Template based modeling requires a structural analog being present in the PDB. A full search of the PDB, using threading and structural alignment, revealed that only 48.7% of the PDB has a suitable template whereas only 39.4% of the PDB has templates that can be identified by threading. In order to circumvent this, I included intramolecular domain-domain interfaces into the PDB library to boost template recognition of protein dimers; the merging of the two classes of interfaces improved recognition of heterodimers by 40% using benchmark settings. Next the template based assembly of protein complexes pipeline, TACOS, was created. The pipeline combines threading templates and domain knowledge from the PDB into a knowledge based energy score. The energy score is integrated into a Monte Carlo sampling simulation that drives the initial template closer to the native topology. The full pipeline was benchmarked using 350 non homologous structures and compared to two state of the art programs for dimeric structure prediction: ZDOCK and MODELLER. On average, TACOS models global and interface structure have a better quality than the models generated by MODELLER and ZDOCK.PHDBioinformaticsUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttp://deepblue.lib.umich.edu/bitstream/2027.42/135847/1/bgovi_1.pd

    Mechanical Strength of 17 134 Model Proteins and Cysteine Slipknots

    Get PDF
    A new theoretical survey of proteins' resistance to constant speed stretching is performed for a set of 17 134 proteins as described by a structure-based model. The proteins selected have no gaps in their structure determination and consist of no more than 250 amino acids. Our previous studies have dealt with 7510 proteins of no more than 150 amino acids. The proteins are ranked according to the strength of the resistance. Most of the predicted top-strength proteins have not yet been studied experimentally. Architectures and folds which are likely to yield large forces are identified. New types of potent force clamps are discovered. They involve disulphide bridges and, in particular, cysteine slipknots. An effective energy parameter of the model is estimated by comparing the theoretical data on characteristic forces to the corresponding experimental values combined with an extrapolation of the theoretical data to the experimental pulling speeds. These studies provide guidance for future experiments on single molecule manipulation and should lead to selection of proteins for applications. A new class of proteins, involving cystein slipknots, is identified as one that is expected to lead to the strongest force clamps known. This class is characterized through molecular dynamics simulations.Comment: 40 pages, 13 PostScript figure

    Protein physics by advanced computational techniques: conformational sampling and folded state discrimination

    Get PDF
    Proteins are essential parts of organisms and participate in virtually every process within cells. Many proteins are enzymes that catalyze biochemical reactions and are vital to metabolism. Proteins also have structural or mechanical functions, such as actin and myosin in muscle that are in charge of motion and locomotion of cells and organisms. Others proteins are important for transporting materials, cell signaling, immune response, and several other functions. Proteins are the main building blocks of life. A protein is a polymer chain of amino acids whose sequence is defined in a gene: three nucleo type basis specify one out of the 20 natural amino acids. All amino acids possess common structural features. They have an \u3b1-carbon to which an amino group, a carboxyl group, a hydrogen atom and a variable side chain are attached. In a protein, the amino acids are linked together by peptide bonds between the carboxyl and amino groups of adjacent residues..
    • …
    corecore