25 research outputs found

    Cryo-EM Guided de novo Protein Fold Elucidation

    BCL::Fold--de novo prediction of complex and large protein topologies by assembly of secondary structure elements.

    Computational de novo protein structure prediction is limited to small proteins of simple topology. The present work explores an approach to extend beyond the current limitations through assembling protein topologies from idealized α-helices and β-strands. The algorithm performs a Monte Carlo Metropolis simulated annealing folding simulation. It optimizes a knowledge-based potential that analyzes radius of gyration, β-strand pairing, secondary structure element (SSE) packing, amino acid pair distance, amino acid environment, contact order, secondary structure prediction agreement and loop closure. Discontinuation of the protein chain favors sampling of non-local contacts and thereby creation of complex protein topologies. The folding simulation is accelerated through exclusion of flexible loop regions, which further reduces the size of the conformational search space. The algorithm is benchmarked on 66 proteins with lengths between 83 and 293 amino acids. For 61 of these proteins, the best SSE-only models obtained have an RMSD100 below 8.0 Å and recover more than 20% of the native contacts. The algorithm assembles protein topologies with up to 215 residues and a relative contact order of 0.46. The method is tailored to be used in conjunction with low-resolution or sparse experimental data sets, which often provide restraints for regions of defined secondary structure.
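
    The abstract names a Monte Carlo Metropolis simulated annealing search. The following is a minimal, generic sketch of that technique on a toy one-dimensional score, not the BCL::Fold implementation: the function names, the geometric cooling schedule, the temperatures and the toy score are all assumptions made for illustration.

```python
import math
import random


def metropolis_accept(delta_score, temperature, rng):
    """Metropolis criterion: always accept improvements, and accept a
    worse move with probability exp(-delta / T)."""
    if delta_score <= 0:
        return True
    return rng.random() < math.exp(-delta_score / temperature)


def simulated_annealing(score, propose, start,
                        t_start=10.0, t_end=0.1, steps=2000, seed=0):
    """Minimize `score` by random moves under a cooling temperature.

    The schedule (geometric, from t_start to t_end) is an assumption;
    the source does not state which schedule is used.
    """
    rng = random.Random(seed)
    current, current_score = start, score(start)
    best, best_score = current, current_score
    for step in range(steps):
        frac = step / max(steps - 1, 1)
        temperature = t_start * (t_end / t_start) ** frac
        candidate = propose(current, rng)
        candidate_score = score(candidate)
        if metropolis_accept(candidate_score - current_score, temperature, rng):
            current, current_score = candidate, candidate_score
            if current_score < best_score:
                best, best_score = current, current_score
    return best, best_score


# Toy stand-ins for a scoring function and a move set (hypothetical).
def toy_score(x):
    return (x - 3.0) ** 2


def toy_move(x, rng):
    return x + rng.uniform(-0.5, 0.5)


best, best_score = simulated_annealing(toy_score, toy_move, start=-5.0)
```

    In a folding simulation the state would be an arrangement of SSEs and the moves would perturb that arrangement; only the accept/reject logic carries over from this toy example.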

    Secondary structure pool statistics for the benchmark proteins.

    The table reports the pool Q3 score, %found (the percentage of native SSEs identified by the predictions) and average shifts for the pools generated with the secondary structure prediction methods PSIPRED and JUFO for all 66 proteins in the benchmark set. The last two rows show the average and standard deviation of the pool agreement score and the Q3 measure. The statistics are repeated for the combined pool of PSIPRED and JUFO.
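
    For orientation, Q3 is conventionally the percentage of residues whose predicted three-state label (helix, strand, coil) matches the native one. The sketch below assumes that per-residue definition and the single-letter labels H, E and C; how the source computes a Q3 for a pool of SSEs is not stated, so this is illustrative only.

```python
def q3_score(native, predicted):
    """Percentage of positions where the predicted three-state label
    (H = helix, E = strand, C = coil) equals the native label."""
    if len(native) != len(predicted):
        raise ValueError("sequences must have equal length")
    if not native:
        raise ValueError("sequences must not be empty")
    matches = sum(1 for n, p in zip(native, predicted) if n == p)
    return 100.0 * matches / len(native)


# Hypothetical four-residue example: three of four labels agree.
example_q3 = q3_score("HHEC", "HHCC")
```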

    Benchmark set of proteins.

    For each of the 66 proteins in the benchmark set, the following are displayed: four-letter PDB ID and one-letter chain ID, number of amino acids (N<sub>aa</sub>), number of secondary structure elements (N<sub>sse</sub>), number of α-helices (N<sub>α</sub>), number of β-strands (N<sub>β</sub>), contact order (CO) and relative contact order (RCO). The left section of the table, labeled “original sequence”, displays statistics for the full-sequence protein, while the “filtered sequence” statistics are calculated only on amino acids found in secondary structure elements that satisfy the length criteria: at least 5 residues for α-helices and 3 residues for β-strands.
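
    Contact order is commonly defined as the mean sequence separation of residue pairs in contact, and relative contact order as that value divided by the chain length. The sketch below follows this common definition; the 8 Å cutoff, the minimum sequence separation of 3 and the use of one representative point per residue are assumptions, since the caption does not specify them.

```python
import math


def contact_order(coords, cutoff=8.0, min_separation=3):
    """Return (CO, RCO) for a list of per-residue 3D coordinates.

    CO  = mean |i - j| over residue pairs closer than `cutoff`.
    RCO = CO divided by the number of residues.
    """
    n = len(coords)
    total_separation = 0
    contacts = 0
    for i in range(n):
        for j in range(i + min_separation, n):
            if math.dist(coords[i], coords[j]) <= cutoff:
                total_separation += j - i
                contacts += 1
    co = total_separation / contacts if contacts else 0.0
    rco = co / n if n else 0.0
    return co, rco


# Hypothetical four-residue chain in which only residues 0 and 3 touch.
toy_coords = [(0.0, 0.0, 0.0), (100.0, 0.0, 0.0),
              (200.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
co, rco = contact_order(toy_coords)
```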

    Amino acid pair distance potentials.

    (A) The idealized structure of 1ubi with C<sub>β</sub> and H<sub>α</sub>2 atoms, showing the distances between Ile 32 and Leu 56 (4.7 Å) and between Lys 11 and Glu 34 (8.3 Å). (B) Selected amino acid pair distance potentials: Trp-Trp as an example of π-stacking interaction, Ile-Leu as an example of apolar van der Waals interaction, Arg-Glu as an example of Coulomb attraction, and Arg-Lys as an example of Coulomb repulsion.
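
    Knowledge-based distance potentials are often derived by the inverse Boltzmann relation, E(d) = -ln(P_observed(d) / P_reference(d)). The sketch below illustrates that general idea; it is an assumption that the potentials in this figure were derived in this way, and the bin width, distance range, pseudocount and toy data are hypothetical.

```python
import math
from collections import Counter


def pair_distance_potential(observed, reference,
                            bin_width=1.0, max_distance=20.0, pseudocount=1.0):
    """Return one energy per distance bin: -ln(P_obs / P_ref).

    Negative values mark distances seen more often than in the reference,
    positive values mark distances seen less often.
    """
    n_bins = int(max_distance / bin_width)

    def histogram(distances):
        counts = Counter(int(d / bin_width)
                         for d in distances if 0 <= d < max_distance)
        total = sum(counts.values()) + pseudocount * n_bins
        return [(counts.get(b, 0) + pseudocount) / total for b in range(n_bins)]

    p_obs = histogram(observed)
    p_ref = histogram(reference)
    return [-math.log(o / r) for o, r in zip(p_obs, p_ref)]


# Toy data: a pair type that clusters near 4.5 Å against a flat reference.
toy_observed = [4.5] * 10
toy_reference = [b + 0.5 for b in range(20)]
energies = pair_distance_potential(toy_observed, toy_reference)
```

    With this toy input the 4 to 5 Å bin comes out favorable and all other bins unfavorable, which mirrors how a preferred contact distance appears as a well in such a potential.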

    BCL::Score—Knowledge Based Energy Potentials for Ranking Protein Models Represented by Idealized Secondary Structure Elements

    The topology of most experimentally determined protein domains is defined by the relative arrangement of secondary structure elements, i.e. α-helices and β-strands, which make up 50–70% of the sequence. Pairing of β-strands defines the topology of β-sheets. The packing of side chains between α-helices and β-sheets defines the majority of the protein core. Often, limited experimental datasets restrain the position of secondary structure elements while lacking detail with respect to loop or side chain conformation. At the same time the regular structure and reduced flexibility of secondary structure elements make these interactions more predictable when compared to flexible loops and side chains. To determine the topology of the protein in such settings, we introduce a tailored knowledge-based energy function that evaluates arrangement of secondary structure elements only. Based on the amino acid C<sub>β</sub> atom coordinates within secondary structure elements, potentials for amino acid pair distance, amino acid environment, secondary structure element packing, β-strand pairing, loop length, radius of gyration, contact order and secondary structure prediction agreement are defined. Separate penalty functions exclude conformations with clashes between amino acids or secondary structure elements and loops that cannot be closed. Each individual term discriminates for native-like protein structures. The composite potential significantly enriches for native-like models in three different databases of 10,000–12,000 protein models in 80–94% of the cases. The corresponding application, “BCL::ScoreProtein,” is available at http://www.meilerlab.org.

    SSE Fragment packing.

    SSE fragments are shown with their geometric packing descriptors. (A) α<sub>1</sub> and α<sub>2</sub> are orthogonal if the shortest connection between the main axes is orthogonal. (B) The connection is not orthogonal, since the minimal interface length m cannot be achieved. (C) θ is the twist angle around the shortest connection, which is equivalent to the dihedral angle defined by main axis 1, the shortest connection and main axis 2. (D) ω is the offset from the optimal expected position for a helix-strand interaction: at 0° the helix is on top of the strand, and at 90° the helix would interact with the backbone of the strand. ω<sub>1</sub> and ω<sub>2</sub> are the offsets for strand-strand packing: values close to 90° indicate a strand backbone pairing dominated by hydrogen bonding within a sheet, whereas values close to 0° indicate packing dominated by side chain interactions, as seen in sheet sandwiches. (E) Every SSE is represented as multiple fragments, and the SSE interaction is described by the list of all fragment interactions, leaving out additional fragments of the longer SSE with suboptimal packing (bottom grey helix fragment).
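
    The twist angle θ in panel C is described as a dihedral angle about the shortest connection. A minimal sketch of such a dihedral follows: both axis directions are projected onto the plane perpendicular to the connection and the signed angle between the projections is returned. The function name, the vector inputs and the sign convention are assumptions; the caption does not specify them.

```python
import math


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _reject(v, unit_axis):
    """Component of v perpendicular to unit_axis."""
    k = _dot(v, unit_axis)
    return tuple(x - k * u for x, u in zip(v, unit_axis))


def twist_angle(axis1, connection, axis2):
    """Signed dihedral (degrees) between two SSE main-axis directions,
    measured around the vector connecting them."""
    length = math.sqrt(_dot(connection, connection))
    if length == 0.0:
        raise ValueError("connection vector must be non-zero")
    c = tuple(x / length for x in connection)
    v = _reject(axis1, c)
    w = _reject(axis2, c)
    return math.degrees(math.atan2(_dot(_cross(c, v), w), _dot(v, w)))


# Two axes 5 Å apart along z: perpendicular, then parallel.
perpendicular = twist_angle((1, 0, 0), (0, 0, 5), (0, 1, 0))
parallel = twist_angle((1, 0, 0), (0, 0, 5), (1, 0, 0))
```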

    Weight set for consensus scoring function.

    Monte Carlo optimization maximized the enrichment over the Rosetta model set. Loop closure, AA pair and SSE clash weights were set to 500. This weight set was used to calculate the score sum, as used to calculate enrichments for the benchmark set.

    Best RMSD100 and CR values for models generated by BCL and Rosetta.

    The table lists, for all proteins, the best RMSD100 and best CR observed for models generated by BCL and Rosetta. BCL results are presented in four columns: SSE-only models using native SSE definitions (BCL<sub>N-SSE</sub>), complete models using native SSE definitions (BCL<sub>N</sub>), SSE-only models using predicted SSE definitions (BCL<sub>P-SSE</sub>) and complete models using predicted SSE definitions (BCL<sub>P</sub>). The fifth column under each of RMSD100 and CR reports the Rosetta models.
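
    RMSD100 is a length-normalized RMSD, commonly defined as RMSD / (1 + ln(sqrt(N / 100))) for a protein of N residues, so that values are comparable across chain lengths. The sketch below uses that common definition; the caption itself does not give the formula, so treat it as an assumption.

```python
import math


def rmsd100(rmsd, n_residues):
    """Normalize an RMSD (in Å) to a 100-residue reference length."""
    if n_residues <= 0:
        raise ValueError("n_residues must be positive")
    denominator = 1.0 + math.log(math.sqrt(n_residues / 100.0))
    if denominator <= 0.0:
        raise ValueError("chain too short for the RMSD100 normalization")
    return rmsd / denominator


# The same raw RMSD of 8.0 Å at the shortest and longest benchmark lengths.
short_chain = rmsd100(8.0, 83)
long_chain = rmsd100(8.0, 293)
```

    At exactly 100 residues the value is unchanged; longer chains are scaled down and shorter chains scaled up.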