
    An Exact Algorithm for Side-Chain Placement in Protein Design

    Computational protein design aims at constructing novel or improved functions on the structure of a given protein backbone and has important applications in the pharmaceutical and biotechnology industries. The underlying combinatorial side-chain placement problem consists of choosing a side-chain placement for each residue position such that the resulting overall energy is minimal. The choice of side chain then also determines the amino acid at that position. Many algorithms for this NP-hard problem have been proposed in the context of homology modeling, but they reach their limits when faced with large protein design instances. In this paper, we propose a new exact method for the side-chain placement problem that works well even for the large instance sizes that appear in protein design. Our main contribution is a dedicated branch-and-bound algorithm that combines tight upper and lower bounds resulting from a novel Lagrangian relaxation approach for side-chain placement. Our experimental results show that our method outperforms alternative state-of-the-art exact approaches and makes it possible to solve large protein design instances to optimality routinely.

    Protein side-chain placement: probabilistic inference and integer programming methods

    The prediction of energetically favorable side-chain conformations is a fundamental element in homology modeling of proteins and the design of novel protein sequences. The space of side-chain conformations can be approximated by a discrete space of probabilistically representative side-chain conformations (called rotamers). The problem is then to find a rotamer selection for each amino acid that minimizes a potential energy function. This is called the Global Minimum Energy Conformation (GMEC) problem, and it is an NP-hard optimization problem. The Dead-End Elimination theorem together with the A* algorithm (DEE/A*) has been successfully applied to this problem. However, DEE fails to converge for some complex instances. In this paper, we explore two alternatives to DEE/A* for solving the GMEC problem. We use a probabilistic inference method, the max-product (MP) belief-propagation algorithm, to estimate (often exactly) the GMEC. We also investigate integer programming formulations to obtain the exact solution. There are known ILP formulations that can be directly applied to the GMEC problem. We review these formulations and compare their effectiveness using CPLEX optimizers. We also present preliminary work towards applying the branch-and-price approach to the GMEC problem. The preliminary results suggest that the max-product algorithm is very effective for the GMEC problem. Though the max-product algorithm is an approximate method, its speed and accuracy are comparable to those of DEE/A* in large side-chain placement problems and may be superior in sequence design. Singapore-MIT Alliance (SMA
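The GMEC problem described in the abstract above can be illustrated with a minimal sketch. It assumes the common pairwise decomposition of the energy into per-rotamer self terms and rotamer-pair interaction terms. The function name `gmec_brute_force` and the toy energy tables are hypothetical, and exhaustive enumeration is used only to make the objective concrete; it is not DEE/A*, max-product, or any of the ILP formulations the paper evaluates.

```python
from itertools import product


def gmec_brute_force(self_energy, pair_energy):
    """Return (assignment, energy) minimizing the total pairwise energy.

    self_energy[i][r]       : energy of rotamer r at position i
    pair_energy[(i, j)][r][s]: interaction energy of rotamer r at position i
                               with rotamer s at position j (i < j)
    Both tables are illustrative assumptions, not data from the paper.
    """
    n = len(self_energy)
    best_assignment, best_energy = None, float("inf")
    # Enumerate every combination of one rotamer per position.
    for assignment in product(*(range(len(e)) for e in self_energy)):
        total = sum(self_energy[i][assignment[i]] for i in range(n))
        total += sum(
            table[assignment[i]][assignment[j]]
            for (i, j), table in pair_energy.items()
        )
        if total < best_energy:
            best_assignment, best_energy = assignment, total
    return best_assignment, best_energy


# Toy instance: three positions with two rotamers each (made-up numbers).
toy_self = [[0.0, 1.0], [0.5, 0.0], [0.2, 0.3]]
toy_pair = {
    (0, 1): [[2.0, 0.0], [0.0, 0.0]],
    (1, 2): [[0.0, 0.0], [1.0, -1.0]],
}
toy_assignment, toy_energy = gmec_brute_force(toy_self, toy_pair)
```

The search space grows as the product of the rotamer counts over all positions, which is why exact methods rely on pruning or bounding rather than enumeration.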

    Flat-Bottom Strategy for Improved Accuracy in Protein Side-Chain Placements

    We present a new strategy for protein side-chain placement that uses flat-bottom potentials for rotamer scoring. The extent of the flat bottom depends on the coarseness of the rotamer library and is optimized for libraries with diversities ranging from 0.2 Å to 5.0 Å. The parameters reported here were optimized for forcefields using the Lennard-Jones 12−6 van der Waals potential with DREIDING parameters but are expected to be similar for AMBER, CHARMM, and other forcefields. This Side-Chain Rotamer Excitation Analysis Method is implemented in the SCREAM software package. Similar scoring function strategies should be useful for ligand docking, virtual ligand screening, and protein folding applications.
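As a rough illustration of the flat-bottom idea mentioned in the abstract above, the sketch below flattens the inner wall of a Lennard-Jones 12-6 potential over a width `delta`. The abstract does not give the functional form, so the piecewise definition here is an assumption and not necessarily what SCREAM implements. The names `lj_12_6` and `flat_bottom_lj` and all parameter values are hypothetical.

```python
def lj_12_6(r, r0, d0):
    """Lennard-Jones 12-6 energy with minimum -d0 at separation r0."""
    x = (r0 / r) ** 6
    return d0 * (x * x - 2.0 * x)


def flat_bottom_lj(r, r0, d0, delta):
    """Assumed flat-bottom variant of the 12-6 potential.

    - r >= r0             : ordinary Lennard-Jones (attractive tail unchanged)
    - r0 - delta <= r < r0: energy held at the well minimum, -d0
    - r < r0 - delta      : repulsive wall shifted inward by delta
    """
    if r >= r0:
        return lj_12_6(r, r0, d0)
    if r >= r0 - delta:
        return -d0
    return lj_12_6(r + delta, r0, d0)
```

Under this assumed form, a contact at `r = 3.0` with `r0 = 4.0` and `delta = 0.5` is scored as if the atoms were 3.5 apart, so small clashes caused by a coarse rotamer library are penalized less harshly than with the unmodified potential.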

    Computational Protein Design Using AND/OR Branch-and-Bound Search

    The computation of the global minimum energy conformation (GMEC) is an important and challenging topic in structure-based computational protein design. In this paper, we propose a new protein design algorithm based on AND/OR branch-and-bound (AOBB) search, a variant of the traditional branch-and-bound search algorithm, to solve this combinatorial optimization problem. By integrating a powerful heuristic function, AOBB is able to fully exploit the graph structure of the underlying residue interaction network of a backbone template to significantly accelerate the design process. Tests on real protein data show that our new protein design algorithm is able to solve many problems that were previously unsolvable by traditional exact search algorithms. For the problems that can be solved with traditional provable algorithms, our new method provides a speedup of several orders of magnitude while still guaranteeing that it finds the GMEC solution. Comment: RECOMB 201

    Importance of chirality and reduced flexibility of protein side chains: A study with square and tetrahedral lattice models

    In simple models, side chains are often represented implicitly (e.g., by spin states) or simplified as one atom. We study side-chain effects using square lattice and tetrahedral lattice models with explicit side chains of two atoms. We distinguish effects due to chirality from effects due to side-chain flexibility, since residues in proteins are L-residues and their side chains adopt different rotameric states. Short chains are enumerated exhaustively. For long chains, we effectively sample rare events (e.g., compact conformations) and obtain complete pictures of the ensemble properties of these models across all compactness regions. We find that both chirality and reduced side-chain flexibility lower the folding entropy significantly for globally compact conformations, suggesting that they are important properties of residues for ensuring fast folding and a stable native structure. This corresponds well with our finding that natural amino acid residues have reduced effective flexibility, as evidenced by analysis of rotamer libraries and side-chain rotatable bonds. We further develop a method for calculating the exact side-chain entropy for a given backbone structure. We show that simple rotamer counting often underestimates side-chain entropy significantly, and that side-chain entropy does not always correlate well with main-chain packing. Among compact backbones with maximum side-chain entropy, helical structures emerge as the dominant configurations. Our results suggest that side-chain entropy may be an important factor contributing to the formation of alpha helices in compact conformations. Comment: 16 pages, 15 figures, 2 tables. Accepted by J. Chem. Phy

    Pair HMM based gap statistics for re-evaluation of indels in alignments with affine gap penalties: Extended Version

    Although computationally aligning sequences is a crucial step in the vast majority of comparative genomics studies, our understanding of alignment biases still needs to be improved. To infer true structural or homologous regions, computational alignments need further evaluation. It has been shown that the accuracy of aligned positions can drop substantially, in particular around gaps. Here we focus on the re-evaluation of score-based alignments with affine gap penalty costs. We exploit their relationship with pair hidden Markov models and develop efficient algorithms to identify gaps that are significant in terms of length and multiplicity. We evaluate our statistics against the well-established structural alignments from SABmark and find that indel reliability increases substantially with significance, in particular in worst-case twilight zone alignments. This indicates that our statistics can reliably complement other methods, which mostly focus on the reliability of match positions. Comment: 17 pages, 7 figure

    Experimental library screening demonstrates the successful application of computational protein design to large structural ensembles

    The stability, activity, and solubility of a protein sequence are determined by a delicate balance of molecular interactions in a variety of conformational states. Even so, most computational protein design methods model sequences in the context of a single native conformation. Simulations that model the native state as an ensemble have been mostly neglected due to the lack of sufficiently powerful optimization algorithms for multistate design. Here, we have applied our multistate design algorithm to study the potential utility of various forms of input structural data for design. To facilitate a more thorough analysis, we developed new methods for the design and high-throughput stability determination of combinatorial mutation libraries based on protein design calculations. The application of these methods to the core design of a small model system produced many variants with improved thermodynamic stability and showed that multistate design methods can be readily applied to large structural ensembles. We found that exhaustive screening of our designed libraries helped to clarify several sources of simulation error that would otherwise have been difficult to ascertain. Interestingly, the lack of correlation between our simulated and experimentally measured stability values shows clearly that a design procedure need not reproduce experimental data exactly to achieve success. This surprising result suggests potentially fruitful directions for the improvement of computational protein design technology.