An Exact Algorithm for Side-Chain Placement in Protein Design
Computational protein design aims at constructing novel or improved functions
on the structure of a given protein backbone and has important applications in
the pharmaceutical and biotechnology industries. The underlying combinatorial
side-chain placement problem consists of choosing a side-chain placement for
each residue position such that the resulting overall energy is minimum. The
choice of the side-chain then also determines the amino acid for this position.
Many algorithms have been proposed for this NP-hard problem in the context of
homology modeling; however, they reach their limits when faced with large
protein design instances.
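To make the combinatorial problem concrete, the sketch below assumes the common pairwise-decomposable energy model: a self energy for each rotamer plus a pair energy for each interacting pair of positions. The tables `self_e` and `pair_e`, the example values, and the helper names are hypothetical. Exhaustive enumeration is shown only to illustrate why the search space explodes; it is not the method proposed in the paper.

```python
from itertools import product

# Hypothetical example: 3 positions with 2 rotamers each.
# self_e[i][r]  : self energy of rotamer r at position i
# pair_e[(i, j)]: table[r][s] of pair energies, stored once with i < j
EXAMPLE_SELF = [[0.0, 1.0], [0.5, 0.0], [0.0, 0.2]]
EXAMPLE_PAIR = {(0, 1): [[2.0, 0.0], [0.0, 0.0]],
                (1, 2): [[0.0, 1.5], [0.0, 0.0]]}


def total_energy(assignment, self_e, pair_e):
    """Hypothetical helper: self energies plus all pair energies for one rotamer per position."""
    energy = sum(self_e[i][r] for i, r in enumerate(assignment))
    for (i, j), table in pair_e.items():
        energy += table[assignment[i]][assignment[j]]
    return energy


def brute_force_gmec(self_e, pair_e):
    """Exhaustive GMEC search: tries all prod(|R_i|) rotamer combinations."""
    choices = [range(len(rotamers)) for rotamers in self_e]
    best = min(product(*choices), key=lambda a: total_energy(a, self_e, pair_e))
    return best, total_energy(best, self_e, pair_e)
```

With three positions and two rotamers each there are only 8 combinations, but the count multiplies with every added position, which is why exact methods for large design instances need pruning rather than enumeration.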
In this paper, we propose a new exact method for the side-chain placement
problem that works well even for large instances such as those arising in protein
design. Our main contribution is a dedicated branch-and-bound algorithm that
combines tight upper and lower bounds resulting from a novel Lagrangian
relaxation approach for side-chain placement. Our experimental results show
that our method outperforms alternative state-of-the-art exact approaches and
makes it possible to optimally solve large protein design instances routinely.
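The branch-and-bound idea can be sketched under the same hypothetical pairwise energy tables. The lower bound here is deliberately simple (each remaining energy term is minimized independently) and positions are branched in fixed index order; it is not the Lagrangian-relaxation bound the authors propose. Any admissible lower bound keeps the search exact; tighter bounds, such as the Lagrangian ones described above, prune more of the tree.

```python
import math

# Hypothetical pairwise tables: self_e[i][r], and pair_e[(i, j)][r][s] with i < j.
EXAMPLE_SELF = [[0.0, 1.0], [0.5, 0.0], [0.0, 0.2]]
EXAMPLE_PAIR = {(0, 1): [[2.0, 0.0], [0.0, 0.0]],
                (1, 2): [[0.0, 1.5], [0.0, 0.0]]}


def lower_bound(partial, self_e, pair_e):
    """Admissible bound for all completions of `partial` (positions 0..k-1 fixed).

    Illustrative bound only (not the paper's Lagrangian bound): exact energy of
    fixed terms, plus each free position's best rotamer given the fixed ones,
    plus the minimum entry of every free-free pair table.
    """
    k = len(partial)
    bound = sum(self_e[i][r] for i, r in enumerate(partial))
    for j in range(k, len(self_e)):
        best = math.inf
        for r in range(len(self_e[j])):
            e = self_e[j][r]
            for (a, b), table in pair_e.items():
                if b == j and a < k:            # fixed-free pair
                    e += table[partial[a]][r]
            best = min(best, e)
        bound += best
    for (a, b), table in pair_e.items():
        if b < k:                               # fixed-fixed pair: exact
            bound += table[partial[a]][partial[b]]
        elif a >= k:                            # free-free pair: optimistic
            bound += min(min(row) for row in table)
    return bound


def branch_and_bound(self_e, pair_e):
    """Depth-first branch-and-bound over positions in index order (hypothetical helper)."""
    n = len(self_e)
    incumbent = {"energy": math.inf, "assignment": None}

    def search(partial):
        lb = lower_bound(partial, self_e, pair_e)
        if lb >= incumbent["energy"]:
            return                              # prune: cannot beat the incumbent
        if len(partial) == n:                   # leaf: bound equals the exact energy
            incumbent["energy"], incumbent["assignment"] = lb, tuple(partial)
            return
        for r in range(len(self_e[len(partial)])):
            search(partial + [r])

    search([])
    return incumbent["assignment"], incumbent["energy"]
```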
Protein side-chain placement: probabilistic inference and integer programming methods
The prediction of energetically favorable side-chain conformations is a fundamental element in homology modeling of proteins and the design of novel protein sequences. The space of side-chain conformations can be approximated by a discrete space of probabilistically representative side-chain conformations (called rotamers). The problem is then to find a rotamer selection for each amino acid that minimizes a potential energy function. This is called the Global Minimum Energy Conformation (GMEC) problem, and it is NP-hard. The Dead-End Elimination theorem together with the A* algorithm (DEE/A*) has been successfully applied to this problem; however, DEE fails to converge for some complex instances. In this paper, we explore two alternatives to DEE/A* for solving the GMEC problem. We use a probabilistic inference method, the max-product (MP) belief-propagation algorithm, to estimate (often exactly) the GMEC. We also investigate integer programming formulations to obtain the exact solution. There are known ILP formulations that can be applied directly to the GMEC problem; we review these formulations and compare their effectiveness using CPLEX optimizers. We also present preliminary work towards applying the branch-and-price approach to the GMEC problem. The preliminary results suggest that the max-product algorithm is very effective for the GMEC problem. Although the max-product algorithm is an approximate method, its speed and accuracy are comparable to those of DEE/A* in large side-chain placement problems, and it may be superior in sequence design.
Singapore-MIT Alliance (SMA)
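Max-product belief propagation, run in the negative-log (min-sum) domain, can be illustrated on the simplest case: a chain of positions in which only neighbours interact. On a chain, or any tree, the messages yield the exact GMEC; on the loopy residue graphs of real proteins the same kind of update is only approximate, which is consistent with the abstract's "often exactly". The chain layout, example values, and function name below are hypothetical.

```python
# Hypothetical chain: pair_e[i][r][s] couples position i (rotamer r) with i + 1 (rotamer s).
EXAMPLE_SELF = [[0.0, 1.0], [0.5, 0.0], [0.0, 0.2]]
EXAMPLE_CHAIN_PAIR = [[[2.0, 0.0], [0.0, 0.0]],
                      [[0.0, 1.5], [0.0, 0.0]]]


def min_sum_chain(self_e, pair_e):
    """Min-sum message passing (max-product in -log space) on a chain, with backtracking."""
    msg = [list(self_e[0])]   # msg[i][s]: best energy of positions 0..i with i in rotamer s
    back = []                 # back[i-1][s]: best rotamer at i-1 given rotamer s at i
    for i in range(1, len(self_e)):
        cur, arg = [], []
        for s in range(len(self_e[i])):
            cands = [msg[-1][r] + pair_e[i - 1][r][s] for r in range(len(self_e[i - 1]))]
            r_best = min(range(len(cands)), key=cands.__getitem__)
            cur.append(cands[r_best] + self_e[i][s])
            arg.append(r_best)
        msg.append(cur)
        back.append(arg)
    s = min(range(len(msg[-1])), key=msg[-1].__getitem__)
    energy, assignment = msg[-1][s], [s]
    for arg in reversed(back):  # decode from the last position backwards
        s = arg[s]
        assignment.append(s)
    return tuple(reversed(assignment)), energy
```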
Flat-Bottom Strategy for Improved Accuracy in Protein Side-Chain Placements
We present a new strategy for protein side-chain placement that uses flat-bottom potentials for rotamer scoring. The extent of the flat bottom depends on the coarseness of the rotamer library and is optimized for libraries with diversities ranging from 0.2 Å to 5.0 Å. The parameters reported here were optimized for force fields using the Lennard-Jones 12−6 van der Waals potential with DREIDING parameters but are expected to be similar for AMBER, CHARMM, and other force fields. This Side-Chain Rotamer Excitation Analysis Method is implemented in the SCREAM software package. Similar scoring function strategies should be useful for ligand docking, virtual ligand screening, and protein folding applications.
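A flat-bottom potential can be sketched as a Lennard-Jones 12-6 term whose well is widened by a tolerance `delta`, so that a rotamer placed slightly too close to a neighbour is not punished by the steep repulsive wall. The exact functional form used in SCREAM is not stated here; the construction below (shifting the repulsive branch inward by `delta` and holding the energy at the well depth in between) is an assumption for illustration, using the DREIDING-style form with equilibrium distance `r0` and well depth `d0`.

```python
def lj_12_6(r, r0, d0):
    """DREIDING-style Lennard-Jones 12-6 with minimum -d0 at r = r0."""
    x = (r0 / r) ** 6
    return d0 * (x * x - 2.0 * x)


def flat_bottom_lj(r, r0, d0, delta):
    """Assumed flat-bottom variant: the well is widened to [r0 - delta, r0].

    Hypothetical construction, not necessarily SCREAM's exact form.
    """
    if r < r0 - delta:
        return lj_12_6(r + delta, r0, d0)   # repulsive wall shifted inward by delta
    if r <= r0:
        return -d0                          # flat bottom at the well depth
    return lj_12_6(r, r0, d0)               # attractive tail left unchanged
```

Under this construction a coarser rotamer library would call for a larger `delta`, in line with the abstract's statement that the flat-bottom extent depends on library coarseness; the fitted values themselves are given in the paper.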
Computational Protein Design Using AND/OR Branch-and-Bound Search
The computation of the global minimum energy conformation (GMEC) is an
important and challenging topic in structure-based computational protein
design. In this paper, we propose a new protein design algorithm based on the
AND/OR branch-and-bound (AOBB) search, which is a variant of the traditional
branch-and-bound search algorithm, to solve this combinatorial optimization
problem. Combined with a powerful heuristic function, AOBB is able to
fully exploit the graph structure of the underlying residue interaction network
of a backbone template to significantly accelerate the design process. Tests on
real protein data show that our new protein design algorithm is able to solve
many problems that were previously unsolvable by the traditional exact search
algorithms, and for the problems that can be solved with traditional provable
algorithms, our new method provides speedups of several orders of
magnitude while still guaranteeing that the GMEC is found.
Comment: RECOMB 201
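How AND/OR search exploits the residue interaction graph can be sketched as follows: after fixing a rotamer at one position (an OR node), the remaining free positions may split into disconnected components (an AND node) that are minimized independently and summed. This minimal sketch reuses hypothetical pairwise tables, picks branching positions naively, and omits the heuristic bounds that turn AND/OR search into AND/OR branch-and-bound, so it illustrates the decomposition only.

```python
import math

# Hypothetical pairwise tables: self_e[i][r], pair_e[(i, j)][r][s] with i < j.
EXAMPLE_SELF = [[0.0, 1.0], [0.5, 0.0], [0.0, 0.2]]
EXAMPLE_PAIR = {(0, 1): [[2.0, 0.0], [0.0, 0.0]],
                (1, 2): [[0.0, 1.5], [0.0, 0.0]]}


def components(free, edges):
    """Connected components of the interaction graph restricted to `free` positions."""
    remaining, comps = set(free), []
    while remaining:
        stack, comp = [remaining.pop()], []
        while stack:
            v = stack.pop()
            comp.append(v)
            for a, b in edges:
                w = b if a == v else a if b == v else None
                if w is not None and w in remaining:
                    remaining.remove(w)
                    stack.append(w)
        comps.append(sorted(comp))
    return comps


def and_or_min(free, fixed, self_e, pair_e):
    """Minimum energy of all terms involving `free` positions, given `fixed` rotamers.

    Hypothetical helper. Each pair term is added when its second endpoint is
    assigned, so it is counted exactly once.
    """
    total = 0.0
    for comp in components(free, pair_e.keys()):     # AND node: independent parts
        v, rest = comp[0], comp[1:]
        best = math.inf
        for r in range(len(self_e[v])):              # OR node: rotamers of v
            e = self_e[v][r]
            for (a, b), table in pair_e.items():
                if a == v and b in fixed:
                    e += table[r][fixed[b]]
                elif b == v and a in fixed:
                    e += table[fixed[a]][r]
            fixed[v] = r
            e += and_or_min(rest, fixed, self_e, pair_e)
            del fixed[v]
            best = min(best, e)
        total += best
    return total
```

For example, in a chain 0-1-2, conditioning on position 1 leaves positions 0 and 2 in separate components, so each is minimized on its own instead of jointly.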
Importance of chirality and reduced flexibility of protein side chains: A study with square and tetrahedral lattice models
In simple models side chains are often represented implicitly (e.g., by
spin-states) or simplified as one atom. We study side chain effects using
square lattice and tetrahedral lattice models, with explicit side chains of
two atoms. We distinguish effects due to chirality and effects due to side
chain flexibilities, since residues in proteins are L-residues, and their side
chains adopt different rotameric states. Short chains are enumerated
exhaustively. For long chains, we effectively sample rare events (e.g., compact
conformations) and obtain complete pictures of the ensemble properties of these
models across all compactness regions. We find that both chirality and reduced side
chain flexibility lower the folding entropy significantly for globally compact
conformations, suggesting that they are important properties of residues to
ensure fast folding and stable native structure. This corresponds well with our
finding that natural amino acid residues have reduced effective flexibility, as
evidenced by analysis of rotamer libraries and side chain rotatable bonds. We
further develop a method for calculating the exact side-chain entropy for a given
backbone structure. We show that simple rotamer counting often underestimates
side chain entropy significantly, and side chain entropy does not always
correlate well with main chain packing. Among compact backbones with maximum
side chain entropy, helical structures emerge as the dominant
configurations. Our results suggest that side chain entropy may be an important
factor contributing to the formation of alpha helices for compact
conformations.
Comment: 16 pages, 15 figures, 2 tables. Accepted by J. Chem. Phy
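Exhaustive enumeration of short lattice chains amounts to a depth-first search over self-avoiding walks. The sketch below counts backbone conformations on the square lattice only; the two-atom side chains, the chirality constraint, and the tetrahedral lattice studied in the paper are not modelled, and the function name is hypothetical.

```python
def count_saws(n_steps):
    """Count self-avoiding walks of n_steps on the square lattice by depth-first search.

    Hypothetical helper: backbone only, no side chains, no symmetry reduction.
    """
    moves = ((1, 0), (-1, 0), (0, 1), (0, -1))

    def extend(pos, visited, left):
        if left == 0:
            return 1
        total = 0
        for dx, dy in moves:
            nxt = (pos[0] + dx, pos[1] + dy)
            if nxt not in visited:            # self-avoidance check
                visited.add(nxt)
                total += extend(nxt, visited, left - 1)
                visited.remove(nxt)           # backtrack
        return total

    return extend((0, 0), {(0, 0)}, n_steps)
```

The counts grow roughly geometrically (4, 12, 36, 100, 284 walks for 1 to 5 steps), which is why exhaustive enumeration is limited to short chains and longer ones need sampling.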
Pair HMM based gap statistics for re-evaluation of indels in alignments with affine gap penalties: Extended Version
Although computational sequence alignment is a crucial step in the vast
majority of comparative genomics studies, our understanding of alignment biases
still needs to be improved. To infer true structural or homologous regions,
computational alignments need further evaluation. It has been shown that the
accuracy of aligned positions can drop substantially, particularly around gaps.
Here we focus on re-evaluation of score-based alignments with affine gap
penalty costs. We exploit their relationships with pair hidden Markov models
and develop efficient algorithms to identify gaps that are
significant in terms of length and multiplicity. We evaluate our statistics
against the well-established structural alignments from SABmark and
find that indel reliability increases substantially with significance,
particularly in worst-case twilight-zone alignments. This indicates that our
statistics can reliably complement other methods, which mostly focus on the
reliability of match positions.
Comment: 17 pages, 7 figures
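Score-based alignment with affine gap penalties is usually computed with Gotoh's three-matrix dynamic program, whose matrices correspond to the match state and the two gap states of a pair HMM. The sketch below assumes a gap of length L scores `gap_open + (L - 1) * gap_extend` and uses illustrative scoring parameters; it computes only the optimal global score and does not implement the gap significance statistics of the paper.

```python
NEG = float("-inf")


def gotoh_score(s, t, match=1, mismatch=-1, gap_open=-3, gap_extend=-1):
    """Optimal global alignment score with affine gaps (Gotoh).

    A gap of length L scores gap_open + (L - 1) * gap_extend. As in the common
    three-state pair HMM, a gap in one sequence is not followed directly by a
    gap in the other. Parameter values are illustrative assumptions.
    """
    n, m = len(s), len(t)
    M = [[NEG] * (m + 1) for _ in range(n + 1)]  # s[i-1] aligned to t[j-1]
    X = [[NEG] * (m + 1) for _ in range(n + 1)]  # s[i-1] aligned to a gap
    Y = [[NEG] * (m + 1) for _ in range(n + 1)]  # t[j-1] aligned to a gap
    M[0][0] = 0
    for i in range(1, n + 1):
        X[i][0] = gap_open + (i - 1) * gap_extend
    for j in range(1, m + 1):
        Y[0][j] = gap_open + (j - 1) * gap_extend
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if s[i - 1] == t[j - 1] else mismatch
            M[i][j] = sub + max(M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1])
            X[i][j] = max(M[i - 1][j] + gap_open, X[i - 1][j] + gap_extend)
            Y[i][j] = max(M[i][j - 1] + gap_open, Y[i][j - 1] + gap_extend)
    return max(M[n][m], X[n][m], Y[n][m])
```

For "ACGT" against "AT", one gap of length 2 (score -4) between two matches beats two separate gaps (-6), giving -2; this preference for fewer, longer gaps is what the affine model encodes.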
Experimental library screening demonstrates the successful application of computational protein design to large structural ensembles
The stability, activity, and solubility of a protein sequence are determined by a delicate balance of molecular interactions in a variety of conformational states. Even so, most computational protein design methods model sequences in the context of a single native conformation. Simulations that model the native state as an ensemble have been mostly neglected due to the lack of sufficiently powerful optimization algorithms for multistate design. Here, we have applied our multistate design algorithm to study the potential utility of various forms of input structural data for design. To facilitate a more thorough analysis, we developed new methods for the design and high-throughput stability determination of combinatorial mutation libraries based on protein design calculations. The application of these methods to the core design of a small model system produced many variants with improved thermodynamic stability and showed that multistate design methods can be readily applied to large structural ensembles. We found that exhaustive screening of our designed libraries helped to clarify several sources of simulation error that would have otherwise been difficult to ascertain. Interestingly, the lack of correlation between our simulated and experimentally measured stability values shows clearly that a design procedure need not reproduce experimental data exactly to achieve success. This surprising result suggests potentially fruitful directions for the improvement of computational protein design technology.
- …