Aligning Multiple Sequences with Genetic Algorithm
The alignment of biological sequences is a crucial
tool in molecular biology and genome analysis. It helps to build
a phylogenetic tree of related DNA sequences and also to predict
the function and structure of unknown protein sequences by
aligning them with other sequences whose function and structure are
already known. However, finding an optimal multiple sequence
alignment takes time and space that grow exponentially as the length or
number of sequences increases. Genetic Algorithms (GAs) are
random search strategies that optimize an objective
function, here a measure of alignment quality (distance), and
are capable of both exploratory search through the solution space
and exploitation of current results.
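The abstract does not say which objective function the GA optimizes. As a minimal sketch, assuming a sum-of-pairs score with illustrative match, mismatch and gap values (all hypothetical, not taken from the paper), the fitness of one candidate alignment could be computed like this:

```python
from itertools import combinations


def sum_of_pairs(alignment, match=1, mismatch=-1, gap=-2):
    """Score an alignment given as a list of equal-length gapped strings.

    The scoring values are illustrative assumptions, not the paper's.
    """
    if len({len(row) for row in alignment}) > 1:
        raise ValueError("all rows of the alignment must have the same length")
    score = 0
    # Walk the alignment column by column and score every pair of rows.
    for column in zip(*alignment):
        for a, b in combinations(column, 2):
            if a == "-" and b == "-":
                continue  # a gap aligned with a gap contributes nothing
            if a == "-" or b == "-":
                score += gap
            elif a == b:
                score += match
            else:
                score += mismatch
    return score
```

A GA would evaluate a function of this kind on every individual in the population and prefer higher-scoring alignments during selection.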
Towards Reliable Automatic Protein Structure Alignment
A variety of methods have been proposed for structure similarity calculation,
which are called structure alignment or superposition. One major shortcoming in
current structure alignment algorithms is in their inherent design, which is
based on local structure similarity. In this work, we propose a method to
incorporate global information in obtaining optimal alignments and
superpositions. Our method, when applied to optimizing the TM-score and the GDT
score, produces significantly better results than current state-of-the-art
protein structure alignment tools. Specifically, if the highest TM-score found
by TMalign is lower than 0.6 and the highest TM-score found by one of the
tested methods is higher than 0.5, there is a probability of 42% that
TMalign failed to find TM-scores higher than 0.5, while the same probability
is reduced to 2% if our method is used. This could significantly improve the
accuracy of fold detection if the cutoff TM-score of 0.5 is used.
In addition, existing structure alignment algorithms focus on structure
similarity alone and simply ignore other important similarities, such as
sequence similarity. Our approach has the capacity to incorporate multiple
similarities into the scoring function. Results show that sequence similarity
aids in finding high quality protein structure alignments that are more
consistent with eye-examined alignments in HOMSTRAD. Even when structure
similarity itself fails to find alignments with any consistency with
eye-examined alignments, our method remains capable of finding alignments
highly similar to, or even identical to, eye-examined alignments.
Comment: Peer-reviewed and presented as part of the 13th Workshop on
Algorithms in Bioinformatics (WABI2013)
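For readers unfamiliar with the TM-score thresholds quoted above, the following sketch evaluates the standard TM-score formula for one fixed superposition. It assumes the per-residue distances between aligned pairs are already known, and the 0.5 Å floor on d0 for short chains is an assumption of this sketch. The true TM-score is the maximum of this quantity over all superpositions, which is the search problem the abstract addresses.

```python
def d0(target_length):
    """Length-dependent distance scale of the TM-score, in angstroms.

    The 0.5 floor for short chains is an assumption of this sketch.
    """
    return max(0.5, 1.24 * max(target_length - 15, 0) ** (1.0 / 3.0) - 1.8)


def tm_score(distances, target_length):
    """TM-score of one fixed superposition.

    distances: distance in angstroms between each pair of aligned residues.
    target_length: number of residues in the target (normalizing) protein.
    """
    if target_length <= 0:
        raise ValueError("target_length must be positive")
    scale = d0(target_length)
    # Each aligned pair contributes between 0 and 1; unaligned residues add 0.
    total = sum(1.0 / (1.0 + (d / scale) ** 2) for d in distances)
    return total / target_length
```

Under this formula a perfect superposition of the full target scores 1.0, and each aligned pair at distance d0 contributes exactly one half.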
Docking by structural similarity at protein-protein interfaces
Rapid accumulation of experimental data on protein-protein complexes drives the paradigm shift in protein docking from "traditional" template-free approaches to template-based techniques. Homology docking algorithms based on sequence similarity between target and template complexes can account for up to 20% of known protein-protein interactions. When highly homologous templates for the target complex are not available, but the structure of the target monomers is known, docking by local structural alignment may provide an adequate solution. Such an algorithm was developed based on the structural comparison of monomers to co-crystallized interfaces. A library of the interfaces was generated from co-crystallized protein-protein complexes in PDB. The partial structure alignment algorithm was validated on the Dockground benchmark sets. The optimal performance of the partial (interface) structure alignment was achieved with the interface residues defined by a 12 Å distance across the interface. Overall, the partial structural alignment yielded more accurate models than the full structure alignment. Most templates identified by the partial structural alignment had low sequence identity to the target, which makes them hard to detect by sequence-based methods. The results indicate that the structure alignment techniques provide a much-needed addition to the docking arsenal, with the combined structural alignment and template-free docking success rate significantly surpassing that of the free docking alone.
MODELING PROTEIN INTERACTIONS THROUGH STRUCTURE ALIGNMENT
Rapid accumulation of the experimental data on protein-protein complexes drives the paradigm shift in protein docking from "traditional" template-free approaches to template-based techniques. Homology docking algorithms based on sequence similarity between target and template complexes can account for ~20% of known protein-protein interactions. When homologous templates for the target complex are not available, but the structure of the target monomers is known, docking through structural alignment may provide an adequate solution. Such an algorithm was developed based on the structural comparison of monomers to co-crystallized interfaces. A library of the interfaces was generated from the biological units. The success of the structure alignment of the interfaces depends on the way the interface is defined in terms of its structural content. We performed a systematic large-scale study to find the optimal definition/size of the interface for the structure alignment-based docking applications. The performance was the best when the interface was defined with a distance cutoff of 12 Å. The structure alignment protocol was validated, for both full and partial alignment, on the DOCKGROUND benchmark sets. Both protocols performed equally for higher-accuracy models (i-RMSD ≤ 5 Å). Overall, the partial structure alignment yielded more acceptable models than the full structure alignment (86 acceptable models were provided by partial structure alignment only, compared to 31 by full structure alignment only). Most templates identified by the partial structure alignment had very low sequence identity to targets and such templates were hard to detect by sequence-based methods. Detailed analysis of the models obtained for 372 test cases concluded that templates for higher-accuracy models often shared not only local but also global structural similarity with the targets.
However, interface similarity even in these cases was more prominent, reflected in more accurate models yielded by partial structure alignment. Conservation of protein-protein interfaces was observed in very diverse proteins. For example, target complexes shared interface structural similarity not only with hetero- and homo-complexes but also, in a few cases, with crystal packing interfaces. The results indicate that the structure alignment techniques provide a much-needed addition to the docking arsenal, with the combined structure alignment and template-free docking success rate significantly surpassing that of the free docking alone.
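The 12 Å interface definition can be illustrated with a short sketch. It assumes each residue is represented by a single coordinate (for example its Cα atom) and that a residue belongs to the interface when it lies within the cutoff of any residue of the partner chain. Both choices are assumptions of this sketch; the abstract does not state which atoms the distance is measured between.

```python
import math


def interface_residues(chain_a, chain_b, cutoff=12.0):
    """Return the residue indices of each chain that lie at the interface.

    chain_a, chain_b: lists of (x, y, z) coordinates in angstroms, one
    representative atom per residue (assumed here, e.g. the C-alpha atom).
    """
    in_a, in_b = set(), set()
    for i, p in enumerate(chain_a):
        for j, q in enumerate(chain_b):
            # A cross-chain pair within the cutoff marks both residues.
            if math.dist(p, q) <= cutoff:
                in_a.add(i)
                in_b.add(j)
    return sorted(in_a), sorted(in_b)
```

The two index lists select the partial (interface) structure that would then be passed to the structure alignment step in place of the full monomers.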
Algorithm engineering for optimal alignment of protein structure distance matrices
Protein structural alignment is an important problem in computational
biology. In this paper, we present first successes on provably optimal pairwise
alignment of protein inter-residue distance matrices, using the popular Dali
scoring function. We introduce the structural alignment problem formally, which
enables us to express a variety of scoring functions used in previous work as
special cases in a unified framework. Further, we propose the first
mathematical model for computing optimal structural alignments based on dense
inter-residue distance matrices. We therefore reformulate the problem as a
special graph problem and give a tight integer linear programming model. We
then present algorithm engineering techniques to handle the huge integer linear
programs of real-life distance matrix alignment problems. Applying these
techniques, we can compute provably optimal Dali alignments for the very first
time.
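As a point of reference, the sketch below evaluates the Dali elastic score for a given alignment. It assumes the commonly cited form of the score, with a similarity threshold of 0.2 and a 20 Å Gaussian envelope; these constants and the handling of the diagonal terms are assumptions here and are not taken from the abstract. The code only scores a fixed alignment; finding the alignment that maximizes this score is the optimization problem the paper solves.

```python
import math


def dali_score(dist_a, dist_b, alignment, theta=0.2, alpha=20.0):
    """Dali elastic score of an alignment between structures A and B.

    dist_a, dist_b: square inter-residue distance matrices (lists of lists).
    alignment: list of (i, j) pairs, residue i of A matched to residue j of B.
    theta, alpha: threshold and envelope width, assumed standard values.
    """
    score = 0.0
    for ia, ib in alignment:
        for ja, jb in alignment:
            if ia == ja:
                score += theta  # diagonal term: a residue paired with itself
                continue
            da, db = dist_a[ia][ja], dist_b[ib][jb]
            mean = (da + db) / 2.0
            # Relative deviation of the two distances; zero if both are zero.
            deviation = abs(da - db) / mean if mean > 0 else 0.0
            # The envelope down-weights pairs of residues that are far apart.
            score += (theta - deviation) * math.exp(-((mean / alpha) ** 2))
    return score
```

Every aligned pair of residue pairs contributes, so the number of terms grows quadratically with the alignment length.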
Multiple structure alignment and consensus identification for proteins
Background: An algorithm is presented to compute a multiple structure alignment for a set of proteins and to generate a consensus (pseudo) protein which captures common substructures present in the given proteins. The algorithm represents each protein as a sequence of triples of coordinates of the alpha-carbon atoms along the backbone. It then computes iteratively a sequence of transformation matrices (i.e., translations and rotations) to align the proteins in space and generate the consensus. The algorithm is a heuristic in that it computes an approximation to the optimal alignment that minimizes the sum of the pairwise distances between the consensus and the transformed proteins.
Results: Experimental results show that the algorithm converges quite rapidly and generates consensus structures that are visually similar to the input proteins. A comparison with other coordinate-based alignment algorithms (MAMMOTH and MATT) shows that the proposed algorithm is competitive in terms of speed and the sizes of the conserved regions discovered in an extensive benchmark dataset derived from the HOMSTRAD and SABmark databases. The algorithm has been implemented in C++ and can be downloaded from the project's web page. Alternatively, the algorithm can be used via a web server which makes it possible to align protein structures by uploading files from local disk or by downloading protein data from the RCSB Protein Data Bank.
Conclusions: An algorithm is presented to compute a multiple structure alignment for a set of proteins, together with their consensus structure. Experimental results show its effectiveness in terms of the quality of the alignment and computational cost.
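The objective described in the abstract, the sum of distances between the consensus and the transformed proteins, can be sketched as follows. This is not the authors' implementation: it assumes the proteins have already been superposed and have equal length with residues matched position by position, and it takes the consensus to be the per-position mean coordinate. The rotation and translation steps are omitted.

```python
import math


def consensus(structures):
    """Per-position mean of already superposed, equal-length C-alpha traces.

    structures: list of proteins, each a list of (x, y, z) tuples.
    The mean-coordinate consensus is an assumption of this sketch.
    """
    n = len(structures)
    length = len(structures[0])
    return [
        tuple(sum(s[k][axis] for s in structures) / n for axis in range(3))
        for k in range(length)
    ]


def total_distance(structures, cons):
    """Sum of distances between each protein's residues and the consensus."""
    return sum(math.dist(p, c) for s in structures for p, c in zip(s, cons))
```

An iterative scheme of the kind described would alternate between re-superposing each protein onto the current consensus and recomputing the consensus, stopping when the total distance no longer decreases.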
- …