38,903 research outputs found

    Aligning Multiple Sequences with Genetic Algorithm

    The alignment of biological sequences is a crucial tool in molecular biology and genome analysis. It helps to build a phylogenetic tree of related DNA sequences and to predict the function and structure of unknown protein sequences by aligning them with sequences whose function and structure are already known. However, finding an optimal multiple sequence alignment requires time and space that grow exponentially as the length or number of sequences increases. Genetic Algorithms (GAs) are random search strategies that optimize an objective function measuring alignment quality (distance); they combine exploratory search through the solution space with exploitation of current results.
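
    The abstract does not say which objective function or genetic operators the authors use. As a minimal sketch, assuming a sum-of-pairs score with illustrative values (match 1, mismatch -1, gap -2) and a hypothetical gap-moving mutation, the two ingredients a GA needs for this problem might look like this:

```python
from itertools import combinations
import random

# Assumed scoring values; the abstract does not specify any.
MATCH, MISMATCH, GAP = 1, -1, -2


def sum_of_pairs(alignment):
    """Score an alignment given as equal-length gapped strings ('-' is a gap)."""
    if len({len(row) for row in alignment}) != 1:
        raise ValueError("all rows must have the same length")
    score = 0
    for a, b in combinations(alignment, 2):
        for x, y in zip(a, b):
            if x == "-" and y == "-":
                continue  # gap-gap columns contribute nothing
            if x == "-" or y == "-":
                score += GAP
            elif x == y:
                score += MATCH
            else:
                score += MISMATCH
    return score


def mutate(alignment, rng):
    """Hypothetical mutation: move one gap to a random position in its row.

    The residues of every sequence keep their order, so the result is
    still a valid alignment of the same sequences.
    """
    rows = [list(r) for r in alignment]
    candidates = [i for i, r in enumerate(rows) if "-" in r]
    if not candidates:
        return list(alignment)
    row = rows[rng.choice(candidates)]
    gaps = [i for i, c in enumerate(row) if c == "-"]
    row.pop(rng.choice(gaps))
    row.insert(rng.randrange(len(row) + 1), "-")
    return ["".join(r) for r in rows]
```

    A GA would keep a population of such alignments, rank them with `sum_of_pairs`, and apply `mutate` (plus some crossover) to explore new candidates while retaining the best ones.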

    Towards Reliable Automatic Protein Structure Alignment

    A variety of methods have been proposed for structure similarity calculation, which are called structure alignment or superposition. One major shortcoming in current structure alignment algorithms is in their inherent design, which is based on local structure similarity. In this work, we propose a method to incorporate global information in obtaining optimal alignments and superpositions. Our method, when applied to optimizing the TM-score and the GDT score, produces significantly better results than current state-of-the-art protein structure alignment tools. Specifically, if the highest TM-score found by TMalign is lower than 0.6 and the highest TM-score found by one of the tested methods is higher than 0.5, there is a probability of 42% that TMalign failed to find TM-scores higher than 0.5, while the same probability is reduced to 2% if our method is used. This could significantly improve the accuracy of fold detection if the cutoff TM-score of 0.5 is used. In addition, existing structure alignment algorithms focus on structure similarity alone and simply ignore other important similarities, such as sequence similarity. Our approach has the capacity to incorporate multiple similarities into the scoring function. Results show that sequence similarity aids in finding high quality protein structure alignments that are more consistent with eye-examined alignments in HOMSTRAD. Even when structure similarity itself fails to find alignments with any consistency with eye-examined alignments, our method remains capable of finding alignments highly similar to, or even identical to, eye-examined alignments. Comment: Peer-reviewed and presented as part of the 13th Workshop on Algorithms in Bioinformatics (WABI2013).
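
    For readers unfamiliar with the TM-score cutoff discussed above, the following sketch evaluates the standard TM-score formula for one fixed superposition. It is not the authors' optimization method: the real TM-score is the maximum of this quantity over all superpositions. The function names and the 0.5 Å floor on d0 for very short targets are assumptions made here for illustration.

```python
def tm_d0(target_length, floor=0.5):
    """Length-dependent distance scale d0 = 1.24 * (L - 15)^(1/3) - 1.8.

    The floor for short targets is an assumption of this sketch.
    """
    if target_length <= 15:
        return floor
    return max(floor, 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8)


def tm_score_fixed(distances, target_length):
    """TM-score for one given superposition.

    distances: distance in angstroms between each aligned residue pair
               after superposition (unaligned residues are simply absent).
    target_length: number of residues in the target protein.
    """
    d0 = tm_d0(target_length)
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / target_length


def same_fold(score, cutoff=0.5):
    """Apply the 0.5 cutoff mentioned in the abstract."""
    return score > cutoff
```

    A perfect superposition of every target residue gives 1.0, and each pair lying exactly d0 apart contributes half of a perfect pair, which is why missing a better superposition can push a score across the 0.5 cutoff.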

    Docking by structural similarity at protein-protein interfaces

    Rapid accumulation of experimental data on protein-protein complexes drives the paradigm shift in protein docking from ‘traditional,’ template-free approaches to template-based techniques. Homology docking algorithms based on sequence similarity between target and template complexes can account for up to 20% of known protein-protein interactions. When highly homologous templates for the target complex are not available, but the structure of the target monomers is known, docking by local structural alignment may provide an adequate solution. Such an algorithm was developed based on the structural comparison of monomers to co-crystallized interfaces. A library of the interfaces was generated from co-crystallized protein-protein complexes in PDB. The partial structure alignment algorithm was validated on the Dockground benchmark sets. The optimal performance of the partial (interface) structure alignment was achieved with the interface residues defined by a 12 Å distance across the interface. Overall, the partial structural alignment yielded more accurate models than the full structure alignment. Most templates identified by the partial structural alignment had low sequence identity to the target, which makes them hard to detect by sequence-based methods. The results indicate that the structure alignment techniques provide a much-needed addition to the docking arsenal, with the combined structural alignment and template-free docking success rate significantly surpassing that of the free docking alone.
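
    The abstract does not state which atoms the 12 Å cutoff is measured between. As a rough sketch, assuming one representative point per residue (for example the Cα atom) and a hypothetical `interface_residues` helper, selecting the interface of a two-chain complex could look like this:

```python
import math


def interface_residues(chain_a, chain_b, cutoff=12.0):
    """Return the residues of each chain lying within `cutoff` of the partner.

    chain_a, chain_b: dict mapping a residue id to an (x, y, z) tuple in
    angstroms. Using a single point per residue is an assumption of this
    sketch, not something stated in the abstract.
    """
    iface_a, iface_b = set(), set()
    for res_a, pos_a in chain_a.items():
        for res_b, pos_b in chain_b.items():
            if math.dist(pos_a, pos_b) <= cutoff:
                iface_a.add(res_a)
                iface_b.add(res_b)
    return iface_a, iface_b


# Toy coordinates: only A2 and B1 face each other across the interface.
chain_a = {"A1": (0.0, 0.0, 0.0), "A2": (10.0, 0.0, 0.0)}
chain_b = {"B1": (20.0, 0.0, 0.0), "B2": (40.0, 0.0, 0.0)}
print(interface_residues(chain_a, chain_b))
```

    The residues selected this way would form the partial structure that is aligned against the interface library, instead of the full monomer.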

    MODELING PROTEIN INTERACTIONS THROUGH STRUCTURE ALIGNMENT

    Rapid accumulation of the experimental data on protein-protein complexes drives the paradigm shift in protein docking from "traditional" template-free approaches to template-based techniques. Homology docking algorithms based on sequence similarity between target and template complexes can account for ~20% of known protein-protein interactions. When homologous templates for the target complex are not available, but the structure of the target monomers is known, docking through structural alignment may provide an adequate solution. Such an algorithm was developed based on the structural comparison of monomers to co-crystallized interfaces. A library of the interfaces was generated from the biological units. The success of the structure alignment of the interfaces depends on the way the interface is defined in terms of its structural content. We performed a systematic large-scale study to find the optimal definition/size of the interface for the structure alignment-based docking applications. The performance was the best when the interface was defined with a distance cutoff of 12 Å. The structure alignment protocol was validated, for both full and partial alignment, on the DOCKGROUND benchmark sets. Both protocols performed equally for higher-accuracy models (i-RMSD ≤ 5 Å). Overall, the partial structure alignment yielded more acceptable models than the full structure alignment (86 acceptable models were provided by partial structure alignment only, compared to 31 by full structure alignment only). Most templates identified by the partial structure alignment had very low sequence identity to targets and such templates were hard to detect by sequence-based methods. Detailed analysis of the models obtained for 372 test cases concluded that templates for higher-accuracy models often shared not only local but also global structural similarity with the targets. However, interface similarity even in these cases was more prominent, reflected in more accurate models yielded by partial structure alignment. Conservation of protein-protein interfaces was observed in very diverse proteins. For example, target complexes shared interface structural similarity not only with hetero- and homo-complexes but also, in a few cases, with crystal packing interfaces. The results indicate that the structure alignment techniques provide a much-needed addition to the docking arsenal, with the combined structure alignment and template-free docking success rate significantly surpassing that of the free docking alone.

    Algorithm engineering for optimal alignment of protein structure distance matrices

    Protein structural alignment is an important problem in computational biology. In this paper, we present first successes on provably optimal pairwise alignment of protein inter-residue distance matrices, using the popular Dali scoring function. We introduce the structural alignment problem formally, which enables us to express a variety of scoring functions used in previous work as special cases in a unified framework. Further, we propose the first mathematical model for computing optimal structural alignments based on dense inter-residue distance matrices. We therefore reformulate the problem as a special graph problem and give a tight integer linear programming model. We then present algorithm engineering techniques to handle the huge integer linear programs of real-life distance matrix alignment problems. Applying these techniques, we can compute provably optimal Dali alignments for the very first time.
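
    The abstract names the Dali scoring function without writing it out. The sketch below evaluates the commonly quoted "elastic" form of that score for one given alignment; it shows the objective being maximized, not the paper's integer linear programming model. The constants (threshold 0.2, envelope radius 20 Å) are recalled from the usual description of Dali and should be checked against the original method; the function name is hypothetical.

```python
import math

THETA = 0.2   # similarity threshold (assumed standard Dali value)
ALPHA = 20.0  # envelope radius in angstroms (assumed standard Dali value)


def dali_score(dist_a, dist_b, pairs):
    """Elastic Dali-style score of an alignment between proteins A and B.

    dist_a, dist_b: square inter-residue distance matrices (lists of lists).
    pairs: list of (index_in_a, index_in_b) tuples, the aligned residues.
    """
    score = 0.0
    for i, (ia, ib) in enumerate(pairs):
        for j, (ja, jb) in enumerate(pairs):
            if i == j:
                score += THETA  # diagonal term: a residue paired with itself
                continue
            da, db = dist_a[ia][ja], dist_b[ib][jb]
            mean = 0.5 * (da + db)
            if mean == 0.0:
                continue  # degenerate input; skipped in this sketch
            # Relative deviation of the two distances, damped for far pairs.
            score += (THETA - abs(da - db) / mean) * math.exp(-(mean / ALPHA) ** 2)
    return score
```

    Because every term depends on a pair of aligned residue pairs, the objective is quadratic in the alignment variables, which is what makes finding a provably optimal alignment hard.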

    Multiple structure alignment and consensus identification for proteins

    Background: An algorithm is presented to compute a multiple structure alignment for a set of proteins and to generate a consensus (pseudo) protein which captures common substructures present in the given proteins. The algorithm represents each protein as a sequence of triples of coordinates of the alpha-carbon atoms along the backbone. It then computes iteratively a sequence of transformation matrices (i.e., translations and rotations) to align the proteins in space and generate the consensus. The algorithm is a heuristic in that it computes an approximation to the optimal alignment that minimizes the sum of the pairwise distances between the consensus and the transformed proteins. Results: Experimental results show that the algorithm converges quite rapidly and generates consensus structures that are visually similar to the input proteins. A comparison with other coordinate-based alignment algorithms (MAMMOTH and MATT) shows that the proposed algorithm is competitive in terms of speed and the sizes of the conserved regions discovered in an extensive benchmark dataset derived from the HOMSTRAD and SABmark databases. The algorithm has been implemented in C++ and can be downloaded from the project's web page. Alternatively, the algorithm can be used via a web server which makes it possible to align protein structures by uploading files from local disk or by downloading protein data from the RCSB Protein Data Bank. Conclusions: An algorithm is presented to compute a multiple structure alignment for a set of proteins, together with their consensus structure. Experimental results show its effectiveness in terms of the quality of the alignment and computational cost.
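
    The objective described above can be illustrated with a simplified sketch. It assumes the proteins have already been transformed into a common frame, have equal length, and correspond position by position, none of which the actual algorithm requires. The per-position mean is used as a stand-in consensus; note that the mean minimizes squared distances, not the plain sum of distances, so this is an approximation. The function names are hypothetical.

```python
import math


def consensus(proteins):
    """Per-position mean of the alpha-carbon coordinates.

    proteins: list of proteins, each a list of (x, y, z) tuples of equal
    length, already superposed (an assumption of this sketch).
    """
    n = len(proteins)
    length = len(proteins[0])
    return [
        tuple(sum(p[i][k] for p in proteins) / n for k in range(3))
        for i in range(length)
    ]


def objective(proteins, cons):
    """Sum of distances between the consensus and every transformed protein."""
    return sum(math.dist(atom, c) for p in proteins for atom, c in zip(p, cons))


# Two toy "proteins" of two residues each, offset by 2 angstroms in y.
p1 = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
p2 = [(0.0, 2.0, 0.0), (2.0, 2.0, 0.0)]
c = consensus([p1, p2])
print(c, objective([p1, p2], c))
```

    An iterative scheme in the spirit of the abstract would alternate between re-fitting each protein onto the current consensus and recomputing the consensus until this objective stops improving.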