Shouji: A Fast and Efficient Pre-Alignment Filter for Sequence Alignment
Motivation: The ability to generate massive amounts of sequencing data
continues to overwhelm the processing capability of existing algorithms and
compute infrastructures. In this work, we explore the use of hardware/software
co-design and hardware acceleration to significantly reduce the execution time
of short sequence alignment, a crucial step in analyzing sequenced genomes. We
introduce Shouji, a highly-parallel and accurate pre-alignment filter that
remarkably reduces the need for computationally-costly dynamic programming
algorithms. The first key idea of our proposed pre-alignment filter is to
provide high filtering accuracy by correctly detecting all common subsequences
shared between two given sequences. The second key idea is to design a hardware
accelerator that adopts modern FPGA (Field-Programmable Gate Array)
architectures to further boost the performance of our algorithm.
Results: Shouji significantly improves the accuracy of pre-alignment
filtering by up to two orders of magnitude compared to the state-of-the-art
pre-alignment filters, GateKeeper and SHD. Our FPGA-based accelerator is up to
three orders of magnitude faster than the equivalent CPU implementation of
Shouji. Using a single FPGA chip, we benchmark the benefits of integrating
Shouji with five state-of-the-art sequence aligners, designed for different
computing platforms. The addition of Shouji as a pre-alignment step reduces the
execution time of the five state-of-the-art sequence aligners by up to 18.8x.
Shouji can be adapted for any bioinformatics pipeline that performs sequence
alignment for verification. Unlike most existing methods that aim to accelerate
sequence alignment, Shouji does not sacrifice any of the aligner capabilities,
as it does not modify or replace the alignment step.
Availability: https://github.com/CMU-SAFARI/Shouji
Comment: https://academic.oup.com/bioinformatics/advance-article-abstract/doi/10.1093/bioinformatics/btz234/5421509,
Bioinformatics Journal 201
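To illustrate the general idea of a pre-alignment filter (cheaply rejecting sequence pairs that cannot be within an edit threshold, so the dynamic programming aligner only sees plausible pairs), here is a minimal sketch. It is not the Shouji algorithm, which according to the abstract works by detecting common subsequences; it is a much simpler character-count filter swapped in for illustration. The function names and the `max_edits` parameter are hypothetical.

```python
from collections import Counter


def count_lower_bound(read, reference):
    """Lower bound on the edit distance from character counts alone.

    A substitution changes the L1 distance between the two count vectors
    by at most 2, and an insertion or deletion changes it by 1, so the
    edit distance is at least ceil(L1 / 2).
    """
    read_counts = Counter(read)
    ref_counts = Counter(reference)
    symbols = set(read_counts) | set(ref_counts)
    l1 = sum(abs(read_counts[s] - ref_counts[s]) for s in symbols)
    return (l1 + 1) // 2


def passes_filter(read, reference, max_edits):
    """Return False only when the pair provably exceeds max_edits.

    Pairs that pass still need verification by a real aligner; this
    filter can accept dissimilar pairs but never rejects a valid one.
    """
    return count_lower_bound(read, reference) <= max_edits
```

For example, `passes_filter("AAAA", "TTTT", 2)` rejects the pair, while `passes_filter("ACGT", "TGCA", 0)` accepts it even though the sequences differ, which is the kind of false accept that more accurate filters aim to reduce.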
Application of protein structure alignments to iterated hidden Markov model protocols for structure prediction.
Background: One of the most powerful methods for the prediction of protein structure from sequence information alone is the iterative construction of profile-type models. Because profiles are built from sequence alignments, the sequences included in the alignment and the method used to align them will be important to the sensitivity of the resulting profile. The inclusion of highly diverse sequences will presumably produce a more powerful profile, but distantly related sequences can be difficult to align accurately using only sequence information. Therefore, it would be expected that the use of protein structure alignments to improve the selection and alignment of diverse sequence homologs might yield improved profiles. However, the actual utility of such an approach has remained unclear.
Results: We explored several iterative protocols for the generation of profile hidden Markov models. These protocols were tailored to allow the inclusion of protein structure alignments in the process, and were used for large-scale creation and benchmarking of structure alignment-enhanced models. We found that models using structure alignments did not provide an overall improvement over sequence-only models for superfamily-level structure predictions. However, the results also revealed that the structure alignment-enhanced models were complementary to the sequence-only models, particularly at the edge of the "twilight zone". When the two sets of models were combined, they provided improved results over sequence-only models alone. In addition, we found that the beneficial effects of the structure alignment-enhanced models could not be realized if the structure-based alignments were replaced with sequence-based alignments. Our experiments with different iterative protocols for sequence-only models also suggested that simple protocol modifications were unable to yield equivalent improvements to those provided by the structure alignment-enhanced models. Finally, we found that models using structure alignments provided fold-level structure assignments that were superior to those produced by sequence-only models.
Conclusion: When attempting to predict the structure of remote homologs, we advocate a combined approach in which both traditional models and models incorporating structure alignments are used.
Higher accuracy protein Multiple Sequence Alignment by Stochastic Algorithm
Multiple Sequence Alignment gives insight into evolutionary, structural and functional relationships among proteins. Here, a novel Protein Alignment by Stochastic Algorithm (PASA) is developed. The evolutionary operators of a genetic algorithm, namely mutation and selection, are used to combine the output of two of the most important sequence alignment programs into an optimized new algorithm. The efficiency of protein alignments is evaluated in terms of the Total Column score, which is equal to the number of correctly aligned columns between a test alignment and the reference alignment divided by the total number of columns in the reference alignment. The PASA optimizer achieves, on average, significantly better alignments than the well-known individual bioinformatics tools. PASA is statistically the most accurate protein alignment method today. It has potential applications in drug discovery processes in the biotechnology industry.
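The Total Column score defined in this abstract can be sketched as follows. This is an illustrative reading, not the PASA authors' code: it assumes alignments are given as equal-length gapped strings with `-` as the gap character, and that a column counts as "correctly aligned" when the same residues (identified by their position in each unaligned sequence) appear together in a reference column. The function names are hypothetical.

```python
def alignment_columns(rows, gap="-"):
    """Convert gapped rows into columns of residue indices (None for a gap)."""
    counters = [0] * len(rows)  # next residue index for each sequence
    columns = []
    for j in range(len(rows[0])):
        column = []
        for i, row in enumerate(rows):
            if row[j] == gap:
                column.append(None)
            else:
                column.append(counters[i])
                counters[i] += 1
        columns.append(tuple(column))
    return columns


def total_column_score(test_rows, reference_rows):
    """Correctly aligned columns divided by the number of reference columns."""
    reference = alignment_columns(reference_rows)
    if not reference:
        raise ValueError("reference alignment has no columns")
    test = set(alignment_columns(test_rows))
    correct = sum(1 for column in reference if column in test)
    return correct / len(reference)


reference = ["AC-GT", "ACTGT"]
test = ["ACG-T", "ACTGT"]
print(total_column_score(test, reference))  # 3 of 5 reference columns match
```

In the example, the test alignment misplaces one gap, so two of the five reference columns are not reproduced and the score is 3/5.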
PhAST : pharmacophore alignment search tool
We developed the Pharmacophore Alignment Search Tool (PhAST), a text-based technique for rapid hit and lead structure searching in large compound databases. For each molecule, a two-dimensional graph of potential pharmacophoric points (PPPs) is created, which has the same topology as the original molecule with implicit hydrogen atoms. Each vertex is coloured by a symbol representing the corresponding PPP. The vertices of the graph are canonically labelled. The symbols associated with the vertices are combined into a so-called PhAST-Sequence, beginning with the vertex with the lowest canonical label. Due to the canonical labelling, the created PhAST-Sequence is characteristic for each molecule. For similarity assessment, PhAST-Sequences are compared using the sequence identity in their global pairwise alignment. The alignment score lies between 0 (no similarity) and 1 (identical PhAST-Sequences). In order to use global pairwise sequence alignment, a score matrix for pharmacophoric symbols was developed and gap penalties were optimized. PhAST performed comparably, and sometimes superior, to other similarity search tools (CATS2D, MOE pharmacophore quadruples) in retrospective virtual screenings using the COBRA collection of drugs and lead structures. Most importantly, the PhAST alignment technique allows for the computation of significance estimates that help prioritize a virtual hit list.
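The similarity measure described here (sequence identity in a global pairwise alignment, ranging from 0 to 1) can be sketched with a plain Needleman-Wunsch alignment. This is only an illustration under stated assumptions: the abstract's pharmacophoric score matrix and optimized gap penalties are not given, so the sketch uses hypothetical values (match +1, mismatch -1, gap -1), assumes `-` is not a sequence symbol, and assumes identity is the number of matching columns divided by the alignment length.

```python
def global_alignment(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch global alignment; returns the two gapped strings."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one optimal alignment.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            top.append(a[i - 1]); bottom.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1]); j -= 1
    return "".join(reversed(top)), "".join(reversed(bottom))


def sequence_identity(a, b):
    """Matching columns divided by alignment length, in the range [0, 1]."""
    top, bottom = global_alignment(a, b)
    if not top:
        return 1.0  # two empty sequences are treated as identical
    matches = sum(1 for x, y in zip(top, bottom) if x == y)
    return matches / len(top)
```

With these assumed scores, `sequence_identity("ABCD", "ABD")` aligns `ABCD` against `AB-D` and gives 0.75, while identical sequences give 1.0.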
