17 research outputs found

    High-throughput Protein Sequence Alignment on Multi-core Systems

    Rapid evolution in sequencing technologies generates data on an enormous scale. A central challenge in analyzing data at this scale is the alignment of DNA/protein sequences, whereby reads are compared to reference sequences. To find similar sequences, alignment algorithms align a query sequence against a database. They can be used to classify the source of a sequence, to discover similarities among organisms, or to infer ancestral relationships. A wide range of alignment algorithms has been developed in recent years. In this paper, an accurate method of accelerating such algorithms using GPUs is investigated. The Swiss-Prot database is processed using a GPU implementation of the Smith-Waterman sequence alignment algorithm. The first step generates the alignment scores but not the actual alignments. Available alignment tools such as ssearch2 are then used to align the output file generated during the first step. The GPU-accelerated implementation is then evaluated against other techniques for throughput improvement. The Swiss-Prot database was also aligned using various alignment tools. An NVIDIA Tesla K40 GPU is used to generate the results for this research. The implementation achieves 44.3 giga cell updates per second (GCUPS), which is 22.9 times better than its implementation on a GTX 275. Performance improves because the workload of equal-length sequences is distributed equally among all threads on the GPU's multiprocessors.
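
The score-only first step described in this abstract can be illustrated with a small CPU sketch. It is not the authors' GPU kernel: the function name `sw_score`, the linear gap penalty, and the simple match/mismatch scores are assumptions made for illustration (protein searches normally use a substitution matrix and affine gaps). It keeps only two rows of the matrix, which is enough when no traceback is needed.

```python
def sw_score(query: str, target: str,
             match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Best Smith-Waterman local alignment score, without traceback.

    Scoring parameters are illustrative assumptions, not taken from the paper.
    """
    prev = [0] * (len(target) + 1)  # previous matrix row
    best = 0
    for q in query:
        curr = [0]  # first column is always 0
        for j, t in enumerate(target, start=1):
            diag = prev[j - 1] + (match if q == t else mismatch)
            up = prev[j] + gap        # gap in the target
            left = curr[j - 1] + gap  # gap in the query
            cell = max(0, diag, up, left)  # local alignment never goes below 0
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best
```

Because only the maximum cell value is returned, memory use is linear in the target length, which is what makes a separate, later alignment pass necessary for the hits of interest.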

    160-fold acceleration of the Smith-Waterman algorithm using a field programmable gate array (FPGA)

    Background: To infer homology and subsequently gene function, the Smith-Waterman (SW) algorithm is used to find the optimal local alignment between two sequences. When searching sequence databases that may contain hundreds of millions of sequences, this algorithm becomes computationally expensive. Results: In this paper, we focused on accelerating the Smith-Waterman algorithm by using FPGA-based hardware that implemented a module for computing the score of a single cell of the SW matrix. Using a grid of these modules, the entire SW matrix was computed at the speed of field propagation through the FPGA circuit. These modifications accelerated the algorithm's computation by up to 160-fold compared to a pure software implementation running on the same FPGA with an Altera Nios II soft processor. Conclusion: This FPGA-accelerated hardware design offers a promising new direction for improving the computation of genomic database searches.
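
For reference, the per-cell computation that such a module would perform can be written as the standard Smith-Waterman recurrence. The linear gap penalty $g$ and the substitution score $s(a_i, b_j)$ are assumptions here; the abstract does not state which gap model the hardware uses.

```latex
H_{i,0} = H_{0,j} = 0, \qquad
H_{i,j} = \max\bigl\{\, 0,\;
    H_{i-1,j-1} + s(a_i, b_j),\;
    H_{i-1,j} - g,\;
    H_{i,j-1} - g \,\bigr\}
```

Each cell depends only on its upper, left, and upper-left neighbours, so all cells on one anti-diagonal $i + j = k$ are mutually independent. That independence is what allows a grid of identical cell modules to evaluate the matrix in parallel.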

    Avoiding Ambiguity and Assessing Uniqueness in Minisatellite Alignment

    Several algorithms have been suggested for minisatellite alignment. Their time complexity is high -- close to O(n^3) -- due to the necessary reconstruction of duplication histories. We investigate the uniqueness of optimal alignments computed under the common single-copy duplication model. To this end, it is necessary to avoid ambiguity in the algorithm employed. We re-code the ARLEM algorithm in the form of a grammar, and apply a disambiguation technique which uses a mapping to a canonical representation of minisatellite alignments. Having arrived at a non-ambiguous algorithm this way, we demonstrate that the underlying model -- independent of the algorithm -- gives rise to an exorbitant number of different, co-optimal alignments when applied to real-world data. We conclude that alignment-free methods should be considered for minisatellite comparison.

    Online Abelian Pattern Matching

    Ejaz T, Rahmann S, Stoye J. Online Abelian Pattern Matching. Forschungsberichte der Technischen Fakultät, Abteilung Informationstechnik / Universität Bielefeld. Bielefeld: Technische Fakultät der Universität Bielefeld; 2008. An abelian pattern describes the set of strings that consist of the same combination of characters. Given an abelian pattern P and a text T ∈ Σ^n, the task is to find all occurrences of P in T, i.e. all substrings S = T_i...T_j such that the frequency of each character in S matches the specified frequency of that character in P. In this report we present simple online algorithms for abelian pattern matching, and give a lower bound for online algorithms which is Ω(n).
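
The matching task defined above can be sketched with a sliding window over character counts. This is a common textbook approach and is not necessarily one of the algorithms from the report; the function name `abelian_occurrences` and the choice to give the pattern as a string (whose character counts define the abelian pattern) are assumptions for illustration.

```python
from collections import Counter


def abelian_occurrences(pattern: str, text: str) -> list[int]:
    """Start indices of all windows of text with the same character counts as pattern."""
    m = len(pattern)
    if m == 0 or m > len(text):
        return []

    # diff[c] = (count required by the pattern) - (count in the current window)
    diff = Counter(pattern)
    nonzero = len(diff)  # number of characters whose counts currently disagree

    def bump(c: str, delta: int) -> None:
        nonlocal nonzero
        before = diff[c]
        diff[c] = before + delta
        if before == 0:
            nonzero += 1
        if diff[c] == 0:
            nonzero -= 1

    hits = []
    for i, c in enumerate(text):     # the text is read one character at a time
        bump(c, -1)                  # c enters the window
        if i >= m:
            bump(text[i - m], +1)    # the oldest character leaves the window
        if i >= m - 1 and nonzero == 0:
            hits.append(i - m + 1)
    return hits
```

Each text character triggers a constant number of counter updates, so the scan is linear in n, consistent with the Ω(n) lower bound quoted in the abstract.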

    DOPA: GPU-based protein alignment using database and memory access optimizations

    Background: The Smith-Waterman (S-W) algorithm is an optimal sequence alignment method for biological databases, but its computational complexity makes it too slow for practical purposes. Heuristic approximate methods such as FASTA and BLAST provide faster solutions, but at the cost of reduced accuracy. In addition, the expanding volume and varying lengths of sequences call for a performance-efficient restructuring of these databases. An accurate and fast solution therefore requires speeding up the S-W algorithm. Findings: This paper presents a high-performance protein sequence alignment implementation for Graphics Processing Units (GPUs). The new implementation improves performance by optimizing the database organization and reducing the number of memory accesses to eliminate bandwidth bottlenecks. The implementation is called Database Optimized Protein Alignment (DOPA) and achieves a performance of 21.4 Giga Cell Updates Per Second (GCUPS), which is 1.13 times better than the fastest GPU implementation to date. Conclusions: In the new GPU-based implementation for protein sequence alignment (DOPA), the database is organized in equal-length sequence sets. This distributes the workload equally among all the threads on the GPU's multiprocessors. The result is a performance better than that of the fastest available GPU implementation.
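
The effect of organizing a database into sets of similar length can be illustrated with a small host-side sketch. It is not DOPA's actual database format: the helper names `group_by_length` and `idle_cells`, the sort-then-chunk strategy, and the group size are all assumptions. The idea is that threads working in lockstep on one group must wait for the longest sequence in that group, so mixed lengths waste work.

```python
def group_by_length(seqs: list[str], group_size: int) -> list[list[str]]:
    """Sort sequences by length, then cut them into consecutive groups.

    group_size is a hypothetical stand-in for the number of threads that
    process one group together.
    """
    ordered = sorted(seqs, key=len)
    return [ordered[i:i + group_size] for i in range(0, len(ordered), group_size)]


def idle_cells(groups: list[list[str]]) -> int:
    """Residues' worth of idle time if every thread waits for the longest
    sequence in its group."""
    total = 0
    for group in groups:
        longest = max(len(s) for s in group)
        total += sum(longest - len(s) for s in group)
    return total
```

For example, four sequences of lengths 1, 9, 2, and 8 processed in input order in groups of two leave 14 residues' worth of idle time, whereas grouping after sorting leaves only 2.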

    Structural analysis of aligned RNAs

    Knowledge about classes of non-coding RNAs (ncRNAs) is growing very fast, and it is mainly the structure that is the common characteristic shared by members of the same class. To characterize such classes correctly, it is therefore important to analyse their structural features in detail. In this manuscript I present RNAlishapes, which combines various secondary structure analysis methods, such as suboptimal folding and shape abstraction, with a comparative approach known as RNA alignment folding. RNAlishapes makes use of an extended thermodynamic model and covariance scoring, which makes it possible to reward covariation of paired bases. Applied to a set of bacterial trp-operon leaders using shape abstraction, the algorithm was able to identify the two alternating conformations of this attenuator. Besides providing in-depth analysis methods for aligned RNAs, the tool also shows fairly good prediction accuracy. RNAlishapes therefore provides the community with a powerful tool for structural analysis of classes of RNAs and is also a reasonable method for consensus structure prediction based on sequence alignments. RNAlishapes is available for online use and download at

    Design, implementation and evaluation of a practical pseudoknot folding algorithm based on thermodynamics

    BACKGROUND: The general problem of RNA secondary structure prediction under the widely used thermodynamic model is known to be NP-complete when the structures considered include arbitrary pseudoknots. For restricted classes of pseudoknots, several polynomial time algorithms have been designed, where the O(n^6) time and O(n^4) space algorithm by Rivas and Eddy is currently the best available program. RESULTS: We introduce the class of canonical simple recursive pseudoknots and present an algorithm that requires O(n^4) time and O(n^2) space to predict the energetically optimal structure of an RNA sequence, possibly containing such pseudoknots. Evaluation against a large collection of known pseudoknotted structures shows the adequacy of the canonization approach and our algorithm. CONCLUSIONS: RNA pseudoknots of medium size can now be predicted reliably as well as efficiently by the new algorithm.

    A Combinatorial Framework for Designing (Pseudoknotted) RNA Algorithms

    We extend a hypergraph representation, introduced by Finkelstein and Roytberg, to unify dynamic programming algorithms in the context of RNA folding with pseudoknots. Classic applications of RNA dynamic programming (energy minimization, partition function, base-pair probabilities...) are reformulated within this framework, giving rise to very simple algorithms. This reformulation allows one to conceptually detach the conformation space/energy model -- captured by the hypergraph model -- from the specific application, assuming unambiguity of the decomposition. To ensure the latter property, we propose a new combinatorial methodology based on generating functions. We extend the set of generic applications by proposing an exact algorithm for extracting generalized moments in weighted distribution, generalizing a prior contribution by Miklos et al. Finally, we illustrate our full-fledged programme on three exemplary conformation spaces (secondary structures, Akutsu's simple type pseudoknots and kissing hairpins). This readily gives sets of algorithms that are either novel or have complexity comparable to classic implementations for minimization and Boltzmann ensemble applications of dynamic programming.

    An energy‐aware performance analysis of SWIMM: Smith–Waterman implementation on Intel's Multicore and Manycore architectures

    Alignment is essential in many areas such as biological, chemical and criminal forensics. The well‐known Smith–Waterman (SW) algorithm retrieves the optimal local alignment with quadratic time and space complexity. Several implementations take advantage of parallel hardware, such as manycores, FPGAs or GPUs, to reduce the alignment effort. In this research, we adapt, develop and tune the SW implementation named SWIMM on a heterogeneous platform based on Intel's Xeon and Xeon Phi coprocessor. SWIMM is a free tool available in a public git repository https://github.com/enzorucci/SWIMM. We efficiently exploit data and thread‐level parallelism, reaching up to 380 GCUPS on the heterogeneous architecture, 350 GCUPS for the isolated Xeon and 50 GCUPS on the Xeon Phi. Although the heterogeneous implementation obtains the best performance, it is also the most energy‐demanding. We also present a trade‐off analysis between performance and power consumption. The greenest configuration is based on an isolated multicore system that exploits the AVX2 instruction set architecture, reaching 1.5 GCUPS/Watt.
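
The GCUPS and GCUPS/Watt figures quoted in these abstracts follow from a simple ratio: the number of matrix cells computed (query length times total database residues) divided by the run time. The sketch below shows that arithmetic; the function names and the sample numbers in the usage note are hypothetical and are not measurements from any of the listed papers.

```python
def gcups(query_len: int, db_residues: int, seconds: float) -> float:
    """Giga cell updates per second for one query against a database."""
    cells = query_len * db_residues  # one matrix cell per residue pair
    return cells / (seconds * 1e9)


def gcups_per_watt(gcups_value: float, avg_power_watts: float) -> float:
    """Energy efficiency: throughput divided by average power draw."""
    return gcups_value / avg_power_watts
```

With made-up inputs, a 1,000-residue query searched against 2 billion database residues in 10 seconds corresponds to 200 GCUPS, and at an average draw of 100 W that is 2 GCUPS/Watt.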