MaxSSmap: A GPU program for mapping divergent short reads to genomes with the maximum scoring subsequence

Author: Roshan Usman
Turki Turki
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2014

Programs based on hash tables and Burrows-Wheeler are very fast for mapping short reads to genomes but have low accuracy in the presence of mismatches and gaps. Such reads can be aligned accurately with the Smith-Waterman algorithm, but it can take hours or days to map millions of reads, even for bacterial genomes. We introduce a GPU program called MaxSSmap with the aim of achieving accuracy comparable to Smith-Waterman but with faster runtimes. Like most programs, MaxSSmap first identifies a local region of the genome and then performs exact alignment. Instead of using hash tables or Burrows-Wheeler in the first step, MaxSSmap calculates the maximum scoring subsequence score between the read and disjoint fragments of the genome in parallel on a GPU and selects the highest-scoring fragment for exact alignment. We evaluate MaxSSmap's accuracy and runtime when mapping simulated Illumina E. coli and human chromosome one reads of different lengths, with 10% to 30% mismatches with gaps, to the E. coli genome and human chromosome one. We also demonstrate applications on real data by mapping ancient horse DNA reads to modern genomes and by mapping unmapped paired reads from NA12878 in the 1000 Genomes Project. We show that MaxSSmap attains high accuracy and low error comparable to fast Smith-Waterman programs, yet has much lower runtimes. We show that MaxSSmap can map reads rejected by BWA and NextGenMap with high accuracy and low error, much faster than if Smith-Waterman were used. On short reads of length 36 and 51, both MaxSSmap and Smith-Waterman have lower accuracy than on longer reads. On real data, MaxSSmap produces many alignments with high score and mapping quality that are not given by NextGenMap and BWA. The MaxSSmap source code is freely available from http://www.cs.njit.edu/usman/MaxSSmap
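The fragment-selection step described in the abstract can be sketched in Python. This is a simplified, sequential illustration under stated assumptions, not MaxSSmap's GPU code: the +1/-1 match/mismatch scores, the ungapped sliding of the read across each fragment, and the helper names (`max_scoring_subsequence`, `fragment_score`, `best_fragment`) are all hypothetical. The maximum scoring subsequence itself is computed with Kadane's linear-time algorithm.

```python
from typing import List, Tuple


def max_scoring_subsequence(scores: List[int]) -> int:
    """Kadane's algorithm: largest sum of any contiguous run (0 if none is positive)."""
    best = current = 0
    for s in scores:
        current = max(0, current + s)  # restart the run when it turns negative
        best = max(best, current)
    return best


def fragment_score(read: str, fragment: str,
                   match: int = 1, mismatch: int = -1) -> int:
    """Best maximum-scoring-subsequence score over all ungapped placements
    of the read inside the fragment (the scoring scheme is an assumption)."""
    best = 0
    for offset in range(len(fragment) - len(read) + 1):
        window = fragment[offset:offset + len(read)]
        per_base = [match if r == g else mismatch for r, g in zip(read, window)]
        best = max(best, max_scoring_subsequence(per_base))
    return best


def best_fragment(read: str, genome: str, frag_len: int) -> Tuple[int, int]:
    """Score disjoint genome fragments and return (start, score) of the best.
    Sequential here; each fragment is independent, so it maps well to a GPU."""
    best_start, best_score = -1, -1
    for start in range(0, len(genome), frag_len):
        score = fragment_score(read, genome[start:start + frag_len])
        if score > best_score:
            best_start, best_score = start, score
    return best_start, best_score


if __name__ == "__main__":
    genome = "TTTTTTTT" + "ACGTACGT" + "GGGGGGGG"
    print(best_fragment("ACGAACGT", genome, 8))  # (8, 6): one mismatch inside the read
```

Because every fragment is scored independently, this step is easy to parallelize, for example by giving each fragment its own GPU thread. With strictly disjoint fragments, a read straddling a fragment boundary is scored only partially; a practical version would likely let fragments overlap by the read length. Reverse-complement reads are not handled in this sketch.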

arXiv.org e-Print Archive

Crossref

Springer - Publisher Connector

Parallel Smith-Waterman Algorithm for Gene Sequencing

Author: Deepa. B. C., Nagaveni. V
Publication venue: 'Auricle Technologies, Pvt., Ltd.'
Publication date: 31/05/2015

The Smith-Waterman algorithm is the basis for developing highly robust and efficient parallel computing systems for biological gene sequences. This work provides an in-depth review of existing approaches to gene sequencing and alignment using Smith-Waterman, together with their strengths and weaknesses. The Smith-Waterman algorithm computes the local alignment of two given sequences and is used to identify similar RNA, DNA, and protein segments. To identify the best local alignments of biological gene pairs, the Smith-Waterman algorithm uses dynamic programming. It is guaranteed to find the optimal local alignment under the given scoring system. DOI: 10.17762/ijritcc2321-8169.150515
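The dynamic-programming recurrence referred to above can be sketched as a minimal textbook Smith-Waterman with a linear gap penalty. The default scores (match 2, mismatch -1, gap -2) and the function name are illustrative assumptions, not values from the paper.

```python
from typing import List, Tuple


def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1,
                   gap: int = -2) -> Tuple[int, str, str]:
    """Local alignment with a linear gap penalty; returns (score, aligned_a, aligned_b)."""
    rows, cols = len(a) + 1, len(b) + 1
    H: List[List[int]] = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if a[i - 1] == b[j - 1] else mismatch
            # Clamping at 0 is what makes the alignment local.
            H[i][j] = max(0, H[i - 1][j - 1] + s, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the best cell until a zero cell is reached.
    i, j = best_pos
    out_a: List[str] = []
    out_b: List[str] = []
    while i > 0 and j > 0 and H[i][j] > 0:
        s = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + s:      # came from the diagonal
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:      # gap in b
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:                                   # gap in a
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    print(smith_waterman("TGTTACGG", "GGTTGACTA", match=3, mismatch=-3, gap=-2))
    # (13, 'GTT-AC', 'GTTGAC')
```

Each cell depends on its left, upper, and upper-left neighbours, which is the serial dependency that parallel implementations work around, typically by computing anti-diagonals or independent sequence pairs concurrently.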

International Journal on Recent and Innovation Trends in Computing and Communication

Revisiting the Speed-versus-Sensitivity Tradeoff in Pairwise Sequence Search

Author: Aji Ashwin M.
Feng Wu-chun
Publication date: 01/01/2008

The Smith-Waterman algorithm is a dynamic programming method for determining optimal local alignments between nucleotide or protein sequences. However, it suffers from quadratic time and space complexity. As a result, many algorithmic and architectural enhancements have been proposed to solve this problem, but at the cost of reduced sensitivity in the algorithms or significant expense in hardware, respectively. Hence, there exists a need to evaluate the tradeoffs between the different solutions. This motivation, coupled with the lack of an evaluation metric to quantify these tradeoffs, leads us to formally define and quantify the sensitivity of homology search methods so that tradeoffs between sequence-search solutions can be evaluated in a quantitative manner. As an example, though the BLAST algorithm executes significantly faster than Smith-Waterman, we find that BLAST misses 80% of the significant sequence alignments. This paper then presents a highly efficient parallelization of the Smith-Waterman algorithm on the Cell Broadband Engine, a novel hybrid multicore architecture that drives the PlayStation 3 (PS3) game consoles, and emulates BLAST by repeatedly executing the parallelized Smith-Waterman algorithm to search for a query in a given sequence database. Through an innovative mapping of the optimal Smith-Waterman algorithm onto a cluster of PlayStation 3 nodes, our implementation delivers a 10-fold speed-up over a high-end multicore architecture and an 88-fold speed-up over a non-accelerated PS3. Finally, we compare the performance of our implementation of the Smith-Waterman algorithm to that of BLAST and the canonical Smith-Waterman implementation, based on a combination of three factors: execution time (speed), sensitivity, and the actual cost of deploying each solution. In the end, our parallelized Smith-Waterman algorithm approaches the speed of BLAST while maintaining ideal sensitivity and achieving low cost through the use of PlayStation 3 game consoles.
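Emulating BLAST by repeatedly running Smith-Waterman amounts to scoring the query against every database sequence and ranking the results. A minimal sketch, assuming a linear gap penalty and illustrative scores, is below. It also shows that computing the score alone needs only two DP rows, so memory becomes linear even though time stays quadratic. The function names and the toy database are hypothetical.

```python
from typing import Dict, List, Tuple


def sw_score(query: str, target: str, match: int = 2, mismatch: int = -1,
             gap: int = -2) -> int:
    """Smith-Waterman score only, keeping just two DP rows (linear memory)."""
    prev = [0] * (len(target) + 1)
    best = 0
    for q in query:
        curr = [0] * (len(target) + 1)
        for j, t in enumerate(target, start=1):
            diag = prev[j - 1] + (match if q == t else mismatch)
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def search(query: str, database: Dict[str, str],
           top_k: int = 3) -> List[Tuple[str, int]]:
    """Score the query against every sequence and return the top_k hits."""
    hits = [(name, sw_score(query, seq)) for name, seq in database.items()]
    hits.sort(key=lambda hit: hit[1], reverse=True)
    return hits[:top_k]


if __name__ == "__main__":
    db = {"exact": "TTACGTAA", "one_mismatch": "TTACCTAA", "unrelated": "GGGGGGGG"}
    print(search("ACGT", db))
    # [('exact', 8), ('one_mismatch', 5), ('unrelated', 2)]
```

Each database sequence is scored independently, so the outer loop over the database is the natural unit to distribute across cores or cluster nodes; total work remains proportional to query length times database size.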

Computer Science Technical Reports @Virginia Tech

OSWALD: OpenCL Smith–Waterman on Altera’s FPGA for Large Protein Databases

Author: Botella Juan Guillermo
De Giusti Armando Eduardo
García Sánchez Carlos
Naiouf Marcelo
Prieto-Matias Manuel
Rucci Enzo
Publication date: 08/10/2019

The well-known Smith–Waterman algorithm is a high-sensitivity method for local sequence alignment. Unfortunately, the Smith–Waterman algorithm has quadratic time complexity, which makes it computationally demanding for large protein databases. In this paper, we present OSWALD, a portable, fully functional and general implementation to accelerate Smith–Waterman database searches in heterogeneous platforms based on Altera’s FPGA. OSWALD exploits OpenMP multithreading and SIMD computing through SSE and AVX2 extensions on the host while taking advantage of pipeline and vector parallelism by way of OpenCL on the FPGAs. Performance evaluations on two different heterogeneous architectures with real amino acid datasets show that OSWALD is competitive in comparison with other top-performing Smith–Waterman implementations, attaining up to 442 GCUPS peak with the best GCUPS/watt ratio. First published June 30, 2016. Article available in: Vol. 32, Issue 3, 2018. Facultad de Informática
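The 442 GCUPS figure uses the standard throughput metric for Smith–Waterman database search: billions of dynamic-programming cell updates per second, where the number of cells is the query length times the total number of residues in the database. The helpers below are a hypothetical sketch of that arithmetic and of the GCUPS/watt ratio; the inputs in the usage lines are made up for illustration and are not results from the paper.

```python
def gcups(query_len: int, db_residues: int, seconds: float) -> float:
    """Billions of DP cell updates per second for one query against a database."""
    cells = query_len * db_residues  # one cell per (query residue, database residue) pair
    return cells / (seconds * 1e9)


def gcups_per_watt(gcups_value: float, watts: float) -> float:
    """Energy-efficiency figure: throughput divided by average power draw."""
    return gcups_value / watts


if __name__ == "__main__":
    # Hypothetical run: 1,000-residue query, 10^9 database residues, 2 s.
    throughput = gcups(1000, 1_000_000_000, 2.0)
    print(throughput)                          # 500.0
    print(gcups_per_watt(throughput, 250.0))   # 2.0
```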

Computer Based Test Using the Fisher-Yates Shuffle and Smith Waterman Algorithm

Author: Aini Nuru
Cahyani Laili
Dellia Prita
Effindi Muhamad Afif
Risnasari Medika
Publication venue: 'Knowledge E'
Publication date: 02/06/2021

Tests are used to determine a person’s level of understanding of a subject. Factors that limit the effectiveness of tests include insufficiently varied questions, questions of inadequate difficulty, subjective assessment, and the time needed to grade them. This research aimed to develop a Computer Based Test (CBT) application. The question types in this CBT are multiple choice and essay. The CBT employs question categorization, question randomization, and automatic assessment. The questions for a lecture were categorized manually according to Bloom’s Taxonomy. The questions in each category were then randomized using the Fisher-Yates Shuffle algorithm. The Smith Waterman algorithm was used to automatically assess the essay questions. The assessment steps were preprocessing, data comparison using Smith Waterman, and conversion of the percentage similarity to a test score. The results of the study showed that the CBT application was able to randomize questions using the Fisher-Yates Shuffle algorithm and to automatically assess answers using the Smith Waterman algorithm. RMSE was used to measure the accuracy of the Smith Waterman algorithm, yielding a value of 1.86. Keywords: Computer based test, assessment, Fisher-Yates Shuffle, Smith Waterman
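Per-category randomization with the Fisher-Yates shuffle can be sketched in Python as follows. The category names and question identifiers are hypothetical placeholders for Bloom's Taxonomy levels. The shuffle is the standard algorithm, in which each position, from the last down to the second, swaps with a uniformly chosen position at or before it.

```python
import random
from typing import Dict, List, Optional


def fisher_yates(items: List[str], rng: random.Random) -> List[str]:
    """Return a shuffled copy of items using the Fisher-Yates algorithm."""
    out = list(items)
    for i in range(len(out) - 1, 0, -1):
        j = rng.randint(0, i)  # uniform over 0..i inclusive
        out[i], out[j] = out[j], out[i]
    return out


def shuffle_by_category(bank: Dict[str, List[str]],
                        seed: Optional[int] = None) -> Dict[str, List[str]]:
    """Shuffle questions independently within each (hypothetical) category."""
    rng = random.Random(seed)
    return {category: fisher_yates(questions, rng)
            for category, questions in bank.items()}


if __name__ == "__main__":
    bank = {
        "remember": ["Q1", "Q2", "Q3"],
        "apply": ["Q4", "Q5"],
    }
    print(shuffle_by_category(bank, seed=42))
```

Shuffling within each category keeps every test form balanced across difficulty levels while still varying question order. Python's own `random.shuffle` performs the same kind of in-place shuffle; the explicit loop is shown only for illustration.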

Neliti

Crossref

KnE Publishing Platform

SSW Library: An SIMD Smith-Waterman C/C++ Library for Use in Genomic Applications

Author: Garrison Erik
Lee Wan-Ping
Marth Gabor T.
Zhao Mengyao
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/01/2013

Summary: The Smith Waterman (SW) algorithm, which produces the optimal pairwise alignment between two sequences, is frequently used as a key component of fast heuristic read mapping and variation detection tools, but current implementations are either designed as monolithic protein database searching tools or are embedded into other tools. To facilitate easy integration of the fast Single Instruction Multiple Data (SIMD) SW algorithm into third-party software, we wrote a C/C++ library, which extends Farrar's Striped SW (SSW) to return alignment information in addition to the optimal SW score. Availability: SSW is available both as a C/C++ software library and as a stand-alone alignment tool wrapping the library's functionality at https://github.com/mengyao/Complete-Striped-Smith-Waterman-Library. 3 pages, 2 figures
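Farrar's striped approach, which SSW extends, relies on a query profile laid out so that SIMD lane k of vector j holds query position j + k*t, where t = ceil(m/p) for query length m and p lanes. The sketch below builds that layout in plain Python to show the indexing. The scores, the padding value of 0 for positions past the end of the query, and the function name are assumptions; real implementations store these vectors in SSE registers rather than Python lists.

```python
from typing import Dict, List


def striped_profile(query: str, alphabet: str, lanes: int,
                    match: int = 2, mismatch: int = -1,
                    pad: int = 0) -> Dict[str, List[List[int]]]:
    """Build a Farrar-style striped query profile.

    For each residue r, vector j lane k holds score(r, query[j + k * seg_len]),
    where seg_len = ceil(len(query) / lanes). Positions past the end of the
    query are filled with `pad` (an assumed neutral value).
    """
    seg_len = (len(query) + lanes - 1) // lanes  # ceil division
    profile: Dict[str, List[List[int]]] = {}
    for r in alphabet:
        vectors = []
        for j in range(seg_len):
            vec = []
            for k in range(lanes):
                pos = j + k * seg_len
                if pos < len(query):
                    vec.append(match if query[pos] == r else mismatch)
                else:
                    vec.append(pad)
            vectors.append(vec)
        profile[r] = vectors
    return profile


if __name__ == "__main__":
    prof = striped_profile("ACGTA", "ACGT", lanes=4)
    print(prof["A"])  # [[2, -1, 2, 0], [-1, -1, 0, 0]]
```

With a 5-residue query and 4 lanes, t = 2, so vector 0 holds query positions 0, 2, 4 and one pad, and vector 1 holds positions 1 and 3 plus two pads. Neighbouring query positions land in different vectors, which is what lets a whole vector be updated at once without intra-vector dependencies.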

arXiv.org e-Print Archive

CiteSeerX

Public Library of Science (PLOS)

OSWALD: OpenCL Smith–Waterman on Altera’s FPGA for Large Protein Databases

Author: Botella Juan Guillermo
De Giusti Armando Eduardo
García Sánchez Carlos
Naiouf Marcelo
Prieto-Matias Manuel
Rucci Enzo
Publication date: 30/06/2016

State-of-the-art in Smith-Waterman Protein Database Search on HPC Platforms

Author: Botella Guillermo
De Giusti Armando Eduardo
García Sánchez Carlos
Naiouf Marcelo
Prieto-Matías Manuel
Rucci Enzo
Wong Ka-Chun
Publication venue: 'Springer Fachmedien Wiesbaden GmbH'
Publication date: 07/09/2020

Searching biological sequence databases is a common and repeated task in bioinformatics and molecular biology. The Smith–Waterman algorithm is the most accurate method for this kind of search. Unfortunately, this algorithm is computationally demanding, and the situation has worsened due to the exponential growth of biological data in recent years. For that reason, the scientific community has made great efforts to accelerate Smith–Waterman biological database searches on a wide variety of hardware platforms. We give a survey of the state of the art in Smith–Waterman protein database search, focusing on four hardware architectures: central processing units, graphics processing units, field programmable gate arrays and Xeon Phi coprocessors. After briefly describing each hardware platform, we analyse the temporal evolution, contributions, limitations, experimental work, and results of each implementation. Additionally, as energy efficiency is becoming more important every day, we also survey performance/power consumption works. Finally, we give our view on the future of Smith–Waterman protein searches considering the next generations of hardware architectures and their upcoming technologies. Instituto de Investigación en Informática. Universidad Complutense de Madrid

Servicio de Difusión de la Creación Intelectual