Search CORE

130 research outputs found

Distributed Algorithm for Parallel Edit Distance Computation

Author: Sadiq Muhammad Umair
Yousaf Muhammad Murtaza
Publication venue: Institute of Informatics, Slovak Academy of Sciences
Publication date: 12/01/2021
Field of study

The edit distance is the measure that quantifies the difference between two strings. It is an important concept because it has its usage in many domains such as natural language processing, spell checking, genome matching, and pattern recognition. Edit distance is also known as Levenshtein distance. Sequentially, the edit distance is computed by using dynamic programming based strategy that may not provide results in reasonable time when input strings are large. In this work, a distributed algorithm is presented for parallel edit distance computation. The proposed algorithm is both time and space efficient. It is evaluated on a hybrid setup of distributed and shared memory systems. Results suggest that the proposed algorithm achieves significant performance gain over the existing parallel approach

Computing and Informatics (E-Journal - Institute of Informatics, SAS, Bratislava)

Accelerating pairwise sequence alignment on GPUs using the Wavefront Algorithm

Author: Aguado Puig Quim
Publication venue: Universitat Politècnica de Catalunya
Publication date: 19/10/2022
Field of study

Advances in genomics and sequencing technologies demand faster and more scalable analysis methods that can process longer sequences with higher accuracy. However, classical pairwise alignment methods, based on dynamic programming (DP), impose impractical computational requirements to align long and noisy sequences like those produced by PacBio, and Nanopore technologies. The recently proposed Wavefront Alignment (WFA) algorithm paves the way for more efficient alignment tools, improving time and memory complexity over previous methods. Notwithstanding the advantages of the WFA algorithm, modern high performance computing (HPC) platforms rely on accelerator-based architectures that exploit parallel computing resources to improve over classical computing CPUs. Hence, a GPU-enabled implementation of the WFA could exploit the hardware resources of modern GPUs and further accelerate sequence alignment in current genome analysis pipelines. This thesis presents two GPU-accelerated implementations based on the WFA for fast pairwise DNA sequence alignment: eWFA-GPU and WFA-GPU. Our first proposal, eWFA-GPU, computes the exact edit-distance alignment between two short sequences (up to a few thousand bases), taking full advantage of the massive parallel capabilities of modern GPUs. We propose a succinct representation of the alignment data that successfully reduces the overall amount of memory required, allowing the exploitation of the fast on-chip memory of a GPU. Our results show that eWFA-GPU outperforms by 3-9X the edit-distance WFA implementation running on a 20 core machine. Compared to other state-of-the-art tools computing the edit-distance, eWFA-GPU is up to 265X faster than CPU tools and up to 56 times faster than other GPU-enabled implementations. Our second contribution, the WFA-GPU tool, extends the work of eWFA-GPU to compute the exact gap-affine distance (i.e., a more general alignment problem) between arbitrary long sequences. In this work, we propose a CPU-GPU co-design capable of performing inter and intra-sequence parallel alignment of multiple sequences, combining a succinct WFA-data representation with an efficient GPU implementation. As a result, we demonstrate that our implementation outperforms the original WFA implementation between 1.5-7.7X times when computing the alignment path, and between 2.6-16X when computing only the alignment score. Moreover, compared to other state-of-the-art tools, the WFA-GPU is up to 26.7X faster than other GPU implementations and up to four orders of magnitude faster than other CPU implementations

UPCommons. Portal del coneixement obert de la UPC

Thread-cooperative, bit-parallel computation of Levenshtein distance on GPU

Author: Chacón de San Baldomero Alejandro
Marco-Sola Santiago
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2014
Field of study

Approximate string matching is a very important problem in computational biology; it requires the fast computation of string distance as one of its essential components. Myers' bit-parallel algorithm improves the classical dynamic programming approach to Levenshtein distance computation, and offers competitive performance on CPUs. The main challenge when designing an efficient GPU implementation is to expose enough SIMD parallelism while at the same time keeping a relatively small working set for each thread. In this work we implement and optimise a CUDA version of Myers' algorithm suitable to be used as a building block for DNA sequence alignment. We achieve high efficiency by means of a cooperative parallelisation strategy for (1) very-long integer addition and shift operations, and (2) several simultaneous pattern matching tasks. In addition, we explore the performance impact obtained when using features specific to the Kepler architecture. Our results show an overall performance of the order of tera cells updates per second using a single high-end Nvidia GPU, and factor speedups in excess of 20 with respect to a sixteen-core, non-vectorised CPU implementation

Diposit Digital de Documents de la UAB

Lossless seeds for searching short patterns with high error rates

Author: Salson Mikaël
Touzet Hélène
Vroland Christophe
Publication venue: HAL CCSD
Publication date: 01/10/2014
Field of study

International audienceWe address the problem of approximate pattern matching using the Levenshtein distance. Given a text T and a pattern P , find alllocations in T that differ by at most k errors from P . For that purpose, we propose a filtration algorithm that is based on a novel type of seeds,combining exact parts and parts with a fixed number of errors. Experimental tests show that the method is specifically well-suited for short patterns with a large number of error

HAL - Lille 3

INRIA a CCSD electronic archive server

Detección de duplicados: una guía metodológica

Author: Amón Uribe Iván
Jiménez Claudia
Publication venue: 'Universidad Autonoma de Bucaramanga'
Publication date: 01/12/2010
Field of study

Directory of Open Access Journals

UNAB Revistas Académicas

Duplicate detection: a methodological guide

Author: Amón Uribe Iván
Jiménez Claudia
Publication venue: Universidad Autónoma de Bucaramanga UNAB
Publication date: 01/12/2010
Field of study

Cuando una misma entidad del mundo real se almacena más de una vez, a través de una o varias bases de datos, en tuplas con igual estructura pero sin un identificador único y éstas presentan diferencias en sus valores, se presenta el fenómeno conocido como detección de duplicados. Para esta tarea, se han desarrollado múltiples funciones de similitud las cuales detectan las cadenas de texto que son similares mas no idénticas. En este artículo se propone una guía metodológica para seleccionar entre nueve de estas funciones de similitud (Levenshtein, Brecha Afín, Smith-Waterman, Jaro, Jaro-Winkler, Bi-grams, Tri-grams, Monge-Elkan y SoftTF-IDF) la más adecuada para un caso específico o situación particular, de acuerdo con la naturaleza de los datos que se estén analizando.When the same real-world entity is stored more than once, across one or more several databases, in tuples with the same structure but without a unique identifier and these present differences in their values, the phenomenon known as detection of duplicates. For this task, multiple similarity functions have been developed which they detect text strings that are similar but not identical. This article proposes a methodological guide to selecting among nine of these similarity functions (Levenshtein, Affine Gap, Smith-Waterman, Jaro, Jaro-Winkler, Bi-grams, Tri-grams, Monge-Elkan and SoftTF-IDF) the most suitable for a specific case or situation according to the nature of the data being analyzed

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Proceedings of the Eindhoven FASTAR Days 2004 : Eindhoven, The Netherlands, September 3-4, 2004

Author
Publication venue: Technische Universiteit Eindhoven
Publication date: 01/01/2004
Field of study

The Eindhoven FASTAR Days (EFD) 2004 were organized by the Software Construction group of the Department of Mathematics and Computer Science at the Technische Universiteit Eindhoven. On September 3rd and 4th 2004, over thirty participants|hailing from the Czech Republic, Finland, France, The Netherlands, Poland and South Africa|gathered at the Department to attend the EFD. The EFD were organized in connection with the research on finite automata by the FASTAR Research Group, which is centered in Eindhoven and at the University of Pretoria, South Africa. FASTAR (Finite Automata Systems|Theoretical and Applied Research) is an in- ternational research group that aims to lead in all areas related to finite state systems. The work in FASTAR includes both core and applied parts of this field. The EFD therefore focused on the field of finite automata, with an emphasis on practical aspects and applications. Eighteen presentations, mostly on subjects within this field, were given, by researchers as well as students from participating universities and industrial research facilities. This report contains the proceedings of the conference, in the form of papers for twelve of the presentations at the EFD. Most of them were initially reviewed and distributed as handouts during the EFD. After the EFD took place, the papers were revised for publication in these proceedings. We would like to thank the participants for their attendance and presentations, making the EFD 2004 as successful as they were. Based on this success, it is our intention to make the EFD into a recurring event. Eindhoven, December 2004 Loek Cleophas Bruce W. Watso

Pure OAI Repository