Search CORE

37,134 research outputs found

Improved Algorithms for Approximate String Matching (Extended Abstract)

Author: D Lowrance
D Sankoff
Dimitris Papamichail
DR Powell
DS Hirschberg
E Ukkonen
EW Myers
Georgios Papamichail
P Sellers
R Wagner
S Needleman
T Vintsyuk
V Levenstein
VL Arlazarov
W Masek
Publication venue
Publication date: 28/07/2008
Field of study

The problem of approximate string matching is important in many different areas such as computational biology, text processing and pattern recognition. A great effort has been made to design efficient algorithms addressing several variants of the problem, including comparison of two strings, approximate pattern identification in a string or calculation of the longest common subsequence that two strings share. We designed an output sensitive algorithm solving the edit distance problem between two strings of lengths n and m respectively in time O((s-|n-m|)min(m,n,s)+m+n) and linear space, where s is the edit distance between the two strings. This worst-case time bound sets the quadratic factor of the algorithm independent of the longest string length and improves existing theoretical bounds for this problem. The implementation of our algorithm excels also in practice, especially in cases where the two strings compared differ significantly in length. Source code of our algorithm is available at http://www.cs.miami.edu/\~dimitris/edit_distanceComment: 10 page

arXiv.org e-Print Archive

Springer - Publisher Connector

Directory of Open Access Journals

Average-Case Optimal Approximate Circular String Matching

Author: CS Iliopoulos
E Ukkonen
F Fernandes
GM Landau
K Fredriksson
P-H Hsu
T Hirvola
T Lee
WI Chang
Publication venue
Publication date: 24/02/2015
Field of study

Approximate string matching is the problem of finding all factors of a text t of length n that are at a distance at most k from a pattern x of length m. Approximate circular string matching is the problem of finding all factors of t that are at a distance at most k from x or from any of its rotations. In this article, we present a new algorithm for approximate circular string matching under the edit distance model with optimal average-case search time O(n(k + log m)/m). Optimal average-case search time can also be achieved by the algorithms for multiple approximate string matching (Fredriksson and Navarro, 2004) using x and its rotations as the set of multiple patterns. Here we reduce the preprocessing time and space requirements compared to that approach

arXiv.org e-Print Archive

CiteSeerX

King's Research Portal

Acceleration of Algorithms for Approximate String Matching

Author: Voženílek Jan
Publication venue: Vysoké učení technické v Brně. Fakulta informačních technologií
Publication date: 01/01/2008
Field of study

Cílem této bakalářské práce je návrh a implementace architektury pro FPGA čipy akcelerující porovnávání dvou řetězců a jejich ohodnocení na podobnost. Použité postupy vycházejí z bioinformatických algoritmů, především Needleman-Wunsch a Smith-Waterman. Jednotka může díky obecnému návrhu a generickému zpracování v jazyce VHDL porovnávat libovolné sekvence znaků, což je úloha prostupující mnoha oblastmi informatiky od prohledávání databází (kde porovnání na podobnost umožňuje odhalit překlepy) po detekci nevyžádané elektronické pošty - spamu. V závislosti na specifikaci úlohy se může zrychlení oproti běžnému softwarovému řešení pohybovat v řádu stovek až tisíců.The objective of this bachelor's thesis is to design and implement architecture for FPGA chips that accelerates matching of two strings and scoring them for similarity. Used processes come from bioinformatics algorithms, especially Needleman-Wunsch and Smith-Waterman. Due to general design and generic implementation in VHDL the unit is able to compare any sequences of characters, which is a task widely used in many branches of informatics from database searches (where approximate matching allows discovery of spelling errors) to spam detection. Depending on task specification the acceleration speed up against common software solution can reach orders of hundreds or even thousands.

Digital library of Brno University of Technology

National Repository of Grey Literature

Data structures and algorithms for approximate string matching Zvi Galil, Raffaele Giancarlo

Author: Galil Zvi
Giancarlo Raffaele
Publication venue: 'Columbia University Libraries/Information Services'
Publication date: 01/01/1987
Field of study

This paper surveys techniques for designing efficient sequential and parallel approximate string matching algorithms. Special attention is given to the methods for the construction of data structures that efficiently support primitive operations needed in approximate string matching

Exact string matching algorithms : survey, issues, and future research directions

Author: Gilkar Gulshan
Hakak Saqib
Imran Muhammad
Kamsin Amirrudin
Khan Wazir
Shivakumara Palaiahnakote
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2019
Field of study

String matching has been an extensively studied research domain in the past two decades due to its various applications in the fields of text, image, signal, and speech processing. As a result, choosing an appropriate string matching algorithm for current applications and addressing challenges is difficult. Understanding different string matching approaches (such as exact string matching and approximate string matching algorithms), integrating several algorithms, and modifying algorithms to address related issues are also difficult. This paper presents a survey on single-pattern exact string matching algorithms. The main purpose of this survey is to propose new classification, identify new directions and highlight the possible challenges, current trends, and future works in the area of string matching algorithms with a core focus on exact string matching algorithms. © 2013 IEEE

Federation ResearchOnline

Fast parallel algorithms for approximate string matching

Author: Jiang Yi
Publication venue: University of Montana, Maureen and Mike Mansfield Library
Publication date: 01/01/1992
Field of study

Fast and Compact Regular Expression Matching

Author: Bille Philip
Farach-Colton Martin
Publication venue
Publication date: 01/01/2008
Field of study

We study 4 problems in string matching, namely, regular expression matching, approximate regular expression matching, string edit distance, and subsequence indexing, on a standard word RAM model of computation that allows logarithmic-sized words to be manipulated in constant time. We show how to improve the space and/or remove a dependency on the alphabet size for each problem using either an improved tabulation technique of an existing algorithm or by combining known algorithms in a new way

arXiv.org e-Print Archive

CiteSeerX

Improved Approximate String Matching and Regular Expression Matching on Ziv-Lempel Compressed Texts

Author: A. Amir
E.W. Myers
G. Navarro
G. Navarro
G. Navarro
G.M. Landau
J. Kärkkäinen
J. Ziv
J. Ziv
K. Thompson
M. Dietzfelbinger
M. Farach
P. Sellers
R. Cole
T.A. Welch
V. Mäkinen
Publication venue
Publication date: 01/01/2007
Field of study

We study the approximate string matching and regular expression matching problem for the case when the text to be searched is compressed with the Ziv-Lempel adaptive dictionary compression schemes. We present a time-space trade-off that leads to algorithms improving the previously known complexities for both problems. In particular, we significantly improve the space bounds, which in practical applications are likely to be a bottleneck

arXiv.org e-Print Archive

CiteSeerX