37,134 research outputs found
Improved Algorithms for Approximate String Matching (Extended Abstract)
The problem of approximate string matching is important in many different
areas such as computational biology, text processing and pattern recognition. A
great effort has been made to design efficient algorithms addressing several
variants of the problem, including comparison of two strings, approximate
pattern identification in a string or calculation of the longest common
subsequence that two strings share.
We designed an output sensitive algorithm solving the edit distance problem
between two strings of lengths n and m respectively in time
O((s-|n-m|)min(m,n,s)+m+n) and linear space, where s is the edit distance
between the two strings. This worst-case time bound sets the quadratic factor
of the algorithm independent of the longest string length and improves existing
theoretical bounds for this problem. The implementation of our algorithm excels
also in practice, especially in cases where the two strings compared differ
significantly in length. Source code of our algorithm is available at
http://www.cs.miami.edu/\~dimitris/edit_distanceComment: 10 page
Average-Case Optimal Approximate Circular String Matching
Approximate string matching is the problem of finding all factors of a text t
of length n that are at a distance at most k from a pattern x of length m.
Approximate circular string matching is the problem of finding all factors of t
that are at a distance at most k from x or from any of its rotations. In this
article, we present a new algorithm for approximate circular string matching
under the edit distance model with optimal average-case search time O(n(k + log
m)/m). Optimal average-case search time can also be achieved by the algorithms
for multiple approximate string matching (Fredriksson and Navarro, 2004) using
x and its rotations as the set of multiple patterns. Here we reduce the
preprocessing time and space requirements compared to that approach
Acceleration of Algorithms for Approximate String Matching
Cílem této bakalářské práce je návrh a implementace architektury pro FPGA čipy akcelerující porovnávání dvou řetězců a jejich ohodnocení na podobnost. Použité postupy vycházejí z bioinformatických algoritmů, především Needleman-Wunsch a Smith-Waterman. Jednotka může díky obecnému návrhu a generickému zpracování v jazyce VHDL porovnávat libovolné sekvence znaků, což je úloha prostupující mnoha oblastmi informatiky od prohledávání databází (kde porovnání na podobnost umožňuje odhalit překlepy) po detekci nevyžádané elektronické pošty - spamu. V závislosti na specifikaci úlohy se může zrychlení oproti běžnému softwarovému řešení pohybovat v řádu stovek až tisíců.The objective of this bachelor's thesis is to design and implement architecture for FPGA chips that accelerates matching of two strings and scoring them for similarity. Used processes come from bioinformatics algorithms, especially Needleman-Wunsch and Smith-Waterman. Due to general design and generic implementation in VHDL the unit is able to compare any sequences of characters, which is a task widely used in many branches of informatics from database searches (where approximate matching allows discovery of spelling errors) to spam detection. Depending on task specification the acceleration speed up against common software solution can reach orders of hundreds or even thousands.
Data structures and algorithms for approximate string matching Zvi Galil, Raffaele Giancarlo
This paper surveys techniques for designing efficient sequential and parallel approximate string matching algorithms. Special attention is given to the methods for the construction of data structures that efficiently support primitive operations needed in approximate string matching
Exact string matching algorithms : survey, issues, and future research directions
String matching has been an extensively studied research domain in the past two decades due to its various applications in the fields of text, image, signal, and speech processing. As a result, choosing an appropriate string matching algorithm for current applications and addressing challenges is difficult. Understanding different string matching approaches (such as exact string matching and approximate string matching algorithms), integrating several algorithms, and modifying algorithms to address related issues are also difficult. This paper presents a survey on single-pattern exact string matching algorithms. The main purpose of this survey is to propose new classification, identify new directions and highlight the possible challenges, current trends, and future works in the area of string matching algorithms with a core focus on exact string matching algorithms. © 2013 IEEE
Fast and Compact Regular Expression Matching
We study 4 problems in string matching, namely, regular expression matching,
approximate regular expression matching, string edit distance, and subsequence
indexing, on a standard word RAM model of computation that allows
logarithmic-sized words to be manipulated in constant time. We show how to
improve the space and/or remove a dependency on the alphabet size for each
problem using either an improved tabulation technique of an existing algorithm
or by combining known algorithms in a new way
Improved Approximate String Matching and Regular Expression Matching on Ziv-Lempel Compressed Texts
We study the approximate string matching and regular expression matching
problem for the case when the text to be searched is compressed with the
Ziv-Lempel adaptive dictionary compression schemes. We present a time-space
trade-off that leads to algorithms improving the previously known complexities
for both problems. In particular, we significantly improve the space bounds,
which in practical applications are likely to be a bottleneck
- …