5,444 research outputs found

Locating regions in a sequence under density constraints

Author: Benjamin A. Burton
Boztaş S.
Greenberg R. I.
Huang X.
Lin Y.-L.
Mathias Hiron
Publication venue: 'Society for Industrial & Applied Mathematics (SIAM)'
Publication date: 01/01/2013

Several biological problems require the identification of regions in a sequence where some feature occurs within a target density range: examples include the location of GC-rich regions, identification of CpG islands, and sequence matching. Mathematically, this corresponds to searching a string of 0s and 1s for a substring whose relative proportion of 1s lies between given lower and upper bounds. We consider the algorithmic problem of locating the longest such substring, as well as other related problems (such as finding the shortest substring or a maximal set of disjoint substrings). For locating the longest such substring, we develop an algorithm that runs in O(n) time, improving upon the previous best-known O(n log n) result. For the related problems we develop O(n log log n) algorithms, again improving upon the best-known O(n log n) results. Practical testing verifies that our new algorithms enjoy significantly smaller time and memory footprints, and can process sequences that are orders of magnitude longer as a result.

Comment: 17 pages, 8 figures; v2: minor revisions, additional explanations; to appear in SIAM Journal on Computing
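
To make the problem statement concrete, here is a minimal brute-force sketch of the "longest substring within a density range" task. It is not the O(n) algorithm the abstract describes; it runs in O(n^2) time using prefix sums and only illustrates what is being searched for. The function name `longest_in_density_range` and its parameters are assumptions made for this illustration.

```python
from fractions import Fraction
from itertools import accumulate


def longest_in_density_range(bits, lo, hi):
    """Return (start, end) of the longest substring bits[start:end] whose
    proportion of 1s lies in [lo, hi], or None if there is none.

    Brute force, O(n^2): NOT the linear-time method from the paper.
    """
    lo, hi = Fraction(lo), Fraction(hi)  # exact arithmetic, no float rounding
    # prefix[i] = number of 1s in bits[:i]
    prefix = [0, *accumulate(int(b) for b in bits)]
    n = len(bits)
    # Try lengths from longest to shortest; the first hit is a longest one.
    for length in range(n, 0, -1):
        for start in range(n - length + 1):
            ones = prefix[start + length] - prefix[start]
            # density in [lo, hi]  <=>  lo * length <= ones <= hi * length
            if lo * length <= ones <= hi * length:
                return start, start + length
    return None
```

For example, `longest_in_density_range("0001111000", Fraction(3, 4), 1)` looks for the longest window in which at least three quarters of the symbols are 1s.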

arXiv.org e-Print Archive

CiteSeerX

Crossref

University of Queensland eSpace

Average-Case Optimal Approximate Circular String Matching

Author: CS Iliopoulos
E Ukkonen
F Fernandes
GM Landau
K Fredriksson
P-H Hsu
T Hirvola
T Lee
WI Chang
Publication date: 24/02/2015

Approximate string matching is the problem of finding all factors of a text t of length n that are at a distance at most k from a pattern x of length m. Approximate circular string matching is the problem of finding all factors of t that are at a distance at most k from x or from any of its rotations. In this article, we present a new algorithm for approximate circular string matching under the edit distance model with optimal average-case search time O(n(k + log m)/m). Optimal average-case search time can also be achieved by the algorithms for multiple approximate string matching (Fredriksson and Navarro, 2004) using x and its rotations as the set of multiple patterns. Here we reduce the preprocessing time and space requirements compared to that approach.
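
The definition above can be illustrated with a naive baseline: run the classic dynamic-programming search for approximate matches (Sellers' algorithm) once per rotation of x and merge the results. This is a sketch of the problem only, costing O(n m^2) time; it is not the average-case optimal algorithm the abstract presents. The function names are assumptions made for this illustration.

```python
def approx_end_positions(text, pattern, k):
    """End indices j such that some factor of text ending at j is within
    edit distance k of pattern (Sellers' dynamic programming)."""
    m = len(pattern)
    col = list(range(m + 1))  # distances against the empty text prefix
    ends = set()
    for j, c in enumerate(text):
        new = [0]  # row 0 stays 0: a match may start anywhere in the text
        for i in range(1, m + 1):
            cost = 0 if pattern[i - 1] == c else 1
            new.append(min(col[i - 1] + cost,  # match or substitution
                           col[i] + 1,         # extra character in text
                           new[i - 1] + 1))    # missing character in text
        col = new
        if col[m] <= k:
            ends.add(j)
    return ends


def approx_circular_end_positions(text, pattern, k):
    """Union of approximate match end positions over all rotations."""
    ends = set()
    for r in range(len(pattern)):
        rotation = pattern[r:] + pattern[:r]
        ends |= approx_end_positions(text, rotation, k)
    return sorted(ends)
```

With k = 0 this reduces to exact circular matching: the text "xxbcaxx" contains no occurrence of "abc" itself, but it does contain the rotation "bca".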

arXiv.org e-Print Archive

CiteSeerX

Crossref

King's Research Portal

PARALLEL PROCESSING OUTCOMES OF E-ABDULRAZZAQ ALGORITHM USING MULTI-CORE TECHNIQUE

Author: Abdul Rashid Nur’Aini
Akram AbdulRazzaq Atheer
Publication venue: University of Information and Technology Communications
Publication date: 24/12/2022

The string matching problem is considered one of the substantial problems in fields of computer science such as speech and pattern recognition, signal and image processing, and artificial intelligence (AI). Improving performance is an important factor in keeping pace with the growth rate of databases. One way to address this issue is to parallelize exact string matching algorithms. In this study, the E-Abdulrazzaq string matching algorithm is executed in a multi-core environment using the OpenMP paradigm, which can decrease the execution time and increase the speedup of the algorithm. The parallelized algorithm achieved positive results in parallel execution time, and excellent speedup, in comparison to the sequential result. The Protein database showed optimal results in parallel execution time when utilizing short and long pattern lengths. The DNA database showed optimal speedup when utilizing short and long pattern lengths, while no specific database obtained the worst results.
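
The general idea of parallelizing exact string matching can be sketched as follows: split the text into chunks, let each worker search its own chunk, and extend each search window by m - 1 characters so that matches straddling a chunk boundary are not lost. This sketch does not implement the E-Abdulrazzaq algorithm or use OpenMP; it substitutes Python's built-in `str.find` and a thread pool purely to show the decomposition, and threads in CPython will not necessarily yield the speedups reported above. The names `find_in_chunk` and `parallel_find_all` are assumptions made for this illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def find_in_chunk(text, pattern, start, end):
    """Return all match start positions p with start <= p < end."""
    hits = []
    # Extend the window by m - 1 characters so a match that begins inside
    # this chunk but ends in the next one is still found, and by exactly
    # one worker.
    window_end = min(len(text), end + len(pattern) - 1)
    pos = text.find(pattern, start, window_end)
    while pos != -1:
        hits.append(pos)
        pos = text.find(pattern, pos + 1, window_end)
    return hits


def parallel_find_all(text, pattern, workers=4):
    """Find all (possibly overlapping) occurrences using chunked workers."""
    if not pattern:
        raise ValueError("pattern must be non-empty")
    n = len(text)
    size = max(1, -(-n // workers))  # ceiling division
    bounds = [(s, min(n, s + size)) for s in range(0, n, size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = pool.map(lambda b: find_in_chunk(text, pattern, *b), bounds)
        # Chunks are processed in order, so the result is already sorted.
        return [p for part in parts for p in part]
```

The overlap of m - 1 characters is the key design choice: without it, an occurrence that crosses a chunk boundary would be missed, and with a larger overlap the same occurrence could be reported twice.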

Iraqi Journal for Computers and Informatics