Fast and Compact Regular Expression Matching

Author: Bille Philip
Farach-Colton Martin
Publication date: 01/01/2008

We study four problems in string matching, namely regular expression matching, approximate regular expression matching, string edit distance, and subsequence indexing, on a standard word RAM model of computation that allows logarithmic-sized words to be manipulated in constant time. For each problem, we show how to improve the space and/or remove a dependency on the alphabet size, either by using an improved tabulation technique for an existing algorithm or by combining known algorithms in a new way.
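To make one of the four problems concrete, here is a minimal baseline sketch of subsequence indexing: preprocess a string so that "is q a subsequence of s?" can be answered quickly. This is not the paper's data structure; it is a plain next-occurrence table, and the names `build_subsequence_index` and `is_subsequence` are hypothetical. The table stores one entry per position and distinct character, which illustrates what a dependency on the alphabet size looks like.

```python
def build_subsequence_index(s):
    """Return nxt, where nxt[i][c] is the smallest j >= i with s[j] == c.

    Baseline only: the table has up to (len(s) + 1) * (distinct chars) entries.
    """
    n = len(s)
    nxt = [dict() for _ in range(n + 1)]  # nxt[n] stays empty: nothing follows the end
    for i in range(n - 1, -1, -1):
        nxt[i] = dict(nxt[i + 1])  # inherit the occurrences to the right
        nxt[i][s[i]] = i           # s[i] itself is the nearest occurrence of its character
    return nxt


def is_subsequence(nxt, q):
    """Check whether q is a subsequence of the indexed string in O(len(q)) lookups."""
    pos = 0
    for c in q:
        j = nxt[pos].get(c)
        if j is None:
            return False
        pos = j + 1  # continue strictly after the matched character
    return True
```

Each query follows one table lookup per query character, so the query time does not depend on the length of the indexed string.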

arXiv.org e-Print Archive

CiteSeerX

Elsevier - Publisher Connector

The IT University of Copenhagen's Repository

A Fast Algorithm for Approximate String Matching on Gene Sequences

Author: A. Cornish-Bowden
G. Navarro
J. Tarhio
L. Valinsky
N. El-Mabrouk
R.A. Baeza-Yates
R.N. Horspool
R.S. Boyer
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2005

Crossref

Approximate Regular Expression Searching with Arbitrary Integer Weights

Author: A. Aho
E. Myers
E. Ukkonen
G. Berry
G. Navarro
K. Thompson
P. Sellers
S. Wu
V. Glushkov
Publication date: 01/01/2003

We present a bit-parallel technique to search a text of length n for a regular expression of m symbols permitting k differences in worst-case time O(mn/log_k s), where s is the amount of main memory that can be allocated. The algorithm permits arbitrary integer weights and matches the best previous complexities, but it is much simpler and faster in practice. Along the way, we define a new recurrence for approximate searching where the current values depend only on previous values.
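As a rough illustration of what "bit-parallel" approximate searching means, the sketch below handles the much simpler case of a plain string pattern with unit-cost differences. It is not the paper's algorithm, which handles regular expressions and arbitrary integer weights; it is the classic row-per-error-level scheme in the style usually attributed to Wu and Manber. The function name `approx_find` is hypothetical, and Python's unbounded integers stand in for machine words.

```python
def approx_find(pattern, text, k):
    """Return the end positions j in text where pattern matches with <= k differences.

    Bit i of rows[d] is set when pattern[:i+1] matches some substring of the
    text ending at the current position with at most d differences.
    """
    m = len(pattern)
    if m == 0:
        raise ValueError("pattern must be non-empty")
    full = (1 << m) - 1
    accept = 1 << (m - 1)

    # masks[c] has bit i set when pattern[i] == c
    masks = {}
    for i, ch in enumerate(pattern):
        masks[ch] = masks.get(ch, 0) | (1 << i)

    # Before reading any text, a prefix of length i+1 needs i+1 deletions.
    rows = [((1 << d) - 1) & full for d in range(k + 1)]

    hits = []
    for j, ch in enumerate(text):
        mc = masks.get(ch, 0)
        prev_old = rows[0]                       # old value of the row below
        rows[0] = ((rows[0] << 1) | 1) & mc      # exact Shift-And step
        for d in range(1, k + 1):
            old = rows[d]
            rows[d] = (
                (((old << 1) | 1) & mc)          # match
                | prev_old                        # extra text character
                | (((prev_old | rows[d - 1]) << 1) | 1)  # substitution or skipped pattern character
            ) & full
            prev_old = old
        if rows[k] & accept:
            hits.append(j)
    return hits
```

For example, `approx_find("abc", "abd", 1)` reports end positions 1 and 2, since both "ab" and "abd" are within one difference of "abc".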

CiteSeerX

Crossref

Fine-grained Complexity Meets IP = PSPACE

Author: Chen Lijie
Goldwasser Shafi
Lyu Kaifeng
Rothblum Guy N.
Rubinstein Aviad
Publication date: 03/11/2018

In this paper we study the fine-grained complexity of finding exact and approximate solutions to problems in P. Our main contribution is showing reductions from exact to approximate solution for a host of such problems. As one (notable) example, we show that the Closest-LCS-Pair problem (given two sets of strings $A$ and $B$, compute exactly the maximum $\textsf{LCS}(a, b)$ with $(a, b) \in A \times B$) is equivalent to its approximation version (under near-linear time reductions, and with a constant approximation factor). More generally, we identify a class of problems, which we call BP-Pair-Class, comprising both exact and approximate solutions, and show that they are all equivalent under near-linear time reductions. Exploring this class and its properties, we also show:

• Under the NC-SETH assumption (a significantly more relaxed assumption than SETH), solving any of the problems in this class requires essentially quadratic time.

• Modest improvements on the running time of known algorithms (shaving log factors) would imply that NEXP is not in non-uniform $\textsf{NC}^1$.

• Finally, we leverage our techniques to show new barriers for deterministic approximation algorithms for LCS.

At the heart of these new results is a deep connection between interactive proof systems for bounded-space computations and the fine-grained complexity of exact and approximate solutions to problems in P. In particular, our results build on the proof techniques from the classical IP = PSPACE result.
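The Closest-LCS-Pair problem mentioned in this abstract can be stated as a few lines of brute force. The sketch below is only meant to pin down the definition, assuming strings over any alphabet; it is not taken from the paper, and the names `lcs_length` and `closest_lcs_pair` are hypothetical. It tries every pair, so its running time grows with the product of the two set sizes.

```python
from itertools import product


def lcs_length(a, b):
    """Length of the longest common subsequence of a and b (textbook DP, two rows)."""
    prev = [0] * (len(b) + 1)
    for ca in a:
        cur = [0]
        for j, cb in enumerate(b, 1):
            if ca == cb:
                cur.append(prev[j - 1] + 1)
            else:
                cur.append(max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def closest_lcs_pair(A, B):
    """Exact Closest-LCS-Pair by brute force: max LCS(a, b) over all (a, b) in A x B."""
    return max(lcs_length(a, b) for a, b in product(A, B))
```

With `A = ["abcde", "xyz"]` and `B = ["ace", "yz"]`, the best pair is ("abcde", "ace") with LCS length 3.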

arXiv.org e-Print Archive

Bit-parallel and SIMD alignment algorithms for biological sequence analysis

Author: Loving Joshua
Publication date: 21/11/2017

High-throughput next-generation sequencing techniques have greatly decreased the cost and increased the speed of sequencing, resulting in an explosion of sequencing data. This motivates the development of high-efficiency sequence alignment algorithms. In this thesis, I present multiple bit-parallel and Single Instruction Multiple Data (SIMD) algorithms that greatly accelerate the processing of biological sequences. The first chapter describes the BitPAl bit-parallel algorithms for global alignment with general integer scoring, which assigns integer weights for match, mismatch, and insertion/deletion. The bit-parallel approach represents individual cells in an alignment scoring matrix as bits in computer words and emulates the calculation of scores by a series of logic operations. Bit-parallelism has previously been applied to other pattern matching problems, producing fast algorithms. In timed tests, we show that BitPAl runs 7 to 25 times faster than a standard iterative algorithm. The second part involves two approaches to alignment with substitution scoring, which assigns a potentially different substitution weight to every pair of alphabet characters, better representing the relative rates of different mutations. The first approach extends the existing BitPAl method. The second approach is a new SIMD algorithm that uses partial sums of adjacent score differences. I present a simple partial sum method as well as one that uses parallel scan for additional acceleration. Results demonstrate that these algorithms are significantly faster than existing SIMD dynamic programming algorithms. Finally, I describe two extensions to the partial sums algorithm. The first adds support for affine gap penalty scoring. Affine gap scoring models the biological observation that gaps are more likely to be contiguous than to be distributed throughout a region, by introducing a gap opening penalty and a gap extension penalty. The second extension is an algorithm that uses the partial sums method to calculate the tandem alignment of a pattern against a text sequence using a single pattern copy. Next-generation sequencing data provides a wealth of information to researchers. Extracting that information in a timely manner increases the utility and practicality of sequence analysis algorithms. This thesis presents a family of algorithms that provide alignment scores in less time than previous algorithms.
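The idea of storing matrix cells as bits and updating a whole column with a few logic operations can be sketched on a simpler problem than the one the thesis addresses. The code below is not BitPAl and does not handle general integer scoring; it is a well-known bit-parallel recurrence for the length of the longest common subsequence, with a Python integer standing in for the machine words. The function name `lcs_length_bitparallel` is hypothetical.

```python
def lcs_length_bitparallel(pattern, text):
    """LCS length of pattern and text, one DP column per text character.

    Bit i of v is 0 exactly where the DP column increases by one at row i,
    so the LCS length is the number of zero bits among the low m bits.
    """
    m = len(pattern)
    if m == 0:
        return 0
    full = (1 << m) - 1

    # masks[c] has bit i set when pattern[i] == c
    masks = {}
    for i, ch in enumerate(pattern):
        masks[ch] = masks.get(ch, 0) | (1 << i)

    v = full  # empty text: no increases anywhere in the column
    for ch in text:
        mc = masks.get(ch, 0)
        u = v & mc
        # One addition, two ANDs and one OR update all m cells at once;
        # the carry of the addition propagates information along the column.
        v = ((v + u) | (v & ~mc)) & full
    return m - bin(v).count("1")
```

Each text character costs a constant number of word operations per machine word of the pattern, compared with one operation per cell in the standard iterative dynamic program.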

Boston University Institutional Repository (OpenBU)