
Longest Common Prefixes with $k$-Errors and Applications

Author: A Apostolico
AF Smit
B Bollobás
C Leimeister
C Pizzi
DE Willard
G Kucherov
G Manzini
G Navarro
H Alamro
I Ulitsky
J Fischer
KR Rasmussen
M Alzamel
MA Bender
MI Abouelhoda
N Välimäki
P Eades
R Kolpakov
S Faro
S Grabowski
S Karlin
SV Thankachan
T Derrien
T Flouri
TH Cormen
U Manber
Publication date: 01/01/2018

Although real-world text datasets, such as DNA sequences, are far from being uniformly random, average-case string searching algorithms perform significantly better than worst-case ones in most applications of interest. In this paper, we study the problem of computing, for each suffix of a given string of length $n$ over a constant-sized alphabet, the longest prefix that occurs elsewhere in the string with $k$ errors. This problem has already been studied under the Hamming distance model. Our first result, under the Hamming distance model, improves upon the state-of-the-art average-case time complexity for non-constant $k$ while using only linear space. Notably, we show that our technique extends to the edit distance model with the same time and space complexities. Specifically, our algorithms run in $\mathcal{O}(n \log^k n \log \log n)$ time on average using $\mathcal{O}(n)$ space. We show that our technique is applicable to several algorithmic problems in computational biology and elsewhere.
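To make the problem statement concrete, here is a brute-force sketch under the Hamming distance model only. It assumes that "occurs elsewhere" means an occurrence starting at some other position $j \neq i$ (overlaps allowed) and that "$k$ errors" means at most $k$ mismatches. It is not the paper's average-case algorithm: it simply tries every pair of starting positions, so it takes cubic time in the worst case. The function names `lcp_k_mismatches` and `longest_prefix_k_errors` are hypothetical, and the edit distance variant is not covered.

```python
from typing import List


def lcp_k_mismatches(text: str, i: int, j: int, k: int) -> int:
    """Length of the longest common prefix of text[i:] and text[j:]
    when at most k mismatching positions (Hamming errors) are allowed."""
    n = len(text)
    length = 0
    mismatches = 0
    while i + length < n and j + length < n:
        if text[i + length] != text[j + length]:
            if mismatches == k:
                break  # one more mismatch would exceed the budget
            mismatches += 1
        length += 1
    return length


def longest_prefix_k_errors(text: str, k: int) -> List[int]:
    """For every suffix text[i:], the length of its longest prefix that also
    occurs at some other start position j != i with at most k mismatches.
    Naive reference: O(n^2) pairs times O(n) per comparison."""
    n = len(text)
    result = []
    for i in range(n):
        best = 0
        for j in range(n):
            if j != i:
                best = max(best, lcp_k_mismatches(text, i, j, k))
        result.append(best)
    return result


if __name__ == "__main__":
    print(longest_prefix_k_errors("aaaa", 0))  # [3, 3, 2, 1]
    print(longest_prefix_k_errors("abcd", 1))  # [1, 1, 1, 1]
```

A baseline like this is mainly useful for checking a faster implementation on small random inputs. With $k = 1$, every suffix of `"abcd"` gets a value of 1 because a single mismatch can be spent on the first character.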

arXiv.org e-Print Archive

Crossref

King's Research Portal

Longest common substring with approximately k mismatches

Author: Starikovskaia Tatiana
Publication date: 01/01/2016

In the longest common substring problem, we are given two strings of length $n$ and must find a substring of maximal length that occurs in both strings. It is well known that the problem can be solved in linear time, but the solution is not robust: changing even a single letter of an input string can change the answer drastically. To circumvent this, Leimeister and Morgenstern introduced the problem of the longest common substring with $k$ mismatches. Recently, this problem has received a lot of attention in the literature, and several algorithms have been suggested. The running time of these algorithms is $n^{2-o(1)}$, and unfortunately, conditional lower bounds have been shown which imply that there is little hope of improving this bound. In this paper, we study a different but closely related problem, the longest common substring with approximately $k$ mismatches, and use computational geometry techniques to show that it admits a randomised solution with strongly subquadratic running time.
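For reference, the exact (not approximate) longest common substring with $k$ mismatches can be computed by a simple quadratic-time baseline: fix each alignment (diagonal) of the two strings and slide a window along it that contains at most $k$ mismatching positions. The sketch below shows that classic baseline under the assumption that mismatches are counted by Hamming distance between two equal-length substrings. It is not the randomised, computational-geometry algorithm described in the abstract, and the function name `lcs_k_mismatches` is hypothetical.

```python
from collections import deque
from typing import Deque, Tuple


def lcs_k_mismatches(x: str, y: str, k: int) -> Tuple[int, int, int]:
    """Return (length, start_in_x, start_in_y) of a longest pair of
    equal-length substrings of x and y with Hamming distance <= k.
    Quadratic baseline: one sliding window per diagonal."""
    n, m = len(x), len(y)
    best = (0, 0, 0)
    # A diagonal with shift s aligns x[i] with y[i - s].
    for shift in range(-(m - 1), n):
        i0 = max(0, shift)
        j0 = i0 - shift
        diag_len = min(n - i0, m - j0)
        mismatches: Deque[int] = deque()  # offsets of mismatches inside the window
        left = 0
        for right in range(diag_len):
            if x[i0 + right] != y[j0 + right]:
                mismatches.append(right)
                if len(mismatches) > k:
                    # Drop the oldest mismatch so the window holds at most k.
                    left = mismatches.popleft() + 1
            window = right - left + 1
            if window > best[0]:
                best = (window, i0 + left, j0 + left)
    return best


if __name__ == "__main__":
    print(lcs_k_mismatches("abcde", "abxde", 0))  # (2, 0, 0)
    print(lcs_k_mismatches("abcde", "abxde", 1))  # (5, 0, 0)
```

There are $n + m - 1$ diagonals and each is scanned once, so the baseline runs in $\mathcal{O}(nm)$ time, which is exactly the quadratic barrier the abstract refers to.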

INRIA a CCSD electronic archive server

Dagstuhl Research Online Publication Server

Explore Bristol Research