4 research outputs found
Longest Common Prefixes with -Errors and Applications
Although real-world text datasets, such as DNA sequences, are far from being
uniformly random, average-case string searching algorithms perform
significantly better than worst-case ones in most applications of interest. In
this paper, we study the problem of computing the longest prefix of each suffix
of a given string of length over a constant-sized alphabet that occurs
elsewhere in the string with -errors. This problem has already been studied
under the Hamming distance model. Our first result is an improvement upon the
state-of-the-art average-case time complexity for non-constant and using
only linear space under the Hamming distance model. Notably, we show that our
technique can be extended to the edit distance model with the same time and
space complexities. Specifically, our algorithms run in time on average using space. We show that our
technique is applicable to several algorithmic problems in computational
biology and elsewhere
Longest common substring with approximately k mismatches
In the longest common substring problem we are given two strings of length n and must find a substring of maximal length that occurs in both strings. It is well-known that the problem can be solved in linear time, but the solution is not robust and can vary greatly when the input strings are changed even by one letter. To circumvent this, Leimester and Morgenstern introduced the problem of the longest common substring with k mismatches. Lately, this problem has received a lot of attention in the literature, and several algorithms have been suggested. The running time of these algorithms is n^{2-o(1)}, and unfortunately, conditional lower bounds have been shown which imply that there is little hope to improve this bound.
In this paper we study a different but closely related problem of the longest common substring with approximately k mismatches and use computational geometry techniques to show that it admits a randomised solution with strongly subquadratic running time