Time-Space Tradeoffs for Finding a Long Common Substring

Ben-Nun, Stav; Golan, Shay; Kociumaka, Tomasz; Kraus, Matan

Authors: Stav Ben-Nun, Shay Golan, Tomasz Kociumaka, Matan Kraus
Publication date: 1 January 2020
Publisher: LIPIcs - Leibniz International Proceedings in Informatics. 31st Annual Symposium on Combinatorial Pattern Matching (CPM 2020)

Abstract

We consider the problem of finding, given two documents of total length $n$, a longest string occurring as a substring of both documents. This problem, known as the Longest Common Substring (LCS) problem, has a classic $O(n)$-time solution dating back to the discovery of suffix trees (Weiner, 1973) and their efficient construction for integer alphabets (Farach-Colton, 1997). However, these solutions require $\Theta(n)$ space, which is prohibitive in many applications. To address this issue, Starikovskaya and Vildhøj (CPM 2013) showed that for $n^{2/3} \le s \le n^{1-o(1)}$, the LCS problem can be solved in $O(s)$ space and $O(\frac{n^2}{s})$ time. Kociumaka et al. (ESA 2014) generalized this tradeoff to $1 \le s \le n$, thus providing a smooth time-space tradeoff from constant to linear space. In this paper, we obtain a significant speed-up for instances where the length $L$ of the sought LCS is large. For $1 \le s \le n$, we show that the LCS problem can be solved in $O(s)$ space and $\tilde{O}(\frac{n^2}{L\cdot s}+n)$ time. The result is based on techniques originating from the LCS with Mismatches problem (Flouri et al., 2015; Charalampopoulos et al., CPM 2018), on space-efficient locally consistent parsing (Birenzwige et al., SODA 2020), and on the structure of maximal repetitions (runs) in the input documents.
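To make the problem statement concrete, here is a minimal Python sketch of a textbook dynamic-programming baseline for LCS. It is not the algorithm of the paper and uses none of its techniques (no suffix trees, locally consistent parsing, or runs). The function name `longest_common_substring` is an assumption made for illustration. The sketch runs in time proportional to the product of the two document lengths and keeps only two rows of the table, so its working space is linear in the length of the shorter document.

```python
def longest_common_substring(a: str, b: str) -> str:
    """Return one longest string occurring as a substring of both a and b.

    Illustrative quadratic-time baseline only; NOT the paper's algorithm.
    """
    if len(b) > len(a):
        a, b = b, a  # make b the shorter document so each DP row is short

    # prev[j] = length of the longest common suffix of a[:i-1] and b[:j]
    prev = [0] * (len(b) + 1)
    best_len, best_end = 0, 0  # best_end is an exclusive end index into b

    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # extend the common suffix that ended one character earlier
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len, best_end = curr[j], j
        prev = curr

    return b[best_end - best_len:best_end]


if __name__ == "__main__":
    print(longest_common_substring("xabcdy", "zzabcdw"))
```

The quadratic running time of this baseline is the kind of cost that the time-space tradeoffs discussed in the abstract aim to reduce when the space budget $s$ or the LCS length $L$ is large.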


Available Versions

DROPS Dagstuhl Research Online Publication Server

oai:drops-oai.dagstuhl.de:1213...

Last time updated on 23/06/2020