Time-Space Tradeoffs for Finding a Long Common Substring

Abstract

We consider the problem of finding, given two documents of total length $n$, a longest string occurring as a substring of both documents. This problem, known as the Longest Common Substring (LCS) problem, has a classic $O(n)$-time solution dating back to the discovery of suffix trees (Weiner, 1973) and their efficient construction for integer alphabets (Farach-Colton, 1997). However, these solutions require $\Theta(n)$ space, which is prohibitive in many applications. To address this issue, Starikovskaya and Vildhøj (CPM 2013) showed that for $n^{2/3} \le s \le n^{1-o(1)}$, the LCS problem can be solved in $O(s)$ space and $O(\frac{n^2}{s})$ time. Kociumaka et al. (ESA 2014) generalized this tradeoff to $1 \leq s \leq n$, thus providing a smooth time-space tradeoff from constant to linear space. In this paper, we obtain a significant speed-up for instances where the length $L$ of the sought LCS is large. For $1 \leq s \leq n$, we show that the LCS problem can be solved in $O(s)$ space and $\tilde{O}(\frac{n^2}{L\cdot s}+n)$ time. The result is based on techniques originating from the LCS with Mismatches problem (Flouri et al., 2015; Charalampopoulos et al., CPM 2018), on space-efficient locally consistent parsing (Birenzwige et al., SODA 2020), and on the structure of maximal repetitions (runs) in the input documents.
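To make the problem statement concrete, the following is a minimal baseline sketch, not the algorithm of this paper. It assumes the textbook dynamic program with a rolling row, which runs in time proportional to the product of the two document lengths and uses extra space proportional to the shorter document. The function name `longest_common_substring` is hypothetical and chosen only for illustration.

```python
def longest_common_substring(a: str, b: str) -> str:
    """Baseline illustration only (textbook DP, not the paper's method).

    Returns one longest string that occurs as a substring of both a and b.
    Time O(len(a) * len(b)); extra space O(min(len(a), len(b))).
    """
    # Keep the shorter document on the inner axis so the rolling row is small.
    if len(b) > len(a):
        a, b = b, a

    # prev[j] = length of the longest common suffix of a[:i-1] and b[:j].
    prev = [0] * (len(b) + 1)
    best_len = 0   # length of the best match found so far
    best_end = 0   # exclusive end index of that match inside a

    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # Extend the common suffix ending at the previous characters.
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len, best_end = curr[j], i
        prev = curr

    return a[best_end - best_len:best_end]


if __name__ == "__main__":
    print(longest_common_substring("xabcdy", "zzabcdw"))  # prints "abcd"
```

This baseline does not exploit the length $L$ of the answer. The tradeoffs discussed in the abstract concern how far the running time can be reduced when only $O(s)$ space is available.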
