142,072 research outputs found

    Linear pattern matching on sparse suffix trees

    Packing several characters into one computer word is a simple and natural way to compress the representation of a string and to speed up its processing. Exploiting this idea, we propose an index for a packed string, based on a sparse suffix tree [KU-96] with appropriately defined suffix links. Assuming, under the standard unit-cost RAM model, that a word can store up to $\log_{\sigma} n$ characters ($\sigma$ being the alphabet size), our index takes $O(n/\log_{\sigma} n)$ space, i.e. the same space as the packed string itself. The resulting pattern matching algorithm runs in time $O(m + r^2 + r \cdot occ)$, where $m$ is the length of the pattern, $r$ is the actual number of characters stored in a word and $occ$ is the number of pattern occurrences.
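To make the packing idea concrete, here is a minimal sketch of storing `r` characters per integer "word". It only illustrates the packed representation and is not the sparse suffix tree index from the abstract; the function names `pack` and `unpack` and the fixed-width encoding are assumptions made for this example.

```python
def pack(text, alphabet, r):
    """Pack `text` into integers holding up to `r` characters each.

    Each character gets a fixed-width code of `bits` bits, where `bits`
    is just enough to distinguish the symbols of `alphabet`.
    """
    bits = max(1, (len(alphabet) - 1).bit_length())
    code = {c: i for i, c in enumerate(alphabet)}
    words = []
    for start in range(0, len(text), r):
        w = 0
        for c in text[start:start + r]:
            w = (w << bits) | code[c]  # append the next character code
        words.append(w)
    return words


def unpack(words, alphabet, r, n):
    """Inverse of `pack`; `n` is the original text length."""
    bits = max(1, (len(alphabet) - 1).bit_length())
    mask = (1 << bits) - 1
    out = []
    for j, w in enumerate(words):
        count = min(r, n - j * r)  # the last word may hold fewer characters
        for t in range(count - 1, -1, -1):
            out.append(alphabet[(w >> (t * bits)) & mask])
    return "".join(out)


words = pack("ACGTAC", "ACGT", 4)
print(words, unpack(words, "ACGT", 4, 6))
```

With a 4-letter alphabet each character takes 2 bits, so the 6-character text fits in two words instead of six separate symbols.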

    The Number of Repetitions in 2D-Strings

    The notions of periodicity and repetitions in strings, and hence those of runs and squares, naturally extend to two-dimensional strings. We consider two types of repetitions in 2D-strings: 2D-runs and quartics (quartics are a 2D-version of squares in standard strings). Amir et al. introduced 2D-runs, showed that there are $O(n^3)$ of them in an $n \times n$ 2D-string and presented a simple construction giving a lower bound of $\Omega(n^2)$ for their number (TCS 2020). We make a significant step towards closing the gap between these bounds by showing that the number of 2D-runs in an $n \times n$ 2D-string is $O(n^2 \log^2 n)$. In particular, our bound implies that the $O(n^2 \log n + \textsf{output})$ run-time of the algorithm of Amir et al. for computing 2D-runs is also $O(n^2 \log^2 n)$. We expect this result to allow for exploiting 2D-runs algorithmically in the area of 2D pattern matching. A quartic is a 2D-string composed of $2 \times 2$ identical blocks (2D-strings) that was introduced by Apostolico and Brimkov (TCS 2000), where by quartics they meant only primitively rooted quartics, i.e. built of a primitive block. Here our notion of quartics is more general and analogous to that of squares in 1D-strings. Apostolico and Brimkov showed that there are $O(n^2 \log^2 n)$ occurrences of primitively rooted quartics in an $n \times n$ 2D-string and that this bound is attainable. Consequently the number of distinct primitively rooted quartics is $O(n^2 \log^2 n)$. Here, we prove that the number of distinct general quartics is also $O(n^2 \log^2 n)$. This extends the rich combinatorial study of the number of distinct squares in a 1D-string, which was initiated by Fraenkel and Simpson (J. Comb. Theory A 1998), to two dimensions. Finally, we show some algorithmic applications of 2D-runs. (Abstract shortened due to arXiv requirements.) Comment: To appear in the ESA 2020 proceedings.
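The definition of a (general) quartic as a 2D-string made of $2 \times 2$ identical blocks can be checked directly. The following brute-force sketch is only meant to illustrate the definition on tiny inputs; it is far slower than anything the paper discusses, and the names `is_quartic` and `distinct_quartics` are hypothetical. A 2D-string is represented here as a list of equal-length Python strings.

```python
def is_quartic(grid):
    """True if `grid` consists of 2 x 2 copies of one non-empty block."""
    h = len(grid)
    w = len(grid[0]) if grid else 0
    if h == 0 or w == 0 or h % 2 or w % 2:
        return False
    hh, hw = h // 2, w // 2
    block = [row[:hw] for row in grid[:hh]]  # top-left quadrant
    for dr in (0, hh):
        for dc in (0, hw):
            quadrant = [row[dc:dc + hw] for row in grid[dr:dr + hh]]
            if quadrant != block:
                return False
    return True


def distinct_quartics(grid):
    """Set of distinct quartics occurring as sub-rectangles of `grid`."""
    h, w = len(grid), len(grid[0])
    found = set()
    for top in range(h):
        for left in range(w):
            # Only even heights and widths can form a quartic.
            for height in range(2, h - top + 1, 2):
                for width in range(2, w - left + 1, 2):
                    sub = [row[left:left + width]
                           for row in grid[top:top + height]]
                    if is_quartic(sub):
                        found.add(tuple(sub))
    return found


print(len(distinct_quartics(["aaaa"] * 4)))
```

On the all-`a` 4 x 4 grid the distinct quartics are the all-`a` rectangles of sizes 2 x 2, 2 x 4, 4 x 2 and 4 x 4.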

    A linearly computable measure of string complexity

    We present a measure of string complexity, called I-complexity, computable in linear time and space. It counts the number of different substrings in a given string. The least complex strings are the runs of a single symbol, the most complex are the de Bruijn strings. Although the I-complexity of a string is not the length of any minimal description of the string, it satisfies many basic properties of classical description complexity. In particular, the number of strings with I-complexity up to a given value is bounded, and most strings of each length have high I-complexity.
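As a rough illustration of "counting different substrings", the sketch below enumerates all non-empty substrings of a string and counts the distinct ones. This is a quadratic brute-force count, not the linear-time I-complexity measure defined in the paper; the function name `count_distinct_substrings` is an assumption made for this example.

```python
def count_distinct_substrings(s):
    """Number of distinct non-empty substrings of `s` (brute force)."""
    seen = set()
    for i in range(len(s)):
        for j in range(i + 1, len(s) + 1):
            seen.add(s[i:j])
    return len(seen)


# A run of a single symbol has very few distinct substrings,
# while a string with no repeated symbols has the maximum possible number.
print(count_distinct_substrings("aaaa"))
print(count_distinct_substrings("abab"))
print(count_distinct_substrings("abcd"))
```

The run `aaaa` yields one distinct substring per length, which matches the abstract's remark that runs of a single symbol are the least complex strings.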

    Faster Online Elastic Degenerate String Matching

    An Elastic-Degenerate String [Iliopoulos et al., LATA 2017] is a sequence of sets of strings, which was recently proposed as a way to model a set of similar sequences. We give an online algorithm for the Elastic-Degenerate String Matching (EDSM) problem that runs in $O(nm\sqrt{m \log m} + N)$ time and $O(m)$ working space, where $n$ is the number of elastic degenerate segments of the text, $N$ is the total length of all strings in the text, and $m$ is the length of the pattern. This improves the previous algorithm by Grossi et al. [CPM 2017] that runs in $O(nm^2 + N)$ time.
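To show what matching against a sequence of sets of strings looks like, here is a naive online sketch. It is not the algorithm of the paper nor that of Grossi et al., and makes no attempt to meet their time bounds; it simply processes one segment at a time while remembering which proper pattern prefixes are matched up to the current segment boundary. The function name `eds_match`, the representation of the text as a list of Python sets, and the choice to report the indices of segments in which some occurrence ends are assumptions made for this example. The pattern is assumed to be non-empty.

```python
def eds_match(segments, pattern):
    """Indices of segments in which an occurrence of `pattern` ends."""
    m = len(pattern)
    active = set()  # lengths p (0 < p < m) of pattern prefixes matched so far
    hits = []
    for i, seg in enumerate(segments):
        nxt = set()
        found = False
        for s in seg:
            # (a) The whole pattern lies inside one string of this segment.
            if pattern in s:
                found = True
            # (b) Extend partial matches carried over from earlier segments.
            for p in active:
                rest = pattern[p:]
                if len(s) >= len(rest):
                    if s.startswith(rest):
                        found = True
                elif rest.startswith(s):
                    nxt.add(p + len(s))  # s is consumed entirely
            # (c) Start new partial matches: a suffix of s that is a
            #     proper prefix of the pattern.
            for k in range(1, min(len(s), m - 1) + 1):
                if s.endswith(pattern[:k]):
                    nxt.add(k)
        if found:
            hits.append(i)
        active = nxt
    return hits


text = [{"ab", "b"}, {"c", ""}, {"a", "ca"}]
print(eds_match(text, "bca"))
```

In the usage example the pattern `bca` is spelled by choosing `b`, `c`, `a` (or `b`, the empty string, `ca`), so the only segment in which an occurrence ends is the last one.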