Linear pattern matching on sparse suffix trees
Packing several characters into one computer word is a simple and natural way
to compress the representation of a string and to speed up its processing.
Exploiting this idea, we propose an index for a packed string, based on a {\em
sparse suffix tree} \cite{KU-96} with appropriately defined suffix links.
Assuming, under the standard unit-cost RAM model, that a word can store up to
$\log_{\sigma} n$ characters ($\sigma$ the alphabet size), our index takes
$O(n/\log_{\sigma} n)$ space, i.e. the same space as the packed string itself.
The resulting pattern matching algorithm runs in time $O(m + r^2 + r \cdot occ)$,
where $m$ is the length of the pattern, $r$ is the actual number of characters
stored in a word and $occ$ is the number of pattern occurrences.
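To make the packing idea concrete, here is a minimal Python sketch that is not taken from the paper. It encodes each block of r characters as one integer written in base sigma, so that comparing two words compares r characters at once. The names `pack` and `unpack` and the parameter `r` are illustrative assumptions; the index itself (the sparse suffix tree with suffix links) is not shown.

```python
def pack(text, alphabet, r):
    """Pack text into integers, r characters per word (base-sigma digits)."""
    sigma = len(alphabet)
    code = {ch: i for i, ch in enumerate(alphabet)}
    words = []
    for start in range(0, len(text), r):
        block = text[start:start + r]
        word = 0
        for ch in block:
            word = word * sigma + code[ch]
        # Left-align a short final block so every word holds exactly r digits.
        word *= sigma ** (r - len(block))
        words.append(word)
    return words


def unpack(words, alphabet, r, n):
    """Inverse of pack; n is the original text length (assumed known)."""
    sigma = len(alphabet)
    chars = []
    for word in words:
        digits = []
        for _ in range(r):
            word, d = divmod(word, sigma)
            digits.append(alphabet[d])
        chars.extend(reversed(digits))
    return "".join(chars[:n])
```

With the alphabet "ACGT" and r = 4, the six-character text "ACGTAC" occupies two words instead of six characters, which is the kind of space saving the abstract refers to.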
The Number of Repetitions in 2D-Strings
The notions of periodicity and repetitions in strings, and hence those of
runs and squares, naturally extend to two-dimensional strings. We consider two
types of repetitions in 2D-strings: 2D-runs and quartics (quartics are a
2D-version of squares in standard strings). Amir et al. introduced 2D-runs,
showed that there are $O(n^3)$ of them in an $n \times n$ 2D-string and
presented a simple construction giving a lower bound of $\Omega(n^2)$ for their
number (TCS 2020). We make a significant step towards closing the gap between
these bounds by showing that the number of 2D-runs in an $n \times n$ 2D-string
is $O(n^2 \log^2 n)$. In particular, our bound implies that the $O(n^2 \log n + \textsf{output})$ run-time of the algorithm of Amir et al. for computing
2D-runs is also $O(n^2 \log^2 n)$. We expect this result to allow for
exploiting 2D-runs algorithmically in the area of 2D pattern matching.
A quartic is a 2D-string composed of $2 \times 2$ identical blocks
(2D-strings) that was introduced by Apostolico and Brimkov (TCS 2000), where by
quartics they meant only primitively rooted quartics, i.e. built of a primitive
block. Here our notion of quartics is more general and analogous to that of
squares in 1D-strings. Apostolico and Brimkov showed that there are $O(n^2 \log^2 n)$ occurrences of primitively rooted quartics in an
$n \times n$ 2D-string and that this bound is attainable. Consequently the number of
distinct primitively rooted quartics is $O(n^2 \log^2 n)$. Here, we prove that
the number of distinct general quartics is also $O(n^2 \log^2 n)$. This extends
the rich combinatorial study of the number of distinct squares in a 1D-string,
which was initiated by Fraenkel and Simpson (J. Comb. Theory A 1998), to two
dimensions.
Finally, we show some algorithmic applications of 2D-runs. (Abstract
shortened due to arXiv requirements.) Comment: To appear in the ESA 2020 proceedings.
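As a small illustration of the general notion of a quartic described in this abstract (a 2D-string made of a two-by-two arrangement of identical blocks), the following Python sketch tests whether a given 2D-string is a quartic. It is a brute-force check written for clarity and is not an algorithm from the paper; the function name `is_quartic` and the representation of a 2D-string as a list of equal-length rows are assumptions.

```python
def is_quartic(grid):
    """Return True if grid (a list of equal-length strings) consists of
    a 2 x 2 arrangement of identical blocks."""
    rows = len(grid)
    cols = len(grid[0]) if rows else 0
    # Both dimensions must be positive and even to split into 2 x 2 blocks.
    if rows == 0 or cols == 0 or rows % 2 or cols % 2:
        return False
    if any(len(row) != cols for row in grid):
        return False
    h, w = rows // 2, cols // 2
    # Every cell must equal the corresponding cell of the top-left block.
    return all(
        grid[i][j] == grid[i % h][j % w]
        for i in range(rows)
        for j in range(cols)
    )
```

For example, the 4 x 4 grid with rows "abab", "cdcd", "abab", "cdcd" is a quartic whose block is the 2 x 2 grid "ab" / "cd". The check does not require the block to be primitive, matching the more general notion used here.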
A linearly computable measure of string complexity
We present a measure of string complexity, called I-complexity, computable in linear time and space. It counts the number of different substrings in a given string. The least complex strings are the runs of a single symbol, the most complex are the de Bruijn strings. Although the I-complexity of a string is not the length of any minimal description of the string, it satisfies many basic properties of classical description complexity. In particular, the number of strings with I-complexity up to a given value is bounded, and most strings of each length have high I-complexity.
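The two extremes named in this abstract can be seen with a brute-force count of distinct substrings. The Python sketch below is only an illustration of that underlying quantity: it is not the I-complexity measure itself, whose exact definition is not given here, and it takes quadratic time or worse rather than the linear time stated above. The function name `distinct_substrings` is an assumption.

```python
def distinct_substrings(s):
    """Count the distinct non-empty substrings of s by brute force."""
    seen = set()
    for i in range(len(s)):
        for j in range(i + 1, len(s) + 1):
            seen.add(s[i:j])
    return len(seen)
```

A run of a single symbol such as "aaaa" has only 4 distinct substrings (one per length), whereas the binary string "0011", which contains three of the four 2-bit words as substrings, already has 8.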
Faster Online Elastic Degenerate String Matching
An Elastic-Degenerate String [Iliopoulos et al., LATA 2017] is a sequence of sets of strings, which was recently proposed as a way to model a set of similar sequences. We give an online algorithm for the Elastic-Degenerate String Matching (EDSM) problem that runs in $O(nm\sqrt{m \log m} + N)$ time and $O(m)$ working space, where $n$ is the number of elastic degenerate segments of the text, $N$ is the total length of all strings in the text, and $m$ is the length of the pattern. This improves the previous algorithm by Grossi et al. [CPM 2017] that runs in $O(nm^2 + N)$ time.
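To show what matching against a sequence of sets of strings involves, here is a straightforward online Python sketch. It is not the algorithm of this paper or of Grossi et al., and it makes no attempt to meet their time bounds. It assumes one common formulation of the problem: report the index of every segment in which an occurrence of a non-empty pattern ends, where the text is read by choosing one string from each segment. The name `eds_match` and the use of Python sets for segments are assumptions.

```python
def eds_match(segments, pattern):
    """Return indices of segments in which an occurrence of pattern ends.

    segments: iterable of collections of strings (the empty string is allowed).
    pattern:  non-empty string.
    """
    m = len(pattern)
    # Lengths p (0 < p < m) such that pattern[:p] ends exactly at the
    # boundary just before the current segment.
    active = set()
    ends = []
    for idx, segment in enumerate(segments):
        new_active = set()
        found = False
        for s in segment:
            # Case 1: the whole pattern lies inside one string.
            if pattern in s:
                found = True
            # Case 2: continue a partial match from earlier segments.
            for p in active:
                rest = pattern[p:]
                if s.startswith(rest):
                    found = True              # the occurrence ends inside s
                elif rest.startswith(s):
                    new_active.add(p + len(s))  # s is consumed entirely
            # Case 3: a suffix of s starts a new partial match.
            for k in range(1, min(len(s), m - 1) + 1):
                if s.endswith(pattern[:k]):
                    new_active.add(k)
        if found:
            ends.append(idx)
        active = new_active
    return ends
```

For the text `[{"ab", "a"}, {"c", ""}, {"d", "bd"}]` and the pattern "abd", the sketch reports segment 2, reached for instance by choosing "ab", then the empty string, then "d". Only the set of active prefix lengths is carried between segments, which is why the working space stays proportional to the pattern length.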