Linear pattern matching on sparse suffix trees

Author: Kolpakov Roman
Kucherov Gregory
Starikovskaya Tatiana
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 14/03/2011

Packing several characters into one computer word is a simple and natural way to compress the representation of a string and to speed up its processing. Exploiting this idea, we propose an index for a packed string, based on a sparse suffix tree [KU-96] with appropriately defined suffix links. Assuming, under the standard unit-cost RAM model, that a word can store up to $\log_{\sigma} n$ characters ($\sigma$ being the alphabet size), our index takes $O(n/\log_{\sigma} n)$ space, i.e. the same space as the packed string itself. The resulting pattern matching algorithm runs in time $O(m + r^2 + r \cdot occ)$, where $m$ is the length of the pattern, $r$ is the actual number of characters stored in a word, and $occ$ is the number of pattern occurrences.
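
To make the packing idea concrete, here is a minimal sketch of storing a string over an alphabet of size $\sigma$ with $\lceil \log_2 \sigma \rceil$ bits per character, so that $r$ characters share one word and the string occupies about $n/r$ words. The fixed 64-bit word and the helper names `pack` and `unpack` are illustrative assumptions, not the paper's code; the index itself (the sparse suffix tree with suffix links) is not shown.

```python
import math

def pack(s, alphabet, word_bits=64):
    """Hypothetical helper: pack s into word_bits-bit integers, r characters per word."""
    code = {c: i for i, c in enumerate(alphabet)}
    bits = max(1, math.ceil(math.log2(len(alphabet))))  # bits per character
    r = word_bits // bits                                 # characters per word
    words = []
    for start in range(0, len(s), r):
        chunk = s[start:start + r]
        w = 0
        for c in chunk:
            w = (w << bits) | code[c]
        # Left-align a short final chunk so every word has the same layout:
        # the first character of a word always sits in its most significant slot.
        w <<= bits * (r - len(chunk))
        words.append(w)
    return words, r, bits

def unpack(words, n, alphabet, r, bits):
    """Hypothetical helper: recover the first n characters from the packed words."""
    mask = (1 << bits) - 1
    out = []
    for w in words:
        for k in range(r):
            shift = bits * (r - 1 - k)
            out.append(alphabet[(w >> shift) & mask])
    return "".join(out[:n])
```

For a DNA string of length 100 over `"ACGT"`, each character takes 2 bits, a 64-bit word holds $r = 32$ characters, and the whole string fits in 4 words.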

arXiv.org e-Print Archive

CiteSeerX

Hal-Diderot

HAL-Ecole des Ponts ParisTech

HAL - UPEC / UPEM
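
The core difficulty of a sparse index is that it only stores suffixes starting at every $r$-th position, so an occurrence starting at an unsampled position $i$ must be found through the next sampled position $p = i + j$, where $j = (-i) \bmod r$: the suffix at $p$ starts with $P[j..]$, and the $j$ characters before $p$ must equal $P[..j]$. The sketch below illustrates only that shift-and-verify idea with a swapped-in sparse suffix *array* and naive checks; it is not the paper's sparse-suffix-tree algorithm and does not reach its $O(m + r^2 + r \cdot occ)$ bound. The function names and the restriction $m \ge r$ (which guarantees $P[j..]$ is never empty) are assumptions of this sketch.

```python
from bisect import bisect_left

def build_sparse_index(text, r):
    # Hypothetical sparse suffix array: only suffixes starting at multiples of r.
    sampled = sorted(range(0, len(text), r), key=lambda p: text[p:])
    keys = [text[p:] for p in sampled]  # materialized suffixes, fine for a small demo
    return sampled, keys

def prefix_matches(sampled, keys, q):
    # Sampled positions whose suffix starts with q form a contiguous sorted range.
    k = bisect_left(keys, q)
    out = []
    while k < len(keys) and keys[k].startswith(q):
        out.append(sampled[k])
        k += 1
    return out

def find_all(text, pattern, r):
    m = len(pattern)
    if m < r:
        raise ValueError("this sketch assumes len(pattern) >= r")
    sampled, keys = build_sparse_index(text, r)
    occ = []
    for j in range(r):
        # Occurrences i with (i + j) % r == 0 reach the sampled position p = i + j.
        for p in prefix_matches(sampled, keys, pattern[j:]):
            i = p - j
            if i >= 0 and text[i:p] == pattern[:j]:
                occ.append(i)
    return sorted(occ)
```

Each occurrence is reported exactly once, because its shift $j$ is determined by $i \bmod r$. The $r$ separate searches mirror, very loosely, why the factor $r$ appears in front of $occ$ in the paper's bound.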

The Number of Repetitions in 2D-Strings

Author: Charalampopoulos Panagiotis
Radoszewski Jakub
Rytter Wojciech
Waleń Tomasz
Zuba Wiktor
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 28th Annual European Symposium on Algorithms (ESA 2020)
Publication date: 01/01/2020

The notions of periodicity and repetitions in strings, and hence those of runs and squares, naturally extend to two-dimensional strings. We consider two types of repetitions in 2D-strings: 2D-runs and quartics (quartics are a 2D-version of squares in standard strings). Amir et al. introduced 2D-runs, showed that there are $O(n^3)$ of them in an $n \times n$ 2D-string, and presented a simple construction giving a lower bound of $\Omega(n^2)$ for their number (TCS 2020). We make a significant step towards closing the gap between these bounds by showing that the number of 2D-runs in an $n \times n$ 2D-string is $O(n^2 \log^2 n)$. In particular, our bound implies that the $O(n^2 \log n + \textsf{output})$ run-time of the algorithm of Amir et al. for computing 2D-runs is also $O(n^2 \log^2 n)$. We expect this result to allow for exploiting 2D-runs algorithmically in the area of 2D pattern matching. A quartic is a 2D-string composed of $2 \times 2$ identical blocks (2D-strings). Quartics were introduced by Apostolico and Brimkov (TCS 2000), who by quartics meant only primitively rooted quartics, i.e. those built of a primitive block. Here our notion of quartics is more general and analogous to that of squares in 1D-strings. Apostolico and Brimkov showed that there are $O(n^2 \log^2 n)$ occurrences of primitively rooted quartics in an $n \times n$ 2D-string and that this bound is attainable. Consequently, the number of distinct primitively rooted quartics is $O(n^2 \log^2 n)$. Here, we prove that the number of distinct general quartics is also $O(n^2 \log^2 n)$. This extends the rich combinatorial study of the number of distinct squares in a 1D-string, which was initiated by Fraenkel and Simpson (J. Comb. Theory A 1998), to two dimensions. Finally, we show some algorithmic applications of 2D-runs. (Abstract shortened due to arXiv requirements.)

Comment: To appear in the ESA 2020 proceedings
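
The general notion of a quartic used above (a $2a \times 2b$ 2D-string made of four copies of one $a \times b$ block, with no primitivity requirement) can be checked directly. The brute-force sketch below counts distinct quartics in a small 2D-string by testing every even-sized subarray; it only illustrates the definition, runs in roughly $O(n^6)$ time, and is unrelated to the paper's proof techniques. The function names are hypothetical.

```python
def is_quartic(A):
    # A is a list of equal-length rows. It is a quartic iff it has even height 2a
    # and even width 2b and every cell equals the matching cell of the top-left
    # a x b block, i.e. A consists of 2 x 2 copies of that block.
    h, w = len(A), len(A[0])
    if h % 2 or w % 2:
        return False
    a, b = h // 2, w // 2
    return all(A[i][j] == A[i % a][j % b] for i in range(h) for j in range(w))

def distinct_quartics(T):
    # Enumerate every even-sized subarray of T and keep the distinct quartics.
    n_rows, n_cols = len(T), len(T[0])
    seen = set()
    for top in range(n_rows):
        for left in range(n_cols):
            for h in range(2, n_rows - top + 1, 2):
                for w in range(2, n_cols - left + 1, 2):
                    sub = tuple(tuple(T[top + i][left:left + w]) for i in range(h))
                    if is_quartic(sub):
                        seen.add(sub)
    return len(seen)
```

For example, the $4 \times 4$ all-`a` 2D-string contains exactly four distinct quartics (of sizes $2 \times 2$, $2 \times 4$, $4 \times 2$ and $4 \times 4$), even though most of them occur several times.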

arXiv.org e-Print Archive

Dagstuhl Research Online Publication Server

A linearly computable measure of string complexity

Author: Becher Verónica
Heiber Pablo Ariel
Publication venue: Elsevier B.V.
Publication date: 01/01/2012

We present a measure of string complexity, called I-complexity, computable in linear time and space. It counts the number of different substrings in a given string. The least complex strings are the runs of a single symbol, the most complex are the de Bruijn strings. Although the I-complexity of a string is not the length of any minimal description of the string, it satisfies many basic properties of classical description complexity. In particular, the number of strings with I-complexity up to a given value is bounded, and most strings of each length have high I-complexity.
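
The abstract describes I-complexity as counting the different substrings of a string; its precise definition is given in the paper and is not reproduced here. As an illustration of the underlying quantity only, the sketch below computes the raw number of distinct non-empty substrings with the standard identity $n(n+1)/2 - \sum_k \mathrm{LCP}[k]$ over a suffix array. The naive suffix sort used here is not linear; a linear-time construction (for example SA-IS) would be needed to match the paper's complexity claim. The function names are hypothetical.

```python
def suffix_array(s):
    # Naive O(n^2 log n) construction, kept simple for illustration.
    return sorted(range(len(s)), key=lambda i: s[i:])

def lcp_array(s, sa):
    # Kasai et al.: lcp[k] is the longest common prefix of suffixes sa[k-1] and sa[k].
    n = len(s)
    rank = [0] * n
    for k, i in enumerate(sa):
        rank[i] = k
    lcp = [0] * n
    h = 0
    for i in range(n):
        if rank[i] > 0:
            j = sa[rank[i] - 1]
            while i + h < n and j + h < n and s[i + h] == s[j + h]:
                h += 1
            lcp[rank[i]] = h
            if h > 0:
                h -= 1
        else:
            h = 0
    return lcp

def count_distinct_substrings(s):
    # Every suffix contributes its length, minus the prefix shared with its
    # lexicographic predecessor (those substrings were already counted).
    n = len(s)
    return n * (n + 1) // 2 - sum(lcp_array(s, suffix_array(s)))
```

A run of one symbol of length $n$ has exactly $n$ distinct substrings (one per length), the minimum possible, which matches the abstract's remark that such runs are the least complex strings: `count_distinct_substrings("aaaa")` is 4, while `count_distinct_substrings("abab")` is 7.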

CiteSeerX

Faster Online Elastic Degenerate String Matching

Author: Aoyama Kotaro
Bannai Hideo
I Tomohiro
Inenaga Shunsuke
Nakashima Yuto
Takeda Masayuki
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. Annual Symposium on Combinatorial Pattern Matching (CPM 2018)
Publication date: 01/01/2018

An Elastic-Degenerate String [Iliopoulos et al., LATA 2017] is a sequence of sets of strings, which was recently proposed as a way to model a set of similar sequences. We give an online algorithm for the Elastic-Degenerate String Matching (EDSM) problem that runs in $O(nm\sqrt{m \log m} + N)$ time and $O(m)$ working space, where $n$ is the number of elastic-degenerate segments of the text, $N$ is the total length of all strings in the text, and $m$ is the length of the pattern. This improves on the previous algorithm by Grossi et al. [CPM 2017], which runs in $O(nm^2 + N)$ time.
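
To show what "online" EDSM means, here is a naive sketch that reads the text one segment at a time and keeps only a set of lengths $k < m$ such that the pattern prefix $P[..k]$ can end at the current segment boundary; this set has at most $m - 1$ entries, in the spirit of the $O(m)$ working space above. It reports the indices of segments in which some occurrence of the pattern ends (an output convention assumed here). This is a swapped-in brute-force method, far slower than the paper's algorithm; segments are represented as lists of strings, the empty string stands for $\varepsilon$, and the name `edsm` is hypothetical.

```python
def edsm(segments, P):
    m = len(P)
    active = set()  # lengths k (0 < k < m) with P[:k] ending at the last boundary
    hits = []
    for i, seg in enumerate(segments):
        found = False
        new_active = set()
        for S in seg:
            # (a) an occurrence lying entirely inside S
            if P in S:
                found = True
            # (b) extend partial matches carried over from earlier segments
            for k in active:
                rest = P[k:]
                if S.startswith(rest):
                    found = True                  # occurrence ends inside S
                elif rest.startswith(S):
                    new_active.add(k + len(S))    # S is consumed entirely (covers S == "")
            # (c) start a new partial match on a suffix of S
            for k in range(1, min(len(S), m - 1) + 1):
                if S.endswith(P[:k]):
                    new_active.add(k)
        if found:
            hits.append(i)
        active = new_active
    return hits
```

For the text `[["AC"], ["G", "T"], ["A", ""], ["CA"]]`, which spells ACGACA, ACGCA, ACTACA and ACTCA, the pattern `CGA` is reported in segment 2 and `TC` in segment 3 (via the $\varepsilon$ choice in segment 2).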

Dagstuhl Research Online Publication Server