Search CORE

8 research outputs found

Faster Online Computation of the Succinct Longest Previous Factor Array

Author: Prezza N.
Rosone G.
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2020
Field of study

We consider the problem of computing online the Longest Previous Factor array LPF[1, n] of a text T of length n. For each, LPF[i] stores the length of the longest factor of T with at least two occurrences, one ending at i and the other at a previous position. We present an improvement over the previous solution by Okanohara and Sadakane (ESA 2008): our solution uses less space (compressed instead of succinct) and runs in time, thus being faster by a logarithmic factor. As a by-product, we also obtain the first online algorithm computing the Longest Common Suffix (LCS) array (that is, the LCP array of the reversed text) in time and compressed space. We also observe that the LPF array can be represented succinctly in 2n bits. Our online algorithm computes directly the succinct LPF and LCS arrays

Archivio della Ricerca - Università di Pisa

Longest Unbordered Factor in Quasilinear Time

Author: Kociumaka Tomasz
Kundu Ritu
Mohamed Manal
Pissis Solon P.
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 29th International Symposium on Algorithms and Computation (ISAAC 2018)
Publication date: 01/01/2018
Field of study

A border u of a word w is a proper factor of w occurring both as a prefix and as a suffix. The maximal unbordered factor of w is the longest factor of w which does not have a border. Here an O(n log n)-time with high probability (or O(n log n log^2 log n)-time deterministic) algorithm to compute the Longest Unbordered Factor Array of w for general alphabets is presented, where n is the length of w. This array specifies the length of the maximal unbordered factor starting at each position of w. This is a major improvement on the running time of the currently best worst-case algorithm working in O(n^{1.5}) time for integer alphabets [Gawrychowski et al., 2015]

arXiv.org e-Print Archive

Dagstuhl Research Online Publication Server

Substring Complexity in Sublinear Space

Author: and Solon P. Pissis.
Gabriele Fici
Giulia Bernardini
Paweł Gawrychowski
Publication venue: Schloss-Dagstuhl - Leibniz Zentrum für Informatik
Publication date: 28/11/2023
Field of study

Shannon’s entropy is a definitive lower bound for statistical compression. Unfortunately, no such clear measure exists for the compressibility of repetitive strings. Thus, ad hoc measures are employed to estimate the repetitiveness of strings, e.g., the size z of the Lempel–Ziv parse or the number r of equal-letter runs of the Burrows-Wheeler transform. A more recent one is the size γ of a smallest string attractor. Let T be a string of length n. A string attractor of T is a set of positions of T capturing the occurrences of all the substrings of T. Unfortunately, Kempa and Prezza [STOC 2018] showed that computing γ is NP-hard. Kociumaka et al. [LATIN 2020] considered a new measure of compressibility that is based on the function S_T(k) counting the number of distinct substrings of length k of T, also known as the substring complexity of T. This new measure is defined as δ = sup{S_T(k)/k, k ≥ 1} and lower bounds all the relevant ad hoc measures previously considered. In particular, δ ≤ γ always holds and δ can be computed in O(n) time using Θ(n) working space. Kociumaka et al. showed that one can construct an O(δ log n/(δ))-sized representation of T supporting efficient direct access and efficient pattern matching queries on T. Given that for highly compressible strings, δ is significantly smaller than n, it is natural to pose the following question: Can we compute δ efficiently using sublinear working space? It is straightforward to show that in the comparison model, any algorithm computing δ using O(b) space requires Ω(n^{2-o(1)}/b) time through a reduction from the element distinctness problem [Yao, SIAM J. Comput. 1994]. We thus wanted to investigate whether we can indeed match this lower bound. We address this algorithmic challenge by showing the following bounds to compute δ: - O((n3log b)/b2) time using O(b) space, for any b ∈ [1,n], in the comparison model. - Õ(n2/b) time using Õ(b) space, for any b ∈ [√n,n], in the word RAM model. This gives an Õ(n^{1+ε})-time and Õ(n^{1-ε})-space algorithm to compute δ, for any 0 < ε ≤ 1/2. Let us remark that our algorithms compute S_T(k), for all k, within the same complexities

Archivio istituzionale della ricerca - Università di Palermo

Maintaining the size of LZ77 on semi-dynamic strings

Author: Bannai H.
Charalampopoulos Panagiotis
Radoszewski J.
Publication venue: Dagstuhl Publishing
Publication date: 18/06/2024
Field of study

We consider the problem of maintaining the size of the LZ77 factorization of a string S of length at most n under the following operations: (a) appending a given letter to S and (b) deleting the first letter of S. Our main result is an algorithm for this problem with amortized update time Õ(√n). As a corollary, we obtain an Õ(n√n)-time algorithm for computing the most LZ77-compressible rotation of a length-n string - a naive approach for this problem would compute the LZ77 factorization of each possible rotation and would thus take quadratic time in the worst case. We also show an Ω(√n) lower bound for the additive sensitivity of LZ77 with respect to the rotation operation. Our algorithm employs dynamic trees to maintain the longest-previous-factor array information and depends on periodicity-based arguments that bound the number of the required updates and enable their efficient computation

Birkbeck Institutional Research Online

29th International Symposium on Algorithms and Computation: ISAAC 2018, December 16-19, 2018, Jiaoxi, Yilan, Taiwan

Author: ISAAC <29. 2018, Jiaoxi, Yilan>
Publication venue: Schloss Dagstuhl - Leibniz-Zentrum für Informatik GmbH, Dagstuhl Publishing
Publication date: 01/12/2018
Field of study

Digitale Bibliothek Thüringen

Computing the Longest Previous Factor

Author: Crochemore Maxime
Ilie Lucian
Iliopoulos Costas,
Kubica Marcin
Rytter Wojciech
Walen Tomasz
Publication venue: 'Elsevier BV'
Publication date: 01/01/2013
Field of study

International audienceThe Longest Previous Factor array gives, for each position i in a string y, the length of the longest factor (substring) of y that occurs both at i and to the left of i in y. The Longest Previous Factor array is central in many text compression techniques as well as in the most efficient algorithms for detecting motifs and repetitions occurring in a text. Computing the Longest Previous Factor array requires usually the Suffix Array and the Longest Common Prefix array. We give the first time-space optimal algorithm that computes the Longest Previous Factor array, given the Suffix Array and the Longest Common Prefix arrays. We also give the first linear-time algorithm that computes the permutation that applied to the Longest Common Prefix array produces the Longest Previous Factor array

Elsevier - Publisher Connector

Crossref

King's Research Portal

Hal-Diderot

HAL-Ecole des Ponts ParisTech

HAL - UPEC / UPEM

Computing longest previous factor in linear time and applications

Author: Crochemore M
Ilie L
Publication venue
Publication date: 01/01/2008
Field of study

We give two optimal linear-time algorithms for computing the Longest Previous Factor (LPF) array corresponding to a string w. For any position i in w, LPF[i] gives the length of the longest factor of w starting at position i that occurs previously in w. Several properties and applications of LPF are investigated. They include computing the Lempel-Ziv factorization of a string and detecting all repetitions (runs) in a string in linear time independently of the integer alphabet size. (C) 2007 Elsevier B.V. All rights reserve

CiteSeerX

King's Research Portal

Hal-Diderot

HAL-Ecole des Ponts ParisTech

HAL - UPEC / UPEM