5 research outputs found

Minimal Absent Words in Rooted and Unrooted Trees

Author: B Schieber
C Barton
D Belazzougui
F Mignosi
G Fici
M Béal
M Crochemore
M-P Béal
MA Bender
P Charalampopoulos
RM Silva
S Chairungsee
T Shibuya
Y Almirantis
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2019

We extend the theory of minimal absent words to (rooted and unrooted) trees, having edges labeled by letters from an alphabet of cardinality. We show that the set of minimal absent words of a rooted (resp. unrooted) tree T with n nodes has cardinality (resp.), and we show that these bounds are realized. Then, we exhibit algorithms to compute all minimal absent words in a rooted (resp. unrooted) tree in output-sensitive time (resp.), assuming an integer alphabet of size polynomial in n.

arXiv.org e-Print Archive

Crossref

Archivio istituzionale della ricerca - Università di Palermo

Constructing Antidictionaries of Long Texts in Output-Sensitive Space

Author: Ayad Lorraine A. K.
Badkobeh Golnaz
Fici Gabriele
Heliou Alice
Pissis Solon P.
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 14/12/2020

A word x that is absent from a word y is called minimal if all its proper factors occur in y. Given a collection of k words y_1, …, y_k over an alphabet Σ, we are asked to compute the set M^ℓ_{y_1,…,y_k} of minimal absent words of length at most ℓ of the collection {y_1, …, y_k}. The set M^ℓ_{y_1,…,y_k} contains all the words x such that x is absent from all the words of the collection while there exist i, j such that the maximal proper suffix of x is a factor of y_i and the maximal proper prefix of x is a factor of y_j. In data compression, this corresponds to computing the antidictionary of k documents. In bioinformatics, it corresponds to computing words that are absent from a genome of k chromosomes. Indeed, the set M^ℓ_y of minimal absent words of a word y is equal to M^ℓ_{y_1,…,y_k} for any decomposition of y into a collection of words y_1, …, y_k such that there is an overlap of length at least ℓ − 1 between any two consecutive words in the collection. This computation generally requires Ω(n) space for n = |y| using any of the many available O(n)-time algorithms. This is because an Ω(n)-sized text index is constructed over y, which can be impractical for large n. We do the identical computation incrementally using output-sensitive space. This goal is reasonable when ∥M^ℓ_{y_1,…,y_N}∥ = o(n), for all N ∈ [1, k], where ∥S∥ denotes the sum of the lengths of words in set S. For instance, in the human genome, n ≈ 3 × 10^9 but ∥M^12_{y_1,…,y_k}∥ ≈ 10^6. We consider a constant-sized alphabet for stating our results. We show that all M^ℓ_{y_1}, …, M^ℓ_{y_1,…,y_k} can be computed in O(kn + Σ_{N=1}^{k} ∥M^ℓ_{y_1,…,y_N}∥) total time using O(MaxIn + MaxOut) space, where MaxIn is the length of the longest word in {y_1, …, y_k} and MaxOut = max{∥M^ℓ_{y_1,…,y_N}∥ : N ∈ [1, k]}. Proof-of-concept experimental results are also provided, confirming our theoretical findings and justifying our contribution.
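To make the definition in this abstract concrete, here is a minimal brute-force sketch in Python. It is not the output-sensitive algorithm of the paper: it simply enumerates all factors of length at most ℓ and tests the prefix/suffix condition directly. The function names and the default choice of alphabet (the letters occurring in the input) are assumptions made for illustration.

```python
def factors(words, max_len):
    """Return every factor (substring) of length <= max_len of any word,
    together with the empty word."""
    found = {""}
    for y in words:
        for i in range(len(y)):
            for j in range(i + 1, min(len(y), i + max_len) + 1):
                found.add(y[i:j])
    return found


def minimal_absent_words(words, max_len, alphabet=None):
    """Naive computation of the minimal absent words of length <= max_len
    of a collection of words (illustrative only, not the paper's method)."""
    if alphabet is None:
        # Assumption: the alphabet is the set of letters seen in the input.
        alphabet = sorted(set("".join(words)))
    present = factors(words, max_len)
    maws = set()
    for prefix in present:
        if len(prefix) >= max_len:
            continue
        for letter in alphabet:
            x = prefix + letter
            # x is a MAW if it is absent while its maximal proper prefix
            # (here `prefix`) and maximal proper suffix x[1:] both occur.
            if x not in present and x[1:] in present:
                maws.add(x)
    return maws
```

For the word abaab with ℓ = 4 this yields {aaa, aaba, bab, bb}. Splitting abaab into aba and baab (an overlap of ℓ − 1 = 2 letters for ℓ = 3) gives the same result as the undivided word for ℓ = 3, in line with the decomposition property stated in the abstract.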

Goldsmiths Research Online

VU Research Portal

CWI's Institutional Repository

INRIA a CCSD electronic archive server

Brunel University Research Archive

HAL-Polytechnique

Archivio istituzionale della ricerca - Università di Palermo

Linear-time Computation of DAWGs, Symmetric Indexing Structures, and MAWs for Integer Alphabets

Author: Bannai Hideo
Fujishige Yuta
Inenaga Shunsuke
Takeda Masayuki
Tsujimaru Yuki
Publication venue
Publication date: 03/07/2023

The directed acyclic word graph (DAWG) of a string y of length n is the smallest (partial) DFA which recognizes all suffixes of y with only O(n) nodes and edges. In this paper, we show how to construct the DAWG for the input string y from the suffix tree for y, in O(n) time for integer alphabets of polynomial size in n. In so doing, we first describe a folklore algorithm which, given the suffix tree for y, constructs the DAWG for the reversed string of y in O(n) time. Then, we present our algorithm that builds the DAWG for y in O(n) time for integer alphabets, from the suffix tree for y. We also show that a straightforward modification to our DAWG construction algorithm leads to the first O(n)-time algorithm for constructing the affix tree of a given string y over an integer alphabet. Affix trees are a text indexing structure supporting bidirectional pattern searches. We then discuss how our constructions can lead to linear-time algorithms for building other text indexing structures, such as linear-size suffix tries and symmetric CDAWGs in linear time in the case of integer alphabets. As a further application to our O(n)-time DAWG construction algorithm, we show that the set MAW(y) of all minimal absent words (MAWs) of y can be computed in optimal, input- and output-sensitive O(n + |MAW(y)|) time and O(n) working space for integer alphabets.

Comment: This is an extended version of the paper "Computing DAWGs and Minimal Absent Words in Linear Time for Integer Alphabets" from MFCS 201

arXiv.org e-Print Archive

Internal Shortest Absent Word Queries in Constant Time and Linear Space

Author: Badkobeh Golnaz
Charalampopoulos Panagiotis
Kosolobov Dmitry
Pissis Solon
Publication venue: HAL CCSD
Publication date: 05/07/2021

Given a string T of length n over an alphabet Σ ⊂ {1, 2, …, n^{O(1)}} of size σ, we are to preprocess T so that given a range [i, j], we can return a representation of a shortest string over Σ that is absent in the fragment T[i] ⋯ T[j] of T. We present an O(n)-space data structure that answers such queries in constant time and can be constructed in O(n log_σ n) time.
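The query described in this abstract can be illustrated with a naive baseline in Python. This sketch does no preprocessing and is not the constant-time data structure of the paper: it scans the fragment for each candidate length. The function name, the 0-based inclusive indexing, and passing the alphabet as a string are assumptions made for illustration.

```python
from itertools import product


def shortest_absent_word(text, i, j, alphabet):
    """Return a shortest word over `alphabet` that does not occur in
    text[i..j] (0-based, inclusive). Brute force, for illustration only."""
    fragment = text[i:j + 1]
    length = 1
    while True:
        # All factors of the current length that occur in the fragment.
        seen = {fragment[k:k + length]
                for k in range(len(fragment) - length + 1)}
        # Try candidates in lexicographic order of the given alphabet.
        for letters in product(alphabet, repeat=length):
            candidate = "".join(letters)
            if candidate not in seen:
                return candidate
        length += 1
```

The loop always terminates, because a fragment of length m has at most m − L + 1 distinct factors of length L, while there are σ^L candidate words of that length.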

INRIA a CCSD electronic archive server

Querying and Efficiently Searching Large, Temporal Text Corpora

Author: Willkomm Jens
Publication venue: KIT-Bibliothek, Karlsruhe
Publication date: 21/10/2021

KITopen