Search CORE

23 research outputs found

Words and forbidden factors

Author: Mignosi F.
Restivo A.
Sciortino M.
Publication venue: Elsevier Science B.V.
Publication date: 28/02/2002

Given a finite or infinite word v, we consider the set M(v) of minimal forbidden factors of v. We show that the set M(v) is of fundamental importance in determining the structure of the word v. In the case of a finite word w, we consider two parameters that are related to the size of M(w): the first counts the minimal forbidden factors of w, and the second gives the length of the longest minimal forbidden factor of w. We derive sharp upper and lower bounds for both parameters. We also prove that the second parameter is related to the minimal period of the word w. We are further interested in the algorithmic point of view. Indeed, we design linear-time algorithms for the following two problems: (i) given w, construct the set M(w) and, conversely, (ii) given M(w), reconstruct the word w. In the case of an infinite word x, we consider the following two functions: g_x, which counts, for each n, the allowed factors of x of length n, and f_x, which counts, for each n, the minimal forbidden factors of x of length n. We address the following general problem: what information about the structure of x can be derived from the pair (g_x, f_x)? We prove that these two functions characterize, up to the automorphism exchanging the two letters, the language of factors of each single infinite Sturmian word.
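To make M(w) and its two size parameters concrete, here is a brute-force sketch in Python. It is not the linear-time algorithm of the paper; it simply enumerates candidates. It assumes the alphabet is passed explicitly, so that letters absent from w count as minimal forbidden factors of length 1. The helper names are hypothetical.

```python
def factors(w: str) -> set[str]:
    """All factors (contiguous substrings) of w, including the empty word."""
    return {w[i:j] for i in range(len(w) + 1) for j in range(i, len(w) + 1)}


def minimal_forbidden_factors(w: str, alphabet: str) -> set[str]:
    """Brute-force M(w) over the given alphabet (roughly cubic time, not linear).

    A letter is minimal forbidden iff it does not occur in w. A longer word
    a+u+b is minimal forbidden iff a+u and u+b are factors of w but a+u+b is
    not: the factors of w are closed under taking factors, so checking the
    two longest proper factors covers every proper factor.
    """
    fact = factors(w)
    mff = {c for c in alphabet if c not in fact}
    for u in fact:
        for a in alphabet:
            if a + u not in fact:
                continue
            for b in alphabet:
                if u + b in fact and a + u + b not in fact:
                    mff.add(a + u + b)
    return mff


def mff_parameters(w: str, alphabet: str) -> tuple[int, int]:
    """The two parameters from the abstract: |M(w)| and the maximal length in M(w)."""
    mff = minimal_forbidden_factors(w, alphabet)
    return len(mff), max((len(x) for x in mff), default=0)
```

For example, `minimal_forbidden_factors("abaab", "ab")` yields `{"bb", "aaa", "bab", "aaba"}`, so `mff_parameters("abaab", "ab")` is `(4, 4)`.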

Elsevier - Publisher Connector

A Characterization of Bispecial Sturmian Words

Author: A. Carpi
A. de Luca
E.M. Coven
E.P. Lipatov
F. Mignosi
G. Fici
G.H. Hardy
J. Berstel
M. Crochemore
M. Morse
M. Sciortino
M.-P. Béal
S. Dulucq
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2012

A finite Sturmian word w over the alphabet {a,b} is left special (resp. right special) if aw and bw (resp. wa and wb) are both Sturmian words. A bispecial Sturmian word is a Sturmian word that is both left and right special. We show as a main result that bispecial Sturmian words are exactly the maximal internal factors of Christoffel words, which are words coding the digital approximations of segments in the Euclidean plane. This result is an extension of the known relation between central words and primitive Christoffel words. Our characterization allows us to give an enumerative formula for bispecial Sturmian words. We also investigate the minimal forbidden words for the set of Sturmian words. Comment: Accepted to MFCS 201
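The definitions above can be checked by brute force on short words. This Python sketch takes "Sturmian" to mean a finite balanced binary word (any two equal-length factors differ by at most one occurrence of each letter), builds lower Christoffel words with the usual floor formula, and assumes that "Christoffel words" covers both lower and upper ones, the upper word being the reversal of the lower. All function names are illustrative.

```python
from itertools import product


def is_balanced(w: str) -> bool:
    """True if any two equal-length factors of w differ by at most one 'a'."""
    prefix = [0]  # prefix[i] = number of 'a's in w[:i]
    for ch in w:
        prefix.append(prefix[-1] + (ch == "a"))
    for length in range(1, len(w) + 1):
        counts = [prefix[i + length] - prefix[i] for i in range(len(w) - length + 1)]
        if max(counts) - min(counts) > 1:
            return False
    return True


def is_bispecial(w: str) -> bool:
    """Left special (aw, bw Sturmian) and right special (wa, wb Sturmian).
    Balancedness of w itself follows from that of aw."""
    return all(is_balanced(x) for x in ("a" + w, "b" + w, w + "a", w + "b"))


def lower_christoffel(p: int, q: int) -> str:
    """Lower Christoffel word with p letters 'a' and q letters 'b' (p, q >= 1)."""
    n = p + q
    return "".join("b" if (i + 1) * q // n > i * q // n else "a" for i in range(n))


def christoffel_internal_factors(max_len: int) -> set[str]:
    """Words left after deleting the first and last letter of a lower or upper
    Christoffel word, for internal factors of length at most max_len."""
    result = set()
    for n in range(2, max_len + 3):
        for p in range(1, n):
            lower = lower_christoffel(p, n - p)
            result.add(lower[1:-1])
            result.add(lower[::-1][1:-1])  # upper Christoffel word
    return result


def bispecial_words(max_len: int) -> set[str]:
    """All bispecial Sturmian words over {a, b} of length at most max_len."""
    return {
        "".join(t)
        for m in range(max_len + 1)
        for t in product("ab", repeat=m)
        if is_bispecial("".join(t))
    }
```

On small lengths the two sets coincide, as the characterization predicts: `bispecial_words(4) == christoffel_internal_factors(4)`, with 19 words in total. The non-palindromic word "ba" appears only because the non-primitive Christoffel word "abab" is included.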

arXiv.org e-Print Archive

Crossref

Archivio istituzionale della ricerca - Università di Palermo

Linear-time Computation of Minimal Absent Words Using Suffix Array

Author: Barton Carl
Heliou Alice
Mouchard Laurent
Pissis Solon P.
Publication date: 01/01/2014

An absent word of a word y of length n is a word that does not occur in y. It is a minimal absent word if all its proper factors occur in y. Minimal absent words have been computed in genomes of organisms from all domains of life; their computation provides a fast alternative for measuring approximation in sequence comparison. There exists an O(n)-time and O(n)-space algorithm for computing all minimal absent words on a fixed-sized alphabet based on the construction of suffix automata (Crochemore et al., 1998). No implementation of this algorithm is publicly available. There also exists an O(n^2)-time and O(n)-space algorithm for the same problem based on the construction of suffix arrays (Pinho et al., 2009). An implementation of this algorithm was also provided by the authors and is currently the fastest available. In this article, we bridge this unpleasant gap by presenting an O(n)-time and O(n)-space algorithm for computing all minimal absent words based on the construction of suffix arrays. Experimental results using real and synthetic data show that the respective implementation outperforms the one by Pinho et al.
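The basic suffix-array primitive behind such methods is an occurrence test. The following Python sketch builds the suffix array naively by sorting (not in O(n), and not the article's algorithm), answers "does x occur in y?" by binary search, and uses two such queries to test whether a candidate x is a minimal absent word. The function names are hypothetical.

```python
def suffix_array(y: str) -> list[int]:
    """Start positions of the suffixes of y in lexicographic order.
    Naive construction by sorting (about O(n^2 log n)), not a linear-time one."""
    return sorted(range(len(y)), key=lambda i: y[i:])


def occurs(x: str, y: str, sa: list[int]) -> bool:
    """True if x is a factor of y: binary-search the first suffix >= x and
    check whether that suffix starts with x."""
    if not x:
        return True
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if y[sa[mid]:] < x:
            lo = mid + 1
        else:
            hi = mid
    return lo < len(sa) and y[sa[lo]:].startswith(x)


def is_minimal_absent(x: str, y: str, sa: list[int]) -> bool:
    """x is absent from y, but its longest proper prefix and longest proper
    suffix both occur (which implies that every proper factor of x occurs)."""
    return bool(x) and not occurs(x, y, sa) and occurs(x[:-1], y, sa) and occurs(x[1:], y, sa)
```

For y = "abaab" the suffix array is `[2, 3, 0, 4, 1]`; "aaba" and "bb" are minimal absent words, while "abb" is absent but not minimal, since its factor "bb" is already absent.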

arXiv.org e-Print Archive

HAL - Normandie Université

Crossref

INRIA a CCSD electronic archive server

PubMed Central

King's Research Portal

HAL-Polytechnique

Optimal Computation of Avoided Words

Author: A Akalin
C Acquisti
C Barton
D Belazzougui
DB Searls
F Mignosi
I Rusinov
M Crochemore
P Gawrychowski
RN Mantegna
V Brendel
Publication date: 29/04/2016

The deviation of the observed frequency of a word $w$ from its expected frequency in a given sequence $x$ is used to determine whether or not the word is avoided. This concept is particularly useful in DNA linguistic analysis. The value of the standard deviation of $w$, denoted by $std(w)$, effectively characterises the extent of a word by its edge contrast in the context in which it occurs. A word $w$ of length $k>2$ is a $\rho$-avoided word in $x$ if $std(w) \leq \rho$, for a given threshold $\rho < 0$. Notice that such a word may be completely absent from $x$. Hence computing all such words naïvely can be a very time-consuming procedure, in particular for large $k$. In this article, we propose an $O(n)$-time and $O(n)$-space algorithm to compute all $\rho$-avoided words of length $k$ in a given sequence $x$ of length $n$ over a fixed-sized alphabet. We also present a time-optimal $O(\sigma n)$-time and $O(\sigma n)$-space algorithm to compute all $\rho$-avoided words (of any length) in a sequence of length $n$ over an alphabet of size $\sigma$. Furthermore, we provide a tight asymptotic upper bound for the number of $\rho$-avoided words and the expected length of the longest one. We make available an open-source implementation of our algorithm. Experimental results, using both real and synthetic data, show the efficiency of our implementation.
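The abstract does not spell out how $std(w)$ is computed. The Python sketch below assumes the commonly used estimate $E(w) = f(w[0..k-2]) \cdot f(w[1..k-1]) / f(w[1..k-2])$ (taken as 0 when the middle factor is absent) and $std(w) = (f(w) - E(w)) / \sqrt{\max\{E(w), 1\}}$, where $f$ counts possibly overlapping occurrences in $x$. Treat the formula as an assumption. The code evaluates one word at a time and is not the article's $O(n)$ algorithm.

```python
import math


def count_occurrences(w: str, x: str) -> int:
    """Number of (possibly overlapping) occurrences of w in x."""
    return sum(1 for i in range(len(x) - len(w) + 1) if x[i:i + len(w)] == w)


def std(w: str, x: str) -> float:
    """Assumed standard-deviation score of a word w with |w| = k > 2 in x."""
    if len(w) <= 2:
        raise ValueError("std(w) is defined here only for words of length k > 2")
    middle = count_occurrences(w[1:-1], x)
    expected = 0.0
    if middle:
        expected = count_occurrences(w[:-1], x) * count_occurrences(w[1:], x) / middle
    return (count_occurrences(w, x) - expected) / math.sqrt(max(expected, 1.0))


def is_rho_avoided(w: str, x: str, rho: float) -> bool:
    """w is rho-avoided in x if std(w) <= rho, for a threshold rho < 0."""
    return std(w, x) <= rho
```

In x = "abzabzbczbcz", both "ab" and "bc" occur twice and "b" four times, so E("abc") = 1 while "abc" never occurs, giving std("abc") = -1.0. The word "abc" is therefore $\rho$-avoided for $\rho = -0.5$, even though it is absent from x.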

arXiv.org e-Print Archive

Crossref

King's Research Portal

On the Structure of Bispecial Sturmian Words

Author: Fici Gabriele
Publication venue: Elsevier BV
Publication date: 19/11/2013

A balanced word is one in which any two factors of the same length contain the same number of each letter of the alphabet up to one. Finite binary balanced words are called Sturmian words. A Sturmian word is bispecial if it can be extended to the left and to the right with both letters remaining a Sturmian word. There is a deep relation between bispecial Sturmian words and Christoffel words, which are the digital approximations of Euclidean segments in the plane. In 1997, J. Berstel and A. de Luca proved that \emph{palindromic} bispecial Sturmian words are precisely the maximal internal factors of \emph{primitive} Christoffel words. We extend this result by showing that bispecial Sturmian words are precisely the maximal internal factors of \emph{all} Christoffel words. Our characterization allows us to give an enumerative formula for bispecial Sturmian words. We also investigate the minimal forbidden words for the language of Sturmian words. Comment: arXiv admin note: substantial text overlap with arXiv:1204.167

arXiv.org e-Print Archive

Crossref

Archivio istituzionale della ricerca - Università di Palermo

Minimal Forbidden Factors of Circular Words

Author: AJ Pinho
C Barton
D Belazzougui
F Mignosi
G Fici
M Béal
M Crochemore
S Chairungsee
Publication date: 01/01/2017

Minimal forbidden factors are a useful tool for investigating properties of words and languages. Two factorial languages are distinct if and only if they have different (antifactorial) sets of minimal forbidden factors. There exist algorithms for computing the minimal forbidden factors of a word, as well as of a regular factorial language. Conversely, Crochemore et al. [IPL, 1998] gave an algorithm that, given the trie recognizing a finite antifactorial language $M$, computes a DFA recognizing the language whose set of minimal forbidden factors is $M$. In the same paper, they showed that the obtained DFA is minimal if the input trie recognizes the minimal forbidden factors of a single word. We generalize this result to the case of a circular word. We discuss several combinatorial properties of the minimal forbidden factors of a circular word. As a byproduct, we obtain a formal definition of the factor automaton of a circular word. Finally, we investigate the case of minimal forbidden factors of the circular Fibonacci words. Comment: To appear in Theoretical Computer Science

arXiv.org e-Print Archive

Crossref

Archivio istituzionale della ricerca - Università di Palermo

Suffix conjugates for a class of morphic subshifts

Author: B. Mossé
C. Holton
F. Mignosi
J. Shallit
J.D. Currie
K. Klouda
V. Canterini
Publication venue: Cambridge University Press (CUP)
Publication date: 01/01/2013

Let $A$ be a finite alphabet and $f\colon A^* \to A^*$ be a morphism with an iterative fixed point $f^\omega(\alpha)$, where $\alpha \in A$. Consider the subshift $(X, T)$, where $X$ is the shift orbit closure of $f^\omega(\alpha)$ and $T\colon X \to X$ is the shift map. Let $S$ be a finite alphabet that is in bijective correspondence via a mapping $c$ with the set of nonempty suffixes of the images $f(a)$ for $a \in A$. Let $\mathcal{S} \subseteq S^{\mathbb{N}}$ be the set of infinite words $s = (s_n)_{n\geq 0}$ such that $\pi(s) := c(s_0)\, f(c(s_1))\, f^2(c(s_2)) \cdots$ is in $X$. We show that if $f$ is primitive and $f(A)$ is a suffix code, then there exists a mapping $H\colon \mathcal{S} \to \mathcal{S}$ such that $(\mathcal{S}, H)$ is a topological dynamical system and $\pi\colon (\mathcal{S}, H) \to (X, T)$ is a conjugacy; we call $(\mathcal{S}, H)$ the suffix conjugate of $(X, T)$. In the special case when $f$ is the Fibonacci or the Thue-Morse morphism, we show that the subshift $(\mathcal{S}, T)$ is sofic, that is, the language of $\mathcal{S}$ is regular.
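To see how $\pi$ turns a sequence of suffixes into a word, the following Python sketch evaluates $c(s_0)\, f(c(s_1))\, f^2(c(s_2)) \cdots$ on a finite prefix of $s$. It assumes the Fibonacci morphism in the form a -> ab, b -> a, and it represents each symbol of $S$ directly by the suffix $c(s_n)$ it stands for, so no separate mapping $c$ is needed. The names are illustrative only.

```python
FIBONACCI = {"a": "ab", "b": "a"}  # assumed form of the Fibonacci morphism


def apply_morphism(f: dict[str, str], word: str) -> str:
    """Image of a word under the morphism f, letter by letter."""
    return "".join(f[c] for c in word)


def suffix_alphabet(f: dict[str, str]) -> list[str]:
    """Nonempty suffixes of the images f(a); each one plays the role of a symbol of S."""
    return sorted({img[i:] for img in f.values() for i in range(len(img))})


def pi_prefix(s: list[str], f: dict[str, str]) -> str:
    """c(s_0) f(c(s_1)) f^2(c(s_2)) ... truncated to the finite sequence s."""
    parts = []
    for n, suffix in enumerate(s):
        image = suffix
        for _ in range(n):  # apply f exactly n times to the n-th suffix
            image = apply_morphism(f, image)
        parts.append(image)
    return "".join(parts)
```

With this morphism, S has the three symbols "a", "ab" and "b" (f(A) = {ab, a} is a suffix code), and `pi_prefix(["a", "ab", "a"], FIBONACCI)` evaluates to "a" + "aba" + "aba" = "aabaaba".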

arXiv.org e-Print Archive

Crossref

Cyclic Complexity of Words

Author: Cassaigne Julien
Fici Gabriele
Sciortino Marinella
Zamboni Luca Q.
Publication date: 28/06/2016

We introduce and study a complexity function on words $c_x(n)$, called \emph{cyclic complexity}, which counts the number of conjugacy classes of factors of length $n$ of an infinite word $x$. We extend the well-known Morse-Hedlund theorem to the setting of cyclic complexity by showing that a word is ultimately periodic if and only if it has bounded cyclic complexity. Unlike most complexity functions, cyclic complexity distinguishes between Sturmian words of different slopes. We prove that if $x$ is a Sturmian word and $y$ is a word having the same cyclic complexity as $x$, then, up to renaming letters, $x$ and $y$ have the same set of factors. In particular, $y$ is also Sturmian of slope equal to that of $x$. Since $c_x(n)=1$ for some $n\geq 1$ implies that $x$ is periodic, it is natural to consider the quantity $\liminf_{n\rightarrow \infty} c_x(n)$. We show that if $x$ is a Sturmian word, then $\liminf_{n\rightarrow \infty} c_x(n)=2$. We prove, however, that this is not a characterization of Sturmian words by exhibiting a restricted class of Toeplitz words, including the period-doubling word, which also satisfy this same condition on the limit infimum. In contrast, we show that for the Thue-Morse word $t$, $\liminf_{n\rightarrow \infty} c_t(n)=+\infty$.

Comment: To appear in Journal of Combinatorial Theory, Series
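Cyclic complexity can be approximated from a finite prefix by collapsing each factor to a canonical conjugate. The Python sketch below uses the lexicographically least rotation as that representative. On a finite prefix it gives a lower bound for $c_x(n)$, which is exact once the prefix contains every factor of length $n$. The Fibonacci word is generated here as the fixed point of the assumed morphism a -> ab, b -> a. The function names are hypothetical.

```python
def canonical_rotation(u: str) -> str:
    """Lexicographically least rotation: one representative per conjugacy class."""
    return min((u[i:] + u[:i] for i in range(len(u))), default=u)


def cyclic_complexity(prefix: str, n: int) -> int:
    """Number of conjugacy classes among the length-n factors of a finite prefix."""
    return len({canonical_rotation(prefix[i:i + n]) for i in range(len(prefix) - n + 1)})


def fibonacci_prefix(length: int) -> str:
    """Prefix of the Fibonacci word, the fixed point of a -> ab, b -> a."""
    w = "a"
    while len(w) < length:
        w = "".join("ab" if c == "a" else "a" for c in w)
    return w[:length]
```

For the periodic word (ab)^ω, the length-2 factors "ab" and "ba" are conjugate, so `cyclic_complexity("ab" * 20, 2)` is 1. For the Sturmian Fibonacci word, the five length-4 factors abaa, aaba, baab, abab and baba fall into three classes, so `cyclic_complexity(fibonacci_prefix(200), 4)` is 3.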

arXiv.org e-Print Archive

HAL-Ecole des Ponts ParisTech

Archivio istituzionale della ricerca - Università di Palermo

HAL - UPEC / UPEM

Correlations of minimal forbidden factors of the Fibonacci word

Author: Rampersad Narad
Wiebe Max
Publication date: 13/09/2023

If $u$ and $v$ are two words, the correlation of $u$ over $v$ is a binary word that encodes all possible overlaps between $u$ and $v$. This concept was introduced by Guibas and Odlyzko as a key element of their method for counting the words of length $n$ over a given alphabet that avoid a given set of forbidden factors. In this paper we characterize the pairwise correlations between the minimal forbidden factors of the infinite Fibonacci word. Comment: 11 pages
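As an illustration of the correlation, here is a small Python sketch. It places $v$ so that it starts under position $i$ of $u$ and sets bit $i$ to 1 when the overlapping parts agree, which is the Guibas-Odlyzko convention as we understand it. The function name is illustrative.

```python
def correlation(u: str, v: str) -> str:
    """Correlation of u over v: bit i is '1' iff, with v starting under position i
    of u, the overlapping parts of u and v are identical."""
    bits = []
    for i in range(len(u)):
        overlap = min(len(u) - i, len(v))  # length of the overlapping part
        bits.append("1" if u[i:i + overlap] == v[:overlap] else "0")
    return "".join(bits)
```

For example, the autocorrelation of "abab" is "1010", because the word overlaps itself at shifts 0 and 2. The correlation of "ab" over "ba" is "01", since only the final "b" of "ab" matches the first letter of "ba".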

arXiv.org e-Print Archive

Minimal Absent Words in Rooted and Unrooted Trees

Author: B Schieber
C Barton
D Belazzougui
D Belazzougui
F Mignosi
F Mignosi
F Mignosi
G Fici
G Fici
M Béal
M Béal
M Crochemore
M Crochemore
M Crochemore
M-P Béal
MA Bender
P Charalampopoulos
P Charalampopoulos
RM Silva
S Chairungsee
T Shibuya
Y Almirantis
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2019
Field of study

We extend the theory of minimal absent words to (rooted and unrooted) trees, having edges labeled by letters from an alphabet of cardinality. We show that the set of minimal absent words of a rooted (resp. unrooted) tree T with n nodes has cardinality (resp.), and we show that these bounds are realized. Then, we exhibit algorithms to compute all minimal absent words in a rooted (resp. unrooted) tree in output-sensitive time (resp. assuming an integer alphabet of size polynomial in n

arXiv.org e-Print Archive

Crossref

Archivio istituzionale della ricerca - Università di Palermo