14,850 research outputs found
Avoiding Abelian powers in binary words with bounded Abelian complexity
The notion of Abelian complexity of infinite words was recently used by the
three last authors to investigate various Abelian properties of words. In
particular, using van der Waerden's theorem, they proved that if a word avoids
Abelian $k$-powers for some integer $k$, then its Abelian complexity is
unbounded. This suggests the following question: How frequently do Abelian
$k$-powers occur in a word having bounded Abelian complexity? In particular,
does every uniformly recurrent word having bounded Abelian complexity begin in
an Abelian $k$-power? While this is true for various classes of uniformly
recurrent words, including for example the class of all Sturmian words, in this
paper we show the existence of uniformly recurrent binary words, having bounded
Abelian complexity, which admit an infinite number of suffixes which do not
begin in an Abelian square. We also show that the shift orbit closure of any
infinite binary overlap-free word contains a word which avoids Abelian cubes in
the beginning. We also consider the effect of morphisms on Abelian complexity
and show that the morphic image of a word having bounded Abelian complexity has
bounded Abelian complexity. Finally, we give an open problem on avoidability of
Abelian squares in infinite binary words and show that it is equivalent to a
well-known open problem of Pirillo-Varricchio and Halbeisen-Hungerbühler.
Comment: 16 pages, submitte
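The notions used in this abstract can be illustrated on finite words. The following is a minimal sketch, not the authors' method: the function names `parikh`, `abelian_complexity`, and `begins_with_abelian_power` are hypothetical, and the code works on finite prefixes only, so it can only approximate statements about infinite words.

```python
from collections import Counter


def parikh(word):
    """Parikh vector of a word, as a sorted tuple of (letter, count) pairs."""
    return tuple(sorted(Counter(word).items()))


def abelian_complexity(word, n):
    """Number of distinct Parikh vectors among the length-n factors of a finite word."""
    return len({parikh(word[i:i + n]) for i in range(len(word) - n + 1)})


def begins_with_abelian_power(word, k):
    """True if some prefix of word splits into k consecutive blocks of equal
    length that are all permutations of one another (an Abelian k-power)."""
    for length in range(1, len(word) // k + 1):
        blocks = [word[j * length:(j + 1) * length] for j in range(k)]
        if all(parikh(b) == parikh(blocks[0]) for b in blocks):
            return True
    return False


# A prefix of the Thue-Morse word, used here only as a small test input.
thue_morse_prefix = "0110100110010110"
print(abelian_complexity(thue_morse_prefix, 2))
print(begins_with_abelian_power(thue_morse_prefix, 2))
```

Two factors count as Abelian equivalent exactly when their Parikh vectors agree, which is why both functions compare letter counts rather than the factors themselves.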
The distribution of word matches between Markovian sequences with periodic boundary conditions
Word match counts have traditionally been proposed as an alignment-free measure of similarity for biological sequences. The D2 statistic, which simply counts the number of exact word matches between two sequences, is a useful test bed for developing rigorous mathematical results, which can then be extended to more biologically useful measures. The distributional properties of the D2 statistic under the null hypothesis of identically and independently distributed letters have been studied extensively, but no comprehensive study of the D2 distribution for biologically more realistic higher-order Markovian sequences exists. Here we derive exact formulas for the mean and variance of the D2 statistic for Markovian sequences of any order, and demonstrate through Monte Carlo simulations that the entire distribution is accurately characterized by a Pólya-Aeppli distribution for sequence lengths of biological interest. The approach is novel in that Markovian dependency is defined for sequences with periodic boundary conditions, and this enables exact analytic formulas for the mean and variance to be derived. We also carry out a preliminary comparison between the approximate D2 distribution computed with the theoretical mean and variance under a Markovian hypothesis and an empirical D2 distribution from the human genome
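As an illustration of the D2 statistic described above (the number of exact word matches between two sequences), here is a minimal sketch. The function name `d2` and the `periodic` flag are assumptions made for this example; in particular, wrapping words around the end of each sequence is only one possible reading of "periodic boundary conditions" and is not taken from the paper.

```python
from collections import Counter


def d2(seq_a, seq_b, k, periodic=False):
    """Count exact k-word matches between two sequences.

    Every pair (i, j) with seq_a[i:i+k] == seq_b[j:j+k] contributes 1.
    If periodic is True (an assumption of this sketch), each sequence is
    treated as circular, so words may wrap around its end.
    """
    if periodic:
        seq_a = seq_a + seq_a[:k - 1]
        seq_b = seq_b + seq_b[:k - 1]
    counts_a = Counter(seq_a[i:i + k] for i in range(len(seq_a) - k + 1))
    counts_b = Counter(seq_b[j:j + k] for j in range(len(seq_b) - k + 1))
    # Each word w contributes (occurrences in A) * (occurrences in B).
    return sum(count * counts_b[word] for word, count in counts_a.items())


print(d2("ACGT", "ACGA", 2))                 # non-periodic word matches
print(d2("ACGT", "TACG", 2, periodic=True))  # circular word matches
```

Counting words per sequence first and then multiplying the counts avoids comparing every pair of positions directly.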
The Number of Ternary Words Avoiding Abelian Cubes Grows Exponentially
We show that the number of ternary words of length n avoiding abelian cubes grows
faster than r^n, where r = 2^{1/24}.
NSERC
cs.uwaterloo.ca/journals/JIS/VOL7/Currie/currie18.pd
A note on palindromicity
Two results on palindromicity of bi-infinite words in a finite alphabet are
presented. The first is a simple, but efficient criterion to exclude
palindromicity of minimal sequences and applies, in particular, to the
Rudin-Shapiro sequence. The second provides a constructive method to build
palindromic minimal sequences based upon regular, generic model sets with
centro-symmetric window. These give rise to diagonal tight-binding models in
one dimension with purely singular continuous spectrum.
Comment: 12 page
Multidimensional extension of the Morse--Hedlund theorem
A celebrated result of Morse and Hedlund, stated in 1938, asserts that a
sequence $x$ over a finite alphabet is ultimately periodic if and only if, for
some $n$, the number of different factors of length $n$ appearing in $x$ is
less than $n+1$. Attempts to extend this fundamental result, for example, to
higher dimensions, have been considered during the last fifteen years. Let
$d \ge 2$. A legitimate extension to a multidimensional setting of the notion of
periodicity is to consider sets of $\ZZ^d$ definable by a first order formula
in the Presburger arithmetic $\langle \ZZ; <, + \rangle$. With this latter notion and using a
powerful criterion due to Muchnik, we exhibit a complete extension of the
Morse--Hedlund theorem to an arbitrary dimension $d$ and characterize sets of
$\ZZ^d$ definable in $\langle \ZZ; <, + \rangle$ in terms of some functions counting recurrent
blocks, that is, blocks occurring infinitely often.
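The one-dimensional Morse--Hedlund criterion can be checked numerically on finite prefixes. This is a minimal sketch under stated assumptions: `factor_complexity` and `fibonacci_word` are hypothetical helper names, and a finite prefix only gives a lower bound on the number of factors of the infinite word.

```python
def factor_complexity(word, n):
    """Number of distinct factors (contiguous blocks) of length n in a finite word."""
    return len({word[i:i + n] for i in range(len(word) - n + 1)})


def fibonacci_word(iterations):
    """Prefix of the Fibonacci word, built by iterating the morphism 0 -> 01, 1 -> 0."""
    word = "0"
    for _ in range(iterations):
        word = "".join("01" if c == "0" else "0" for c in word)
    return word


periodic = "ab" * 50          # periodic: the factor count stays bounded
sturmian = fibonacci_word(10) # aperiodic: expect n + 1 factors of length n

for n in range(1, 6):
    print(n, factor_complexity(periodic, n), factor_complexity(sturmian, n))
```

For the periodic word the count drops below $n+1$ as soon as $n \ge 2$, whereas the Fibonacci prefix keeps exactly $n+1$ factors for these small $n$, the smallest value the theorem allows for a word that is not ultimately periodic.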
Universal Compressed Text Indexing
The rise of repetitive datasets has lately generated a lot of interest in
compressed self-indexes based on dictionary compression, a rich and
heterogeneous family that exploits text repetitions in different ways. For each
such compression scheme, several different indexing solutions have been
proposed in the last two decades. To date, the fastest indexes for repetitive
texts are based on the run-length compressed Burrows-Wheeler transform and on
the Compact Directed Acyclic Word Graph. The most space-efficient indexes, on
the other hand, are based on the Lempel-Ziv parsing and on grammar compression.
Indexes for more universal schemes such as collage systems and macro schemes
have not yet been proposed. Very recently, Kempa and Prezza [STOC 2018] showed
that all dictionary compressors can be interpreted as approximation algorithms
for the smallest string attractor, that is, a set of text positions capturing
all distinct substrings. Starting from this observation, in this paper we
develop the first universal compressed self-index, that is, the first indexing
data structure based on string attractors, which can therefore be built on top
of any dictionary-compressed text representation. Let $\gamma$ be the size of a
string attractor for a text of length $n$. Our index takes $O(\gamma \log(n/\gamma))$
words of space and supports locating the $occ$
occurrences of any pattern of length $m$ in $O(m \log n + occ \log^{\epsilon} n)$
time, for any constant $\epsilon > 0$. This is, in particular, the first index
for general macro schemes and collage systems. Our result shows that the
relation between indexing and compression is much deeper than what was
previously thought: the simple property standing at the core of all dictionary
compressors is sufficient to support fast indexed queries.
Comment: Fixed with reviewer's comment
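To make the string attractor notion concrete, the following brute-force sketch checks whether a set of positions is an attractor, using the standard definition: every distinct substring must have at least one occurrence that covers an attractor position. The function name `is_string_attractor` is hypothetical, positions are 0-based here, and the check is only meant for tiny inputs. It is unrelated to the index construction in the paper.

```python
def is_string_attractor(text, positions):
    """Brute-force check that `positions` (0-based) is a string attractor of `text`.

    A substring is 'covered' if some occurrence text[i:j] contains an
    attractor position p with i <= p < j. The set is an attractor when
    every distinct substring is covered.
    """
    attractor = set(positions)
    n = len(text)
    all_substrings = set()
    covered = set()
    for i in range(n):
        for j in range(i + 1, n + 1):
            sub = text[i:j]
            all_substrings.add(sub)
            if any(i <= p < j for p in attractor):
                covered.add(sub)
    return covered == all_substrings


print(is_string_attractor("abab", {0}))     # "b" never crosses position 0
print(is_string_attractor("abab", {0, 1}))
```

A highly repetitive text such as "aaaa" needs only one position, which is the intuition behind using the attractor size as a measure of compressibility.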