14,850 research outputs found

Avoiding Abelian powers in binary words with bounded Abelian complexity

Author: Cassaigne Julien
Richomme Gwénaël
Saari Kalle
Zamboni Luca Q.
Publication date: 14/05/2010

The notion of Abelian complexity of infinite words was recently used by the three last authors to investigate various Abelian properties of words. In particular, using van der Waerden's theorem, they proved that if a word avoids Abelian $k$-powers for some integer $k$, then its Abelian complexity is unbounded. This suggests the following question: How frequently do Abelian $k$-powers occur in a word having bounded Abelian complexity? In particular, does every uniformly recurrent word having bounded Abelian complexity begin in an Abelian $k$-power? While this is true for various classes of uniformly recurrent words, including for example the class of all Sturmian words, in this paper we show the existence of uniformly recurrent binary words, having bounded Abelian complexity, which admit an infinite number of suffixes which do not begin in an Abelian square. We also show that the shift orbit closure of any infinite binary overlap-free word contains a word which avoids Abelian cubes in the beginning. We also consider the effect of morphisms on Abelian complexity and show that the morphic image of a word having bounded Abelian complexity has bounded Abelian complexity. Finally, we give an open problem on avoidability of Abelian squares in infinite binary words and show that it is equivalent to a well-known open problem of Pirillo-Varricchio and Halbeisen-Hungerbühler. Comment: 16 pages, submitte
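The two notions in this abstract can be made concrete on finite prefixes. The following is a minimal sketch, not taken from the paper: it assumes the usual definitions (the Abelian complexity at $n$ counts the distinct letter-count vectors of length-$n$ factors, and an Abelian $k$-power is $k$ consecutive equal-length blocks that are anagrams of one another). The function names and the choice of the Thue-Morse word as test input are illustrative assumptions.

```python
from collections import Counter


def abelian_complexity(word: str, n: int) -> int:
    """Count distinct letter-count (Parikh) vectors among length-n factors of a finite word."""
    vectors = {
        tuple(sorted(Counter(word[i:i + n]).items()))
        for i in range(len(word) - n + 1)
    }
    return len(vectors)


def begins_with_abelian_power(word: str, k: int) -> bool:
    """True if some prefix of word consists of k equal-length blocks that are anagrams."""
    for block in range(1, len(word) // k + 1):
        first = Counter(word[:block])
        if all(
            Counter(word[j * block:(j + 1) * block]) == first
            for j in range(1, k)
        ):
            return True
    return False


def thue_morse_prefix(length: int) -> str:
    """Prefix of the Thue-Morse word: letter i is the parity of the binary digit sum of i."""
    return "".join(str(bin(i).count("1") % 2) for i in range(length))


tm = thue_morse_prefix(64)
print([abelian_complexity(tm, n) for n in range(1, 7)])
print(begins_with_abelian_power(tm, 2))
```

On a finite prefix these values only bound the behaviour of the infinite word from below, so the sketch is useful for experiments rather than proofs.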

arXiv.org e-Print Archive

HAL-UJM

HAL AMU

Hal-Diderot

The distribution of word matches between Markovian sequences with periodic boundary conditions

Author: Burden Conrad J
Foret Sylvain
Leopardi Paul
Publication venue: Mary Ann Liebert Inc
Publication date: 01/01/2014

Word match counts have traditionally been proposed as an alignment-free measure of similarity for biological sequences. The D2 statistic, which simply counts the number of exact word matches between two sequences, is a useful test bed for developing rigorous mathematical results, which can then be extended to more biologically useful measures. The distributional properties of the D2 statistic under the null hypothesis of identically and independently distributed letters have been studied extensively, but no comprehensive study of the D2 distribution for biologically more realistic higher-order Markovian sequences exists. Here we derive exact formulas for the mean and variance of the D2 statistic for Markovian sequences of any order, and demonstrate through Monte Carlo simulations that the entire distribution is accurately characterized by a Pólya-Aeppli distribution for sequence lengths of biological interest. The approach is novel in that Markovian dependency is defined for sequences with periodic boundary conditions, and this enables exact analytic formulas for the mean and variance to be derived. We also carry out a preliminary comparison between the approximate D2 distribution computed with the theoretical mean and variance under a Markovian hypothesis and an empirical D2 distribution from the human genome
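As an illustration of the D2 statistic described above (the number of exact word matches between two sequences), here is a minimal sketch. The function names and the `periodic` flag are assumptions made for this example; the flag wraps each sequence around so that every position starts a word, which is one simple reading of periodic boundary conditions and is not claimed to be the authors' exact construction. Sequences are assumed to be at least `k` letters long.

```python
from collections import Counter


def word_counts(seq: str, k: int, periodic: bool = False) -> Counter:
    """Count every length-k word in seq; with periodic=True the sequence wraps around."""
    if periodic:
        seq = seq + seq[:k - 1]  # append the first k-1 letters so each position starts a word
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def d2(a: str, b: str, k: int, periodic: bool = False) -> int:
    """Number of pairs (i, j) such that the k-word at i in a equals the k-word at j in b."""
    counts_a = word_counts(a, k, periodic)
    counts_b = word_counts(b, k, periodic)
    return sum(count * counts_b[word] for word, count in counts_a.items())


print(d2("ACGTACGT", "CGTACG", 3))
print(d2("ACGT", "ACGT", 2, periodic=True))
```

Counting words once per sequence and multiplying the counts avoids comparing every pair of positions directly.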

PubMed Central

The Australian National University

The Number of Ternary Words Avoiding Abelian Cubes Grows Exponentially

Author: Aberkane Ali
Currie James
Rampersad Narad
Publication date: 19/06/2004

We show that the number of ternary words of length $n$ avoiding abelian cubes grows faster than $r^n$, where $r = 2^{1/24}$. NSERC cs.uwaterloo.ca/journals/JIS/VOL7/Currie/currie18.pd
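For small $n$, the quantity in this abstract can be counted by brute force and compared with $r^n$. The sketch below is only an illustration under the standard definition (an abelian cube is three consecutive equal-length blocks that are anagrams of one another); it is not the authors' method, and the function names are hypothetical.

```python
from collections import Counter


def ends_with_abelian_cube(word: str) -> bool:
    """True if some suffix of word is three consecutive blocks that are anagrams."""
    n = len(word)
    for block in range(1, n // 3 + 1):
        first = Counter(word[n - 3 * block:n - 2 * block])
        second = Counter(word[n - 2 * block:n - block])
        third = Counter(word[n - block:])
        if first == second == third:
            return True
    return False


def count_avoiding(n: int, alphabet: str = "012") -> int:
    """Count words of length n over alphabet that contain no abelian cube as a factor."""
    def extend(word: str) -> int:
        if len(word) == n:
            return 1
        total = 0
        for letter in alphabet:
            candidate = word + letter
            # Only cubes ending at the new letter need checking: shorter prefixes were already clean.
            if not ends_with_abelian_cube(candidate):
                total += extend(candidate)
        return total

    return extend("")


r = 2 ** (1 / 24)
for n in range(1, 9):
    print(n, count_avoiding(n), r ** n)
```

The search is exponential in $n$, so it is practical only for short words.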

WinnSpace Repository

A note on palindromicity

Author: Baake Michael
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/1999

Two results on palindromicity of bi-infinite words in a finite alphabet are presented. The first is a simple, but efficient criterion to exclude palindromicity of minimal sequences and applies, in particular, to the Rudin-Shapiro sequence. The second provides a constructive method to build palindromic minimal sequences based upon regular, generic model sets with centro-symmetric window. These give rise to diagonal tight-binding models in one dimension with purely singular continuous spectrum. Comment: 12 page

arXiv.org e-Print Archive

CiteSeerX

Crossref

Multidimensional extension of the Morse--Hedlund theorem

Author: Durand Fabien
Rigo Michel
Publication date: 01/01/2012

A celebrated result of Morse and Hedlund, stated in 1938, asserts that a sequence $x$ over a finite alphabet is ultimately periodic if and only if, for some $n$, the number of different factors of length $n$ appearing in $x$ is less than $n+1$. Attempts to extend this fundamental result, for example, to higher dimensions, have been considered during the last fifteen years. Let $d\ge 2$. A legitimate extension to a multidimensional setting of the notion of periodicity is to consider sets of $\mathbb{Z}^d$ definable by a first order formula in Presburger arithmetic. With this latter notion and using a powerful criterion due to Muchnik, we exhibit a complete extension of the Morse--Hedlund theorem to an arbitrary dimension $d$ and characterize sets of $\mathbb{Z}^d$ definable in Presburger arithmetic in terms of some functions counting recurrent blocks, that is, blocks occurring infinitely often.
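The one-dimensional statement can be explored numerically. The following minimal sketch counts distinct factors of finite prefixes; the function names and the two example words (a periodic word and the Fibonacci word, a standard Sturmian example) are assumptions chosen for illustration. A finite prefix gives only a lower bound on the factor count of the infinite word.

```python
def factor_complexity(word: str, n: int) -> int:
    """Number of distinct length-n factors occurring in a finite word."""
    return len({word[i:i + n] for i in range(len(word) - n + 1)})


def fibonacci_word(length: int) -> str:
    """Prefix of the Fibonacci word, built from f0 = '0', f1 = '01', f(i) = f(i-1) + f(i-2)."""
    previous, current = "0", "01"
    while len(current) < length:
        previous, current = current, current + previous
    return current[:length]


periodic = "011" * 20
aperiodic = fibonacci_word(200)

for n in range(1, 6):
    # The periodic word drops below n + 1 factors; the Fibonacci prefix shows exactly n + 1.
    print(n, factor_complexity(periodic, n), factor_complexity(aperiodic, n))
```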

arXiv.org e-Print Archive

CiteSeerX

Open Repository and Bibliography - Liège

Hal-Diderot

Universal Compressed Text Indexing

Author: Navarro Gonzalo
Prezza Nicola
Publication date: 06/09/2018

The rise of repetitive datasets has lately generated a lot of interest in compressed self-indexes based on dictionary compression, a rich and heterogeneous family that exploits text repetitions in different ways. For each such compression scheme, several different indexing solutions have been proposed in the last two decades. To date, the fastest indexes for repetitive texts are based on the run-length compressed Burrows-Wheeler transform and on the Compact Directed Acyclic Word Graph. The most space-efficient indexes, on the other hand, are based on the Lempel-Ziv parsing and on grammar compression. Indexes for more universal schemes such as collage systems and macro schemes have not yet been proposed. Very recently, Kempa and Prezza [STOC 2018] showed that all dictionary compressors can be interpreted as approximation algorithms for the smallest string attractor, that is, a set of text positions capturing all distinct substrings. Starting from this observation, in this paper we develop the first universal compressed self-index, that is, the first indexing data structure based on string attractors, which can therefore be built on top of any dictionary-compressed text representation. Let $\gamma$ be the size of a string attractor for a text of length $n$. Our index takes $O(\gamma\log(n/\gamma))$ words of space and supports locating the $occ$ occurrences of any pattern of length $m$ in $O(m\log n + occ\log^{\epsilon}n)$ time, for any constant $\epsilon>0$. This is, in particular, the first index for general macro schemes and collage systems. Our result shows that the relation between indexing and compression is much deeper than what was previously thought: the simple property standing at the core of all dictionary compressors is sufficient to support fast indexed queries. Comment: Fixed with reviewer's comment
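The string attractor property mentioned in this abstract (a set of text positions such that every distinct substring has an occurrence crossing at least one of them) can be checked directly on short texts. The sketch below is a brute-force verifier written for illustration only; it is unrelated to the index construction in the paper, uses 0-based positions as an assumption, and its function name is hypothetical.

```python
from typing import Iterable


def is_string_attractor(text: str, positions: Iterable[int]) -> bool:
    """Check that every distinct substring of text has an occurrence covering a chosen position."""
    chosen = set(positions)
    all_substrings = set()
    covered = set()
    for start in range(len(text)):
        for end in range(start + 1, len(text) + 1):
            substring = text[start:end]
            all_substrings.add(substring)
            # This occurrence spans positions start .. end - 1 (0-based, inclusive).
            if any(start <= p < end for p in chosen):
                covered.add(substring)
    return covered == all_substrings


print(is_string_attractor("abab", {0, 1}))
print(is_string_attractor("abab", {0}))
```

The check enumerates all occurrences of all substrings, so its cost grows roughly cubically with the text length; it is meant for sanity checks on toy inputs.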

arXiv.org e-Print Archive

Archivio istituzionale della ricerca - Università degli Studi di Venezia Ca' Foscari

Repositorio Académico de la Universidad de Chile

Archivio della ricerca- LUISS Libera Università Internazionale degli Studi Sociali Guido Carli di Roma