Almost overlap-free words and the word problem for the free Burnside semigroup satisfying x^2=x^3
In this paper we investigate the word problem of the free Burnside semigroup
satisfying x^2=x^3 and having two generators. Elements of this semigroup are
classes of equivalent words. A natural way to solve the word problem is to
select a unique "canonical" representative for each equivalence class. We prove
that overlap-free words and so-called almost overlap-free words (this notion is
some generalization of the notion of overlap-free words) can serve as canonical
representatives for corresponding equivalence classes. We show that such a word
in a given class, if any, can be efficiently found. As a result, we construct a
linear-time algorithm that partially solves the word problem for the semigroup
under consideration.
Comment: 33 pages, submitted to Internat. J. of Algebra and Compu
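As a small illustration of the central notion, the sketch below tests whether a finite word is overlap-free. It assumes the standard definition (an overlap is a factor of the form axaxa, where a is a letter and x is a possibly empty word), which the abstract does not spell out. It is a brute-force check, not the linear-time algorithm of the paper, and the function name `has_overlap` is hypothetical.

```python
def has_overlap(w):
    """Return True if the word w contains a factor of the form axaxa.

    Assumed definition: an overlap with period p is a factor of length
    2p + 1 whose letters repeat with period p, i.e. p + 1 consecutive
    positions i with w[i] == w[i + p].
    """
    n = len(w)
    for p in range(1, n // 2 + 1):
        run = 0  # length of the current run of matches w[i] == w[i + p]
        for i in range(n - p):
            if w[i] == w[i + p]:
                run += 1
                if run > p:  # p + 1 matches in a row: overlap found
                    return True
            else:
                run = 0
    return False


if __name__ == "__main__":
    for word in ["aa", "aaa", "ababa", "0110100110010110"]:
        print(word, has_overlap(word))
```

The check runs in quadratic time in the length of the word, which is enough for experimenting with short words.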
Subword complexity and power avoidance
We begin a systematic study of the relations between subword complexity of
infinite words and their power avoidance. Among other things, we show that
-- the Thue-Morse word has the minimum possible subword complexity over all
overlap-free binary words and all (7/3)-power-free binary words, but not
over all (7/3)⁺-power-free binary words;
-- the twisted Thue-Morse word has the maximum possible subword complexity
over all overlap-free binary words, but no word has the maximum subword
complexity over all (7/3)-power-free binary words;
-- if some word attains the minimum possible subword complexity over all
square-free ternary words, then one such word is the ternary Thue word;
-- the recently constructed 1-2-bonacci word has the minimum possible subword
complexity over all symmetric square-free ternary words.
Comment: 29 pages. Submitted to TC
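To make the quantity being compared concrete, here is a minimal sketch that generates a prefix of the Thue-Morse word and counts its distinct factors of a given length. It assumes the usual definitions: the i-th letter of the Thue-Morse word is the parity of the number of 1 bits in i, and the subword complexity at k is the number of distinct factors of length k. The function names are hypothetical, and counting on a finite prefix only gives a lower bound on the complexity of the infinite word.

```python
def thue_morse(n):
    """Return the first n letters of the Thue-Morse word as a 0/1 string."""
    # Letter i is the parity of the number of 1 bits in the binary form of i.
    return "".join(str(bin(i).count("1") % 2) for i in range(n))


def subword_complexity(w, k):
    """Count the distinct factors of length k in the finite word w."""
    return len({w[i:i + k] for i in range(len(w) - k + 1)})


if __name__ == "__main__":
    prefix = thue_morse(1024)
    for k in range(1, 9):
        print(k, subword_complexity(prefix, k))
```

For small k, a prefix of this length already contains every factor of the infinite word, so the printed counts can be compared against values for other overlap-free words.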
Similarity density of the Thue-Morse word with overlap-free infinite binary words
We consider a measure of similarity for infinite words that generalizes the
notion of asymptotic or natural density of subsets of natural numbers from
number theory. We show that every overlap-free infinite binary word, other than
the Thue-Morse word t and its complement t bar, has this measure of similarity
with t between 1/4 and 3/4. This is a partial generalization of a classical
1927 result of Mahler.
Comment: In Proceedings AFL 2014, arXiv:1405.527
Shortest Repetition-Free Words Accepted by Automata
We consider the following problem: given that a finite automaton M of N
states accepts at least one k-power-free (resp., overlap-free) word, what is
the length of the shortest such word accepted? We give upper and lower bounds
which, unfortunately, are widely separated.
Comment: 12 pages, conference pape
RasBhari: optimizing spaced seeds for database searching, read mapping and alignment-free sequence comparison
Many algorithms for sequence analysis rely on word matching or word
statistics. Often, these approaches can be improved if binary patterns
representing match and don't-care positions are used as a filter, such that
only those positions of words are considered that correspond to the match
positions of the patterns. The performance of these approaches, however,
depends on the underlying patterns. Herein, we show that the overlap complexity
of a pattern set that was introduced by Ilie and Ilie is closely related to the
variance of the number of matches between two evolutionarily related sequences
with respect to this pattern set. We propose a modified hill-climbing algorithm
to optimize pattern sets for database searching, read mapping and
alignment-free sequence comparison of nucleic-acid sequences; our
implementation of this algorithm is called rasbhari. Depending on the
application at hand, rasbhari can either minimize the overlap complexity of
pattern sets, maximize their sensitivity in database searching or minimize the
variance of the number of pattern-based matches in alignment-free sequence
comparison. We show that, for database searching, rasbhari generates pattern
sets with slightly higher sensitivity than existing approaches. In our Spaced
Words approach to alignment-free sequence comparison, pattern sets calculated
with rasbhari led to more accurate estimates of phylogenetic distances than the
randomly generated pattern sets that we previously used. Finally, we used
rasbhari to generate patterns for short read classification with CLARK-S. Here
too, the sensitivity of the results could be improved, compared to the default
patterns of the program. We integrated rasbhari into Spaced Words; the source
code of rasbhari is freely available at http://rasbhari.gobics.de
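The filtering idea described at the start of this abstract can be sketched as follows: a binary pattern marks match positions with 1 and don't-care positions with 0, and two windows count as a match when they agree on every match position. This is an illustrative sketch only, not the rasbhari or Spaced Words implementation; the function names and the string encoding of patterns are assumptions.

```python
from collections import Counter


def spaced_word(seq, pos, pattern):
    """Return the letters of seq at the match positions ('1') of pattern,
    for the window that starts at index pos."""
    return "".join(seq[pos + k] for k, c in enumerate(pattern) if c == "1")


def count_spaced_matches(s1, s2, pattern):
    """Count pairs (i, j) such that the windows of s1 at i and of s2 at j
    agree on every match position of pattern."""
    width = len(pattern)
    # Index all spaced words of s1, then look up each spaced word of s2.
    words1 = Counter(spaced_word(s1, i, pattern)
                     for i in range(len(s1) - width + 1))
    return sum(words1[spaced_word(s2, j, pattern)]
               for j in range(len(s2) - width + 1))


if __name__ == "__main__":
    # The middle position is a don't-care, so the mismatch C/T is ignored.
    print(count_spaced_matches("ACG", "ATG", "101"))
    # With a contiguous pattern the same two windows no longer match.
    print(count_spaced_matches("ACG", "ATG", "111"))
```

Repeating the count over a set of patterns gives the pattern-based match numbers whose variance the abstract refers to.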