368,500 research outputs found

Almost overlap-free words and the word problem for the free Burnside semigroup satisfying x^2=x^3

Authors: Plyushchenko A. N.; Shur A. M.
Publication date: 01/01/2011

In this paper we investigate the word problem for the free Burnside semigroup with two generators satisfying x^2=x^3. Elements of this semigroup are classes of equivalent words. A natural way to solve the word problem is to select a unique "canonical" representative for each equivalence class. We prove that overlap-free words and so-called almost overlap-free words (a generalization of overlap-free words) can serve as canonical representatives for their equivalence classes. We show that such a word in a given class, if one exists, can be found efficiently. As a result, we construct a linear-time algorithm that partially solves the word problem for the semigroup under consideration.
Comment: 33 pages, submitted to Internat. J. of Algebra and Compu

arXiv.org e-Print Archive

Institutional repository of Ural Federal University named after the first President of Russia B.N.Yeltsin
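
The abstract above relies on the notion of an overlap-free word. As a minimal sketch (not the paper's algorithm, and not a solver for the word problem), the following Python checks whether a finite word contains an overlap, taking an overlap to be a factor of length 2p + 1 with period p. The function names `has_overlap` and `thue_morse_prefix` are illustrative assumptions.

```python
def has_overlap(word: str) -> bool:
    """Return True if `word` contains a factor of length 2p + 1 with period p
    (the shape a x a x a for a letter a and a possibly empty word x)."""
    n = len(word)
    for p in range(1, (n - 1) // 2 + 1):
        run = 0  # consecutive positions i with word[i] == word[i + p]
        for i in range(n - p):
            if word[i] == word[i + p]:
                run += 1
                if run > p:  # p + 1 matches in a row span 2p + 1 letters
                    return True
            else:
                run = 0
    return False


def thue_morse_prefix(length: int) -> str:
    """Prefix of the Thue-Morse word over {a, b}: letter i is the parity
    of the number of 1 bits in the binary expansion of i."""
    return "".join("ab"[bin(i).count("1") % 2] for i in range(length))


print(has_overlap("abab"))                 # a square, but not an overlap
print(has_overlap("ababa"))                # period 2, length 5: an overlap
print(has_overlap(thue_morse_prefix(64)))  # Thue-Morse prefixes are overlap-free
```

The check runs in quadratic time, which is enough for experiments; it makes no attempt at the linear-time behaviour the abstract describes.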

Subword complexity and power avoidance

Authors: Shallit Jeffrey; Shur Arseny M.
Publication venue: 'Elsevier BV'
Publication date: 16/01/2018

We begin a systematic study of the relations between subword complexity of infinite words and their power avoidance. Among other things, we show that
-- the Thue-Morse word has the minimum possible subword complexity over all overlap-free binary words and all $(\frac{7}{3})$-power-free binary words, but not over all $(\frac{7}{3})^+$-power-free binary words;
-- the twisted Thue-Morse word has the maximum possible subword complexity over all overlap-free binary words, but no word has the maximum subword complexity over all $(\frac{7}{3})$-power-free binary words;
-- if some word attains the minimum possible subword complexity over all square-free ternary words, then one such word is the ternary Thue word;
-- the recently constructed 1-2-bonacci word has the minimum possible subword complexity over all symmetric square-free ternary words.
Comment: 29 pages. Submitted to TC

arXiv.org e-Print Archive

University of Waterloo's Institutional Repository

Institutional repository of Ural Federal University named after the first President of Russia B.N.Yeltsin
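
Subword complexity, as used in the abstract above, counts the distinct factors of each length in a word. The sketch below estimates it for the Thue-Morse word by counting factors in a finite prefix; the prefix length 4096 and the function names are assumptions made for illustration. A finite prefix can only give a lower bound on the complexity of the infinite word.

```python
def thue_morse_prefix(length: int) -> str:
    """Prefix of the Thue-Morse word over {0, 1}: letter i is the parity
    of the number of 1 bits in the binary expansion of i."""
    return "".join("01"[bin(i).count("1") % 2] for i in range(length))


def factor_count(word: str, n: int) -> int:
    """Number of distinct factors (contiguous subwords) of length n in `word`."""
    return len({word[i:i + n] for i in range(len(word) - n + 1)})


prefix = thue_morse_prefix(4096)
for n in range(1, 9):
    print(n, factor_count(prefix, n))
```

Swapping in a prefix of another overlap-free word allows a side-by-side comparison of the counts for small n.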

Similarity density of the Thue-Morse word with overlap-free infinite binary words

Authors: Du Chen Fei; Shallit Jeffrey
Publication venue: 'Open Publishing Association'
Publication date: 01/05/2014

We consider a measure of similarity for infinite words that generalizes the notion of asymptotic or natural density of subsets of natural numbers from number theory. We show that every overlap-free infinite binary word, other than the Thue-Morse word $t$ and its complement $\overline{t}$, has this measure of similarity with $t$ between 1/4 and 3/4. This is a partial generalization of a classical 1927 result of Mahler.
Comment: In Proceedings AFL 2014, arXiv:1405.527

arXiv.org e-Print Archive

Directory of Open Access Journals
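
The abstract above does not spell out its similarity measure. One natural finite approximation, assumed here purely for illustration, is the fraction of positions at which two equal-length prefixes agree. The sketch compares the Thue-Morse word with its shift by one letter, which is again an overlap-free infinite binary word; the function names are hypothetical.

```python
def thue_morse_prefix(length: int) -> str:
    """Prefix of the Thue-Morse word over {0, 1}: letter i is the parity
    of the number of 1 bits in the binary expansion of i."""
    return "".join("01"[bin(i).count("1") % 2] for i in range(length))


def agreement_fraction(x: str, y: str) -> float:
    """Fraction of positions at which x and y agree, over their common length."""
    n = min(len(x), len(y))
    if n == 0:
        raise ValueError("both words must be non-empty")
    return sum(a == b for a, b in zip(x, y)) / n


t = thue_morse_prefix(1 << 12)
shifted = t[1:]  # the Thue-Morse word with its first letter removed
for n in (16, 256, 4095):
    print(n, agreement_fraction(t[:n], shifted[:n]))
```

These are prefix statistics only; the limiting behaviour that the paper studies would have to be argued separately.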

Shortest Repetition-Free Words Accepted by Automata

Authors: A. Carpi; A. Restivo; F.-J. Brandenburg; M. Ito; R.C. Lyndon; S. Horváth; T. Anderson
Publication date: 01/01/2013

We consider the following problem: given that a finite automaton $M$ of $N$ states accepts at least one $k$-power-free (resp., overlap-free) word, what is the length of the shortest such word accepted? We give upper and lower bounds which, unfortunately, are widely separated.
Comment: 12 pages, conference pape

arXiv.org e-Print Archive

Crossref
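
To make the problem in the abstract above concrete, here is a brute-force sketch for the overlap-free case. It explores overlap-free words in order of length and returns the first one a deterministic automaton accepts. The DFA encoding (a nested dictionary `delta`), the `max_len` cutoff, and all function names are assumptions for illustration; the sketch says nothing about the paper's bounds.

```python
from collections import deque


def has_overlap(word: str) -> bool:
    """True if `word` contains a factor of length 2p + 1 with period p."""
    n = len(word)
    for p in range(1, (n - 1) // 2 + 1):
        run = 0
        for i in range(n - p):
            if word[i] == word[i + p]:
                run += 1
                if run > p:
                    return True
            else:
                run = 0
    return False


def shortest_overlap_free_accepted(delta, start, accepting, alphabet, max_len):
    """Breadth-first search over overlap-free words, tracking the DFA state.
    Returns a shortest accepted overlap-free word of length <= max_len, or None."""
    queue = deque([("", start)])
    while queue:
        word, state = queue.popleft()
        if state in accepting:
            return word
        if len(word) == max_len:
            continue
        for letter in alphabet:
            extended = word + letter
            # Every prefix of an overlap-free word is overlap-free,
            # so it is safe to prune as soon as an overlap appears.
            if not has_overlap(extended):
                queue.append((extended, delta[state][letter]))
    return None


# Hypothetical DFA over {a, b} accepting exactly the words that end in "bb".
ends_in_bb = {0: {"a": 0, "b": 1}, 1: {"a": 0, "b": 2}, 2: {"a": 0, "b": 2}}
print(shortest_overlap_free_accepted(ends_in_bb, 0, {2}, "ab", max_len=10))
```

The search keeps whole words rather than visited states, because whether an extension stays overlap-free depends on the word read so far and not only on the current state.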

RasBhari: optimizing spaced seeds for database searching, read mapping and alignment-free sequence comparison

Authors: Hahn Lars; Leimeister Chris-André; Lonardi Stefano; Morgenstern Burkhard; Ounit Rachid
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/01/2016

Many algorithms for sequence analysis rely on word matching or word statistics. Often, these approaches can be improved if binary patterns representing match and don't-care positions are used as a filter, such that only those positions of words are considered that correspond to the match positions of the patterns. The performance of these approaches, however, depends on the underlying patterns. Herein, we show that the overlap complexity of a pattern set that was introduced by Ilie and Ilie is closely related to the variance of the number of matches between two evolutionarily related sequences with respect to this pattern set. We propose a modified hill-climbing algorithm to optimize pattern sets for database searching, read mapping and alignment-free sequence comparison of nucleic-acid sequences; our implementation of this algorithm is called rasbhari. Depending on the application at hand, rasbhari can either minimize the overlap complexity of pattern sets, maximize their sensitivity in database searching or minimize the variance of the number of pattern-based matches in alignment-free sequence comparison. We show that, for database searching, rasbhari generates pattern sets with slightly higher sensitivity than existing approaches. In our Spaced Words approach to alignment-free sequence comparison, pattern sets calculated with rasbhari led to more accurate estimates of phylogenetic distances than the randomly generated pattern sets that we previously used. Finally, we used rasbhari to generate patterns for short read classification with CLARK-S. Here too, the sensitivity of the results could be improved, compared to the default patterns of the program. We integrated rasbhari into Spaced Words; the source code of rasbhari is freely available at http://rasbhari.gobics.de

arXiv.org e-Print Archive

Crossref

Directory of Open Access Journals

PubMed Central

eScholarship - University of California

GRO.publications (Univ. Göttingen)
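
The abstract above describes binary patterns of match and don't-care positions used as a filter on words. The sketch below shows that filtering step only: it extracts the letters at the match positions ('1') of a pattern and counts pattern-based matches between two sequences. It is not rasbhari's implementation, and the function names and the toy sequences are assumptions.

```python
from collections import Counter


def spaced_words(seq: str, pattern: str) -> list[str]:
    """For every window of len(pattern) in `seq`, keep only the letters at
    match positions ('1') and skip don't-care positions ('0')."""
    match_positions = [i for i, c in enumerate(pattern) if c == "1"]
    return [
        "".join(seq[start + i] for i in match_positions)
        for start in range(len(seq) - len(pattern) + 1)
    ]


def count_spaced_matches(s1: str, s2: str, pattern: str) -> int:
    """Number of position pairs (i, j) whose windows in s1 and s2 agree
    at every match position of `pattern`."""
    c1 = Counter(spaced_words(s1, pattern))
    c2 = Counter(spaced_words(s2, pattern))
    return sum(count * c2[word] for word, count in c1.items())


print(spaced_words("ACGT", "101"))
print(count_spaced_matches("ACGT", "ATGA", "101"))
```

Repeating the count over several patterns gives the kind of per-pattern-set match statistic whose variance the abstract discusses, though the paper's exact statistic may be defined differently.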