39 research outputs found

    On a generalization of Abelian equivalence and complexity of infinite words

    In this paper we introduce and study a family of complexity functions of infinite words indexed by k in Z^+ U {+infinity}. Let k in Z^+ U {+infinity} and A be a finite non-empty set. Two finite words u and v in A* are said to be k-Abelian equivalent if for all x in A* of length less than or equal to k, the number of occurrences of x in u is equal to the number of occurrences of x in v. This defines a family of equivalence relations sim_k on A*, bridging the gap between the usual notion of Abelian equivalence (when k = 1) and equality (when k = +infinity). We show that the number of k-Abelian equivalence classes of words of length n grows polynomially, although the degree is exponential in k. Given an infinite word omega in A^N, we consider the associated complexity function P^(k)_omega : N -> N which counts the number of k-Abelian equivalence classes of factors of omega of length n. We show that the complexity function P_k is intimately linked with periodicity. More precisely we define an auxiliary function q^k : N -> N and show that if P^(k)_omega(n) < q^k(n) for some k in Z^+ U {+infinity} and n >= 0, then omega is ultimately periodic. Moreover if omega is aperiodic, then P^(k)_omega(n) = q^k(n) if and only if omega is Sturmian. We also study k-Abelian complexity in connection with repetitions in words. Using Szemeredi's theorem, we show that if omega has bounded k-Abelian complexity, then for every D subset of N with positive upper density and for every positive integer N, there exists a k-Abelian N-power occurring in omega at some position j in D
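
The definition of k-Abelian equivalence above can be checked directly on short words. The following is a minimal sketch, not the paper's method: the function names (`factor_counts`, `k_abelian_equivalent`, `count_classes`) are illustrative, only non-empty factors are counted (equal letter counts already force equal lengths), and the class count is a brute-force enumeration that is only practical for small n.

```python
from collections import Counter
from itertools import product


def factor_counts(word, k):
    """Count occurrences of each non-empty factor of length at most k."""
    counts = Counter()
    for length in range(1, k + 1):
        for start in range(len(word) - length + 1):
            counts[word[start:start + length]] += 1
    return counts


def k_abelian_equivalent(u, v, k):
    """True if u and v agree on the counts of all factors of length <= k."""
    return factor_counts(u, k) == factor_counts(v, k)


def count_classes(n, k, alphabet="ab"):
    """Number of k-Abelian classes among all words of length n (brute force)."""
    signatures = set()
    for letters in product(alphabet, repeat=n):
        word = "".join(letters)
        signatures.add(frozenset(factor_counts(word, k).items()))
    return len(signatures)


print(k_abelian_equivalent("aabab", "abaab", 2))  # True
print(k_abelian_equivalent("aabab", "abaab", 3))  # False
print(count_classes(4, 1), count_classes(4, 2))   # 5 14
```

For k = 1 the check reduces to comparing letter counts, which is the usual Abelian equivalence; larger k separates more words, as the second call shows.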

    DNA Computing: Modelling in Formal Languages and Combinatorics on Words, and Complexity Estimation

    DNA computing, an essential area of unconventional computing research, encodes problems using DNA molecules and solves them using biological processes. This thesis contributes to the theoretical research in DNA computing by modelling biological processes as computations and by studying formal language and combinatorics on words concepts motivated by DNA processes. It also contributes to the experimental research in DNA computing by a scaling comparison between DNA computing and other models of computation. First, for theoretical DNA computing research, we propose a new word operation inspired by a DNA wet lab protocol called cross-pairing polymerase chain reaction (XPCR). We define and study a word operation called word blending that models and generalizes an unexpected outcome of XPCR. The input words are uwx and ywv that share a non-empty overlap w, and the output is the word uwv. Closure properties of the Chomsky families of languages under this operation and its iterated version, the existence of a solution to equations involving this operation, and its state complexity are studied. To follow the XPCR experimental requirement closely, a new word operation called conjugate word blending is defined, where the subwords x and y are required to be identical. Closure properties of the Chomsky families of languages under this operation and the XPCR experiments that motivate and implement it are presented. Second, we generalize the sequence of Fibonacci words inspired by biological concepts on DNA. The sequence of Fibonacci words is an infinite sequence of words obtained from two initial letters f(1) = a and f(2)= b, by the recursive definition f(n+2) = f(n+1)*f(n), for all positive integers n, where * denotes word concatenation. 
After we propose a unified terminology for different types of Fibonacci words and corresponding results in the extensive literature on the topic, we define and explore involutive Fibonacci words motivated by ideas stemming from theoretical studies of DNA computing. The relationship between different involutive Fibonacci words and their borderedness and primitivity are studied. Third, we analyze the practicability of DNA computing experiments since DNA computing and other unconventional computing methods that solve computationally challenging problems often have the limitation that the space of potential solutions grows exponentially with their sizes. For such problems, DNA computing algorithms may achieve a linear time complexity with an exponential space complexity as a trade-off. Using the subset sum problem as the benchmark problem, we present a scaling comparison of the DNA computing (DNA-C) approach with the network biocomputing (NB-C) and the electronic computing (E-C) approaches, where the volume, computing time, and energy required, relative to the input size, are compared. Our analysis shows that E-C uses a tiny volume compared to that required by DNA-C and NB-C, at the cost of the E-C computing time being outperformed first by DNA-C and then by NB-C. In addition, NB-C appears to be more energy efficient than DNA-C for some input sets, and E-C is always an order of magnitude less energy efficient than DNA-C
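
The recursive definition of Fibonacci words quoted in this abstract, f(1) = a, f(2) = b, f(n+2) = f(n+1)*f(n), can be sketched in a few lines. This illustrates only the plain recursion; the function name `fibonacci_words` is hypothetical, and the involutive variants studied in the thesis are not modelled here.

```python
def fibonacci_words(count):
    """Return [f(1), ..., f(count)] with f(1) = 'a', f(2) = 'b',
    and f(n+2) = f(n+1) + f(n), where + is word concatenation."""
    words = ["a", "b"]
    while len(words) < count:
        words.append(words[-1] + words[-2])
    return words[:count]


words = fibonacci_words(6)
print(words)                    # ['a', 'b', 'ba', 'bab', 'babba', 'babbabab']
print([len(w) for w in words])  # [1, 1, 2, 3, 5, 8]
```

The lengths follow the Fibonacci numbers, which is where the sequence gets its name.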

    5-Abelian cubes are avoidable on binary alphabets

    A k-abelian cube is a word uvw, where the factors u, v, and w are either pairwise equal, or have the same multiplicities for every one of their factors of length at most k. Previously it has been shown that k-abelian cubes are avoidable over a binary alphabet for k >= 8. Here it is proved that this holds for k >= 5.
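
The definition of a k-abelian cube can be turned into a direct check on finite words. This is a minimal sketch under the definition quoted above; the helper names are illustrative, and the exhaustive search in `contains_k_abelian_cube` is a naive scan, not the construction used in the paper.

```python
from collections import Counter


def factor_counts(word, k):
    """Multiplicities of all non-empty factors of length at most k."""
    return Counter(
        word[i:i + m]
        for m in range(1, k + 1)
        for i in range(len(word) - m + 1)
    )


def is_k_abelian_cube(word, k):
    """True if word = uvw with u, v, w pairwise equal or k-abelian equivalent."""
    if not word or len(word) % 3 != 0:
        return False
    third = len(word) // 3
    u, v, w = word[:third], word[third:2 * third], word[2 * third:]
    if u == v == w:
        return True
    cu = factor_counts(u, k)
    return cu == factor_counts(v, k) and cu == factor_counts(w, k)


def contains_k_abelian_cube(word, k):
    """True if some factor of word is a k-abelian cube (naive search)."""
    n = len(word)
    return any(
        is_k_abelian_cube(word[i:i + 3 * t], k)
        for t in range(1, n // 3 + 1)
        for i in range(n - 3 * t + 1)
    )


print(is_k_abelian_cube("abbaab", 1))  # True
print(is_k_abelian_cube("abbaab", 2))  # False
```

Splitting into three blocks of equal length is enough here, because equal letter counts already force u, v, and w to have the same length.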

    On location, domination and information retrieval

    The thesis is divided into two main branches: identifying and locating-dominating codes, and information retrieval. The former topics are motivated by the aim to locate objects in sensor networks (or other similar applications) and the latter one by the need to retrieve information in memories such as DNA data storage systems. Despite the underlying applications, the study of these topics mainly belongs to discrete mathematics; more specifically, to the fields of coding and graph theory. The sensor networks are usually represented by graphs where vertices represent the monitored locations and edges the connections between the locations. Moreover, the locations of the sensors are determined by a code. Furthermore, the desired properties of the sensor network are deeply linked with the properties of the underlying code. The number of errors in reading the data is abundant in the DNA data storage systems. In particular, there can occur more errors than a reasonable error-correcting code can handle. However, this problem is somewhat offset by the possibility to obtain multiple approximations of the same information from the data storage. Hence, the information retrieval process can be modelled by Levenshtein's channel model, where a message is sent through multiple noisy channels and multiple outputs are received. In the first two papers of the thesis, we introduce and study the new concepts of self- and solid-locating-dominating codes as a natural analogy to self-identifying codes with respect to locating-dominating codes. The first paper introduces these new codes and considers them in some graphs such as the Hamming graphs. Then, in the second paper, we broaden our view on the topic by considering graph theoretical questions. We give optimal codes in multiple different graph classes and some more general results using concepts such as the Dilworth number and graph complements. The third paper focuses on the q-ary Hamming spaces. 
In particular, we disprove a conjecture proposed by Goddard and Wash related to identifying codes. In the fourth paper, we return to self- and solid-locating-dominating codes and give optimal codes in some graph classes and consider their densities in infinite graphs. In the fifth paper, we consider information retrieval in memories; in particular, Levenshtein's channel model. In the channel model, we transmit some codeword belonging to the binary Hamming space through multiple identical channels. With the help of multiple different outputs, we give a list of codewords which may have been sent. In the paper, we study the number of channels required to have a rather small (constant) list size when the properties of the channels, the code and the dimension of the Hamming space are fixed. In particular, we give an exact relation between the number of channels and the asymptotic value of the maximum list size.
The dissertation addresses two topics: identifying and locating-dominating codes, and information retrieval from memory. The first topic is motivated by locating objects in sensor networks (and other similar applications), and the second by information retrieval from DNA memories. Research on these topics belongs to discrete mathematics, more precisely to coding theory and graph theory. Sensor networks are usually described by graphs in which vertices represent the monitored locations and edges the connections between them. Furthermore, the positions of the sensors are determined by a given code. Consequently, the desired properties of the sensor network rest heavily on the underlying code. When reading data from DNA memories, the number of errors can be very large; in particular, larger than the correction capability of a fixed error-correcting code. On the other hand, the situation is not quite so problematic, since several different approximations of the stored information can be obtained from DNA memories. 
For these reasons, information retrieval from DNA memories can be modelled using Levenshtein's channel model. In the channel model, one message is sent through several noisy channels, so that several messages are received (one from each channel). The first two publications of the dissertation introduce and study new classes of locating-dominating codes, which build on the previously studied self-identifying codes. The first publication introduces these code classes and studies them in some graphs such as Hamming graphs. The second publication then addresses general graph-theoretic questions. It presents optimal codes for several graph families as well as some more general results using, among other things, the Dilworth number and graph complements. The third publication focuses on q-ary Hamming spaces. In particular, it disproves a conjecture on identifying codes previously posed by Goddard and Wash. The fourth article deals with the classes of locating-dominating codes introduced in the first two articles. It presents optimal codes for several graph families and also considers infinite graphs. The fifth article deals with information retrieval, and in particular Levenshtein's channel model. In the channel model, a codeword belonging to the binary Hamming space is sent through several identical channels. Several different approximations of the transmitted codeword are received from these channels, and a list of the words that may have been sent is constructed. The article studies how many channels are needed for the size of this list to be small (constant) when the properties of the channels, the code and the dimension of the Hamming space are fixed. In particular, an exact relation is found between the number of channels and the asymptotically maximal list size

    On the k-Abelian Equivalence Relation of Finite Words

    This thesis is devoted to the so-called k-abelian equivalence relation of sequences of symbols, that is, words. This equivalence relation is a generalization of the abelian equivalence of words. Two words are abelian equivalent if one is a permutation of the other. For any positive integer k, two words are called k-abelian equivalent if each word of length at most k occurs equally many times as a factor in the two words. The k-abelian equivalence defines an equivalence relation, even a congruence, of finite words. A hierarchy of equivalence classes in between the equality relation and the abelian equivalence of words is thus obtained. Most of the literature on the k-abelian equivalence deals with infinite words. In this thesis we consider several aspects of the equivalence relations, the main objective being to build a fairly comprehensive picture on the structure of the k-abelian equivalence classes themselves. The main part of the thesis deals with the structural aspects of k-abelian equivalence classes. We also consider aspects of k-abelian equivalence in infinite words. We survey known characterizations of the k-abelian equivalence of finite words from the literature and also introduce novel characterizations. For the analysis of structural properties of the equivalence relation, the main tool is the characterization by the rewriting rule called the k-switching. Using this rule it is straightforward to show that the language comprised of the lexicographically least elements of the k-abelian equivalence classes is regular. Further word-combinatorial analysis of the lexicographically least elements leads us to describe the deterministic finite automata recognizing this language. Using tools from formal language theory combined with our analysis, we give an optimal expression for the asymptotic growth rate of the number of k-abelian equivalence classes of length n over an m-letter alphabet. 
Explicit formulae are computed for small values of k and m, and these sequences appear in Sloane’s Online Encyclopedia of Integer Sequences. Due to the fact that the k-abelian equivalence relation is a congruence of the free monoid, we study equations over the k-abelian equivalence classes. The main result in this setting is that any system of equations of k-abelian equivalence classes is equivalent to one of its finite subsystems, i.e., the monoid defined by the k-abelian equivalence relation possesses the compactness property. Concerning infinite words, we mainly consider the (k-)abelian complexity function. We complete a classification of the asymptotic abelian complexities of pure morphic binary words. In other words, given a morphism which has an infinite binary fixed point, the limit superior asymptotic abelian complexity of the fixed point can be computed (in principle). We also give a new proof of the fact that the k-abelian complexity of a Sturmian word is n + 1 for length n < 2k. In fact, we consider several aspects of the k-abelian equivalence relation in Sturmian words using a dynamical interpretation of these words. We reprove the fact that any Sturmian word contains arbitrarily large k-abelian repetitions. The methods used allow us to analyze the situation in more detail, and this leads us to define the so-called k-abelian critical exponent which measures the ratio of the exponent and the length of the root of a k-abelian repetition. This notion is connected to a deep number-theoretic object called the Lagrange spectrum
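
The statement about the k-abelian complexity of Sturmian words can be explored numerically on a finite prefix. The sketch below is an illustration under stated assumptions, not the thesis's method: it uses the Fibonacci word (the fixed point of a -> ab, b -> a) as one example of a Sturmian word, the function names are hypothetical, and a finite prefix only gives exact counts when it is long enough to contain every factor of the chosen length.

```python
from collections import Counter


def fibonacci_prefix(min_length):
    """Prefix of the Fibonacci word, built by iterating a -> ab, b -> a."""
    word = "a"
    while len(word) < min_length:
        word = "".join("ab" if c == "a" else "a" for c in word)
    return word


def signature(word, k):
    """Hashable summary of the counts of all factors of length at most k."""
    counts = Counter(
        word[i:i + m]
        for m in range(1, k + 1)
        for i in range(len(word) - m + 1)
    )
    return frozenset(counts.items())


def k_abelian_complexity(prefix, n, k):
    """Number of k-abelian classes among the length-n factors of prefix."""
    factors = {prefix[i:i + n] for i in range(len(prefix) - n + 1)}
    return len({signature(f, k) for f in factors})


prefix = fibonacci_prefix(500)
print([k_abelian_complexity(prefix, n, 2) for n in range(1, 4)])  # [2, 3, 4]
```

For k = 2 the printed values are n + 1 for n = 1, 2, 3, which matches the stated range n < 2k on this example.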

    Theoretical and Practical Aspects Related to the Avoidability of Patterns in Words

    This thesis concerns repetitive structures in words. More precisely, it contributes to studying appearance and absence of such repetitions in words. In the first and major part of this thesis, we study avoidability of unary patterns with permutations. The second part of this thesis deals with modeling and solving several avoidability problems as constraint satisfaction problems, using the framework of MiniZinc. Solving avoidability problems like the one mentioned in the previous paragraph required the construction, via a computer program, of a very long word that does not contain any word matching a given pattern. This gave us the idea of using SAT solvers. Representing the problems for SAT solvers seemed to be a standardised, and usually highly optimised, approach to formulating and solving well-known avoidability problems, such as avoidability of formulas with reversal and avoidability of patterns in the abelian sense. The final part is concerned with a variation on a classical avoidance problem from combinatorics on words. Considering the concatenation of i different factors of the word w, pexp_i(w) is the supremum of powers that can be constructed by concatenation of such factors, and RT_i(k) is then the infimum of pexp_i(w). Again, by checking infinite ternary words that satisfy some properties, we calculate the value RT_i(3) for even and odd values of i