105 research outputs found

    Efficient String Matching on Coded Texts

    The so-called "four Russians technique" is often used to speed up algorithms by encoding several data items in a single memory cell. Given a sequence of n symbols over a constant-size alphabet, one can encode the sequence into O(n / lambda) memory cells in O(log(lambda)) time using n / log(lambda) processors. This paper presents an efficient CRCW-PRAM string-matching algorithm for coded texts that takes O(log log(m / lambda)) time while making only O(n / lambda) operations, an improvement by a factor of lambda = O(log n) on the number of operations used in previous algorithms. Using this string-matching algorithm, one can test whether a string is square-free and find all palindromes in a string in O(log log n) time using n / log log n processors.
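
    The encoding step in the abstract above can be illustrated with a small sequential sketch. It is not the paper's parallel CRCW-PRAM procedure: the function name `pack`, the fixed-width bit layout, and the example alphabet are assumptions made here purely for illustration. It only shows how lambda symbols over a constant-size alphabet fit into one memory cell, so that n symbols occupy about n / lambda cells.

```python
def pack(text, alphabet, lam):
    """Pack `lam` symbols per cell (hypothetical helper, sequential sketch).

    Each symbol is stored in a fixed number of bits, so one Python int
    plays the role of one memory cell holding up to `lam` symbols.
    """
    # Bits needed per symbol for this alphabet (at least 1).
    bits = max(1, (len(alphabet) - 1).bit_length())
    code = {c: i for i, c in enumerate(alphabet)}
    cells = []
    for start in range(0, len(text), lam):
        cell = 0
        for c in text[start:start + lam]:
            cell = (cell << bits) | code[c]
        cells.append(cell)
    return cells


cells = pack("acgtacgt", "acgt", 4)
print(cells)  # [27, 27]: two equal blocks compare equal in one cell comparison
```

    In this sketch, two aligned blocks of lambda symbols can be compared with a single integer comparison, which is the kind of saving the coded-text setting relies on. A shorter final block is not padded here, so it would need separate handling in a real implementation.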

    Privileged Words and Sturmian Words

    This dissertation has two almost unrelated themes: privileged words and Sturmian words. Privileged words are a new class of words introduced recently. A word is privileged if it is a complete first return to a shorter privileged word, the shortest privileged words being letters and the empty word. Here we give and prove almost all results on privileged words known to date. On the other hand, the study of Sturmian words is a well-established topic in combinatorics on words. In this dissertation, we focus on questions concerning repetitions in Sturmian words, reproving old results and giving new ones, and on establishing completely new research directions. The study of privileged words presented in this dissertation aims to derive their basic properties and to answer basic questions regarding them. We explore a connection between privileged words and palindromes and seek out answers to questions on context-freeness, computability, and enumeration. It turns out that the language of privileged words is not context-free, but privileged words are recognizable by a linear-time algorithm. A lower bound on the number of binary privileged words of given length is proven. The main interest, however, lies in the privileged complexity functions of the Thue-Morse word and Sturmian words. We derive recurrences for computing the privileged complexity function of the Thue-Morse word, and we prove that Sturmian words are characterized by their privileged complexity function. As a slightly separate topic, we give an overview of a certain method of automated theorem-proving and show how it can be applied to study privileged factors of automatic words. The second part of this dissertation is devoted to Sturmian words. We extensively exploit the interpretation of Sturmian words as irrational rotation words. The essential tools are continued fractions and elementary, but powerful, results of Diophantine approximation theory. 
With these tools at our disposal, we reprove old results on powers occurring in Sturmian words with emphasis on the fractional index of a Sturmian word. Further, we consider abelian powers and abelian repetitions and characterize the maximum exponents of abelian powers with given period occurring in a Sturmian word in terms of the continued fraction expansion of its slope. We define the notion of abelian critical exponent for Sturmian words and explore its connection to the Lagrange spectrum of irrational numbers. The results obtained are often specialized for the Fibonacci word; for instance, we show that the minimum abelian period of a factor of the Fibonacci word is a Fibonacci number. In addition, we propose a completely new research topic: the square root map. We prove that the square root map preserves the language of any Sturmian word. Moreover, we construct a family of non-Sturmian optimal squareful words whose language the square root map also preserves. This construction yields examples of aperiodic infinite words whose square roots are periodic.
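
    The recursive definition of privileged words given in the abstract above can be checked directly with a naive sketch. This is not the linear-time recognition algorithm the dissertation mentions. The helper names are hypothetical, and the sketch assumes that a complete first return to u means a word that begins and ends with u and contains exactly two occurrences of u (overlaps allowed).

```python
def count_occurrences(w, u):
    """Number of (possibly overlapping) occurrences of u in w."""
    return sum(1 for i in range(len(w) - len(u) + 1) if w.startswith(u, i))


def is_privileged(w):
    """Naive check of the recursive definition (exponential in the worst case)."""
    # Letters and the empty word are privileged by definition.
    if len(w) <= 1:
        return True
    # Look for a shorter non-empty border u that is privileged and
    # occurs exactly twice in w (as a prefix and as a suffix).
    for k in range(1, len(w)):
        u = w[:k]
        if w.endswith(u) and count_occurrences(w, u) == 2 and is_privileged(u):
            return True
    return False


print(is_privileged("aabaa"))  # True: complete first return to "aa"
print(is_privileged("abab"))   # False: its only border "ab" is not privileged
```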

    Combinatorics on Words. New Aspects on Avoidability, Defect Effect, Equations and Palindromes

    In this thesis we examine four well-known and traditional concepts of combinatorics on words. However, the contexts in which these topics are treated are not the traditional ones. More precisely, the question of avoidability is asked, for example, in terms of k-abelian squares. Two words are said to be k-abelian equivalent if they have the same number of occurrences of each factor up to length k. Consequently, k-abelian equivalence can be seen as a sharpening of abelian equivalence. This fairly new concept is discussed more broadly than the other topics of this thesis. The second main subject concerns the defect property. The defect theorem is a well-known result for words. We will analyze the property, for example, among the sets of 2-dimensional words, i.e., polyominoes composed of labelled unit squares. From the defect effect we move to equations. We will use a special way to define a product operation for words and then solve a few basic equations over the constructed partial semigroup. We will also consider the satisfiability question and the compactness property with respect to equations of this kind. The final topic of the thesis deals with palindromes. Some finite words, including all binary words, are uniquely determined up to word isomorphism by the position and length of some of their palindromic factors. The famous Thue-Morse word has the property that for each positive integer n, there exists a factor which cannot be generated by fewer than n palindromes. We prove that, in general, every non-ultimately periodic word contains a factor which cannot be generated by fewer than 3 palindromes, and we obtain a classification of those binary words each of whose factors is generated by at most 3 palindromes. Surprisingly, these words are related to another much-studied set of words, Sturmian words.
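
    The definition of k-abelian equivalence stated in the abstract above translates almost literally into code. The following is a minimal sketch under that definition only. The function names are hypothetical, and occurrences are counted with overlaps.

```python
from collections import Counter


def factor_counts(w, k):
    """Count occurrences of every factor of w of length 1..k."""
    return Counter(
        w[i:i + m]
        for m in range(1, k + 1)
        for i in range(len(w) - m + 1)
    )


def k_abelian_equivalent(u, v, k):
    """True if u and v have the same number of occurrences of each factor up to length k."""
    return factor_counts(u, k) == factor_counts(v, k)


print(k_abelian_equivalent("aabab", "abaab", 2))  # True
print(k_abelian_equivalent("aabab", "abaab", 3))  # False
```

    For k = 1 the check reduces to ordinary abelian equivalence (equal letter counts), which is why the abstract calls k-abelian equivalence a sharpening of it.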

    Patterns and Signals of Biology: An Emphasis On The Role of Post Translational Modifications in Proteomes for Function and Evolutionary Progression

    After synthesis, a protein is still immature until it has been customized for a specific task. Post-translational modifications (PTMs) are steps in biosynthesis that perform this customization of proteins for unique functionalities. PTMs are also important to protein survival because they rapidly enable protein adaptation to environmental stress factors by conformation change. The overarching contribution of this thesis is the construction of a computational profiling framework for the study of biological signals stemming from PTMs associated with stressed proteins. In particular, this work has been developed to predict and detect the biological mechanisms involved in types of stress response with PTMs in mitochondrial (Mt) and non-Mt proteins. Before any mechanism can be studied, there must first be some evidence of its existence. This evidence takes the form of signals such as biases of biological actors and types of protein interaction. Our framework has been developed to locate these signals, distilled from “Big Data” resources such as public databases and the entire PubMed literature corpus. We apply this framework to study these signals and learn about protein stress responses involving PTMs and modification sites (MSs). We developed this framework, and its approach to analysis, according to three main facets: (1) statistical evaluation to determine patterns of signal dominance throughout large volumes of data, (2) signal location to track down the regions where the mechanisms must be found according to the types and numbers of associated actors at relevant regions in proteins, and (3) text mining to determine how these signals have been previously investigated by researchers. The results gained from our framework enable us to uncover the PTM actors, MSs and protein domains which are the major components of particular stress response mechanisms and may play roles in protein malfunction and disease.

    Rich Words and Balanced Words

    This thesis is mostly focused on palindromes. Palindromes have been studied extensively in recent years in the field of combinatorics on words. Our main focus is on rich words, also known as full words. These are words which have the maximum number of distinct palindromes as factors. We shed some more light on these words and investigate certain restricted problems. Finite rich words are known to be extendable to infinite rich words. We study more closely in how many different ways, and in which situations, rich words can be extended so that they remain rich. The defect of a word is defined to be the number of palindromes the word is lacking. We will generalize the definition of defect with respect to extending the word to be infinite. The number of rich words on an alphabet of size $n$ is given an upper and a lower bound. Hof, Knill and Simon presented (Commun. Math. Phys. 174, 1995) a well-known question whether all palindromic subshifts which are generated by primitive substitutions arise from substitutions which are in class P. Over the years, this question has transformed a bit and is nowadays called the class P conjecture. The main point of the conjecture is to attempt to explain how an infinite word can contain infinitely many palindromes. We will prove a partial result of the conjecture. Rich square-free words are known to be finite (Pelantová and Starosta, Discrete Math. 313, 2013). We will give another proof for that result. Since they are finite, there exists a longest such word on an $n$-ary alphabet. We give an upper and a lower bound for the length of that word. We also study balanced words. Oliver Jenkinson proved (Discrete Math., Alg. and Appl. 1(4), 2009) that if we take the partial sum of the lexicographically ordered orbit of a binary word, then the balanced word gives the least partial sum. The balanced word also gives the largest product. 
We will show that, at the other extreme, there are the words of the form $0^{q-p}1^p$ ($p$ and $q$ are integers with $1 \leq p < q$), which we call the most unbalanced words. They give the greatest partial sum and the smallest product. They thus form the opposite extreme to the balanced words, with all other words lying between the two.
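
    The notions of rich word and defect used in the abstract above can be made concrete with a brute-force sketch. It assumes the standard bound, not stated in the abstract itself, that a word of length n has at most n distinct non-empty palindromic factors, so the defect is taken here as n minus the number of distinct non-empty palindromic factors. The function names are hypothetical and the enumeration is cubic in the word length, meant only for small examples.

```python
def distinct_palindromes(w):
    """Set of distinct non-empty palindromic factors of w (brute force)."""
    return {
        w[i:j]
        for i in range(len(w))
        for j in range(i + 1, len(w) + 1)
        if w[i:j] == w[i:j][::-1]
    }


def defect(w):
    """Number of palindromes the word is lacking, under the assumed bound len(w)."""
    return len(w) - len(distinct_palindromes(w))


def is_rich(w):
    """A word is rich when its defect is zero."""
    return defect(w) == 0


print(is_rich("abac"))  # True: palindromic factors a, b, c, aba
print(defect("abca"))   # 1: only a, b, c are palindromic factors
```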