84 research outputs found

    Parametrized Stochastic Grammars for RNA Secondary Structure Prediction

    We propose a two-level stochastic context-free grammar (SCFG) architecture for parametrized stochastic modeling of a family of RNA sequences, including their secondary structure. A stochastic model of this type can be used for maximum a posteriori estimation of the secondary structure of any new sequence in the family. The proposed SCFG architecture models RNA subsequences comprising paired bases as stochastically weighted Dyck-language words, i.e., as weighted balanced-parenthesis expressions. The length of each run of unpaired bases, forming a loop or a bulge, is taken to have a phase-type distribution: that of the hitting time in a finite-state Markov chain. Without loss of generality, each such Markov chain can be taken to have bounded complexity. The scheme yields an overall family SCFG with a manageable number of parameters. Comment: 5 pages, submitted to the 2007 Information Theory and Applications Workshop (ITA 2007)
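The phase-type length distribution named in this abstract can be illustrated with a short sketch. It computes the probability that a finite-state Markov chain is absorbed at step n, given an initial distribution over the transient states and the transition probabilities among them. The function name `hitting_time_pmf` and the two-state example chain are illustrative assumptions and are not taken from the paper.

```python
from typing import List


def hitting_time_pmf(alpha: List[float], T: List[List[float]], n_max: int) -> List[float]:
    """Return [P(N = 1), ..., P(N = n_max)] for a discrete phase-type variable N.

    alpha: initial distribution over the transient states.
    T: sub-stochastic transition matrix among transient states; the missing
       row mass 1 - sum(T[i]) is the probability of absorption from state i.
    """
    k = len(alpha)
    exit_prob = [1.0 - sum(row) for row in T]
    state = list(alpha)  # mass still in each transient state before the step
    pmf = []
    for _ in range(n_max):
        # Probability of being absorbed on this step.
        pmf.append(sum(state[i] * exit_prob[i] for i in range(k)))
        # Advance the surviving probability mass by one step.
        state = [sum(state[i] * T[i][j] for i in range(k)) for j in range(k)]
    return pmf


# Hypothetical chain (an assumption for illustration only):
# state 0 stays or moves to state 1; state 1 stays or exits.
alpha = [1.0, 0.0]
T = [[0.5, 0.5],
     [0.0, 0.5]]
loop_length_pmf = hitting_time_pmf(alpha, T, 6)
```

With a single transient state the result reduces to a geometric distribution, which is a convenient sanity check for the sketch.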

    Acta Cybernetica : Volume 14. Number 1.


    Splicing Systems from Past to Future: Old and New Challenges

    A splicing system is a formal model of the recombinant behaviour of sets of double-stranded DNA molecules when acted on by restriction enzymes and ligase. In this survey we concentrate on a specific behaviour of a type of splicing system introduced by Păun and subsequently developed by many researchers in both the linear and circular cases of the splicing definition. In particular, we present recent results on this topic and show how they stimulate new challenging investigations. Comment: Appeared in: Discrete Mathematics and Computer Science. Papers in Memoriam Alexandru Mateescu (1952-2005). The Publishing House of the Romanian Academy, 2014. arXiv admin note: text overlap with arXiv:1112.4897 by other authors
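As a rough illustration of linear splicing in the style usually attributed to Păun, the sketch below applies a single rule (u1, u2, u3, u4) to two words: if x = x1 u1 u2 x2 and y = y1 u3 u4 y2, the result is x1 u1 u4 y2. The abstract does not spell out this definition, so the rule form used here, the function name `splice` and the example strings are all assumptions made for illustration.

```python
from typing import Set, Tuple

Rule = Tuple[str, str, str, str]  # (u1, u2, u3, u4)


def splice(x: str, y: str, rule: Rule) -> Set[str]:
    """All words x1 u1 u4 y2 with x = x1 u1 u2 x2 and y = y1 u3 u4 y2."""
    u1, u2, u3, u4 = rule
    left_site, right_site = u1 + u2, u3 + u4
    results = set()
    # Every occurrence of u1 u2 in x: keep the prefix up to and including u1.
    i = x.find(left_site)
    while i != -1:
        prefix = x[: i + len(u1)]
        # Every occurrence of u3 u4 in y: keep the suffix starting at u4.
        j = y.find(right_site)
        while j != -1:
            results.add(prefix + y[j + len(u3):])
            j = y.find(right_site, j + 1)
        i = x.find(left_site, i + 1)
    return results


# Hypothetical example: cut x between G and A, cut y between C and T.
products = splice("ttGAcc", "aaCTgg", ("G", "A", "C", "T"))
```

The sketch covers only the linear case and a single application of one rule; iterating it over a set of words would be needed to generate a splicing language.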

    Generative capacity of sticker systems with the presence of weights

    DNA computing involves computing models which use the recombination behaviour of DNA molecules as computation devices. This idea was successfully applied by Adleman in his biological experiment in order to show the solvability of the Hamiltonian path problem for larger instances. A DNA-based computation model called a sticker system is an abstraction of the computations using the recombination behaviour as in Adleman's experiment. In this paper, the generative capacity of several variants of bounded delay and unrestricted weighted sticker systems is investigated. The relation between families of languages generated by several variants of weighted sticker systems and weighted grammars is also presented.

    DNA Computing: Modelling in Formal Languages and Combinatorics on Words, and Complexity Estimation

    DNA computing, an essential area of unconventional computing research, encodes problems using DNA molecules and solves them using biological processes. This thesis contributes to the theoretical research in DNA computing by modelling biological processes as computations and by studying formal language and combinatorics on words concepts motivated by DNA processes. It also contributes to the experimental research in DNA computing by a scaling comparison between DNA computing and other models of computation. First, for theoretical DNA computing research, we propose a new word operation inspired by a DNA wet lab protocol called cross-pairing polymerase chain reaction (XPCR). We define and study a word operation called word blending that models and generalizes an unexpected outcome of XPCR. The input words are uwx and ywv that share a non-empty overlap w, and the output is the word uwv. Closure properties of the Chomsky families of languages under this operation and its iterated version, the existence of a solution to equations involving this operation, and its state complexity are studied. To follow the XPCR experimental requirement closely, a new word operation called conjugate word blending is defined, where the subwords x and y are required to be identical. Closure properties of the Chomsky families of languages under this operation and the XPCR experiments that motivate and implement it are presented. Second, we generalize the sequence of Fibonacci words inspired by biological concepts on DNA. The sequence of Fibonacci words is an infinite sequence of words obtained from two initial letters f(1) = a and f(2) = b, by the recursive definition f(n+2) = f(n+1)*f(n), for all positive integers n, where * denotes word concatenation. After we propose a unified terminology for different types of Fibonacci words and corresponding results in the extensive literature on the topic, we define and explore involutive Fibonacci words motivated by ideas stemming from theoretical studies of DNA computing. The relationship between different involutive Fibonacci words and their borderedness and primitivity are studied. Third, we analyze the practicability of DNA computing experiments since DNA computing and other unconventional computing methods that solve computationally challenging problems often have the limitation that the space of potential solutions grows exponentially with their sizes. For such problems, DNA computing algorithms may achieve a linear time complexity with an exponential space complexity as a trade-off. Using the subset sum problem as the benchmark problem, we present a scaling comparison of the DNA computing (DNA-C) approach with the network biocomputing (NB-C) and the electronic computing (E-C) approaches, where the volume, computing time, and energy required, relative to the input size, are compared. Our analysis shows that E-C uses a tiny volume compared to that required by DNA-C and NB-C, at the cost of the E-C computing time being outperformed first by DNA-C and then by NB-C. In addition, NB-C appears to be more energy efficient than DNA-C for some input sets, and E-C is always an order of magnitude less energy efficient than DNA-C.
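The two word-level definitions in this abstract are concrete enough to sketch. The code below builds Fibonacci words from f(1) = a, f(2) = b and f(n+2) = f(n+1)*f(n), and enumerates the blends uwv of two words uwx and ywv over every non-empty shared factor w. The function names `fibonacci_word` and `word_blend` are illustrative assumptions. Allowing u, x, y and v to be empty is also an assumption, since the abstract only requires w to be non-empty.

```python
from typing import Set


def fibonacci_word(n: int) -> str:
    """f(1) = 'a', f(2) = 'b', f(n+2) = f(n+1) f(n)."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    prev, curr = "a", "b"  # f(1), f(2)
    if n == 1:
        return prev
    for _ in range(n - 2):
        prev, curr = curr, curr + prev
    return curr


def word_blend(alpha: str, beta: str) -> Set[str]:
    """All words u w v with alpha = u w x, beta = y w v and w non-empty."""
    results = set()
    for start in range(len(alpha)):
        for end in range(start + 1, len(alpha) + 1):
            w = alpha[start:end]
            prefix = alpha[:end]  # u w
            pos = beta.find(w)
            while pos != -1:
                results.add(prefix + beta[pos + len(w):])  # u w v
                pos = beta.find(w, pos + 1)
    return results


first_words = [fibonacci_word(n) for n in range(1, 6)]
blends = word_blend("abc", "xbz")
```

The brute-force enumeration in `word_blend` is meant only to make the definition tangible on short words; it says nothing about the closure or state-complexity results studied in the thesis.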

    Finite Models of Splicing and Their Complexity

    Over the last two decades, a tight collaboration has emerged between computer scientists, biochemists and molecular biologists, which has spurred research into an area known as DNA computing (also biomolecular computing). The work in this thesis belongs to this field, and studies a computational model called a splicing system. Splicing is the formal model of the cutting and recombination of DNA molecules under the influence of restriction enzymes. This thesis presents original work in the field of splicing systems, which, as the title already indicates, can be roughly divided into two parts: 'Finite models of splicing' on the one hand and 'their complexity' on the other. The first part introduces and studies new finite models of splicing. Its main contribution is that it challenges the general assumption that a finite, more realistic definition of splicing is necessarily weak from a computational point of view. We propose and study various alternative models and show that in most cases they have more computational power, often reaching computational completeness. The second part explores other territory. Splicing research has been mainly focused on computational power, but complexity considerations have hardly been addressed. Here we introduce notions of time and space complexity for splicing systems. These definitions are used to characterize splicing complexity classes in terms of well-known Turing machine classes. Then, using a new accepting variant of splicing systems, we show that they can also be used as problem solvers. Finally, we study descriptional complexity. We define measures of descriptional complexity for splicing systems and show that for representing regular languages they have good properties with respect to finite automata, especially in the accepting variant.

    Languages Generated by Iterated Idempotencies.

    The rewrite relation with parameters m and n and with the possible length limit = k or ≤ k is denoted by [...], [...] or [...] respectively. The idempotency languages generated from a starting word w by the respective operations are [...]. Other special cases of idempotency languages besides duplication have also come up in different contexts. The investigations of Ito et al. about insertion and deletion, i.e., operations that are also observed in DNA molecules, have established that [...] and [...] both preserve regularity. Our investigations about idempotency relations and languages start out from the case of a uniform length bound. For these relations, with length limit = k, the conditions for confluence are characterized completely. Also the question of regularity is answered for all the languages [...]; [...] are more complicated and belong to the class of context-free languages. For a general length bound, i.e., for the relations with length limit ≤ k, confluence does not hold so frequently. This complicatedness of the relations results also in more complicated languages, which are often non-regular, as for example the languages [...]. Without any length bound, idempotency relations have a very complicated structure. Over alphabets of one or two letters we still characterize the conditions for confluence. Over three or more letters, in contrast, only a few cases are solved. We determine the combinations of parameters that result in the regularity of [...]. In a second chapter some more involved questions are solved for the special case of duplication. First we shed some light on the reasons why it is so difficult to determine the context-freeness of duplication languages. We show that they fulfill all pumping properties and that they are very dense.
    Therefore all the standard tools to prove non-context-freeness do not apply here. The concept of root in Formal Language Theory is frequently used to describe the reduction of a word to another one, which is in some sense elementary. For example, there are primitive roots, periodicity roots, etc. Elementary in connection with duplication are square-free words, i.e., words that do not contain any repetition. Thus we define the duplication root of w to consist of all the square-free words from which w can be reached via the duplication relation. Besides some general observations we prove the decidability of the question whether the duplication root of a language is finite. Then we devise a code which is robust under duplication of its code words. This would keep the result of a computation from being destroyed by duplications in the code words. We determine the exact conditions under which infinite such codes exist: over an alphabet of two letters they exist for a length bound of 2, over three letters already for a length bound of 1. Also we apply duplication to entire languages rather than to single words; then it is interesting to determine whether regular and context-free languages are closed under this operation. We show that the regular languages are closed under uniformly bounded duplication, while they are not closed under duplication with a general length bound. The context-free languages are closed under both operations. The thesis concludes with a list of open problems related to the thesis' topics.
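The duplication root described in this abstract can be sketched by running duplication backwards: repeatedly replace a factor xx with x until no square remains, and collect the square-free words reached. The sketch assumes the usual unbounded duplication step u x v → u x x v with non-empty x. The function names `undo_duplications` and `duplication_root` are illustrative and do not come from the thesis.

```python
from typing import Set


def undo_duplications(w: str) -> Set[str]:
    """All words obtained from w by replacing one factor xx with x."""
    out = set()
    n = len(w)
    for i in range(n):
        for length in range(1, (n - i) // 2 + 1):
            if w[i:i + length] == w[i + length:i + 2 * length]:
                out.add(w[:i + length] + w[i + 2 * length:])
    return out


def duplication_root(w: str) -> Set[str]:
    """Square-free words from which w is reachable by iterated duplication."""
    roots, seen, stack = set(), {w}, [w]
    while stack:
        word = stack.pop()
        shorter = undo_duplications(word)
        if not shorter:
            roots.add(word)  # no square left, so the word is square-free
        for s in shorter:
            if s not in seen:
                seen.add(s)
                stack.append(s)
    return roots


simple_root = duplication_root("aabb")
branching_root = duplication_root("abcbabcbc")
```

A word can have more than one root: in the second call, undoing the square bcbc first leads to a different square-free word than undoing abcbabcb first. The search is exhaustive and intended for short words only.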