    Formal Model and Simulation of the Gene Assembly Process in Ciliates

    The construction of the functional macronucleus in certain types of ciliates is known as the ciliate gene assembly process. It consists of massive DNA excision from the micronucleus and, in stichotrichous ciliates, rearrangement of the remaining DNA sequences. While several computational models have tried to represent parts of the gene assembly process, the real process is still not completely understood. In this research, a new formal model, called the Computational 2JLP model, is introduced based on the recent biological 2JLP model. To validate the formal model, a simulation was created and tested with real data. Several parameters are introduced in the model to test ambiguities or edge cases of the biological model, and these parameters are systematically varied in the simulation to find their optimal values. Interestingly, a negative correlation is found between the outcome of the simulation and a parameter used to filter out scnRNAs that are similar to IES-specific sequences from the macronucleus. This indicates that if an scnRNA spans both an MDS and an IES, then, from the perspective of maximizing the outcome of the simulation, it is desirable to filter it out. The simulator successfully performs the gene assembly process whether the inputs are scrambled or unscrambled DNA sequences. The aim is for this model to serve as a foundation for future computational and mathematical study, and to help inform and refine the biological model.

    Chromosome Descrambling Order Analysis in ciliates

    Ciliates are a type of unicellular eukaryotic organism with two types of nuclei in each cell: the macronucleus (MAC) and the micronucleus (MIC). During mating, ciliates exchange their MIC, destroy their own MAC, and create a new MAC from the genetic material of their new MIC. The process of developing a new MAC from the newly exchanged MIC is known as gene assembly in ciliates; it consists of massive DNA excision from the micronucleus and rearrangement of the remaining DNA sequences. The DNA segments eliminated during gene assembly are known as internal eliminated segments (IESs), and the remaining segments, which are rearranged into the order needed to create proteins, are called macronuclear destined segments (MDSs). One topic of interest is predicting the correct order in which to descramble a gene or chromosomal segment. A prediction can be made based on the principle of parsimony, whereby the smallest number of operations is likely close to the actual number of operations that occurred. Interestingly, the order of MDSs in the newly assembled 22,354 Oxytricha trifallax MIC chromosome fragments provides evidence that multiple parallel recombinations occur, where the structure of the chromosomes allows interleaving between two sections of the developing macronuclear chromosome in a manner that can be captured by a common string operation, the shuffle operation (the shuffle of two strings produces a new string by weaving the two together while preserving the order within each string). We therefore studied four similar systems involving applications of shuffle to see how the minimum number of operations needed for assembly differs between them. For each of the first two systems, two algorithms have been implemented, and both are shown to be optimal.
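    The shuffle operation defined above can be sketched as a membership test: does a string (or a sequence of MDS labels) interleave two others while keeping each one's internal order? This is only an illustration of the operation, not the thesis's descrambling algorithms; the function names `is_shuffle` and `shuffles` are hypothetical. The test uses memoized recursion over prefix lengths, so it is suited to short inputs.

```python
from functools import lru_cache
from typing import Sequence


def is_shuffle(u: Sequence, v: Sequence, w: Sequence) -> bool:
    """True if w interleaves u and v, keeping each one's internal order."""
    if len(u) + len(v) != len(w):
        return False

    @lru_cache(maxsize=None)
    def match(i: int, j: int) -> bool:
        # Can the suffixes u[i:] and v[j:] be interleaved to spell w[i + j:]?
        if i == len(u) and j == len(v):
            return True
        k = i + j
        if i < len(u) and u[i] == w[k] and match(i + 1, j):
            return True
        return j < len(v) and v[j] == w[k] and match(i, j + 1)

    return match(0, 0)


def shuffles(u: str, v: str) -> set[str]:
    """Enumerate every string in the shuffle of u and v."""
    if not u or not v:
        return {u + v}
    # Either the next symbol comes from u or it comes from v.
    return ({u[0] + s for s in shuffles(u[1:], v)}
            | {v[0] + s for s in shuffles(u, v[1:])})
```

For example, `shuffles("ab", "c")` gives `{"abc", "acb", "cab"}`, and the MDS order `(1, 3, 2, 4)` is a shuffle of the runs `(1, 2)` and `(3, 4)`, which is the kind of two-section interleaving described above.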
    For the third and fourth systems, four and two heuristic algorithms, respectively, have been implemented. The results from these algorithms reveal that, in most cases, the third system requires the fewest applications of shuffle to descramble, but whether the best implemented algorithm for the third system is optimal remains an open question. The best implemented algorithm for the third system showed that 96.63% of the scrambled micronuclear chromosome fragments of Oxytricha trifallax can be descrambled with only 1 or 2 applications of shuffle. This small number of steps lends theoretical evidence that some structural component enforces a shuffle-like alignment of segments, after which parallel recombination takes place to enable MDS rearrangement and IES elimination. Another problem of interest, and the second topic of the thesis, is classifying segments of the MIC into MDSs and IESs, which amounts to determining the right "class label" (MDS or IES) for each nucleotide. To this end, labelled training sequences were used with hidden Markov models (HMMs), a well-known supervised machine learning classification method. HMMs of first, second, third, fourth, and fifth order have been implemented, and classification accuracy was verified through 10-fold cross-validation. The results show that an HMM is more likely to fail to classify micronuclear chromosomes accurately when it lacks some additional knowledge.
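    Labelling each nucleotide as MDS or IES with a first-order HMM amounts to Viterbi decoding, sketched below in log space. The two-state model and its probabilities (`START`, `TRANS`, `EMIT`) are hypothetical toy values chosen only for illustration, not parameters estimated in the thesis; in practice they would be estimated from the labelled training data. The higher-order HMMs mentioned above condition each step on a longer history and are not shown.

```python
import math

STATES = ("MDS", "IES")

# Hypothetical toy parameters, for illustration only.
START = {"MDS": 0.5, "IES": 0.5}
TRANS = {
    "MDS": {"MDS": 0.9, "IES": 0.1},
    "IES": {"MDS": 0.1, "IES": 0.9},
}
EMIT = {
    "MDS": {"A": 0.2, "C": 0.3, "G": 0.3, "T": 0.2},
    "IES": {"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4},
}


def viterbi(seq, start_p, trans_p, emit_p):
    """Return the most likely MDS/IES label for each nucleotide of seq."""
    if not seq:
        return []
    # score[s]: best log-probability of a label path for the prefix so far ending in s.
    score = {s: math.log(start_p[s]) + math.log(emit_p[s][seq[0]]) for s in STATES}
    backpointers = []
    for base in seq[1:]:
        prev = score
        score, ptr = {}, {}
        for s in STATES:
            # Best predecessor state for arriving in s at this position.
            best = max(STATES, key=lambda p: prev[p] + math.log(trans_p[p][s]))
            score[s] = prev[best] + math.log(trans_p[best][s]) + math.log(emit_p[s][base])
            ptr[s] = best
        backpointers.append(ptr)
    # Trace back from the highest-scoring final state.
    state = max(STATES, key=score.get)
    path = [state]
    for ptr in reversed(backpointers):
        state = ptr[state]
        path.append(state)
    return path[::-1]
```

With these toy values, `viterbi("GCGCGCATATATAT", START, TRANS, EMIT)` labels the first six bases MDS and the last eight IES. Working with log-probabilities avoids numerical underflow on long chromosomes.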

    Further Open Problems in Membrane Computing

    A series of open problems and research topics in membrane computing are pointed out, most of them suggested by recent developments in this area. Many of these problems have several facets and branchings, and further facets and branchings can surely be found after addressing them more carefully.

    Formal models of the extension activity of DNA polymerase enzymes

    The study of formal language operations inspired by enzymatic actions on DNA is part of ongoing efforts to provide a formal framework and rigorous treatment of DNA-based information and DNA-based computation. Other studies along these lines include theoretical explorations of splicing systems, insertion-deletion systems, substitution, hairpin extension, hairpin reduction, superposition, overlapping concatenation, conditional concatenation, contextual intra- and intermolecular recombinations, as well as template-guided recombination. First, a formal language operation is proposed and investigated, inspired by the naturally occurring phenomenon of DNA primer extension by a DNA-template-directed DNA polymerase enzyme. Given two DNA strings u and v, where the shorter string v (called the primer) is Watson-Crick complementary to, and can thus bind to, a substring of the longer string u (called the template), the result of the primer extension is a DNA string that is complementary to the suffix of the template starting at the binding position of the primer. The operation of DNA primer extension can be abstracted as a binary operation on two formal languages: a template language L1 and a primer language L2. This language operation is called L1-directed extension of L2, and the closure properties of various language classes, including the classes in the Chomsky hierarchy, are studied under directed extension. Furthermore, the question of finding necessary and sufficient conditions for a given language of target strings to be generated from a given template language when the primer language is unknown is answered. The canonic inverse of directed extension is used to obtain the optimal solution (the minimal primer language) to this question.
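    The primer extension operation described above can be sketched as follows, under simplifying assumptions: strings are over {A, C, G, T}, Watson-Crick complementarity is applied letter by letter, strand orientation (5'/3' antiparallelism) is ignored, and the primer must be non-empty and strictly shorter than the template. The formal definition in the work may differ, and the names `complement`, `primer_extension`, and `directed_extension` are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def complement(s: str) -> str:
    """Letter-by-letter Watson-Crick complement (orientation ignored)."""
    return s.translate(COMPLEMENT)


def primer_extension(template: str, primer: str) -> set[str]:
    """All products of extending `primer` along `template`."""
    # Assumption: the primer is non-empty and shorter than the template.
    if not primer or len(primer) >= len(template):
        return set()
    site = complement(primer)  # the template substring the primer binds to
    products = set()
    i = template.find(site)
    while i != -1:
        # Product: complement of the template suffix starting at the binding position.
        products.add(complement(template[i:]))
        i = template.find(site, i + 1)
    return products


def directed_extension(templates, primers) -> set[str]:
    """Language-level operation: union of products over all template/primer pairs."""
    return {w for u in templates for v in primers for w in primer_extension(u, v)}
```

Each product starts with the primer itself, because the primer is the complement of the template at the binding site, and runs to the end of the template. A primer that binds at several positions yields several products: `primer_extension("ATAT", "TA")` gives `{"TATA", "TA"}`.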
The second research project investigates properties of the binary string and language operation overlap assembly, as defined by Csuhaj-Varjú, Petre and Vaszil as a formal model of the linear self-assembly of DNA strands: the overlap assembly of two strings xy and yz, which share an overlap y, results in the string xyz. In this context, we investigate overlap assembly and its properties: closure properties of various language families under this operation, and related decision problems. A theoretical analysis of the possible use of iterated overlap assembly to generate combinatorial DNA libraries is also given. The third research project continues the exploration of the overlap assembly operation by investigating closure properties of various language classes under iterated overlap assembly, and the decidability of the completeness of a language. The problem of deciding whether a given string is terminal with respect to a language, and the problem of deciding whether a given language can be generated by an overlap assembly operation of two other given languages, are also investigated.
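    The overlap assembly of xy and yz into xyz can be sketched by trying every possible overlap length, assuming (as an illustration) that the overlap y must be non-empty. The language-level version simply takes all pairs of strings. Both function names are hypothetical.

```python
def overlap_assembly(s: str, t: str) -> set[str]:
    """All strings xyz such that s = xy and t = yz for a non-empty overlap y."""
    # y = t[:k]; it must also be a suffix of s, and then xyz = s + t[k:].
    return {s + t[k:] for k in range(1, min(len(s), len(t)) + 1) if s.endswith(t[:k])}


def overlap_assembly_languages(lang1, lang2) -> set[str]:
    """Extend the string operation to languages by taking all pairs."""
    return {w for s in lang1 for t in lang2 for w in overlap_assembly(s, t)}
```

Two strings can assemble in more than one way when they admit several overlaps: `overlap_assembly("aa", "aa")` yields both `aa` (overlap `aa`) and `aaa` (overlap `a`).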

    Sorting Permutations: Games, Genomes, and Cycles

    Permutation sorting, one of the fundamental steps in pre-processing data for the efficient application of other algorithms, has a long history in mathematical research literature and has numerous applications. Two special-purpose sorting operations are considered in this paper: context directed swap, abbreviated cds, and context directed reversal, abbreviated cdr. These are special cases of sorting operations that were studied in prior work on permutation sorting. Moreover, cds and cdr have been postulated to model molecular sorting events that occur in the genome maintenance program of certain species of single-celled organisms called ciliates. This paper investigates mathematical aspects of these two sorting operations. The main result of this paper is a generalization of previously discovered characterizations of cds-sortability of a permutation. The combinatorial structure underlying this generalization suggests natural combinatorial two-player games. These games are the main mathematical innovation of this paper. Comment: to appear in Discrete Mathematics, Algorithms and Applications

    Networks of Bio-inspired Processors

    The goal of this work is twofold. First, we propose a uniform view of three types of accepting networks of bio-inspired processors: networks of evolutionary processors, networks of splicing processors and networks of genetic processors. Second, we survey some features of these networks: computational power, computational and descriptional complexity, the existence of universal networks, efficiency as problem solvers and the relationships among them.

    Doctor of Philosophy

    Genotype Phenotype Association (GPA) is a means to identify candidate genes and genetic variants that may contribute to phenotypic variation. Technological advances in DNA sequencing continue to improve the efficiency and accuracy of GPA. Currently, High Throughput Sequencing (HTS) is the preferred method for GPA because it is fast and economical. HTS allows population-level characterization of genetic variation, which GPA studies require. Despite the potential power of using HTS in GPA studies, there are technical hurdles that must be overcome. For instance, the excessive error rate in HTS data and the sheer size of population-level data can hinder GPA studies. To overcome these challenges, I have written two software programs for HTS-based GPA. The first toolkit, GPAT++, is designed to detect GPA using small genetic variants. Unlike previous software, GPAT++'s association test models the inherent errors in HTS, preventing many spurious associations. The second toolkit, Whole Genome Alignment Metrics (WHAM), was designed for GPA using large genetic variants (structural variants). By integrating structural variant identification with association testing, WHAM can identify shared structural variants associated with a phenotype. Both GPAT++ and WHAM have been successfully applied to real-world GPA studies.

    Reprogrammable In Vivo Architecture

    The biological cell is an intricate yet ubiquitous component of life, able to grow, adapt and reproduce. The genetic material contained within a cell encodes information that directs its development and behaviour, and this information is passed down from one generation of cells to the next. One emerging interest, arising from collaborations between Molecular Biology and Computer Science, is to encode computational programs (sets of engineered information-processing instructions) in genetic material to be executed by living cells. So far, the large majority of in vivo computation research has been based on the detection and conditional manipulation of protein concentrations inside cells, which is the biological method of gene expression. In contrast, this thesis describes how a computational program, encoded in genetic material inside a bacterium, can be triggered by external stimuli to reassemble itself in a directed manner, creating a newly arranged computational program. To investigate the potential utility of in vivo self-arranging programs, software was designed to explore a search space of candidate computational programs, encoded in genetic material, that are able to rearrange themselves; to simulate these candidates; and to evaluate their behaviour against a set of criteria. Rearrangements were facilitated by biological catalysts that can selectively sever and rejoin genetic material in a cooperative manner. Their ability to perform compound operations was found to allow for a general-purpose mechanism. As a proof of concept, one of the candidate computational programs, a two-colour switch that can be set irreversibly through its rearrangement, was encoded in genetic material. In vivo expression resulting from in vitro rearrangement manipulations was measured to illustrate its operation.