5 research outputs found

    Chromosome Descrambling Order Analysis in ciliates

    Ciliates are unicellular eukaryotic organisms that have two types of nuclei within each cell: the macronucleus (MAC) and the micronucleus (MIC). During mating, ciliates exchange their MIC, destroy their own MAC, and create a new MAC from the genetic material of their new MIC. The process of developing a new MAC from the exchanged MIC is known as gene assembly in ciliates, and it consists of a massive amount of DNA excision from the micronucleus and the rearrangement of the remaining DNA sequences. During gene assembly, the DNA segments that are eliminated are known as internal eliminated segments (IESs), and the remaining DNA segments, which are rearranged into the order that is correct for creating proteins, are called macronuclear destined segments (MDSs). A topic of interest is to predict the correct order in which to descramble a gene or chromosomal segment. A prediction can be made based on the principle of parsimony, whereby the length of the shortest sequence of operations is likely close to the actual number of operations that occurred. Interestingly, the order of MDSs in the newly assembled 22,354 Oxytricha trifallax MIC chromosome fragments provides evidence that multiple parallel recombinations occur, where the structure of the chromosomes allows for interleaving between two sections of the developing macronuclear chromosome in a manner that can be captured with a common string operation called the shuffle operation (the shuffle operation on two strings produces a new string by weaving the two together while preserving the order within each string). Thus, we studied four similar systems involving applications of shuffle to see how the minimum number of operations needed to assemble differs between the types. Two algorithms for each of the first two systems have been implemented, and both are shown to be optimal. For the third and fourth systems, four and two heuristic algorithms, respectively, have been implemented. The results from these algorithms revealed that, in most cases, the third system gives the minimum number of applications of shuffle needed to descramble, but whether the best implemented algorithm for the third system is optimal remains an open question. The best implemented algorithm for the third system showed that 96.63% of the scrambled micronuclear chromosome fragments of Oxytricha trifallax can be descrambled by only 1 or 2 applications of shuffle. This small number of steps lends theoretical evidence that some structural component is enforcing an alignment of segments in a shuffle-like fashion, and that parallel recombination then takes place to enable MDS rearrangement and IES elimination. Another problem of interest is to classify segments of the MIC into MDSs and IESs; this is the second topic of the thesis, and it is a matter of determining the right "class label", i.e. MDS or IES, for each nucleotide. Thus, training data of labelled input sequences was used with hidden Markov models (HMMs), a well-known supervised machine learning classification technique. HMMs of first, second, third, fourth, and fifth order have been implemented. The accuracy of the classification was verified through 10-fold cross-validation. Results from this work show that an HMM is more likely to fail to accurately classify micronuclear chromosomes without some additional knowledge.
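The shuffle operation described in the abstract above can be illustrated with a minimal Python sketch. This is not one of the thesis's algorithms: the function names `is_shuffle` and `all_shuffles` are hypothetical, and the example MDS orderings are made up for illustration. The sketch only shows what it means for one sequence to be a shuffle of two others, with the order inside each input preserved.

```python
from functools import lru_cache


def is_shuffle(u, v, w):
    """Return True if w interleaves u and v while keeping the order within each.

    u, v and w may be strings or tuples (e.g. tuples of MDS indices).
    Hypothetical helper for illustration only.
    """
    if len(u) + len(v) != len(w):
        return False

    @lru_cache(maxsize=None)
    def fits(i, j):
        # i items of u and j items of v have already been matched against w.
        if i == len(u) and j == len(v):
            return True
        k = i + j
        if i < len(u) and u[i] == w[k] and fits(i + 1, j):
            return True
        return j < len(v) and v[j] == w[k] and fits(i, j + 1)

    return fits(0, 0)


def all_shuffles(u, v):
    """Enumerate every distinct shuffle of u and v.

    The result grows exponentially, so this is only meant for tiny inputs.
    """
    if not u or not v:
        return {u + v}
    take_u = {u[:1] + s for s in all_shuffles(u[1:], v)}
    take_v = {v[:1] + s for s in all_shuffles(u, v[1:])}
    return take_u | take_v


if __name__ == "__main__":
    # Made-up example: two sections holding MDSs (1, 3, 5) and (2, 4, 6)
    # can be woven into the sorted order with a single shuffle.
    print(is_shuffle((1, 3, 5), (2, 4, 6), (1, 2, 3, 4, 5, 6)))
    print(sorted(all_shuffles("ab", "c")))
```

In this toy setting, a fragment whose MDS order splits into two sections that shuffle into sorted order would need one application of shuffle. The four systems studied in the thesis differ in details that the abstract does not spell out, so they are not modelled here.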

    The Pathway to Detangle a Scrambled Gene

    Programmed DNA elimination and reorganization frequently occur during cellular differentiation. Development of the somatic macronucleus in some ciliates presents an extreme case, involving excision of internal eliminated sequences (IESs) that interrupt coding DNA segments (macronuclear destined sequences, MDSs), as well as removal of transposon-like elements and extensive genome fragmentation, leading to 98% genome reduction in Stylonychia lemnae. Approximately 20-30% of the genes are estimated to be scrambled in the germline micronucleus, with coding segment order permuted and present in either orientation on micronuclear chromosomes. Massive genome rearrangements are therefore critical for development. To understand the process of DNA deletion and reorganization during macronuclear development, we examined the population of DNA molecules during assembly of different scrambled genes in two related organisms in a developmental time-course by PCR. The data suggest that removal of conventional IESs usually occurs first, accompanied by a surprising level of error at this step. The complex events of inversion and translocation seem to occur after repair and excision of all conventional IESs and via multiple pathways. This study reveals a temporal order of DNA rearrangements during the processing of a scrambled gene, with simpler events usually preceding more complex ones. The surprising observation of a hidden layer of errors, absent from the mature macronucleus but present during development, also underscores the need for repair or screening of incorrectly assembled DNA molecules.

    Scrambling analysis of ciliates

    Ciliates are a class of organisms which undergo a genetic process called gene descrambling after mating. To better understand the problem, a literature review of past work is presented in this thesis, including a brief summary of both the relevant biology and bioinformatics literature. Then, a formal definition of scrambling systems is developed which attempts to model the problem of sequence alignment between scrambled and descrambled genes. With this system, sequences can be classified into relevant functional segments. It also provides a framework whereby we can compare various ciliate sequence alignment algorithms. After that, a new method of predicting the various functional segments is studied. This method shows better coverage, and usually a better labelling score with certain parameters. Then we discuss several recent hypotheses as to how ciliates naturally descramble genes. An algorithm suite is developed to test these hypotheses. With the tests, we are able to computationally check which factors are potentially the most important. According to the current results with 247 pointer sequences of 13 micronuclear genes, examining repeats that have the same distance, together with either the same sequence or the same size, as the real pointers is almost always enough information to guide descrambling. Indeed, the real pointer sequence is the unique repeat 92.7% and 94.3% of the time within the 247 pointers, from the left and right respectively, using only the pointer distance and the pointer sequence information.

    Formal Model and Simulation of the Gene Assembly Process in Ciliates

    The construction process of the functional macronucleus in certain types of ciliates is known as the ciliate gene assembly process. It consists of a massive amount of DNA excision from the micronucleus and the rearrangement of the rest of the DNA sequences (in the case of stichotrichous ciliates). While several computational models have tried to represent certain parts of the gene assembly process, the real process is still not completely understood. In this research, a new formal model called the Computational 2JLP model is introduced, based on the recent biological 2JLP model. To justify the formal model, a simulation is created and tested with real data. Several parameters are introduced in the model that are used to test ambiguities or edge cases of the biological model. Parameters are systematically tested in the simulation to try to find their optimal values. Interestingly, a negative correlation is found between a parameter (which is used to filter out scnRNAs that are similar to IES-specific sequences from the macronucleus) and the outcome of the simulation. This indicates that if a scnRNA consists of both an MDS and an IES, then from the perspective of maximizing the outcome of the simulation, it is desirable to filter out this scnRNA. The simulator successfully performs the gene assembly process whether the inputs are scrambled or unscrambled DNA sequences. It is desirable for this model to serve as a foundation for future computational and mathematical study, and to help inform and refine the biological model.