
Nested tandem repeat computation and analysis : a thesis presented in partial fulfilment of the requirements for the degree of Doctor of Philosophy in Computational Biology at Massey University

Abstract

Biological sequences have long been known to contain many classes of repeats. The most studied repetitive structure is the tandem repeat, in which many approximate copies of a common segment (the motif) appear consecutively. This thesis investigates a more complex repetitive structure called a nested tandem repeat, which consists of many approximate copies of two motifs interspersed with one another. The thesis is a collection of published and in-progress papers, each of which addresses a computational problem related to the analysis of nested tandem repeats. Nested tandem repeats have been observed in the intergenic spacer of the ribosomal DNA gene in Colocasia esculenta. The thesis addresses whether such repeats can be found elsewhere in biological sequence databases and describes NTRFinder, a software tool to detect nested tandem repeats. Once a nested tandem repeat has been detected, a further problem is aligning the nested tandem repeat region against its two motifs; an algorithm that guarantees an optimal solution to this problem is introduced. After nested tandem repeats have been detected and their structures identified, identifying the motif boundaries remains an unsolved problem, one that arises not only in nested tandem repeats but in tandem repeats as well. Heuristic solutions to this problem are implemented and tested. To compare two tandem repeat sequences, an algorithm is presented that aligns a hypothetical ancestral sequence of both sequences against each sequence. This algorithm considers substitutions, deletions, and unidirectional duplication, namely, duplication from ancestor to descendant.
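To make the alignment problem concrete, the following is a minimal sketch. It is not the algorithm from the thesis. It assumes unit-cost edit distance (substitution, insertion and deletion each cost 1) and models a nested tandem repeat region as one or more whole copies of either of two motifs, concatenated in any order. It uses the wraparound dynamic-programming idea commonly applied to tandem repeats: whenever a copy of one motif ends at a position, a new copy of either motif may begin there. The function name `ntr_edit_distance` is hypothetical.

```python
def ntr_edit_distance(s: str, motif_a: str, motif_b: str) -> int:
    """Hypothetical sketch: minimum unit-cost edit distance between s and any
    concatenation of one or more whole copies of motif_a and motif_b."""
    motifs = [motif_a, motif_b]
    if not all(motifs):
        raise ValueError("motifs must be non-empty")
    prev = None
    end = 0
    for i in range(len(s) + 1):
        cur = []
        for k, m in enumerate(motifs):
            # col[j]: best cost with s[:i] aligned and the current copy of
            # motif k having consumed m[:j]
            col = [0] * (len(m) + 1)
            # j == 0: either the first copy starts here (i == 0) or s[i-1]
            # is inserted before the copy's first character
            col[0] = 0 if i == 0 else prev[k][0] + 1
            for j in range(1, len(m) + 1):
                best = col[j - 1] + 1  # delete m[j-1]
                if i > 0:
                    sub = 0 if s[i - 1] == m[j - 1] else 1
                    best = min(best,
                               prev[k][j - 1] + sub,  # match or substitute
                               prev[k][j] + 1)        # insert s[i-1]
                col[j] = best
            cur.append(col)
        # Cost of finishing a whole copy of either motif exactly at position i
        end = min(col[-1] for col in cur)
        # Wraparound: a new copy of either motif may start where one ended.
        # One deletion sweep suffices: re-entering costs at least
        # end + len(motif) > end, so `end` itself cannot decrease.
        for col in cur:
            if end < col[0]:
                col[0] = end
                for j in range(1, len(col)):
                    col[j] = min(col[j], col[j - 1] + 1)
        prev = cur
    return end
```

For example, `ntr_edit_distance("ACGTTACGACG", "ACG", "TT")` is 0 because the region is exactly ACG, TT, ACG, ACG, while `ntr_edit_distance("ACCG", "ACG", "TT")` is 1 (one extra C). The sketch reports only the optimal cost. Recovering which motif copy each segment belongs to would additionally require a traceback over the stored columns.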
