139 research outputs found

    Phylogenetic Tree Construction for Starfish and Primate Genomes via Alignment Free Methods

    A phylogenetic tree is a tree-like diagram showing the evolutionary relationships among species based on differences or similarities in their physical or genetic makeup. Genetic similarity is traditionally measured as the pairwise distance between gene sequences, computed using sequence alignment methods. Advances in next-generation sequencing technologies have produced a huge number of datasets for partially or completely sequenced genomes. These massive datasets require faster comparison methods than the traditional alignment-based approaches, so alignment-free approaches have been gaining popularity in recent years. In this thesis, we compare alignment-based and various alignment-free methods for phylogenetic tree construction. The alignment-free methods we study are based on k-mer frequency, Average Common Substring (ACS), and ACS with position restrictions and mismatches. The position-restricted ACS is a novel contribution of this thesis. To evaluate the performance of the alignment-free approaches, we applied them to phylogeny reconstruction using DNA (27 primate mitochondrial genomes) and protein (starfish RNA-seq) sequence sets. The phylogenetic trees are constructed by applying neighbor joining to the distance matrices obtained with the above-mentioned alignment-free methods. The resulting phylogenetic trees are then compared with the reference tree using the Branch Score Distance measure. Both neighbor joining and the Branch Score Distance are computed with the programs neighbor and treedist from the PHYLIP package.
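
    As a rough illustration of the k-mer frequency idea mentioned in this abstract, the sketch below builds a pairwise distance matrix from normalized k-mer counts. It is not the thesis's implementation: the function names, the default k = 3, and the choice of Euclidean distance are all assumptions made for the example. A matrix of this kind is what a neighbor-joining program would take as input.

```python
from collections import Counter
import math


def kmer_profile(seq, k):
    """Return the normalized k-mer frequencies of seq as a dict."""
    if len(seq) < k:
        raise ValueError("sequence is shorter than k")
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    return {kmer: c / total for kmer, c in counts.items()}


def kmer_distance(a, b, k=3):
    """Euclidean distance between two k-mer frequency profiles.

    Euclidean distance is an assumption of this sketch; other
    profile distances could be substituted here.
    """
    pa, pb = kmer_profile(a, k), kmer_profile(b, k)
    keys = set(pa) | set(pb)
    return math.sqrt(sum((pa.get(x, 0.0) - pb.get(x, 0.0)) ** 2 for x in keys))


def distance_matrix(seqs, k=3):
    """Return (names, matrix) for a dict mapping names to sequences."""
    names = list(seqs)
    matrix = [[kmer_distance(seqs[x], seqs[y], k) for y in names] for x in names]
    return names, matrix


if __name__ == "__main__":
    # Toy sequences, invented for the example.
    toy = {"s1": "ACGTACGTAC", "s2": "ACGTACGTTC", "s3": "TTTTGGGGCC"}
    names, matrix = distance_matrix(toy, k=3)
    for name, row in zip(names, matrix):
        print(name, " ".join(f"{d:.3f}" for d in row))
```

    No alignment is computed at any point: each sequence is reduced to a frequency vector once, and every pairwise comparison is a cheap vector operation.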

    Three mathematical issues in reconstructing ancestral genome

    Ph.D. (Doctor of Philosophy)

    Efficient estimation of evolutionary distances

    The advent of high-throughput sequencers has led to a dramatic increase in the size of available genomic data. Standard methods, which have worked well for many years, are not suitable for the analysis of big data sets because they rely on a time-consuming alignment step. In this thesis, a new alignment-free approach for phylogeny reconstruction is introduced. The corresponding program, andi, is orders of magnitude faster than classical approaches and also superior to comparable alignment-free methods. The central data structure in andi is the enhanced suffix array, which is used to find long exact matches between sequences. In this thesis, various approaches to the construction of enhanced suffix arrays, including novel ones, are evaluated with respect to performance. Additionally, a new parallel algorithm for the computation of suffix arrays is introduced.
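
    To make the role of the suffix array more concrete, here is a minimal sketch of finding the longest exact match between a query and a subject sequence. It uses a plain suffix array built by naive sorting, not the enhanced suffix array or the construction algorithms evaluated in the thesis, and it is not andi's code; all function names are hypothetical.

```python
def build_suffix_array(text):
    """Naive suffix array: start positions of suffixes in sorted order.

    Sorting whole suffixes is fine for a toy example but far too slow
    for genomes; real tools use dedicated construction algorithms.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def common_prefix_len(a, b):
    """Length of the longest common prefix of strings a and b."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def longest_match(text, sa, query):
    """Return (length, position) of the longest prefix of query found in text.

    Binary search locates where query would be inserted among the sorted
    suffixes; the best match is one of the two neighbouring suffixes.
    Position is -1 when not even the first character matches.
    """
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:] < query:
            lo = mid + 1
        else:
            hi = mid
    best_len, best_pos = 0, -1
    for idx in (lo - 1, lo):
        if 0 <= idx < len(sa):
            length = common_prefix_len(text[sa[idx]:], query)
            if length > best_len:
                best_len, best_pos = length, sa[idx]
    return best_len, best_pos


if __name__ == "__main__":
    subject = "banana"
    sa = build_suffix_array(subject)
    print(sa)
    print(longest_match(subject, sa, "nanb"))
```

    Each lookup costs a logarithmic number of string comparisons once the index exists, which is the basic reason an index-based search can avoid a full alignment step.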

    HYMENOPTERAN MOLECULAR PHYLOGENETICS: FROM APOCRITA TO BRACONIDAE (ICHNEUMONOIDEA)

    Two separate phylogenetic studies were performed for two different taxonomic levels within Hymenoptera. The first study examined the utility of expressed sequence tags (ESTs) for resolving relationships among hymenopteran superfamilies. Transcripts were assembled from 14,000 sequenced clones for six disparate hymenopteran taxa, averaging over 660 unique contigs per species. Orthology and gene determination were performed using modifications to a previously developed computerized pipeline and compared against annotated insect genomes. Sequences from additional taxa were added from public databases, giving a final dataset of 24 genes for 16 taxa. The concatenated dataset recovered a robust and well-supported topology; however, there was extreme incongruity among individual gene trees. Analyses of sequences indicated strong compositional and transition biases, particularly in the third codon positions. The use of filtered supernetworks aided visualization of the congruent phylogenetic signal that existed across the individual gene trees. Additionally, treeness triangle plots indicated a strong residual signal in several gene trees and across codon positions in the concatenated dataset. However, most analyses of the concatenated dataset recovered expected relationships known from other independent analyses. Thus, ESTs provide a powerful source of information for phylogenetic analysis, but results are sensitive to low taxonomic sampling and missing data. The second study examined subfamilial relationships within the parasitoid family Braconidae, using over 4 kb of sequence data for 139 taxa. Bayesian inference of the concatenated dataset recovered a robust phylogeny, particularly for early divergences within the family. There was strong evidence supporting two independent lineages within the family: one leading to the noncyclostomes and one leading to the cyclostomes. Ancestral state reconstructions were performed to test the theory of ectoparasitism as the ancestral condition for all taxa within the family. Results indicated an endoparasitic ancestor for the family and for the noncyclostome lineage, with an early transition to ectoparasitism for the cyclostome lineage. However, reconstructions of some nodes were sensitive to outgroup coding and will also be affected by increased biological knowledge.

    Halovirus HF2 Intergenic Repeat Sequences Carry Promoters

    Halovirus HF2 was the first member of the Haloferacalesvirus genus to have its genome fully sequenced, which revealed two classes of intergenic repeat (IR) sequences: class I repeats of 58 bp and class II repeats of 29 bp. Both classes of repeat contain AT-rich motifs that were conjectured to represent promoters. In the present study, nine IRs were cloned upstream of the bgaH reporter gene, and all displayed promoter activity, providing experimental evidence for the previous conjecture. Comparative genomics showed that IR sequences and their relative genomic positions were strongly conserved among other members of the same virus genus. The transcription of HF2 was also examined by reverse-transcriptase PCR (RT-PCR), which demonstrated that very long transcripts were produced from both strands and together covered most of the genome. The presence of long counter transcripts suggests a regulatory role or possibly unrecognized coding potential.