
    EREM: Parameter Estimation and Ancestral Reconstruction by Expectation-Maximization Algorithm for a Probabilistic Model of Genomic Binary Characters Evolution

    Evolutionary binary characters are features of species or genes that indicate the absence (value zero) or presence (value one) of some property. Examples include eukaryotic gene architecture (the presence or absence of an intron at a particular locus), gene content, and morphological characters. In many studies, the acquisition of such binary characters is assumed to be a rare evolutionary event, and consequently their evolution is analyzed using various flavors of parsimony. However, when gains and losses of the character are not rare enough, a probabilistic analysis becomes essential. Here, we present a comprehensive probabilistic model describing the evolution of binary characters on a bifurcating phylogenetic tree. We also provide EREM, a fast software tool that uses maximum likelihood to estimate the model parameters and to reconstruct ancestral states (presence or absence at internal nodes) and events (gain and loss events along branches).
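The abstract does not spell out EREM's model. As a rough illustration of the kind of computation involved, the sketch below evaluates the likelihood of one binary character (one column of presence/absence data) under a much simpler assumed model: a two-state continuous-time Markov chain with a single gain rate and a single loss rate shared by all branches, with internal states summed out by Felsenstein's pruning algorithm. The `Node` class, the function names, and the constant rates are illustrative assumptions, not EREM's interface or parameterization.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    """Node of a rooted bifurcating tree; `length` is the branch above it."""
    name: Optional[str] = None
    length: float = 0.0
    children: List["Node"] = field(default_factory=list)


def transition(t: float, gain: float, loss: float) -> List[List[float]]:
    """P[a][b]: probability of state b after time t, starting from state a (assumed 2-state model)."""
    rate = gain + loss
    pi0, pi1 = loss / rate, gain / rate  # stationary absence/presence frequencies
    decay = math.exp(-rate * t)
    return [[pi0 + pi1 * decay, pi1 * (1.0 - decay)],
            [pi0 * (1.0 - decay), pi1 + pi0 * decay]]


def partial(node: Node, states: Dict[str, int], gain: float, loss: float) -> List[float]:
    """Pruning step: likelihood of the leaf data below `node`, for node state 0 and 1."""
    if not node.children:
        s = states[node.name]
        return [1.0 if s == 0 else 0.0, 1.0 if s == 1 else 0.0]
    result = [1.0, 1.0]
    for child in node.children:
        p = transition(child.length, gain, loss)
        below = partial(child, states, gain, loss)
        for a in (0, 1):
            result[a] *= p[a][0] * below[0] + p[a][1] * below[1]
    return result


def site_likelihood(root: Node, states: Dict[str, int], gain: float, loss: float) -> float:
    """Likelihood of one 0/1 column, with the root state drawn from the stationary distribution."""
    rate = gain + loss
    prior = [loss / rate, gain / rate]
    below = partial(root, states, gain, loss)
    return prior[0] * below[0] + prior[1] * below[1]


def root_posterior(root: Node, states: Dict[str, int], gain: float, loss: float) -> List[float]:
    """Posterior probabilities of absence (index 0) and presence (index 1) at the root."""
    rate = gain + loss
    prior = [loss / rate, gain / rate]
    below = partial(root, states, gain, loss)
    joint = [prior[0] * below[0], prior[1] * below[1]]
    total = joint[0] + joint[1]
    return [joint[0] / total, joint[1] / total]


tree = Node(children=[Node(length=0.3, children=[Node("A", 0.1), Node("B", 0.2)]), Node("C", 0.4)])
print(site_likelihood(tree, {"A": 1, "B": 1, "C": 0}, gain=0.5, loss=1.5))
```

Estimating the rates would then mean maximizing the summed log-likelihood over all columns (EREM does this by expectation-maximization); `root_posterior` shows how ancestral presence/absence probabilities follow from the same partial likelihoods, here only for the root.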

    In search of lost introns

    Many fundamental questions concerning the emergence and subsequent evolution of eukaryotic exon-intron organization are still unsettled. Genome-scale comparative studies, which can shed light on crucial aspects of eukaryotic evolution, require adequate computational tools. We describe novel computational methods for studying spliceosomal intron evolution. Our goal is to give a reliable characterization of the dynamics of intron evolution. Our algorithmic innovations address the identification of orthologous introns and the likelihood-based analysis of intron data. We discuss a compression method for the evaluation of the likelihood function, which is noteworthy for phylogenetic likelihood problems in general. We prove that after $O(nL)$ preprocessing time, subsequent evaluations take $O(nL/\log L)$ time almost surely in the Yule-Harding random model of $n$-taxon phylogenies, where $L$ is the input sequence length. We illustrate the practicality of our methods by compiling and analyzing a data set involving 18 eukaryotes, more than in any other study to date. The study yields the surprising result that ancestral eukaryotes were fairly intron-rich. For example, the bilaterian ancestor is estimated to have had more than 90% as many introns as vertebrates do now.
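Phylogenetic likelihoods are sums of per-column log-likelihoods, so identical columns need to be evaluated only once. The sketch below shows this standard site-pattern compression for a 0/1 intron matrix. It is not the compression method described in the abstract, whose $O(nL/\log L)$ bound comes from a more elaborate scheme; the matrix layout (one 0/1 string per taxon) and the `site_lik` callback are assumptions for illustration.

```python
import math
from collections import Counter
from typing import Callable, Dict, List


def compress_columns(matrix: Dict[str, str], taxa: List[str]) -> Counter:
    """Count how often each distinct column (site pattern) occurs in a 0/1 matrix."""
    length = len(matrix[taxa[0]])
    return Counter(
        tuple(int(matrix[t][j]) for t in taxa) for j in range(length)
    )


def log_likelihood(matrix: Dict[str, str], taxa: List[str],
                   site_lik: Callable[[Dict[str, int]], float]) -> float:
    """Sum of per-column log-likelihoods, evaluating each distinct pattern only once."""
    total = 0.0
    for pattern, count in compress_columns(matrix, taxa).items():
        # one likelihood call per distinct pattern, weighted by its multiplicity
        total += count * math.log(site_lik(dict(zip(taxa, pattern))))
    return total
```

With $n$ taxa there are at most $2^n$ distinct binary patterns, so for long intron matrices the number of `site_lik` calls can be far smaller than $L$; any per-column likelihood function, such as a pruning-based one, can be passed as `site_lik`.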

    Relative Timing of Intron Gain and a New Marker for Phylogenetic Analyses

    Despite decades of effort by molecular systematists, the trees of life of eukaryotic organisms still remain partly unresolved or in conflict with each other. The ever-increasing number of fully sequenced eukaryotic genomes makes it possible to consider gene and species phylogenies at genome scale. However, such phylogenomic approaches have also revealed that adding ever more taxa and gene sequences is not the ultimate solution to these conflicts, and that sequence-independent phylogenetic meta-characters derived from genome sequences are needed. Spliceosomal introns are characteristic features of eukaryotic nuclear genomes. The relatively rare changes of spliceosomal intron positions have already been used as genome-level markers, both for studying intron evolution and for estimating phylogenies, though with variable success.

    In this thesis, a specific subset of these changes is introduced and established as a novel phylogenetic marker, termed the near intron pair (NIP). These characters are inferred from homologous genes that show mutually exclusive intron presence at pairs of closely spaced coding sequence (CDS) positions. The premise that NIPs are powerful characters rests on the assumption that both very small exons and multiple intron gains at the same position are rare. A main goal of this work was to implement a computational pipeline that extracts sufficient numbers of NIP characters from genomic and alignment data sets in a consistent and flexible way. Starting from datasets of orthologous (or, more generally, homologous) genes comprising genomic sequences and the corresponding CDS transcript annotations, the pipeline generates the multiple alignment as an integral step. The alignment can be computed at the amino acid level with external tools (e.g., transAlign) and is then converted into a codon alignment by back-translation.
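The alignment step amounts to threading each transcript's CDS codons through its gapped protein alignment and then locating its introns in alignment coordinates. A minimal sketch, assuming a gapped protein row with `-` for gaps, an ungapped CDS whose codons match the protein residues in order, and intron positions given as the number of CDS nucleotides upstream of the intron; the function names are hypothetical and do not reflect transAlign or the thesis pipeline.

```python
def back_translate(aligned_protein: str, cds: str) -> str:
    """Thread the CDS codons through a gapped protein alignment row."""
    codons = []
    pos = 0
    for residue in aligned_protein:
        if residue == "-":
            codons.append("---")  # a protein gap becomes a three-column codon gap
            continue
        codon = cds[pos:pos + 3]
        if len(codon) != 3:
            raise ValueError("CDS is too short for the aligned protein")
        codons.append(codon)
        pos += 3
    return "".join(codons)  # trailing CDS bases (e.g. a stop codon) are ignored


def intron_column(codon_row: str, cds_offset: int) -> int:
    """Alignment column of the last CDS nucleotide before an intron.

    `cds_offset` is the (assumed) number of CDS nucleotides upstream of the intron.
    """
    seen = 0
    for column, char in enumerate(codon_row):
        if char != "-":
            seen += 1
            if seen == cds_offset:
                return column
    raise ValueError("intron offset lies outside the CDS")
```

Introns from different transcripts that map to the same column are then candidates for positional homology.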
    When intron positions are mapped onto the multiple alignment individually for each transcript, positionally homologous intron positions should become apparent. At this stage, the pipeline outputs those portions of the intron-annotated alignment that contain at least one NIP candidate. A subsequent pipeline script converts these so-called NIP region files into binary characters representing valid NIPs, subject to quality filters concerning, for example, amino acid alignment conservation around intron loci and splice sites. The pipeline tools enable researchers to build NIP character matrices that can be used for tree inference, e.g., with maximum parsimony.

    In a first NIP-based application, the phylogenetic position of the major orders of holometabolous insects (more specifically, the Coleoptera-Hymenoptera-Mecopterida trifurcation) was evaluated in a cladistic framework. As already suggested by a study of the eIF2gamma gene based on two NIP cases (Krauss et al. 2005), the genome-scale evaluation supported Hymenoptera as the sister group to an assemblage of Coleoptera and Mecopterida, in agreement with other studies but contradicting the previously established view. As part of the genome paper describing a new species of twisted-wing parasites (Strepsiptera), the NIP method was used to help resolve their phylogenetic position within holometabolous insects. Together with analyses of sequence patterns and a further meta-character, it revealed twisted-wing parasites to be the closest relatives of the mega-diverse beetles. NIP-based reconstructions of the metazoan tree covering a broad selection of representative animal species also exposed weaknesses of the NIP approach: it can suffer from alignment and ortholog-prediction artifacts (depending on the phylogenetic depth of the sampled taxa) and from systematic biases (long-branch attraction artifacts due to unequal rates of intron gain and loss and to the use of maximum parsimony). In a further study, NIPs identified within the recently diverged genus Drosophila were used to characterize recent intron gain events, which apparently involved several cases of intron sliding and tandem exon duplication, although the gain mechanisms for the majority of cases could not be elucidated. Finally, the NIP marker was established as a novel phylogenetic marker, in particular as a complementary way to explore the wealth of genome data for phylogenetic purposes and to address open questions of intron evolution.
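The NIP criterion and its use in parsimony can be illustrated in a few lines. In this sketch, a NIP candidate is a pair of intron columns at most `max_dist` alignment columns apart, each occupied in at least one taxon, with no taxon carrying both. Each NIP is coded as a binary character ('0' for the upstream intron, '1' for the downstream one, '?' for taxa with neither) and scored on a fixed tree with Fitch's small-parsimony algorithm. The threshold `max_dist`, the coding convention, and the nested-tuple tree format are assumptions; the pipeline's quality filters on alignment conservation and splice sites are omitted.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple, Union

Tree = Union[str, Tuple["Tree", "Tree"]]  # leaf name or (left, right) subtree


def find_nips(introns: Dict[str, Set[int]], max_dist: int) -> List[Tuple[int, int]]:
    """Pairs of nearby intron columns whose presences are mutually exclusive."""
    columns = sorted(set().union(*introns.values()))
    nips = []
    for left, right in combinations(columns, 2):
        if right - left > max_dist:
            continue
        if any(left in cols and right in cols for cols in introns.values()):
            continue  # some taxon has both introns, so the pair is not exclusive
        nips.append((left, right))
    return nips


def encode_nip(introns: Dict[str, Set[int]], nip: Tuple[int, int]) -> Dict[str, str]:
    """Binary NIP state per taxon: '0' = upstream intron, '1' = downstream, '?' = neither."""
    left, right = nip
    row = {}
    for taxon, cols in introns.items():
        if left in cols:
            row[taxon] = "0"
        elif right in cols:
            row[taxon] = "1"
        else:
            row[taxon] = "?"
    return row


def fitch(tree: Tree, states: Dict[str, str]) -> Tuple[Set[str], int]:
    """Fitch small parsimony on a rooted binary tree: (state set, number of changes)."""
    if isinstance(tree, str):
        state = states[tree]
        return ({"0", "1"} if state == "?" else {state}), 0  # '?' matches either state
    left_set, left_cost = fitch(tree[0], states)
    right_set, right_cost = fitch(tree[1], states)
    common = left_set & right_set
    if common:
        return common, left_cost + right_cost
    return left_set | right_set, left_cost + right_cost + 1
```

Summing the `fitch` costs over all NIP characters gives a tree's parsimony length; a maximum parsimony search would compare that total across candidate topologies.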