10 research outputs found

    A genomic survey of transposable elements in the choanoflagellate Salpingoeca rosetta reveals selection on codon usage

    Abstract: Background: Unicellular species make up the majority of eukaryotic diversity; however, most studies on transposable elements (TEs) have centred on multicellular host species. Such studies may have therefore provided a limited picture of how transposable elements evolve across eukaryotes. The choanoflagellates, as the sister group to Metazoa, are an important study group for investigating unicellular to multicellular transitions. A previous survey of the choanoflagellate Monosiga brevicollis revealed the presence of only three families of LTR retrotransposons, all of which appeared to be active. Salpingoeca rosetta is the second choanoflagellate to have its whole genome sequenced and provides further insight into the evolution and population biology of transposable elements in the closest relative of metazoans. Results: Screening the genome revealed the presence of a minimum of 20 TE families. Seven of the annotated families are DNA transposons and the remaining 13 families are LTR retrotransposons. Evidence for two putative non-LTR retrotransposons was also uncovered, but full-length sequences could not be determined. Superfamily phylogenetic trees indicate that vertical inheritance and, in the case of one family, horizontal transfer have been involved in the evolution of the choanoflagellate TEs. Phylogenetic analyses of individual families highlight recent element activity in the genome; however, six families did not show evidence of current transposition. The majority of families possess young insertions and the expression levels of TE genes vary by four orders of magnitude across families. In contrast to previous studies on TEs, the families present in S. rosetta show the signature of selection on codon usage, with families favouring codons that are adapted to the host translational machinery. Selection is stronger in LTR retrotransposons than DNA transposons, with highly expressed families showing stronger codon usage bias. 
Mutation pressure towards guanosine and cytosine also appears to contribute to TE codon usage. Conclusions: S. rosetta increases the known diversity of choanoflagellate TEs and the complement further highlights the role of horizontal gene transfer from prey species in choanoflagellate genome evolution. Unlike previously studied TEs, the S. rosetta families show evidence for selection on their codon usage, which is shown to act via translational efficiency and translational accuracy

    Comparative Genomics and the Evolution of Transposable Elements in Unicellular Eukaryotes

    Background: Transposable elements are mobile DNA sequences, which are ubiquitous in the majority of eukaryotic genomes. Research on transposable elements in unicellular eukaryotes is limited, and therefore the picture of their evolution is far from conclusive. Similarly, research on codon usage bias, the frequency of synonymous codons present in a host species' coding DNA, has focused on multicellular organisms, with no clear explanation of the evolutionary pressures that drive bias in unicellular eukaryotic species. Methods: Eight Kazachstania budding yeast species and the choanoflagellate species Salpingoeca rosetta were screened for the presence of mobile elements using homology-based methods. Protein and nucleotide phylogenies were constructed to review ancestral patterns and similarity across superfamilies. Codon usage statistics were employed to review patterns of bias in the host genes and mobile elements of the Kazachstania species and S. rosetta, as well as two additional holozoan species, Monosiga brevicollis and Capsaspora owczarzaki. Results: A diverse repertoire of transposable element families was uncovered in the species reviewed. A complete absence of DNA transposons was found in the Kazachstania species; however, both classes of elements were uncovered in S. rosetta. Element phylogenies indicated vertical transfer for the majority of families, with the exception of one family in S. rosetta, which suggested acquisition by horizontal transfer. Patterns of codon usage were revealed in the genus Kazachstania and conservation was seen in the three holozoan species, with similar trends observed in the majority of host species mobile elements. Conclusions: The known diversity of TE families for the yeast superfamily, and Choanoflagellatea has increased as a result of the study presented here. 
Codon usage bias for host genes and mobile elements provided evidence of selection, as well as mutational bias, suggesting that models of evolutionary pressures are more complex in unicellular eukaryotes

    Differential evolution of non-coding DNA across eukaryotes and its close relationship with complex multicellularity on Earth

    Here, I elaborate on the hypothesis that complex multicellularity (CM, sensu Knoll) is a major evolutionary transition (sensu Szathmary), which has convergently evolved a few times in Eukarya only: within red and brown algae, plants, animals, and fungi. Paradoxically, CM seems to correlate with the expansion of non-coding DNA (ncDNA) in the genome rather than with genome size or the total number of genes. Thus, I investigated the correlation between genome and organismal complexities across 461 eukaryotes under a phylogenetically controlled framework. To that end, I introduce the first formal definitions and criteria to distinguish ‘unicellularity’, ‘simple’ (SM) and ‘complex’ multicellularity. Rather than using the limited available estimations of unique cell types, the 461 species were classified according to our criteria by reviewing their life cycle and body plan development from literature. Then, I investigated the evolutionary association between genome size and 35 genome-wide features (introns and exons from protein-coding genes, repeats and intergenic regions) describing the coding and ncDNA complexities of the 461 genomes. To that end, I developed ‘GenomeContent’, a program that systematically retrieves massive multidimensional datasets from gene annotations and calculates over 100 genome-wide statistics. R-scripts coupled to parallel computing were created to calculate >260,000 phylogenetic controlled pairwise correlations. As previously reported, both repetitive and non-repetitive DNA are found to be scaling strongly and positively with genome size across most eukaryotic lineages. Contrasting previous studies, I demonstrate that changes in the length and repeat composition of introns are only weakly or moderately associated with changes in genome size at the global phylogenetic scale, while changes in intron abundance (within and across genes) are either not or only very weakly associated with changes in genome size. 
Our evolutionary correlations are robust to: different phylogenetic regression methods, uncertainties in the tree of eukaryotes, variations in genome size estimates, and randomly reduced datasets. Then, I investigated the correlation between the 35 genome-wide features and the cellular complexity of the 461 eukaryotes with phylogenetic Principal Component Analyses. Our results endorse a genetic distinction between SM and CM in Archaeplastida and Metazoa, but not so clearly in Fungi. Remarkably, complex multicellular organisms and their closest ancestral relatives are characterized by high intron-richness, regardless of genome size. Finally, I argue why and how a vast expansion of non-coding RNA (ncRNA) regulators rather than of novel protein regulators can promote the emergence of CM in Eukarya. As a proof of concept, I co-developed a novel ‘ceRNA-motif pipeline’ for the prediction of “competing endogenous” ncRNAs (ceRNAs) that regulate microRNAs in plants. We identified three candidate ceRNA motifs: MIM166, MIM171 and MIM159/319, which were found to be conserved across land plants and to be potentially involved in diverse developmental processes and stress responses. Collectively, the findings of this dissertation support our hypothesis that CM on Earth is a major evolutionary transition promoted by the expansion of two major ncDNA classes, introns and regulatory ncRNAs, which might have boosted the irreversible commitment of cell types in certain lineages by canalizing the timing and kinetics of the eukaryotic transcriptome.
Table of contents: Cover page Abstract Acknowledgements Index 1. The structure of this thesis 1.1. Structure of this PhD dissertation 1.2. Publications of this PhD dissertation 1.3. Computational infrastructure and resources 1.4. Disclosure of financial support and information use 1.5. Acknowledgements 1.6. Author contributions and use of impersonal and personal pronouns 2. Biological background 2.1. The complexity of the eukaryotic genome 2.2. The problem of counting and defining “genes” in eukaryotes 2.3. The “function” concept for genes and “dark matter” 2.4. Increases of organismal complexity on Earth through multicellularity 2.5. Multicellularity is a “fitness transition” in individuality 2.6. The complexity of cell differentiation in multicellularity 3. Technical background 3.1. The Phylogenetic Comparative Method (PCM) 3.2. RNA secondary structure prediction 3.3. Some standards for genome and gene annotation 4. What is in a eukaryotic genome? GenomeContent provides a good answer 4.1. Background 4.2. Motivation: an interoperable tool for data retrieval of gene annotations 4.3. Methods 4.4. Results 4.5. Discussion 5. The evolutionary correlation between genome size and ncDNA 5.1. Background 5.2. Motivation: estimating the relationship between genome size and ncDNA 5.3. Methods 5.4. Results 5.5. Discussion 6. The relationship between non-coding DNA and Complex Multicellularity 6.1. Background 6.2. Motivation: How to define and measure complex multicellularity across eukaryotes? 6.3. Methods 6.4. Results 6.5. Discussion 7. The ceRNA motif pipeline: regulation of microRNAs by target mimics 7.1. Background 7.2. A revisited protocol for the computational analysis of Target Mimics 7.3. Motivation: a novel pipeline for ceRNA motif discovery 7.4. Methods 7.5. Results 7.6. Discussion 8. Conclusions and outlook 8.1. Contributions and lessons for the bioinformatics of large-scale comparative analyses 8.2. Intron features are evolutionarily decoupled among themselves and from genome size throughout Eukarya 8.3. “Complex multicellularity” is a major evolutionary transition 8.4. Role of RNA throughout the evolution of life and complex multicellularity on Earth 9. Supplementary Data Bibliography Curriculum Scientiae Selbständigkeitserklärung (declaration of authorship

    An investigation into the contribution of gene remodeling to protein coding gene family evolution across the Metazoa

    This thesis explores gene evolution throughout the history of Metazoa. This group of multicellular organisms represents a wide range of diversity, embodied by the number of species, the multitude of morphological and developmental traits, and the complexity within the genetic elements dictating these traits. Although a significant amount of research has been carried out on gene family emergence and expansion in animal genomes, comparatively little research has been published on how these genes are formed. Of specific interest here is the role of complex, reticulated mechanisms of gene evolution in forming new genes; these include processes such as gene fusion and fission, hereafter referred to as gene remodeling. Current methods of gene family prediction are not sensitive to these mechanisms of gene evolution. We apply both network and phylogenetic models to characterise the traits and role of gene remodeling across Metazoa and take a data-focused approach to attempt to resolve remaining issues within the animal tree of life (AToL). In Chapter 2 we took a novel network approach to quantify the contribution of gene remodeling events to novel protein coding gene family evolution in the animal tree of life. Using graph theory we analysed the partial homology shared between a set of animal proteomes spanning most major clades, and we placed these gene remodeling events onto the species tree. In addition to this, in Chapter 3 we sought to assess the phylogenetic properties of these events and their ability to reconstruct AToL; ultimately, our aim was to determine if gene fusions could be deployed to resolve contentious regions within AToL. As a consilient approach is most desirable in phylogeny reconstruction, in Chapter 4 we examine the potential for resolving AToL using a combination of data types, i.e. gene fusions and previously published phylogenomic datasets. 
Specifically, we examined potential issues in annotation of homology and orthology within previously published animal phylogenomic datasets and focussed on determining what impact inaccurate definitions of orthology have had on resolving difficult or contentious parts of AToL

    Horizontal gene transfer in the sponge Amphimedon queenslandica
