    Repeat associated mechanisms of genome evolution and function revealed by the Mus caroli and Mus pahari genomes

    Understanding the mechanisms driving lineage-specific evolution in both primates and rodents has been hindered by the lack of sister clades that have a similar phylogenetic structure and high-quality genome assemblies. Here, we have created chromosome-level assemblies of the Mus caroli and Mus pahari genomes. Together with the Mus musculus and Rattus norvegicus genomes, this set of rodent genomes spans divergence times similar to those of the Hominidae (human-chimpanzee-gorilla-orangutan). By comparing the evolutionary dynamics between the Muridae and Hominidae, we identified punctate events of chromosome reshuffling that shaped the ancestral karyotype of Mus musculus and Mus caroli between 3 and 6 million yr ago, but that are absent in the Hominidae. Hominidae show between four- and sevenfold lower rates of nucleotide change and feature turnover in both neutral and functional sequences, suggesting an underlying coherence to the Muridae acceleration. Our system of matched, high-quality genome assemblies revealed how specific classes of repeats can play lineage-specific roles in related species. Recent LINE activity has remodeled protein-coding loci to a greater extent across the Muridae than the Hominidae, with functional consequences at the species level such as reproductive isolation. Furthermore, we charted a Muridae-specific retrotransposon expansion at unprecedented resolution, revealing how a single nucleotide mutation transformed a specific SINE element into an active CTCF binding site carrier specifically in Mus caroli, which resulted in thousands of novel, species-specific CTCF binding sites. Our results show that the comparison of matched phylogenetic sets of genomes will be an increasingly powerful strategy for understanding mammalian biology.

    Sixteen diverse laboratory mouse reference genomes define strain-specific haplotypes and novel functional loci

    We report full-length draft de novo genome assemblies for 16 widely used inbred mouse strains and find extensive strain-specific haplotype variation. We identify and characterize 2,567 regions on the current mouse reference genome exhibiting the greatest sequence diversity. These regions are enriched for genes involved in pathogen defence and immunity and exhibit enrichment of transposable elements and signatures of recent retrotransposition events. Combinations of alleles and genes unique to an individual strain are commonly observed at these loci, reflecting distinct strain phenotypes. We used these genomes to improve the mouse reference genome, resulting in the completion of 10 new gene structures. Also, 62 new coding loci were added to the reference genome annotation. These genomes identified a large, previously unannotated gene (Efcab3-like) encoding 5,874 amino acids. Mutant Efcab3-like mice display anomalies in multiple brain regions, suggesting a possible role for this gene in the regulation of brain development.

    The Simultaneous Identification of Genes in Related Species

    As the tree of life is populated ever more densely with sequenced genomes, the new challenge is the accurate and consistent annotation of entire clades of genomes. In my dissertation, I address this problem with a new approach to comparative gene finding that takes a multiple genome alignment of closely related species and simultaneously predicts the location and structure of protein-coding genes in all input genomes, thereby exploiting negative selection and sequence conservation. The model prefers potential gene structures that agree across the different genomes or, where they do not, in which the exon gains and losses are plausible given the species tree. The multi-species gene finding problem is formulated as a binary labeling problem on a graph. The resulting optimization problem is NP-hard, but it can be efficiently approximated using a subgradient-based dual decomposition approach. I tested the novel approach on whole-genome alignments of 12 vertebrate and 12 Drosophila species. The accuracy was evaluated for human, mouse and Drosophila melanogaster and compared with that of competing methods. Results suggest that the new method is well suited for annotating a large number of genomes of closely related species within a clade, in particular when RNA-Seq data are available for many of the genomes. When the genomes are at close to medium distances, transferring existing annotations from one genome to another via the genome alignment is more accurate than previous approaches based on spliced alignments of proteins. The method is implemented in C++ as part of the gene finder AUGUSTUS.

    Thanks to advances in DNA sequencing, genome projects are becoming ever larger and now address the sequencing of entire groups of closely related species, so-called clades. Annotating these enormous amounts of data is a major challenge, and computational methods that annotate entire clades efficiently and consistently are urgently needed. In my dissertation, I developed a new method for comparative gene prediction that simultaneously identifies protein-coding genes and their exon-intron structure in multiple genomes of related species. The new approach uses an alignment of the genomes, which makes it possible to take the similarity of gene structures in related species into account during prediction and to improve the accuracy of the annotations. The model favors gene structures that agree across the different species while admitting plausible differences, such as the loss or gain of an exon, depending on the phylogeny. The comparative gene prediction (CGP) problem can be formulated as a node labeling problem on a graph. Although the resulting optimization problem is NP-complete, good approximate solutions can be found using a subgradient method combined with dual decomposition. The new CGP method was tested on genome alignments of 12 vertebrate and 12 Drosophila species, and its accuracy for the already annotated species human, mouse and Drosophila melanogaster was compared with that of competing methods. The results suggest that CGP is suitable for annotating large clades with many species, particularly when RNA-Seq data are available for many of the genomes. A further application of the CGP method is the transfer of annotations from already annotated genomes to newly sequenced ones. At small to medium distances, the CGP method is more accurate than previous annotation-transfer approaches, which align protein sequences directly against the target genome. The new method is implemented as an extension of the gene prediction program AUGUSTUS.
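The dissertation abstract casts multi-species gene finding as a binary labeling problem on a graph, approximated by subgradient-based dual decomposition. The Python sketch below illustrates that general technique on a deliberately tiny, hypothetical model; it is not the AUGUSTUS implementation, and its scoring is invented for illustration. Assumptions: two genomes whose candidate exons are already aligned one-to-one; each genome's score is a chain with a per-exon score (`unary`) plus a bonus (`smooth`) for equal neighboring labels; the cross-species term is a bonus (`agree`) for every exon labeled identically in both genomes. All function names are hypothetical. Each genome gets its own copy of the labels, each subproblem is solved exactly, and one Lagrange multiplier per shared label pushes the copies toward agreement.

```python
from itertools import product
from typing import List, Tuple

Labels = List[int]


def viterbi_chain(unary: List[float], smooth: float) -> Labels:
    """Exact MAP labeling of one genome's binary chain (1 = exon kept).
    Score(x) = sum_i unary[i]*x[i] + smooth * #{i : x[i] == x[i+1]}."""
    n = len(unary)
    if n == 0:
        return []
    best = [[0.0, unary[0]]]          # best[i][b]: best prefix score with x[i] == b
    back: List[List[int]] = [[0, 0]]  # back[i][b]: best label at i-1 given x[i] == b
    for i in range(1, n):
        row, ptr = [0.0, 0.0], [0, 0]
        for b in (0, 1):
            cands = [best[i - 1][a] + (smooth if a == b else 0.0) for a in (0, 1)]
            a_star = 0 if cands[0] >= cands[1] else 1
            row[b] = cands[a_star] + (unary[i] if b == 1 else 0.0)
            ptr[b] = a_star
        best.append(row)
        back.append(ptr)
    labels = [0] * n
    labels[-1] = 0 if best[-1][0] >= best[-1][1] else 1
    for i in range(n - 1, 0, -1):     # trace back the argmax path
        labels[i - 1] = back[i][labels[i]]
    return labels


def agreement_step(lam_a, lam_b, agree) -> Tuple[Labels, Labels]:
    """Cross-species subproblem, solved independently per position:
    maximize agree*[za == zb] - lam_a*za - lam_b*zb over (za, zb) in {0,1}^2."""
    za, zb = [], []
    for la, lb in zip(lam_a, lam_b):
        pair = max(((0, 0), (1, 1), (0, 1), (1, 0)),
                   key=lambda p: agree * (p[0] == p[1]) - la * p[0] - lb * p[1])
        za.append(pair[0])
        zb.append(pair[1])
    return za, zb


def chain_score(x, unary, smooth) -> float:
    return sum(u * xi for u, xi in zip(unary, x)) + smooth * sum(
        x[i] == x[i + 1] for i in range(len(x) - 1))


def joint_score(xa, xb, ua, ub, smooth, agree) -> float:
    """Primal objective: both chain scores plus the agreement bonus."""
    return (chain_score(xa, ua, smooth) + chain_score(xb, ub, smooth)
            + agree * sum(a == b for a, b in zip(xa, xb)))


def dual_decomposition(ua, ub, smooth, agree, steps=100, step0=1.0):
    """Subgradient descent on the Lagrangian dual; returns (xa, xb, certified)."""
    n = len(ua)
    lam_a, lam_b = [0.0] * n, [0.0] * n
    for k in range(steps):
        xa = viterbi_chain([u + l for u, l in zip(ua, lam_a)], smooth)
        xb = viterbi_chain([u + l for u, l in zip(ub, lam_b)], smooth)
        za, zb = agreement_step(lam_a, lam_b, agree)
        if xa == za and xb == zb:
            return xa, xb, True       # copies agree: dual bound is tight
        alpha = step0 / (k + 1)       # diminishing step size
        lam_a = [l - alpha * (x - z) for l, x, z in zip(lam_a, xa, za)]
        lam_b = [l - alpha * (x - z) for l, x, z in zip(lam_b, xb, zb)]
    return xa, xb, False              # no certificate: heuristic answer only


def brute_force(ua, ub, smooth, agree) -> float:
    """Exhaustive optimum; only feasible for tiny toy instances (for checking)."""
    n = len(ua)
    return max(joint_score(list(a), list(b), ua, ub, smooth, agree)
               for a in product((0, 1), repeat=n) for b in product((0, 1), repeat=n))


ua, ub = [2.0, -1.0, 2.0], [1.5, -2.0, 1.0]
xa, xb, ok = dual_decomposition(ua, ub, smooth=0.3, agree=1.0)
print(xa, xb, ok)  # [1, 0, 1] [1, 0, 1] True
```

When both genomes' labels match the copies chosen by the agreement subproblem, the dual bound is tight and the labeling is provably optimal for this toy objective. If the step budget runs out first, the returned labels carry no such guarantee, which is why the approach counts as an approximation.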