Assembly and analysis of whole genomes are now a routine part of genetic
research, but effective tools for visualizing whole genomes and their
alignments are few. Here we present two approaches that make such visualizations
efficient and user-friendly, allowing researchers to spot problems and patterns
in their data and present them effectively.
First, FluentDNA is developed to tackle single full genome visualization and
assembly tasks by representing nucleotides as colored pixels in a zooming
interface. This enables users to identify features without relying on algorithmic
annotation. FluentDNA also supports visualizing pairwise alignments of
well-assembled whole genomes from chromosome to nucleotide resolution.
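The idea of drawing each nucleotide as one colored pixel can be illustrated with a minimal sketch. It is not FluentDNA's code: the palette, the fixed row width, and the function name `sequence_to_pixels` are assumptions chosen for illustration, and the zooming interface is not modeled.

```python
# Illustrative sketch only: the palette and layout are hypothetical,
# not the colors or tiling scheme FluentDNA actually uses.
PALETTE = {
    "A": (255, 0, 0),
    "C": (0, 255, 0),
    "G": (0, 0, 255),
    "T": (255, 255, 0),
}
UNKNOWN = (128, 128, 128)  # any other symbol, e.g. N


def sequence_to_pixels(sequence, width):
    """Map each base to an RGB tuple and wrap the result into rows."""
    if width <= 0:
        raise ValueError("width must be positive")
    pixels = [PALETTE.get(base, UNKNOWN) for base in sequence.upper()]
    # Row-major layout: the last row may be shorter than `width`.
    return [pixels[i:i + width] for i in range(0, len(pixels), width)]


rows = sequence_to_pixels("ACGTN", width=2)
```

Because every base maps to exactly one pixel, repeats and composition shifts appear as visible color patterns without any prior annotation.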
Second, Pantograph is developed to tackle the problem of visualizing variation
among large numbers of whole genome sequences. This uses a graph genome
approach, which addresses many of the technical challenges of whole genome
multiple sequence alignments by representing aligned sequences as nodes which
can be shared by many individuals. Pantograph is capable of scaling to thousands
of individuals and is applied to SARS and A. thaliana pangenomes.
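The graph genome representation can be sketched as a set of sequence nodes plus one path per individual. This is a toy illustration under assumed names (`nodes`, `paths`, `spell`, `node_usage`) and invented data; it does not reflect Pantograph's data model or file formats.

```python
from collections import defaultdict

# Hypothetical toy data: each node holds one aligned segment,
# and each individual is an ordered walk through node ids.
nodes = {1: "ACGT", 2: "G", 3: "T", 4: "CCA"}
paths = {
    "sample_a": [1, 2, 4],
    "sample_b": [1, 3, 4],
    "sample_c": [1, 2, 4],
}


def spell(nodes, path):
    """Reconstruct an individual's sequence by concatenating its nodes."""
    return "".join(nodes[node_id] for node_id in path)


def node_usage(paths):
    """Return, for each node id, the set of individuals that traverse it."""
    usage = defaultdict(set)
    for name, path in paths.items():
        for node_id in path:
            usage[node_id].add(name)
    return dict(usage)


usage = node_usage(paths)
```

In this sketch, nodes 1 and 4 are stored once and shared by all three individuals, while the variant site is captured by the alternative nodes 2 and 3.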
Alongside the development of these new genomics tools, comparative genomic
research was undertaken on worldwide species of ash trees. I assembled 13 ash
genomes and used FluentDNA to quality check the results and discovered
contaminants and a mitochondrial integration. I annotated protein coding genes
in 28 ash assemblies and aligned their gene families. Using phylogenetic analysis,
I identified gene duplications that likely occurred in an ancient whole genome
duplication shared by all ash species. I examined the fate of these duplicated
genes, showing that losses are concentrated in a subset of gene families more
often than predicted by a null model simulation. I conclude that convergent
evolution has occurred in the loss and retention of duplicated genes in different
ash species.
BBSRC BB/S004661/