Abstract

The relationship between DNA sequence, biochemical function and molecular evolution is relatively well described for protein-coding regions of genomes, but far less clear for non-coding regions, particularly in eukaryotic genomes. In part, this is because we lack a complete description of the essential non-coding elements in any eukaryotic genome. To address this challenge, we used saturating transposon mutagenesis to interrogate the Schizosaccharomyces pombe genome. We generated 31 million transposon insertions, a theoretical coverage of 2.4 insertions per genomic site. We applied a five-state hidden Markov model (HMM) to distinguish insertion-depleted regions from insertion biases. Both raw insertion density and HMM-defined fitness estimates showed significant quantitative relationships to gene knockout fitness, genetic diversity, divergence and expected functional regions based on transcription and gene annotations. Through several analyses, we conclude that transposon insertions produced fitness effects in 66-90% of the genome, including substantial portions of the non-coding regions. Based on the HMM, we estimate that 10% of the insertion-depleted sites in the genome showed no signal of conservation between species and were weakly transcribed, demonstrating the limitations of comparative genomics and transcriptomics for detecting functional units. In this species, 3' and 5' untranslated regions were the most prominent insertion-depleted regions that were not represented in measures of constraint from comparative genomics. We conclude that the combination of transposon mutagenesis, evolutionary and biochemical data can provide new insights into the relationship between genome function and molecular evolution.
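
As a rough check on the coverage figure quoted above, the sketch below divides the number of insertions by an assumed genome size. The abstract does not state the genome size, so the value of roughly 12.6 Mb used here is an assumption, and the function and constant names are illustrative only. The final step additionally assumes uniformly random (Poisson) insertion, an idealization that the abstract itself qualifies by noting insertion biases.

```python
import math

# From the abstract: total number of transposon insertions generated.
N_INSERTIONS = 31_000_000

# ASSUMPTION: approximate S. pombe genome size in base pairs.
# This value is not given in the abstract.
ASSUMED_GENOME_SIZE_BP = 12_600_000


def theoretical_coverage(n_insertions: int, genome_size_bp: int) -> float:
    """Mean number of insertions per genomic site."""
    if genome_size_bp <= 0:
        raise ValueError("genome_size_bp must be positive")
    return n_insertions / genome_size_bp


def poisson_fraction_empty(mean_per_site: float) -> float:
    """Expected fraction of sites with zero insertions under an
    idealized Poisson model (uniform, unbiased insertion)."""
    return math.exp(-mean_per_site)


coverage = theoretical_coverage(N_INSERTIONS, ASSUMED_GENOME_SIZE_BP)
p_empty = poisson_fraction_empty(coverage)

print(f"coverage per site: {coverage:.2f}")
print(f"expected fraction of empty sites (Poisson): {p_empty:.3f}")
```

Under these assumptions the mean coverage comes out close to the 2.4 insertions per site reported, and only a small fraction of sites would be expected to lack insertions by chance alone. That is why contiguous runs of insertion-free sites are informative, once insertion biases are accounted for.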
