4 research outputs found
On Bijective Variants of the Burrows-Wheeler Transform
The sort transform (ST) is a modification of the Burrows-Wheeler transform
(BWT). Both transformations map an arbitrary word of length n to a pair
consisting of a word of length n and an index between 1 and n. The BWT sorts
all rotation conjugates of the input word, whereas the ST of order k only uses
the first k letters for sorting all such conjugates. If two conjugates start
with the same prefix of length k, then the indices of the rotations are used
for tie-breaking. Both transforms output the sequence of the last letters of
the sorted list and the index of the input within the sorted list. In this
paper, we discuss a bijective variant of the BWT (due to Scott), proving its
correctness and relations to other results due to Gessel and Reutenauer (1993)
and Crochemore, Desarmenien, and Perrin (2005). Further, we present a novel
bijective variant of the ST.
Comment: 15 pages, presented at the Prague Stringology Conference 2009 (PSC 2009)
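The two transforms described in this abstract can be illustrated with a minimal Python sketch. It is a naive quadratic-space illustration, not the paper's construction and not the bijective variants it discusses. The function names `bwt` and `sort_transform` are hypothetical, and breaking ties by rotation index in the plain BWT (relevant only for non-primitive words) is an assumption made so that the ST of order n coincides with the BWT here.

```python
def _transform(word, key_len):
    """Sort rotation start indices by the first key_len letters of each
    rotation, breaking ties by the rotation index itself."""
    n = len(word)
    if n == 0:
        raise ValueError("word must be non-empty")
    order = sorted(range(n), key=lambda i: ((word[i:] + word[:i])[:key_len], i))
    # The rotation starting at position i ends with the letter at i - 1.
    last_letters = "".join(word[(i - 1) % n] for i in order)
    # 1-based position of the input word (rotation 0) in the sorted list.
    index = order.index(0) + 1
    return last_letters, index


def bwt(word):
    """Burrows-Wheeler transform: sort on the full rotations."""
    return _transform(word, len(word))


def sort_transform(word, k):
    """Sort transform of order k: only the first k letters are compared."""
    return _transform(word, k)


if __name__ == "__main__":
    print(bwt("banana"))                # ('nnbaaa', 4)
    print(sort_transform("banana", 1))  # ('bnnaaa', 4)
    print(sort_transform("banana", 2))  # ('nbnaaa', 4)
```

For "banana", the order-1 sort transform differs from the BWT because the three rotations starting with "a" keep their original relative order instead of being sorted on later letters.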
The genomic and evolutionary analysis of floral heteromorphy in Primula
The genetic basis and evolutionary significance of floral heteromorphy in Primula have been debated for over 150 years. Charles Darwin was the first to explain the importance of the two heterostylous floral morphs, pin and thrum, suggesting that their reciprocal anther and stigma heights facilitate cross-pollination, and showing that only between-morph crosses are fully compatible. This key innovation is an archetypal example of convergent evolution that serves to physically promote insect-mediated outcrossing, having evolved in over 28 angiosperm families.
Darwin’s findings laid the foundation for an extensive number of studies into heterostyly that contributed to the establishment of modern genetic theory. The widely accepted genetic model portrays the Primula S locus, which controls heterostyly and self-incompatibility, as a coadapted group of tightly-linked genes, or supergene. It is predicted that self-fertile homostyle flowers, with anthers and stigma at the same height, arise via rare recombination events between dominant and recessive alleles in heterozygous thrums. These observations have underpinned over 60 years of research into the genetics and evolution of heterostyly.
The Primula vulgaris genome assembly and associated transcriptomic and comparative sequence analyses have facilitated the assembly and characterisation of the complete S locus in this species. Here it is revealed that thrums are hemizygous, not heterozygous: the S locus contains five thrum-specific genes that are completely absent in pins, which means recombination cannot be the cause of homostyles as previously believed. The studies also reveal candidate genes in Primula veris and other species, and have allowed the assembly of the S locus supergene to be dated to an estimated 51.7 MYA. These findings challenge established theory and reveal novel insight into the structure and origin of the Primula S locus, providing the foundation for understanding the evolution and breakdown of insect-mediated outcrossing in Primula and other heterostylous species.
Scalable succinct indexing for large text collections
Self-indexes save space by emulating operations of traditional data structures using basic operations on bitvectors. Succinct text indexes provide full-text search functionality which is traditionally provided by suffix trees and suffix arrays for a given text, while using space equivalent to the compressed representation of the text. Succinct text indexes can therefore provide full-text search functionality over inputs much larger than what is viable using traditional uncompressed suffix-based data structures. Fields such as Information Retrieval involve the processing of massive text collections. However, the in-memory space requirements of succinct text indexes during construction have hampered their adoption for large text collections. One promising approach to support larger data sets is to avoid constructing the full suffix array by using alternative indexing representations. This thesis focuses on several aspects related to the scalability of text indexes to larger data sets. We identify practical improvements in the core building blocks of all succinct text indexing algorithms, and subsequently improve the index performance on large data sets. We evaluate our findings using several standard text collections and demonstrate: (1) the practical applications of our improved indexing techniques; and (2) that succinct text indexes are a practical alternative to inverted indexes for a variety of top-k ranked document retrieval problems.