4 research outputs found

    On Bijective Variants of the Burrows-Wheeler Transform

    The sort transform (ST) is a modification of the Burrows-Wheeler transform (BWT). Both transformations map an arbitrary word of length n to a pair consisting of a word of length n and an index between 1 and n. The BWT sorts all rotation conjugates of the input word, whereas the ST of order k only uses the first k letters for sorting all such conjugates. If two conjugates start with the same prefix of length k, then the indices of the rotations are used for tie-breaking. Both transforms output the sequence of the last letters of the sorted list and the index of the input within the sorted list. In this paper, we discuss a bijective variant of the BWT (due to Scott), proving its correctness and relations to other results due to Gessel and Reutenauer (1993) and Crochemore, Desarmenien, and Perrin (2005). Further, we present a novel bijective variant of the ST. Comment: 15 pages, presented at the Prague Stringology Conference 2009 (PSC 2009).
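
    The two transforms described in this abstract can be sketched in a few lines of Python. This is only an illustration of the plain BWT and the order-k ST as the abstract defines them, not the bijective variants the paper studies, and not the authors' code. The function names `bwt` and `sort_transform` are hypothetical, the input is assumed to be non-empty, and the naive sort is chosen for clarity rather than efficiency.

```python
def bwt(word):
    """Plain BWT of a non-empty word: (last letters, 1-based index of the input)."""
    n = len(word)
    # Sort rotation start positions by the full rotation. Python's sort is
    # stable, so identical rotations keep their original index order.
    order = sorted(range(n), key=lambda i: word[i:] + word[:i])
    # The last letter of the rotation starting at i is the letter before i.
    last = "".join(word[(i - 1) % n] for i in order)
    return last, order.index(0) + 1


def sort_transform(word, k):
    """Order-k ST: compare only the first k letters, break ties by rotation index."""
    n = len(word)
    order = sorted(range(n), key=lambda i: ((word[i:] + word[:i])[:k], i))
    last = "".join(word[(i - 1) % n] for i in order)
    return last, order.index(0) + 1


if __name__ == "__main__":
    print(bwt("banana"))                # ('nnbaaa', 4)
    print(sort_transform("banana", 1))  # ('bnnaaa', 4)
    print(sort_transform("banana", 2))  # ('nbnaaa', 4)
```

    For k at least the word length, the ST sketch compares whole rotations and so agrees with the BWT sketch on this input.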

    The genomic and evolutionary analysis of floral heteromorphy in Primula

    The genetic basis and evolutionary significance of floral heteromorphy in Primula have been debated for over 150 years. Charles Darwin was the first to explain the importance of the two heterostylous floral morphs, pin and thrum, suggesting that their reciprocal anther and stigma heights facilitate cross-pollination, and showing that only between-morph crosses are fully compatible. This key innovation is an archetypal example of convergent evolution that serves to physically promote insect-mediated outcrossing, having evolved in over 28 angiosperm families. Darwin’s findings laid the foundation for an extensive number of studies into heterostyly that contributed to the establishment of modern genetic theory. The widely accepted genetic model portrays the Primula S locus, which controls heterostyly and self-incompatibility, as a coadapted group of tightly linked genes, or supergene. It is predicted that self-fertile homostyle flowers, with anthers and stigma at the same height, arise via rare recombination events between dominant and recessive alleles in heterozygous thrums. These observations have underpinned over 60 years of research into the genetics and evolution of heterostyly. The Primula vulgaris genome assembly and associated transcriptomic and comparative sequence analyses have facilitated the assembly and characterisation of the complete S locus in this species. Here it is revealed that thrums are hemizygous, not heterozygous: the S locus contains five thrum-specific genes which are completely absent in pins, which means recombination cannot be the cause of homostyles as previously believed. The studies also reveal candidate genes in Primula veris and other species, and have facilitated an estimate of 51.7 MYA for the assembly of the S locus supergene. These findings challenge established theory and reveal novel insight into the structure and origin of the Primula S locus, providing the foundation for understanding the evolution and breakdown of insect-mediated outcrossing in Primula and other heterostylous species.

    Scalable succinct indexing for large text collections

    Self-indexes save space by emulating operations of traditional data structures using basic operations on bitvectors. Succinct text indexes provide full-text search functionality which is traditionally provided by suffix trees and suffix arrays for a given text, while using space equivalent to the compressed representation of the text. Succinct text indexes can therefore provide full-text search functionality over inputs much larger than what is viable using traditional uncompressed suffix-based data structures. Fields such as Information Retrieval involve the processing of massive text collections. However, the in-memory space requirements of succinct text indexes during construction have hampered their adoption for large text collections. One promising approach to support larger data sets is to avoid constructing the full suffix array by using alternative indexing representations. This thesis focuses on several aspects related to the scalability of text indexes to larger data sets. We identify practical improvements in the core building blocks of all succinct text indexing algorithms, and subsequently improve the index performance on large data sets. We evaluate our findings using several standard text collections and demonstrate: (1) the practical applications of our improved indexing techniques; and (2) that succinct text indexes are a practical alternative to inverted indexes for a variety of top-k ranked document retrieval problems.

    Efficient Algorithms for the Inverse Sort Transform
