    A Practical Implementation of Compressed Suffix Arrays with Applications to Self-indexing

    Abstract: In this paper, we develop a simple and practical storage scheme for compressed suffix arrays (CSA). Our CSA can be constructed in linear time and needs $2nH_k + n + o(n)$ bits of space simultaneously for any $k \le c \log_\sigma n - 1$ and any constant $c < 1$, where $H_k$ denotes the $k$-th order entropy. We compare the performance of our method with two established compressed indexing methods, viz. the FM-index and the Sad-CSA. Experiments on the Canterbury Corpus and the Pizza&Chili Corpus show significant advantages of our algorithm over the other two indices in terms of compression and query time. Our storage scheme achieves better performance on all types of data present in these two corpora, except for evenly distributed data, such as DNA. The source code for our CSA is available online.
