3,060 research outputs found
A Faster Implementation of Online Run-Length Burrows-Wheeler Transform
Run-length encoding Burrows-Wheeler Transformed strings, resulting in
Run-Length BWT (RLBWT), is a powerful tool for processing highly repetitive
strings. We propose a new algorithm for online RLBWT working in run-compressed
space, which runs in time and bits of space, where
is the length of input string received so far and is the number of runs
in the BWT of the reversed . We improve the state-of-the-art algorithm for
online RLBWT in terms of empirical construction time. Adopting the dynamic list
for maintaining a total order, we can replace rank queries in a dynamic wavelet
tree on a run-length compressed string by the direct comparison of labels in a
dynamic list. The empirical result for various benchmarks show the efficiency
of our algorithm, especially for highly repetitive strings.Comment: In Proc. IWOCA201
Simple and Efficient Fully-Functional Succinct Trees
The fully-functional succinct tree representation of Navarro and Sadakane
(ACM Transactions on Algorithms, 2014) supports a large number of operations in
constant time using bits. However, the full idea is hard to
implement. Only a simplified version with operation time has been
implemented and shown to be practical and competitive. We describe a new
variant of the original idea that is much simpler to implement and has
worst-case time for the operations. An implementation based on
this version is experimentally shown to be superior to existing
implementations
From Theory to Practice: Plug and Play with Succinct Data Structures
Engineering efficient implementations of compact and succinct structures is a
time-consuming and challenging task, since there is no standard library of
easy-to- use, highly optimized, and composable components. One consequence is
that measuring the practical impact of new theoretical proposals is a difficult
task, since older base- line implementations may not rely on the same basic
components, and reimplementing from scratch can be very time-consuming. In this
paper we present a framework for experimentation with succinct data structures,
providing a large set of configurable components, together with tests,
benchmarks, and tools to analyze resource requirements. We demonstrate the
functionality of the framework by recomposing succinct solutions for document
retrieval.Comment: 10 pages, 4 figures, 3 table
Prospects and limitations of full-text index structures in genome analysis
The combination of incessant advances in sequencing technology producing large amounts of data and innovative bioinformatics approaches, designed to cope with this data flood, has led to new interesting results in the life sciences. Given the magnitude of sequence data to be processed, many bioinformatics tools rely on efficient solutions to a variety of complex string problems. These solutions include fast heuristic algorithms and advanced data structures, generally referred to as index structures. Although the importance of index structures is generally known to the bioinformatics community, the design and potency of these data structures, as well as their properties and limitations, are less understood. Moreover, the last decade has seen a boom in the number of variant index structures featuring complex and diverse memory-time trade-offs. This article brings a comprehensive state-of-the-art overview of the most popular index structures and their recently developed variants. Their features, interrelationships, the trade-offs they impose, but also their practical limitations, are explained and compared
- …