3,625 research outputs found
Prospects and limitations of full-text index structures in genome analysis
The combination of incessant advances in sequencing technology producing large amounts of data and innovative bioinformatics approaches, designed to cope with this data flood, has led to new interesting results in the life sciences. Given the magnitude of sequence data to be processed, many bioinformatics tools rely on efficient solutions to a variety of complex string problems. These solutions include fast heuristic algorithms and advanced data structures, generally referred to as index structures. Although the importance of index structures is generally known to the bioinformatics community, the design and potency of these data structures, as well as their properties and limitations, are less understood. Moreover, the last decade has seen a boom in the number of variant index structures featuring complex and diverse memory-time trade-offs. This article brings a comprehensive state-of-the-art overview of the most popular index structures and their recently developed variants. Their features, interrelationships, the trade-offs they impose, but also their practical limitations, are explained and compared
Labeling Schemes for Bounded Degree Graphs
We investigate adjacency labeling schemes for graphs of bounded degree
. In particular, we present an optimal (up to an additive
constant) adjacency labeling scheme for bounded degree trees.
The latter scheme is derived from a labeling scheme for bounded degree
outerplanar graphs. Our results complement a similar bound recently obtained
for bounded depth trees [Fraigniaud and Korman, SODA 10], and may provide new
insights for closing the long standing gap for adjacency in trees [Alstrup and
Rauhe, FOCS 02]. We also provide improved labeling schemes for bounded degree
planar graphs. Finally, we use combinatorial number systems and present an
improved adjacency labeling schemes for graphs of bounded degree with
Efficient pebbling for list traversal synopses
We show how to support efficient back traversal in a unidirectional list,
using small memory and with essentially no slowdown in forward steps. Using
memory for a list of size , the 'th back-step from the
farthest point reached so far takes time in the worst case, while
the overhead per forward step is at most for arbitrary small
constant . An arbitrary sequence of forward and back steps is
allowed. A full trade-off between memory usage and time per back-step is
presented: vs. and vice versa. Our algorithms are based on a
novel pebbling technique which moves pebbles on a virtual binary, or -ary,
tree that can only be traversed in a pre-order fashion. The compact data
structures used by the pebbling algorithms, called list traversal synopses,
extend to general directed graphs, and have other interesting applications,
including memory efficient hash-chain implementation. Perhaps the most
surprising application is in showing that for any program, arbitrary rollback
steps can be efficiently supported with small overhead in memory, and marginal
overhead in its ordinary execution. More concretely: Let be a program that
runs for at most steps, using memory of size . Then, at the cost of
recording the input used by the program, and increasing the memory by a factor
of to , the program can be extended to support an
arbitrary sequence of forward execution and rollback steps: the 'th rollback
step takes time in the worst case, while forward steps take O(1)
time in the worst case, and amortized time per step.Comment: 27 page
Succinct Representations of Dynamic Strings
The rank and select operations over a string of length n from an alphabet of
size have been used widely in the design of succinct data structures.
In many applications, the string itself need be maintained dynamically,
allowing characters of the string to be inserted and deleted. Under the word
RAM model with word size , we design a succinct representation
of dynamic strings using bits to support rank,
select, insert and delete in time. When the alphabet size is small, i.e. when \sigma = O(\polylog
(n)), including the case in which the string is a bit vector, these operations
are supported in time. Our data structures are more
efficient than previous results on the same problem, and we have applied them
to improve results on the design and construction of space-efficient text
indexes
- …