35 research outputs found
From Theory to Practice: Plug and Play with Succinct Data Structures
Engineering efficient implementations of compact and succinct structures is a
time-consuming and challenging task, since there is no standard library of
easy-to- use, highly optimized, and composable components. One consequence is
that measuring the practical impact of new theoretical proposals is a difficult
task, since older base- line implementations may not rely on the same basic
components, and reimplementing from scratch can be very time-consuming. In this
paper we present a framework for experimentation with succinct data structures,
providing a large set of configurable components, together with tests,
benchmarks, and tools to analyze resource requirements. We demonstrate the
functionality of the framework by recomposing succinct solutions for document
retrieval.Comment: 10 pages, 4 figures, 3 table
Online Pattern Matching for String Edit Distance with Moves
Edit distance with moves (EDM) is a string-to-string distance measure that
includes substring moves in addition to ordinal editing operations to turn one
string to the other. Although optimizing EDM is intractable, it has many
applications especially in error detections. Edit sensitive parsing (ESP) is an
efficient parsing algorithm that guarantees an upper bound of parsing
discrepancies between different appearances of the same substrings in a string.
ESP can be used for computing an approximate EDM as the L1 distance between
characteristic vectors built by node labels in parsing trees. However, ESP is
not applicable to a streaming text data where a whole text is unknown in
advance. We present an online ESP (OESP) that enables an online pattern
matching for EDM. OESP builds a parse tree for a streaming text and computes
the L1 distance between characteristic vectors in an online manner. For the
space-efficient computation of EDM, OESP directly encodes the parse tree into a
succinct representation by leveraging the idea behind recent results of a
dynamic succinct tree. We experimentally test OESP on the ability to compute
EDM in an online manner on benchmark datasets, and we show OESP's efficiency.Comment: This paper has been accepted to the 21st edition of the International
Symposium on String Processing and Information Retrieval (SPIRE2014
Efficient and Compact Representations of Some Non-canonical Prefix-Free Codes
The final publication is available at Springer via http://dx.doi.org/10.1007/978-3-319-46049-9_5[Abstract] For many kinds of prefix-free codes there are efficient and compact alternatives to the traditional tree-based representation. Since these put the codes into canonical form, however, they can only be used when we can choose the order in which codewords are assigned to characters. In this paper we first show how, given a probability distribution over an alphabet of σσ characters, we can store a nearly optimal alphabetic prefix-free code in o(σ)o(σ) bits such that we can encode and decode any character in constant time. We then consider a kind of code introduced recently to reduce the space usage of wavelet matrices (Claude, Navarro, and Ordóñez, Information Systems, 2015). They showed how to build an optimal prefix-free code such that the codewords’ lengths are non-decreasing when they are arranged such that their reverses are in lexicographic order. We show how to store such a code in O(σlogL+2ϵL)O(σlogL+2ϵL) bits, where L is the maximum codeword length and ϵϵ is any positive constant, such that we can encode and decode any character in constant time under reasonable assumptions. Otherwise, we can always encode and decode a codeword of ℓℓ bits in time O(ℓ)O(ℓ) using O(σlogL)O(σlogL) bits of space.Ministerio de Economía, Industria y Competitividad; TIN2013-47090-C3-3-PMinisterio de Economía, Industria y Competitividad; TIN2015-69951-RMinisterio de Economía, Industria y Competitividad; ITC-20151305Ministerio de Economía, Industria y Competitividad; ITC-20151247Xunta de Galicia; GRC2013/053Chile. Núcleo Milenio Información y Coordinación en Redes; ICM/FIC.P10-024FCOST. IC1302Academy of Finland; 268324Academy of Finland; 25034
Priority queues and sorting for read-only data
Abstract. We revisit the random-access-machine model in which the input is given on a read-only random-access media, the output is to be produced to a write-only sequential-access media, and in addition there is a limited random-access workspace. The length of the input is N elements, the length of the output is limited by the computation itself, and the capacity of the workspace is O(S + w) bits, where S is a parameter specified by the user and w is the number of bits per machine word. We present a state-of-the-art priority queue-called an adjustable navigation pile-for this model. Under some reasonable assumptions, our priority queue supports minimum and insert in O(1) worst-case time and extract in O(N/S +lg S) worst-case time, where lg N ≤ S ≤ N/ lg N . We also show how to use this data structure to simplify the existing optimal O(N 2 /S + N lg S)-time sorting algorithm for this model
Memory-Adjustable Navigation Piles with Applications to Sorting and Convex Hulls
We consider space-bounded computations on a random-access machine (RAM) where
the input is given on a read-only random-access medium, the output is to be
produced to a write-only sequential-access medium, and the available workspace
allows random reads and writes but is of limited capacity. The length of the
input is elements, the length of the output is limited by the computation,
and the capacity of the workspace is bits for some predetermined
parameter . We present a state-of-the-art priority queue---called an
adjustable navigation pile---for this restricted RAM model. Under some
reasonable assumptions, our priority queue supports and
in worst-case time and in worst-case time for any . We show how to use this
data structure to sort elements and to compute the convex hull of
points in the two-dimensional Euclidean space in
worst-case time for any . Following a known lower bound for the
space-time product of any branching program for finding unique elements, both
our sorting and convex-hull algorithms are optimal. The adjustable navigation
pile has turned out to be useful when designing other space-efficient
algorithms, and we expect that it will find its way to yet other applications.Comment: 21 page