35 research outputs found

From Theory to Practice: Plug and Play with Succinct Data Structures

Author: F. Claude
G. Navarro
J.S. Culpepper
K. Sadakane
N. Jesper Larsson
R. Grossi
S. Vigna
V. Mäkinen
Publication date: 05/11/2013

Engineering efficient implementations of compact and succinct structures is a time-consuming and challenging task, since there is no standard library of easy-to-use, highly optimized, and composable components. One consequence is that measuring the practical impact of new theoretical proposals is difficult, since older baseline implementations may not rely on the same basic components, and reimplementing from scratch can be very time-consuming. In this paper we present a framework for experimentation with succinct data structures, providing a large set of configurable components, together with tests, benchmarks, and tools to analyze resource requirements. We demonstrate the functionality of the framework by recomposing succinct solutions for document retrieval.
Comment: 10 pages, 4 figures, 3 tables
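
A typical component such a framework composes is a bitvector with rank support. The minimal Python sketch below only illustrates the idea: it stores precomputed counts of 1s per block, so a rank query needs one table lookup plus a short scan inside a single block. The class name `RankBitVector` and the block size of 64 are assumptions for illustration, not components of the framework described above. A real succinct implementation would pack bits into machine words, add a second level of counts, and use hardware popcount.

```python
class RankBitVector:
    """Bitvector with rank support via cumulative per-block counts (illustrative sketch)."""

    BLOCK = 64  # hypothetical block size; real libraries tune this and pack bits into words

    def __init__(self, bits):
        self.bits = [1 if b else 0 for b in bits]
        # cumulative[j] = number of 1s in bits[0 : j * BLOCK]
        self.cumulative = [0]
        for start in range(0, len(self.bits), self.BLOCK):
            self.cumulative.append(self.cumulative[-1] + sum(self.bits[start:start + self.BLOCK]))

    def rank1(self, i):
        """Number of 1s in bits[0:i], for 0 <= i <= len(bits)."""
        if not 0 <= i <= len(self.bits):
            raise IndexError(i)
        j = i // self.BLOCK
        # precomputed count for the full blocks, then a short scan of the partial block
        return self.cumulative[j] + sum(self.bits[j * self.BLOCK:i])

    def rank0(self, i):
        """Number of 0s in bits[0:i]."""
        return i - self.rank1(i)


bv = RankBitVector([1, 0, 1, 1, 0] * 30)  # 150 bits, 90 of them set
print(bv.rank1(5), bv.rank1(70), bv.rank0(150))  # 3 42 60
```

Keeping the counts cumulative, rather than per block, is what makes each query need only one stored value.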

arXiv.org e-Print Archive

CiteSeerX

Crossref

Online Pattern Matching for String Edit Distance with Moves

Author: D. Shapira
G. Navarro
J. Kececioglu
R. Clifford
S. Maruyama
V. Bafna
V.I. Levenshtein
W. Rytter
Publication date: 01/01/2014

Edit distance with moves (EDM) is a string-to-string distance measure that includes substring moves in addition to ordinary editing operations to turn one string into the other. Although computing EDM exactly is intractable, it has many applications, especially in error detection. Edit sensitive parsing (ESP) is an efficient parsing algorithm that guarantees an upper bound on parsing discrepancies between different appearances of the same substrings in a string. ESP can be used to compute an approximate EDM as the L1 distance between characteristic vectors built from the node labels of parse trees. However, ESP is not applicable to streaming text data, where the whole text is not known in advance. We present an online ESP (OESP) that enables online pattern matching for EDM. OESP builds a parse tree for a streaming text and computes the L1 distance between characteristic vectors in an online manner. For space-efficient computation of EDM, OESP encodes the parse tree directly into a succinct representation, leveraging the idea behind recent results on dynamic succinct trees. We experimentally test the ability of OESP to compute EDM in an online manner on benchmark datasets, and we show OESP's efficiency.
Comment: This paper has been accepted to the 21st edition of the International Symposium on String Processing and Information Retrieval (SPIRE 2014)
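
The approximate EDM described above reduces to an L1 distance between label-count vectors. The Python sketch below does not implement ESP or OESP parsing; it assumes the node labels of each parse tree are already available as lists, and the labels in the example are hypothetical. The names `l1_distance` and `OnlineL1` are illustrative. The incremental class is our own simplification of "computing the distance online": it keeps per-label count differences so each newly arriving label updates the distance in O(1) time, and it compares the whole text seen so far against the pattern rather than sliding windows.

```python
from collections import Counter


def l1_distance(labels_a, labels_b):
    """L1 distance between the characteristic vectors (label counts) of two parse trees."""
    ca, cb = Counter(labels_a), Counter(labels_b)
    return sum(abs(ca[k] - cb[k]) for k in ca.keys() | cb.keys())


class OnlineL1:
    """Maintains the L1 distance to a fixed pattern vector as text labels arrive one at a time."""

    def __init__(self, pattern_labels):
        # diff[k] = (count of label k in the text so far) - (count of k in the pattern)
        self.diff = Counter()
        for k in pattern_labels:
            self.diff[k] -= 1
        self.distance = sum(abs(v) for v in self.diff.values())

    def add(self, label):
        # only the coordinate for `label` changes, so update the sum in O(1)
        before = abs(self.diff[label])
        self.diff[label] += 1
        self.distance += abs(self.diff[label]) - before
        return self.distance


print(l1_distance(["a", "b", "b"], ["b", "c"]))  # 3
m = OnlineL1(["b", "c"])
print([m.add(x) for x in ["a", "b", "b"]])  # [3, 2, 3]
```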

arXiv.org e-Print Archive

Crossref

Efficient and Compact Representations of Some Non-canonical Prefix-Free Codes

Author: A Itai
ES Schwartz
F Claude
G Navarro
JI Munro
P Ferragina
RL Wessner
T Gagie
W Evans
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2016

The final publication is available at Springer via http://dx.doi.org/10.1007/978-3-319-46049-9_5

For many kinds of prefix-free codes there are efficient and compact alternatives to the traditional tree-based representation. Since these put the codes into canonical form, however, they can only be used when we can choose the order in which codewords are assigned to characters. In this paper we first show how, given a probability distribution over an alphabet of $\sigma$ characters, we can store a nearly optimal alphabetic prefix-free code in $o(\sigma)$ bits such that we can encode and decode any character in constant time. We then consider a kind of code introduced recently to reduce the space usage of wavelet matrices (Claude, Navarro, and Ordóñez, Information Systems, 2015). They showed how to build an optimal prefix-free code such that the codewords' lengths are non-decreasing when they are arranged such that their reverses are in lexicographic order. We show how to store such a code in $O(\sigma \log L + 2^{\epsilon L})$ bits, where $L$ is the maximum codeword length and $\epsilon$ is any positive constant, such that we can encode and decode any character in constant time under reasonable assumptions. Otherwise, we can always encode and decode a codeword of $\ell$ bits in time $O(\ell)$ using $O(\sigma \log L)$ bits of space.

Ministerio de Economía, Industria y Competitividad; TIN2013-47090-C3-3-P
Ministerio de Economía, Industria y Competitividad; TIN2015-69951-R
Ministerio de Economía, Industria y Competitividad; ITC-20151305
Ministerio de Economía, Industria y Competitividad; ITC-20151247
Xunta de Galicia; GRC2013/053
Chile. Núcleo Milenio Información y Coordinación en Redes; ICM/FIC.P10-024F
COST; IC1302
Academy of Finland; 268324
Academy of Finland; 25034
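
The code property in the abstract (codeword lengths are non-decreasing once the codewords are ordered so that their reverses are lexicographically sorted) can be checked directly. The Python sketch below only tests the prefix-free property and that ordering property on a given set of binary codewords. It does not build the optimal code or the compact representations from the paper, and the function names and example codes are hypothetical.

```python
def is_prefix_free(codewords):
    """True if no codeword is a prefix of another (duplicates count as a violation)."""
    words = sorted(codewords)
    # if u is a prefix of w, every string sorted between them also starts with u,
    # so checking lexicographic neighbours is enough
    return all(not words[i + 1].startswith(words[i]) for i in range(len(words) - 1))


def lengths_nondecreasing_by_reversed_order(codewords):
    """Order codewords so that their reverses are lexicographically sorted,
    then check that the codeword lengths never decrease along that order."""
    ordered = sorted(codewords, key=lambda w: w[::-1])
    return all(len(a) <= len(b) for a, b in zip(ordered, ordered[1:]))


print(is_prefix_free(["0", "10", "11"]), lengths_nondecreasing_by_reversed_order(["0", "10", "11"]))  # True True
print(lengths_nondecreasing_by_reversed_order(["1", "00", "01"]))  # False
```

Both example codes are prefix-free with the same length multiset {1, 2, 2}, yet only the first satisfies the ordering property. This illustrates why such codes are not simply canonical codes under another name.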

Repositorio da Universidade da Coruña

Crossref

Archivio della Ricerca - Università di Pisa

Archivio Istituzionale della Ricerca- Università del Piemonte Orientale

Priority queues and sorting for read-only data

Author: Asano Tetsuo
Elmasry Amr
Katajainen Jyrki
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2013

We revisit the random-access-machine model in which the input is given on a read-only random-access medium, the output is to be produced to a write-only sequential-access medium, and in addition there is a limited random-access workspace. The length of the input is $N$ elements, the length of the output is limited by the computation itself, and the capacity of the workspace is $O(S + w)$ bits, where $S$ is a parameter specified by the user and $w$ is the number of bits per machine word. We present a state-of-the-art priority queue, called an adjustable navigation pile, for this model. Under some reasonable assumptions, our priority queue supports minimum and insert in $O(1)$ worst-case time and extract in $O(N/S + \lg S)$ worst-case time, where $\lg N \le S \le N/\lg N$. We also show how to use this data structure to simplify the existing optimal $O(N^2/S + N \lg S)$-time sorting algorithm for this model.
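
To make the model concrete, the Python sketch below sorts a read-only sequence while keeping only $k$ (value, index) pairs in workspace and writing output strictly sequentially. It is a simple multi-pass baseline, not the adjustable navigation pile. Each pass scans the whole input and keeps the $k$ smallest pairs greater than the last one emitted, so it needs about $\lceil N/k \rceil$ passes and $O((N^2/k) \lg k)$ time. With pairs of $O(\lg N)$ bits each, $k$ corresponds roughly to $S/\lg N$. The function name `sort_read_only` and the parameter `k` are illustrative assumptions.

```python
import heapq
from typing import Iterator, List, Optional, Sequence, Tuple


def sort_read_only(data: Sequence[int], k: int) -> Iterator[int]:
    """Yield the elements of a read-only sequence in sorted order, keeping at most k
    (value, index) pairs in workspace. Values must be numbers (they are negated to
    simulate a max-heap); indices break ties so every pair is distinct."""
    if k < 1:
        raise ValueError("k must be at least 1")
    last: Optional[Tuple[int, int]] = None  # largest (value, index) pair emitted so far
    emitted = 0
    while emitted < len(data):
        heap: List[Tuple[int, int]] = []  # max-heap of candidates, stored as negated pairs
        for i, x in enumerate(data):  # one sequential scan of the read-only input
            pair = (x, i)
            if last is not None and pair <= last:
                continue  # already emitted in an earlier pass
            if len(heap) < k:
                heapq.heappush(heap, (-x, -i))
            elif pair < (-heap[0][0], -heap[0][1]):
                heapq.heapreplace(heap, (-x, -i))  # evict the largest candidate
        batch = sorted((-v, -j) for v, j in heap)
        for value, _ in batch:
            yield value  # write-only, sequential output
        last = batch[-1]
        emitted += len(batch)


print(list(sort_read_only([5, 3, 8, 3, 1, 9, 2], k=2)))  # [1, 2, 3, 3, 5, 8, 9]
```

The extra $\lg k$ factor from the per-pass heap is the kind of overhead that the bounds stated in the abstract avoid.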

CiteSeerX

Copenhagen University Research Information System

Retrieval and Perfect Hashing Using Fingerprinting

Author: Müller I.
Sanders P.
Schulze R.
Zhou W.
Publication venue: Springer US
Publication date: 01/01/2014

KITopen

Memory-Adjustable Navigation Piles with Applications to Sorting and Convex Hulls

Author: Darwish Omar
Elmasry Amr
Katajainen Jyrki
Publication date: 24/10/2015

We consider space-bounded computations on a random-access machine (RAM) where the input is given on a read-only random-access medium, the output is to be produced to a write-only sequential-access medium, and the available workspace allows random reads and writes but is of limited capacity. The length of the input is $N$ elements, the length of the output is limited by the computation, and the capacity of the workspace is $O(S)$ bits for some predetermined parameter $S$. We present a state-of-the-art priority queue, called an adjustable navigation pile, for this restricted RAM model. Under some reasonable assumptions, our priority queue supports minimum and insert in $O(1)$ worst-case time and extract in $O(N/S + \lg S)$ worst-case time for any $S \geq \lg N$. We show how to use this data structure to sort $N$ elements and to compute the convex hull of $N$ points in the two-dimensional Euclidean space in $O(N^2/S + N \lg S)$ worst-case time for any $S \geq \lg N$. Following a known lower bound for the space-time product of any branching program for finding unique elements, both our sorting and convex-hull algorithms are optimal. The adjustable navigation pile has turned out to be useful when designing other space-efficient algorithms, and we expect that it will find its way to yet other applications.
Comment: 21 pages
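
The optimality claim rests on a short calculation. The derivation below is a sketch. It assumes the cited lower bound has the form $T \cdot S = \Omega(N^2)$ and restricts attention to the range $\lg N \le S \le N/\lg N$ stated in the earlier abstract. In that range the $N \lg S$ term is dominated, so the time-space product matches the assumed bound.

```latex
% Assumption: the cited branching-program lower bound is T \cdot S = \Omega(N^2).
T(N, S) = O\!\left(\frac{N^2}{S} + N \lg S\right),
\qquad \lg N \le S \le \frac{N}{\lg N}.

% S \le N gives \lg S \le \lg N; S \le N/\lg N gives S \lg N \le N, i.e. N \lg N \le N^2/S.
N \lg S \;\le\; N \lg N \;\le\; \frac{N^2}{S}
\quad\Longrightarrow\quad
T(N, S) = O\!\left(\frac{N^2}{S}\right),
\qquad
T(N, S) \cdot S = O(N^2).
```

For larger $S$, $\lg S = \Theta(\lg N)$, so the bound becomes $O(N \lg N)$, which matches the comparison-sorting lower bound.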

arXiv.org e-Print Archive

MPG.PuRe