
Space-Efficient Re-Pair Compression

Author: Bille Philip
Gørtz Inge Li
Prezza Nicola
Publication date: 04/11/2016

Re-Pair is an effective grammar-based compression scheme that achieves strong compression rates in practice. Let $n$, $\sigma$, and $d$ be the text length, alphabet size, and dictionary size of the final grammar, respectively. In the original Re-Pair paper, Larsson and Moffat show how to compute the Re-Pair grammar in expected linear time and $5n + 4\sigma^2 + 4d + \sqrt{n}$ words of working space on top of the text. In this work, we propose two algorithms that improve on the space of their original solution. Our model assumes a memory word of $\lceil\log_2 n\rceil$ bits and a rewritable input text composed of $n$ such words. Our first algorithm runs in expected $\mathcal{O}(n/\epsilon)$ time and uses $(1+\epsilon)n + \sqrt{n}$ words of space on top of the text, for any parameter $0 < \epsilon \leq 1$ chosen in advance. Our second algorithm runs in expected $\mathcal{O}(n\log n)$ time and improves the space to $n + \sqrt{n}$ words.
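To make the object being computed concrete, here is a minimal sketch of the basic Re-Pair loop: repeatedly replace the most frequent adjacent pair with a fresh nonterminal until no pair occurs twice. It is deliberately naive (quadratic time, ordinary Python lists) and is NOT the space-efficient algorithm described above; the function names `count_pairs`, `repair`, and `expand` are hypothetical, and ties between equally frequent pairs are broken arbitrarily.

```python
from collections import Counter

def count_pairs(seq):
    """Count non-overlapping occurrences of each adjacent pair, left to right."""
    counts = Counter()
    last_start = {}  # pair -> start index of its last counted occurrence
    for i in range(len(seq) - 1):
        pair = (seq[i], seq[i + 1])
        # Only pairs like (x, x) can overlap themselves, e.g. inside "aaa".
        if i >= last_start.get(pair, -2) + 2:
            counts[pair] += 1
            last_start[pair] = i
    return counts

def repair(text):
    """Naive Re-Pair: replace the most frequent pair until no pair repeats.

    Returns the final sequence and the rules (int nonterminal -> pair).
    Quadratic time and plain lists: an illustration, not the space-efficient method.
    """
    seq = list(text)  # terminals are the characters of text
    rules = {}
    while True:
        counts = count_pairs(seq)
        if not counts:
            break
        pair, freq = counts.most_common(1)[0]
        if freq < 2:
            break
        nt = len(rules)  # fresh nonterminal: an int, never equal to a str terminal
        rules[nt] = pair
        out, i = [], 0
        while i < len(seq):
            if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
                out.append(nt)
                i += 2
            else:
                out.append(seq[i])
                i += 1
        seq = out
    return seq, rules

def expand(symbol, rules):
    """Recursively expand a symbol back into the text it generates."""
    if symbol in rules:
        left, right = rules[symbol]
        return expand(left, rules) + expand(right, rules)
    return symbol
```

For example, `repair("abababab")` first creates the rule `0 -> ab`, then `1 -> 0 0`, leaving the sequence `[1, 1]`; joining `expand(s, rules)` over that sequence restores the text.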

arXiv.org e-Print Archive

Crossref

Archivio istituzionale della ricerca - Università degli Studi di Venezia Ca' Foscari

Archivio della ricerca- LUISS Libera Università Internazionale degli Studi Sociali Guido Carli di Roma

Online Research Database In Technology

A Grammar Compression Algorithm based on Induced Suffix Sorting

Author: Ayala-Rincón Mauricio
Gog Simon
Louza Felipe A.
Navarro Gonzalo
Nunes Daniel Saad Nogueira
Publication date: 08/11/2017

We introduce GCIS, a grammar compression algorithm based on the induced suffix sorting algorithm SAIS, introduced by Nong et al. in 2009. Our solution builds on the factorization performed by SAIS during suffix sorting. We construct a context-free grammar on the input string, which can be further reduced to a shorter string by substituting each substring with its corresponding factor. The resulting grammar is encoded by exploiting some redundancies, such as common prefixes between suffix rules, which are sorted according to the SAIS framework. When compared with well-known compression tools such as Re-Pair and 7-zip, our algorithm is competitive and very effective at handling repetitive strings in terms of compression ratio, compression time, and decompression time.
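The SAIS factorization that GCIS builds on can be illustrated with a short sketch: classify each suffix as S-type or L-type, find the leftmost-S (LMS) positions, and cut the text there so that each distinct factor becomes a rule. This is a hypothetical simplification of one level only, assuming the text ends with a unique smallest sentinel (`$` here); it names factors by first appearance, whereas GCIS names them following the SAIS sort order, recurses on the reduced string, and uses its own rule encoding. The helper names are illustrative.

```python
def suffix_types(t):
    """SAIS classification: True = S-type suffix, False = L-type suffix.

    Assumes t ends with a unique sentinel smaller than every other symbol.
    """
    n = len(t)
    is_s = [False] * n
    is_s[n - 1] = True  # the sentinel suffix is S-type by convention
    for i in range(n - 2, -1, -1):
        is_s[i] = t[i] < t[i + 1] or (t[i] == t[i + 1] and is_s[i + 1])
    return is_s

def lms_positions(is_s):
    """Leftmost-S positions: S-type suffixes preceded by an L-type suffix."""
    return [i for i in range(1, len(is_s)) if is_s[i] and not is_s[i - 1]]

def lms_factorize(t):
    """One level of an LMS-based factorization (hypothetical simplification).

    Cuts t at every LMS position, names each distinct factor by first
    appearance, and returns (prefix, rules, reduced string).
    """
    cuts = lms_positions(suffix_types(t))
    prefix = t[:cuts[0]]  # text before the first LMS position
    bounds = zip(cuts, cuts[1:] + [len(t)])
    factors = [t[a:b] for a, b in bounds]
    names = {}  # factor -> nonterminal id
    reduced = [names.setdefault(f, len(names)) for f in factors]
    rules = {nt: f for f, nt in names.items()}
    return prefix, rules, reduced
```

On a repetitive input such as `"abcabcabc$"`, the LMS positions are 3, 6, and 9, so the text splits into the prefix `"abc"` followed by the factors `"abc"`, `"abc"`, `"$"`; the repeated factor shares one rule and the reduced string is `[0, 0, 1]`.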

arXiv.org e-Print Archive

Crossref

Repositorio Académico de la Universidad de Chile

Universal Indexes for Highly Repetitive Document Collections

Author: Claude Francisco
Fariña Antonio
Martínez-Prieto Miguel A.
Navarro Gonzalo
Publication venue: Elsevier BV
Publication date: 01/01/2016

Indexing highly repetitive collections has become a relevant problem with the emergence of large repositories of versioned documents, among other applications. These collections may reach huge sizes, but they consist mostly of documents that are near-copies of others. Traditional techniques for indexing these collections fail to properly exploit their regularities in order to reduce space. We introduce new techniques for compressing inverted indexes that exploit this near-copy regularity. They are based on run-length, Lempel-Ziv, or grammar compression of the differential inverted lists, instead of the usual practice of gap-encoding them. We show that, in this highly repetitive setting, our compression methods significantly reduce the space required by classical techniques, at the price of moderate slowdowns. Moreover, our best methods are universal, that is, they need to know neither the versioning structure of the collection nor whether a clear versioning structure even exists. We also include compressed self-indexes in the comparison. These are designed for general strings (not only natural language texts) and represent the text collection plus the index structure (not an inverted index) in integrated form. We show that these techniques can compress much further, using a small fraction of the space required by our new inverted indexes. Yet, they are orders of magnitude slower.

Comment: This research has received funding from the European Union's Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie Actions H2020-MSCA-RISE-2015 BIRDS GA No. 69094
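A toy illustration of why run-length compression of differential lists pays off on versioned collections: if consecutive versions of a document receive consecutive identifiers, a term that appears in many versions yields long runs of the gap 1. The sketch below assumes "differential" means the usual difference (gap) representation and shows only the run-length idea on a single list; the function names are hypothetical and the paper's actual encodings may differ.

```python
def to_gaps(postings):
    """Differential encoding: first docid, then differences between neighbours."""
    return [postings[0]] + [b - a for a, b in zip(postings, postings[1:])]

def run_length(values):
    """Collapse runs of equal values into (value, run length) pairs."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1] = (v, runs[-1][1] + 1)
        else:
            runs.append((v, 1))
    return runs

def decode(runs):
    """Invert run_length followed by to_gaps to recover the docids."""
    postings, acc = [], 0
    for value, count in runs:
        for _ in range(count):
            acc += value
            postings.append(acc)
    return postings
```

For instance, the list `[3, 4, 5, 6, 7, 20, 21, 22]` becomes the gaps `[3, 1, 1, 1, 1, 13, 1, 1]` and then the four runs `[(3, 1), (1, 4), (13, 1), (1, 2)]`.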

arXiv.org e-Print Archive

Repositorio da Universidade da Coruña

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Repositorio Académico de la Universidad de Chile

Re-Pair Compression of Inverted Lists

Author: Claude Francisco
Farina Antonio
Navarro Gonzalo
Publication date: 01/01/2009

Compression of inverted lists with methods that support fast intersection operations is an active research topic. Most compression schemes rely on encoding differences between consecutive positions with techniques that favor small numbers. In this paper we explore a completely different alternative: we use Re-Pair compression of those differences. While Re-Pair by itself offers fast decompression at arbitrary positions in main and secondary memory, we introduce variants that additionally speed up the operations required for inverted list intersection. We compare the resulting data structures with several recent proposals under various list intersection algorithms and conclude that our Re-Pair variants offer an interesting time/space trade-off for this problem, although further improvements are required to surpass the state of the art.
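One way a Re-Pair-compressed gap list can support intersection-style searches is to store, for each nonterminal, the sum of the gaps it expands to, so that a whole phrase can be skipped without decompressing it. The sketch below is a hypothetical illustration of that idea, not necessarily the variants proposed in the paper: it uses a hand-written grammar (nonterminals are strings, terminals are integer gaps) and answers `successor`, the smallest docid greater than or equal to a target.

```python
def phrase_sums(rules):
    """Sum of the gaps each nonterminal expands to (memoised)."""
    sums = {}

    def total(sym):
        if sym not in rules:
            return sym  # terminal: a gap value
        if sym not in sums:
            left, right = rules[sym]
            sums[sym] = total(left) + total(right)
        return sums[sym]

    for nt in rules:
        total(nt)
    return sums

def weight(sym, sums):
    """Gap total covered by a symbol (terminals cover just themselves)."""
    return sums.get(sym, sym)

def successor(seq, rules, sums, target):
    """Smallest docid >= target, or None; docids are prefix sums of the gaps."""
    acc = 0  # docid reached just before the current symbol
    for sym in seq:
        w = weight(sym, sums)
        if acc + w < target:
            acc += w  # every docid in this phrase is < target: skip it whole
            continue
        # The answer lies inside sym: descend without expanding the other side.
        while sym in rules:
            left, right = rules[sym]
            wl = weight(left, sums)
            if acc + wl >= target:
                sym = left
            else:
                acc += wl
                sym = right
        return acc + sym
    return None

# Hypothetical grammar for the gaps [3, 1, 1, 1, 1, 13, 1, 1],
# i.e. the docids 3, 4, 5, 6, 7, 20, 21, 22.
rules = {"A": (1, 1), "B": ("A", "A")}
seq = [3, "B", 13, "A"]
```

With these definitions, `successor(seq, rules, phrase_sums(rules), 8)` skips the terminal `3` and the phrase `B` (covering docids 4 to 7) and returns 20. Because gaps are positive, docids increase monotonically, which is what makes the skip test sound.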

arXiv.org e-Print Archive

CiteSeerX

Repositorio Académico de la Universidad de Chile

GraCT: A Grammar based Compressed representation of Trajectories

Author: E Knuth
JI Munro
M Hadjieleftheriou
M Vazirgiannis
MF Worboys
N Brisaboa
NJ Larsson
NR Brisaboa
Publication venue: Springer Science and Business Media LLC
Publication date: 20/09/2016

We present a compressed data structure that stores free trajectories of moving objects (ships at sea, for example) and supports spatio-temporal queries. Our method, GraCT, uses a $k^2$-tree to store the absolute positions of all objects at regular time intervals (snapshots), whereas the positions between snapshots are represented as logs of relative movements compressed with Re-Pair. Our experimental evaluation shows important savings in space and time with respect to a fair baseline.

Comment: This research has received funding from the European Union's Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie Actions H2020-MSCA-RISE-2015 BIRDS GA No. 69094
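The snapshot-plus-log layout can be sketched for a single object: store absolute positions every `period` time steps, store every step as a relative movement, and recover the position at time `t` from the nearest preceding snapshot plus the logged movements. This is a minimal sketch under stated assumptions: a plain dictionary stands in for the $k^2$-tree, the movement log is left uncompressed (GraCT compresses it with Re-Pair), and the names `build` and `position_at` are hypothetical.

```python
from typing import Dict, List, Tuple

Point = Tuple[int, int]

def build(track: List[Point], period: int) -> Tuple[Dict[int, Point], List[Point]]:
    """Split one object's trajectory into snapshots and a relative-movement log."""
    snapshots: Dict[int, Point] = {}  # stand-in for the k^2-tree snapshots
    log: List[Point] = []  # log[i] = movement from time i to time i + 1
    for t, (x, y) in enumerate(track):
        if t % period == 0:
            snapshots[t] = (x, y)
        if t > 0:
            px, py = track[t - 1]
            log.append((x - px, y - py))
    return snapshots, log

def position_at(snapshots: Dict[int, Point], log: List[Point],
                period: int, t: int) -> Point:
    """Nearest preceding snapshot plus the movements logged since then."""
    base = (t // period) * period
    x, y = snapshots[base]
    for dx, dy in log[base:t]:  # movements base->base+1, ..., t-1->t
        x += dx
        y += dy
    return x, y
```

A shorter `period` means more absolute snapshots but fewer movements to add per query; relative movements tend to be small and repetitive, which is what makes them good input for a grammar compressor.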

arXiv.org e-Print Archive

Repositorio da Universidade da Coruña

Crossref

Scipedia

Occam's Quantum Strop: Synchronizing and Compressing Classical Cryptic Processes via a Quantum Channel

Author: Aghamohammadi C.
Crutchfield J. P.
Mahoney J. R.
Publication date: 11/08/2015

A stochastic process's statistical complexity stands out as a fundamental property: the minimum information required to synchronize one process generator to another. How much information is required, though, when synchronizing over a quantum channel? Recent work demonstrated that representing causal similarity as quantum state-indistinguishability provides a quantum advantage. We generalize this to synchronization and offer a sequence of constructions that exploit extended causal structures, finding a substantial increase in the quantum advantage. We demonstrate that maximum compression is determined by the process's cryptic order, a classical, topological property closely allied to Markov order, itself a measure of historical dependence. We introduce an efficient algorithm that computes the quantum advantage and close by noting that the advantage comes at a cost: one trades off prediction for generation complexity.

Comment: 10 pages, 6 figures; http://csc.ucdavis.edu/~cmg/compmech/pubs/oqs.ht
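For reference, the classical statistical complexity $C_\mu$ that the quantum constructions are compared against is the Shannon entropy of the stationary distribution over a process's causal states (the states of its ε-machine). The sketch below computes only this classical quantity, from an assumed causal-state transition matrix; it does not implement the paper's quantum constructions or its algorithm for the quantum advantage, and it assumes the chain is irreducible so the stationary distribution is unique.

```python
import math

def stationary(P, iters=2000):
    """Stationary distribution of a row-stochastic matrix by power iteration.

    Iterates the lazy chain (P + I) / 2, which has the same stationary
    distribution but is aperiodic, so the iteration cannot oscillate.
    """
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [0.5 * pi[j] + 0.5 * sum(pi[i] * P[i][j] for i in range(n))
              for j in range(n)]
    return pi

def statistical_complexity(P):
    """C_mu = Shannon entropy (bits) of the stationary causal-state distribution."""
    return -sum(p * math.log2(p) for p in stationary(P) if p > 0)

# Golden Mean process: from A, emit 1 (stay in A) or 0 (go to B) w.p. 1/2;
# from B, always emit 1 and return to A.
golden_mean = [[0.5, 0.5],
               [1.0, 0.0]]
```

For the Golden Mean process the stationary distribution is $(2/3, 1/3)$, so $C_\mu = \log_2 3 - 2/3 \approx 0.918$ bits.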

arXiv.org e-Print Archive

PubMed Central

eScholarship - University of California