
The genealogy of self-similar fragmentations with negative index as a continuum random tree

Author: Haas Benedicte
Miermont Gregory
Publication date: 01/01/2003

We encode a certain class of stochastic fragmentation processes, namely self-similar fragmentation processes with a negative index of self-similarity, into a metric family tree which belongs to the family of Continuum Random Trees of Aldous. When the splitting times of the fragmentation are dense near 0, the tree can in turn be encoded into a continuous height function, just as the Brownian Continuum Random Tree is encoded in a normalized Brownian excursion. Under mild hypotheses, we then compute the Hausdorff dimensions of these trees, and the maximal Hölder exponents of the height functions.
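The encoding of a tree by a height function mentioned in this abstract can be illustrated on a discrete toy example. The sketch below is not taken from the paper: it assumes the standard convention that two times s and t are at tree distance h(s) + h(t) - 2 min h over [s, t], and the function name `tree_distance` and the sample excursion are hypothetical.

```python
from typing import Sequence


def tree_distance(height: Sequence[float], s: int, t: int) -> float:
    """Distance between the tree points visited at times s and t.

    Assumed convention: d(s, t) = h(s) + h(t) - 2 * min(h over [s, t]).
    """
    lo, hi = min(s, t), max(s, t)
    return height[s] + height[t] - 2 * min(height[lo:hi + 1])


# Hypothetical discrete excursion: starts and ends at 0 and stays nonnegative.
excursion = [0, 1, 2, 1, 2, 3, 2, 1, 0]

# Times 2 and 5 sit at heights 2 and 3; their branches separate at height 1.
print(tree_distance(excursion, 2, 5))
# Times 1 and 3 are at distance 0, so they encode the same point of the tree.
print(tree_distance(excursion, 1, 3))
```

Times at distance zero are identified, which is how a continuous function on an interval yields a tree rather than a line segment.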

arXiv.org e-Print Archive

CiteSeerX

Base de publications de l'université Paris-Dauphine

Hal-Diderot

Universal Compressed Text Indexing

Author: Navarro Gonzalo
Prezza Nicola
Publication date: 06/09/2018

The rise of repetitive datasets has lately generated a lot of interest in compressed self-indexes based on dictionary compression, a rich and heterogeneous family that exploits text repetitions in different ways. For each such compression scheme, several different indexing solutions have been proposed in the last two decades. To date, the fastest indexes for repetitive texts are based on the run-length compressed Burrows-Wheeler transform and on the Compact Directed Acyclic Word Graph. The most space-efficient indexes, on the other hand, are based on the Lempel-Ziv parsing and on grammar compression. Indexes for more universal schemes such as collage systems and macro schemes have not yet been proposed. Very recently, Kempa and Prezza [STOC 2018] showed that all dictionary compressors can be interpreted as approximation algorithms for the smallest string attractor, that is, a set of text positions capturing all distinct substrings. Starting from this observation, in this paper we develop the first universal compressed self-index, that is, the first indexing data structure based on string attractors, which can therefore be built on top of any dictionary-compressed text representation. Let $\gamma$ be the size of a string attractor for a text of length $n$. Our index takes $O(\gamma\log(n/\gamma))$ words of space and supports locating the $occ$ occurrences of any pattern of length $m$ in $O(m\log n + occ\log^{\epsilon}n)$ time, for any constant $\epsilon>0$. This is, in particular, the first index for general macro schemes and collage systems. Our result shows that the relation between indexing and compression is much deeper than what was previously thought: the simple property standing at the core of all dictionary compressors is sufficient to support fast indexed queries.

Comment: Fixed with reviewer's comment
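To make the notion of a string attractor concrete, here is a brute-force checker. It is only an illustrative sketch, not the paper's index: it reads the abstract's "set of text positions capturing all distinct substrings" as "every distinct substring has at least one occurrence that spans a chosen position", uses 0-based positions, and the name `is_string_attractor` is hypothetical. Its running time is polynomial but far too slow for real texts.

```python
from typing import Iterable


def is_string_attractor(text: str, positions: Iterable[int]) -> bool:
    """Check by brute force whether `positions` (0-based) is a string attractor."""
    chosen = set(positions)
    n = len(text)
    every_substring = set()
    captured = set()
    for i in range(n):
        for j in range(i + 1, n + 1):
            sub = text[i:j]
            every_substring.add(sub)
            # This occurrence text[i:j] counts if it spans a chosen position.
            if any(i <= p < j for p in chosen):
                captured.add(sub)
    return every_substring <= captured


print(is_string_attractor("abab", {0, 1}))       # both letters are covered
print(is_string_attractor("abab", {0}))          # "b" never spans position 0
print(is_string_attractor("abcabc", {0, 1, 2}))  # one position per letter
```

An attractor needs at least one position per distinct character, so the `"abcabc"` example with three positions cannot be improved.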

arXiv.org e-Print Archive

Archivio istituzionale della ricerca - Università degli Studi di Venezia Ca' Foscari

Repositorio Académico de la Universidad de Chile

Archivio della ricerca- LUISS Libera Università Internazionale degli Studi Sociali Guido Carli di Roma

Fully-Functional Suffix Trees and Optimal Text Searching in BWT-runs Bounded Space

Author: Gagie Travis
Navarro Gonzalo
Prezza Nicola
Publication date: 04/07/2019

Indexing highly repetitive texts, such as genomic databases, software repositories and versioned text collections, has become an important problem since the turn of the millennium. A relevant compressibility measure for repetitive texts is $r$, the number of runs in their Burrows-Wheeler Transforms (BWTs). One of the earliest indexes for repetitive collections, the Run-Length FM-index, used $O(r)$ space and was able to efficiently count the number of occurrences of a pattern of length $m$ in the text (in loglogarithmic time per pattern symbol, with current techniques). However, it was unable to locate the positions of those occurrences efficiently within a space bounded in terms of $r$. In this paper we close this long-standing problem, showing how to extend the Run-Length FM-index so that it can locate the $occ$ occurrences efficiently within $O(r)$ space (in loglogarithmic time each), and reaching optimal time, $O(m + occ)$, within $O(r \log\log_w(\sigma + n/r))$ space, for a text of length $n$ over an alphabet of size $\sigma$ on a RAM machine with words of $w = \Omega(\log n)$ bits. Within that space, our index can also count in optimal time, $O(m)$. Multiplying the space by $O(w/\log\sigma)$, we support count and locate in $O(\lceil m \log(\sigma)/w \rceil)$ and $O(\lceil m \log(\sigma)/w \rceil + occ)$ time, which is optimal in the packed setting and had not been obtained before in compressed space. We also describe a structure using $O(r \log(n/r))$ space that replaces the text and extracts any text substring of length $\ell$ in almost-optimal time $O(\log(n/r) + \ell \log(\sigma)/w)$. Within that space, we similarly provide direct access to suffix array, inverse suffix array, and longest common prefix array cells, and extend these capabilities to full suffix tree functionality, typically in $O(\log(n/r))$ time per operation.

Comment: submitted version; optimal count and locate in smaller space: O(r log log_w(n/r + sigma)
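The measure r used in this abstract, the number of runs in the Burrows-Wheeler Transform, can be computed directly on small inputs. The following is a naive sketch for illustration only (it sorts all rotations, so it is quadratic in space); the function names and the `$` sentinel, assumed to be smaller than every text symbol, are choices made here and do not come from the paper.

```python
def bwt(text: str, sentinel: str = "$") -> str:
    """Burrows-Wheeler Transform via sorted rotations (naive construction)."""
    s = text + sentinel  # the sentinel is assumed smaller than all text symbols
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rotation[-1] for rotation in rotations)


def count_runs(s: str) -> int:
    """Number of maximal runs of equal consecutive symbols in s."""
    return sum(1 for i, c in enumerate(s) if i == 0 or c != s[i - 1])


for text in ("banana", "abababab"):
    transformed = bwt(text)
    print(text, transformed, count_runs(transformed))
```

On the repetitive input `"abababab"` the transform groups equal symbols together, so r stays small even as more copies of `"ab"` are appended.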

arXiv.org e-Print Archive

Archivio istituzionale della ricerca - Università degli Studi di Venezia Ca' Foscari

Archivio della ricerca- LUISS Libera Università Internazionale degli Studi Sociali Guido Carli di Roma

Optimal-Time Text Indexing in BWT-runs Bounded Space

Author: Gagie Travis
Navarro Gonzalo
Prezza Nicola
Publication date: 11/07/2017

Indexing highly repetitive texts, such as genomic databases, software repositories and versioned text collections, has become an important problem since the turn of the millennium. A relevant compressibility measure for repetitive texts is $r$, the number of runs in their Burrows-Wheeler Transform (BWT). One of the earliest indexes for repetitive collections, the Run-Length FM-index, used $O(r)$ space and was able to efficiently count the number of occurrences of a pattern of length $m$ in the text (in loglogarithmic time per pattern symbol, with current techniques). However, it was unable to locate the positions of those occurrences efficiently within a space bounded in terms of $r$. Since then, a number of other indexes with space bounded by other measures of repetitiveness (the number of phrases in the Lempel-Ziv parse, the size of the smallest grammar generating the text, the size of the smallest automaton recognizing the text factors) have been proposed for efficiently locating, but not directly counting, the occurrences of a pattern. In this paper we close this long-standing problem, showing how to extend the Run-Length FM-index so that it can locate the $occ$ occurrences efficiently within $O(r)$ space (in loglogarithmic time each), and reaching optimal time $O(m+occ)$ within $O(r\log(n/r))$ space, on a RAM machine of $w=\Omega(\log n)$ bits. Within $O(r\log (n/r))$ space, our index can also count in optimal time $O(m)$. Raising the space to $O(r w\log_\sigma(n/r))$, we support count and locate in $O(m\log(\sigma)/w)$ and $O(m\log(\sigma)/w+occ)$ time, which is optimal in the packed setting and had not been obtained before in compressed space. We also describe a structure using $O(r\log(n/r))$ space that replaces the text and extracts any text substring of length $\ell$ in almost-optimal time $O(\log(n/r)+\ell\log(\sigma)/w)$. (...continues...

arXiv.org e-Print Archive

Crossref

Archivio istituzionale della ricerca - Università degli Studi di Venezia Ca' Foscari

Archivio della ricerca- LUISS Libera Università Internazionale degli Studi Sociali Guido Carli di Roma

Online Research Database In Technology

General Fragmentation Trees

Author: Stephenson Robin
Publication date: 01/01/2013

We show that the genealogy of any self-similar fragmentation process can be encoded in a compact measured real tree. Under some Malthusian hypotheses, we compute the fractal Hausdorff dimension of this tree through the use of a natural measure on the set of its leaves. This generalizes previous work of Haas and Miermont which was restricted to conservative fragmentation processes.

arXiv.org e-Print Archive

Base de publications de l'université Paris-Dauphine

Compressed Text Indexes: From Theory to Practice!

Author: Ferragina Paolo
Gonzalez Rodrigo
Navarro Gonzalo
Venturini Rossano
Publication date: 01/01/2007

A compressed full-text self-index represents a text in a compressed form and still answers queries efficiently. This technology represents a breakthrough over the text indexing techniques of the previous decade, whose indexes required several times the size of the text. Although it is relatively new, this technology has matured up to a point where theoretical research is giving way to practical developments. Nonetheless this requires significant programming skills, a deep engineering effort, and a strong algorithmic background to dig into the research results. To date only isolated implementations and focused comparisons of compressed indexes have been reported, and they lacked a common API, which prevented their re-use or deployment within other applications. The goal of this paper is to fill this gap. First, we present the existing implementations of compressed indexes from a practitioner's point of view. Second, we introduce the Pizza&Chili site, which offers tuned implementations and a standardized API for the most successful compressed full-text self-indexes, together with effective testbeds and scripts for their automatic validation and test. Third, we show the results of our extensive experiments on these codes with the aim of demonstrating the practical relevance of this novel and exciting technology.
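As a rough illustration of how a self-index answers a counting query without scanning the text, the sketch below runs FM-index style backward search over an uncompressed BWT. It is not the Pizza&Chili API and not any of the tuned implementations the abstract refers to: the function names are hypothetical, the `$` sentinel is assumed to be smaller than every text symbol, and the rank operation is a plain linear scan where a real index would use a compressed rank structure.

```python
def build_bwt(text: str, sentinel: str = "$") -> str:
    """BWT from a naively sorted suffix array (illustration only)."""
    s = text + sentinel
    suffix_array = sorted(range(len(s)), key=lambda i: s[i:])
    return "".join(s[i - 1] for i in suffix_array)


def count_occurrences(bwt: str, pattern: str) -> int:
    """Count pattern occurrences by backward search over the BWT."""
    # first_row[c] = number of symbols in the text strictly smaller than c.
    first_row = {}
    for index, symbol in enumerate(sorted(bwt)):
        first_row.setdefault(symbol, index)

    def rank(symbol: str, end: int) -> int:
        # Occurrences of symbol in bwt[:end]; a linear scan in this sketch.
        return bwt.count(symbol, 0, end)

    lo, hi = 0, len(bwt)  # current suffix-array interval [lo, hi)
    for symbol in reversed(pattern):
        if symbol not in first_row:
            return 0
        lo = first_row[symbol] + rank(symbol, lo)
        hi = first_row[symbol] + rank(symbol, hi)
        if lo >= hi:
            return 0
    return hi - lo


index = build_bwt("banana")
for query in ("ana", "nan", "a", "x"):
    print(query, count_occurrences(index, query))
```

The loop performs one interval update per pattern symbol, which is why counting time in such indexes is usually stated per pattern symbol.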

arXiv.org e-Print Archive

CiteSeerX

Archivio della Ricerca - Università di Pisa

LRM-Trees: Compressed Indices, Adaptive Sorting, and Compressed Permutations

Author: Barbay Jérémy
Fischer Johannes
Publication date: 29/09/2010

LRM-Trees are an elegant way to partition a sequence of values into sorted consecutive blocks, and to express the relative position of the first element of each block within a previous block. They were used to encode ordinal trees and to index integer arrays in order to support range minimum queries on them. We describe how they yield many other convenient results in a variety of areas, from data structures to algorithms: some compressed succinct indices for range minimum queries; a new adaptive sorting algorithm; and a compressed succinct data structure for permutations supporting direct and indirect application in time that is shorter the more compressible the permutation is.

Comment: 13 pages, 1 figure
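A small sketch may help picture the tree structure. It assumes the usual left-to-right-minima formulation, in which the parent of position i is the nearest earlier position holding a strictly smaller value, with a virtual root (written -1 here) when no such position exists. That definition is an assumption about the paper rather than something stated in the abstract, and the function name and sample array are hypothetical.

```python
from typing import List, Sequence


def lrm_tree_parents(values: Sequence[int]) -> List[int]:
    """Parent of each position: nearest previous strictly smaller value.

    Returns -1 for positions whose parent is the virtual root.
    """
    parents: List[int] = []
    stack: List[int] = []  # positions whose values are strictly increasing
    for i, value in enumerate(values):
        # Discard earlier positions that are not strictly smaller.
        while stack and values[stack[-1]] >= value:
            stack.pop()
        parents.append(stack[-1] if stack else -1)
        stack.append(i)
    return parents


print(lrm_tree_parents([3, 1, 4, 1, 5, 9, 2, 6]))
```

Each position is pushed and popped at most once, so the parents are found in linear time. A run of increasing values forms a path in the tree, which matches the abstract's picture of sorted consecutive blocks.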

arXiv.org e-Print Archive

CiteSeerX

On the exponential functional of Markov Additive Processes, and applications to multi-type self-similar fragmentation processes and trees

Author: Stephenson Robin
Publication date: 01/01/2018

A Markov Additive Process is a bi-variate Markov process $(\xi,J)=\big((\xi_t,J_t),t\geq0\big)$ which should be thought of as a multi-type Lévy process: the second component $J$ is a Markov chain on a finite space $\{1,\ldots,K\}$, and the first component $\xi$ behaves locally as a Lévy process, with local dynamics depending on $J$. In the subordinator-like case where $\xi$ is nondecreasing, we establish several results concerning the moments of $\xi$ and of its exponential functional $I_{\xi}=\int_{0}^{\infty} e^{-\xi_t}\,\mathrm{d}t$, extending the work of Carmona et al., and Bertoin and Yor. We then apply these results to the study of multi-type self-similar fragmentation processes: these are self-similar analogues of Bertoin's homogeneous multi-type fragmentation processes. Notably, we encode the genealogy of the process in a tree, and under some Malthusian hypotheses, compute its Hausdorff dimension in a generalisation of our previous work.

Comment: Minor corrections and typo
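As a sanity check on the definition of the exponential functional, consider the simplest single-type case ($K=1$) where $\xi$ is a pure drift. The worked example below is an illustration and does not come from the paper; the moment formula quoted at the end is the classical single-type result for subordinators, with $\Phi$ denoting the Laplace exponent, a symbol introduced here.

```latex
% Pure drift: \xi_t = ct with c > 0 (single type, K = 1).
\[
  I_{\xi} = \int_{0}^{\infty} e^{-ct}\,\mathrm{d}t = \frac{1}{c}.
\]
% Classical moment formula for a subordinator with
% \mathbb{E}\big[e^{-q\xi_t}\big] = e^{-t\Phi(q)}:
\[
  \mathbb{E}\big[I_{\xi}^{k}\big] = \frac{k!}{\Phi(1)\,\Phi(2)\cdots\Phi(k)},
  \qquad k \geq 1.
\]
% For the drift, \Phi(q) = cq, so the formula gives
\[
  \mathbb{E}\big[I_{\xi}^{k}\big] = \frac{k!}{c^{k}\,k!} = \frac{1}{c^{k}},
\]
% in agreement with I_{\xi} = 1/c.
```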

arXiv.org e-Print Archive

Oxford University Research Archive

Orderly Spanning Trees with Applications

Author: Ching-Chi Lin
Hsueh-I Lu
Trotter William
Yi-Ting Chiang
Publication venue: 'Society for Industrial & Applied Mathematics (SIAM)'
Publication date: 13/07/2002

We introduce and study the {\em orderly spanning trees} of plane graphs. This algorithmic tool generalizes {\em canonical orderings}, which exist only for triconnected plane graphs. Although not every plane graph admits an orderly spanning tree, we provide an algorithm to compute an {\em orderly pair} for any connected planar graph $G$, consisting of a plane graph $H$ of $G$, and an orderly spanning tree of $H$. We also present several applications of orderly spanning trees: (1) a new constructive proof for Schnyder's Realizer Theorem, (2) the first area-optimal 2-visibility drawing of $G$, and (3) the best known encodings of $G$ with O(1)-time query support. All algorithms in this paper run in linear time.

Comment: 25 pages, 7 figures, A preliminary version appeared in Proceedings of the 12th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA 2001), Washington D.C., USA, January 7-9, 2001, pp. 506-51

arXiv.org e-Print Archive

Crossref

National Taiwan University Repository