Search CORE

16,057 research outputs found

Optimal Substring-Equality Queries with Applications to Sparse Text Indexing

Author: Prezza Nicola
Publication venue
Publication date: 01/01/2020
Field of study

We consider the problem of encoding a string of length

n

from an integer alphabet of size

\sigma

so that access and substring equality queries (that is, determining the equality of any two substrings) can be answered efficiently. Any uniquely-decodable encoding supporting access must take

n\log\sigma + \Theta(\log (n\log\sigma))

bits. We describe a new data structure matching this lower bound when

\sigma\leq n^{O(1)}

while supporting both queries in optimal

O(1)

time. Furthermore, we show that the string can be overwritten in-place with this structure. The redundancy of

\Theta(\log n)

bits and the constant query time break exponentially a lower bound that is known to hold in the read-only model. Using our new string representation, we obtain the first in-place subquadratic (indeed, even sublinear in some cases) algorithms for several string-processing problems in the restore model: the input string is rewritable and must be restored before the computation terminates. In particular, we describe the first in-place subquadratic Monte Carlo solutions to the sparse suffix sorting, sparse LCP array construction, and suffix selection problems. With the sole exception of suffix selection, our algorithms are also the first running in sublinear time for small enough sets of input suffixes. Combining these solutions, we obtain the first sublinear-time Monte Carlo algorithm for building the sparse suffix tree in compact space. We also show how to derandomize our algorithms using small space. This leads to the first Las Vegas in-place algorithm computing the full LCP array in

O(n\log n)

time and to the first Las Vegas in-place algorithms solving the sparse suffix sorting and sparse LCP array construction problems in

O(n^{1.5}\sqrt{\log \sigma})

time. Running times of these Las Vegas algorithms hold in the worst case with high probability.Comment: Refactored according to TALG's reviews. New w.h.p. bounds and Las Vegas algorithm

arXiv.org e-Print Archive

Archivio istituzionale della ricerca - Università degli Studi di Venezia Ca' Foscari

The oriented swap process and last passage percolation

Author: Bisi Elia
Cunden Fabio Deelan
Gibbons Shane
Romik Dan
Publication venue
Publication date: 12/07/2021
Field of study

We present new probabilistic and combinatorial identities relating three random processes: the oriented swap process on

n

particles, the corner growth process, and the last passage percolation model. We prove one of the probabilistic identities, relating a random vector of last passage percolation times to its dual, using the duality between the Robinson-Schensted-Knuth and Burge correspondences. A second probabilistic identity, relating those two vectors to a vector of 'last swap times' in the oriented swap process, is conjectural. We give a computer-assisted proof of this identity for

n\le 6

after first reformulating it as a purely combinatorial identity, and discuss its relation to the Edelman-Greene correspondence. The conjectural identity provides precise finite-

n

and asymptotic predictions on the distribution of the absorbing time of the oriented swap process, thus conditionally solving an open problem posed by Angel, Holroyd and Romik.Comment: 36 pages, 6 figures. Full version of the FPSAC 2020 extended abstract arXiv:2003.0333

arXiv.org e-Print Archive

Distributional convergence for the number of symbol comparisons used by QuickSort

Author: Fill James Allen
Publication venue: 'Institute of Mathematical Statistics'
Publication date: 01/01/2012
Field of study

Most previous studies of the sorting algorithm QuickSort have used the number of key comparisons as a measure of the cost of executing the algorithm. Here we suppose that the n independent and identically distributed (i.i.d.) keys are each represented as a sequence of symbols from a probabilistic source and that QuickSort operates on individual symbols, and we measure the execution cost as the number of symbol comparisons. Assuming only a mild "tameness" condition on the source, we show that there is a limiting distribution for the number of symbol comparisons after normalization: first centering by the mean and then dividing by n. Additionally, under a condition that grows more restrictive as p increases, we have convergence of moments of orders p and smaller. In particular, we have convergence in distribution and convergence of moments of every order whenever the source is memoryless, that is, whenever each key is generated as an infinite string of i.i.d. symbols. This is somewhat surprising; even for the classical model that each key is an i.i.d. string of unbiased ("fair") bits, the mean exhibits periodic fluctuations of order n.Comment: Published in at http://dx.doi.org/10.1214/12-AAP866 the Annals of Applied Probability (http://www.imstat.org/aap/) by the Institute of Mathematical Statistics (http://www.imstat.org

arXiv.org e-Print Archive

CiteSeerX