Rank, select and access in grammar-compressed strings

Authors: Belazzougui Djamal, Puglisi Simon J., Tabei Yasuo
Publication date: 14/08/2014

Given a string $S$ of length $N$ on a fixed alphabet of $\sigma$ symbols, a grammar compressor produces a context-free grammar $G$ of size $n$ that generates $S$ and only $S$. In this paper we describe data structures to support the following operations on a grammar-compressed string: $\mathrm{rank}_c(S,i)$ (return the number of occurrences of symbol $c$ before position $i$ in $S$); $\mathrm{select}_c(S,i)$ (return the position of the $i$th occurrence of $c$ in $S$); and $\mathrm{access}(S,i,j)$ (return substring $S[i,j]$). For rank and select we describe data structures of size $O(n\sigma\log N)$ bits that support the two operations in $O(\log N)$ time. We propose another structure that uses $O(n\sigma\log(N/n)(\log N)^{1+\epsilon})$ bits and that supports the two queries in $O(\log N/\log\log N)$ time, where $\epsilon>0$ is an arbitrary constant. To our knowledge, we are the first to study the asymptotic complexity of rank and select in the grammar-compressed setting, and we provide a hardness result showing that significantly improving the bounds we achieve would imply a major breakthrough on a hard graph-theoretical problem. Our main result for access is a method that requires $O(n\log N)$ bits of space and $O(\log N+m/\log_\sigma N)$ time to extract $m=j-i+1$ consecutive symbols from $S$. Alternatively, we can achieve $O(\log N/\log\log N+m/\log_\sigma N)$ query time using $O(n\log(N/n)(\log N)^{1+\epsilon})$ bits of space. This matches a lower bound stated by Verbin and Yu for strings where $N$ is polynomially related to $n$.

Comment: 16 pages
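To make the three queries concrete, here is a minimal sketch on a toy straight-line grammar. It is the simple baseline that stores an expansion length and per-symbol counts for every nonterminal and walks down from the start symbol, so its query time grows with the grammar height; it is not the data structure the abstract describes. The grammar, the function names, the 1-based positions, the single-symbol form of access, and the reading of "before position $i$" as "strictly before" are all assumptions made for illustration.

```python
from functools import lru_cache

# Hypothetical toy grammar: each nonterminal maps either to one terminal
# character or to a pair of nonterminals. START expands to "ababa".
RULES = {
    "A": "a",
    "B": "b",
    "C": ("A", "B"),  # ab
    "D": ("C", "C"),  # abab
    "E": ("D", "A"),  # ababa
}
START = "E"


@lru_cache(maxsize=None)
def length(x):
    """Length of the string generated by nonterminal x."""
    rhs = RULES[x]
    return 1 if isinstance(rhs, str) else length(rhs[0]) + length(rhs[1])


@lru_cache(maxsize=None)
def count(x, c):
    """Occurrences of symbol c in the string generated by nonterminal x."""
    rhs = RULES[x]
    if isinstance(rhs, str):
        return 1 if rhs == c else 0
    return count(rhs[0], c) + count(rhs[1], c)


def access(i):
    """Return the symbol at 1-based position i without decompressing."""
    x = START
    while not isinstance(RULES[x], str):
        left, right = RULES[x]
        if i <= length(left):
            x = left
        else:
            i -= length(left)
            x = right
    return RULES[x]


def rank(c, i):
    """Occurrences of c strictly before 1-based position i (assumed convention)."""
    x, k, total = START, i - 1, 0  # k = length of the prefix still to count
    while k > 0:
        rhs = RULES[x]
        if isinstance(rhs, str):
            total += 1 if rhs == c else 0
            break
        left, right = rhs
        if k < length(left):
            x = left
        else:
            total += count(left, c)  # the whole left child lies in the prefix
            k -= length(left)
            x = right
    return total


def select(c, j):
    """1-based position of the j-th occurrence of c."""
    if not 1 <= j <= count(START, c):
        raise ValueError("fewer than j occurrences of c")
    x, offset = START, 0
    while not isinstance(RULES[x], str):
        left, right = RULES[x]
        if j <= count(left, c):
            x = left
        else:
            j -= count(left, c)
            offset += length(left)
            x = right
    return offset + 1
```

The cached `count` table holds one counter per (nonterminal, symbol) pair, that is $n\sigma$ counters in total, which gives a feel for the space budget being traded against query time.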

arXiv.org e-Print Archive

CiteSeerX

Selection from read-only memory with limited workspace

Authors: A. Golynski, B. Chazelle, D.E. Knuth, G. Jacobson, G. Navarro, G.N. Frederickson, J. Pagter, J.I. Munro, M. Blum, P. Beame, R. Grossi, R. Raman, T. Asano, T.H. Cormen, T.M. Chan, V. Raman
Publication date: 01/01/2013

Given an unordered array of $N$ elements drawn from a totally ordered set and an integer $k$ in the range from $1$ to $N$, in the classic selection problem the task is to find the $k$-th smallest element in the array. We study the complexity of this problem in the space-restricted random-access model: the input array is stored on read-only memory, and the algorithm has access to a limited amount of workspace. We prove that the linear-time prune-and-search algorithm (presented in most textbooks on algorithms) can be modified to use $\Theta(N)$ bits instead of $\Theta(N)$ words of extra space. Prior to our work, the best known algorithm by Frederickson could perform the task with $\Theta(N)$ bits of extra space in $O(N \lg^{*} N)$ time. Our result separates the space-restricted random-access model and the multi-pass streaming model, since we can surpass the $\Omega(N \lg^{*} N)$ lower bound known for the latter model. We also generalize our algorithm for the case when the size of the workspace is $\Theta(S)$ bits, where $\lg^3 N \leq S \leq N$. The running time of our generalized algorithm is $O(N \lg^{*}(N/S) + N (\lg N) / \lg S)$, slightly improving over the $O(N \lg^{*}(N (\lg N)/S) + N (\lg N) / \lg S)$ bound of Frederickson's algorithm. To obtain the improvements mentioned above, we developed a new data structure, called the wavelet stack, that we use for repeated pruning. We expect the wavelet stack to be a useful tool in other applications as well.

Comment: 16 pages, 1 figure, Preliminary version appeared in COCOON-201
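The read-only setting can be illustrated with a much simpler, slower scheme than the one in the abstract. The sketch below never writes to the input and keeps only a constant number of extra variables: it maintains an open "filter" interval known to contain the answer, draws a random pivot from inside it, counts, and narrows the filter. This swapped-in technique needs an expected $O(\lg N)$ rounds of two full scans each, so it is neither the linear-time $\Theta(N)$-bit algorithm nor the wavelet stack; the function name and the randomized pivot rule are assumptions made for the example.

```python
import random


def select_read_only(a, k, seed=0):
    """Return the k-th smallest (1-based) element of the read-only sequence a.

    Uses O(1) extra variables and never modifies a. The answer always lies
    strictly between lo and hi (None means unbounded on that side).
    """
    if not 1 <= k <= len(a):
        raise ValueError("k out of range")
    rng = random.Random(seed)
    lo, hi = None, None
    while True:
        # Scan 1: reservoir-sample one pivot uniformly from inside the filter.
        pivot, seen = None, 0
        for x in a:
            if (lo is None or x > lo) and (hi is None or x < hi):
                seen += 1
                if rng.randrange(seen) == 0:
                    pivot = x
        # Scan 2: count elements smaller than and equal to the pivot.
        less = sum(1 for x in a if x < pivot)
        equal = sum(1 for x in a if x == pivot)
        if k <= less:
            hi = pivot           # answer is smaller than the pivot
        elif k <= less + equal:
            return pivot         # pivot has rank k
        else:
            lo = pivot           # answer is larger than the pivot
```

Passing a tuple, as in `select_read_only((5, 3, 9, 3, 7, 1), 4)`, makes the read-only assumption explicit, since the function could not reorder the input even if it tried.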

arXiv.org e-Print Archive

Copenhagen University Research Information System

Distributed multi-agent Gaussian regression via finite-dimensional approximations

Authors: Pillonetto Gianluigi, Schenato Luca, Varagnolo Damiano
Publication date: 10/05/2018

We consider the problem of distributedly estimating Gaussian processes in multi-agent frameworks. Each agent collects few measurements and aims to collaboratively reconstruct a common estimate based on all data. Agents are assumed to have limited computational and communication capabilities and to gather $M$ noisy measurements in total on input locations independently drawn from a known common probability density. The optimal solution would require agents to exchange all the $M$ input locations and measurements and then invert an $M \times M$ matrix, a non-scalable task. Instead, we propose two suboptimal approaches using the first $E$ orthonormal eigenfunctions obtained from the Karhunen-Loève (KL) expansion of the chosen kernel, where typically $E \ll M$. The benefits are that the computation and communication complexities scale with $E$ and not with $M$, and that computing the required statistics can be performed via standard average consensus algorithms. We obtain probabilistic non-asymptotic bounds that determine a priori the desired level of estimation accuracy, and new distributed strategies for tuning the regularization parameters that rely on Stein's unbiased risk estimate (SURE) paradigms, are applicable to generic basis functions (thus not necessarily kernel eigenfunctions), and can again be implemented via average consensus. The proposed estimators and bounds are finally tested on both synthetic and real field data.
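The average consensus step mentioned in the abstract can be sketched as follows: every agent holds a local number and repeatedly nudges it toward its neighbours' values until all agents agree on the global mean. The ring topology, the fixed weight of 1/3, the round count, and the function name are assumptions chosen for illustration; the abstract does not specify them. A vector of statistics would presumably be averaged component by component in the same way.

```python
def average_consensus(values, rounds=100):
    """Synchronous average consensus among agents arranged in a ring.

    values: one local scalar per agent (at least 3 agents assumed).
    Returns each agent's estimate after the given number of rounds.
    """
    n = len(values)
    if n < 3:
        raise ValueError("this sketch assumes a ring of at least 3 agents")
    x = list(values)
    w = 1.0 / 3.0  # Metropolis-style weight for a ring, where every degree is 2
    for _ in range(rounds):
        # Each agent uses only its own value and its two ring neighbours.
        x = [
            x[i] + w * (x[i - 1] - x[i]) + w * (x[(i + 1) % n] - x[i])
            for i in range(n)
        ]
    return x
```

Because the weights are symmetric, the sum of all values is preserved in every round, which is why the common limit is the mean of the initial values rather than some other agreed number.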

arXiv.org e-Print Archive

Archivio istituzionale della ricerca - Università di Padova

Rapid Sampling for Visualizations with Ordering Guarantees

Authors: Blais Eric, Indyk Piotr, Kim Albert, Madden Sam, Parameswaran Aditya, Rubinfeld Ronitt
Publication date: 09/12/2014

Visualizations are frequently used as a means to understand trends and gather insights from datasets, but often take a long time to generate. In this paper, we focus on the problem of rapidly generating approximate visualizations while preserving crucial visual properties of interest to analysts. Our primary focus will be on sampling algorithms that preserve the visual property of ordering; our techniques will also apply to some other visual properties. For instance, our algorithms can be used to generate an approximate visualization of a bar chart very rapidly, where the comparisons between any two bars are correct. We formally show that our sampling algorithms are generally applicable and provably optimal in theory, in that they do not take more samples than necessary to generate the visualizations with ordering guarantees. They also work well in practice, correctly ordering output groups while taking orders of magnitude fewer samples and much less time than conventional sampling schemes.

Comment: Tech Report. 17 pages. Condensed version to appear in VLDB Vol. 8 No.

arXiv.org e-Print Archive

CiteSeerX

eScholarship - University of California

Compact Binary Relation Representations with Rich Functionality

Authors: Barbay Jérémy, Claude Francisco, Navarro Gonzalo
Publication date: 17/01/2012

Binary relations are an important abstraction arising in many data representation problems. The data structures proposed so far to represent them support just a few basic operations required to fit one particular application. We identify many of those operations arising in applications and generalize them into a wide set of desirable queries for a binary relation representation. We also identify reductions among those operations. We then introduce several novel binary relation representations, some simple and some quite sophisticated, that not only are space-efficient but also efficiently support a large subset of the desired queries.

Comment: 32 pages

arXiv.org e-Print Archive

CiteSeerX