
Efficient LZ78 factorization of grammar compressed text

Author: A. Amir
A. Jeż
E. Ukkonen
E.M. McCreight
J. Jansson
J. Westbrook
J. Ziv
K. Goto
M. Crochemore
M. Li
M.A. Bender
O. Berkman
P. Weiner
R. Cilibrasi
T. Kida
V. Freschi
W. Rytter
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2012

We present an efficient algorithm for computing the LZ78 factorization of a text, where the text is represented as a straight line program (SLP), which is a context-free grammar in Chomsky normal form that generates a single string. Given an SLP of size $n$ representing a text $S$ of length $N$, our algorithm computes the LZ78 factorization of $S$ in $O(n\sqrt{N}+m\log N)$ time and $O(n\sqrt{N}+m)$ space, where $m$ is the number of resulting LZ78 factors. We also show how to improve the algorithm so that the $n\sqrt{N}$ term in the time and space complexities becomes either $nL$, where $L$ is the length of the longest LZ78 factor, or $(N - \alpha)$, where $\alpha \geq 0$ is a quantity that depends on the amount of redundancy the SLP captures with respect to substrings of $S$ of a certain length. Since $m = O(N/\log_\sigma N)$, where $\sigma$ is the alphabet size, the latter is asymptotically at least as fast as a linear-time algorithm that runs on the uncompressed string when $\sigma$ is constant, and it can be more efficient when the text is compressible, i.e., when $m$ and $n$ are small.

Comment: SPIRE 201
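The abstract compares the method against a linear-time algorithm on the uncompressed string. The following minimal Python sketch shows that naive baseline: it fully expands the SLP and then runs the textbook trie-based LZ78 parse. It is not the paper's algorithm, which avoids decompression. The list-of-rules SLP encoding (a character, or a pair `(j, k)` of earlier rule indices) and the names `expand_slp`, `lz78_factorize` and `decode` are hypothetical choices made for this illustration.

```python
def expand_slp(rules):
    """Expand an SLP given as a list of rules (hypothetical encoding).

    rules[i] is either a single character (X_i -> c) or a pair (j, k) with
    j, k < i (X_i -> X_j X_k). The last rule derives the whole text.
    """
    expansions = []
    for rule in rules:
        if isinstance(rule, str):
            expansions.append(rule)
        else:
            j, k = rule
            expansions.append(expansions[j] + expansions[k])
    return expansions[-1]


def lz78_factorize(text):
    """Return LZ78 factors as (ref, char) pairs.

    Factor i (1-based) equals factor `ref` (0 = empty string) extended by
    `char`. If the text ends inside an existing phrase, the final factor is
    (ref, "") and repeats an earlier factor.
    """
    trie = [{}]  # trie[node]: char -> child; node k > 0 is factor k
    factors = []
    node = 0
    for c in text:
        child = trie[node].get(c)
        if child is not None:
            node = child  # keep extending the current match
            continue
        trie[node][c] = len(trie)  # new node id == new factor number
        trie.append({})
        factors.append((node, c))
        node = 0  # the next factor starts again at the trie root
    if node != 0:
        factors.append((node, ""))
    return factors


def decode(factors):
    """Rebuild the text from (ref, char) pairs."""
    phrases = [""]
    for ref, c in factors:
        phrases.append(phrases[ref] + c)
    return "".join(phrases[1:])


# X0=a, X1=b, X2=X0X1 (ab), X3=X2X2 (abab), X4=X2X0 (aba), X5=X3X4
rules = ["a", "b", (0, 1), (2, 2), (2, 0), (3, 4)]
text = expand_slp(rules)
print(text, lz78_factorize(text))  # abababa [(0, 'a'), (0, 'b'), (1, 'b'), (3, 'a')]
```

Here the six-rule SLP yields $m = 4$ factors (a, b, ab, aba). After expansion the parse is linear in $N$. However, an SLP of size $n$ can derive a string of length up to $2^{n-1}$, and that blow-up is the cost the compressed-domain algorithm is designed to avoid.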

arXiv.org e-Print Archive

Crossref

Generalizing input-driven languages: theoretical and practical benefits

Author: Mandrioli Dino
Pradella Matteo
Publication date: 02/05/2017

Regular languages (RL) are the simplest family in Chomsky's hierarchy. Thanks to their simplicity, they enjoy various nice algebraic and logic properties that have been successfully exploited in many application fields. Practically all of their related problems are decidable, so that they support automatic verification algorithms. Also, they can be recognized in real time. Context-free languages (CFL) are another major family well-suited to formalize programming, natural, and many other classes of languages; their increased generative power w.r.t. RL, however, causes the loss of several closure properties and of the decidability of important problems; furthermore, they require complex parsing algorithms. Thus, various subclasses thereof have been defined with different goals, spanning from efficient, deterministic parsing to closure properties, logic characterization and automatic verification techniques. Among CFL subclasses, so-called structured ones, i.e., those where the typical tree structure is visible in the sentences, exhibit many of the algebraic and logic properties of RL, whereas deterministic CFL have been thoroughly exploited in compiler construction and other application fields. After surveying and comparing the main properties of those various language families, we go back to operator precedence languages (OPL), an old family through which R. Floyd pioneered deterministic parsing, and we show that they offer unexpected properties in two fields so far investigated in totally independent ways: they enable parsing parallelization in a more effective way than traditional sequential parsers, and they exhibit the same algebraic and logic properties so far obtained only for less expressive language families.
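Floyd's operator precedence parsing, the technique from which OPL originate, makes every shift/reduce decision from precedence relations (<, =, >) between adjacent terminals, and nonterminals never take part in the comparison. Below is a minimal Python sketch of that classic sequential technique. It uses a toy arithmetic grammar, E -> E+E | E*E | (E) | n. The grammar, the relation table and the function names (`tokenize`, `reduce_handle`, `evaluate`) are our own illustrative assumptions, and the parallel parsing studied in the paper is not shown.

```python
# Floyd precedence relations for the toy grammar E -> E+E | E*E | (E) | n.
# REL[(a, b)] relates the topmost stack terminal a to the lookahead b;
# "#" marks both ends of the input. Missing pairs are syntax errors.
LT, EQ, GT = "<", "=", ">"
TABLE = {
    "+": {"+": GT, "*": LT, "(": LT, ")": GT, "n": LT, "#": GT},
    "*": {"+": GT, "*": GT, "(": LT, ")": GT, "n": LT, "#": GT},
    "(": {"+": LT, "*": LT, "(": LT, ")": EQ, "n": LT},
    ")": {"+": GT, "*": GT, ")": GT, "#": GT},
    "n": {"+": GT, "*": GT, ")": GT, "#": GT},
    "#": {"+": LT, "*": LT, "(": LT, "n": LT},
}
REL = {(a, b): r for a, row in TABLE.items() for b, r in row.items()}


def tokenize(text):
    """Split text into (terminal, value) pairs, ending with the "#" marker."""
    tokens, i = [], 0
    while i < len(text):
        c = text[i]
        if c.isspace():
            i += 1
        elif c in "0123456789":
            j = i
            while j < len(text) and text[j] in "0123456789":
                j += 1
            tokens.append(("n", int(text[i:j])))
            i = j
        elif c in "+*()":
            tokens.append((c, None))
            i += 1
        else:
            raise SyntaxError(f"unexpected character {c!r}")
    return tokens + [("#", None)]


def top_terminal(stack):
    """Nonterminals ("E") are invisible to precedence comparisons."""
    return next(sym for sym, _ in reversed(stack) if sym != "E")


def reduce_handle(handle):
    """Apply the grammar rule whose right-hand side matches the handle."""
    shape = [sym for sym, _ in handle]
    if shape == ["n"]:
        return ("E", handle[0][1])
    if shape == ["E", "+", "E"]:
        return ("E", handle[0][1] + handle[2][1])
    if shape == ["E", "*", "E"]:
        return ("E", handle[0][1] * handle[2][1])
    if shape == ["(", "E", ")"]:
        return ("E", handle[1][1])
    raise SyntaxError(f"no rule matches {shape}")


def evaluate(text):
    tokens, i = tokenize(text), 0
    stack = [("#", None)]
    while True:
        a, b = top_terminal(stack), tokens[i][0]
        if a == "#" and b == "#":
            if len(stack) == 2 and stack[1][0] == "E":
                return stack[1][1]
            raise SyntaxError("incomplete expression")
        rel = REL.get((a, b))
        if rel in (LT, EQ):  # shift the lookahead
            stack.append(tokens[i])
            i += 1
        elif rel == GT:  # pop the handle delimited by < ... >
            handle = []
            while True:
                item = stack.pop()
                handle.append(item)
                if item[0] != "E" and REL.get((top_terminal(stack), item[0])) == LT:
                    break
            if stack[-1][0] == "E":  # a nonterminal left of the handle belongs to it
                handle.append(stack.pop())
            stack.append(reduce_handle(handle[::-1]))
        else:
            raise SyntaxError(f"no precedence relation between {a!r} and {b!r}")


print(evaluate("2+3*4"), evaluate("(1+2)*3"))  # 14 9
```

The entry `+` < `*` makes `*` bind tighter. The `>` entries for `+` followed by `+` and `*` followed by `*` make both operators left-associative.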

arXiv.org e-Print Archive

Archivio istituzionale della ricerca - Politecnico di Milano

A constraint based structure description language for Biosequences

Author: Eidhammer I
Gilbert D
Grindhaug SH
Jonassen J
Ratnayake R
Publication venue: 'Springer Fachmedien Wiesbaden GmbH'
Publication date: 01/01/2001

Brunel University Research Archive

Equational reasoning with context-free families of string diagrams

Author: A Joyal
A Schürr
B Coecke
G Rozenberg
L Dixon
P Sobociński
R Duncan
TW Pratt
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 13/10/2015

String diagrams provide an intuitive language for expressing networks of interacting processes graphically. A discrete representation of string diagrams, called string graphs, allows for mechanised equational reasoning by double-pushout rewriting. However, one often wishes to express not just single equations, but entire families of equations between diagrams of arbitrary size. To do this we define a class of context-free grammars, called B-ESG grammars, that are suitable for defining entire families of string graphs and, crucially, of string graph rewrite rules. We show that the language-membership and match-enumeration problems are decidable for these grammars, and hence that there is an algorithm for rewriting string graphs according to B-ESG rewrite patterns. We also show that it is possible to reason at the level of grammars by providing a simple method for transforming a grammar by string graph rewriting, and by showing admissibility of the induced B-ESG rewrite pattern.

Comment: International Conference on Graph Transformation, ICGT 2015. The final publication is available at Springer via http://dx.doi.org/10.1007/978-3-319-21145-9_

arXiv.org e-Print Archive

Crossref

Transport on percolation clusters with power-law distributed bond strengths: when do blobs matter?

Author: A. Benmizrahi
A. Coniglio
A. Goldberg
A. Skal
B. Halperin
C. Lobb
C. Moukarzel
Cristian F. Moukarzel
G. Deutscher
H.E. Stanley
J. Gordon
J. Hovi
J. Machta
J. Rhyner
J.P. Straley
M. Hernandez
M. Octavio
M. Sahimi
Mikko Alava
O. Stenull
P. Kogut
P. Ledoussal
P.G. Degennes
R. Mulet
S. Feng
T. Lubensky
V. Ambegaokar
Y. Kantor
Publication venue: 'American Physical Society (APS)'
Publication date: 05/12/2002

The simplest transport problem, namely maxflow, is investigated on critical percolation clusters in two and three dimensions, using a combination of extremal statistics arguments and exact numerical computations, for power-law distributed bond strengths of the type $P(\sigma) \sim \sigma^{-\alpha}$. Assuming that only cutting bonds determine the flow, the maxflow critical exponent $\varepsilon$ is found to be $\varepsilon(\alpha) = (d-1)\nu + 1/(1-\alpha)$. This prediction is confirmed with excellent accuracy using large-scale numerical simulations in two and three dimensions. However, in the region of anomalous bond capacity distributions ($0 \leq \alpha \leq 1$), we demonstrate that, due to cluster-structure fluctuations, it is not the cutting bonds but the blobs that set the transport properties of the backbone. This "blob-dominance" avoids a crossover to a regime where structural details, such as the distribution of the number of red (cutting) bonds, would set the scaling. The restored scaling exponents nonetheless still follow the simplistic red-bond estimate. We argue that this is due to the existence of a hierarchy of so-called minimum cut configurations, for which cutting bonds form the lowest level, and whose transport properties all scale in the same way. We point out the relevance of our findings to other scalar transport problems (e.g., conductivity).

Comment: 9 pages + Postscript figures. Revtex4+psfig. Submitted to PR
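The cutting-bond argument rests on a simple fact. Bonds in series can carry no more than their weakest member, while parallel paths, as inside a blob, add up. The Python sketch below illustrates this and is not the authors' simulation code. It draws bond strengths from $P(\sigma) \sim \sigma^{-\alpha}$ by inverse-transform sampling, assuming $0 \leq \alpha < 1$ and an upper cutoff $\sigma \leq 1$ (both our assumptions). It then computes maxflow with the textbook Edmonds-Karp algorithm. The names `sample_capacity` and `max_flow` are hypothetical.

```python
import random
from collections import deque


def sample_capacity(alpha, rng):
    """Draw sigma in (0, 1] with density proportional to sigma**(-alpha).

    Assumes 0 <= alpha < 1 and an upper cutoff of 1 (illustrative choice).
    The CDF is F(s) = s**(1 - alpha), so inverse-transform sampling gives
    s = u**(1 / (1 - alpha)).
    """
    u = 1.0 - rng.random()  # uniform on (0, 1], avoids u == 0
    return u ** (1.0 / (1.0 - alpha))


def max_flow(n, bonds, source, sink):
    """Edmonds-Karp maximum flow; bonds are undirected (u, v, capacity)."""
    cap = [[0.0] * n for _ in range(n)]
    for u, v, c in bonds:
        cap[u][v] += c  # an undirected bond acts as two opposite arcs
        cap[v][u] += c
    flow = 0.0
    while True:
        # Breadth-first search for a shortest augmenting path.
        parent = [-1] * n
        parent[source] = source
        queue = deque([source])
        while queue and parent[sink] == -1:
            u = queue.popleft()
            for v in range(n):
                # small tolerance so floating-point residues count as saturated
                if parent[v] == -1 and cap[u][v] > 1e-12:
                    parent[v] = u
                    queue.append(v)
        if parent[sink] == -1:
            return flow  # no augmenting path left
        # Bottleneck residual capacity along the path.
        bottleneck = float("inf")
        v = sink
        while v != source:
            bottleneck = min(bottleneck, cap[parent[v]][v])
            v = parent[v]
        # Push flow and update residual capacities.
        v = sink
        while v != source:
            u = parent[v]
            cap[u][v] -= bottleneck
            cap[v][u] += bottleneck
            v = u
        flow += bottleneck


rng = random.Random(1)
alpha = 0.5
# Five cutting (red) bonds in series: 0-1-2-3-4-5.
red = [sample_capacity(alpha, rng) for _ in range(5)]
chain = [(i, i + 1, c) for i, c in enumerate(red)]
print(max_flow(6, chain, 0, 5) == min(red))  # True: the weakest red bond limits the flow
```

For a small-$\alpha$ distribution, the weakest of many red bonds can be very small. A blob, whose capacity is a sum over parallel paths, is less sensitive to any single weak bond, and this is why the balance between the two can shift.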

arXiv.org e-Print Archive

Crossref

Aaltodoc Publication Archive

Answering Regular Path Queries on Workflow Provenance

Author: Bao Zhuowei
Davidson Susan B.
Huang Xiaocheng
Milo Tova
Yuan Xiaojie
Publication date: 04/08/2014

This paper proposes a novel approach for efficiently evaluating regular path queries over provenance graphs of workflows that may include recursion. The approach assumes that an execution g of a workflow G is labeled with query-agnostic reachability labels using an existing technique. At query time, given g, G and a regular path query R, the approach decomposes R into a set of subqueries R1, ..., Rk that are safe for G. For each safe subquery Ri, G is rewritten so that, using the reachability labels of nodes in g, it can be decided in constant time whether a path matching Ri exists between two nodes. The results of the safe subqueries are then composed, possibly with some small unsafe remainder, to produce an answer to R. The approach yields an algorithm that significantly reduces the number of subqueries k compared with existing techniques by increasing their size and complexity, and that evaluates each subquery in time bounded by its input and output size. Experimental results demonstrate the benefit of this approach.
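For context, the textbook way to evaluate a regular path query, without the paper's decomposition into safe subqueries or its reachability labels, is a breadth-first search over the product of the graph and an automaton for R. The minimal Python sketch below assumes an epsilon-free NFA written out by hand. The example graph, edge labels, NFA encoding and the function name `rpq_targets` are all hypothetical.

```python
from collections import deque


def rpq_targets(graph, nfa, start, accepting, source):
    """Return every node y such that some path source ->* y spells a word
    accepted by the NFA (breadth-first search over graph x automaton)."""
    seen = {(source, start)}
    queue = deque([(source, start)])
    answers = set()
    while queue:
        node, state = queue.popleft()
        if state in accepting:
            answers.add(node)
        for label, next_node in graph.get(node, []):
            for symbol, next_state in nfa.get(state, []):
                pair = (next_node, next_state)
                if symbol == label and pair not in seen:
                    seen.add(pair)  # each (node, state) pair is visited once
                    queue.append(pair)
    return answers


# Hypothetical labeled execution graph, with a cycle x3 -b-> x1.
graph = {
    "in": [("a", "x1")],
    "x1": [("b", "x2"), ("c", "y")],
    "x2": [("b", "x3")],
    "x3": [("c", "out"), ("b", "x1")],
}
# NFA for the query a b* c: 0 -a-> 1, 1 -b-> 1, 1 -c-> 2, accepting {2}.
nfa = {0: [("a", 1)], 1: [("b", 1), ("c", 2)]}
print(sorted(rpq_targets(graph, nfa, 0, {2}, "in")))  # ['out', 'y']
```

Because each (node, state) pair is explored at most once, cycles such as those produced by recursive workflows cannot cause non-termination. The cost is bounded by the size of the product graph, which is the per-query work that precomputed reachability labels aim to reduce.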

arXiv.org e-Print Archive

Crossref