17,785 research outputs found
Efficient LZ78 factorization of grammar compressed text
We present an efficient algorithm for computing the LZ78 factorization of a
text, where the text is represented as a straight line program (SLP), which is
a context free grammar in the Chomsky normal form that generates a single
string. Given an SLP of size representing a text of length , our
algorithm computes the LZ78 factorization of in time
and space, where is the number of resulting LZ78 factors.
We also show how to improve the algorithm so that the term in the
time and space complexities becomes either , where is the length of the
longest LZ78 factor, or where is a quantity
which depends on the amount of redundancy that the SLP captures with respect to
substrings of of a certain length. Since where
is the alphabet size, the latter is asymptotically at least as fast as
a linear time algorithm which runs on the uncompressed string when is
constant, and can be more efficient when the text is compressible, i.e. when
and are small.Comment: SPIRE 201
Generalizing input-driven languages: theoretical and practical benefits
Regular languages (RL) are the simplest family in Chomsky's hierarchy. Thanks
to their simplicity they enjoy various nice algebraic and logic properties that
have been successfully exploited in many application fields. Practically all of
their related problems are decidable, so that they support automatic
verification algorithms. Also, they can be recognized in real-time.
Context-free languages (CFL) are another major family well-suited to
formalize programming, natural, and many other classes of languages; their
increased generative power w.r.t. RL, however, causes the loss of several
closure properties and of the decidability of important problems; furthermore
they need complex parsing algorithms. Thus, various subclasses thereof have
been defined with different goals, spanning from efficient, deterministic
parsing to closure properties, logic characterization and automatic
verification techniques.
Among CFL subclasses, so-called structured ones, i.e., those where the
typical tree-structure is visible in the sentences, exhibit many of the
algebraic and logic properties of RL, whereas deterministic CFL have been
thoroughly exploited in compiler construction and other application fields.
After surveying and comparing the main properties of those various language
families, we go back to operator precedence languages (OPL), an old family
through which R. Floyd pioneered deterministic parsing, and we show that they
offer unexpected properties in two fields so far investigated in totally
independent ways: they enable parsing parallelization in a more effective way
than traditional sequential parsers, and exhibit the same algebraic and logic
properties so far obtained only for less expressive language families
Equational reasoning with context-free families of string diagrams
String diagrams provide an intuitive language for expressing networks of
interacting processes graphically. A discrete representation of string
diagrams, called string graphs, allows for mechanised equational reasoning by
double-pushout rewriting. However, one often wishes to express not just single
equations, but entire families of equations between diagrams of arbitrary size.
To do this we define a class of context-free grammars, called B-ESG grammars,
that are suitable for defining entire families of string graphs, and crucially,
of string graph rewrite rules. We show that the language-membership and
match-enumeration problems are decidable for these grammars, and hence that
there is an algorithm for rewriting string graphs according to B-ESG rewrite
patterns. We also show that it is possible to reason at the level of grammars
by providing a simple method for transforming a grammar by string graph
rewriting, and showing admissibility of the induced B-ESG rewrite pattern.Comment: International Conference on Graph Transformation, ICGT 2015. The
final publication is available at Springer via
http://dx.doi.org/10.1007/978-3-319-21145-9_
Transport on percolation clusters with power-law distributed bond strengths: when do blobs matter?
The simplest transport problem, namely maxflow, is investigated on critical
percolation clusters in two and three dimensions, using a combination of
extremal statistics arguments and exact numerical computations, for power-law
distributed bond strengths of the type .
Assuming that only cutting bonds determine the flow, the maxflow critical
exponent \ve is found to be \ve(\alpha)=(d-1) \nu + 1/(1-\alpha). This
prediction is confirmed with excellent accuracy using large-scale numerical
simulation in two and three dimensions. However, in the region of anomalous
bond capacity distributions () we demonstrate that, due to
cluster-structure fluctuations, it is not the cutting bonds but the blobs that
set the transport properties of the backbone. This ``blob-dominance'' avoids a
cross-over to a regime where structural details, the distribution of the number
of red or cutting bonds, would set the scaling. The restored scaling exponents
however still follow the simplistic red bond estimate. This is argued to be due
to the existence of a hierarchy of so-called minimum cut-configurations, for
which cutting bonds form the lowest level, and whose transport properties scale
all in the same way. We point out the relevance of our findings to other scalar
transport problems (i.e. conductivity).Comment: 9 pages + Postscript figures. Revtex4+psfig. Submitted to PR
Answering Regular Path Queries on Workflow Provenance
This paper proposes a novel approach for efficiently evaluating regular path
queries over provenance graphs of workflows that may include recursion. The
approach assumes that an execution g of a workflow G is labeled with
query-agnostic reachability labels using an existing technique. At query time,
given g, G and a regular path query R, the approach decomposes R into a set of
subqueries R1, ..., Rk that are safe for G. For each safe subquery Ri, G is
rewritten so that, using the reachability labels of nodes in g, whether or not
there is a path which matches Ri between two nodes can be decided in constant
time. The results of each safe subquery are then composed, possibly with some
small unsafe remainder, to produce an answer to R. The approach results in an
algorithm that significantly reduces the number of subqueries k over existing
techniques by increasing their size and complexity, and that evaluates each
subquery in time bounded by its input and output size. Experimental results
demonstrate the benefit of this approach
- …