17,785 research outputs found

    Efficient LZ78 factorization of grammar compressed text

    Full text link
    We present an efficient algorithm for computing the LZ78 factorization of a text, where the text is represented as a straight line program (SLP), which is a context free grammar in the Chomsky normal form that generates a single string. Given an SLP of size nn representing a text SS of length NN, our algorithm computes the LZ78 factorization of TT in O(nN+mlogN)O(n\sqrt{N}+m\log N) time and O(nN+m)O(n\sqrt{N}+m) space, where mm is the number of resulting LZ78 factors. We also show how to improve the algorithm so that the nNn\sqrt{N} term in the time and space complexities becomes either nLnL, where LL is the length of the longest LZ78 factor, or (Nα)(N - \alpha) where α0\alpha \geq 0 is a quantity which depends on the amount of redundancy that the SLP captures with respect to substrings of SS of a certain length. Since m=O(N/logσN)m = O(N/\log_\sigma N) where σ\sigma is the alphabet size, the latter is asymptotically at least as fast as a linear time algorithm which runs on the uncompressed string when σ\sigma is constant, and can be more efficient when the text is compressible, i.e. when mm and nn are small.Comment: SPIRE 201

    Generalizing input-driven languages: theoretical and practical benefits

    Get PDF
    Regular languages (RL) are the simplest family in Chomsky's hierarchy. Thanks to their simplicity they enjoy various nice algebraic and logic properties that have been successfully exploited in many application fields. Practically all of their related problems are decidable, so that they support automatic verification algorithms. Also, they can be recognized in real-time. Context-free languages (CFL) are another major family well-suited to formalize programming, natural, and many other classes of languages; their increased generative power w.r.t. RL, however, causes the loss of several closure properties and of the decidability of important problems; furthermore they need complex parsing algorithms. Thus, various subclasses thereof have been defined with different goals, spanning from efficient, deterministic parsing to closure properties, logic characterization and automatic verification techniques. Among CFL subclasses, so-called structured ones, i.e., those where the typical tree-structure is visible in the sentences, exhibit many of the algebraic and logic properties of RL, whereas deterministic CFL have been thoroughly exploited in compiler construction and other application fields. After surveying and comparing the main properties of those various language families, we go back to operator precedence languages (OPL), an old family through which R. Floyd pioneered deterministic parsing, and we show that they offer unexpected properties in two fields so far investigated in totally independent ways: they enable parsing parallelization in a more effective way than traditional sequential parsers, and exhibit the same algebraic and logic properties so far obtained only for less expressive language families

    Equational reasoning with context-free families of string diagrams

    Full text link
    String diagrams provide an intuitive language for expressing networks of interacting processes graphically. A discrete representation of string diagrams, called string graphs, allows for mechanised equational reasoning by double-pushout rewriting. However, one often wishes to express not just single equations, but entire families of equations between diagrams of arbitrary size. To do this we define a class of context-free grammars, called B-ESG grammars, that are suitable for defining entire families of string graphs, and crucially, of string graph rewrite rules. We show that the language-membership and match-enumeration problems are decidable for these grammars, and hence that there is an algorithm for rewriting string graphs according to B-ESG rewrite patterns. We also show that it is possible to reason at the level of grammars by providing a simple method for transforming a grammar by string graph rewriting, and showing admissibility of the induced B-ESG rewrite pattern.Comment: International Conference on Graph Transformation, ICGT 2015. The final publication is available at Springer via http://dx.doi.org/10.1007/978-3-319-21145-9_

    Transport on percolation clusters with power-law distributed bond strengths: when do blobs matter?

    Get PDF
    The simplest transport problem, namely maxflow, is investigated on critical percolation clusters in two and three dimensions, using a combination of extremal statistics arguments and exact numerical computations, for power-law distributed bond strengths of the type P(σ)σαP(\sigma) \sim \sigma^{-\alpha}. Assuming that only cutting bonds determine the flow, the maxflow critical exponent \ve is found to be \ve(\alpha)=(d-1) \nu + 1/(1-\alpha). This prediction is confirmed with excellent accuracy using large-scale numerical simulation in two and three dimensions. However, in the region of anomalous bond capacity distributions (0α10\leq \alpha \leq 1) we demonstrate that, due to cluster-structure fluctuations, it is not the cutting bonds but the blobs that set the transport properties of the backbone. This ``blob-dominance'' avoids a cross-over to a regime where structural details, the distribution of the number of red or cutting bonds, would set the scaling. The restored scaling exponents however still follow the simplistic red bond estimate. This is argued to be due to the existence of a hierarchy of so-called minimum cut-configurations, for which cutting bonds form the lowest level, and whose transport properties scale all in the same way. We point out the relevance of our findings to other scalar transport problems (i.e. conductivity).Comment: 9 pages + Postscript figures. Revtex4+psfig. Submitted to PR

    Answering Regular Path Queries on Workflow Provenance

    Full text link
    This paper proposes a novel approach for efficiently evaluating regular path queries over provenance graphs of workflows that may include recursion. The approach assumes that an execution g of a workflow G is labeled with query-agnostic reachability labels using an existing technique. At query time, given g, G and a regular path query R, the approach decomposes R into a set of subqueries R1, ..., Rk that are safe for G. For each safe subquery Ri, G is rewritten so that, using the reachability labels of nodes in g, whether or not there is a path which matches Ri between two nodes can be decided in constant time. The results of each safe subquery are then composed, possibly with some small unsafe remainder, to produce an answer to R. The approach results in an algorithm that significantly reduces the number of subqueries k over existing techniques by increasing their size and complexity, and that evaluates each subquery in time bounded by its input and output size. Experimental results demonstrate the benefit of this approach
    corecore