5,978 research outputs found

    Finite Automata for the Sub- and Superword Closure of CFLs: Descriptional and Computational Complexity

    Full text link
    We answer two open questions by (Gruber, Holzer, Kutrib, 2009) on the state-complexity of representing sub- or superword closures of context-free grammars (CFGs): (1) We prove a (tight) upper bound of 2O(n)2^{\mathcal{O}(n)} on the size of nondeterministic finite automata (NFAs) representing the subword closure of a CFG of size nn. (2) We present a family of CFGs for which the minimal deterministic finite automata representing their subword closure matches the upper-bound of 22O(n)2^{2^{\mathcal{O}(n)}} following from (1). Furthermore, we prove that the inequivalence problem for NFAs representing sub- or superword-closed languages is only NP-complete as opposed to PSPACE-complete for general NFAs. Finally, we extend our results into an approximation method to attack inequivalence problems for CFGs

    Criticality in Formal Languages and Statistical Physics

    Full text link
    We show that the mutual information between two symbols, as a function of the number of symbols between the two, decays exponentially in any probabilistic regular grammar, but can decay like a power law for a context-free grammar. This result about formal languages is closely related to a well-known result in classical statistical mechanics that there are no phase transitions in dimensions fewer than two. It is also related to the emergence of power-law correlations in turbulence and cosmological inflation through recursive generative processes. We elucidate these physics connections and comment on potential applications of our results to machine learning tasks like training artificial recurrent neural networks. Along the way, we introduce a useful quantity which we dub the rational mutual information and discuss generalizations of our claims involving more complicated Bayesian networks.Comment: Replaced to match final published version. Discussion improved, references adde

    Synthesizing Program Input Grammars

    Full text link
    We present an algorithm for synthesizing a context-free grammar encoding the language of valid program inputs from a set of input examples and blackbox access to the program. Our algorithm addresses shortcomings of existing grammar inference algorithms, which both severely overgeneralize and are prohibitively slow. Our implementation, GLADE, leverages the grammar synthesized by our algorithm to fuzz test programs with structured inputs. We show that GLADE substantially increases the incremental coverage on valid inputs compared to two baseline fuzzers

    Controlled non uniform random generation of decomposable structures

    Get PDF
    Consider a class of decomposable combinatorial structures, using different types of atoms \Atoms = \{\At_1,\ldots ,\At_{|{\Atoms}|}\}. We address the random generation of such structures with respect to a size nn and a targeted distribution in kk of its \emph{distinguished} atoms. We consider two variations on this problem. In the first alternative, the targeted distribution is given by kk real numbers \TargFreq_1, \ldots, \TargFreq_k such that 0 < \TargFreq_i < 1 for all ii and \TargFreq_1+\cdots+\TargFreq_k \leq 1. We aim to generate random structures among the whole set of structures of a given size nn, in such a way that the {\em expected} frequency of any distinguished atom \At_i equals \TargFreq_i. We address this problem by weighting the atoms with a kk-tuple \Weights of real-valued weights, inducing a weighted distribution over the set of structures of size nn. We first adapt the classical recursive random generation scheme into an algorithm taking \bigO{n^{1+o(1)}+mn\log{n}} arithmetic operations to draw mm structures from the \Weights-weighted distribution. Secondly, we address the analytical computation of weights such that the targeted frequencies are achieved asymptotically, i. e. for large values of nn. We derive systems of functional equations whose resolution gives an explicit relationship between \Weights and \TargFreq_1, \ldots, \TargFreq_k. Lastly, we give an algorithm in \bigO{k n^4} for the inverse problem, {\it i.e.} computing the frequencies associated with a given kk-tuple \Weights of weights, and an optimized version in \bigO{k n^2} in the case of context-free languages. This allows for a heuristic resolution of the weights/frequencies relationship suitable for complex specifications. In the second alternative, the targeted distribution is given by a kk natural numbers n1,…,nkn_1, \ldots, n_k such that n1+⋯+nk+r=nn_1+\cdots+n_k+r=n where r≥0r \geq 0 is the number of undistinguished atoms. The structures must be generated uniformly among the set of structures of size nn that contain {\em exactly} nin_i atoms \At_i (1≤i≤k1 \leq i \leq k). We give a \bigO{r^2\prod_{i=1}^k n_i^2 +m n k \log n} algorithm for generating mm structures, which simplifies into a \bigO{r\prod_{i=1}^k n_i +m n} for regular specifications
    • …
    corecore