
    Descriptional Succinctness of Some Grammatical Formalisms for Natural Language

    We investigate the problem of describing languages compactly in different grammatical formalisms for natural languages. In particular, the problem is studied from the point of view of some newly developed natural language formalisms like linear control grammars (LCGs) and tree adjoining grammars (TAGs); these formalisms not only generate non-context-free languages that capture a wide variety of syntactic phenomena found in natural language, but also have computationally efficient polynomial-time recognition algorithms. We prove that the formalisms enjoy the property of unbounded succinctness over the family of context-free grammars, i.e., they are, in general, able to provide more compact representations of natural languages than standard context-free grammars.

    Inference of Shape Graphs for Graph Databases

    We investigate the problem of constructing a shape graph that describes the structure of a given graph database. We employ the framework of grammatical inference, where the objective is to find an inference algorithm that is both sound, i.e., always producing a schema that validates the input graph, and complete, i.e., able to produce any schema, within a given class of schemas, provided that a sufficiently informative input graph is presented. We identify a number of fundamental limitations that preclude feasible inference. We present inference algorithms based on natural approaches that make it possible to infer schemas that we argue to be of practical importance.

    Upper Bounds on Recognition of a Hierarchy of Non-Context-Free Languages

    Control grammars, a generalization of context-free grammars recently introduced for use in natural language recognition, are investigated. In particular, it is shown that a hierarchy of non-context-free languages, called the Control Language Hierarchy (CLH), generated by control grammars can be recognized in polynomial time. Previously, the best known upper bound was exponential time. It is also shown that CLH is in NC(2), the class of languages recognizable by uniform boolean circuits of polynomial size and O(log^2 n) depth.

    On the equivalence, containment, and covering problems for the regular and context-free languages

    We consider the complexity of the equivalence and containment problems for regular expressions and context-free grammars, concentrating on the relationship between complexity and various language properties. Finiteness and boundedness of languages are shown to play important roles in the complexity of these problems. An encoding into grammars of Turing machine computations exponential in the size of the grammar is used to prove several exponential lower bounds. These lower bounds include exponential time for testing equivalence of grammars generating finite sets, and exponential space for testing equivalence of non-self-embedding grammars. Several problems that might be complex because of this encoding are shown to simplify for linear grammars. Other problems considered include grammatical covering and structural equivalence for right-linear, linear, and arbitrary grammars.

    Learning probability distributions generated by finite-state machines

    We review methods for inference of probability distributions generated by probabilistic automata and related models for sequence generation. We focus on methods that can be proved to learn in the inference-in-the-limit and PAC formal models. The methods we review are state merging and state splitting methods for probabilistic deterministic automata and the recently developed spectral method for nondeterministic probabilistic automata. In both cases, we derive them from a high-level algorithm described in terms of the Hankel matrix of the distribution to be learned, given as an oracle, and then describe how to adapt that algorithm to account for the error introduced by a finite sample. Peer reviewed. Postprint (author's final draft).