    CHR Grammars

    A grammar formalism based upon CHR is proposed, analogous to the way Definite Clause Grammars are defined and implemented on top of Prolog. These grammars execute as robust bottom-up parsers with an inherent treatment of ambiguity and a high flexibility to model various linguistic phenomena. The formalism extends previous logic-programming-based grammars with a form of context-sensitive rules and the possibility to include extra-grammatical hypotheses in both the head and body of grammar rules. Among the applications are straightforward implementations of Assumption Grammars and abduction under integrity constraints for language analysis. CHR grammars appear as a powerful tool for the specification and implementation of language processors and may be proposed as a new standard for bottom-up grammars in logic programming. To appear in Theory and Practice of Logic Programming (TPLP), 2005. (36 pp.)
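    CHR grammars themselves live in Prolog, so no direct sample fits here; purely as a loose illustration of the bottom-up, all-analyses-in-one-chart style of parsing the abstract describes, below is a minimal CYK-style recognizer in Python. The grammar and all names are invented for the example; this is not the authors' CHR implementation.

```python
# Minimal bottom-up (CYK-style) recognizer. CHR Grammars run as constraint
# rules in Prolog; this sketch only illustrates the bottom-up style in which
# all analyses of a span coexist in one chart (the "inherent treatment of
# ambiguity" mentioned above). Grammar is a toy example, not from the paper.
from itertools import product

# CNF rules: (rhs1, rhs2) -> possible lhs symbols; word -> lexical categories.
BINARY = {("NP", "VP"): ["S"], ("Det", "N"): ["NP"], ("V", "NP"): ["VP"]}
LEXICON = {"the": ["Det"], "dog": ["N"], "cat": ["N"], "saw": ["V"]}

def parse(words):
    n = len(words)
    # chart[i][j] = set of nonterminals deriving words[i:j]
    chart = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):
        chart[i][i + 1].update(LEXICON.get(w, []))
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):  # every split point: ambiguity is kept
                for b, c in product(chart[i][k], chart[k][j]):
                    chart[i][j].update(BINARY.get((b, c), []))
    return "S" in chart[0][n]

print(parse("the dog saw the cat".split()))  # True
```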

    Tracking Down the Origins of Ambiguity in Context-Free Grammars

    Context-free grammars are widely used but still hindered by ambiguity. This stresses the need for detailed detection methods that point out the sources of ambiguity in a grammar. In this paper we show how the approximative Noncanonical Unambiguity Test by Schmitz can be extended to conservatively identify production rules that do not contribute to the ambiguity of a grammar. We prove the correctness of our approach and consider its practical applicability.
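    Schmitz's test works on an approximative position automaton rather than by enumeration; purely to illustrate the idea of conservatively excluding rules, the sketch below brute-forces parse trees up to a depth bound and reports the productions that never occur in two distinct trees of the same sentence. The grammar and the bound are invented for the example; a real implementation would not enumerate trees.

```python
# Brute-force stand-in for the paper's idea: within a search bound, flag the
# productions that participate in two distinct parse trees of one sentence;
# the remaining rules are (within that bound) not implicated in ambiguity.
GRAMMAR = {                       # toy grammar, invented for the example
    "S": [["E"], ["b"]],          # S -> b can never be part of an ambiguity
    "E": [["E", "+", "E"], ["E", "*", "E"], ["a"]],
}

def trees(sym, depth):
    """Yield (sentence_tuple, tree) pairs derivable from sym."""
    if sym not in GRAMMAR:                  # terminal symbol
        yield (sym,), sym
        return
    if depth == 0:
        return
    for i, rhs in enumerate(GRAMMAR[sym]):
        subs = [list(trees(s, depth - 1)) for s in rhs]
        def combine(parts, sent=(), kids=()):
            if not parts:
                yield sent, (sym, i, kids)  # node records the rule used
                return
            for s, t in parts[0]:
                yield from combine(parts[1:], sent + s, kids + (t,))
        yield from combine(subs)

def rules_in(tree, acc):
    if isinstance(tree, tuple):             # internal node (sym, rule, kids)
        sym, i, kids = tree
        acc.add((sym, i))
        for k in kids:
            rules_in(k, acc)

by_sentence = {}
for sent, tree in trees("S", 5):
    by_sentence.setdefault(sent, []).append(tree)

implicated = set()
for ts in by_sentence.values():
    if len(ts) > 1:                         # an ambiguous sentence
        for t in ts:
            rules_in(t, implicated)

all_rules = {(a, i) for a in GRAMMAR for i in range(len(GRAMMAR[a]))}
print("not implicated (within bound):", all_rules - implicated)  # {('S', 1)}
```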

    The Usability of Ambiguity Detection Methods for Context-Free Grammars

    One way of verifying a grammar is the detection of ambiguities. Ambiguities are not always unwanted, but they can only be controlled if their sources are known. Unfortunately, the ambiguity problem for context-free grammars is undecidable in the general case. Various ambiguity detection methods (ADMs) exist, but they can never be perfect. In this paper we explore three ADMs to test whether they can still be of any practical value: the derivation generator AMBER, the LR(k) test and the Noncanonical Unambiguity test. We benchmarked their implementations on a collection of ambiguous and unambiguous grammars of different sizes and compared the results.
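    For intuition, here is a toy derivation generator in the spirit of AMBER: it enumerates leftmost derivations breadth-first and stops at the first sentence reachable by two distinct derivations. This is a deliberately naive sketch with an invented grammar, not AMBER's actual algorithm; note that such a search can only ever prove ambiguity, never unambiguity, which matches the undecidability caveat above.

```python
# AMBER-style ambiguity search (sketch): a sentence with two distinct
# leftmost derivations is an ambiguity witness, since leftmost derivations
# correspond one-to-one with parse trees.
from collections import deque

GRAMMAR = {"E": [["E", "+", "E"], ["a"]]}   # toy ambiguous grammar

def first_ambiguous(start, max_len=8):
    seen = {}                                   # sentence -> one derivation
    queue = deque([((start,), ())])             # (sentential form, rule trace)
    while queue:
        form, trace = queue.popleft()
        nts = [i for i, s in enumerate(form) if s in GRAMMAR]
        if not nts:                             # all terminals: a sentence
            if form in seen and seen[form] != trace:
                return form, seen[form], trace  # two derivations found
            seen.setdefault(form, trace)
            continue
        i = nts[0]                              # expand leftmost nonterminal
        for r, rhs in enumerate(GRAMMAR[form[i]]):
            new = form[:i] + tuple(rhs) + form[i + 1:]
            if len(new) <= max_len:             # length bound keeps it finite
                queue.append((new, trace + ((form[i], r),)))
    return None                                 # no witness within the bound

# Finds ('a', '+', 'a', '+', 'a') with its two leftmost derivations.
print(first_ambiguous("E"))
```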

    Almost Every Simply Typed Lambda-Term Has a Long Beta-Reduction Sequence

    It is well known that the length of a beta-reduction sequence of a simply typed lambda-term of order k can be huge; it is as large as k-fold exponential in the size of the lambda-term in the worst case. We consider the following relevant question about quantitative properties, instead of the worst case: how many simply typed lambda-terms have very long reduction sequences? We provide a partial answer to this question by showing that asymptotically almost every simply typed lambda-term of order k has a reduction sequence as long as (k-1)-fold exponential in the term size, under the assumption that the arity of functions and the number of variables that may occur in every subterm are bounded above by a constant. To prove it, we have extended the infinite monkey theorem for strings to a parametrized one for regular tree languages, which may be of independent interest. The work has been motivated by quantitative analysis of the complexity of higher-order model checking.
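    A standard worked example (folklore, not taken from the paper itself) shows how such towers arise from terms of growing order: iterated application of the Church numeral 2.

```latex
% With Church numerals, application computes exponentiation:
\[
  \underline{2} \;=\; \lambda f.\,\lambda x.\,f\,(f\,x),
  \qquad
  \underline{n}\,\underline{m} \;\to_{\beta}^{*}\; \underline{m^{n}} .
\]
% Hence the left-associated application of n copies of 2 -- a term of size
% O(n) whose type order grows with n -- normalizes to a tower of height n:
\[
  \underbrace{\underline{2}\,\underline{2}\,\cdots\,\underline{2}}_{n\text{ copies}}
  \;\to_{\beta}^{*}\;
  \underline{2^{2^{\cdot^{\cdot^{\cdot^{2}}}}}}
  \quad\text{(tower of height } n\text{)}.
\]
% Since one beta-step can at most square a term's size, any reduction
% sequence reaching this normal form has length at least a tower of height
% n - O(1), matching the k-fold exponential worst case cited above.
```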

    Unsupervised learning of probabilistic grammars

    Probabilistic grammars define joint probability distributions over sentences and their grammatical structures. They have been used in many areas, such as natural language processing, bioinformatics and pattern recognition, mainly for the purpose of deriving grammatical structures from data (sentences). Unsupervised approaches to learning probabilistic grammars induce a grammar from unannotated sentences, which eliminates the need for manual annotation of grammatical structures, a process that can be laborious and error-prone. In this thesis we study unsupervised learning of probabilistic context-free grammars and probabilistic dependency grammars, both of which are expressive enough for many real-world languages but remain tractable in inference. We investigate three different approaches.

    The first approach is a structure-search approach for learning probabilistic context-free grammars. It acquires the rules of an unknown probabilistic context-free grammar through iterative coherent biclustering of the bigrams in the training corpus. A greedy procedure adds rules from biclusters such that each set of rules added to the grammar results in the largest increase in the posterior of the grammar given the training corpus. Our experiments on several benchmark datasets show that this approach is competitive with existing methods for unsupervised learning of context-free grammars.

    The second approach is a parameter-learning approach for learning natural language grammars, based on the idea of unambiguity regularization. We make the observation that natural language is remarkably unambiguous in the sense that, although each sentence has a large number of possible parses, only a few of them are syntactically valid. We incorporate this prior information into parameter learning by means of posterior regularization. The resulting algorithm family contains classic EM and Viterbi EM, as well as a novel softmax-EM algorithm that can be implemented with a simple and efficient extension to classic EM, as sketched below. Our experiments show that unambiguity regularization improves natural language grammar learning, and when combined with other techniques our approach achieves state-of-the-art grammar learning results.

    The third approach is grammar learning with a curriculum. A curriculum is a means of presenting training samples in a meaningful order. We introduce the incremental construction hypothesis, which explains the benefits of a curriculum in learning grammars and offers useful insights into the design of curricula as well as learning algorithms. We present results of experiments with (a) carefully crafted synthetic data that provide support for our hypothesis and (b) a natural language corpus that demonstrates the utility of curricula in unsupervised learning of real-world probabilistic grammars.
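    Returning to the second approach: the EM / softmax-EM / Viterbi-EM family can be pictured as an E-step whose posterior over candidate parses is sharpened before expected counts are taken. The exponent form 1/(1 - sigma) and all names below are assumptions made for illustration, not code or notation from the thesis.

```python
# Sketch of an unambiguity-regularized E-step (assumed form): sigma = 0
# recovers classic EM, and sigma -> 1 approaches Viterbi EM, which puts all
# mass on the single best parse.
import numpy as np

def regularized_posterior(log_probs, sigma):
    """log_probs: unnormalized log P(parse, sentence) for candidate parses."""
    assert 0.0 <= sigma < 1.0
    sharpened = np.asarray(log_probs) / (1.0 - sigma)  # sharpen the posterior
    sharpened -= sharpened.max()                       # numerical stability
    post = np.exp(sharpened)
    return post / post.sum()

log_p = np.log([0.5, 0.3, 0.2])           # toy posteriors over three parses
print(regularized_posterior(log_p, 0.0))  # classic EM: [0.5, 0.3, 0.2]
print(regularized_posterior(log_p, 0.9))  # near-Viterbi: mass on best parse
```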

    Weakly-Unambiguous Parikh Automata and Their Link to Holonomic Series

    We investigate the connection between properties of formal languages and properties of their generating series, with a focus on the class of holonomic power series. We first prove a strong version of a conjecture by Castiglione and Massazza: weakly-unambiguous Parikh automata are equivalent to unambiguous two-way reversal-bounded counter machines, and their multivariate generating series are holonomic. We then show that the converse is not true: we construct a language whose generating series is algebraic (thus holonomic), but which is inherently weakly-ambiguous as a Parikh automaton language. Finally, we prove an effective decidability result for the inclusion problem for weakly-unambiguous Parikh automata, and provide an upper bound on its complexity.
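    For reference, the standard definitions linking the two sides of the abstract (textbook notation, not necessarily the paper's own):

```latex
% The multivariate generating series of a language L over {a_1, ..., a_d}
% counts words by their Parikh image (|w|_{a_i} = occurrences of a_i in w):
\[
  S_L(x_1,\dots,x_d) \;=\; \sum_{w \in L} x_1^{|w|_{a_1}} \cdots x_d^{|w|_{a_d}} .
\]
% S_L is holonomic (D-finite) when, in each variable, it satisfies a linear
% differential equation with polynomial coefficients:
\[
  \sum_{j=0}^{r_i} p_{i,j}(x_1,\dots,x_d)\,
  \frac{\partial^{j} S_L}{\partial x_i^{j}} \;=\; 0,
  \qquad 1 \le i \le d .
\]
```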