15,892 research outputs found

    On the Relation between Context-Free Grammars and Parsing Expression Grammars

    Full text link
    Context-Free Grammars (CFGs) and Parsing Expression Grammars (PEGs) have several similarities and a few differences in both their syntax and semantics, but they are usually presented through formalisms that hinder a proper comparison. In this paper we present a new formalism for CFGs that highlights the similarities and differences between them. The new formalism borrows from PEGs the use of parsing expressions and the recognition-based semantics. We show how one way of removing non-determinism from this formalism yields a formalism with the semantics of PEGs. We also prove, based on these new formalisms, how LL(1) grammars define the same language whether interpreted as CFGs or as PEGs, and also show how strong-LL(k), right-linear, and LL-regular grammars have simple language-preserving translations from CFGs to PEGs

    Practical experiments with regular approximation of context-free languages

    Get PDF
    Several methods are discussed that construct a finite automaton given a context-free grammar, including both methods that lead to subsets and those that lead to supersets of the original context-free language. Some of these methods of regular approximation are new, and some others are presented here in a more refined form with respect to existing literature. Practical experiments with the different methods of regular approximation are performed for spoken-language input: hypotheses from a speech recognizer are filtered through a finite automaton.Comment: 28 pages. To appear in Computational Linguistics 26(1), March 200

    Multiple Context-Free Tree Grammars: Lexicalization and Characterization

    Get PDF
    Multiple (simple) context-free tree grammars are investigated, where "simple" means "linear and nondeleting". Every multiple context-free tree grammar that is finitely ambiguous can be lexicalized; i.e., it can be transformed into an equivalent one (generating the same tree language) in which each rule of the grammar contains a lexical symbol. Due to this transformation, the rank of the nonterminals increases at most by 1, and the multiplicity (or fan-out) of the grammar increases at most by the maximal rank of the lexical symbols; in particular, the multiplicity does not increase when all lexical symbols have rank 0. Multiple context-free tree grammars have the same tree generating power as multi-component tree adjoining grammars (provided the latter can use a root-marker). Moreover, every multi-component tree adjoining grammar that is finitely ambiguous can be lexicalized. Multiple context-free tree grammars have the same string generating power as multiple context-free (string) grammars and polynomial time parsing algorithms. A tree language can be generated by a multiple context-free tree grammar if and only if it is the image of a regular tree language under a deterministic finite-copying macro tree transducer. Multiple context-free tree grammars can be used as a synchronous translation device.Comment: 78 pages, 13 figure

    Tree transducers, L systems, and two-way machines

    Get PDF
    A relationship between parallel rewriting systems and two-way machines is investigated. Restrictions on the “copying power” of these devices endow them with rich structuring and give insight into the issues of determinism, parallelism, and copying. Among the parallel rewriting systems considered are the top-down tree transducer; the generalized syntax-directed translation scheme and the ETOL system, and among the two-way machines are the tree-walking automaton, the two-way finite-state transducer, and (generalizations of) the one-way checking stack automaton. The. relationship of these devices to macro grammars is also considered. An effort is made .to provide a systematic survey of a number of existing results

    On external presentations of infinite graphs

    Get PDF
    The vertices of a finite state system are usually a subset of the natural numbers. Most algorithms relative to these systems only use this fact to select vertices. For infinite state systems, however, the situation is different: in particular, for such systems having a finite description, each state of the system is a configuration of some machine. Then most algorithmic approaches rely on the structure of these configurations. Such characterisations are said internal. In order to apply algorithms detecting a structural property (like identifying connected components) one may have first to transform the system in order to fit the description needed for the algorithm. The problem of internal characterisation is that it hides structural properties, and each solution becomes ad hoc relatively to the form of the configurations. On the contrary, external characterisations avoid explicit naming of the vertices. Such characterisation are mostly defined via graph transformations. In this paper we present two kind of external characterisations: deterministic graph rewriting, which in turn characterise regular graphs, deterministic context-free languages, and rational graphs. Inverse substitution from a generator (like the complete binary tree) provides characterisation for prefix-recognizable graphs, the Caucal Hierarchy and rational graphs. We illustrate how these characterisation provide an efficient tool for the representation of infinite state systems

    Stochastic Attribute-Value Grammars

    Full text link
    Probabilistic analogues of regular and context-free grammars are well-known in computational linguistics, and currently the subject of intensive research. To date, however, no satisfactory probabilistic analogue of attribute-value grammars has been proposed: previous attempts have failed to define a correct parameter-estimation algorithm. In the present paper, I define stochastic attribute-value grammars and give a correct algorithm for estimating their parameters. The estimation algorithm is adapted from Della Pietra, Della Pietra, and Lafferty (1995). To estimate model parameters, it is necessary to compute the expectations of certain functions under random fields. In the application discussed by Della Pietra, Della Pietra, and Lafferty (representing English orthographic constraints), Gibbs sampling can be used to estimate the needed expectations. The fact that attribute-value grammars generate constrained languages makes Gibbs sampling inapplicable, but I show how a variant of Gibbs sampling, the Metropolis-Hastings algorithm, can be used instead.Comment: 23 pages, 21 Postscript figures, uses rotate.st

    Developing and applying heterogeneous phylogenetic models with XRate

    Get PDF
    Modeling sequence evolution on phylogenetic trees is a useful technique in computational biology. Especially powerful are models which take account of the heterogeneous nature of sequence evolution according to the "grammar" of the encoded gene features. However, beyond a modest level of model complexity, manual coding of models becomes prohibitively labor-intensive. We demonstrate, via a set of case studies, the new built-in model-prototyping capabilities of XRate (macros and Scheme extensions). These features allow rapid implementation of phylogenetic models which would have previously been far more labor-intensive. XRate's new capabilities for lineage-specific models, ancestral sequence reconstruction, and improved annotation output are also discussed. XRate's flexible model-specification capabilities and computational efficiency make it well-suited to developing and prototyping phylogenetic grammar models. XRate is available as part of the DART software package: http://biowiki.org/DART .Comment: 34 pages, 3 figures, glossary of XRate model terminolog
    • …
    corecore