583 research outputs found

    Probabilistic parsing

    Get PDF
    Postprin

    Multiple Context-Free Tree Grammars: Lexicalization and Characterization

    Get PDF
    Multiple (simple) context-free tree grammars are investigated, where "simple" means "linear and nondeleting". Every multiple context-free tree grammar that is finitely ambiguous can be lexicalized; i.e., it can be transformed into an equivalent one (generating the same tree language) in which each rule of the grammar contains a lexical symbol. Due to this transformation, the rank of the nonterminals increases at most by 1, and the multiplicity (or fan-out) of the grammar increases at most by the maximal rank of the lexical symbols; in particular, the multiplicity does not increase when all lexical symbols have rank 0. Multiple context-free tree grammars have the same tree generating power as multi-component tree adjoining grammars (provided the latter can use a root-marker). Moreover, every multi-component tree adjoining grammar that is finitely ambiguous can be lexicalized. Multiple context-free tree grammars have the same string generating power as multiple context-free (string) grammars and polynomial time parsing algorithms. A tree language can be generated by a multiple context-free tree grammar if and only if it is the image of a regular tree language under a deterministic finite-copying macro tree transducer. Multiple context-free tree grammars can be used as a synchronous translation device.Comment: 78 pages, 13 figure

    Probabilistic Constraint Logic Programming

    Full text link
    This paper addresses two central problems for probabilistic processing models: parameter estimation from incomplete data and efficient retrieval of most probable analyses. These questions have been answered satisfactorily only for probabilistic regular and context-free models. We address these problems for a more expressive probabilistic constraint logic programming model. We present a log-linear probability model for probabilistic constraint logic programming. On top of this model we define an algorithm to estimate the parameters and to select the properties of log-linear models from incomplete data. This algorithm is an extension of the improved iterative scaling algorithm of Della-Pietra, Della-Pietra, and Lafferty (1995). Our algorithm applies to log-linear models in general and is accompanied with suitable approximation methods when applied to large data spaces. Furthermore, we present an approach for searching for most probable analyses of the probabilistic constraint logic programming model. This method can be applied to the ambiguity resolution problem in natural language processing applications.Comment: 35 pages, uses sfbart.cl

    Exploring the N-th Dimension of Language

    Get PDF
    This paper is aimed at exploring the hidden fundamental\ud computational property of natural language that has been so elusive that it has made all attempts to characterize its real computational property ultimately fail. Earlier natural language was thought to be context-free. However, it was gradually realized that this does not hold much water given that a range of natural language phenomena have been found as being of non-context-free character that they have almost scuttled plans to brand natural language contextfree. So it has been suggested that natural language is mildly context-sensitive and to some extent context-free. In all, it seems that the issue over the exact computational property has not yet been solved. Against this background it will be proposed that this exact computational property of natural language is perhaps the N-th dimension of language, if what we mean by dimension is\ud nothing but universal (computational) property of natural language

    An automata characterisation for multiple context-free languages

    Full text link
    We introduce tree stack automata as a new class of automata with storage and identify a restricted form of tree stack automata that recognises exactly the multiple context-free languages.Comment: This is an extended version of a paper with the same title accepted at the 20th International Conference on Developments in Language Theory (DLT 2016

    Parsing With Lexicalized Tree Adjoining Grammar

    Get PDF
    Most current linguistic theories give lexical accounts of several phenomena that used to be considered purely syntactic. The information put in the lexicon is thereby increased in both amount and complexity: see, for example, lexical rules in LFG (Kaplan and Bresnan, 1983), GPSG (Gazdar, Klein, Pullum and Sag, 1985), HPSG (Pollard and Sag, 1987), Combinatory Categorial Grammars (Steedman, 1987), Karttunen\u27s version of Categorial Grammar (Karttunen 1986, 1988), some versions of GB theory (Chomsky 1981), and Lexicon-Grammars (Gross 1984). We would like to take into account this fact while defining a formalism. We therefore explore the view that syntactical rules are not separated from lexical items. We say that a grammar is lexicalized (Schabes, AbeilK and Joshi, 1988) if it consists of: (1) a finite set of structures each associated with lexical items; each lexical item will be called the anchor of the corresponding structure; the structures define the domain of locality over which constraints are specified; (2) an operation or operations for composing the structures. The notion of anchor is closely related to the word associated with a functor-argument category in Categorial Grammars. Categorial Grammar (as used for example by Steedman, 1987) are \u27lexicalized\u27 according to our definition since each basic category has a lexical item associated with it

    Descriptional Succinctness of Some Grammatical Formalisms for Natrual Language

    Get PDF
    We investigate the problem of describing languages compactly in different grammatical formalisms for natural languages. In particular, the problem is studied from the point of view of some newly developed natural language formalisms like linear control grammars (LCGs) and tree adjoining grammars (TAGs); these formalisms not only generate non-context-free languages that capture a wide variety of syntactic phenomena found in natural language, but also have computationally efficient polynomial time recognition algorithms. We prove that the formalisms enjoy the property of unbounded succinctness over the family of context-grammars, i.e. they are, in general, able to provide more compact representations of natural languages as compared to standard context-free grammars

    Two characterisation results of multiple context-free grammars and their application to parsing

    Get PDF
    In the first part of this thesis, a Chomsky-SchĂźtzenberger characterisation and an automaton characterisation of multiple context-free grammars are proved. Furthermore, a framework for approximation of automata with storage is described. The second part develops each of the three theoretical results into a parsing algorithm

    Syntactic phrase-based statistical machine translation

    Get PDF
    Phrase-based statistical machine translation (PBSMT) systems represent the dominant approach in MT today. However, unlike systems in other paradigms, it has proven difficult to date to incorporate syntactic knowledge in order to improve translation quality. This paper improves on recent research which uses 'syntactified' target language phrases, by incorporating supertags as constraints to better resolve parse tree fragments. In addition, we do not impose any sentence-length limit, and using a log-linear decoder, we outperform a state-of-the-art PBSMT system by over 1.3 BLEU points (or 3.51% relative) on the NIST 2003 Arabic-English test corpus
    • …
    corecore