711 research outputs found

    A derivational model of discontinuous parsing

    The notion of latent-variable probabilistic context-free derivation of syntactic structures is enhanced to allow heads and unrestricted discontinuities. The chosen formalization covers both constituent parsing and dependency parsing. The derivational model is accompanied by an equivalent probabilistic automaton model. By the new framework, one obtains a probability distribution over the space of all discontinuous parses. This lends itself to intrinsic evaluation in terms of perplexity, as shown in experiments.

    A derivational model of discontinuous parsing

    The notion of latent-variable probabilistic context-free derivation of syntactic structures is enhanced to allow heads and unrestricted discontinuities. The chosen formalization covers both constituency parsing and dependency parsing. By the new framework, one obtains a probability distribution over the space of all discontinuous parses. This lends itself to intrinsic evaluation in terms of cross-entropy. The derivational model is accompanied by an equivalent automaton model, which can be used for deterministic parsing.

    Statistical parsing of morphologically rich languages (SPMRL): what, how and whither

    The term Morphologically Rich Languages (MRLs) refers to languages in which significant information concerning syntactic units and relations is expressed at word-level. There is ample evidence that the application of readily available statistical parsing models to such languages is susceptible to serious performance degradation. The first workshop on statistical parsing of MRLs hosts a variety of contributions which show that despite language-specific idiosyncrasies, the problems associated with parsing MRLs cut across languages and parsing frameworks. In this paper we review the current state of affairs with respect to parsing MRLs and point out central challenges. We synthesize the contributions of researchers working on parsing Arabic, Basque, French, German, Hebrew, Hindi and Korean to point out shared solutions across languages. The overarching analysis suggests itself as a source of directions for future investigations.

    A Lexicalized Tree Adjoining Grammar for Thai

    PACLIC 23 / City University of Hong Kong / 3-5 December 200

    Morphologically complex words in L1 and L2 processing: Evidence from masked priming experiments in English

    This paper reports results from masked priming experiments investigating regular past-tense forms and deadjectival nominalizations with -ness and -ity in adult native (L1) speakers of English and in different groups of advanced adult second language (L2) learners of English. While the L1 group showed efficient priming for both inflected and derived word forms, the L2 learners demonstrated repetition-priming effects (like the L1 group), but no priming for inflected and reduced priming for derived word forms. We argue that this striking contrast between L1 and L2 processing supports the view that adult L2 learners rely more on lexical storage and less on combinatorial processing of morphologically complex words than native speakers.

    Parsing/theorem-proving for logical grammar CatLog3

    CatLog3 is a 7000-line Prolog parser/theorem-prover for logical categorial grammar. In such logical categorial grammar, syntax is universal and grammar is reduced to logic: an expression is grammatical if and only if an associated logical statement is a theorem of a fixed calculus. Since the syntactic component is invariant, being the logic of the calculus, logical categorial grammar is purely lexicalist and a particular language model is defined by just a lexical dictionary. The foundational logic of continuity was established by Lambek (Am Math Mon 65:154–170, 1958) (the Lambek calculus) while a corresponding extension including also logic of discontinuity was established by Morrill and Valentín (Linguist Anal 36(1–4):167–192, 2010) (the displacement calculus). CatLog3 implements a logic including as primitive connectives the continuous (concatenation) and discontinuous (intercalation) connectives of the displacement calculus, additives, 1st order quantifiers, normal modalities, bracket modalities, and universal and existential subexponentials. In this paper we review the rules of inference for these primitive connectives and their linguistic applications, and we survey the principles of Andreoli’s focusing, and of a generalisation of van Benthem’s count-invariance, on the basis of which CatLog3 is implemented.

    Lexicalization and Grammar Development

    In this paper we present a fully lexicalized grammar formalism as a particularly attractive framework for the specification of natural language grammars. We discuss in detail Feature-based, Lexicalized Tree Adjoining Grammars (FB-LTAGs), a representative of the class of lexicalized grammars. We illustrate the advantages of lexicalized grammars in various contexts of natural language processing, ranging from wide-coverage grammar development to parsing and machine translation. We also present a method for compact and efficient representation of lexicalized trees. Comment: ps file. English w/ German abstract. 10 page