31 research outputs found

    FinnTreeBank: Creating a research resource and service for language researchers with Constraint Grammar

    Proceedings of the NODALIDA 2011 Workshop Constraint Grammar Applications. Editors: Eckhard Bick, Kristin Hagen, Kaili Müürisep, Trond Trosterud. NEALT Proceedings Series, Vol. 14 (2011), 41–49. © 2011 The editors and contributors. Published by Northern European Association for Language Technology (NEALT) http://omilia.uio.no/nealt . Electronically published at Tartu University Library (Estonia) http://hdl.handle.net/10062/19231

    Inducing Constraint Grammars

    Constraint Grammar rules are induced from corpora. A simple scheme based on local information, i.e., on lexical biases and next-neighbour contexts, extended through the use of barriers, reached 87.3 percent precision (1.12 tags/word) at 98.2 percent recall. The results compare favourably with other methods used for similar tasks, although they are by no means as good as the results achieved with the original hand-written rules, which were developed over several years. Comment: 10 pages, uuencoded, gzipped PostScript

    A double-blind experiment on interannotator agreement: the case of dependency syntax and Finnish

    Proceedings of the 18th Nordic Conference of Computational Linguistics NODALIDA 2011. Editors: Bolette Sandford Pedersen, Gunta Nešpore and Inguna Skadiņa. NEALT Proceedings Series, Vol. 11 (2011), 319-322. © 2011 The editors and contributors. Published by Northern European Association for Language Technology (NEALT) http://omilia.uio.no/nealt . Electronically published at Tartu University Library (Estonia) http://hdl.handle.net/10062/1695

    Compiling and Using Finite-State Syntactic Rules

    Proceeding volume: 1
    A language-independent framework for syntactic finite-state parsing is discussed. The article presents a framework, a formalism, a compiler and a parser for grammars written in this formalism. As a substantial example, fragments from a nontrivial finite-state grammar of English are discussed. The linguistic framework of the present approach is based on a surface syntactic tagging scheme by F. Karlsson. This representation is slightly less powerful than phrase structure tree notation, letting some ambiguous constructions be described more concisely. The finite-state rule compiler implements what was briefly sketched by Koskenniemi (1990). It is based on the calculus of finite-state machines. The compiler transforms rules into rule automata. The run-time parser uses one of several alternative strategies to perform the effective intersection of the rule automata and the sentence automaton. Fragments of a fairly comprehensive finite-state grammar of English are presented here, including samples from non-finite constructions as a demonstration of the capacity of the present formalism, which goes far beyond plain disambiguation or part-of-speech tagging. The grammar itself is directly related to a parser and tagging system for English created as a part of project SIMPR I using Karlsson's CG (Constraint Grammar) formalism. Peer reviewed

    Analysing Finnish with word lists : The DDI approach to morphology revisited

    Morphological lexicons for morphologically complex languages provide good text coverage at the cost of overgeneration, difficulty of modification, and sometimes performance issues. Use of simple, manageable lexicon forms – especially lists – for morphologically complex languages may appear unviable because the number of possible word-forms in a morphologically complex language can be prohibitively high. We created and experimented with a list-based lexicon for a morphologically complex language (Finnish), and compared its coverage with that of a mature morphological analyser on new text in two experimental settings. The observed smallish difference in coverage suggests the viability of using simple and easy-to-modify list-based lexicons as an initial part of morphological analysis, to increase developer control on the vast majority of input tokens.