329 research outputs found

    Towards a machine-learning architecture for lexical functional grammar parsing

    Data-driven grammar induction aims at producing wide-coverage grammars of human languages. Initial efforts in this field produced relatively shallow linguistic representations such as phrase-structure trees, which only encode constituent structure. Recent work on inducing deep grammars from treebanks addresses this shortcoming by also recovering non-local dependencies and grammatical relations. My aim is to investigate the issues arising when adapting an existing Lexical Functional Grammar (LFG) induction method to a new language and treebank, and to find solutions which will generalize robustly across multiple languages. The research hypothesis is that by exploiting machine-learning algorithms to learn morphological features, lemmatization classes and grammatical functions from treebanks we can reduce the amount of manual specification and improve robustness, accuracy, and domain- and language-independence for LFG parsing systems. Function labels can often be mapped relatively straightforwardly to LFG grammatical functions. Learning them reliably permits grammar induction to depend less on language-specific LFG annotation rules. I therefore propose ways to improve acquisition of function labels from treebanks and translate those improvements into better-quality f-structure parsing. In a lexicalized grammatical formalism such as LFG, a large amount of syntactically relevant information comes from lexical entries. It is, therefore, important to be able to perform morphological analysis in an accurate and robust way for morphologically rich languages. I propose a fully data-driven supervised method to simultaneously lemmatize and morphologically analyze text and obtain competitive or improved results on a range of typologically diverse languages.

    Deep machine learning for syntactic annotation projection

    The purpose of this thesis was to explore cross-lingual transfer learning for dependency parsing, with the goal of enabling syntactic analysis for low-resource languages. The best approaches involve annotation projection: the transfer of dependency structures via parallel texts, from resource-rich to low-resource languages. The first chapter describes the basic concepts of part-of-speech tagging and dependency parsing, as well as how syntactic dependencies are annotated. The second chapter describes the first approach to the annotation projection problem, which is based on the algorithm presented in the paper Multilingual Projection for Parsing Truly Low-Resource Languages [10]. We propose adjustments to the existing algorithm that improve the results. The third chapter presents the idea of using neural networks for annotation projection, along with several ideas for extending this work in the future.

    "Algún" indefinite is not bound by adverbs of quantification

    Some indefinites cannot be bound by adverbs of quantification or the generic operator. I argue that this datum follows from the internal syntax of indefinites: only those indefinites consisting of a minimal structure can be bound; bigger indefinites cannot. I present evidence from Spanish, Russian and English to support this claim. Two theoretical consequences follow. The first one is about wh-dependencies: I argue that wh-phrases cannot be regarded as noun phrases with an extra [wh] feature, but rather as very small indefinites without additional features. The second one involves exceptional scope: choice function approaches seem to run into a paradox that alternative approaches, such as Schwarzschild's Singleton Indefinite approach, avoid. I also argue that an alternative semantic approach to binding resistance yields no fruit. Finally, I show that only small indefinites can be used as predicates, thus bolstering the approach taken in these pages.

    West Flemish verb-based discourse markers and the articulation of the Speech Act layer

    This paper focuses on the West Flemish discourse markers located at the edge of the clause. After a brief survey of the distribution of discourse markers in WF, the paper proposes a syntactic analysis of the discourse markers ne and we. Based on the distribution of these discourse markers, of vocatives and of dislocated DPs, an articulated speech act layer is elaborated which corroborates the proposals in Hill (). It is postulated that there is a syntactic relation between particles used as discourse markers and vocatives. The paper offers further support for the grammaticalization of pragmatic features at the interface between syntax and discourse and for the hypothesis that the relevant computation at the interface is of the same nature as that in Narrow Syntax

    MasakhaPOS: Part-of-Speech Tagging for Typologically Diverse African languages

    In this paper, we present AfricaPOS, the largest part-of-speech (POS) dataset for 20 typologically diverse African languages. We discuss the challenges in annotating POS for these languages using the universal dependencies (UD) guidelines. We conducted extensive POS baseline experiments using both conditional random field and several multilingual pre-trained language models. We applied various cross-lingual transfer models trained with data available in the UD. Evaluating on the AfricaPOS dataset, we show that choosing the best transfer language(s) in both single-source and multi-source setups greatly improves the POS tagging performance of the target languages, in particular when combined with parameter-fine-tuning methods. Crucially, transferring knowledge from a language that matches the language family and morphosyntactic properties seems to be more effective for POS tagging in unseen languages

    Focus and Focus Structures in the Romance Languages

    The archived version is a draft of a chapter/article that has been accepted for publication by Oxford University Press in the Oxford Research Encyclopedia of Linguistics. Peer reviewed

    Copy theory in wh-in-situ languages: Sluicing in Hindi-Urdu

    Hindi-Urdu is known to be one of the wh-in-situ languages exhibiting a sluicing-like construction. Although many have proposed alternative accounts of such strings in wh-in-situ languages (e.g. Kizu 1997, Toosarvandani 2009, Gribanova 2011, Hankamer 2010), I argue that apparent sluicing in Hindi-Urdu can be analyzed in a manner consistent with the notion that the syntax of a sluice is the syntax of a regular wh-question (Ross 1969, Merchant 2001). Assuming the copy theory of movement (Chomsky & Lasnik 1993, Chomsky 1993, i.a.), we can understand sluicing in Hindi-Urdu as an exceptional instance of the pronunciation of the top copy in a wh-chain, correctly predicting that Hindi-Urdu sluiced structures have properties similar to genuine sluices in languages like English. This article pursues a continued refinement in the implementation of copy theory in wh-in-situ languages and, importantly, contributes to the current line of work investigating intra-linguistic variation among wh-in-situ languages and the ways in which constellations of properties of wh-dependencies and ellipsis processes in these languages are best understood.