310 research outputs found

    Disambiguoiva morfologinen jäsennys probabilistisilla sekvenssimalleilla (Disambiguating morphological tagging with probabilistic sequence models)

    A morphological tagger is a computer program that provides complete morphological descriptions of sentences. Morphological taggers find applications in many NLP fields. For example, they can be used as a pre-processing step for syntactic parsers, in information retrieval, and in machine translation. The task of morphological tagging is closely related to POS tagging, but morphological taggers provide more fine-grained morphological information than POS taggers. Therefore, they are often applied to morphologically complex languages, which extensively utilize inflection, derivation and compounding for encoding structural and semantic information. This thesis presents work on data-driven morphological tagging for Finnish and other morphologically complex languages. There exists a very limited amount of previous work on data-driven morphological tagging for Finnish because of the lack of freely available manually prepared morphologically tagged corpora. The work presented in this thesis is made possible by the recently published Finnish dependency treebanks FinnTreeBank and Turku Dependency Treebank. Additionally, the Finnish open-source morphological analyzer OMorFi is extensively utilized in the experiments presented in the thesis. The thesis presents methods for improving tagging accuracy, estimation speed and tagging speed in the presence of large structured morphological label sets that are typical for morphologically complex languages. More specifically, it presents a novel formulation of generative morphological taggers using weighted finite-state machines and applies finite-state taggers to context-sensitive spelling correction of Finnish. The thesis also explores discriminative morphological tagging. It presents structured sub-label dependencies that can be used for improving tagging accuracy. Additionally, the thesis presents a cascaded variant of the averaged perceptron tagger. In the presence of large label sets, a cascaded design results in a substantial reduction of estimation time compared to a standard perceptron tagger. Moreover, the thesis explores pruning strategies for perceptron taggers. Finally, the thesis presents the FinnPos toolkit for morphological tagging. FinnPos is an open-source state-of-the-art averaged perceptron tagger implemented by the author.
    [Translated from the Finnish abstract:] A disambiguating morphological tagger is a program that produces unambiguous morphological descriptions for the words of a sentence. Such taggers can be used in many areas of language processing, for example as a pre-processing step for a syntactic parser or a machine translation system. As a language technology task, disambiguating morphological tagging resembles traditional part-of-speech tagging, but it produces more fine-grained morphological information than a traditional part-of-speech tagger. For this reason, disambiguating morphological taggers are mainly used in language technology for morphologically complex languages such as Finnish. Such languages make heavy use of word formation devices such as inflection, derivation and compounding. The research presented in the thesis concerns disambiguating morphological tagging of morphologically rich languages using machine learning methods. Although disambiguating morphological tagging of Finnish has been studied before (e.g. using the Constraint Grammar formalism), machine learning methods have hardly been applied to it. This is because the high-quality morphologically annotated corpora needed for training a tagger have not been openly available. The research presented in this thesis makes use of the recently published dependency-parsed Finnish corpora FinnTreeBank and Turku Dependency Treebank. In addition, the research makes use of OMorFi, the open morphological analyzer for Finnish. The thesis presents methods for improving tagging accuracy and for increasing the training speed and tagging speed of the tagger. It presents a new way of building generative taggers using weighted finite-state machines and applies such taggers to context-sensitive spelling correction of Finnish. In addition, the thesis deals with discriminative tagging models. It presents ways of exploiting parts of morphological analyses to improve tagging accuracy. It also presents a cascade model that considerably shortens the training time of the tagger. The thesis also presents ways of reducing the size of tagger models. Finally, it presents FinnPos, an open-source tool implemented by the author for training disambiguating morphological taggers.
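For readers unfamiliar with the averaged perceptron named in the abstract above, the following is a minimal Python sketch of the general technique. It is not FinnPos and does not reproduce the thesis's method: the class name, the three feature templates, and the toy training data are all assumptions made for illustration, and each token is tagged independently, without the sequence decoding, cascading, or sub-label dependencies the thesis describes.

```python
from collections import defaultdict

def features(words, i):
    """Hypothetical feature templates: word form, 3-char suffix, previous word."""
    w = words[i]
    prev = words[i - 1] if i > 0 else "<s>"
    return [f"w={w}", f"suf3={w[-3:]}", f"prev={prev}"]

class AveragedPerceptron:
    def __init__(self, labels):
        self.labels = sorted(labels)      # sorted so that ties break deterministically
        self.w = defaultdict(float)       # current weights, keyed by (feature, label)
        self.totals = defaultdict(float)  # running sum of each weight over all steps
        self.stamps = defaultdict(int)    # step at which each weight last changed
        self.step = 0

    def score(self, feats, label):
        return sum(self.w.get((f, label), 0.0) for f in feats)

    def predict(self, feats):
        return max(self.labels, key=lambda y: self.score(feats, y))

    def _bump(self, key, delta):
        # Lazily credit the weight for the steps during which it stayed unchanged.
        self.totals[key] += (self.step - self.stamps[key]) * self.w[key]
        self.stamps[key] = self.step
        self.w[key] += delta

    def update(self, feats, gold):
        self.step += 1
        guess = self.predict(feats)
        if guess != gold:
            for f in feats:
                self._bump((f, gold), 1.0)    # reward the correct label
                self._bump((f, guess), -1.0)  # penalize the wrong guess

    def average(self):
        # Replace each weight by its average over all training steps (call once).
        for key in list(self.w):
            self.totals[key] += (self.step - self.stamps[key]) * self.w[key]
            self.stamps[key] = self.step
            self.w[key] = self.totals[key] / max(self.step, 1)

# Toy training data (an assumption; real taggers train on treebanks).
train = [
    (["the", "dog", "barks"], ["DET", "NOUN", "VERB"]),
    (["the", "cat", "sleeps"], ["DET", "NOUN", "VERB"]),
    (["a", "dog", "sleeps"], ["DET", "NOUN", "VERB"]),
]

model = AveragedPerceptron({"DET", "NOUN", "VERB"})
for _ in range(5):
    for words, tags in train:
        for i, gold in enumerate(tags):
            model.update(features(words, i), gold)
model.average()

def tag(words):
    return [model.predict(features(words, i)) for i in range(len(words))]

print(tag(["the", "dog", "sleeps"]))
```

Averaging the weights over all training steps, rather than keeping only the final weights, is the standard way to make the perceptron less sensitive to the last few updates. With the large structured label sets the abstract mentions, the `max` over all labels in `predict` becomes the expensive step, which is the cost the thesis's cascaded and pruned variants aim to reduce.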

    Feasibility report: Delivering case-study based learning using artificial intelligence and gaming technologies

    This document describes an investigation into the technical feasibility of a game to support learning based on case studies. Information systems students using the game will conduct fact-finding interviews with virtual characters. We survey relevant technologies in computational linguistics and games. We assess the applicability of the various approaches and propose an architecture for the game based on existing techniques. We also propose a phased development plan for the game.

    Evidence and Formal Models in the Linguistic Sciences

    This dissertation contains a collection of essays centered on the relationship between theoretical model-building and empirical evidence-gathering in linguistics and related language sciences. The first chapter sets the stage by demonstrating that the subject matter of linguistics is manifold, and contending that discussion of relationships between linguistic models, evidence, and language itself depends on the subject matter at hand. The second chapter defends a restrictive account of scientific evidence. I make use of this account in the third chapter, in which I argue that if my account of scientific evidence is correct, then linguistic intuitions do not generally qualify as scientific evidence. Drawing on both extant and original empirical work on linguistic intuitions, I explore the consequences of this conclusion for scientific practice. In the fourth and fifth chapters I examine two distinct ways in which theoretical models relate to the evidence. Chapter four looks at the way in which empirical evidence can support computer simulations in evolutionary linguistics by informing and constraining them. Chapter five, on the other hand, probes the limits of how models are constrained by the data, taking as a case study empirically suspect but theoretically useful intentionalist models of meaning.

    Intervention effects in wh-chains: the combined effect of syntax and processing

    This study in experimental syntax investigates the factors affecting the acceptability of embedded clauses featuring a left-dislocated phrase below a fronted wh-phrase. Sixty native speakers of French took part in an online acceptability judgment task including 45 critical items (with an intervening XP) and 20 baseline items (including grammatical and ungrammatical sentences with an embedded wh-dependency). Using Random Forest and Ordinal Regression analyses, we demonstrate that Clitic Left Dislocated (CLLD) objects yield stronger intervention effects than CLLDed subjects, except when the objects are pronouns. We argue this is due to excessive processing demands incurred when a wh-dependency features a CLLD chain that is not fully within its scope. A processing account also explains why pronouns are not disruptive of wh-chains.

    Analyzing meaning: An introduction to semantics and pragmatics

    An updated edition of this book is available from http://langsci-press.org/catalog/book/231. This book provides an introduction to the study of meaning in human language, from a linguistic perspective. It covers a fairly broad range of topics, including lexical semantics, compositional semantics, and pragmatics. The chapters are organized into six units: (1) Foundational concepts; (2) Word meanings; (3) Implicature (including indirect speech acts); (4) Compositional semantics; (5) Modals, conditionals, and causation; (6) Tense & aspect. Most of the chapters include exercises which can be used for class discussion and/or homework assignments, and each chapter contains references for additional reading on the topics covered. As the title indicates, this book is truly an INTRODUCTION: it provides a solid foundation which will prepare students to take more advanced and specialized courses in semantics and/or pragmatics. It is also intended as a reference for fieldworkers doing primary research on under-documented languages, to help them write grammatical descriptions that deal carefully and clearly with semantic issues. The approach adopted here is largely descriptive and non-formal (or, in some places, semi-formal), although some basic logical notation is introduced. The book is written at a level which should be appropriate for advanced undergraduate or beginning graduate students. It presupposes some previous coursework in linguistics, but does not presuppose any background in formal logic or set theory.

    Neural Techniques for German Dependency Parsing

    Syntactic parsing is the task of analyzing the structure of a sentence based on some predefined formal assumption. It is a key component in many natural language processing (NLP) pipelines and is of great benefit for natural language understanding (NLU) tasks such as information retrieval or sentiment analysis. Despite achieving very high results with neural network techniques, most syntactic parsing research pays attention to only a few prominent languages (such as English or Chinese) or language-agnostic settings. Thus, we still lack studies that focus on just one language and design specific parsing strategies for that language with regard to its linguistic properties. In this thesis, we take German as the language of interest and develop more accurate methods for German dependency parsing by combining state-of-the-art neural network methods with techniques that address the specific challenges posed by the language-specific properties of German. Compared to English, German has richer morphology, semi-free word order, and case syncretism. It is the combination of those characteristics that makes parsing German an interesting and challenging task. Because syntactic parsing is a task that requires many levels of language understanding, we propose to study and improve the knowledge of parsing models at each level in order to improve syntactic parsing for German. These levels are: (sub)word level, syntactic level, semantic level, and sentence level. At the (sub)word level, we look into a surge in out-of-vocabulary words in German data caused by compounding. We propose a new type of embedding for compounds that is a compositional model of the embeddings of individual components. Our experiments show that character-based embeddings are superior to word and compound embeddings in dependency parsing, and compound embeddings only outperform word embeddings when the part-of-speech (POS) information is unavailable. 
Thus, we conclude that it is the morpho-syntactic information of unknown compounds, not the semantic one, that is crucial for parsing German. At the syntax level, we investigate challenges for a local grammatical function labeler that are caused by case syncretism. In detail, we augment the grammatical function labeling component in a neural dependency parser that labels each head-dependent pair independently with a new labeler that includes a decision history, using Long Short-Term Memory networks (LSTMs). All our proposed models significantly outperformed the baseline on three languages: English, German and Czech. However, the impact of the new models is not the same for all languages: the improvement for English is smaller than for the non-configurational languages (German and Czech). Our analysis suggests that the success of the history-based models is not due to better handling of long dependencies but that they are better at dealing with the uncertainty in head direction. We study the interaction of syntactic parsing with the semantic level via the problem of PP attachment disambiguation. Our motivation is to provide a realistic evaluation of the task where gold information is not available and compare the results of disambiguation systems against the output of a strong neural parser. To the best of our knowledge, this is the first time that PP attachment disambiguation is evaluated and compared against neural dependency parsing on predicted information. In addition, we present a novel approach for PP attachment disambiguation that uses biaffine attention and utilizes pre-trained contextualized word embeddings as semantic knowledge. Our end-to-end system outperformed the previous pipeline approach on German by a large margin simply by avoiding error propagation caused by predicted information. In the end, we show that parsing systems (with the same semantic knowledge) are in general superior to systems specialized for PP attachment disambiguation. 
Lastly, we improve dependency parsing at the sentence level using reranking techniques. Previous work on neural reranking has so far been evaluated only on English and Chinese, both languages with a configurational word order and poor morphology. We re-assess the potential of successful neural reranking models from the literature on English and on two morphologically rich(er) languages, German and Czech. In addition, we introduce a new variation of a discriminative reranker based on graph convolutional networks (GCNs). Our proposed reranker not only outperforms previous models on English but is the only model that is able to improve results over the baselines on German and Czech. Our analysis points out that the failure of the other models is due to the lower quality of the k-best lists, where the gold tree ratio and the diversity of the list play an important role.

    Analyzing meaning: An introduction to semantics and pragmatics. Second corrected and slightly revised edition

    This book provides an introduction to the study of meaning in human language, from a linguistic perspective. It covers a fairly broad range of topics, including lexical semantics, compositional semantics, and pragmatics. The chapters are organized into six units: (1) Foundational concepts; (2) Word meanings; (3) Implicature (including indirect speech acts); (4) Compositional semantics; (5) Modals, conditionals, and causation; (6) Tense & aspect. Most of the chapters include exercises which can be used for class discussion and/or homework assignments, and each chapter contains references for additional reading on the topics covered. As the title indicates, this book is truly an INTRODUCTION: it provides a solid foundation which will prepare students to take more advanced and specialized courses in semantics and/or pragmatics. It is also intended as a reference for fieldworkers doing primary research on under-documented languages, to help them write grammatical descriptions that deal carefully and clearly with semantic issues. The approach adopted here is largely descriptive and non-formal (or, in some places, semi-formal), although some basic logical notation is introduced. The book is written at a level which should be appropriate for advanced undergraduate or beginning graduate students. It presupposes some previous coursework in linguistics, but does not presuppose any background in formal logic or set theory. This is a revised version of http://langsci-press.org/catalog/book/14