101 research outputs found

    Multiword expressions at length and in depth

    The annual workshop on multiword expressions has taken place since 2001 in conjunction with major computational linguistics conferences and attracts the attention of an ever-growing community working on a variety of languages, linguistic phenomena and related computational processing issues. MWE 2017 took place in Valencia, Spain, and represented a vibrant panorama of the current research landscape on the computational treatment of multiword expressions, featuring many high-quality submissions. Furthermore, MWE 2017 included the first shared task on multilingual identification of verbal multiword expressions. The shared task, with extended communal work, has developed important multilingual resources and mobilised several research groups in computational linguistics worldwide. This book contains extended versions of selected papers from the workshop. Authors worked hard to include detailed explanations, broader and deeper analyses, and new exciting results, which were thoroughly reviewed by an internationally renowned committee. We hope that this distinctly joint effort will provide a meaningful and useful snapshot of the multilingual state of the art in multiword expressions modelling and processing, and will be a point of reference for future work.

    Cappadocian kinship

    Cappadocian kinship systems are very interesting from a sociolinguistic and anthropological perspective because of the mixture of inherited Greek and borrowed Turkish kinship terms. Precisely because the number of Turkish kinship terms differs from one variety to another, it is necessary to talk about Cappadocian kinship systems in the plural rather than about the Cappadocian kinship system in the singular. Although reference will be made to other Cappadocian varieties, this paper will focus on the kinship systems of Mišotika and Aksenitika, the two Central Cappadocian dialects still spoken today in several communities in Greece. Particular attention will be given to the use of borrowed Turkish kinship terms, which sometimes seem to co-exist with their inherited Greek counterparts, e.g. mána vs. néne ‘mother’, ailfó/aelfó vs. γardáš ‘brother’ etc. In the final part of the paper some kinship terms with obscure or hitherto unknown etymology will be discussed, e.g. káka ‘grandmother’, ižá ‘aunt’, lúva ‘uncle (father’s brother)’ etc.

    Challenges in Categorization : Corpus-based Studies of Adjectival Premodifiers in English

    This thesis draws together a series of articles on premodifying -ing participles and adjectives in English (e.g. "interesting", "advancing"). The studies are intended to contribute to our understanding of a variety of topics, including the meaning and function of participles and other adjectival premodifiers, their use in different registers, and their change over time. The overarching topic that connects all the articles thematically is linguistic categorization, which is here understood as a process of abstraction through which language users group linguistic elements together according to their form, meaning, function and patterns of use. Some of the articles discuss categories and categorization in terms of word classes (adjectives/verbs), while the focus of others is on semantic categorization (subjective/objective premodifiers) or the categorization of linguistic registers based on the distribution of premodified noun phrases. On the one hand, then, this thesis bears on the general discussion of the nature of linguistic categorization and category change. On the other hand, it continues a series of descriptions and analyses of adjectival premodifiers in contemporary research and the large reference grammars of Present-day English. One of the main findings of this thesis concerns the tendency of subjective adjectives, adjective phrases and nouns to be used with indefinite determination and in a complement role in discourse. This tendency is explained by a preferential mapping between subjectivity and new information, and the correlation is shown to have interesting uses in more practical tasks, such as semantic disambiguation, corpus annotation and the study of semantic change. Another important result is the tendency of degree modifiers to be used proportionally more often in predication than in attribution. 
These kinds of results support a usage-based approach to word classes, where categories like Verb or Adjective are regarded as emergent schemas that arise from actual patterns of use. The thesis also includes a wide-ranging survey of the relevant philosophical and linguistic literature on categorization. In my dissertation I examine various adjectival modifiers in English from both a synchronic and a diachronic perspective. The main focus is on the categorization, meaning and use of various participial modifiers, especially -ing participles (e.g. "interesting", "advancing"), but the individual studies of the dissertation also address the use of -ed participles (e.g. "scared") and ordinary adjectives. The most important themes of my work are the subjectivity and subjectification of the meaning of adjectival words, the gradience of word classes, and the gradual category change of words (e.g. the gradual change of a verb-like -ing participle into an adjective). My research is based on English corpus data and covers the period from Early Modern English to Present-day English. The dissertation is strongly empirical, and one of its most important general results is the observation of a correlation between subjective meanings and certain kinds of constructions. Using corpus data I have shown, among other things, that strongly subjective meanings are typically expressed in indefinite constructions in English. For example, "a much better result" is considerably more common in the data than "the much better result". Similarly, degree adverbs such as "very" and "extremely" occur significantly more often in predication than in attribution in the data (e.g. "this is very nice" is more common than "a very nice idea").
I argue in the dissertation that such observations are relevant both for explaining language change and for the way in which word classes should be understood in linguistic theory: in my research, word classes are understood as abstractions (schemas) based on the language user's experience, which are dynamic and can change over both long and shorter time spans. This idea is particularly important for the theory of Construction Grammar, the framework that I apply in the final study of the dissertation.

    Information Technology and Lawyers. Advanced Technology in the Legal Domain, from Challenges to Daily Routine


    Representation and Processing of Composition, Variation and Approximation in Language Resources and Tools

    In my habilitation dissertation, meant to validate my capacity of and maturity for directing research activities, I present a panorama of several topics in computational linguistics, linguistics and computer science. Over the past decade, I was notably concerned with the phenomena of compositionality and variability of linguistic objects. I illustrate the advantages of a compositional approach to the language in the domain of emotion detection and I explain how some linguistic objects, most prominently multi-word expressions, defy the compositionality principles. I demonstrate that the complex properties of MWEs, notably variability, are partially regular and partially idiosyncratic. This fact places the MWEs on the frontiers between different levels of linguistic processing, such as lexicon and syntax. I show the highly heterogeneous nature of MWEs by citing their two existing taxonomies. After an extensive state-of-the-art study of MWE description and processing, I summarize Multiflex, a formalism and a tool for lexical high-quality morphosyntactic description of MWUs. It uses a graph-based approach in which the inflection of a MWU is expressed in function of the morphology of its components, and of morphosyntactic transformation patterns. Due to unification the inflection paradigms are represented compactly. Orthographic, inflectional and syntactic variants are treated within the same framework. The proposal is multilingual: it has been tested on six European languages of three different origins (Germanic, Romance and Slavic), and I believe that many others can also be successfully covered. Multiflex proves interoperable. It adapts to different morphological language models, token boundary definitions, and underlying modules for the morphology of single words. It has been applied to the creation and enrichment of linguistic resources, as well as to morphosyntactic analysis and generation.
It can be integrated into other NLP applications requiring the conflation of different surface realizations of the same concept. Another chapter of my activity concerns named entities, most of which are particular types of MWEs. Their rich semantic load turned them into a hot topic in the NLP community, which is documented in my state-of-the-art survey. I present the main assumptions, processes and results issued from large annotation tasks at two levels (for named entities and for coreference), parts of the National Corpus of Polish construction. I have also contributed to the development of both rule-based and probabilistic named entity recognition tools, and to an automated enrichment of Prolexbase, a large multilingual database of proper names, from open sources. With respect to multi-word expressions, named entities and coreference mentions, I pay a special attention to nested structures. This problem sheds new light on the treatment of complex linguistic units in NLP. When these units start being modeled as trees (or, more generally, as acyclic graphs) rather than as flat sequences of tokens, long-distance dependencies, discontinuities, overlapping and other frequent linguistic properties become easier to represent. This calls for more complex processing methods which control larger contexts than what usually happens in sequential processing. Thus, both named entity recognition and coreference resolution come very close to parsing, and named entities or mentions with their nested structures are analogous to multi-word expressions with embedded complements. My parallel activity concerns finite-state methods for natural language and XML processing. My main contribution in this field, co-authored with 2 colleagues, is the first full-fledged method for tree-to-language correction, and more precisely for correcting XML documents with respect to a DTD.
We have also produced interesting results in incremental finite-state algorithmics, particularly relevant to data evolution contexts such as dynamic vocabularies or user updates. Multilingualism is the leitmotif of my research. I have applied my methods to several natural languages, most importantly to Polish, Serbian, English and French. I have been among the initiators of a highly multilingual European scientific network dedicated to parsing and multi-word expressions. I have used multilingual linguistic data in experimental studies. I believe that it is particularly worthwhile to design NLP solutions taking declension-rich (e.g. Slavic) languages into account, since this leads to more universal solutions, at least as far as nominal constructions (MWUs, NEs, mentions) are concerned. For instance, when Multiflex had been developed with Polish in mind it could be applied as such to French, English, Serbian and Greek. Also, a French-Serbian collaboration led to substantial modifications in morphological modeling in Prolexbase in its early development stages. This allowed for its later application to Polish with very few adaptations of the existing model. Other researchers also stress the advantages of NLP studies on highly inflected languages since their morphology encodes much more syntactic information than is the case e.g. in English. In this dissertation I am also supposed to demonstrate my ability of playing an active role in shaping the scientific landscape, on a local, national and international scale.
I describe my: (i) various scientific collaborations and supervision activities, (ii) roles in over 10 regional, national and international projects, (iii) responsibilities in collective bodies such as program and organizing committees of conferences and workshops, PhD juries, and the National University Council (CNU), (iv) activity as an evaluator and a reviewer of European collaborative projects. The issues addressed in this dissertation open interesting scientific perspectives, in which a special impact is put on links among various domains and communities. These perspectives include: (i) integrating fine-grained language data into the linked open data, (ii) deep parsing of multi-word expressions, (iii) modeling multi-word expression identification in a treebank as a tree-to-language correction problem, and (iv) a taxonomy and an experimental benchmark for tree-to-language correction approaches.

    Extended papers from the MWE 2017 workshop

    The annual workshop on multiword expressions has taken place since 2001 in conjunction with major computational linguistics conferences and attracts the attention of an ever-growing community working on a variety of languages, linguistic phenomena and related computational processing issues. MWE 2017 took place in Valencia, Spain, and represented a vibrant panorama of the current research landscape on the computational treatment of multiword expressions, featuring many high-quality submissions. Furthermore, MWE 2017 included the first shared task on multilingual identification of verbal multiword expressions. The shared task, with extended communal work, has developed important multilingual resources and mobilised several research groups in computational linguistics worldwide. This book contains extended versions of selected papers from the workshop. Authors worked hard to include detailed explanations, broader and deeper analyses, and new exciting results, which were thoroughly reviewed by an internationally renowned committee. We hope that this distinctly joint effort will provide a meaningful and useful snapshot of the multilingual state of the art in multiword expressions modelling and processing, and will be a point of reference for future work.

    Artificial Neural Networks in Agriculture

    Modern agriculture needs to combine high production efficiency with high product quality. This applies to both crop and livestock production. To meet these requirements, advanced methods of data analysis are increasingly used, including those derived from artificial intelligence. Artificial neural networks (ANNs) are one of the most popular tools of this kind. They are widely used in solving various classification and prediction tasks, and have for some time also been applied in the broadly defined field of agriculture. They can form part of precision farming and decision support systems. Artificial neural networks can replace classical modelling methods in many problems, and are one of the main alternatives to classical mathematical models. The spectrum of applications of artificial neural networks is very wide. For a long time now, researchers from all over the world have been using these tools to support agricultural production, making it more efficient and providing the highest-quality products possible.

    The evolution of language: Proceedings of the Joint Conference on Language Evolution (JCoLE)
