236 research outputs found

    The automatic processing of multiword expressions in Irish

    Get PDF
    It is well-documented that Multiword Expressions (MWEs) pose a unique challenge to a variety of NLP tasks such as machine translation, parsing, information retrieval, and more. For low-resource languages such as Irish, these challenges can be exacerbated by the scarcity of data, and a lack of research in this topic. In order to improve handling of MWEs in various NLP tasks for Irish, this thesis will address both the lack of resources specifically targeting MWEs in Irish, and examine how these resources can be applied to said NLP tasks. We report on the creation and analysis of a number of lexical resources as part of this PhD research. Ilfhocail, a lexicon of Irish MWEs, is created through extract- ing MWEs from other lexical resources such as dictionaries. A corpus annotated with verbal MWEs in Irish is created for the inclusion of Irish in the PARSEME Shared Task 1.2. Additionally, MWEs were tagged in a bilingual EN-GA corpus for inclusion in experiments in machine translation. For the purposes of annotation, a categorisation scheme for nine categories of MWEs in Irish is created, based on combining linguistic analysis on these types of constructions and cross-lingual frameworks for defining MWEs. A case study in applying MWEs to NLP tasks is undertaken, with the exploration of incorporating MWE information while training Neural Machine Translation systems. Finally, the topic of automatic identification of Irish MWEs is explored, documenting the training of a system capable of automatically identifying Irish MWEs from a variety of categories, and the challenges associated with developing such a system. This research contributes towards a greater understanding of Irish MWEs and their applications in NLP, and provides a foundation for future work in exploring other methods for the automatic discovery and identification of Irish MWEs, and further developing the MWE resources described above

    CLARIN

    Get PDF
    The book provides a comprehensive overview of the Common Language Resources and Technology Infrastructure – CLARIN – for the humanities. It covers a broad range of CLARIN language resources and services, its underlying technological infrastructure, the achievements of national consortia, and challenges that CLARIN will tackle in the future. The book is published 10 years after establishing CLARIN as an Europ. Research Infrastructure Consortium

    Biomedical Term Extraction: NLP Techniques in Computational Medicine

    Get PDF
    Artificial Intelligence (AI) and its branch Natural Language Processing (NLP) in particular are main contributors to recent advances in classifying documentation and extracting information from assorted fields, Medicine being one that has gathered a lot of attention due to the amount of information generated in public professional journals and other means of communication within the medical profession. The typical information extraction task from technical texts is performed via an automatic term recognition extractor. Automatic Term Recognition (ATR) from technical texts is applied for the identification of key concepts for information retrieval and, secondarily, for machine translation. Term recognition depends on the subject domain and the lexical patterns of a given language, in our case, Spanish, Arabic and Japanese. In this article, we present the methods and techniques for creating a biomedical corpus of validated terms, with several tools for optimal exploitation of the information therewith contained in said corpus. This paper also shows how these techniques and tools have been used in a prototype

    Head-Driven Phrase Structure Grammar

    Get PDF
    Head-Driven Phrase Structure Grammar (HPSG) is a constraint-based or declarative approach to linguistic knowledge, which analyses all descriptive levels (phonology, morphology, syntax, semantics, pragmatics) with feature value pairs, structure sharing, and relational constraints. In syntax it assumes that expressions have a single relatively simple constituent structure. This volume provides a state-of-the-art introduction to the framework. Various chapters discuss basic assumptions and formal foundations, describe the evolution of the framework, and go into the details of the main syntactic phenomena. Further chapters are devoted to non-syntactic levels of description. The book also considers related fields and research areas (gesture, sign languages, computational linguistics) and includes chapters comparing HPSG with other frameworks (Lexical Functional Grammar, Categorial Grammar, Construction Grammar, Dependency Grammar, and Minimalism)

    Assisting Independent Seniors with Morning Care:how care workers and seniors negotiate physical cooperation through multimodal interaction

    Get PDF
    This dissertation reports two studies into the interactional conduct of care workers and seniors in institutional morning care. The first study focuses on natural data of the communication between care workers and seniors during care activities, investigated from a multimodal conversation-analytical perspective. Two phenomena were analyzed in detail: the distinctively shaped progression requests prompting the senior to perform a movement and the specific use of the particle nou during transitions between activities. The analyses revealed how care workers and seniors use these practices to organize and co-ordinate their interactional conduct. In the complex interweaving of verbal and physical actions during these transitions, a number of interactional multimodal patterns were identified; the orientation towards the senior's autonomy seems to be their underlying organizational principle. The second study is a discourse analysis of three policy documents about Dutch senior care – wherein attention to the senior’s autonomy is a key principle. The texts were analyzed with regard to the articulation of views on how care worker and senior are expected to interact and relate to each other during care activities. Remarkably, the policy documents hardly refer to the physical nature of care interactions and suggest that attention to the autonomy of the senior is only practiced in (thematic) conversational activities. This outcome indicates that knowledge about interpersonal communication in (senior) care is limited, in particular about the interactional course of care routines. The findings of the first interactional study provide a nuanced understanding of interventions during care activities, relevant to the education of care workers

    Proceedings of the VIIth GSCP International Conference

    Get PDF
    The 7th International Conference of the Gruppo di Studi sulla Comunicazione Parlata, dedicated to the memory of Claire Blanche-Benveniste, chose as its main theme Speech and Corpora. The wide international origin of the 235 authors from 21 countries and 95 institutions led to papers on many different languages. The 89 papers of this volume reflect the themes of the conference: spoken corpora compilation and annotation, with the technological connected fields; the relation between prosody and pragmatics; speech pathologies; and different papers on phonetics, speech and linguistic analysis, pragmatics and sociolinguistics. Many papers are also dedicated to speech and second language studies. The online publication with FUP allows direct access to sound and video linked to papers (when downloaded)

    CLARIN. The infrastructure for language resources

    Get PDF
    CLARIN, the "Common Language Resources and Technology Infrastructure", has established itself as a major player in the field of research infrastructures for the humanities. This volume provides a comprehensive overview of the organization, its members, its goals and its functioning, as well as of the tools and resources hosted by the infrastructure. The many contributors representing various fields, from computer science to law to psychology, analyse a wide range of topics, such as the technology behind the CLARIN infrastructure, the use of CLARIN resources in diverse research projects, the achievements of selected national CLARIN consortia, and the challenges that CLARIN has faced and will face in the future. The book will be published in 2022, 10 years after the establishment of CLARIN as a European Research Infrastructure Consortium by the European Commission (Decision 2012/136/EU)

    Proceedings of the Seventh Italian Conference on Computational Linguistics CLiC-it 2020

    Get PDF
    On behalf of the Program Committee, a very warm welcome to the Seventh Italian Conference on Computational Linguistics (CLiC-it 2020). This edition of the conference is held in Bologna and organised by the University of Bologna. The CLiC-it conference series is an initiative of the Italian Association for Computational Linguistics (AILC) which, after six years of activity, has clearly established itself as the premier national forum for research and development in the fields of Computational Linguistics and Natural Language Processing, where leading researchers and practitioners from academia and industry meet to share their research results, experiences, and challenges

    Russian verbal prefixation

    Get PDF
    This book addresses the complexity of Russian verbal prefixation system that has been extensively studied but yet not explained. Traditionally, different meanings have been investigated and listed in the dictionaries and grammars and more recently linguists attempted to unify various prefix usages under more general descriptions. The existent semantic approaches, however, do not aim to use semantic representations in order to account for the problems of prefix stacking and aspect determination. This task has been so far undertaken by syntactic approaches to prefixation, that divide verbal prefixes in classes and limit complex verb formation by restricting structural positions available for the members of each class. I show that these approaches have two major drawbacks: the implicit prediction of the non-existence of complex biaspectual verbs and the absence of uniformly accepted formal criteria for the underlying prefix classification. In this book the reader can find an implementable formal semantic approach to prefixation that covers five prefixes: za-, na-, po-, pere-, and do-. It is shown how to predict the existence, semantics, and aspect of a given complex verb with the help of the combination of an LTAG and frame semantics. The task of identifying the possible affix combinations is distributed between three modules: syntax, which is kept simple (only basic structural assumptions), frame semantics, which ensures that the constraints are respected, and pragmatics, which rules out some prefixed verbs and restricts the range of available interpretations. For the purpose of the evaluation of the theory, an implementation of the proposed analysis for a grammar fragment using a metagrammar description is provided. It is shown that the proposed analysis delivers more accurate and complete predictions with respect to the existence of complex verbs than the most precise syntactic account

    Calculs de représentations sémantiques et syntaxe générative : les grammaires minimalistes catégorielles

    Get PDF
    Les travaux de cette thèse se situent dans le cadre de la linguistique computationnelle. La problématique est de définir une interface syntaxe / sémantique basée sur les théories de la grammaire générative.Une première partie, concernant le problème de l'analyse syntaxique, présente tout d'abord, la syntaxe générative, puis un formalisme la réalisant: les grammaires minimalistes de Stabler. À partir de ces grammaires, nous réalisons une étude sur les propriétés de l'opération de fusion pour laquelle nous définissons des notions d'équivalence, ainsi qu'une modélisation abstraite des lexiques.Une seconde partie revient sur le problème de l'interface. Pour cela, nous proposons un formalisme de type logique, basé sur la logique mixte (possédant des connecteurs commutatifs et non-commutatifs), qui équivaut, sous certaines conditions, aux grammaires de Stabler. Dans ce but, nous introduisons une normalisation des preuves de cette logique, normalisation permettant de vérifier la propriété de la sous-formule. Ces propriétés sont également étendues au calcul de Lambek avec produit.À partir de l'isomorphisme de Curry-Howard, nous synchronisons un calcul sémantique avec les preuves réalisant l'analyse syntaxique. Les termes de notre calcul font appel aux propriétés du lambda mu-calcul, ainsi qu'à celles de la DRT (Discourse Representative Theory).Une dernière partie applique ces formalismes à des cas concrets. Nous établissons des fragments d'une grammaire du français autour du problème des clitiques
    corecore