12 research outputs found

    The Parallel Meaning Bank: A Framework for Semantically Annotating Multiple Languages

    This paper gives a general description of the ideas behind the Parallel Meaning Bank, a framework that aims to provide an easy way to annotate compositional semantics for texts written in languages other than English. The annotation procedure is semi-automatic and comprises seven layers of linguistic information: segmentation, symbolisation, semantic tagging, word sense disambiguation, syntactic structure, thematic role labelling, and co-reference. New languages can be added to the meaning bank as long as the documents are based on translations from English, but they also introduce interesting new challenges to the linguistic assumptions underlying the Parallel Meaning Bank. Comment: 13 pages, 5 figures, 1 table
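The seven annotation layers described above could be modelled, purely for illustration, as a per-token record. This is a hypothetical sketch with invented field names, not the Parallel Meaning Bank's actual data model:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class TokenAnnotation:
    # One token carrying the seven PMB-style layers (hypothetical field names).
    surface: str                      # segmentation: the token itself
    symbol: Optional[str] = None      # symbolisation: normalised lemma/symbol
    sem_tag: Optional[str] = None     # semantic tag
    sense: Optional[str] = None       # word sense (e.g. a WordNet sense key)
    syn_cat: Optional[str] = None     # syntactic category
    them_role: Optional[str] = None   # thematic role
    coref_id: Optional[int] = None    # co-reference chain identifier

tok = TokenAnnotation(surface="Athena", symbol="athena", sem_tag="PER",
                      sense="athena.n.01", syn_cat="NP", coref_id=1)
print(tok.surface, tok.coref_id)
```

Each layer stays optional, matching the semi-automatic workflow in which later layers may still await correction.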

    Discovering multiword expressions

    In this paper, we provide an overview of research on multiword expressions (MWEs) from a natural language processing perspective. We examine methods developed for modelling MWEs that capture some of their linguistic properties, discussing their use for MWE discovery and for idiomaticity detection. We concentrate on their collocational and contextual preferences, along with their fixedness in terms of canonical forms and their lack of word-for-word translatability. We also discuss a sample of the MWE resources that have been used in intrinsic evaluation setups for these methods.
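One common way to operationalise the collocational preferences mentioned above is pointwise mutual information (PMI) over adjacent word pairs. The sketch below is a minimal baseline for collocation-based MWE discovery, not any specific method surveyed in the paper:

```python
import math
from collections import Counter

def pmi_bigrams(tokens, min_count=2):
    """Score adjacent word pairs by pointwise mutual information (PMI),
    a standard measure of collocational strength used in MWE discovery."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n = len(tokens)
    scores = {}
    for (w1, w2), c in bigrams.items():
        if c < min_count:
            continue  # ignore rare pairs, whose PMI estimates are unstable
        # PMI = log p(w1, w2) / (p(w1) * p(w2))
        scores[(w1, w2)] = math.log((c / (n - 1)) /
                                    ((unigrams[w1] / n) * (unigrams[w2] / n)))
    return scores

corpus = "kick the bucket and kick the bucket but kick a ball".split()
scores = pmi_bigrams(corpus, min_count=2)
print(max(scores, key=scores.get))
```

High-PMI pairs such as "the bucket" here surface as collocation candidates; real discovery pipelines add frequency thresholds, longer n-grams, and idiomaticity filters on top.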

    Competition, selection and communicative need in language change: an investigation using corpora, computational modelling and experimentation

    Constant change is one of the few truly universal cross-linguistic properties of living languages. In this thesis I focus on lexical change, and ask why the introduction and spread of some words leads to competition and eventual extinction of words with similar functions, while in other cases semantically similar words are able to companionably co-exist for decades. I start out by using extensive computational simulations to evaluate a recently published method for differentiating selection and drift in language change. While I conclude this particular method still requires improvement to be reliably applicable to historical corpus data, my findings suggest that the approach in general, when properly evaluated, could have considerable future potential for better understanding the interplay of drift, selection and therefore competition in language change. In a series of corpus studies, I argue that the communicative needs of speakers play a significant role in how languages change, as they continue to be moulded to meet the needs of linguistic communities. I developed and evaluated computational methods for inferring a number of linguistic processes – changes in communicative need, competition between lexical items, and changes in colexification – directly from diachronic corpus data. Applying these new methods to massive historical corpora of multiple languages spanning several centuries, I show that communicative need modulates the outcome of competition between lexical items, and the colexification of concepts in semantic subspaces. I also conducted an experiment in the form of a dyadic artificial language communication game, the results of which demonstrate how speakers adapt their lexicons to the communicative needs of the situation. 
This combination of methods allows me to link actions of individual speakers at short timescales to population-level findings in large corpora at historical timescales, in order to show that language change is driven by communicative need.
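The contrast between drift and selection that the thesis investigates can be illustrated with a toy Wright-Fisher-style simulation of a word variant's frequency in a speaker population. This is a generic textbook model, assumed here for illustration, not the published method the thesis evaluates:

```python
import random

def variant_frequency(n_speakers=500, p0=0.5, s=0.0, generations=200, seed=0):
    """Simulate the population frequency of a word variant under random
    drift plus an optional selection strength s (s = 0 is pure drift)."""
    rng = random.Random(seed)
    p = p0
    for _ in range(generations):
        # Selection biases the probability of adopting the variant.
        p_sel = p * (1 + s) / (p * (1 + s) + (1 - p))
        # Each speaker independently resamples from the biased pool.
        k = sum(rng.random() < p_sel for _ in range(n_speakers))
        p = k / n_speakers
        if p in (0.0, 1.0):   # variant lost or fixed
            break
    return p

print(variant_frequency(s=0.0), variant_frequency(s=0.1))
```

Under pure drift the variant wanders and may fix or vanish by chance alone; distinguishing such runs from genuinely selected ones in noisy historical corpora is exactly the inference problem the thesis examines.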

    Automatic Structured Text Summarization with Concept Maps

    Efficiently exploring a collection of text documents in order to answer a complex question is a challenge that many people face. As abundant information on almost any topic is electronically available nowadays, supporting tools are needed to ensure that people can profit from the information's availability rather than suffer from the information overload. Structured summaries can help in this situation: They can be used to provide a concise overview of the contents of a document collection, they can reveal interesting relationships and they can be used as a navigation structure to further explore the documents. A concept map, which is a graph representing concepts and their relationships, is a specific form of a structured summary that offers these benefits. However, despite its appealing properties, only a limited amount of research has studied how concept maps can be automatically created to summarize documents. Automating that task is challenging and requires a variety of text processing techniques including information extraction, coreference resolution and summarization. The goal of this thesis is to better understand these challenges and to develop computational models that can address them. As a first contribution, this thesis lays the necessary ground for comparable research on computational models for concept map-based summarization. We propose a precise definition of the task together with suitable evaluation protocols and carry out experimental comparisons of previously proposed methods. As a result, we point out limitations of existing methods and gaps that have to be closed to successfully create summary concept maps. Towards that end, we also release a new benchmark corpus for the task that has been created with a novel, scalable crowdsourcing strategy. Furthermore, we propose new techniques for several subtasks of creating summary concept maps.
First, we introduce the usage of predicate-argument analysis for the extraction of concept and relation mentions, which greatly simplifies the development of extraction methods. Second, we demonstrate that a predicate-argument analysis tool can be ported from English to German with low effort, indicating that the extraction technique can also be applied to other languages. We further propose to group concept mentions using pairwise classifications and set partitioning, which significantly improves the quality of the created summary concept maps. We show similar improvements for a new supervised importance estimation model and an optimal subgraph selection procedure. By combining these techniques in a pipeline, we establish a new state-of-the-art for the summarization task. Additionally, we study the use of neural networks to model the summarization problem as a single end-to-end task. While such approaches are not yet competitive with pipeline-based approaches, we report several experiments that illustrate the challenges, mostly related to training data, that currently limit the performance of this technique. We conclude the thesis by presenting a prototype system that demonstrates the use of automatically generated summary concept maps in practice and by pointing out promising directions for future research on the topic of this thesis.
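The pipeline's final stages, importance estimation followed by subgraph selection, can be sketched as a toy greedy procedure over extracted triples. The scores and the greedy selection below are hypothetical simplifications; the thesis uses a supervised scoring model and an optimal selection procedure:

```python
def summary_concept_map(triples, importance, k=3):
    """Keep the k highest-scoring concepts and the relations among them,
    i.e. the induced subgraph over the selected concept nodes."""
    top = set(sorted(importance, key=importance.get, reverse=True)[:k])
    # A triple survives only if both its concepts were selected.
    return [(s, r, o) for (s, r, o) in triples if s in top and o in top]

# Toy extracted triples (subject, relation, object) with made-up scores.
triples = [("concept map", "is a", "graph"),
           ("graph", "represents", "concepts"),
           ("concepts", "have", "relationships"),
           ("thesis", "studies", "concept map")]
importance = {"concept map": 0.9, "graph": 0.8, "concepts": 0.7,
              "relationships": 0.3, "thesis": 0.2}
print(summary_concept_map(triples, importance, k=3))
```

Greedy top-k selection can drop relations whose endpoints narrowly miss the cut, which is one reason the thesis formulates subgraph selection as an optimisation problem instead.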

    Protocoles d'évaluation pour l'extraction d'information libre (Evaluation Protocols for Open Information Extraction)

    We would like to learn to "read automatically". Information extraction consists in transforming paragraphs of text written in natural language into a list of self-sufficient informational elements, so that information extracted from several sources can be compared and collated. Informational elements are represented here as relations between entities: (Athena; is the daughter of; Zeus). Open Information Extraction (OIE) is a recent paradigm that aims to extract a large number of relations contained in the analysed text, discovered as they are encountered, as opposed to the more common setting of a restricted number of predetermined relations. This thesis concerns the evaluation of OIE methods. In the first two chapters, the extractions of an OIE system are scored automatically by comparing them against hand-written references, emphasising first the informativity of the extractions and then their exhaustivity. In the following two chapters, we study and propose alternatives to the confidence function that judges a system's outputs. In particular, we analyse and question the methodologies by which this function is evaluated: first as a query-validation model, then in comparison with the well-established setting of knowledge-base completion.
    Information extraction consists in the processing of natural-language documents into a list of self-sufficient informational elements, which allows for cross-collection into knowledge bases and automatic processing. The facts that result from this process take the form of relationships between entities: (Athena; is the daughter of; Zeus). Open Information Extraction (OIE) is a recent paradigm whose purpose is to extract an order of magnitude more relations from the input corpus than classical IE methods, which is achieved by encoding or learning more general patterns in a less supervised fashion. In this thesis, I study and propose new evaluation protocols for the task of Open Information Extraction, with links to that of Knowledge Base Completion. In the first two chapters, I propose to automatically score the output of an OIE system against a manually established reference, with particular attention paid to the informativity and exhaustivity of the extractions. I then turn my focus to the confidence function that qualifies all extracted elements, evaluate it in a variety of settings, and propose alternative models.
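Scoring OIE output against a hand-written reference can be illustrated with exact-match precision, recall, and F1 over (subject; relation; object) triples. This is a deliberately simplified baseline; the protocols proposed in the thesis score partial informativity rather than requiring exact matches:

```python
def triple_prf(extracted, reference):
    """Exact-match precision/recall/F1 between extracted and reference
    triples; a true positive is a triple identical in all three slots."""
    ext, ref = set(extracted), set(reference)
    tp = len(ext & ref)
    p = tp / len(ext) if ext else 0.0
    r = tp / len(ref) if ref else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1

reference = [("Athena", "is the daughter of", "Zeus"),
             ("Zeus", "rules", "Olympus")]
extracted = [("Athena", "is the daughter of", "Zeus"),
             ("Athena", "lives on", "Olympus")]
print(triple_prf(extracted, reference))  # one true positive out of two each
```

Exact matching penalises paraphrases ("is the daughter of" vs. "is a daughter of") identically to outright errors, which is precisely the shortcoming that informativity-aware scoring addresses.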

    AIUCD2018 - Book of Abstracts

    This volume collects the abstracts of the papers presented at the Seventh Annual AIUCD Conference, AIUCD 2018 (Bari, 31 January – 2 February 2018), entitled "Patrimoni culturali nell'era digitale. Memorie, culture umanistiche e tecnologia" (Cultural Heritage in the Digital Age. Memory, Humanities and Technologies). The abstracts published in this volume received favourable assessments from expert reviewers through an anonymous double-blind peer-review process under the responsibility of the AIUCD Scientific Committee. The AIUCD 2018 conference programme is available online at http://www.aiucd2018.uniba.it/