26 research outputs found

PoCoS – Potsdam Coreference Scheme

Author: Chiarcos Christian
Krasavina Olga
Publication date: 22/05/2023

Inter-Coder Agreement for Computational Linguistics

Author: Atkins Sue
Carletta Jean
Grosz Barbara J
Hearst Marti A
Krippendorff Klaus
Marcus Mitchell P
Passonneau Rebecca J
Poesio Massimo
Reinhart T.
Artstein Ron
Publication venue: 'MIT Press - Journals'
Publication date: 01/01/2008

This article is a survey of methods for measuring agreement among corpus annotators. It exposes the mathematics and underlying assumptions of agreement coefficients, covering Krippendorff's alpha as well as Scott's pi and Cohen's kappa; discusses the use of coefficients in several annotation tasks; and argues that weighted, alpha-like coefficients, traditionally less used than kappa-like measures in computational linguistics, may be more appropriate for many corpus annotation tasks, but that their use makes the interpretation of the value of the coefficient even harder.
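
As a rough illustration of two of the coefficients named in the abstract above, the following sketch computes Cohen's kappa and Scott's pi for two annotators. It is not taken from the article: the function names and the toy labels are hypothetical, and it assumes the standard textbook definitions, in which both coefficients equal (A_o - A_e) / (1 - A_e) and differ only in how the expected agreement A_e is estimated.

```python
from collections import Counter


def observed_agreement(a, b):
    """Fraction of items on which the two annotators chose the same label."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohen_kappa(a, b):
    """Cohen's kappa: expected agreement from each annotator's own label distribution."""
    n = len(a)
    a_o = observed_agreement(a, b)
    count_a, count_b = Counter(a), Counter(b)
    # Chance that both pick label k, using separate per-annotator proportions.
    a_e = sum(count_a[k] * count_b[k] for k in set(a) | set(b)) / (n * n)
    # Undefined (division by zero) when a_e == 1, i.e. a single label overall.
    return (a_o - a_e) / (1 - a_e)


def scott_pi(a, b):
    """Scott's pi: expected agreement from one distribution pooled over both annotators."""
    n = len(a)
    a_o = observed_agreement(a, b)
    pooled = Counter(a) + Counter(b)
    # Chance that both pick label k, using the pooled proportion for each.
    a_e = sum((c / (2 * n)) ** 2 for c in pooled.values())
    return (a_o - a_e) / (1 - a_e)


# Hypothetical toy annotations of four items by two annotators.
ann1 = ["y", "y", "n", "n"]
ann2 = ["y", "n", "n", "n"]
kappa = cohen_kappa(ann1, ann2)
pi = scott_pi(ann1, ann2)
```

On this toy data the two annotators have different label distributions, so the two coefficients differ even though the observed agreement is the same in both cases.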

University of Essex Research Repository

CiteSeerX

Crossref

Topic-Continuity and Topic-Shift Effects in Spanish Discourse: A Comparative Analysis of Referring Expressions

Author: Zulaica-Hernández Iker
Publication venue: 'Brill'
Publication date: 01/01/2016

Differences in use among referring expressions are usually explained on the basis of the cognitive accessibility of their antecedents, where antecedent accessibility has been operationalized in different ways in the literature: as grammatical role, as syntactic prominence, or as antecedent distance. On these grounds, it has been proposed that personal pronouns prefer topical antecedents whereas demonstratives prefer non-topical antecedents. This paper investigates the referring properties of Spanish demonstratives and direct object personal pronouns with the aim of unveiling their differences and similarities. My analysis shows that these two expressions are very similar referentially when a narrow view of discourse context is considered. However, important differences show up when a broader notion of context is taken into account, i.e., contexts that extend beyond the immediately preceding sentence and beyond the immediate local topic of discourse. Based on my corpus evidence and on previous research on the pragmatic interpretation of referring expressions, I claim that direct object personal pronouns and demonstrative noun phrases crucially differ in the way they contribute to discourse coherence: the former act as topic-continuity markers, whereas the latter focalise referents that reintroduce suspended or declining topics and mark (sub)topic shifts in the discourse.

IUPUIScholarWorks

Towards interoperable discourse annotation: discourse features in the Ontologies of Linguistic Annotation

Author: Chiarcos Christian
Publication date: 03/05/2023

This paper describes the extension of the Ontologies of Linguistic Annotation (OLiA) with respect to discourse features. The OLiA ontologies provide a terminology repository that can be employed to facilitate the conceptual (semantic) interoperability of annotations of discourse phenomena as found in the most important corpora available to the community, including OntoNotes, the RST Discourse Treebank and the Penn Discourse Treebank. Along with selected schemes for information structure and coreference, discourse relations are discussed with special emphasis on the Penn Discourse Treebank and the RST Discourse Treebank. For an example contained in the intersection of both corpora, I show how ontologies can be employed to generalize over divergent annotation schemes.

OPUS Augsburg

Review of coreference resolution in English and Persian

Author: Aznaveh Ahmad Mahmoudi
Mohammadi Hassan Haji
Talebpour Alireza
Yazdani Samaneh
Publication date: 08/11/2022

Coreference resolution (CR) is one of the most challenging areas of natural language processing. This task seeks to identify all textual references to the same real-world entity. Research in this field is divided into coreference resolution and anaphora resolution. Due to its application in textual comprehension and its utility in other tasks such as information extraction systems, document summarization, and machine translation, this field has attracted considerable interest. Consequently, it has a significant effect on the quality of these systems. This article reviews the existing corpora and evaluation metrics in this field. Then, an overview of the coreference algorithms, from rule-based methods to the latest deep learning techniques, is provided. Finally, coreference resolution and pronoun resolution systems in Persian are investigated.

Comment: 44 pages, 11 figures, 5 tables

arXiv.org e-Print Archive

Iarg-AnCora: Spanish corpus annotated with implicit arguments

Author: Peris Morant Aina
Rodríguez Hontoria Horacio
Taulé Delor Mariona
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 21/10/2020

This article presents the Spanish Iarg-AnCora corpus (400k words, 13,883 sentences) annotated with the implicit arguments of deverbal nominalizations (18,397 occurrences). We describe the methodology used to create it, focusing on the annotation scheme and criteria adopted. The corpus was manually annotated and an inter-annotator agreement test was conducted (81% observed agreement) in order to ensure the reliability of the final resource. The annotation of implicit arguments results in an important gain in argument and thematic role coverage (128% on average). It is the first freely available, wide-coverage corpus annotated with implicit arguments for the Spanish language. This corpus can subsequently be used by machine learning-based semantic role labeling systems, and for the linguistic analysis of implicit arguments grounded on real data. Semantic analyzers are essential components of current language technology applications, which need to obtain a deeper understanding of the text in order to make inferences at the highest level to obtain qualitative improvements in the results.

Diposit Digital de la Universitat de Barcelona

A Survey on Semantic Processing Techniques

Author: Cambria Erik
Chen Guanyi
He Kai
Mao Rui
Ni Jinjie
Yang Zonglin
Zhang Xulang
Publication date: 22/10/2023

Semantic processing is a fundamental research domain in computational linguistics. In the era of powerful pre-trained language models and large language models, the advancement of research in this domain appears to be decelerating. However, the study of semantics is multi-dimensional in linguistics. The research depth and breadth of computational semantic processing can be largely improved with new technologies. In this survey, we analyze five semantic processing tasks: word sense disambiguation, anaphora resolution, named entity recognition, concept extraction, and subjectivity detection. We study relevant theoretical research in these fields, advanced methods, and downstream applications. We connect the surveyed tasks with downstream applications because this may inspire future scholars to fuse these low-level semantic processing tasks with high-level natural language processing tasks. The review of theoretical research may also inspire new tasks and technologies in the semantic processing domain. Finally, we compare the different semantic processing techniques and summarize their technical trends, application trends, and future directions.

Comment: Published at Information Fusion, Volume 101, 2024, 101988, ISSN 1566-2535. The equal contribution mark is missing in the published version due to the publication policies. Please contact Prof. Erik Cambria for details.

arXiv.org e-Print Archive

A Corpus-based Evaluation of Centering Theory

Author: Cheng H
di Eugenio B
Hitzeman J
Poesio M
Stevenson R
Publication venue: CSM-369
Publication date: 01/04/2002

University of Essex Research Repository

Anaphora resolution for Arabic machine translation: a case study of nafs

Author: Hamouda Wafya
Publication venue: Newcastle University
Publication date: 01/01/2014

PhD Thesis

In the age of the internet, email, and social media there is an increasing need for processing online information, for example, to support education and business. This has led to the rapid development of natural language processing technologies such as computational linguistics, information retrieval, and data mining. As a branch of computational linguistics, anaphora resolution has attracted much interest. This is reflected in the large number of papers on the topic published in journals such as Computational Linguistics. Mitkov (2002) and Ji et al. (2005) have argued that the overall quality of anaphora resolution systems remains low, despite practical advances in the area, and that major challenges include dealing with real-world knowledge and accurate parsing. This thesis investigates the following research question: can an algorithm be found for the resolution of the anaphor nafs in Arabic text which is accurate to at least 90%, scales linearly with text size, and requires a minimum of knowledge resources? A resolution algorithm intended to satisfy these criteria is proposed. Testing on a corpus of contemporary Arabic shows that it does indeed satisfy the criteria.

Egyptian Government

Newcastle University eTheses

Harnessing Collective Intelligence on Social Networks

Author: Chamberlain J
Publication venue: University of Essex
Publication date: 01/01/2015

Crowdsourcing is an approach to replace the work traditionally done by a single person with the collective action of a group of people via the Internet. It has established itself in the mainstream of research methodology in recent years using a variety of approaches to engage humans in solving problems that computers, as yet, cannot solve. Several common approaches to crowdsourcing have been successful, including peer production (in which the participants are inherently interested in contributing), microworking (in which participants are paid small amounts of money per task) and games or gamification (in which the participants are entertained as they complete the tasks). An alternative approach to crowdsourcing using social networks is proposed here. Social networks offer access to large user communities through integrated software applications and, as they mature, are utilised in different ways, with decentralised and unevenly-distributed organisation of content. This research investigates whether collective intelligence systems are facilitated better on social networks and how the contributed human effort can be optimised. These questions are investigated using two case studies of problem solving: anaphoric coreference in text documents and classifying images in the marine biology domain. Social networks themselves can be considered inherent, self-organised problem solving systems, an approach defined here as "groupsourcing", sharing common features with other crowdsourcing approaches; however, the benefits are tempered with the many challenges this approach presents. In comparison to other methods of crowdsourcing, harnessing collective intelligence on social networks offers a high-accuracy, data-driven and low-cost approach.

University of Essex Research Repository