18 research outputs found

Abstract Pronominal Anaphora in Three Registers of English

Author: O'Donnell Dominique H
Publication venue: PDXScholar
Publication date: 15/08/2019

Identifying the expressions in a text that refer to the same entity, or coreference resolution, is an important problem in natural language processing. Abstract anaphora are distinct from other types of reference because they refer to abstract entities in discourse such as events, facts, and propositions, and their antecedents can have non-nominal phrase structure. Non-nominal antecedents are an interesting challenge in coreference resolution because the pronoun provides little information about the syntactic structure or semantics of the antecedent. A great deal of work in corpus annotation for coreference and coreference resolution has focused on newspaper text, and the goal of this study is to investigate how patterns in the use of abstract pronominal anaphora vary in three text types. I compiled a corpus of newswire text, spontaneous dialog and planned speech and annotated all instances of the pronouns ‘it’, ‘this’, and ‘that’. I also annotated any non-nominal antecedents used with these pronouns. I compared frequencies of these pronouns, their referential functions, and characteristics of their non-nominal antecedents. I found variation in the frequencies of referential functions, the choice of pronoun and its referential function, the grammatical structure of non-nominal antecedents and the difficulty of the annotation task. The results indicate that the range of pronominal reference, pronominal anaphora and non-nominal antecedents in spoken discourse may not be retrievable from even very large collections of newswire texts.

PDXScholar (Portland State University)

ANCOR_Centre, a Large Free Spoken French Coreference Corpus: description of the Resource and Reliability Measures

Author: Antoine Jean-Yves
Eshkol Iris
Lefeuvre Anaïs
Maurel Denis
Muzerelle Judith
Pelletier Aurore
Schang Emmanuel
Villaneau Jeanne
Publication venue: HAL CCSD
Publication date: 26/05/2014

This article presents ANCOR_Centre, a French coreference corpus, available under the Creative Commons Licence. With a size of around 500,000 words, the corpus is large enough to serve the needs of data-driven approaches in NLP and represents one of the largest coreference resources currently available. The corpus focuses exclusively on spoken language and aims to represent a variety of spoken genres. ANCOR_Centre includes anaphora as well as coreference relations which involve nominal and pronominal mentions. The paper describes in detail the annotation scheme and the reliability measures computed on the resource.

HAL-CentraleSupelec

INRIA a CCSD electronic archive server

HAL Université de Tours

HAL-Rennes 1

Annotation de la temporalité en corpus : contribution à l'amélioration de la norme TimeML

Author: Abouda Lotfi
Antoine Jean-Yves
Eshkol Iris
Lefeuvre Anaïs
Maurel Denis
Savary Agata
Schang Emmanuel
Publication venue: HAL CCSD
Publication date: 01/07/2014

This paper reports a critical analysis of the TimeML standard, in the light of a temporal annotation that was conducted on spoken French. It shows that the norm suffers from weaknesses that must be corrected to fit the needs of NLP and corpus linguistics. These limitations concern mainly 1) the separation of different levels of linguistic annotation, 2) the delimitation in the text of the events, and 3) the absence of a bridging temporal relation in the norm.

HAL Université de Tours

Anaphora Annotation in Hindi Dependency TreeBank

Author: Dakwale Praveen
Himanshu Dipti M
Sharma Himanshu
Publication venue: Faculty of Computer Science, Universitas Indonesia
Publication date: 01/01/2012

Waseda University Repository

Annotation en relations anaphoriques d'un corpus de discours oral spontané en français

Author: Antoine Jean-Yves
Boyer-Pelletier Aurore
Eshkol Iris
Maurel Denis
Muzerelle Judith
Nouvel Damien
Schang Emmanuel
Publication venue: HAL CCSD
Publication date: 04/07/2013

This article presents an analysis of anaphoric relations in a corpus of spontaneous spoken dialogue in French. It describes in particular the CO2 pilot study, which led to a corpus annotation procedure, then two experiments drawn from the corpus (gender and number agreement, descriptions of first-mention definites), and finally the upcoming work of the ANCOR project. The goal of that project is to assess the relevance of, and to model, the processes for resolving these complex anaphora in spontaneous discourse.

HAL Université de Tours

Event versus entity co-reference: Effects of context and form of referring expression

Author: Bevacqua Luca
Hardmeier Christian
Loáiciga Sharid
Rohde Hannah
Publication venue: Association for Computational Linguistics (ACL)
Publication date: 01/01/2018

Crossref

Edinburgh Research Explorer

Inter-Coder Agreement for Computational Linguistics

Author: Atkins Sue
Carletta Jean
Grosz Barbara J
Hearst Marti A
Krippendorff Klaus
Marcus Mitchell P
Massimo Poesio
Passonneau Rebecca J
Poesio Massimo
Reinhart T.
Ron Artstein
Publication venue: MIT Press - Journals
Publication date: 01/01/2008

This article is a survey of methods for measuring agreement among corpus annotators. It exposes the mathematics and underlying assumptions of agreement coefficients, covering Krippendorff's alpha as well as Scott's pi and Cohen's kappa; discusses the use of coefficients in several annotation tasks; and argues that weighted, alpha-like coefficients, traditionally less used than kappa-like measures in computational linguistics, may be more appropriate for many corpus annotation tasks, but that their use makes the interpretation of the value of the coefficient even harder.
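
As a rough illustration of two of the coefficients named in this abstract, the sketch below computes Cohen's kappa and Scott's pi for two annotators using their standard textbook definitions. It is not taken from the article: the function names and the toy labels ("ana", "exp") are hypothetical, and the code assumes two coders, nominal categories, and at least some disagreement in the expected distribution (so the denominator is non-zero).

```python
from collections import Counter


def observed_agreement(a, b):
    """Proportion of items on which the two annotators chose the same label."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohen_kappa(a, b):
    """Cohen's kappa: chance agreement uses each annotator's own label distribution."""
    n = len(a)
    count_a, count_b = Counter(a), Counter(b)
    a_e = sum((count_a[k] / n) * (count_b[k] / n) for k in set(a) | set(b))
    a_o = observed_agreement(a, b)
    return (a_o - a_e) / (1 - a_e)  # assumes a_e < 1


def scott_pi(a, b):
    """Scott's pi: chance agreement uses one label distribution pooled over both annotators."""
    n = len(a)
    pooled = Counter(a) + Counter(b)
    a_e = sum((c / (2 * n)) ** 2 for c in pooled.values())
    a_o = observed_agreement(a, b)
    return (a_o - a_e) / (1 - a_e)  # assumes a_e < 1


# Hypothetical toy annotation: "ana" = anaphoric, "exp" = expletive.
coder_1 = ["ana", "ana", "exp", "exp"]
coder_2 = ["ana", "exp", "exp", "exp"]
print(round(cohen_kappa(coder_1, coder_2), 3))  # 0.5
print(round(scott_pi(coder_1, coder_2), 3))     # 0.467
```

The two values differ only because of how expected (chance) agreement is estimated, which is the kind of underlying assumption the survey discusses.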

University of Essex Research Repository

CiteSeerX

Crossref

Understanding demonstrative reference in text: A new taxonomy based on a new corpus

Author: Krahmer E.
Maes A.
Peeters D.
Publication venue: Cambridge University Press (CUP)
Publication date: 01/01/2022

Endophoric demonstratives such as this and that are among the most frequently used words in written texts. Nevertheless, it remains unclear how exactly they should be subdivided and classified in terms of their different types of use. Here, we develop a new taxonomy of endophoric demonstratives based on a large-scale corpus including three written genres: news items, encyclopedic texts, and book reviews. The taxonomy enables analysts to reliably code endophoric demonstratives based on objectively applicable criteria, while at the same time making them aware of many subtle borderline cases. We consider the taxonomy as a theoretical foundation for future theoretical and empirical work into endophoric demonstratives, and as an analytical tool allowing researchers to unify and compare the results of studies on endophoric demonstratives coming from different genres and languages.

MPG.PuRe

Tilburg University Repository

Abstract pronominal anaphors and label nouns in German and English: Selected case studies and quantitative investigations

Author: Dipper Stefanie
Seiss Melanie
Zinsmeister Heike
Publication venue: Language Science Press
Publication date

Abstract anaphors refer to abstract referents, such as facts or events. This paper presents a corpus-based comparative study of German and English abstract anaphors. Parallel bi-directional texts from the Europarl Corpus were annotated with functional and morpho-syntactic information, focusing on the pronouns ‘it’, ‘this’, and ‘that’, as well as demonstrative noun phrases headed by “label nouns”, such as ‘this event’, ‘that issue’, etc., and their German counterparts. We induce information about the cross-linguistic realization of abstract anaphors from the parallel texts. The contrastive findings are then controlled for translation-specific characteristics by examination of the differences between the original text and the translated text in each of the languages. In selected case studies, we investigate in detail “translation mismatches”, including changes in grammatical category (from pronouns to full noun phrases, and vice versa), grammatical function, or clausal position, addition or omission of modifying adjectives, changes in the lexical realization of head nouns, and transpositions of the demonstrative determiner. In some of these cases, the specificity of the abstract noun phrase is altered by the translation process.

ZENODO

Demonstrative im Diskurs

Author: Weeber Frederike
Publication venue
Publication date: 01/01/2016

The thesis compares the discourse behavior of German d-pronouns and the pronoun dieser. Based on corpus data, the hypothesis that only d-pronouns refer to generic NPs is put forward and then tested in an online study.

Kölner UniversitätsPublikationsServer