Abstract Pronominal Anaphora in Three Registers of English
Identifying the expressions in a text that refer to the same entity, or coreference resolution, is an important problem in natural language processing. Abstract anaphora are distinct from other types of reference because they refer to abstract entities in discourse such as events, facts, and propositions, and their antecedents can have non-nominal phrase structure. Non-nominal antecedents are an interesting challenge in coreference resolution because the pronoun provides little information about the syntactic structure or semantics of the antecedent. A great deal of work in corpus annotation for coreference and coreference resolution has focused on newspaper text, and the goal of this study is to investigate how patterns in the use of abstract pronominal anaphora vary in three text types. I compiled a corpus of newswire text, spontaneous dialog, and planned speech and annotated all instances of the pronouns "it", "this", and "that". I also annotated any non-nominal antecedents used with these pronouns. I compared frequencies of these pronouns, their referential functions, and characteristics of their non-nominal antecedents. I found variation in the frequencies of referential functions, the choice of pronoun and its referential function, the grammatical structure of non-nominal antecedents, and the difficulty of the annotation task. The results indicate that the range of pronominal reference, pronominal anaphora, and non-nominal antecedents in spoken discourse may not be retrievable from even very large collections of newswire texts.
ANCOR_Centre, a Large Free Spoken French Coreference Corpus: description of the Resource and Reliability Measures
This article presents ANCOR_Centre, a French coreference corpus available under a Creative Commons licence. With a size of around 500,000 words, the corpus is large enough to serve the needs of data-driven approaches in NLP and represents one of the largest coreference resources currently available. The corpus focuses exclusively on spoken language and aims to represent a variety of spoken genres. ANCOR_Centre includes anaphora as well as coreference relations involving nominal and pronominal mentions. The paper describes in detail the annotation scheme and the reliability measures computed on the resource.
Annotation de la temporalité en corpus : contribution à l'amélioration de la norme TimeML
This paper reports a critical analysis of the TimeML standard in the light of a temporal annotation conducted on a corpus of spoken French. It shows that the standard has weaknesses that should be corrected to fit the needs of NLP and corpus linguistics. These limitations mainly concern 1) the separation of different levels of linguistic annotation, 2) the delimitation of events in the text, and 3) the absence of a bridging (associative) temporal relation in the standard.
Annotation en relations anaphoriques d'un corpus de discours oral spontané en français
This article presents an analysis of the anaphoric relations in a corpus of spontaneous spoken dialog in French. It describes in particular the CO2 pilot study, which led to a corpus annotation procedure, then two experiments drawn from the corpus (gender and number agreement, and descriptions of first-mention definites), and finally the upcoming work of the ANCOR project. The aim of that project is to assess the relevance of, and to model, the processes for resolving these complex anaphora in spontaneous discourse.
Inter-Coder Agreement for Computational Linguistics
This article is a survey of methods for measuring agreement among corpus annotators. It exposes the mathematics and underlying assumptions of agreement coefficients, covering Krippendorff's alpha as well as Scott's pi and Cohen's kappa; discusses the use of coefficients in several annotation tasks; and argues that weighted, alpha-like coefficients, traditionally less used than kappa-like measures in computational linguistics, may be more appropriate for many corpus annotation tasks, but that their use makes the interpretation of the value of the coefficient even harder.
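As an illustration of the kind of coefficient this survey covers, the following is a minimal sketch of two-annotator Cohen's kappa and Scott's pi for nominal labels. It is not taken from the article: the function names and the toy labels are hypothetical, and it assumes exactly two annotators who each labeled every item. Both coefficients have the form (A_o - A_e) / (1 - A_e) and differ only in how expected agreement A_e is estimated.

```python
from collections import Counter


def observed_agreement(a, b):
    """Proportion of items on which the two annotators chose the same label."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohen_kappa(a, b):
    """Cohen's kappa: expected agreement uses each annotator's own label distribution."""
    n = len(a)
    count_a, count_b = Counter(a), Counter(b)
    expected = sum((count_a[k] / n) * (count_b[k] / n) for k in set(a) | set(b))
    observed = observed_agreement(a, b)
    # Undefined (division by zero) when expected agreement is exactly 1.
    return (observed - expected) / (1 - expected)


def scott_pi(a, b):
    """Scott's pi: expected agreement uses one label distribution pooled over both annotators."""
    n = len(a)
    pooled = Counter(a) + Counter(b)
    expected = sum((c / (2 * n)) ** 2 for c in pooled.values())
    observed = observed_agreement(a, b)
    # Undefined (division by zero) when expected agreement is exactly 1.
    return (observed - expected) / (1 - expected)


# Hypothetical toy annotations of four pronoun tokens by two annotators.
annotator_1 = ["it", "this", "that", "it"]
annotator_2 = ["it", "this", "it", "it"]
print(cohen_kappa(annotator_1, annotator_2))
print(scott_pi(annotator_1, annotator_2))
```

In this toy example kappa comes out slightly higher than pi, because kappa lets each annotator keep an individual label distribution while pi pools them. Weighted, alpha-like coefficients additionally require a distance function between labels, which this sketch omits.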
Understanding demonstrative reference in text: A new taxonomy based on a new corpus
Endophoric demonstratives such as this and that are among the most frequently used words in written texts. Nevertheless, it remains unclear how exactly they should be subdivided and classified in terms of their different types of use. Here, we develop a new taxonomy of endophoric demonstratives based on a large-scale corpus including three written genres: news items, encyclopedic texts, and book reviews. The taxonomy enables analysts to reliably code endophoric demonstratives based on objectively applicable criteria, while at the same time making them aware of many subtle borderline cases. We regard the taxonomy as a foundation for future theoretical and empirical work on endophoric demonstratives, and as an analytical tool allowing researchers to unify and compare the results of studies on endophoric demonstratives coming from different genres and languages.
Abstract pronominal anaphors and label nouns in German and English: Selected case studies and quantitative investigations
Abstract anaphors refer to abstract referents, such as facts or events. This paper presents a corpus-based comparative study of German and English abstract anaphors. Parallel bi-directional texts from the Europarl Corpus were annotated with functional and morpho-syntactic information, focusing on the pronouns "it", "this", and "that", as well as demonstrative noun phrases headed by "label nouns", such as "this event", "that issue", etc., and their German counterparts. We induce information about the cross-linguistic realization of abstract anaphors from the parallel texts. The contrastive findings are then controlled for translation-specific characteristics by examination of the differences between the original text and the translated text in each of the languages. In selected case studies, we investigate in detail "translation mismatches", including changes in grammatical category (from pronouns to full noun phrases, and vice versa), grammatical function, or clausal position, addition or omission of modifying adjectives, changes in the lexical realization of head nouns, and transpositions of the demonstrative determiner. In some of these cases, the specificity of the abstract noun phrase is altered by the translation process.
Demonstrative im Diskurs
This work compares the discourse behavior of German d-pronouns and the pronoun dieser. On the basis of corpus data, the hypothesis that only d-pronouns refer to generic NPs is put forward and then tested in an online study.