18 research outputs found

    Abstract Pronominal Anaphora in Three Registers of English

    Identifying the expressions in a text that refer to the same entity, or coreference resolution, is an important problem in natural language processing. Abstract anaphora are distinct from other types of reference because they refer to abstract entities in discourse such as events, facts, and propositions, and their antecedents can have non-nominal phrase structure. Non-nominal antecedents are an interesting challenge in coreference resolution because the pronoun provides little information about the syntactic structure or semantics of the antecedent. A great deal of work in corpus annotation for coreference and coreference resolution has focused on newspaper text, and the goal of this study is to investigate how patterns in the use of abstract pronominal anaphora vary across three text types. I compiled a corpus of newswire text, spontaneous dialog and planned speech and annotated all instances of the pronouns ‘it’, ‘this’, and ‘that’. I also annotated any non-nominal antecedents used with these pronouns. I compared frequencies of these pronouns, their referential functions, and characteristics of their non-nominal antecedents. I found variation in the frequencies of referential functions, the choice of pronoun and its referential function, the grammatical structure of non-nominal antecedents and the difficulty of the annotation task. The results indicate that the range of pronominal reference, pronominal anaphora and non-nominal antecedents in spoken discourse may not be retrievable from even very large collections of newswire texts.

    ANCOR_Centre, a Large Free Spoken French Coreference Corpus: description of the Resource and Reliability Measures

    This article presents ANCOR_Centre, a French coreference corpus, available under the Creative Commons Licence. With a size of around 500,000 words, the corpus is large enough to serve the needs of data-driven approaches in NLP and represents one of the largest coreference resources currently available. The corpus focuses exclusively on spoken language and aims to represent a certain variety of spoken genres. ANCOR_Centre includes anaphora as well as coreference relations which involve nominal and pronominal mentions. The paper describes in detail the annotation scheme and the reliability measures computed on the resource.

    Annotation de la temporalité en corpus : contribution à l'amélioration de la norme TimeML

    This paper reports a critical analysis of the TimeML standard in light of a temporal annotation effort conducted on a corpus of spoken French. It shows that the standard has weaknesses that should be corrected to meet the needs of NLP and corpus linguistics. These limitations mainly concern 1) the separation of different levels of linguistic annotation, 2) the delimitation of events in the text, and 3) the absence of a bridging (associative) temporal relation in the standard.

    Anaphora Annotation in Hindi Dependency TreeBank


    Annotation en relations anaphoriques d'un corpus de discours oral spontané en français

    This article presents an analysis of anaphoric relations in a corpus of spontaneous spoken French dialogue. It focuses in particular on the CO2 pilot study, which led to a corpus annotation procedure, then on two experiments drawn from the corpus (gender and number agreement, and descriptions of first-mention definites), and finally on the upcoming work of the ANCOR project. The aim of that project is to assess the relevance of, and to model, the resolution processes for these complex anaphora in spontaneous discourse.

    Inter-Coder Agreement for Computational Linguistics

    This article is a survey of methods for measuring agreement among corpus annotators. It exposes the mathematics and underlying assumptions of agreement coefficients, covering Krippendorff's alpha as well as Scott's pi and Cohen's kappa; discusses the use of coefficients in several annotation tasks; and argues that weighted, alpha-like coefficients, traditionally less used than kappa-like measures in computational linguistics, may be more appropriate for many corpus annotation tasks, but that their use makes the interpretation of the value of the coefficient even harder.
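
    As an illustration of two of the coefficients named in this abstract, the sketch below computes Cohen's kappa and Scott's pi for two annotators using the standard textbook definitions. It is not taken from the article itself; the function names and the toy pronoun labels are hypothetical and chosen only for this example.

```python
from collections import Counter

def observed_agreement(a, b):
    """Proportion of items on which the two annotators chose the same label."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def cohen_kappa(a, b):
    """Cohen's kappa: chance agreement uses each annotator's own label distribution."""
    n = len(a)
    ao = observed_agreement(a, b)
    ca, cb = Counter(a), Counter(b)
    ae = sum(ca[k] * cb[k] for k in ca.keys() | cb.keys()) / (n * n)
    # Undefined (division by zero) if both annotators use a single identical label.
    return (ao - ae) / (1 - ae)

def scott_pi(a, b):
    """Scott's pi: chance agreement uses one label distribution pooled over both annotators."""
    n = len(a)
    ao = observed_agreement(a, b)
    pooled = Counter(a) + Counter(b)
    ae = sum((count / (2 * n)) ** 2 for count in pooled.values())
    return (ao - ae) / (1 - ae)

# Hypothetical toy annotation: four items labeled by two annotators.
annotator_1 = ["it", "it", "this", "that"]
annotator_2 = ["it", "this", "this", "that"]

print(cohen_kappa(annotator_1, annotator_2))
print(scott_pi(annotator_1, annotator_2))
```

    Both functions share the form (A_o - A_e) / (1 - A_e) and differ only in how the expected agreement A_e is estimated, which is why the two values are close but not identical on the same data.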

    Understanding demonstrative reference in text: A new taxonomy based on a new corpus

    Endophoric demonstratives such as this and that are among the most frequently used words in written texts. Nevertheless, it remains unclear how exactly they should be subdivided and classified in terms of their different types of use. Here, we develop a new taxonomy of endophoric demonstratives based on a large-scale corpus including three written genres: news items, encyclopedic texts, and book reviews. The taxonomy enables analysts to reliably code endophoric demonstratives based on objectively applicable criteria, while at the same time making them aware of many subtle borderline cases. We consider the taxonomy a theoretical foundation for future theoretical and empirical work on endophoric demonstratives, and an analytical tool allowing researchers to unify and compare the results of studies on endophoric demonstratives coming from different genres and languages.

    Abstract pronominal anaphors and label nouns in German and English: Selected case studies and quantitative investigations

    Abstract anaphors refer to abstract referents, such as facts or events. This paper presents a corpus-based comparative study of German and English abstract anaphors. Parallel bi-directional texts from the Europarl Corpus were annotated with functional and morpho-syntactic information, focusing on the pronouns ‘it’, ‘this’, and ‘that’, as well as demonstrative noun phrases headed by “label nouns”, such as ‘this event’, ‘that issue’, etc., and their German counterparts. We induce information about the cross-linguistic realization of abstract anaphors from the parallel texts. The contrastive findings are then controlled for translation-specific characteristics by examination of the differences between the original text and the translated text in each of the languages. In selected case studies, we investigate in detail “translation mismatches”, including changes in grammatical category (from pronouns to full noun phrases, and vice versa), grammatical function, or clausal position, addition or omission of modifying adjectives, changes in the lexical realization of head nouns, and transpositions of the demonstrative determiner. In some of these cases, the specificity of the abstract noun phrase is altered by the translation process.

    Demonstrative im Diskurs

    This work compares the discourse behavior of German d-pronouns and the pronoun dieser. On the basis of corpus data, the hypothesis that only d-pronouns refer to generic NPs is put forward and then tested in an online study.