10,422 research outputs found

    A framework for annotating information structure in discourse

    Get PDF
    We present a framework for the integrated analysis of the textual and prosodic characteristics of information structure in the Switchboard corpus of conversational English. Information structure describes the availability, organisation and salience of entities in a discourse model. We present standards for the annotation of information status (old, mediated and new), and give guidelines for annotating information structure, i.e. theme/rheme and background/kontrast. We show that information structure in English can only be analysed concurrently with prosodic prominence and phrasing. Along with existing annotations which we have integrated using NXT technology, the corpus will be unique in the field of conversational speech in terms of size and richness of annotation, vital for many NLP applications

    Towards unification of discourse annotation frameworks

    Get PDF
    Funding: The author is funded by University of St Andrews-China Scholarship Council joint scholarship (NO.202008300012).Discourse information is difficult to represent and annotate. Among the major frameworks for annotating discourse information, RST, PDTB and SDRT are widely discussed and used, each having its own theoretical foundation and focus. Corpora annotated under different frameworks vary considerably. To make better use of the existing discourse corpora and achieve the possible synergy of different frameworks, it is worthwhile to investigate the systematic relations between different frameworks and devise methods of unifying the frameworks. Although the issue of framework unification has been a topic of discussion for a long time, there is currently no comprehensive approach which considers unifying both discourse structure and discourse relations and evaluates the unified framework intrinsically and extrinsically. We plan to use automatic means for the unification task and evaluate the result with structural complexity and downstream tasks. We will also explore the application of the unified framework in multi-task learning and graphical models.Publisher PD

    Argumentation Mining in User-Generated Web Discourse

    Full text link
    The goal of argumentation mining, an evolving research field in computational linguistics, is to design methods capable of analyzing people's argumentation. In this article, we go beyond the state of the art in several ways. (i) We deal with actual Web data and take up the challenges given by the variety of registers, multiple domains, and unrestricted noisy user-generated Web discourse. (ii) We bridge the gap between normative argumentation theories and argumentation phenomena encountered in actual data by adapting an argumentation model tested in an extensive annotation study. (iii) We create a new gold standard corpus (90k tokens in 340 documents) and experiment with several machine learning methods to identify argument components. We offer the data, source codes, and annotation guidelines to the community under free licenses. Our findings show that argumentation mining in user-generated Web discourse is a feasible but challenging task.Comment: Cite as: Habernal, I. & Gurevych, I. (2017). Argumentation Mining in User-Generated Web Discourse. Computational Linguistics 43(1), pp. 125-17

    Vagueness and referential ambiguity in a large-scale annotated corpus

    Get PDF
    In this paper, we argue that difficulties in the definition of coreference itself contribute to lower inter-annotator agreement in certain cases. Data from a large referentially annotated corpus serves to corroborate this point, using a quantitative investigation to assess which effects or problems are likely to be the most prominent. Several examples where such problems occur are discussed in more detail, and we then propose a generalisation of Poesio, Reyle and Stevenson’s Justified Sloppiness Hypothesis to provide a unified model for these cases of disagreement and argue that a deeper understanding of the phenomena involved allows to tackle problematic cases in a more principled fashion than would be possible using only pre-theoretic intuitions

    Introducing a corpus of conversational stories. Construction and annotation of the Narrative Corpus

    Get PDF
    Although widely seen as critical both in terms of its frequency and its social significance as a prime means of encoding and perpetuating moral stance and configuring self and identity, conversational narrative has received little attention in corpus linguistics. In this paper we describe the construction and annotation of a corpus that is intended to advance the linguistic theory of this fundamental mode of everyday social interaction: the Narrative Corpus (NC). The NC contains narratives extracted from the demographically-sampled sub-corpus of the British National Corpus (BNC) (XML version). It includes more than 500 narratives, socially balanced in terms of participant sex, age, and social class. We describe the extraction techniques, selection criteria, and sampling methods used in constructing the NC. Further, we describe four levels of annotation implemented in the corpus: speaker (social information on speakers), text (text Ids, title, type of story, type of embedding etc.), textual components (pre-/post-narrative talk, narrative, and narrative-initial/final utterances), and utterance (participation roles, quotatives and reporting modes). A brief rationale is given for each level of annotation, and possible avenues of research facilitated by the annotation are sketched out

    Clue: Cross-modal Coherence Modeling for Caption Generation

    Full text link
    We use coherence relations inspired by computational models of discourse to study the information needs and goals of image captioning. Using an annotation protocol specifically devised for capturing image--caption coherence relations, we annotate 10,000 instances from publicly-available image--caption pairs. We introduce a new task for learning inferences in imagery and text, coherence relation prediction, and show that these coherence annotations can be exploited to learn relation classifiers as an intermediary step, and also train coherence-aware, controllable image captioning models. The results show a dramatic improvement in the consistency and quality of the generated captions with respect to information needs specified via coherence relations.Comment: Accepted as a long paper to ACL 202
    corecore