1,103 research outputs found

    Vagueness and referential ambiguity in a large-scale annotated corpus

    Get PDF
    In this paper, we argue that difficulties in the definition of coreference itself contribute to lower inter-annotator agreement in certain cases. Data from a large referentially annotated corpus serves to corroborate this point, using a quantitative investigation to assess which effects or problems are likely to be the most prominent. Several examples where such problems occur are discussed in more detail, and we then propose a generalisation of Poesio, Reyle and Stevenson’s Justified Sloppiness Hypothesis to provide a unified model for these cases of disagreement and argue that a deeper understanding of the phenomena involved allows to tackle problematic cases in a more principled fashion than would be possible using only pre-theoretic intuitions

    Coreference Resolution for French Oral Data: Machine Learning Experiments with ANCOR

    Get PDF
    International audienceWe present CROC (Coreference Resolution for Oral Corpus), the first machine learning system for coreference resolution in French. One specific aspect of the system is that it has been trained on data that come exclusively from transcribed speech, namely ANCOR (ANaphora and Coreference in ORal corpus), the first large-scale French corpus with anaphorical relation annotations. In its current state, the CROC system requires pre-annotated mentions. We detail the features used for the learning algorithms, and we present a set of experiments with these features. The scores we obtain are close to those of state-of-the-art systems for written English

    "A Framework for Descriptive Grammars"

    Get PDF

    Demonstratives in discourse

    Get PDF
    This volume explores the use of demonstratives in the structuring and management of discourse, and their role as engagement expressions, from a crosslinguistic perspective. It seeks to establish which types of discourse-related functions are commonly encoded by demonstratives, beyond the well-established reference-tracking and deictic uses, and also investigates which members of demonstrative paradigms typically take on certain functions. Moreover, it looks at the roles of non-deictic demonstratives, that is, members of the paradigm which are dedicated e.g. to contrastive, recognitional, or anaphoric functions and do not express deictic distinctions. Several of the studies also focus on manner demonstratives, which have been little studied from a crosslinguistic perspective. The volume thus broadens the scope of investigation of demonstratives to look at how their core functions interact with a wider range of discourse functions in a number of different languages. The volume covers languages from a range of geographical locations and language families, including Cushitic and Mande languages in Africa, Oceanic and Papuan languages in the Pacific region, Algonquian and Guaykuruan in the Americas, and Germanic, Slavic and Finno-Ugric languages in the Eurasian region. It also includes two papers taking a broader typological approach to specific discourse functions of demonstratives

    Optimization issues in machine learning of coreference resolution

    Get PDF

    Incremental Coreference Resolution for German

    Full text link
    The main contributions of this thesis are as follows: 1. We introduce a general model for coreference and explore its application to German. ‱ The model features an incremental discourse processing algorithm which allows it to coherently address issues caused by underspecification of mentions, which is an especially pressing problem regarding certain German pronouns. ‱ We introduce novel features relevant for the resolution of German pronouns. A subset of these features are made accessible through the incremental architecture of the discourse processing model. ‱ In evaluation, we show that the coreference model combined with our features provides new state-of-the-art results for coreference and pronoun resolution for German. 2. We elaborate on the evaluation of coreference and pronoun resolution. ‱ We discuss evaluation from the view of prospective downstream applications that benefit from coreference resolution as a preprocessing component. Addressing the shortcomings of the general evaluation framework in this regard, we introduce an alternative framework, the Application Related Coreference Scores (ARCS). ‱ The ARCS framework enables a thorough comparison of different system outputs and the quantification of their similarities and differences beyond the common coreference evaluation. We demonstrate how the framework is applied to state-of-the-art coreference systems. This provides a method to track specific differences in system outputs, which assists researchers in comparing their approaches to related work in detail. 3. We explore semantics for pronoun resolution. ‱ Within the introduced coreference model, we explore distributional approaches to estimate the compatibility of an antecedent candidate and the occurrence context of a pronoun. We compare a state-of-the-art approach for word embeddings to syntactic co-occurrence profiles to this end. ‱ In comparison to related work, we extend the notion of context and thereby increase the applicability of our approach. We find that a combination of both compatibility models, coupled with the coreference model, provides a large potential for improving pronoun resolution performance. We make available all our resources, including a web demo of the system, at: http://pub.cl.uzh.ch/purl/coreference-resolutio
    • 

    corecore