Vagueness and referential ambiguity in a large-scale annotated corpus
In this paper, we argue that difficulties in the definition of coreference itself contribute to lower inter-annotator agreement in certain cases. We corroborate this point with data from a large referentially annotated corpus, using a quantitative investigation to assess which effects or problems are likely to be the most prominent. Several examples where such problems occur are discussed in more detail. We then propose a generalisation of Poesio, Reyle and Stevenson's Justified Sloppiness Hypothesis that provides a unified model for these cases of disagreement, and we argue that a deeper understanding of the phenomena involved allows us to tackle problematic cases in a more principled fashion than would be possible using only pre-theoretic intuitions.
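The abstract does not name the agreement statistic it relies on. As a hedged illustration of what "lower inter-annotator agreement" quantifies, the sketch below computes Cohen's kappa, a standard chance-corrected agreement measure. It assumes, as a simplification, that each annotator assigns one categorical label per markable (here a hypothetical "coref" vs "new" decision); coreference studies often use chain-aware measures instead, and the labels are invented for illustration.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two annotators on the same items.

    Simplification (assumed): every markable receives exactly one
    categorical label from each annotator.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(labels_a)
    # Observed agreement: share of items labelled identically.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance, from each annotator's label distribution.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both annotators used one and the same label throughout
    return (p_o - p_e) / (1 - p_e)


# Hypothetical decisions for six markables: link to an earlier mention or not.
annotator_1 = ["coref", "coref", "new", "coref", "new", "coref"]
annotator_2 = ["coref", "new", "new", "coref", "new", "coref"]
print(round(cohens_kappa(annotator_1, annotator_2), 3))  # → 0.667
```

Here the annotators agree on five of six items (0.833), but because half of that agreement is expected by chance (0.5), kappa drops to about 0.667.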
Coreference Resolution for French Oral Data: Machine Learning Experiments with ANCOR
We present CROC (Coreference Resolution for Oral Corpus), the first machine learning system for coreference resolution in French. A distinctive aspect of the system is that it has been trained exclusively on transcribed speech, namely ANCOR (ANaphora and Coreference in ORal corpus), the first large-scale French corpus annotated with anaphoric relations. In its current state, CROC requires pre-annotated mentions. We detail the features used by the learning algorithms and present a set of experiments with these features. The scores we obtain are close to those of state-of-the-art systems for written English.
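The abstract does not list CROC's features or say how its training data is built. The following is a minimal, hypothetical sketch of a mention-pair setup that starts from pre-annotated mentions, as CROC does. It borrows the instance-generation scheme of Soon, Ng and Lim (2001), a swapped-in technique that is not necessarily CROC's, and uses four invented features (string match, distance, gender and number agreement). The class and function names are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class Mention:
    text: str
    position: int  # order of the mention in the transcript
    gender: str    # "m", "f" or "?" when unknown
    number: str    # "sg", "pl" or "?" when unknown
    chain_id: int  # gold coreference chain from the pre-annotated corpus


def agree(a: str, b: str) -> bool:
    # Unknown values ("?") are treated as compatible with anything.
    return "?" in (a, b) or a == b


def pair_features(antecedent: Mention, anaphor: Mention) -> dict:
    # Hypothetical features for illustration; not CROC's actual feature set.
    return {
        "string_match": antecedent.text.lower() == anaphor.text.lower(),
        "distance": anaphor.position - antecedent.position,
        "gender_agree": agree(antecedent.gender, anaphor.gender),
        "number_agree": agree(antecedent.number, anaphor.number),
    }


def training_instances(mentions: list) -> list:
    """Mention-pair instances in the style of Soon et al. (2001)."""
    ordered = sorted(mentions, key=lambda m: m.position)
    instances = []
    for j, anaphor in enumerate(ordered):
        antecedents = [i for i in range(j) if ordered[i].chain_id == anaphor.chain_id]
        if not antecedents:
            continue  # first mention of its chain: no instance is generated
        closest = antecedents[-1]
        # Positive example: the closest preceding mention of the same chain.
        instances.append((pair_features(ordered[closest], anaphor), True))
        # Negative examples: every mention between that antecedent and the anaphor.
        for i in range(closest + 1, j):
            instances.append((pair_features(ordered[i], anaphor), False))
    return instances


mentions = [
    Mention("Marie", 0, "f", "sg", 1),
    Mention("le docteur", 1, "m", "sg", 2),
    Mention("elle", 2, "f", "sg", 1),
    Mention("il", 3, "m", "sg", 2),
]
print([label for _, label in training_instances(mentions)])  # → [True, False, True, False]
```

The resulting feature dictionaries could be passed to any binary classifier; at prediction time, a common choice is to link each anaphor to the closest candidate classified as coreferent.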
Definite objects in the wild: A converging evidence approach to scrambling in the Dutch middle-field
Left edge topics in Russian and the processing of anaphoric dependencies
This paper investigates the cost of processing syntactic vs. extra-syntactic dependencies. The results support the hypothesis that syntactic dependencies require less processing effort than discourse-derived dependencies do (Koornneef 2008, Reuland 2001, 2011). The point is made through the analysis of a novel paradigm in Russian in which a preposed nominal stranding a numeral can show number connectivity (PAUCAL) with a gap following the numeral or can appear in a non-agreeing (PLURAL) form:
(1) cathedral-PAUCAL/PLURAL, there were three.PAUCAL__
Numerous syntactic diagnostics confirm that when there is number connectivity, the nominal has been fronted via A'-movement, creating a syntactic A'-chain dependency. In the absence of connectivity, the construction involves a hanging topic related via discourse mechanisms to a base-generated null pronoun. The two constructions constitute a minimal pair, and Reuland's proposals correctly predict that the A'-movement construction will require less processing effort than the hanging topic construction. A self-paced reading study of contrasting pairs such as (1) showed a statistically significant slowdown after the gap in the hanging topic condition compared with the moved nominal. We take this to support the claim that a syntactic A'-chain is more easily processed than an anaphoric dependency involving a null pronoun, which must be resolved by discourse-based mechanisms.
Demonstratives in discourse
This volume explores the use of demonstratives in the structuring and management of discourse, and their role as engagement expressions, from a crosslinguistic perspective. It seeks to establish which types of discourse-related functions are commonly encoded by demonstratives, beyond the well-established reference-tracking and deictic uses, and also investigates which members of demonstrative paradigms typically take on certain functions. Moreover, it looks at the roles of non-deictic demonstratives, that is, members of the paradigm which are dedicated, e.g., to contrastive, recognitional, or anaphoric functions and do not express deictic distinctions. Several of the studies also focus on manner demonstratives, which have been little studied from a crosslinguistic perspective. The volume thus broadens the scope of investigation of demonstratives to look at how their core functions interact with a wider range of discourse functions in a number of different languages. The volume covers languages from a range of geographical locations and language families, including Cushitic and Mande languages in Africa, Oceanic and Papuan languages in the Pacific region, Algonquian and Guaykuruan in the Americas, and Germanic, Slavic and Finno-Ugric languages in the Eurasian region. It also includes two papers taking a broader typological approach to specific discourse functions of demonstratives.
Incremental Coreference Resolution for German
The main contributions of this thesis are as follows:
1. We introduce a general model for coreference and explore its application to German.
• The model features an incremental discourse processing algorithm that allows it to coherently address issues caused by the underspecification of mentions, an especially pressing problem for certain German pronouns.
• We introduce novel features relevant for the resolution of German pronouns. A subset of these features is made accessible through the incremental architecture of the discourse processing model.
• In evaluation, we show that the coreference model combined with our features provides new state-of-the-art results for coreference and pronoun resolution for German.
2. We elaborate on the evaluation of coreference and pronoun resolution.
• We discuss evaluation from the perspective of prospective downstream applications that benefit from coreference resolution as a preprocessing component. To address the shortcomings of the general evaluation framework in this regard, we introduce an alternative framework, the Application Related Coreference Scores (ARCS).
• The ARCS framework enables a thorough comparison of different system outputs and quantifies their similarities and differences beyond common coreference evaluation. We demonstrate how the framework is applied to state-of-the-art coreference systems. This provides a method to track specific differences in system outputs, which helps researchers compare their approaches with related work in detail.
3. We explore semantics for pronoun resolution.
• Within the introduced coreference model, we explore distributional approaches to estimating the compatibility of an antecedent candidate with the context in which a pronoun occurs. To this end, we compare a state-of-the-art word embedding approach with syntactic co-occurrence profiles.
• Compared with related work, we extend the notion of context and thereby increase the applicability of our approach. We find that a combination of both compatibility models, coupled with the coreference model, offers considerable potential for improving pronoun resolution performance.
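The abstract does not spell out the model's internals, so the following is only a minimal sketch, under our own assumptions, of what incremental, entity-level pronoun resolution in the spirit of contributions 1 and 3 can look like. Mentions are processed left to right; each entity keeps the set of (number, gender) readings that all its mentions still share, so an underspecified pronoun such as "sie" (singular feminine or plural) is narrowed once it joins an entity. Candidate entities are filtered by agreement and ranked by cosine similarity between the entity's summed context vector and the pronoun's context vector, a hypothetical stand-in for the thesis's embedding and co-occurrence compatibility models. The class names, toy vectors, and the rule that full noun phrases always open a new entity are illustrative simplifications, not the thesis's method.

```python
import math
from dataclasses import dataclass, field

# Every (number, gender) reading a German mention could in principle have.
ALL_READINGS = frozenset((n, g) for n in ("sg", "pl") for g in ("m", "f", "n"))


@dataclass
class Mention:
    text: str
    is_pronoun: bool
    readings: frozenset  # possible (number, gender) readings; "sie" has several
    context: list        # hypothetical context vector, e.g. from word embeddings


@dataclass
class Entity:
    mentions: list = field(default_factory=list)
    readings: frozenset = ALL_READINGS          # readings shared by all mentions so far
    vector: list = field(default_factory=list)  # sum of the mentions' context vectors


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def add_mention(entity, mention):
    entity.mentions.append(mention)
    # Underspecification is resolved incrementally: keep only shared readings.
    entity.readings = entity.readings & mention.readings
    if entity.vector:
        entity.vector = [a + b for a, b in zip(entity.vector, mention.context)]
    else:
        entity.vector = list(mention.context)


def resolve(mentions):
    entities = []
    for mention in mentions:  # single left-to-right pass over the discourse
        if mention.is_pronoun:
            # Agreement filter: the entity must still allow one of the pronoun's readings.
            candidates = [e for e in entities if e.readings & mention.readings]
            if candidates:
                # Rank candidates by distributional compatibility with the pronoun's context.
                best = max(candidates, key=lambda e: cosine(e.vector, mention.context))
                add_mention(best, mention)
                continue
        # Simplification: full noun phrases (and unresolvable pronouns) open a new entity.
        entity = Entity()
        add_mention(entity, mention)
        entities.append(entity)
    return entities


SIE = frozenset({("sg", "f"), ("pl", "m"), ("pl", "f"), ("pl", "n")})
discourse = [
    Mention("die Ärztin", False, frozenset({("sg", "f")}), [1.0, 0.0]),
    Mention("die Kinder", False, frozenset({("pl", "n")}), [0.0, 1.0]),
    Mention("sie", True, SIE, [0.9, 0.1]),
    Mention("sie", True, SIE, [0.1, 0.9]),
    Mention("er", True, frozenset({("sg", "m")}), [1.0, 0.0]),
]
for entity in resolve(discourse):
    print([m.text for m in entity.mentions])
# ['die Ärztin', 'sie']
# ['die Kinder', 'sie']
# ['er']
```

Both occurrences of "sie" pass the agreement filter for both entities, so the toy context vectors decide between them; "er" agrees with neither entity and therefore opens a new one.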
We make all our resources, including a web demo of the system, available at: http://pub.cl.uzh.ch/purl/coreference-resolutio
- …